Cruncher is an application that crawls TechCrunch for information about posts, stores data about authors and articles, and provides a web API to consume it.
Why? This is a challenge for Cheesecake Labs, and I built it because I love this company and I need this job 😃
This is my first real project with Django and Django REST Framework, and I applied DevOps practices along the way. I tried to follow the recommendations and best practices available on the Internet and in the documentation. Here are some considerations:
The production API endpoint can be consumed at cruncher.oclubecast.com/api/v1/.
Here are the GETs:
/api/v1/authors/: get all the authors, paginated.
/api/v1/authors/<id>: get details on one author.
/api/v1/articles/: get all the articles, paginated.
/api/v1/articles/<id>: get details on one article.
Here’s a list of software, libraries and services used in this project:
As Freddie Mercury used to say… “It’s so eaaaasy” - Understanders will understand. But you need Docker and Docker-Compose installed.
Clone the repository:
git clone https://github.com/perylemke/cruncher
cd cruncher
Run docker-compose to build and migrate Django models:
docker-compose build
docker-compose up -d
docker-compose run web python manage.py makemigrations
docker-compose run web python manage.py migrate
Run the crawler (here limited to 100 crawled pages):
docker-compose run scrap scrapy crawl cruncher --set CLOSESPIDER_PAGECOUNT=100
You’re ready to go! Consume the API at http://localhost:8000/api/v1/.
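To illustrate consuming the paginated list endpoints, here is a minimal Python sketch using only the standard library. It assumes the API returns Django REST Framework style page objects with a "results" list and a "next" URL; this README does not show the actual response shape, so adjust the keys if your pagination settings differ. The names BASE_URL, fetch_json and iter_items are made up for this example.

```python
import json
from urllib.request import urlopen

# Local endpoint from the setup steps above.
BASE_URL = "http://localhost:8000/api/v1/"


def fetch_json(url):
    """GET a URL and decode its JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def iter_items(url, fetch=fetch_json):
    """Yield every item from a paginated list endpoint.

    Assumption: each page looks like {"results": [...], "next": <url or None>}.
    """
    while url:
        page = fetch(url)
        yield from page["results"]
        # Follow the link to the next page; a missing or null link ends the loop.
        url = page.get("next")
```

With the containers running, something like list(iter_items(BASE_URL + "authors/")) would collect every author across all pages, and the same call with "articles/" would do it for articles. The fetch parameter is only there so the paging logic can be tried without a live server.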
Cruncher is no longer in production. Clone this repository and DO IT! 😃