cruncher

Cruncher is an application that crawls over TechCrunch informations about posts, stores data about authors, articles and provides a web API to consume it.

perylemke

JavaScript

Cheesecake

Cruncher aka Scrapery =)

Cruncher is an application that crawls over TechCrunch informations about posts, stores data about authors, articles and provides a web API to consume it.

Why? This is a challenge for Cheesecake Labs. Because I love this company and I need this job 😃

Considerations

This is my first real project with Django and Django REST Framework, including applying DevOps during the process. I tried to follow the recommendations and best practices available on the Internet and the documentations. Here are some considerations:

I need improve the front-end.
Digital Ocean is a big jerk sometimes, but work as well.
Improve production deployment. It’s my first time with DO.
Write more tests.
Collect more data with the crawler.

API Endpoint

The production API endpoint can be consumed at cruncher.oclubecast.com/api/v1/.

Here are the GETs:

/api/v1/authors/: get all the authors paginated.
/api/v1/authors/<id>: get details on one author.
/api/v1/articles/: get all the articles paginated.
/api/v1/articles/<id>: get details on one article.

Softwares used

Here’s a list of softwares, libraries and servicesused in this project:

Digital Ocean
Docker
docker-compose
NGINX
Django
Django REST Framework
Gunicorn
coverage
PostgreSQL
psycopg2
Scrapy

Up and running: local!

As Freddie Mercury used to say… “It’s so eaaaasy” - Understanders will understand. But you need a Docker and Docker-Compose installed.

Repository clone:

git clone https://github.com/perylemke/cruncher
cd cruncher

Run docker-compose to build and migrate Django models:

docker-compose build
docker-compose up -d
docker-compose run web python manage.py makemigrations
docker-compose run web python manage.py migrate

Run the crawler (here with 100 requests limit):

docker-compose run scrap scrapy crawl cruncher --set CLOSESPIDER_PAGECOUNT=100

You’re ready to go! Consume the API at http://localhost:8000/api/v1/.

Run in production

Now Cruncher is not more in production. Clone this repository and DO IT! 😃