Web Scraping Framework

Grab Framework Project

Status of Project

I have not used Grab myself for many years, and I am not sure anybody is using it at present.
Nonetheless, I decided to refactor the project, just for fun. I have annotated
the whole code base with mypy type hints (in strict mode). The whole code base also complies with
pylint and flake8 requirements, with a few exceptions: very large methods and classes with too many local
attributes and variables. I will refactor them eventually.

The current and only network backend is urllib3.

I have refactored a few components into external packages: proxylist,
procstat, selection, unicodec, and user_agent.

Feel free to give feedback in the Telegram groups @grablab and @grablab_ru.

Things to be done next

  • Refactor source code to remove all pylint disable comments like:
    • too-many-instance-attributes
    • too-many-arguments
    • too-many-locals
    • too-many-public-methods
  • Reach 100% test coverage (it is about 95% now)
  • Release new version to pypi
  • Refactor more components into external packages
  • More abstract interfaces
  • More data structures and types
  • Decouple connections between internal components
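
As an illustration of the first item in the list above, here is a minimal sketch of one common way to get rid of a too-many-arguments disable comment: group related parameters into a single typed object. This is not Grab's actual code. The names RequestOptions and describe_request, and all of their fields, are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RequestOptions:
    """Hypothetical container that groups related request settings."""

    url: str
    method: str = "GET"
    timeout: float = 10.0
    follow_redirects: bool = True
    user_agent: str = "example-agent"
    proxy: Optional[str] = None


def describe_request(options: RequestOptions) -> str:
    """Take one options object instead of six separate arguments."""
    return f"{options.method} {options.url} (timeout={options.timeout}s)"


if __name__ == "__main__":
    print(describe_request(RequestOptions(url="https://example.com")))
```

A function that takes one options object no longer needs a long parameter list, and the dataclass gives mypy a single place to check the types of all settings.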

Installation

The following command installs the old Grab released in 2018: pip install -U grab

The updated Grab available in the GitHub repository is completely incompatible with spiders and crawlers
written for the Grab released in 2018.

Documentation

The updated documentation is here: https://grab.readthedocs.io/en/latest/ Most of the updates remove
content related to features I have removed from Grab since 2018.

Documentation for the old Grab version 0.6.41 (released in 2018) is here: https://grab.readthedocs.io/en/v0.6.41-doc/