Top Python Frameworks & Libraries for Web Crawling (59)

Scrapy, a fast high-level web crawling & scraping framework for Python.

53244
10570
Python

Web crawling framework based on asyncio.

2007
213
Python

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

1401
135
Python

htcap is a web application scanner able to crawl single-page applications (SPAs) recursively by intercepting Ajax calls and DOM changes....

611
114
Python

ISP Data Pollution to Protect Private Browsing History with Obfuscation

591
53
Python

🌅 Next-generation web crawling using machine intelligence

329
44
Python

The simple, easy-to-use command-line web crawler.

341
69
Python

Lightweight web scraping toolkit for documents and structured data.

310
59
Python

Web crawling with IP rotation via Tor

194
54
Python

Web Crawling UI and HTTP API, based on Scrapy and Tornado

148
63
Python

An example using Selenium WebDriver for Python and the Scrapy framework to create a web scraper that crawls an ASP site...

123
46
Python

Set up a free and scalable Scrapyd cluster for distributed web crawling with just a few clicks. DEMO 👉...

115
87
Python

A tool for crawling systems, in the way crawlers do for the web

106
41
Python

Scraping eBay products using the Scrapy web crawling framework

73
31
Python

This repo is a part of blog series on several web scraping projects where we will explore scraping techniques to crawl data from simple websites to websites using...

75
66
Python

An Open Source Web Application for Genetic Data (SNPs) using 23andMe and Data Crawling Technologies

69
7
Python

ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the web...

68
21
Python

This package is a complete tool for creating a large dataset of images (designed especially, but not only, for machine learning enthusiasts). It can crawl the web,...

64
21
Python

Scraping and Web Crawling Framework For Zhihu Live

63
31
Python

Screen scraping and web crawling framework

61
11
Python

Easily crawl web resources and extract web information: a simple crawler framework

60
8
Python

A serverless web browser which crawls websites and compares pages by schedule.

58
14
Python

A queue-controlled browser automation tool for improving web crawl quality

57
26
Python

WebCollector-Python is an open source web crawler framework based on Python. It provides some simple interfaces for crawling the Web; you can set up a multi-threaded...

54
21
Python

Python web crawler / scraper for WG-Gesucht. Crawls the WG-Gesucht site for new apartment listings and sends a message to the poster, based on your saved filters a...

54
20
Python

Web scraping and automation using Python

53
14
Python

New Douban metadata source plugin for calibre. Douban no longer provides book APIs to the public, so the plugin can only obtain data through web crawling. This is a calibre Dou...

1010
42
Python

📊 Blazing fast Python framework for web crawling, scraping, testing, and reporting. Supports pytest. Stealth abilities: UC Mode and CDP Mode....

5429
982
Python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG,...

4641
319
Python

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML...

3661
263
Python

Web Scraping Framework

2394
274
Python

Web crawling framework based on asyncio.

2035
207
Python

ChatWeb can crawl web pages, read PDF, DOCX, TXT, and extract the main content, then answer your questions based on the content, or summarize the key points....

887
135
Python

WarcDB: Web crawl data as SQLite databases.

394
11
Python

A multithreaded 🕸️ web crawler that recursively crawls a website and creates a 🔽 markdown file for each page, designed for LLM RAG...

311
32
Python

Scalable Python web scraping scripts for 40+ popular domains

322
88
Python

Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python

1484
69
Python