Top Python Frameworks & Libraries for Web Crawling (59)

Scrapy, a fast high-level web crawling & scraping framework for Python.

54265
10666
Python

Web crawling framework based on asyncio.

2007
213
Python

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

1448
142
Python

htcap is a web application scanner that can recursively crawl single-page applications (SPAs) by intercepting Ajax calls and DOM changes...

616
112
Python

ISP Data Pollution to Protect Private Browsing History with Obfuscation

598
52
Python

🌅 Next-generation web crawling using machine intelligence

331
44
Python

The simple, easy-to-use command-line web crawler.

345
68
Python

Lightweight web scraping toolkit for documents and structured data.

311
60
Python

Web crawling with IP rotation via Tor

194
54
Python

Web Crawling UI and HTTP API, based on Scrapy and Tornado

148
63
Python

An example that uses Selenium WebDriver for Python and the Scrapy framework to build a web scraper that crawls an ASP site...

123
46
Python

Set up a free and scalable Scrapyd cluster for distributed web crawling with just a few clicks. DEMO 👉...

115
87
Python

A tool that crawls systems the way web crawlers crawl the web

106
41
Python

Scraping eBay products using the Scrapy web crawling framework

73
31
Python

This repo is part of a blog series on several web scraping projects, exploring scraping techniques for crawling data from simple websites to websites using...

75
66
Python

An Open Source Web Application for Genetic Data (SNPs) using 23andMe and Data Crawling Technologies

69
7
Python

ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the web...

68
21
Python

This package is a complete tool for creating a large dataset of images (designed especially, but not only, for machine learning enthusiasts). It can crawl the web,...

64
21
Python

Scraping and Web Crawling Framework For Zhihu Live

63
31
Python

Screen scraping and web crawling framework

61
11
Python

Easily crawl web resources and extract web information: a simple crawler framework

60
8
Python

A serverless web browser which crawls websites and compares pages by schedule.

58
14
Python

A queue-controlled browser automation tool for improving web crawl quality

57
26
Python

WebCollector-Python is an open source web crawler framework based on Python. It provides some simple interfaces for crawling the Web; you can set up a multi-threaded...

54
21
Python

Python web crawler / scraper for WG-Gesucht. Crawls the WG-Gesucht site for new apartment listings and sends a message to the poster, based on your saved filters a...

54
20
Python

Web scraping and automation using Python

53
14
Python

New Douban metadata source plugin for calibre. Douban no longer provides book APIs to the public, so data can only be obtained through web crawling. This is a calibre Dou...

1084
43
Python

Python APIs for web automation, testing, and bypassing bot-detection.

9410
1211
Python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG,...

5307
351
Python

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML...

3966
285
Python

Web Scraping Framework

2402
275
Python

Web crawling framework based on asyncio.

2039
208
Python

ChatWeb can crawl web pages, read PDF, DOCX, TXT, and extract the main content, then answer your questions based on the content, or summarize the key points....

895
135
Python

WarcDB: Web crawl data as SQLite databases.

398
11
Python

A multithreaded 🕸️ web crawler that recursively crawls a website and creates a 🔽 Markdown file for each page, designed for LLM RAG...

358
39
Python

Scalable Python web scraping scripts for 40+ popular domains

405
106
Python

🕷️ Undetectable, Lightning-Fast, and Adaptive Web Scraping for Python

2680
174
Python

Official repository for "Crawl4LLM: Efficient Web Crawling for LLM Pretraining"

374
27
Python