Scrapy, a fast high-level web crawling & scraping framework for Python.
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
htcap is a web application scanner able to crawl single-page applications (SPAs) recursively by intercepting Ajax calls and DOM changes....
ISP Data Pollution to Protect Private Browsing History with Obfuscation
Lightweight web scraping toolkit for documents and structured data.
An example using Selenium WebDriver for Python and the Scrapy framework to create a web scraper to crawl an ASP site...
Set up a free and scalable Scrapyd cluster for distributed web crawling with just a few clicks. DEMO 👉...
Scraping eBay products using the Scrapy web crawling framework
This repo is part of a blog series on several web scraping projects where we will explore scraping techniques to crawl data from simple websites to websites using...
An open source web application for genetic data (SNPs) using 23andMe and data crawling technologies
ARGUS is an easy-to-use web scraping tool. The program is based on the Scrapy Python framework and is able to crawl a broad range of different websites. On the web...
This package is a complete tool for creating a large dataset of images (designed especially, but not only, for machine learning enthusiasts). It can crawl the web,...
A serverless web browser which crawls websites and compares pages on a schedule.
A queue-controlled browser automation tool for improving web crawl quality
WebCollector-Python is an open source web crawler framework based on Python. It provides some simple interfaces for crawling the Web; you can set up a multi-threaded...
Python web crawler / scraper for WG-Gesucht. Crawls the WG-Gesucht site for new apartment listings and sends a message to the poster, based on your saved filters a...
A new Douban metadata source plugin for calibre. Douban no longer provides book APIs to the public, so the plugin can only obtain data through web crawling. This is a calibre Dou...
📊 Blazing fast Python framework for web crawling, scraping, testing, and reporting. Supports pytest. Stealth abilities: UC Mode and CDP Mode....
Crawlee: a web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG,...
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML...
ChatWeb can crawl web pages, read PDF, DOCX, TXT, and extract the main content, then answer your questions based on the content, or summarize the key points....
A multithreaded 🕸️ web crawler that recursively crawls a website and creates a 🔽 markdown file for each page, designed for LLM RAG...
Scalable Python web scraping scripts for 40+ popular domains