Top Java Frameworks & Libraries for web crawling

CrawlScript/WebCollector

WebCollector is an open source web crawler framework based on Java. It provides some simple interfaces for crawling the Web; you can set up a multi-threaded web crawl...

3075

1446

Java

apache/nutch

Apache Nutch is an extensible and scalable web crawler

3049

1260

Java

ScaleUnlimited/flink-crawler

Continuous scalable web crawler built on top of Flink and crawler-commons

Java

crazyacking/zeekEye

A Fast and Powerful Scraping and Web Crawling Framework.

Java

amankaushik/Stock-Market-Recommendation-System

The information system chosen for the project was a stock investment management website providing live prices, historical data, news articles, etc., and also basic a...

Java

opencharles/charles

Java web crawling library

Java

peterbencze/serritor

Serritor is an open source web crawler framework built upon Selenium and written in Java. It can be used to crawl dynamic web pages that require JavaScript to rend...

Java

forcedotcom/SiteCrawler

This is a Java library that can be used to crawl the content of some web properties (for example, www.salesforce.com and blogs.salesforce.com). It supports dynamic...

Java

yida-lxw/spider4j

Spider4j is an open source web crawler for Java, extended from webmagic, which provides a simple interface for crawling the Web. Using it, you can set up a multi-thread...

Java

jerry-sc/webhunger

WebHunger is an extensible, full-scale crawler framework that supports distributed crawling, aiming at getting users focused on web page parsing without concerning...

Java

ekalgolas/Relation-extraction-using-Semantic-Web

We will process unstructured data from the web (obtained by crawling some sample websites) by maybe: having an Apache Solr installation locally and manually feeding it...

Java

SI3P/TwitterAnalyzer

A tweet analyzer capable of performing a wide range of tasks such as identification, crawling, sentiment analysis, co-occurrence analysis, web scraping, predictions...

Java

Hubs-App/Hubs

Hubs is a content crawler application on Android. It provides APIs to crawl web content and display data...

Java

kamranzafar/treeing

Crawl, index and search web content

Java

Cutta/EksiSeyler

Sample MVP project that uses a jsoup-web-crawl-like API

Java

flyonok/flipboard

web content crawl

Java

khalid64927/WebCrawler

An Android application that performs web crawling

Java

jonathanthom/PolyCrawler

Web Crawling Project

Java

commoncrawl/cc-webgraph

Tools to construct and process Common Crawl webgraphs

Java

jerrycshen/webhunger

WebHunger is an extensible, full-scale crawler framework that supports distributed crawling, aiming at getting users focused on web page parsing without concerning...

Java

pc8544/Website-Crawler

Extract data from websites in LLM-ready JSON or CSV format. Crawl or scrape an entire website with Website Crawler...

Java