Java

Top Java Frameworks & Libraries for web crawling

WebCollector is an open source web crawler framework based on Java. It provides some simple interfaces for crawling the Web; you can set up a multi-threaded web crawl...

3070
1453
Java

Apache Nutch is an extensible and scalable web crawler

2927
1245
Java

Continuous scalable web crawler built on top of Flink and crawler-commons

51
18
Java

A Fast and Powerful Scraping and Web Crawling Framework.

37
17
Java

The information system chosen for the project was a stock investment management website providing live prices, historical data, news articles, etc., and also basic a...

40
23
Java

Java web crawling library

32
9
Java

Serritor is an open source web crawler framework built upon Selenium and written in Java. It can be used to crawl dynamic web pages that require JavaScript to rend...

31
15
Java

This is a Java library that can be used to crawl the content of some web properties (for example, www.salesforce.com and blogs.salesforce.com). It supports dynamic...

22
9
Java

Spider4j is an open source web crawler for Java, extended from webmagic, which provides a simple interface for crawling the Web. Using it, you can set up a multi-thread...

17
11
Java

WebHunger is an extensible, full-scale crawler framework that supports distributed crawling, aiming at getting users focused on web page parsing without concerning...

16
3
Java

We will process unstructured data from the web (obtained by crawling some sample websites), possibly by having an Apache Solr installation locally and manually feeding it...

17
4
Java

A tweet analyzer capable of performing a wide range of tasks such as identification, crawling, sentiment analysis, co-occurrence analysis, web scraping, predictions...

10
2
Java

Hubs is a content crawler application on Android. It provides APIs to crawl web content and display data....

10
1
Java

Crawl, index and search web content

9
4
Java

Sample MVP project that uses a jsoup-web-crawl-like API

9
1
Java

web content crawl

8
6
Java

An Android application that crawls the web

8
4
Java

Web Crawling Project

5
2
Java

Tools to construct and process webgraphs from Common Crawl data

80
4
Java

WebHunger is an extensible, full-scale crawler framework that supports distributed crawling, aiming at getting users focused on web page parsing without concerning...

18
4
Java