WebCollector is an open source web crawler framework based on Java. It provides some simple interfaces for crawling the Web; you can set up a multi-threaded web crawl...
Continuous scalable web crawler built on top of Flink and crawler-commons
The information system chosen for the project was a stock investment management website providing live prices, historical data, news articles, etc., and also basic a...
Serritor is an open source web crawler framework built upon Selenium and written in Java. It can be used to crawl dynamic web pages that require JavaScript to rend...
This is a Java library that can be used to crawl the content of some web properties (for example, www.salesforce.com and blogs.salesforce.com). It supports dynamic...
Spider4j is an open source web crawler for Java, extended from webmagic, which provides a simple interface for crawling the Web. Using it, you can set up a multi-thread...
WebHunger is an extensible, full-scale crawler framework that supports distributed crawling, aiming at getting users focused on web page parsing without concerning...
We will process unstructured data from the web (obtained by crawling some sample websites), possibly by having an Apache Solr installation locally and manually feeding it...
A tweet analyzer capable of performing a wide range of tasks such as identification, crawling, sentiment analysis, co-occurrence analysis, web scraping, predictions...
Hubs is a content crawler application for Android. It provides APIs to crawl web content and display data....