JSoup - Scrapes, parses, manipulates and cleans HTML.
websphinx - Website-Specific Processors for HTML information extraction.
Open Search Server - A full set of search functions. Build your own indexing strategy. Parsers extract full-text data. The crawlers can index everything.
spider-flow - A visual spider framework that lets you crawl websites without writing any code.
C#
ccrawler - Built in C# 3.5. It contains a simple web content categorizer extension, which can separate web pages depending on their content.
SimpleCrawler - Simple spider based on multithreading and regular expressions.
DotnetSpider - A cross-platform, lightweight spider developed in C#.
Abot - C# web crawler built for speed and flexibility.
Hawk - Advanced Crawler and ETL tool written in C#/WPF.
SkyScraper - An asynchronous web scraper / web crawler using async / await and Reactive Extensions.
Infinity Crawler - A simple but powerful web crawler library in C#.
scrala - Scala crawler (spider) framework, inspired by Scrapy.
ferrit - Ferrit is a web crawler service written in Scala using Akka, Spray and Cassandra.