Check websites and HTML documents for broken links.
* recursive and multithreaded checking
* output in colored or normal text, HTML, SQL, CSV, XML or a sitemap graph in different formats
* HTTP/1.1, HTTPS, FTP, mailto:, news:, nntp:, Telnet and local file links support
* restriction of link ...
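The core of a recursive broken-link check can be sketched in a few lines of Python. This is a minimal, single-threaded sketch, not LinkChecker's implementation. `find_broken_links`, `LinkExtractor` and the `fetch` callable are hypothetical names, and `fetch` is assumed to return a page's HTML, or `None` when the link is broken. A real checker would issue HTTP requests and inspect status codes instead. Only http/https links are checked, and recursion stays on the start URL's host.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional, Tuple
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_broken_links(
    start_url: str, fetch: Callable[[str], Optional[str]]
) -> List[Tuple[str, str]]:
    """Breadth-first walk from start_url; returns (page, broken_target) pairs.

    `fetch` is a hypothetical callable: it returns the HTML of a URL, or
    None when the URL cannot be retrieved (e.g. an HTTP 404).
    """
    cache: Dict[str, Optional[str]] = {}

    def get(url: str) -> Optional[str]:
        # Fetch each URL at most once, even if many pages link to it.
        if url not in cache:
            cache[url] = fetch(url)
        return cache[url]

    start_host = urlparse(start_url).hostname
    broken: List[Tuple[str, str]] = []
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        page = queue.popleft()
        html = get(page)
        if html is None:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            target, _ = urldefrag(urljoin(page, href))
            if urlparse(target).scheme not in ("http", "https"):
                continue  # this sketch only checks web links
            if get(target) is None:
                broken.append((page, target))
            elif target not in seen and urlparse(target).hostname == start_host:
                # Recurse only into pages on the starting site.
                seen.add(target)
                queue.append(target)
    return broken


# Usage with an in-memory "site" standing in for real HTTP requests.
SITE = {
    "http://example.com/": '<a href="/about">About</a> <a href="/missing">Gone</a>',
    "http://example.com/about": '<a href="/#top">Home</a>',
}
print(find_broken_links("http://example.com/", SITE.get))
# [('http://example.com/', 'http://example.com/missing')]
```

Caching fetch results means a target linked from many pages is requested only once. The multithreaded variant in the feature list would replace the queue loop with a worker pool.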
OpenWebSpider - The Open Source Web Spider And Search Engine. The OpenWebSpider project was born from the idea that the internet is free and all information must be freely available to all users! Built only with free software and being open source, OpenWebSpider aims to be the base for a new search engine ...
WebChuan is a set of open source libraries and tools for fetching and parsing the web pages of websites. It is written in Python, based on Twisted and lxml, and inspired by GStreamer. WebChuan is designed to be the back end of a web bot; it is easy to use, powerful, flexible, reusable and efficient.
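GStreamer builds processing pipelines by linking independent elements, each of which consumes one stream and produces another. The sketch below shows that idea for web pages using plain Python generators. It is not WebChuan's API, and `pipeline`, `fake_fetch` and `extract_title` are hypothetical names. It uses the standard library's `html.parser` in place of lxml and an in-memory dict in place of Twisted-based downloading.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, Iterable, Iterator, Tuple

# A stage consumes one stream and yields another, like a GStreamer element.
Stage = Callable[[Iterable], Iterator]


def pipeline(source: Iterable, *stages: Stage) -> Iterator:
    """Link stages in order; items flow lazily from source to the last stage."""
    stream: Iterator = iter(source)
    for stage in stages:
        stream = stage(stream)
    return stream


def fake_fetch(pages: Dict[str, str]) -> Stage:
    """Hypothetical source element: looks URLs up in a dict instead of downloading."""
    def stage(urls: Iterable[str]) -> Iterator[Tuple[str, str]]:
        for url in urls:
            if url in pages:
                yield url, pages[url]
    return stage


class TitleParser(HTMLParser):
    """Collects the text inside <title>...</title>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def extract_title(items: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Parser element: turns (url, html) pairs into (url, title) pairs."""
    for url, html in items:
        parser = TitleParser()
        parser.feed(html)
        parser.close()
        yield url, parser.title.strip()


PAGES = {"http://example.com/": "<html><head><title> Home </title></head></html>"}
urls = ["http://example.com/", "http://example.com/unknown"]
print(list(pipeline(urls, fake_fetch(PAGES), extract_title)))
# [('http://example.com/', 'Home')]
```

Because every stage has the same shape, a new step (for example, a link extractor) can be added by passing one more function to `pipeline` without touching the others.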
Spidr is a versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
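The choice between spidering one site, several domains, only certain links, or without limit comes down to a filter applied to each discovered URL. The Python sketch below illustrates that filter. It is not Spidr's (Ruby) API, and `should_follow` and its parameters are hypothetical names chosen for illustration.

```python
import re
from typing import Iterable, Optional
from urllib.parse import urlparse


def should_follow(
    url: str,
    hosts: Optional[Iterable[str]] = None,
    link_pattern: Optional[str] = None,
) -> bool:
    """Decide whether a crawler should visit `url`.

    hosts=None means "follow links to any host" (an unbounded crawl);
    otherwise only the listed (lower-case) hosts are visited.
    link_pattern, if given, is a regular expression the URL must contain.
    """
    parts = urlparse(url)
    if parts.scheme not in ("http", "https"):
        return False  # skip mailto:, ftp:, javascript:, ...
    if hosts is not None and parts.hostname not in set(hosts):
        return False
    if link_pattern is not None and re.search(link_pattern, url) is None:
        return False
    return True


print(should_follow("https://example.com/a", hosts=["example.com"]))  # True
print(should_follow("https://other.org/", hosts=["example.com"]))     # False
print(should_follow("https://other.org/"))                             # True
print(should_follow("http://example.com/blog", link_pattern=r"/docs/"))  # False
```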
Aranya is a spider with a distributed architecture. The project aims to build a safe, efficient, and configurable Internet information collection system; through its configuration profile, it can supply useful data (pages, photos, etc.) to many kinds of search engines.
Because of restrictions imposed by some database providers, it is sometimes hard to download books. With this framework you can build a "private local library", which is also very convenient to search. http://code.google.com/p/harvestman-crawler/
Web robot (spider) written in PHP with cURL. The goal is to index a number of product websites in order to find the cheapest offers.
This is a spider used to crawl web pages from the internet.
urls.py: used on the server side to collect the URLs sent by the clients, avoid overloading websites, and send URLs to the client.
spider.py: used on the client side to get the URLs sent by the server, crawl web pages, and analyze web pages ...
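The server-side role described for urls.py can be modeled as a shared URL pool. Clients report the URLs they discover, the server discards duplicates, and it hands out work in batches that limit how many URLs from one host go out at once. This is a hypothetical, in-process sketch and not the project's urls.py. `URLFrontier`, `add_urls`, `next_batch` and `per_host` are illustrative names, and the client/server networking is left out entirely.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Set
from urllib.parse import urlparse


class URLFrontier:
    """Server-side URL pool shared by crawler clients (hypothetical sketch).

    Clients report newly discovered URLs with add_urls() and request work
    with next_batch(). Each URL is handed out at most once, and a batch
    contains at most `per_host` URLs from any single host, so no one site
    receives a whole batch of requests at once.
    """

    def __init__(self, per_host: int = 1) -> None:
        self.per_host = per_host
        self.seen: Set[str] = set()
        # host -> pending URLs; dicts keep insertion order (Python 3.7+).
        self.queues: Dict[str, Deque[str]] = {}

    def add_urls(self, urls: Iterable[str]) -> None:
        for url in urls:
            if url in self.seen:
                continue  # already queued or already handed out
            self.seen.add(url)
            host = urlparse(url).hostname or ""
            self.queues.setdefault(host, deque()).append(url)

    def next_batch(self, size: int) -> List[str]:
        batch: List[str] = []
        for queue in self.queues.values():
            taken = 0
            while queue and taken < self.per_host and len(batch) < size:
                batch.append(queue.popleft())
                taken += 1
            if len(batch) >= size:
                break
        # Forget hosts whose queues are now empty.
        self.queues = {h: q for h, q in self.queues.items() if q}
        return batch


frontier = URLFrontier(per_host=1)
frontier.add_urls(["http://a.example/1", "http://a.example/2",
                   "http://b.example/1", "http://a.example/1"])
print(frontier.next_batch(10))  # ['http://a.example/1', 'http://b.example/1']
print(frontier.next_batch(10))  # ['http://a.example/2']
```

A per-host cap per batch is only one simple politeness policy. Production crawlers typically also add per-host delays and honor robots.txt.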
Main project: mapping, discovering relations, and mining conclusive data from social networks. There are various other projects meant as utilities or as code that can be reused in other projects.