Beautiful Soup parses XML and HTML as seen in the wild, and provides a variety of methods and Pythonic idioms for iterating and searching the parse tree. Beautiful Soup development is now done at https://www.launchpad.net/beautifulsoup. The discussion forum is still at http://groups.google.com/group/beautifulsoup/.
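As a rough illustration of the searching and iterating idioms mentioned above, here is a minimal sketch assuming the `bs4` package (Beautiful Soup 4) is installed. The markup, the `project` class name, and the URLs are made up for the example and do not come from any real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with unclosed <li> tags, as often seen in the wild.
html = """
<html><body>
<h1>Projects</h1>
<ul>
  <li><a href="/p/scrapy" class="project">Scrapy</a>
  <li><a href="/p/bs" class="project">Beautiful Soup</a>
</ul>
</body></html>
"""

# Use the parser bundled with the Python standard library.
soup = BeautifulSoup(html, "html.parser")

# Search the tree for every <a> tag carrying the (assumed) "project" class.
links = soup.find_all("a", class_="project")

# Iterate over the results and pull out text and attributes.
names = [a.get_text(strip=True) for a in links]
hrefs = [a["href"] for a in links]

# Attribute-style navigation to the first <h1> in the document.
heading = soup.h1.string

print(heading, names, hrefs)
```

Other parsers such as `lxml` can be passed as the second argument if they are installed; `html.parser` is used here only because it needs no extra dependency.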
Scrapy is a fast high-level scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
Pythonic crawling / scraping framework built on Eventlet. Features:
* High-speed web crawler built on Eventlet.
* Supports database engines such as PostgreSQL, MySQL, Oracle, and SQLite.
* Command-line tools.
* Extract data using your favourite tool: XPath or PyQuery (a jQuery-like library for Python).
An HTML extractor in JavaScript. Usage: jhe_im(extract_conditions...) returns the inner HTML that matches the extract conditions. jhe_om(extract_conditions...) returns the outer HTML that matches the extract conditions. jhe_ma(extract_conditions..., attributeName) returns the value of the named attribute in the matching tag.
Python program that scrapes all types of web sources for torrents, whether RSS, Atom, or even a website that doesn't have aggregating feeds. Supports episodes (series), smart downloading, and regexp filtering, and hopefully a nice simple GUI, but at least a nice modular GUI configurable through a config file.
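The regexp filtering of feed entries described above might look roughly like the following. This is only a sketch using the Python standard library, not the project's actual code: the feed contents, the `filter_items` helper, and the season/episode pattern are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed; titles and links are invented for the example.
FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example feed</title>
  <item><title>Some.Show.S01E02.720p</title><link>http://example.invalid/a.torrent</link></item>
  <item><title>Other.Show.S03E10.1080p</title><link>http://example.invalid/b.torrent</link></item>
  <item><title>Some.Show.S01E03.720p</title><link>http://example.invalid/c.torrent</link></item>
</channel></rss>"""


def filter_items(feed_xml, pattern):
    """Return (title, link, season, episode) for items whose title matches.

    The pattern is expected to define named groups "season" and "episode".
    """
    regex = re.compile(pattern, re.IGNORECASE)
    root = ET.fromstring(feed_xml)
    matches = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        m = regex.search(title)
        if m:
            matches.append(
                (title, link, int(m.group("season")), int(m.group("episode")))
            )
    return matches


# Keep only episodes of one (made-up) series.
wanted = filter_items(FEED, r"^some\.show\.S(?P<season>\d+)E(?P<episode>\d+)")
for title, link, season, episode in wanted:
    print(f"S{season:02d}E{episode:02d} {link}")
```

Capturing season and episode as integers is one way a downloader could later skip episodes it has already fetched; sites without feeds would need an HTML scraper in place of the XML parsing step.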
Content Extractor is professional data-mining software that organizes collected information for convenient use. You can use it for regular automatic data collection or for manual extraction of any web content. The program is very accurate and collects data from pages associated with the specified
This project aims to use Manchester City Council meeting minutes as a basis for highlighting the good work that Councillors do, and to find out further information on the activity of Councillors and the decision-making process of our Council. We will be using a Minute Scraper as a basis for
Copyright © 2013 Black Duck Software, Inc. and its contributors, Some Rights Reserved. Unless otherwise marked, this work is licensed under a Creative Commons Attribution 3.0 Unported License. Ohloh® and the Ohloh logo are trademarks of Black Duck Software, Inc. in the United States and/or other jurisdictions. All other trademarks are the property of their respective holders.