A Python-based HTML parser/tokenizer built on the WHATWG HTML specification for maximum compatibility with major desktop web browsers.
Jodd is an open-source Java utility library and set of frameworks. The Jodd tools enrich the JDK with many powerful, feature-rich utilities. They help with everyday tasks and make code more robust and reliable. The Jodd frameworks are a set of lightweight application frameworks, compact yet powerful. Designed ...
Gumbo is an implementation of the HTML5 parsing algorithm, written as a pure C99 library with no outside dependencies. It's designed to serve as a building block for other tools and libraries such as linters, validators, templating languages, and refactoring and analysis tools.
HtmlCleaner is an HTML parser written in Java. It transforms dirty HTML into well-formed XML, following the same rules that most web browsers use.
This tool copies the title and content from a URL that you provide. For example, if you want to copy a post from a Sina blog to a BBS, the tool can copy the title and the content separately for you automatically, so all you need to do is paste. The tool will also format the ...
We often download movies from HDC (a Chinese private tracker), but as time goes by we forget everything about them (content, actors, picture information) except the name shown on the local HDD. So I hope to write a Java application to solve the problem. The process will be as follows: ...
Main features: an easy-to-use web page parser; built-in VB.NET scripting support (including a small IDE environment) for handling complicated situations; parsing of AJAX/HTML pages using either the IE or the Htmlparser core; a built-in client/server (C/S) structure that lets you parse some hard-core pages protected by a CAPTCHA or an IP strategy ...
A general markup language parser. It can process any kind of markup language, including HTML, XHTML, WML, and XML. Its most powerful feature is that it can handle malformed HTML content.