Hpricot is a very flexible HTML parser, based on Tanaka Akira's HTree and John Resig's jQuery, but with the scanner recoded in C (using Ragel for scanning). I've borrowed what I believe to be the best ideas from these wares to make Hpricot heaps of fun to use.
Markdown is a text-to-HTML conversion tool for web writers. Markdown allows you to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML).
Texy is one of the most complex lightweight markup languages. It supports images, links, nested lists, and tables, and has full support for typography and CSS.
Texy allows you to enter content using an easy-to-read Texy syntax, which is filtered into structurally valid XHTML. No knowledge of HTML is required.
Html Agility Pack is an agile HTML parser library that provides a read/write DOM and supports plain XPath or XSLT. It allows you to parse "out of the web" HTML files. The parser is very tolerant of "real world" malformed HTML. The object model is very similar to the one System.Xml provides, but for HTML documents (or streams).
A Python HTML/XML parser for quick turnaround projects like screen-scraping.
1. Beautiful Soup won't choke if you give it bad markup. It yields a parse tree that makes approximately as much sense as your original document. This is usually good enough to collect the data you need and run away.
2. Beautiful Soup provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree: a toolkit for dissecting a document and extracting what you need. You don't have to create a custom parser for each application.
3. Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8. You don't have to think about encodings, unless the document doesn't specify an encoding and Beautiful Soup can't autodetect one.
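Point 1 above (tolerate bad markup, "collect the data you need and run away") can be illustrated with a small sketch. Beautiful Soup itself is a third-party package, so this example swaps in Python's standard-library `html.parser` instead; it is not Beautiful Soup's API. The `LinkCollector` class and `collect_links` helper are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: gathers href values from <a> start tags.

    html.parser is event-driven and never requires end tags to balance,
    so unclosed <p> or <a> elements do not stop the scan.
    """

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def collect_links(markup: str) -> list:
    """Feed possibly malformed markup and return every href found, in order."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return parser.links


# Unclosed <p> and <a> tags, an unquoted attribute, and no </html>.
broken = '<html><body><p>Intro <a href="/one">one<p><a href=/two>two</body>'
print(collect_links(broken))
```

A tree-building library such as Beautiful Soup goes further by repairing the structure into a navigable tree; this event-based sketch only shows that useful data can be pulled from markup that never balances.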
Nokogiri is a libxml2 wrapper. It features HTML, XML, SAX, and Reader parsers, as well as XPath and CSS interfaces for searching. Nokogiri is also a drop-in replacement for Hpricot.
A framework of frameworks for rapid application development in Python. It includes packages for XML and XHTML parsing and generating, SNMP manager, SMI query API, Cisco-style CLI framework, QA automation, program control, and more.
Textile-J is a Java library that provides a simple parser for multiple wiki markup languages[1],[2] (Textile, MediaWiki / WikiMedia, Confluence, and TracWiki), an Eclipse editor for editing Textile markup, and a simple JFace text viewer that can be used to display the markup in an SWT or Eclipse environment. The Java library may be used standalone or as an Eclipse plugin.
The parser can be used on its own to convert markup to XHTML or DocBook, or together with the provided JFace viewer to display Textile markup in a UI such as Eclipse.
This project has been contributed to Eclipse Mylyn as WikiText. Find out more here: http://greensopinion.blogspot.com/2008/08/textile-j-is-moving-to-mylyn-wikitext.html