Apache Cocoon is a web development framework built around the concepts of separation of concerns and component-based web development. Cocoon is "web glue for your web application development needs". It is a glue that keeps concerns separate and allows parallel evolution of all aspects of a web application, improving development pace and reducing the chance of conflicts.
Apache Forrest is a publishing framework that transforms input from various sources into a unified presentation in one or more output formats. The modular and extensible plugin architecture is based on Apache Cocoon and relevant standards, which separates presentation from content. Forrest can generate static documents, or be used as a dynamic server, or be deployed by its automated facility.
Sarissa is an ECMAScript library acting as a cross-browser wrapper for native XML APIs. It offers various XML-related goodies such as Document instantiation, XML loading from URLs or strings, XSLT transformations, and XPath queries, and comes in especially handy for people doing what is lately known as "AJAX" development.
Set of extensions to Apache and other Web servers to provide consistent authentication and presentation of various existing 3rd party web applications.
Minimalistic but massively web-enabled address book running completely in the web browser.
Supported: Skype, Twitter, Flickr, Facebook, Delicious, LinkedIn, Microformats (hCard, XFN), instant messengers, etc.
How it works: XML + XSLT = HTML in your web browser.
Overview
This is a group project. The project should be exclusively the collaborative result of work by members of the team. The main task of this project is to implement a Weather Map mashup. The program will get current weather data in the form of an XML data file from the Bureau of Meteorology website for an Australian state.
PHP5 includes an object for handling XML files using DOM; TDOM seeks to simplify the handling of different types of XML-based files, including SVG, XUL interfaces, DocBook, RDF, XML, and (X)HTML, and through its plugin interface it aims to incorporate many more (e.g. RSS, Atom).
* Reading XML files
* Writing XML files
* Creation of files (SVG, DocBook, etc.)
* Creating XML files from other sources (databases, files, LDAP, etc.)
* Modification and transformation
* Screen output
HXTS, an XML-based document format, provides an easy and normative method to transform data from HTML to XML through most browser engines.
Main References
* XML Schema Standard (XSD)
* XSL Transformation (XSLT)
* XML Path Language (XPath)
* Regular Expression (Regex)
General Description
One day, I was asked to gather some information from a given website. For most programmers this is a piece of cake, and I was familiar with JavaScript, so naturally I built a pipe between my program and the browser, through which the result of running the JavaScript could be transferred to my own database. Some days later, my boss decided he would like to use that information as a constant data source, which is to say I had to build a web service providing HTML analysis and conversion. I tried every HTML transformation protocol and standard available; the most satisfactory was XSLT, but it is not supported in the HTML DOM. Hence, I had to write a new standard: HXTS.
Features
* general format, usable from any language
* visual and easy to understand
* validation field, to guard against changes in the website's structure
* support for AJAX pages (optional)
Simple Example
This is an HXTS for the Google search page:
Using the HXTS above, the Google search page http://www.google.com/#hl=en&q=dd&btnG=Google+Search&aq=f&oq=dd&aqi=&fp=Q7MZYxCgTv8 can be converted into XML like this:
1
10
232000000
Dunkin' Donuts Coffee | Buy Coffee Beans Online
http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.dunkindonuts.com%2F&ei=NlkqSqyWKJGVkAWDr7XkCg&rct=j&q=dd&usg=AFQjCNH0I5iuCNmR4xsJUTqZ5xh1TdVQBg
Buy Dunkin' Donuts Coffee Beans Online - ground coffee or whole coffee beans shipped by the pound in Original Blend, Decaf or flavored coffee beans; ...
http://www.google.com/url?sa=t&source=web&ct=clnk&cd=1&url=http%3A%2F%2F74.125.153.132%2Fsearch%3Fq%3Dcache%3Ab7-EDVibxMAJ%3Awww.dunkindonuts.com%2F%2Bdd%26cd%3D1%26hl%3Den%26ct%3Dclnk&ei=NlkqSqyWKJGVkAWDr7XkCg&rct=j&q=dd&usg=AFQjCNFiW65jkgU0HPoRvdh2xCBqdtE5Wg
http://www.google.com/search?hl=en&newwindow=1&q=related:www.dunkindonuts.com/
Democratic Party (United States) - Wikipedia, the free encyclopedia
http://www.google.com/url?sa=t&source=web&ct=res&cd=2&url=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FDemocratic_Party_(United_States)&ei=qFsqSpubJ9CLkAWGm_WDCw&rct=j&q=d&usg=AFQjCNFmALnFb7L2K7caR8fymwhozss0XA
Franklin D. Roosevelt, elected to presidency in 1932, came forth with government programs ... The economically activist philosophy of Franklin D. Roosevelt, ...
http://www.google.com/url?sa=t&source=web&ct=clnk&cd=2&url=http%3A%2F%2F74.125.153.132%2Fsearch%3Fq%3Dcache%3Aba6qjfbVPJcJ%3Aen.wikipedia.org%2Fwiki%2FDemocratic_Party_(United_States)%2Bd%26cd%3D2%26hl%3Den%26ct%3Dclnk&ei=qFsqSpubJ9CLkAWGm_WDCw&rct=j&q=d&usg=AFQjCNHExRZEzr0eUgDNKr6OWxpiyn2NjQ
http://www.google.com/search?hl=en&newwindow=1&q=related:en.wikipedia.org/wiki/D
...(omitted)
Bitch Ph.D.
http://www.google.com/url?sa=t&source=web&ct=res&cd=11&url=http%3A%2F%2Fbitchphd.blogspot.com%2F&ei=qFsqSpubJ9CLkAWGm_WDCw&rct=j&q=d&usg=AFQjCNF_aYJAryk-D63OoXB_yKBcBuH0XQ
1 Jun 2009 ... Ranting about current events from a feminist perspective.
http://www.google.com/search?hl=en&newwindow=1&q=related:bitchphd.blogspot.com/
http://www.google.com/search?q=d&hl=en&newwindow=1&start=10&sa=N
Grammar
schema node
Every HXTS document has this node. It is just a namespace definition; copy it from any existing HXTS document into your new one.
about and progress node
These nodes are used as simple version control: every time an important update is made to the document, remember to fill in the about node.
For example, it may change into something like the following in the future:
url and urlparameter node
This node defines which kinds of URL can be parsed by this HXTS. For example, the code above shows that it can be used for a URL whose main body is "http://www.google.com/search?" and which has at least two parameters: "hl" with the value "en", and "q".
Accordingly:
"http://www.google.com/search?hl=en&newwindow=1&site=query&q=dd&btnG=Search" matches the node
"http://www.google.com/search?hl=zh&newwindow=1&site=query&q=dd&btnG=Search" doesn't match the node
Because different query URLs can lead to the same page, HXTS allows more than one url node.
!!! URL MATCHING IS NOT MANDATORY, BUT RECOMMENDED
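The matching rule described above can be sketched in Python as follows. This is a minimal sketch, not HXTS's actual implementation: the function name `url_matches` and the way the rule is represented (a base URL plus a dict of required parameters, with `None` meaning "any value is fine") are assumptions made for illustration.

```python
from urllib.parse import urlsplit, parse_qs

def url_matches(url, base, required):
    """Return True if `url` starts with `base` and carries the required parameters.

    `required` maps a parameter name to its expected value, or to None when
    any value is acceptable (a hypothetical stand-in for urlparameter nodes).
    """
    if not url.startswith(base):
        return False
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    for key, expected in required.items():
        if key not in params:
            return False
        if expected is not None and expected not in params[key]:
            return False
    return True

BASE = "http://www.google.com/search?"
RULE = {"hl": "en", "q": None}  # "hl" must be "en"; "q" only has to be present

matches = url_matches(
    "http://www.google.com/search?hl=en&newwindow=1&site=query&q=dd&btnG=Search",
    BASE, RULE)
rejected = url_matches(
    "http://www.google.com/search?hl=zh&newwindow=1&site=query&q=dd&btnG=Search",
    BASE, RULE)
```

With the two URLs quoted in the text, `matches` is true and `rejected` is false, because only the first URL has hl=en.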
element node
The element node is the most complex and important node in HXTS. It defines how to crawl the HTML and find the useful DOM nodes to convert into an XML DOM.
Element nodes can be nested, and only the leaf nodes actually extract information from the HTML DOM.
The element node has many attributes:
name attribute
name determines the element name written in the XML; for instance, name="Result"
will convert into
...
xpath attribute
xpath is the path used to parse the HTML DOM; for details, please refer to http://www.w3.org/TR/xpath
"/a/b/c" is an absolute xpath, which starts with "/"
"b/c" is a relative xpath, which is based on the xpath of its parent element node
If the xpath indicates more than one HTML DOM node, then all the selected nodes will be converted. For example, given an HTML DOM in which a tr contains three td cells with the text t1, t2 and t3, and an xpath equal to "/tr/td", all three td cells will take part in the XML conversion.
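The multi-match behaviour can be illustrated with Python's standard library. This is only a sketch: the row markup is reconstructed as an assumption from the xpath and the three cell values, and `xml.etree.ElementTree` supports only a limited XPath subset that is evaluated relative to an element, so "./td" on the row stands in for "/tr/td".

```python
import xml.etree.ElementTree as ET

# Reconstructed row (assumption): three cells holding t1, t2 and t3.
row = ET.fromstring("<tr><td>t1</td><td>t2</td><td>t3</td></tr>")

# ElementTree paths are relative to the element they are called on,
# so "./td" here plays the role of "/tr/td" in the text.
cells = [td.text for td in row.findall("./td")]
```

Every selected td contributes one value to `cells`, mirroring the statement that all matched nodes are converted.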
source attribute
The source attribute decides what text will be used as the content of the XML node. Its value can be any of the strings below:
* "innertext" (default): text in the DOM
* "innerhtml": HTML in the DOM
* "href": only available on link DOM nodes, and will be automatically converted into an absolute URL
* "onclick": JavaScript code in the 'onclick' attribute of the DOM
If an element node doesn't have the source attribute, "innertext" will be used as the default content.
regexextract attribute
The regexextract attribute provides a way to extract a specified detail from the source through a regular expression; see here for more info: http://javascriptkit.com/javatutors/redev2.shtml
For example, regexextract=".+(?=\n)" extracts the text that comes before a newline character.
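The same pattern can be tried out in Python. This is a sketch under assumptions: HXTS presumably uses the browser's JavaScript regex engine, whereas this uses Python's `re` module (the two agree for this pattern), and the sample text is made up for illustration.

```python
import re

# Hypothetical source text: a title line followed by a description line.
source_text = "Dunkin' Donuts Coffee\nBuy Coffee Beans Online"

# ".+" stops at the line end, and the lookahead requires a newline right after it.
match = re.search(r".+(?=\n)", source_text)
extracted = match.group(0) if match else None
```

Here `extracted` holds only the first line, since the lookahead keeps the newline itself out of the result.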
regexmatch and regexreplace attribute
These two must appear as a pair: the 'regexmatch' attribute provides a regex that finds text, and 'regexreplace' replaces everything that 'regexmatch' matches.
For example, when regexmatch="," regexreplace="", "12,333,000" ===> "12333000"
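The example above behaves like a global regex substitution. A minimal Python sketch, assuming that every match is replaced (the helper name `apply_replace` is hypothetical):

```python
import re

def apply_replace(text, regexmatch, regexreplace):
    """Replace every occurrence of `regexmatch` in `text` with `regexreplace`."""
    return re.sub(regexmatch, regexreplace, text)

cleaned = apply_replace("12,333,000", ",", "")
```

`cleaned` is the string "12333000", as in the example.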
regexvalidate attribute
The regexvalidate attribute is important, because it gives HXTS the ability to validate its output and so avoid extracting wrong information when a website has changed its page structure.
To pass the validation, the final information has to match the regular expression completely.
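"Match completely" corresponds to a full match rather than a search. A small Python sketch of that check, with a hypothetical helper name and a made-up digits-only pattern:

```python
import re

def passes_validation(text, regexvalidate):
    """True only if the whole of `text` matches `regexvalidate`."""
    return re.fullmatch(regexvalidate, text) is not None

ok = passes_validation("12333000", r"\d+")     # digits only: passes
bad = passes_validation("12,333,000", r"\d+")  # commas remain: fails
```

A partial match is not enough: the second value contains digits but is rejected because the commas fall outside the pattern.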
attribute process
xpath => source => regexextract => regexmatch/regexreplace => regexvalidate => name => XML Node
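The processing order can be sketched end to end in Python for the steps that follow source selection. Everything here is an assumption for illustration, not the HXTS implementation: the function name `build_node`, the element name "Total", the sample input text, and in particular the choice to raise an error when validation fails (the text does not say what HXTS does in that case).

```python
import re
import xml.etree.ElementTree as ET

def build_node(source_text, name, regexextract=None,
               regexmatch=None, regexreplace=None, regexvalidate=None):
    """Run the post-source steps of the chain on already-selected text."""
    text = source_text
    if regexextract is not None:
        found = re.search(regexextract, text)
        text = found.group(0) if found else ""
    if regexmatch is not None and regexreplace is not None:
        text = re.sub(regexmatch, regexreplace, text)
    if regexvalidate is not None and re.fullmatch(regexvalidate, text) is None:
        # Assumption: a failed validation aborts the conversion.
        raise ValueError(f"{name}: {text!r} failed validation")
    node = ET.Element(name)  # the name attribute becomes the XML element name
    node.text = text
    return node

# Hypothetical input text standing in for the result of xpath + source.
node = build_node("about 232,000,000 results", "Total",
                  regexextract=r"[\d,]+",
                  regexmatch=",", regexreplace="",
                  regexvalidate=r"\d+")
xml_text = ET.tostring(node, encoding="unicode")
```

Keeping validation as the last text step means it checks the cleaned value, which is what ends up in the XML node.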