A Java desktop application (J2SE 5+Swing Application Framework) for automatically classifying documents against a given training set. It was developed, and is packaged, as a NetBeans project. It uses stemmers generated with Snowball (http://snowball.tartarus.org, released under the BSD license) for text pre-processing, TF-IDF or the Bhattacharyya distance to rank the training-set documents by similarity to the query document, and the k-NN algorithm to classify it. It currently supports only news from the ANSA website (http://www.ansa.it, the main Italian news agency), but its modular architecture allows it to be extended with plugins that scrape other websites or handle other document types (PDF, DOC, ODT, etc.).
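The ranking and classification steps described above can be illustrated with a minimal sketch. It is not the project's actual code: the class and method names (TfIdfKnnSketch, Doc, classify) are hypothetical, Snowball stemming is replaced by plain lowercasing and tokenizing, the TF-IDF vectors are compared with cosine similarity, the IDF smoothing formula is an assumption, and the Bhattacharyya variant is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: TF-IDF + cosine ranking followed by a k-NN majority vote.
class TfIdfKnnSketch {

    /** A labelled training document, tokenized on construction (no stemming here). */
    static final class Doc {
        final String label;
        final List<String> tokens;
        Doc(String label, String text) {
            this.label = label;
            this.tokens = tokenize(text);
        }
    }

    /** Lowercases the text and splits it on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    /** Assumed smoothed IDF: log((1 + N) / (1 + df)) + 1, positive even for ubiquitous terms. */
    static Map<String, Double> idf(List<Doc> training) {
        Map<String, Integer> df = new HashMap<>();
        for (Doc d : training) {
            for (String term : new HashSet<>(d.tokens)) df.merge(term, 1, Integer::sum);
        }
        Map<String, Double> idf = new HashMap<>();
        int n = training.size();
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            idf.put(e.getKey(), Math.log((1.0 + n) / (1.0 + e.getValue())) + 1.0);
        }
        return idf;
    }

    /** Sparse TF-IDF vector; terms absent from the training vocabulary are ignored. */
    static Map<String, Double> vector(List<String> tokens, Map<String, Double> idf) {
        Map<String, Double> v = new HashMap<>();
        for (String t : tokens) {
            Double w = idf.get(t);
            if (w != null) v.merge(t, w, Double::sum); // adds idf once per occurrence
        }
        return v;
    }

    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            na += e.getValue() * e.getValue();
            Double w = b.get(e.getKey());
            if (w != null) dot += e.getValue() * w;
        }
        for (double w : b.values()) nb += w * w;
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    /** Ranks the training set by similarity to the query and votes among the top k. */
    static String classify(String query, List<Doc> training, int k) {
        Map<String, Double> idf = idf(training);
        Map<String, Double> q = vector(tokenize(query), idf);
        final double[] score = new double[training.size()];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < training.size(); i++) {
            score[i] = cosine(q, vector(training.get(i).tokens, idf));
            order.add(i);
        }
        order.sort((x, y) -> Double.compare(score[y], score[x])); // most similar first
        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        for (int i : order.subList(0, Math.min(k, order.size()))) {
            String label = training.get(i).label;
            int v = votes.merge(label, 1, Integer::sum);
            if (best == null || v > votes.get(best)) best = label;
        }
        return best;
    }

    /** Tiny made-up training set, for illustration only. */
    static List<Doc> sampleTraining() {
        return Arrays.asList(
            new Doc("sport", "the team won the football match"),
            new Doc("sport", "the striker scored a goal in the match"),
            new Doc("politics", "the parliament voted on the new budget law"),
            new Doc("politics", "the minister presented the budget to parliament"));
    }

    public static void main(String[] args) {
        System.out.println(classify("a late goal decided the football match", sampleTraining(), 3));
    }
}
```

In this sketch a tie in the vote goes to the label that reaches the winning count first in rank order. A real plugin would supply the tokens from scraped and stemmed content instead of the inline strings.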