NLTK, the Natural Language Toolkit, is a suite of open source Python modules, linguistic data and documentation for research and development in natural language processing. It supports dozens of NLP tasks, with distributions for Windows, Mac OS X and Linux.
Treex (formerly TectoMT) is a highly modular NLP software system implemented in the Perl programming language under Linux. It is primarily aimed at machine translation, making use of the ideas and technology created during the Prague Dependency Treebank project. At the same time, it is also hoped to ...
What is FastRandomForest? FastRandomForest is a re-implementation of the Random Forest classifier (RF) for the Weka environment that brings speed and memory use improvements over the original Weka RF. Speed gains depend on many factors, but a 5-10x increase over Weka 3-6-1 on a quad-core computer ...
TestEl is a Java-based learning analyzer for HTML (and possibly other) structured documents. It can be trained to detect structures in such documents and renders hits in XML.
Ruby interface to the CRM114 Controllable Regex Mutilator, an advanced and fast text classifier that uses sparse binary polynomial matching with a Bayesian Chain Rule evaluator and a hidden Markov model to categorize data with up to 99.87% accuracy.
The coolest Bayesian antispam addon for World of Warcraft, using the original SpamBayes algorithm.
l7-filter is an application layer packet classifier that can differentiate types of network traffic by protocol; for example, it can identify BitTorrent, IRC, SIP and many other types of traffic. In turn, these classifiers can be used to block or shape traffic according to a defined network policy.
A high-level interface to the CMU Link Grammar. This binding wraps the link-grammar shared library provided by the AbiWord project for their grammar-checker.