DKPro is a collection of software components for natural language processing (NLP) based on the Apache UIMA framework. Many powerful and state-of-the-art NLP components are already freely available in the NLP research community. New and improved components are being developed and released ...
CORSIS (formerly Tenka Text) is a performance‐oriented, open‐source library for corpus analysis. It uses typed assembly, task‐specific compilers and parallelization to deliver the best performance with an elegant design. The project's demonstration GUI comes with Wordlister - an advanced ...
PyGrams converts text to n-grams. Conversion is a three-step process. 1) Extract all possible n-grams: run "form_candidates.py" to create a file containing all possible n-grams. 2) Filter the possible n-grams: run "filter_candidates.py" to find just the n-grams which appear ...
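The first two steps can be pictured with a minimal Python sketch. It is not PyGrams' own code: the function names only echo the script names above, and because the description of the filter step is cut off, the `min_count` frequency threshold used here is an assumption chosen purely for illustration.

```python
from collections import Counter


def form_candidates(tokens, max_n=3):
    """Yield every contiguous n-gram of length 1..max_n as a tuple."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def filter_candidates(candidates, min_count=2):
    """Keep n-grams seen at least min_count times.

    The threshold is an assumed criterion, not the one PyGrams uses.
    """
    counts = Counter(candidates)
    return {gram: c for gram, c in counts.items() if c >= min_count}


tokens = "the cat sat on the cat mat".split()
kept = filter_candidates(form_candidates(tokens, max_n=2))
print(kept)
```

With this toy input, only "the", "cat" and the bigram "the cat" occur twice, so they are the only candidates that survive the assumed filter.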
Summary: The tokstream library allows you to read text files and split them up into individual tokens. It is, in a sense, a glorified version of strtok with file reading and a few tricks to make the process as efficient as possible. Features: clean and minimal interface; simple to use; wraps file I/O ...
html5cpp: This library aims to implement the tokenization and tree construction algorithms described in the WHATWG HTML5 working draft. It will not handle XHTML parsing.
jTokeniser is a set of classes that provide a variety of tokenisers for your Java projects. Simple tokenisers such as WhiteSpaceTokeniser or StringTokeniser provide basic token extraction, whereas RegexTokeniser and BreakIteratorTokeniser offer more advanced possibilities for more thorough tokenisers ...
Creates a compressed trie that maps keys to values and values to keys. Compression is applied to the front end of keys. Useful for lightweight reserved-word creation where memory or processor power is constrained. Written in C.
WDependency is a PHP tool that analyzes the contents of a directory to find dependencies between files and classes, and generates dependency schemas in various export formats (dot, png, graphml, json, php...).