DKPro is a collection of software components for natural language processing (NLP) based on the Apache UIMA framework. Many powerful and state-of-the-art NLP components are already freely available in the NLP research community. New and improved components are being developed and released
CORSIS (formerly Tenka Text) is a performance‐oriented, open‐source library for corpus analysis. It uses typed assembly, task‐specific compilers and parallelization to combine high performance with an elegant design. The project's demonstration GUI comes with Wordlister - an advanced
Splender is a JavaScript-based, token-driven syntax highlighting engine with theme support. It allows for very efficient syntax highlighting of plain text content embedded in HTML documents. By utilizing a proper lexer/tokenizer, Splender offers optimal performance. Other similar solutions use
PyGrams converts text to n-grams. Conversion is a three-step process. 1) Extract all possible n-grams. Run "form_candidates.py" to create a file containing all possible n-grams. 2) Filter possible n-grams. Run "filter_candidates.py" to find just the n-grams which appear
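As a rough illustration of the first step in the PyGrams entry (extracting all possible n-grams), here is a minimal Python sketch. It is not PyGrams's own code: the function name `extract_ngrams`, the `max_n` parameter, and the whitespace tokenization are all assumptions made for the example.

```python
from collections import Counter


def extract_ngrams(tokens, max_n):
    """Return every contiguous n-gram (as a tuple) for n = 1..max_n.

    Hypothetical helper for illustration; not part of PyGrams.
    """
    ngrams = []
    for n in range(1, max_n + 1):
        # Slide a window of width n across the token list.
        for start in range(len(tokens) - n + 1):
            ngrams.append(tuple(tokens[start:start + n]))
    return ngrams


# Assumption: plain whitespace splitting stands in for real tokenization.
tokens = "the cat sat on the mat".split()

# Counting the candidates gives the kind of table a later filtering step could use.
candidates = Counter(extract_ngrams(tokens, max_n=2))
```

Counting candidates with a `Counter` is one plausible way to prepare for filtering, for example by frequency, though any such criterion would be a design choice of this sketch.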
Summary: The tokstream library allows you to read text files and split them up into individual tokens. It is, in a sense, a glorified version of strtok with file reading and a few tricks to make the process as efficient as possible. Features: clean and minimal interface; simple to use; wraps file I/O
html5cpp: This library aims to implement the tokenization and tree construction algorithms described in the WHATWG HTML5 working draft. It will not handle XHTML parsing.
jTokeniser is a set of classes that provide a variety of tokenisers for your Java projects. Simple tokenisers such as WhiteSpaceTokeniser or StringTokeniser provide basic token extraction, whereas RegexTokeniser and BreakIteratorTokeniser give more advanced possibilities for more thorough tokenisers
Creates a compressed trie that maps keys to values and values to keys. Compression is applied to the front end of keys. Useful for lightweight reserved-word creation in memory- or processor-constrained situations. Written in C.
WDependency is a PHP tool that analyzes the contents of a directory to find dependencies between files and classes, and generates dependency schemas in various export formats (dot, png, graphml, json, php...)
Copyright © 2013 Black Duck Software, Inc. and its contributors, Some Rights Reserved. Unless otherwise marked, this work is licensed under a Creative Commons Attribution 3.0 Unported License. Ohloh® and the Ohloh logo are trademarks of Black Duck Software, Inc. in the United States and/or other jurisdictions. All other trademarks are the property of their respective holders.