Projects tagged ‘corpora’


[19 total ]

19 Users
 

NLTK — the Natural Language Toolkit — is a suite of open source Python modules, linguistic data and documentation for research and development in natural language processing, supporting dozens of ... [More] NLP tasks, with distributions for Windows, Mac OSX and Linux. [Less]
Created over 3 years ago.

2 Users
 

Use the internet as a linguistic corpus: Provide tools and infrastructure for acquisition, visual annotation, merging and storage of web pages as parts of bigger corpora. Develop a ... [More] classification engine that learns to automatically annotate pages, provide visual tools for inspection of results. [Less]
Created about 1 year ago.

1 Users
   

Affisix is a program for automatic recognition of affixes. It takes large amount of words and according to the user setting it tries to determine which segments of these words are prefixes.
Created about 1 year ago.

1 Users

The LexAt "lexical attraction" aka the RelEx Statistical Linguistics package adds statistical algorithms to the RelEx. Corpus statistics, including mutual information, are maintained in an SQL ... [More] database, and drawn on to enhance various RelEx functions, such as parse ranking and chunk ranking, and word-sense disambiguation (Mihalcea algo). [Less]
Created 6 months ago.

1 Users

A high-level interface to the CMU Link Grammar. This binding wraps the link-grammar shared library provided by the AbiWord project for their grammar-checker.
Created about 1 year ago.

1 Users

CORSIS (formerly Tenka Text) is a performance‐oriented, open‐source library for corpus analysis. It utilizes typed assembly, task‐specific compilers and parallelization to deliver the best ... [More] performance with elegant design. Demonstrative GUI of the project comes with Wordlister - an advanced, extremely fast graphical wordlist tool and a regex concordance tool. CORSIS - the open-source answer to WordSmith Tools. [Less]
Created over 3 years ago.

1 Users

TestEl is a Java-based learning analyzer for HTML (and possibly other) structured documents. It can be trained to detect structures in such documents and renders hits in XML.
Created about 1 year ago.

0 Users

Rails code for the submission site.
Created about 1 year ago.

0 Users

I'm putting this on the web so that Geoff and I can have subversion access to the code.
Created about 1 year ago.

0 Users

Get1T is a tool for filtering through the massive quantity of data available in the Web 1T corpus and extracting only the counts you need - including for simple wildcard patterns.
Created about 1 year ago.