Browsing projects by Tag(s)

Showing page 1 of 4

NLTK (the Natural Language Toolkit) is a suite of open-source Python modules, linguistic data, and documentation for research and development in natural language processing. It supports dozens of NLP tasks and has distributions for Windows, Mac OS X, and Linux.

Rating: 5.0 | 0 reviews | 40 users | 214,312 lines of code | 42 current contributors | Analyzed 7 days ago

Use the internet as a linguistic corpus: provide tools and infrastructure for acquiring, visually annotating, merging, and storing web pages as parts of larger corpora. Develop a classification engine that learns to annotate pages automatically, and provide visual tools for inspecting the results.

Rating: 5.0 | 0 reviews | 3 users | 114,262 lines of code | 1 current contributor | Analyzed 7 days ago

The OCTC hosts open-content texts, encoded in TEI P5 XML, for many languages, each in a separate subcorpus. Another part of the OCTC stores interlanguage alignment information. The project is intended as an open platform for academic and research projects of various kinds (tool-, markup-, or language-documentation-oriented) and for collaboration on multilingual corpus encoding in general, and on applying the TEI Guidelines for that purpose in particular. ("TEI" stands for the Text Encoding Initiative, http://www.tei-c.org/)

Rating: 0 | 0 reviews | 2 users | 1,194,047 lines of code | 0 current contributors | Analyzed 2 days ago

A package of tools for automatically creating corpora from the web.

Rating: 0 | 0 reviews | 1 user | 10,973 lines of code | 0 current contributors | Analyzed 2 days ago

Greenstone is a suite of software for building and distributing digital library collections. It provides a new way of organizing information and publishing it on the Internet or on CD-ROM. Greenstone is produced by the New Zealand Digital Library Project at the University of Waikato, and developed and distributed in cooperation with UNESCO and the Human Info NGO.

Rating: 0 | 0 reviews | 1 user | 505,537 lines of code | 0 current contributors | Analyzed about 2 years ago

LexAt ("lexical attraction"), also known as the RelEx Statistical Linguistics package, adds statistical algorithms to RelEx. Corpus statistics, including mutual information, are maintained in an SQL database and drawn on to enhance various RelEx functions, such as parse ranking, chunk ranking, and word-sense disambiguation (using the Mihalcea algorithm).

Rating: 0 | 0 reviews | 1 user | 9,596 lines of code | 0 current contributors | Analyzed 2 days ago
 
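LexAt keeps mutual-information statistics in an SQL database. As a rough illustration only, the sketch below stores unigram and adjacent-bigram counts in an in-memory SQLite database (Python's standard `sqlite3` module) and computes pointwise mutual information, log2 P(x, y) / (P(x) P(y)), for a word pair. The table layout and the names `build_count_db` and `pmi` are hypothetical; this is not LexAt's schema or code, and LexAt's own statistics may be defined differently.

```python
import math
import sqlite3
from collections import Counter


def build_count_db(tokens):
    """Store unigram and adjacent-bigram counts in an in-memory SQLite database.

    Hypothetical schema for illustration; not LexAt's actual tables.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE unigram (word TEXT PRIMARY KEY, n INTEGER)")
    conn.execute("CREATE TABLE bigram (w1 TEXT, w2 TEXT, n INTEGER, PRIMARY KEY (w1, w2))")
    conn.executemany("INSERT INTO unigram VALUES (?, ?)", Counter(tokens).items())
    pairs = Counter(zip(tokens, tokens[1:]))  # adjacent word pairs only
    conn.executemany(
        "INSERT INTO bigram VALUES (?, ?, ?)",
        ((w1, w2, n) for (w1, w2), n in pairs.items()),
    )
    return conn


def pmi(conn, w1, w2):
    """Pointwise mutual information log2 P(w1,w2) / (P(w1) P(w2)); None if the pair is unseen."""
    (n_uni,) = conn.execute("SELECT SUM(n) FROM unigram").fetchone()
    (n_bi,) = conn.execute("SELECT SUM(n) FROM bigram").fetchone()
    row = conn.execute("SELECT n FROM bigram WHERE w1 = ? AND w2 = ?", (w1, w2)).fetchone()
    if row is None:
        return None
    c1 = conn.execute("SELECT n FROM unigram WHERE word = ?", (w1,)).fetchone()[0]
    c2 = conn.execute("SELECT n FROM unigram WHERE word = ?", (w2,)).fetchone()[0]
    p_xy = row[0] / n_bi
    return math.log2(p_xy / ((c1 / n_uni) * (c2 / n_uni)))
```

For example, with `conn = build_count_db("new york is a big city and new york never sleeps".split())`, `pmi(conn, "new", "york")` is positive because the pair co-occurs more often than chance predicts, while `pmi(conn, "york", "new")` returns `None` because that bigram never occurs.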

CORSIS (formerly Tenka Text) is a performance-oriented, open-source library for corpus analysis. It uses typed assembly, task-specific compilers, and parallelization to deliver high performance with an elegant design. The project's demonstration GUI comes with Wordlister, an advanced, extremely fast graphical wordlist tool, and a regex concordance tool. CORSIS: the open-source answer to WordSmith Tools.

Rating: 0 | 0 reviews | 1 user | 50,431 lines of code | 0 current contributors | Analyzed 7 days ago
 
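CORSIS's wordlist and regex concordance tools are GUI applications; the following is only a minimal Python sketch of the two underlying operations, a frequency-sorted word list and a key-word-in-context (KWIC) concordance over regex matches. The function names `wordlist` and `concordance` and the `width` parameter are hypothetical, and the sketch makes no attempt at CORSIS's compiled, parallelized performance.

```python
import re
from collections import Counter


def wordlist(text):
    """Frequency-sorted word list: (lowercased word form, count) pairs, most frequent first."""
    return Counter(re.findall(r"\w+", text.lower())).most_common()


def concordance(text, pattern, width=20):
    """Key-word-in-context lines for every match of the regex `pattern` in `text`.

    Each line shows up to `width` characters of left and right context,
    with the match itself wrapped in brackets.
    """
    lines = []
    for m in re.finditer(pattern, text):
        left = text[max(0, m.start() - width):m.start()].replace("\n", " ")
        right = text[m.end():m.end() + width].replace("\n", " ")
        # Right-align the left context so the matches line up in one column.
        lines.append(f"{left:>{width}}[{m.group()}]{right}")
    return lines
```

Right-aligning the left context is the usual KWIC layout: all hits sit in the same column, which makes recurring left and right neighbours easy to scan.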

The IMS Open Corpus Workbench is a collection of tools for managing and querying large text corpora (100 million words and more) with linguistic annotations. Its central component is the flexible and efficient query processor CQP.

Rating: 0 | 0 reviews | 1 user | 754,849 lines of code | 2 current contributors | Analyzed 8 days ago

This is a project of the Computational Linguistics lab at the Nara Institute of Science and Technology: a corpus search web application built with Ruby on Rails.

Rating: 0 | 0 reviews | 0 users | 301,926 lines of code | 0 current contributors | Analyzed 3 days ago

CorpusCatcher is a corpus collection toolset. It can help you build language- or topic-specific corpora from publicly available web resources. Such corpora are useful for many purposes, in particular as data for building spell checkers.

Rating: 0 | 0 reviews | 0 users | 813 lines of code | 0 current contributors | Analyzed 1 day ago
 
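CorpusCatcher's own pipeline is not described here; the sketch below only illustrates, under stated assumptions, one step such a toolset needs: turning a downloaded HTML page into plain text and counting the word forms it contains, the kind of list that can seed a spell checker. Fetching is omitted (in Python it could be done with `urllib.request.urlopen`), and `TextExtractor`, `page_to_wordforms`, and the letters-only `word_pattern` default are hypothetical stand-ins for a real language filter.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text from an HTML page, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # nesting depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def page_to_wordforms(html, word_pattern=r"[^\W\d_]+"):
    """Extract text from one page and count lowercased word forms matching word_pattern.

    word_pattern is a hypothetical stand-in for a language-specific filter;
    the default matches runs of letters only (no digits or underscores).
    """
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts)
    return Counter(re.findall(word_pattern, text.lower()))
```

Counts from many pages can be merged with `Counter.update`, and forms below some frequency threshold dropped, before the list is used as spell-checker data.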

Copyright © 2013 Black Duck Software, Inc. and its contributors. Some Rights Reserved. Unless otherwise marked, this work is licensed under a Creative Commons Attribution 3.0 Unported License. Ohloh® and the Ohloh logo are trademarks of Black Duck Software, Inc. in the United States and/or other jurisdictions. All other trademarks are the property of their respective holders.