Projects tagged ‘corpus’


27 total

2 Users
 

Use the internet as a linguistic corpus: provide tools and infrastructure for acquiring, visually annotating, merging and storing web pages as parts of larger corpora. Develop a classification engine that learns to annotate pages automatically, and provide visual tools for inspecting the results.
Created about 1 year ago.

1 Users

Greenstone is a suite of software for building and distributing digital library collections. It provides a new way of organizing information and publishing it on the Internet or on CD-ROM. Greenstone is produced by the New Zealand Digital Library Project at the University of Waikato, and developed and distributed in cooperation with UNESCO and the Human Info NGO.
Created over 2 years ago.

1 Users

LexAt ("lexical attraction"), also known as the RelEx Statistical Linguistics package, adds statistical algorithms to RelEx. Corpus statistics, including mutual information, are maintained in an SQL database and drawn on to enhance various RelEx functions, such as parse ranking, chunk ranking, and word-sense disambiguation (using the Mihalcea algorithm).
Created 6 months ago.
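The mutual-information statistic mentioned in the LexAt description can be illustrated with a minimal Python sketch. It assumes that co-occurrence means ordered word pairs within a small window (adjacent words by default) and estimates pointwise mutual information, log2 of P(a,b) / (P(a)P(b)), from raw counts. The function name `pmi_table` and the `window` parameter are hypothetical; this is not LexAt's code, and it does not reproduce LexAt's SQL schema or its exact definition of lexical attraction.

```python
import math
from collections import Counter


def pmi_table(sentences, window=1):
    """Return pointwise mutual information (log base 2) for ordered word pairs.

    A pair (a, b) is counted when b occurs within `window` tokens after a.
    This is an illustrative sketch, not LexAt's implementation.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        word_counts.update(tokens)
        for i, a in enumerate(tokens):
            # Pair the current token with each following token inside the window.
            for b in tokens[i + 1:i + 1 + window]:
                pair_counts[(a, b)] += 1

    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())
    table = {}
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / total_pairs           # joint probability of the pair
        p_a = word_counts[a] / total_words  # unigram probability of a
        p_b = word_counts[b] / total_words  # unigram probability of b
        table[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return table


if __name__ == "__main__":
    corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
    for pair, score in sorted(pmi_table(corpus).items()):
        print(pair, round(score, 3))
```

On this toy corpus the pair ("the", "cat") scores log2(4.5), about 2.17: the pair accounts for a quarter of all adjacent pairs, far more than the product of the two words' unigram probabilities would predict.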

1 Users

CORSIS (formerly Tenka Text) is a performance-oriented, open-source library for corpus analysis. It uses typed assembly, task-specific compilers and parallelization to deliver high performance with an elegant design. The project's demonstration GUI includes Wordlister, an advanced, extremely fast graphical wordlist tool, and a regex concordance tool. CORSIS bills itself as the open-source answer to WordSmith Tools.
Created over 3 years ago.
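CORSIS's own API is not shown in this listing, so the following Python sketch only illustrates what a wordlist (frequency list) tool and a regex concordance (key-word-in-context) tool compute. The functions `wordlist` and `concordance` and the `width` parameter are hypothetical names chosen for this example.

```python
import re
from collections import Counter


def wordlist(text):
    """Return (word, frequency) pairs for lowercase word tokens, most frequent first."""
    return Counter(re.findall(r"\w+", text.lower())).most_common()


def concordance(text, pattern, width=20):
    """Return one key-word-in-context line per regex match in `text`.

    Each line shows up to `width` characters of left context (right-aligned),
    the match in square brackets, and up to `width` characters of right context.
    """
    lines = []
    for m in re.finditer(pattern, text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines


if __name__ == "__main__":
    sample = "the cat sat on the mat"
    print(wordlist(sample))
    for line in concordance(sample, r"\bs\w+", width=4):
        print(line)
```

For the sample sentence, `concordance(sample, r"\bs\w+", width=4)` returns the single line `"cat [sat] on "`. Right-aligning the left context keeps every match in the same column, which is the usual layout for concordance output.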

0 Users

Poliqarp is a universal suite of utilities for large corpora processing.
Created 12 months ago.

0 Users

This is a project of the Computational Linguistics Lab at the Nara Institute of Science and Technology. It is a Ruby on Rails web application for corpus search.
Created about 1 year ago.

0 Users

Study of the year 1985
Created about 1 month ago.

0 Users

Hunpos is an open-source reimplementation of TnT, the well-known part-of-speech tagger by Thorsten Brants.

Features:
- Free and open source, even for commercial use.
- For languages with more complex morphologies, HMM tagging can be quite competitive with the current generation of learning algorithms that apply, e.g., SVM and CRF methods. A major advantage is that the training/tagging cycle is orders of magnitude faster than in more complex models.
- Precision of tagging on unknown and unseen words was a major priority for us during the development of Hunpos.
- Works smoothly with large tag sets. In Hungarian, as in other highly inflecting languages, it is important to preserve detailed morphological information in the POS tags in order to provide useful clues for higher-level processing tasks. This leads to a significantly larger tagset than is common in English (744 tags here, as opposed to the 36 standardly used in Treebank work), but it does not degrade training and tagging performance, although a tagset this large would make training non-generative models computationally expensive.
- Effortless integration of knowledge from morphological analyzers/dictionaries into the best-path calculation.
- Contextualized lexical probabilities with a context window of any size. Unlike traditional HMM models, Hunpos estimates emission (lexical) probabilities based on the previous tags as well as the current tag.
- Hunpos has been implemented in OCaml, a high-level language that supports a succinct, well-maintainable coding style. OCaml has a high-performance compiler that produces native code with speed comparable to C/C++ implementations.
Created about 1 year ago.
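The contextualized lexical probabilities described for Hunpos can be sketched as follows: instead of the classic HMM emission P(word | tag), the estimate conditions on the previous tag too, P(word | tag, previous tag). This minimal Python sketch is an assumption-laden illustration, not Hunpos's estimator: it uses a context of exactly one previous tag, unsmoothed relative frequencies, a hypothetical `<s>` start tag, and a hypothetical fallback to P(word | tag) for unseen contexts. Whatever smoothing Hunpos (written in OCaml) actually applies is not reproduced here.

```python
from collections import Counter


def train_emissions(tagged_sentences):
    """Estimate P(word | tag, previous tag) by unsmoothed relative frequency.

    Returns a function emission(word, tag, prev_tag). If the (prev_tag, tag)
    context never occurred in training, it falls back to P(word | tag).
    """
    ctx_word = Counter()    # counts of (prev_tag, tag, word)
    ctx = Counter()         # counts of (prev_tag, tag)
    tag_word = Counter()    # counts of (tag, word)
    tag_counts = Counter()  # counts of tag
    for sentence in tagged_sentences:
        prev = "<s>"  # hypothetical sentence-start pseudo-tag
        for word, tag in sentence:
            ctx_word[(prev, tag, word)] += 1
            ctx[(prev, tag)] += 1
            tag_word[(tag, word)] += 1
            tag_counts[tag] += 1
            prev = tag

    def emission(word, tag, prev_tag):
        if ctx[(prev_tag, tag)]:
            return ctx_word[(prev_tag, tag, word)] / ctx[(prev_tag, tag)]
        if tag_counts[tag]:
            # Context unseen in training: back off to the context-free estimate.
            return tag_word[(tag, word)] / tag_counts[tag]
        return 0.0

    return emission


if __name__ == "__main__":
    corpus = [
        [("the", "DT"), ("can", "NN"), ("rusted", "VBD")],
        [("we", "PRP"), ("can", "MD"), ("go", "VB")],
        [("the", "DT"), ("dog", "NN")],
    ]
    p = train_emissions(corpus)
    print(p("can", "NN", "DT"), p("can", "MD", "PRP"))
```

In the toy corpus, "can" is generated with probability 0.5 by NN after DT, but with probability 1.0 by MD after PRP. The previous tag therefore sharpens the lexical estimate, which is the point of conditioning emissions on more than the current tag.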

0 Users

Spelt is a simple graphical program that can be used to classify words in a language. It is particularly designed to identify word roots and to classify them by part of speech. The initial development of this program was specifically meant to simplify work on spell checkers, but you might find it useful for many other purposes.
Created about 1 year ago.

0 Users

Coeval is free corpus evaluation software written in Java. It allows you to create, manage and customize your own corpus of documents. Coeval can be used to train classifiers, evaluate their performance, and cross-compare classifiers on the same corpus. A Support Vector Machine classifier (LIBSVM, A Library for Support Vector Machines) is provided with this release, and you can easily add and test your own classifiers.

Requirements:
- Apache Tomcat 6.x
- MySQL 5.1
- Java EE 5
- Eclipse 3.x

Instructions:
1. Open the MySQL command line, then type and execute "source COEVAL_HOME/db/dump.sql".
2. Edit COEVAL_HOME/WEB-INF/mysql.properties with your MySQL account data.
3. Edit COEVAL_HOME/localhost.properties with your Apache Tomcat account data.
4. Load "build.xml" with Ant and deploy Coeval.

Contact:
Please feel free to contact me if you have any questions or comments.
Created 2 months ago.
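Cross-comparing classifiers on the same corpus, as Coeval's description puts it, amounts to scoring each classifier's predictions against the same gold labels. The listing does not say which metrics Coeval reports, so this Python sketch assumes per-class precision, recall and F1. It neither calls LIBSVM nor uses Coeval's Java API, and the functions `evaluate` and `compare` are hypothetical.

```python
def evaluate(gold, predicted, positive):
    """Return (precision, recall, F1) for one class, treating it as positive."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def compare(gold, predictions_by_classifier, positive):
    """Score every classifier on the same gold labels and rank them by F1."""
    scores = {
        name: evaluate(gold, preds, positive)
        for name, preds in predictions_by_classifier.items()
    }
    return sorted(scores.items(), key=lambda item: item[1][2], reverse=True)


if __name__ == "__main__":
    gold = ["spam", "ham", "spam", "ham"]
    runs = {
        "baseline": ["ham", "ham", "ham", "ham"],
        "svm": ["spam", "ham", "ham", "ham"],
    }
    for name, (p, r, f) in compare(gold, runs, "spam"):
        print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Because every classifier is scored against one shared list of gold labels, the resulting numbers are directly comparable. That shared-corpus setup is what makes a side-by-side comparison meaningful.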