Browsing projects by Tag(s)

The TEI is an international and interdisciplinary community-based open standard used by research projects, libraries, museums, publishers, and academics to represent all kinds of literary and linguistic texts, using an encoding scheme that is maximally expressive and minimally obsolescent.

Rating: 5.0  |  0 reviews  |  3 users  |  1,099,312 lines of code  |  13 current contributors  |  Analyzed 4 days ago

The OCTC hosts open-content texts, encoded in TEI P5 XML, for many languages, each in a separate subcorpus. Another part of the OCTC stores interlanguage alignment information. The project is intended to be an open platform for academic and research projects of various kinds (tool-, markup-, or language-documentation-oriented) and for collaboration on multilingual corpus encoding in general and application of the TEI Guidelines for that purpose in particular. ("TEI" stands for the Text Encoding Initiative, http://www.tei-c.org/)

Rating: 0  |  0 reviews  |  2 users  |  1,194,047 lines of code  |  0 current contributors  |  Analyzed 4 days ago

RelEx is an English-language semantic relationship extractor, built on the Carnegie-Mellon link parser. It can identify subject, object, indirect object and many other relationships between words in a sentence. It can also provide part-of-speech tagging, noun-number tagging, verb tense tagging, gender tagging, and so on. RelEx includes a basic implementation of the Hobbs anaphora (pronoun) resolution algorithm. Optionally, it can use GATE for entity detection. RelEx also provides semantic relationship framing, similar to that of FrameNet.

Rating: 0  |  0 reviews  |  2 users  |  17,564 lines of code  |  1 current contributor  |  Analyzed 7 days ago

A package of tools for automatically creating corpora from the web.

Rating: 0  |  0 reviews  |  1 user  |  10,973 lines of code  |  0 current contributors  |  Analyzed 4 days ago

LexAt ("lexical attraction"), also known as the RelEx Statistical Linguistics package, adds statistical algorithms to RelEx. Corpus statistics, including mutual information, are maintained in an SQL database and drawn on to enhance various RelEx functions, such as parse ranking, chunk ranking, and word-sense disambiguation (the Mihalcea algorithm).

Rating: 0  |  0 reviews  |  1 user  |  9,596 lines of code  |  0 current contributors  |  Analyzed 4 days ago

An implementation of the Porter stemming algorithm in Vim script. See https://www.ohloh.net/p/stem-search-vim for a script that makes use of it.

Rating: 0  |  0 reviews  |  0 users  |  202 lines of code  |  0 current contributors  |  Analyzed 7 months ago

StmSrch is a reverse-stem searching script. It implements the Porter stemming algorithm, by Martin Porter, and also handles irregular verbs and noun pluralizations. The script can be useful for searching or scanning through corpus files. Each word passed to the :StmSrch command is stemmed and then expanded into a pattern that matches its possible conjugations or pluralizations. If no word is given, the script attempts to stem the word under the cursor. Matching uses word boundaries, so not just any substring will match. For example, ":StmSrch searcher" will match any of: search, searching, searches, searchers, searched, ... A string of words works as well, matching in order: ":StmSrch thieves are running from bunnies" will match strings of word
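The stem-then-expand idea described above can be illustrated with a short Python sketch. This is not the StmSrch source: `crude_stem` is a hypothetical, much-simplified stand-in for the Porter stemmer, the suffix list is an assumption chosen for the example, and irregular forms (such as thieves/thief) are not handled.

```python
import re

# Hypothetical, much-simplified stand-in for a real Porter stemmer:
# it strips only a handful of common English suffixes.
SUFFIXES = ("ers", "ing", "es", "ed", "er", "s")


def crude_stem(word: str) -> str:
    """Strip one known suffix, keeping at least three characters of stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_pattern(phrase: str) -> re.Pattern:
    """Build a regex matching each word's stem plus an optional suffix.

    Word boundaries (\\b) keep the stem from matching inside longer words,
    and consecutive words must appear in order, separated by whitespace.
    """
    endings = "|".join(SUFFIXES)
    parts = []
    for word in phrase.split():
        stem = re.escape(crude_stem(word))
        parts.append(rf"\b{stem}(?:{endings})?\b")
    return re.compile(r"\s+".join(parts), re.IGNORECASE)


text = "She searched while the searchers kept searching; research continued."
matches = stem_pattern("searcher").findall(text)
print(matches)
```

Because of the leading word boundary, "research" is not reported even though it contains the stem "search".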

Rating: 0  |  0 reviews  |  0 users  |  308 lines of code  |  0 current contributors  |  Analyzed 7 months ago

A specialized crawler for the German newspaper 'Die Zeit'. Starting from the front page or from a given list of links, the crawler retrieves newspaper articles and gathers new links to explore as it goes, stripping the text of each article out of the HTML formatting and saving it into a raw text file. The project includes scripts to convert it into the XML format for further use with natural language processing tools.
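The crawl loop described above (start from seed links, strip each page down to text, queue newly found links) might look roughly like the following Python sketch. It is not the project's code: `ArticleParser`, `crawl`, the `fetch` callback, and the `example.test` pages are all hypothetical names introduced for illustration, and the text is returned in a dictionary rather than written to raw text files so that the example runs offline.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class ArticleParser(HTMLParser):
    """Collects visible text and outgoing links from one HTML page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def crawl(start_urls, fetch, max_pages=10):
    """Breadth-first crawl; fetch(url) returns HTML text or None."""
    frontier = deque(start_urls)
    seen = set(start_urls)
    texts = {}
    while frontier and len(texts) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = ArticleParser()
        parser.feed(html)
        texts[url] = " ".join(parser.text_parts)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return texts


# Offline stand-in for the network (hypothetical pages).
PAGES = {
    "http://example.test/": '<h1>Front</h1><a href="/a1">One</a><a href="/a2">Two</a>',
    "http://example.test/a1": '<p>First <b>article</b></p><script>var x = 1;</script><a href="/">home</a>',
    "http://example.test/a2": "<p>Second article.</p>",
}
texts = crawl(["http://example.test/"], PAGES.get)
```

Passing `fetch` in as a parameter keeps the traversal logic separate from network access; a real crawler would supply a function that downloads the page and would likely restrict links to the newspaper's own domain.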

Rating: 0  |  0 reviews  |  0 users  |  537 lines of code  |  2 current contributors  |  Analyzed 3 days ago

A specialized crawler for the French sport newspaper L'Équipe. Starting from the front page or from a given list of links, the crawler retrieves newspaper articles and gathers new links to explore as it goes, stripping the text of each article out of the HTML formatting and saving it into a raw text file. The project includes scripts to convert it into the XML format for further use with natural language processing tools.

Rating: 0  |  0 reviews  |  0 users  |  401 lines of code  |  2 current contributors  |  Analyzed 1 day ago

Creative Commons License Copyright © 2013 Black Duck Software, Inc. and its contributors, Some Rights Reserved. Unless otherwise marked, this work is licensed under a Creative Commons Attribution 3.0 Unported License. Ohloh ® and the Ohloh logo are trademarks of Black Duck Software, Inc. in the United States and/or other jurisdictions. All other trademarks are the property of their respective holders.