Browsing projects by Tag(s)

The BerkeleyAligner is a word alignment software package that implements recent innovations in unsupervised word alignment. To learn more about the project and surrounding research, visit the Berkeley word aligner website.

News
9/28: As of release 2.1, we have split the Berkeley aligner into two downloads. The unsupervised aligner doesn't require a set of hand-labeled word alignments. The supervised aligner does, and it depends on the unsupervised aligner.

Recent changes and bug fixes
9/28: You can now run the unsupervised aligner without a hand-aligned test set; the evaluation phase will be skipped.
9/28: Loading trained models for evaluation only now works correctly (just give an empty training sequence).
9/28: Output can now be split into multiple alignment files corresponding to multiple input files (alignInputsSeparately option).
9/28: The test set does not need to be included in the training sets.

0 reviews | 1 user | 21,869 lines of code | 0 current contributors | Analyzed 8 days ago

MediaGlyphs: an international language based on multimedia ideograms. It lets you read, think, or type, in your own language, sentences written with the shared mediaglyphs: a common writing system for the world that is Simple, Unambiguous, Neutral & Universal.

0 reviews | 1 user | 16,157 lines of code | 1 current contributor | Analyzed over 3 years ago

eConference plugin to enable realtime machine translation for incoming messages.

0 reviews | 1 user | 124,271 lines of code | 2 current contributors | Analyzed 1 day ago

Summary
A transduction parser is the basis for NLP tasks such as paraphrasing and machine translation. This parser supports transduction via weighted synchronous context-free grammar rules. treegraft is currently in the beta stage of development by Jonathan Clark (http://www.cs.cmu.edu/~jhclark).

Getting Started
For examples of how to use treegraft as a parser, take a look at ChartParserTest (the treegraft JUnit tests). As an MT decoder, treegraft is based on the Statistical Transfer concept from Carnegie Mellon's Avenue MT group. To test treegraft from the terminal, you can run:

    # make utest

To use treegraft as a decoder, modify data/treegraft.properties to suit your needs and run:

    # treegraft.sh data/treegraft.properties

Information on loading plugins such as parsers, forest unpackers/decoders, lattice decoders and features at runtime via config files will be posted soon. Also, check back for a Moses-style "full pipeline" training script that will build a system given training data.

Documentation
Also, consider taking a look at the Javadoc API.

0 reviews | 0 users | 0 current contributors | Analyzed 8 days ago

By creating a language-independent representation of text in XML, we are able to ensure lossless translation to any implemented target language.

0 reviews | 0 users | 0 current contributors | Analyzed about 7 hours ago

A rewrite of the Pharaoh decoder.

0 reviews | 0 users | 0 current contributors | Analyzed 8 days ago

This is an API for partial machine translation that helps human translators translate specific kinds of messages, such as messages that contain plurals.

0 reviews | 0 users | 0 current contributors | Analyzed 8 days ago

Within the current explosion in the quantity of information and in the means to access it, much of the world has been left behind because the information is not in a language that they understand. The L3 project ("Learning Lots of Languages") has the long-term goal of developing a system to translate to and from many under-represented languages of the Global South and (less ambitiously) of creating tools to be used in information retrieval and computer-assisted language learning with these languages.

0 reviews | 0 users | 1,092,826 lines of code | 3 current contributors | Analyzed 2 days ago

Extract-Tmx-Corpus is a Windows program (Vista and XP supported) that enables translators, who do not necessarily have a deep knowledge of linguistic tools, to create highly customised corpora that can be used with the Moses machine translation system and with other systems.

To create corpora that are most useful for training machine translation systems, one should strive to include segments that are relevant for the task in hand. One way of finding such segments is to use previous translation memory files (TMX files). This way the corpora can be customised for the person or for the type of task in question. The present program uses such files as input.

The program can create strictly aligned corpora for a single pair of languages, several pairs of languages or all the pairs of languages contained in the TMX files. It creates 2 separate files (UTF-8 format; Unix line endings) for each language pair that it processes: one for the starting language and another for the destination language. The segments of a given TMX translation unit are placed on exactly the same line number in both files. The program suppresses empty TMX translation units, as well as those where the text for the first language is the same as that of the second language (such as translation units consisting solely of numbers, or those in which the first language segment has not been translated into the second language). If you are interested in another corpus format, it should be relatively easy to adapt this format to the one you need.

The program also reports errors that occur during processing and creates a file that lists the name(s) of the TMX files that caused them, as well as a separate file listing the files successfully treated and the number of segments extracted for the language pair.
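The extraction behaviour described above (one line per translation unit in each of two files, with empty and untranslated units dropped) can be sketched roughly as follows. This is not the program's actual code: the function names extract_pairs and write_corpus are hypothetical, and the sketch assumes standard TMX markup in which each tu element holds tuv elements carrying an xml:lang (or older lang) attribute and a seg child. Language codes are matched exactly after lowercasing, which is a simplification.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def extract_pairs(tmx_text, src_lang, tgt_lang):
    """Return (source, target) segment pairs for one language pair.

    Hypothetical helper: skips units with an empty side and units whose
    source and target text are identical.
    """
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also picks up text around inline markup tags.
                text = "".join(seg.itertext())
                # Collapse whitespace so each segment stays on one line.
                segs[lang] = " ".join(text.split())
        src = segs.get(src_lang.lower(), "")
        tgt = segs.get(tgt_lang.lower(), "")
        if not src or not tgt:
            continue  # empty or missing segment
        if src == tgt:
            continue  # numbers only, or never translated
        pairs.append((src, tgt))
    return pairs


def write_corpus(pairs, src_path, tgt_path):
    """Write the pairs to two UTF-8 files with Unix line endings."""
    with open(src_path, "w", encoding="utf-8", newline="\n") as src_file, \
         open(tgt_path, "w", encoding="utf-8", newline="\n") as tgt_file:
        for src, tgt in pairs:
            src_file.write(src + "\n")
            tgt_file.write(tgt + "\n")
```

Because both files are written inside the same loop, line N of the source file always corresponds to line N of the target file, which is the property that line-aligned corpus formats rely on.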

0 reviews | 0 users | 494 lines of code | 0 current contributors | Analyzed 8 days ago

Resources for Quechua-to-English MT

0 reviews | 0 users | 5,056 lines of code | 0 current contributors | Analyzed 7 days ago

Copyright © 2013 Black Duck Software, Inc. and its contributors, Some Rights Reserved. Unless otherwise marked, this work is licensed under a Creative Commons Attribution 3.0 Unported License. Ohloh® and the Ohloh logo are trademarks of Black Duck Software, Inc. in the United States and/or other jurisdictions. All other trademarks are the property of their respective holders.