Browsing projects by Tag(s)

Select a tag to browse associated projects and drill deeper into the tag cloud.


The Smart Common Input Method (SCIM) platform is a development platform that significantly reduces the difficulty of input method development. SCIM splits an input method into three parts: the FrontEnd, which handles the user interface and communication with client applications…

Rating: 3.5  |  0 reviews  |  15 users  |  1,162,553 lines of code  |  1 current contributor  |  Analyzed 2 days ago

This project aims to develop the most complete, standards-compliant, high-quality Chinese (and CJKV) fonts and resources, including bitmap and outline fonts in various styles. We also develop web-based tools to facilitate collaborative online font development.

Rating: 4.0  |  0 reviews  |  3 users  |  0 current contributors

Cjklib provides language routines related to Han characters (Chinese characters, called Hanzi in Chinese, Kanji in Japanese, Hanja in Korean, and chữ Hán in Vietnamese), which are used in writing Chinese and Japanese, infrequently Korean, and formerly Vietnamese. Functionality is included for character pronunciations, radicals, glyph components, stroke decomposition, and variant information. Cjklib is implemented in Python.

Rating: 0  |  0 reviews  |  1 user  |  18,729 lines of code  |  0 current contributors  |  Analyzed 1 day ago

Eclectus is a small Han character dictionary designed especially for learners of languages written with Chinese characters, such as Mandarin Chinese or Japanese.

Rating: 0  |  0 reviews  |  1 user  |  9,193 lines of code  |  1 current contributor  |  Analyzed 3 days ago

A Python library for manipulating Chinese and Japanese scripts. The API includes methods for script detection, reading alternations, and common dictionary formats, as well as general enhancements for working with iterators, sequences, and Python objects.

Rating: 0  |  0 reviews  |  1 user  |  2,788 lines of code  |  0 current contributors  |  Analyzed about 2 years ago

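Script detection of the kind this library offers can be illustrated with a minimal sketch that classifies each character by its Unicode block. The function names (`char_script`, `detect_scripts`, `guess_language`) are hypothetical and are not the library's API, and the ranges cover only the basic Hiragana (U+3040 to U+309F), Katakana (U+30A0 to U+30FF), and CJK Unified Ideographs (U+4E00 to U+9FFF) blocks, not the extension blocks.

```python
# Hypothetical sketch: classify characters by Unicode block.
# Only the basic blocks are covered; CJK extension blocks, half-width
# katakana, and similar ranges are deliberately left out.
SCRIPT_RANGES = [
    ("hiragana", 0x3040, 0x309F),
    ("katakana", 0x30A0, 0x30FF),
    ("han", 0x4E00, 0x9FFF),  # CJK Unified Ideographs (basic block)
]


def char_script(ch: str) -> str:
    """Return the script name of a single character, or "other"."""
    cp = ord(ch)
    for name, low, high in SCRIPT_RANGES:
        if low <= cp <= high:
            return name
    if ch.isascii() and ch.isalpha():
        return "latin"
    return "other"


def detect_scripts(text: str) -> set:
    """Return the set of scripts present in text, ignoring punctuation and spaces."""
    return {char_script(ch) for ch in text} - {"other"}


def guess_language(text: str) -> str:
    """Heuristic: any kana means Japanese; Han characters alone suggest Chinese."""
    scripts = detect_scripts(text)
    if scripts & {"hiragana", "katakana"}:
        return "japanese"
    if "han" in scripts:
        # Han-only text is ambiguous (it could be kanji-only Japanese).
        return "chinese"
    return "unknown"


print(sorted(detect_scripts("漢字とカタカナ")))  # ['han', 'hiragana', 'katakana']
print(guess_language("中文分词"))                # chinese
```

Treating any kana as proof of Japanese is a deliberately crude rule; a real detector would also weigh character frequencies, since short Han-only strings cannot be attributed to one language reliably.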
A font meta-family in multiple styles (Mincho, Gothic, and others) for Japanese, English, and Korean, made with Metafont. It fully covers hiragana, katakana, hangul, and Latin, and partially covers the grade-school (kyōiku) kanji: the second-grade kanji and half of the third-grade kanji, with the goal of eventually covering all the jōyō kanji. It also includes IDSgrep, a tool for querying kanji databases by partial layout, like a more advanced version of the popular "radical search", and code to generate dictionaries from Tsukurimashou, KanjiVG, and EDICT2.

Rating: 0  |  0 reviews  |  1 user  |  67,679 lines of code  |  1 current contributor  |  Analyzed 8 days ago

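Querying by partial layout, as IDSgrep does, can be illustrated with Ideographic Description Sequences (IDS), which write a character as a layout operator followed by its components, e.g. ⿰女子 for 好. The sketch below is not IDSgrep's query language or code: it supports only the two binary operators ⿰ (left-right) and ⿱ (top-bottom), uses a hypothetical `?` wildcard, and runs against a five-character table written by hand for illustration.

```python
# Hypothetical sketch of partial-layout (IDS) matching; not IDSgrep itself.
BINARY_OPERATORS = {"⿰", "⿱"}  # left-right, top-bottom (other operators omitted)
WILDCARD = "?"

# Tiny hand-written decomposition table, for illustration only.
IDS_TABLE = {
    "好": "⿰女子",
    "妈": "⿰女马",
    "明": "⿰日月",
    "字": "⿱宀子",
    "安": "⿱宀女",
}


def parse(seq, i=0):
    """Parse a prefix-notation IDS from index i into a nested tuple.

    Returns (tree, next_index); a leaf is a single character.
    """
    ch = seq[i]
    if ch in BINARY_OPERATORS:
        first, i = parse(seq, i + 1)
        second, i = parse(seq, i)
        return (ch, first, second), i
    return ch, i + 1


def matches(pattern, tree):
    """True if tree fits pattern; the wildcard matches any subtree."""
    if pattern == WILDCARD:
        return True
    if isinstance(pattern, tuple):
        return (
            isinstance(tree, tuple)
            and len(pattern) == len(tree)
            and all(matches(p, t) for p, t in zip(pattern, tree))
        )
    return pattern == tree


def contains(tree, component):
    """True if component occurs anywhere in tree (a plain "radical search")."""
    if tree == component:
        return True
    return isinstance(tree, tuple) and any(contains(t, component) for t in tree[1:])


def layout_query(pattern_text):
    """Characters whose whole layout fits the pattern, in code-point order."""
    pattern, end = parse(pattern_text)
    if end != len(pattern_text):
        raise ValueError("trailing characters in pattern: " + pattern_text)
    return sorted(ch for ch, ids in IDS_TABLE.items() if matches(pattern, parse(ids)[0]))


def component_query(component):
    """Characters containing component at any position, in code-point order."""
    return sorted(ch for ch, ids in IDS_TABLE.items() if contains(parse(ids)[0], component))


print(layout_query("⿰女?"))   # ['好', '妈']
print(layout_query("⿱宀?"))   # ['字', '安']
print(component_query("子"))  # ['好', '字']
```

The difference between the two queries is the point of layout search: `component_query("子")` finds 子 anywhere, while a layout pattern such as `⿰?子` would only match characters with 子 on the right.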
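Paoding Analysis
Summary: Paoding's Knives is a Chinese word segmenter with very high efficiency and high extensibility. It introduces metaphors into its design, uses a fully object-oriented design, and is built on an advanced concept. High efficiency: on a personal computer with a Pentium III and 1 GB of memory, it can accurately segment 1 million Chinese characters in 1 second. It segments text effectively using dictionary files, with no limit on the number of files, so that vocabulary can be defined by category. It can also reasonably analyze unknown words.
Feedback: if you have any suggestions about this project, you are welcome to file issues at http://code.google.com/p/paoding/issues/list. Thoughtful contributions are hugely encouraging!
2010-01-20: Paoding Lucene 3.0 upgrade notes (the code has been committed to SVN; the download package will follow a little later). The main purpose of this upgrade is to support Lucene 3.0. The specific changes are as follows:
(1) Lucene 3.0 is supported. For Lucene versions below 3.0, build from the code in http://paoding.googlecode.com/svn/branches/paoding-for-lucene-2.4/.
(2) The code is compiled with Java 5.0 and no longer supports Java 1.4; new features will be developed on Java 5.
(3) The PaodingAnalyzer calling interface is unchanged, but usage must adapt to the Lucene 3.0 API. A segmentation example:
    // create an analyzer instance
    Analyzer analyzer = new PaodingAnalyzer(properties);
    // get the token stream
    TokenStream stream = analyzer.tokenStream("", reader);
    // …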

Rating: 0  |  0 reviews  |  1 user  |  6,611 lines of code  |  0 current contributors  |  Analyzed 3 days ago

CJK Decomposition File
The CJK Decomposition File is a graphical analysis of the 20,934 most common Chinese/Japanese characters in Unicode (the 20,922 characters in the Unicode CJK common ideograph block, plus the 12 unique characters from the CJK compatibility block). For each character, I've recorded one or two constituent components and a decomposition type. Only pictorial configurations are used, not semantic ones. Where characters have typeface differences, I've used the form in the Unicode spec reference listing. When more than one configuration is possible, I've selected only one. I've "created" a few thousand characters to cater for decomposition components that are not themselves among the collected characters. (Although many are in the CJK Extension A and B blocks, I kept those out of scope.) To represent these extra characters in the data, I've sometimes used a multi-character sequence and sometimes a user-defined glyph.
Downloads: The download file is a zip containing 2 files: (1) the CSV-format data file, with 4 fields: the character, first component, second component (or -), and type of decomposition; (2) a TrueType font file to make viewing the data file easier.
Licence: The CSV-format file is entirely my own work and is distributed under both the Apache Software Licence 2.0 and the LGPL. Although I used many internet-based listings to help create the data, there's no trace of them in the data file itself. The font file is based on some pre-existing proprietary or copyleft font and inherits that licence. If you need to use the decomposition data in BSD-style-licensed work, you can write a quick script to replace the user-defined glyphs with unique multi-character identifying sequences.
Notes: (1) This data has been used in Christoph Burgmer's CJKLib software. (2) See my ongoing progress on creating a CJK-character-based programming language at http://gavingrover.blogspot.com and http://code.google.com/p/groovyscript. (3) The vy-language beta downloads are now all deprecated because its development has ceased.

Rating: 0  |  0 reviews  |  0 users  |  3,755 lines of code  |  0 current contributors  |  Analyzed 4 months ago

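The "quick script" suggested in the CJK Decomposition File's licence note could look something like the sketch below. It assumes the data file is UTF-8, comma-separated with no header row, and that the user-defined glyphs are Unicode private-use characters (general category `Co`); the `{U+XXXX}` replacement format and the function names are hypothetical choices, and the type codes `lr` and `x` in the sample are placeholders, not codes from the actual file.

```python
import csv
import io
import unicodedata


def is_user_defined(ch):
    """Assumption: user-defined glyphs live in the private-use area (category "Co")."""
    return unicodedata.category(ch) == "Co"


def replace_user_defined(field):
    """Swap each private-use character for a hypothetical {U+XXXX} token."""
    return "".join("{U+%04X}" % ord(ch) if is_user_defined(ch) else ch for ch in field)


def strip_user_defined_glyphs(csv_text):
    """Rewrite the whole data file, leaving every other character untouched."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for record in csv.reader(io.StringIO(csv_text)):
        if record:  # skip blank lines
            writer.writerow([replace_user_defined(f) for f in record])
    return out.getvalue()


def load_decompositions(csv_text):
    """Parse the four fields: character, first component, second component (or -), type."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        if len(record) != 4:
            continue  # skip blank or malformed lines
        char, first, second, kind = record
        rows.append({
            "char": char,
            "first": first,
            "second": None if second == "-" else second,
            "type": kind,
        })
    return rows


# Sample rows: placeholder type codes; \ue001 stands in for a user-defined glyph.
sample = "好,女,子,lr\n\ue001,口,-,x\n"
print(strip_user_defined_glyphs(sample))
```

Replacing each private-use character with a token built from its own code point keeps the substitution reversible and guarantees that two different glyphs never collide.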
pymmseg-cpp is a Python port of the rmmseg-cpp project. rmmseg-cpp is an implementation of the MMSEG Chinese word segmentation algorithm in C++, with a Ruby interface.

Rating: 5.0  |  0 reviews  |  0 users  |  1,312 lines of code  |  1 current contributor  |  Analyzed 11 days ago

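MMSEG combines a simple maximum-matching algorithm with a complex one that scores three-word "chunks" using a series of disambiguation rules. The sketch below is an independent illustration, not code from rmmseg-cpp or pymmseg-cpp: it implements forward maximum matching and the first three chunk rules (largest total length, largest average word length, smallest variance of word lengths), omits the fourth rule (which needs character-frequency data), and uses a four-word toy dictionary.

```python
def _candidates(text, i, dictionary, max_len):
    """Words starting at position i: the single character plus any dictionary words,
    in increasing length."""
    found = [text[i]]
    for n in range(2, min(max_len, len(text) - i) + 1):
        if text[i:i + n] in dictionary:
            found.append(text[i:i + n])
    return found


def forward_max_match(text, dictionary):
    """Simple variant: always take the longest candidate at the current position."""
    max_len = max((len(w) for w in dictionary), default=1)
    words, i = [], 0
    while i < len(text):
        word = _candidates(text, i, dictionary, max_len)[-1]
        words.append(word)
        i += len(word)
    return words


def _chunks(text, i, dictionary, max_len, depth=3):
    """All segmentations of up to `depth` words starting at position i."""
    if depth == 0 or i >= len(text):
        return [[]]
    result = []
    for word in _candidates(text, i, dictionary, max_len):
        for rest in _chunks(text, i + len(word), dictionary, max_len, depth - 1):
            result.append([word] + rest)
    return result


def _rank(chunk):
    """Rules 1-3: larger total length, then larger average length, then smaller variance."""
    lengths = [len(w) for w in chunk]
    total = sum(lengths)
    mean = total / len(lengths)
    variance = sum((n - mean) ** 2 for n in lengths) / len(lengths)
    return (total, mean, -variance)


def mmseg_complex(text, dictionary):
    """Pick the best chunk, keep only its first word, and repeat from there."""
    max_len = max((len(w) for w in dictionary), default=1)
    words, i = [], 0
    while i < len(text):
        best = max(_chunks(text, i, dictionary, max_len), key=_rank)
        words.append(best[0])
        i += len(best[0])
    return words


DICTIONARY = {"研究", "研究生", "生命", "起源"}
print(forward_max_match("研究生命起源", DICTIONARY))  # ['研究生', '命', '起源']
print(mmseg_complex("研究生命起源", DICTIONARY))      # ['研究', '生命', '起源']
```

Forward maximum matching greedily takes 研究生 ("graduate student") and strands 命; the chunk rules tie on total and average length, and the variance rule then prefers 研究 / 生命 / 起源 ("research / life / origin").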
Font Industry industrializes the production of fonts with large character sets. It moves large-character-set font creation out of the artist's studio and into the average John's basement. The font market will be flooded with huge numbers of cheap, low-quality, large-character-set handwriting fonts in no time. Be scared! Be very scared!
What does Font Industry really do? The program converts a scanned grid sheet containing many glyphs into a bitmap font. The glyphs are automatically indexed with Unicode. That's it. More information can be found on the wiki page and on the Chinese-language wiki page. The leap from font-design workshop to font-manufacturing trust.
Links: hashao's other little projects. Python programming concepts for non-programmers: gets non-programmers writing programs quickly, in a single lesson.
News: 2008-10-12 Released Font Industry 0.0.9. 2008-10-01 Released CenterGlyph version 0.1. 2008-07-18 Released version 0.0.8. 2008-03-05 Released version 0.0.7. 2008-03-02 Released version 0.0.6. 2008-01-19 Released version 0.0.5.
一朝被蛇咬,十年怕井绳 ("bitten by a snake once, afraid of well ropes for ten years"): Once bitten, twice shy (rendered back into Chinese as 第一次被咬,第二次害臊, "bitten the first time, embarrassed the second time").

Rating: 0  |  0 reviews  |  0 users  |  0 current contributors  |  Analyzed 3 days ago

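The core idea of Font Industry, cutting a scanned grid sheet into cells and indexing each glyph by Unicode code point, can be sketched on an already binarized image. This is not Font Industry's code: the sketch assumes a deskewed sheet stored as a list of rows of 0/1 pixels, a uniform grid of known cell size, and a hypothetical template string listing the characters in reading order; scanning, grid detection, and writing an actual bitmap font file are left out.

```python
def slice_grid(sheet, cell_w, cell_h, template):
    """Cut a binarized sheet into cells (row-major) keyed by Unicode code point.

    sheet: list of pixel rows, each a list of 0/1 values.
    template: characters written on the sheet, in reading order (hypothetical).
    Blank cells still consume a template character but produce no glyph.
    """
    if not sheet:
        return {}
    rows = len(sheet) // cell_h
    cols = len(sheet[0]) // cell_w
    glyphs = {}
    chars = iter(template)
    for r in range(rows):
        for c in range(cols):
            ch = next(chars, None)
            if ch is None:  # template exhausted: ignore the remaining cells
                return glyphs
            cell = [row[c * cell_w:(c + 1) * cell_w]
                    for row in sheet[r * cell_h:(r + 1) * cell_h]]
            if any(any(row) for row in cell):  # skip cells with no ink
                glyphs[ord(ch)] = cell
    return glyphs


def to_text(cell):
    """Render a glyph cell as text, '#' for ink and '.' for background."""
    return "\n".join("".join("#" if px else "." for px in row) for row in cell)


# Toy 4x4 sheet holding a 2x2 grid of 2x2-pixel cells.
sheet = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
glyphs = slice_grid(sheet, cell_w=2, cell_h=2, template="一二三四")
print(sorted(hex(cp) for cp in glyphs))  # ['0x4e00', '0x56db']
print(to_text(glyphs[ord("一")]))
```

Tying each cell to a position in the template, rather than recognizing the handwriting, is what makes the indexing automatic: the writer only has to fill the cells in the agreed order.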
Copyright © 2013 Black Duck Software, Inc. and its contributors. Some Rights Reserved. Unless otherwise marked, this work is licensed under a Creative Commons Attribution 3.0 Unported License. Ohloh® and the Ohloh logo are trademarks of Black Duck Software, Inc. in the United States and/or other jurisdictions. All other trademarks are the property of their respective holders.