Projects tagged ‘information_retrieval’


[35 total]

196 Users
   

Lucene is an information retrieval API originally implemented in Java. Lucene has been ported to other programming languages including Perl, C#, C++, Python, Ruby and PHP.
Created over 3 years ago.

56 Users
   

Lucene.Net is a source code, class-per-class, API-per-API and algorithmic port of the Java Lucene search engine to C# and the .NET platform, using the Microsoft .NET Framework.
Created over 2 years ago.

19 Users
 

NLTK, the Natural Language Toolkit, is a suite of open source Python modules, linguistic data and documentation for research and development in natural language processing, supporting dozens of NLP tasks, with distributions for Windows, Mac OS X and Linux.
Created over 3 years ago.

13 Users
   

Xapian is an Open Source Search Engine Library, released under the GPL. It's written in C++, with bindings to allow use from Perl, Python, PHP, Java, Tcl, C#, and Ruby (so far!). Xapian is a highly adaptable toolkit which allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model and also supports a rich set of boolean query operators.
Created over 3 years ago.

13 Users
   

Strigi is an information extraction and indexing library that comes with a daemon, which uses a very fast and efficient crawler that can index data on your hard drive. Indexing operations are performed without hammering your system; this makes Strigi the fastest and smallest desktop searching program.
Created over 3 years ago.

9 Users

Hibernate Search brings the power of full text search engines to the persistence domain model and Hibernate experience, through transparent configuration (Hibernate Annotations) and a common API. Full text search engines like Apache Lucene(tm) allow applications to execute free-text search queries. However, it becomes increasingly difficult to index a more complex object domain model: keeping the index up to date, dealing with the mismatch between the index structure and the domain model, querying mismatches, and so on.
Created over 2 years ago.

8 Users

Snowball is a small string processing language designed for creating stemming algorithms for use in Information Retrieval. The project includes a compiler for Snowball, and several useful stemmers which have been implemented using it.
Created over 3 years ago.

7 Users

GATE (General Architecture for Text Engineering) is an architecture, framework and development environment for developing, evaluating and embedding Human Language Technology.
Created over 3 years ago.

7 Users

Sphinx is a standalone full-text search engine, meant to provide fast, size-efficient and relevant full-text search functions to other applications. Sphinx was specially designed to integrate well with SQL databases and scripting languages.
Created about 1 year ago.

5 Users
 

Swish-e is a fast, flexible, and free open source system for indexing collections of Web pages or other files. Swish-e is ideally suited for collections of a million documents or smaller. Using the GNOME™ libxml2 parser and a collection of filters, Swish-e can index plain text, e-mail, PDF, HTML, XML, Microsoft® Word/PowerPoint/Excel and just about any file that can be converted to XML or HTML text.
Created over 3 years ago.