Projects tagged ‘indexer’ and ‘information_retrieval’


[8 total]

8 Users

Sphinx is a standalone full-text search engine, meant to provide fast, size-efficient and relevant full-text search functions to other applications. Sphinx was specially designed to integrate well with SQL databases and scripting languages.
Created about 1 year ago.

2 Users

Key features: Support for http, https, ftp, nntp and news URL schemes. htdb virtual URL scheme for indexing SQL databases. Indexes text/html, text/xml, text/plain, audio/mpeg (MP3) and image/gif MIME types natively. External parser support for other document types, including Microsoft Word, Excel, RTF, PowerPoint, Adobe Acrobat PDF and Flash. Can index multilingual sites using content negotiation. Searches all word forms using ispell affixes and dictionaries. Synonym, acronym and abbreviation query expansion based on editable dictionaries, specified by language and charset. Stop-word, synonym and acronym lists. Options to query with all words, all words near each other, any words, or Boolean queries. A subset of VQL (Verity Query Language) is supported. Popularity Rank ba
Created about 1 year ago.

2 Users

An open source search engine based on the best open source technologies: lucene, zkoss, tomcat, poi, tagsoup. A stable, high-performance piece of software. It is both a modern search engine and a suite of high-powered full-text search algorithms.
Created 5 months ago.

1 User

mnoGoSearch is a full-featured SQL based web search engine.
Created about 1 year ago.

1 User

HITEC is a software package for very high accuracy automatic text categorization. The engine of HITEC is an implementation of UFEX (Universal Feature EXtractor) for textual documents. UFEX is a very sophisticated learning method that ensures the outstanding categorization performance of HITEC; as a result, HITEC outperforms its competitors on all investigated document collections. (For further details, read the white paper.) HITEC applies a supervised learning method: it learns from training data (learning phase) and can then classify new documents into known categories (operational phase). Naturally, the performance of categorization strongly depends on the quality of the training data. For efficient training HITEC requires - a fixed category system (usually ordered in a hierarchy), into which new, unknown documents will be classified during the operational phase; - some relevant training documents for each category of the category system. During operation, HITEC returns an ordered list of the most relevant categories for each unknown document, based on confidence values. The greater this value, the more relevant HITEC deems the corresponding category to the document. The returned list can be further processed depending on the nature of the classification problem. If perfect accuracy is required for the classification, an expert can accept, revise, or reject the categories proposed by HITEC. If the accuracy of around 90% observed in tests is sufficient, the proposed categories can be accepted based on their confidence values. HITEC is programmed very efficiently, so its high performance comes with fast operation even on very large document collections. Once HITEC has been trained on a document collection, the operational phase runs in real time (see also the test pages). It is able to process hundreds of gigabytes in reasonable time (training phase) and work with thousands of categories on an average PC.
Created about 1 year ago.
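
The HITEC entry above describes returning categories ordered by confidence, then either accepting them automatically or passing them to an expert. A minimal Python sketch of that triage step follows. It is not HITEC's code or API: the function names, the score dictionary, and the 0.8 threshold are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]

def rank_categories(scores: Dict[str, float]) -> Ranked:
    """Return (category, confidence) pairs, most confident first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

def triage(scores: Dict[str, float], auto_accept: float = 0.8) -> Tuple[Ranked, Ranked]:
    """Split ranked categories into auto-accepted and expert-review lists.

    The threshold is a hypothetical tuning knob, not a value from HITEC.
    """
    ranked = rank_categories(scores)
    accepted = [(c, s) for c, s in ranked if s >= auto_accept]
    for_review = [(c, s) for c, s in ranked if s < auto_accept]
    return accepted, for_review

# Hypothetical confidence values for one document.
scores = {"sports": 0.91, "politics": 0.35, "finance": 0.82}
accepted, for_review = triage(scores)
```

Raising the threshold sends more categories to the expert, which trades throughput for accuracy.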

1 User
 

Flax is a project to develop an open source enterprise search engine application based on the Xapian search engine library. It also contains a clean-and-simple Python interface suitable for many users of Xapian, built on the standard Xapian Python interface, together with various other add-ons such as performance testing utilities.
Created over 2 years ago.

1 User

Pyndexter provides a uniform API for accessing a variety of full-text search and indexing engines. It aims to be to full-text indexing systems what the Python DB API is to databases. It presents a uniform query syntax to the user, with support for quoted search terms, boolean operations, sub-expressions and attribute (metadata) querying. The supported indexers are a basic but functional pure-Python indexer and adapters for Hype, Hyperestraier, Lucene, Lupy, Pyndex, Swish-e and Xapian.
Created over 3 years ago.
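
The idea of one API in front of several indexing engines, as the Pyndexter entry describes, can be sketched with an abstract interface and a toy in-memory backend. This is an illustration of the adapter pattern only, not Pyndexter's actual API: `SearchBackend`, `MemoryBackend`, and `find` are hypothetical names, and the query handling is a plain all-terms match without quoting, boolean operators, or attributes.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Dict, List, Set

class SearchBackend(ABC):
    """Hypothetical uniform interface that each engine adapter would implement."""

    @abstractmethod
    def add(self, uri: str, text: str) -> None: ...

    @abstractmethod
    def search(self, query: str) -> List[str]: ...

class MemoryBackend(SearchBackend):
    """Toy pure-Python inverted index: term -> set of document URIs."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, uri: str, text: str) -> None:
        for term in text.lower().split():
            self._postings[term].add(uri)

    def search(self, query: str) -> List[str]:
        terms = query.lower().split()
        if not terms:
            return []
        # Intersect posting sets so every query term must match.
        hits = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            hits &= self._postings.get(term, set())
        return sorted(hits)

def find(backend: SearchBackend, query: str) -> List[str]:
    """Caller code depends only on the interface, not on a specific engine."""
    return backend.search(query)

backend = MemoryBackend()
backend.add("a.txt", "full text search")
backend.add("b.txt", "text indexing")
results = find(backend, "full text")
```

Because `find` only sees `SearchBackend`, swapping the toy index for an adapter around a real engine would not change calling code.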

0 Users

Namazu is a full-text search engine intended for easy use. Not only does it work as a small or medium scale Web search engine, but also as a personal search system for email or other files.
Created about 1 year ago.