Browsing projects by Tag(s)

Select a tag to browse associated projects and drill deeper into the tag cloud.


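I have recently been using Python to process large volumes of log data. Native Python programs are simple and reliable, but their runtime efficiency is a real problem, so I plan to implement some of the key parts in C to improve performance. Two features are implemented so far:
1. Fast serialization and deserialization of a dict, where keys and values can only be of String/Int type, at 600% of the efficiency of cPickle (fastmap.dumps, fastmap.loads).
2. Partitioning a dict according to the hash value of each key (fastmap.partition).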
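The description names fastmap.dumps, fastmap.loads and fastmap.partition but does not show their signatures or wire format. The following is only a pure-Python sketch of the two ideas, not the project's C implementation: the function signatures, the tagged length-prefixed byte layout, the 64-bit integer limit and the use of CRC32 as the key hash are all assumptions made for illustration.

```python
import struct
import zlib


def _pack(obj):
    """Encode one int or str as a type tag followed by its payload (assumed format)."""
    if isinstance(obj, bool) or not isinstance(obj, (int, str)):
        raise TypeError("only int and str are supported")
    if isinstance(obj, int):
        # Assumption: integers fit in a signed 64-bit value.
        return b"i" + struct.pack("<q", obj)
    raw = obj.encode("utf-8")
    return b"s" + struct.pack("<I", len(raw)) + raw


def _unpack(buf, pos):
    """Decode one value starting at pos; return (value, next position)."""
    tag = buf[pos:pos + 1]
    pos += 1
    if tag == b"i":
        return struct.unpack_from("<q", buf, pos)[0], pos + 8
    if tag == b"s":
        (size,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        return buf[pos:pos + size].decode("utf-8"), pos + size
    raise ValueError("unknown type tag")


def dumps(d):
    """Serialize a dict whose keys and values are all int or str."""
    out = [struct.pack("<I", len(d))]
    for key, value in d.items():
        out.append(_pack(key))
        out.append(_pack(value))
    return b"".join(out)


def loads(buf):
    """Rebuild the dict produced by dumps()."""
    (count,) = struct.unpack_from("<I", buf, 0)
    pos = 4
    result = {}
    for _ in range(count):
        key, pos = _unpack(buf, pos)
        value, pos = _unpack(buf, pos)
        result[key] = value
    return result


def partition(d, n):
    """Split d into n dicts, choosing the bucket from a stable hash of each key."""
    parts = [{} for _ in range(n)]
    for key, value in d.items():
        # CRC32 is used instead of hash() because str hashes vary between runs.
        bucket = zlib.crc32(str(key).encode("utf-8")) % n
        parts[bucket][key] = value
    return parts
```

A stable hash matters for log processing across several processes or machines: the same key must always land in the same partition, which Python's built-in hash() does not guarantee for strings across interpreter runs.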

  0 reviews  |  0 users  |  924 lines of code  |  0 current contributors  |  Analyzed 7 days ago
 
 

FastMap was first introduced in the paper "FastMap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets". The abstract of the paper follows:

A very promising idea for fast searching in traditional and multimedia databases is to map objects into points in k-d space, using k feature-extraction functions, provided by a domain expert [Jag91]. Thus, we can subsequently use highly fine-tuned spatial access methods (SAMs), to answer several types of queries, including the 'Query By Example' type (which translates to a range query); the 'all pairs' query (which translates to a spatial join [BKSS94]); the nearest-neighbor or best-match query, etc. However, designing feature extraction functions can be hard. It is relatively easier for a domain expert to assess the similarity/distance of two objects. Given only the distance information though, it is not obvious how to map objects into points. This is exactly the topic of this paper. We describe a fast algorithm to map objects into points in some k-dimensional space (k is user-defined), such that the dis-similarities are preserved. There are two benefits from this mapping: (a) efficient retrieval, in conjunction with a SAM, as discussed before and (b) visualization and data-mining: the objects can now be plotted as points in 2-d or 3-d space, revealing potential clusters, correlations among attributes and other regularities that data-mining is looking for. We introduce an older method from pattern recognition, namely, Multi-Dimensional Scaling (MDS) [Tor52]; although unsuitable for indexing, we use it as yardstick for our method. Then, we propose a much faster algorithm to solve the problem in hand, while in addition it allows for indexing. Experiments on real and synthetic data indeed show that the proposed algorithm is significantly faster than MDS, (being linear, as opposed to quadratic, on the database size N), while it manages to preserve distances and the overall structure of the data-set.
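The abstract states the goal (map objects to k-dimensional points using only a distance function) but not the procedure. As a rough illustration, here is a minimal sketch of the projection scheme FastMap is generally described as using: pick two far-apart pivot objects, place every object along the line through them with the cosine law, then repeat on the residual distances. The function names, the `seed` parameter and the pivot-selection details are assumptions of this sketch, not taken from the listing above.

```python
import math
import random


def euclidean(p, q):
    """Example distance function; any metric on the objects can be used instead."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def fastmap(objects, dist, k, seed=0):
    """Map each object to a k-dimensional point using only pairwise distances."""
    rng = random.Random(seed)
    n = len(objects)
    coords = [[0.0] * k for _ in range(n)]

    def d2(i, j, col):
        # Squared distance after removing the first `col` computed coordinates.
        total = dist(objects[i], objects[j]) ** 2
        for c in range(col):
            total -= (coords[i][c] - coords[j][c]) ** 2
        return max(total, 0.0)  # clamp small negative rounding errors

    for col in range(k):
        # Pivot heuristic: start anywhere, jump to the farthest object, then
        # to the object farthest from that one.
        a = rng.randrange(n)
        b = max(range(n), key=lambda j: d2(a, j, col))
        a = max(range(n), key=lambda j: d2(b, j, col))
        dab2 = d2(a, b, col)
        if dab2 == 0.0:
            break  # every residual distance is zero; remaining axes stay 0
        dab = math.sqrt(dab2)
        for i in range(n):
            # Cosine-law projection of object i onto the line through a and b.
            coords[i][col] = (d2(a, i, col) + dab2 - d2(b, i, col)) / (2 * dab)
    return coords
```

Each axis needs a number of distance evaluations proportional to the number of objects, which is consistent with the linear dependence on the database size N claimed in the abstract. A call such as `fastmap(points, euclidean, 2)` returns one 2-d coordinate list per input object, suitable for plotting.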

  0 reviews  |  0 users  |  0 current contributors  |  Analyzed 9 days ago
 
 
 
 

Copyright © 2013 Black Duck Software, Inc. and its contributors, Some Rights Reserved. Unless otherwise marked, this work is licensed under a Creative Commons Attribution 3.0 Unported License. Ohloh ® and the Ohloh logo are trademarks of Black Duck Software, Inc. in the United States and/or other jurisdictions. All other trademarks are the property of their respective holders.