Browsing projects by Tag(s)


CouchDB is a distributed document database system with bi-directional replication. It makes it simple to build collaborative applications that can be replicated offline by users, with full interactivity (query, add, update, delete), and later "synced up" with everyone else's changes when back online.

4.74286
   
  0 reviews  |  113 users  |  133,364 lines of code  |  39 current contributors  |  Analyzed 7 days ago
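The offline-edit-then-sync workflow described for CouchDB can be illustrated with a toy model. This is a minimal sketch under simplifying assumptions, not CouchDB's replication protocol (which tracks revision histories rather than a single counter): each replica is a plain Python dict mapping a document id to a (revision, body) pair, `update`, `replicate` and `sync` are hypothetical helpers, and conflicting edits are settled by a deterministic rule so that both replicas converge on the same winner.

```python
from typing import Dict, Tuple

# Toy model (an assumption, not CouchDB's data model): a replica maps
# document id -> (revision number, document body).
Replica = Dict[str, Tuple[int, str]]


def update(replica: Replica, doc_id: str, body: str) -> None:
    """Local edit that works offline: store the body under the next revision."""
    rev, _ = replica.get(doc_id, (0, ""))
    replica[doc_id] = (rev + 1, body)


def replicate(source: Replica, target: Replica) -> None:
    """One-way replication: copy each source document that wins on the target."""
    for doc_id, version in source.items():
        current = target.get(doc_id)
        # Deterministic winner: the higher revision wins; on a tie, the tuple
        # comparison falls back to the body, so every replica picks the same one.
        if current is None or version > current:
            target[doc_id] = version


def sync(a: Replica, b: Replica) -> None:
    """Bi-directional replication: replicate in both directions."""
    replicate(a, b)
    replicate(b, a)


laptop: Replica = {}
server: Replica = {}
update(server, "todo", "buy milk")
sync(laptop, server)                          # laptop receives "todo"
update(laptop, "todo", "buy milk and eggs")   # edit made while offline
update(server, "notes", "meeting at 10")      # someone else's change
sync(laptop, server)                          # back online: changes flow both ways
print(laptop == server)
```

The deterministic tie-break matters more than which side wins: as long as every replica applies the same rule, repeated syncs converge instead of flip-flopping.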
 
 

Hadoop is a framework for running applications on large clusters of commodity hardware. The Hadoop framework transparently provides applications with both reliability and data motion. Hadoop implements a computational paradigm named map/reduce, where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both map/reduce and the distributed file system are designed so that node failures are automatically handled by the framework.

4.63158
   
  0 reviews  |  69 users  |  2,229,693 lines of code  |  35 current contributors  |  Analyzed about 20 hours ago
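The fragment-based execution model described for Hadoop can be sketched in plain Python. This is a single-process illustration under stated assumptions, not Hadoop's API: the "nodes" are simulated, `failed_nodes` is a hypothetical way to make some of them fail, and a failed map fragment is simply re-executed on the next node before the intermediate pairs are grouped by key and reduced.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def map_fn(fragment: str) -> List[Tuple[str, int]]:
    """Map: emit (word, 1) for each word in one fragment of the input."""
    return [(word, 1) for word in fragment.split()]


def reduce_fn(word: str, counts: Iterable[int]) -> int:
    """Reduce: combine all values emitted for one key."""
    return sum(counts)


def run_on_node(node: int, fragment: str, failed_nodes: Set[int]) -> List[Tuple[str, int]]:
    """Simulated worker (hypothetical): a node listed in failed_nodes raises."""
    if node in failed_nodes:
        raise RuntimeError(f"node {node} failed")
    return map_fn(fragment)


def word_count(fragments: List[str], num_nodes: int, failed_nodes: Set[int]) -> Dict[str, int]:
    intermediate: List[Tuple[str, int]] = []
    for i, fragment in enumerate(fragments):
        # Start on the fragment's "home" node; if that node fails, the same
        # fragment is re-executed on the next node, and so on.
        for attempt in range(num_nodes):
            node = (i + attempt) % num_nodes
            try:
                intermediate.extend(run_on_node(node, fragment, failed_nodes))
                break
            except RuntimeError:
                continue
        else:
            raise RuntimeError(f"fragment {i} failed on every node")

    # Shuffle: group the intermediate values by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, count in intermediate:
        groups[word].append(count)

    # Reduce each group independently.
    return {word: reduce_fn(word, counts) for word, counts in groups.items()}


print(word_count(["the quick brown fox", "the lazy dog", "the end"],
                 num_nodes=3, failed_nodes={1}))
```

Because each map fragment is independent and side-effect free, re-running it elsewhere yields the same intermediate pairs, which is what makes this kind of retry safe.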
 
 

Apache Mahout's goal is to build scalable machine learning libraries. By scalable we mean scalable to reasonably large data sets. Our core algorithms for clustering, classification and batch-based collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm. However, we do not restrict contributions to Hadoop-based implementations: contributions that run on a single node or on a non-Hadoop cluster are welcome as well. The core libraries are highly optimized to allow good performance for non-distributed algorithms as well.

4.25
   
  0 reviews  |  23 users  |  128,512 lines of code  |  14 current contributors  |  Analyzed 2 days ago
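To make "collaborative filtering" concrete, here is a small single-node, item-based variant in plain Python. It is a sketch, not Mahout's API or its Hadoop implementation: the ratings data, the cosine similarity measure and the function names are all assumptions chosen for illustration. A user's predicted score for an unseen item is the similarity-weighted average of that user's existing ratings.

```python
from math import sqrt
from typing import Dict, List, Tuple

# ratings[user][item] = rating (hypothetical toy data format)
Ratings = Dict[str, Dict[str, float]]


def item_vectors(ratings: Ratings) -> Dict[str, Dict[str, float]]:
    """Transpose user -> item ratings into item -> user rating vectors."""
    items: Dict[str, Dict[str, float]] = {}
    for user, row in ratings.items():
        for item, r in row.items():
            items.setdefault(item, {})[user] = r
    return items


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors keyed by user."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(ratings: Ratings, user: str, top_n: int = 3) -> List[Tuple[str, float]]:
    items = item_vectors(ratings)
    seen = ratings[user]
    scores: Dict[str, float] = {}
    for candidate, vec in items.items():
        if candidate in seen:
            continue
        # Weight each of the user's ratings by how similar that item is to the candidate.
        sims = [(cosine(vec, items[j]), r) for j, r in seen.items()]
        weight = sum(abs(s) for s, _ in sims)
        if weight:
            scores[candidate] = sum(s * r for s, r in sims) / weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


ratings: Ratings = {
    "alice": {"hadoop": 5, "mahout": 4},
    "bob": {"hadoop": 4, "mahout": 5, "pig": 4},
    "carol": {"mahout": 4, "couchdb": 2},
}
print(recommend(ratings, "alice"))
```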
 
 

Riak combines a decentralized key-value store, a flexible map/reduce engine, and a friendly HTTP/JSON query interface to provide a database ideally suited for Web applications.

5.0
 
  0 reviews  |  11 users  |  99,685 lines of code  |  59 current contributors  |  Analyzed 6 days ago
 
 

Hundreds of functions on a variety of topics, from statistics to string parsing, module utilities to network tools. Everyone's pet library accumulates features over time. My Erlang library got big, fast. I often find myself giving functions from it out to other people, and a lot of my other libraries are dependent on ScUtil in various ways, so I figured what the hell, let's give it away. This library is believed to be efficiently implemented at all points. Efficiency tips are, however, both appreciated and taken seriously. ScUtil uses the TestErl library for unit, regression and stochastic testing. ScUtil is free and MIT licensed, because the GPL is evil. ScUtil is written by John Haugeland, from http://fullof.bs/.

4.8
   
  0 reviews  |  11 users  |  8,986 lines of code  |  1 current contributor  |  Analyzed 4 days ago
 
 

Infinispan is an open source, JVM-based data grid platform. Infinispan is a high-performance, distributed and highly concurrent data structure. It also supports JTA transactions, eviction, and passivation/overflow to external storage.

5.0
 
  0 reviews  |  9 users  |  262,830 lines of code  |  36 current contributors  |  Analyzed 2 days ago
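Eviction with passivation, as listed for Infinispan, means that entries pushed out of memory are written to external storage rather than discarded, and are loaded back on demand. The Python class below is a minimal sketch of that idea, not Infinispan's API (Infinispan is configured and used on the JVM): `PassivatingCache` is a hypothetical name, least-recently-used order is an assumed eviction policy, and a plain dict stands in for the external store.

```python
from collections import OrderedDict
from typing import Dict, Hashable, Optional


class PassivatingCache:
    """Hypothetical bounded cache: LRU eviction plus passivation to a backing store."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.memory: "OrderedDict[Hashable, object]" = OrderedDict()  # in-memory entries, LRU first
        self.store: Dict[Hashable, object] = {}  # stand-in for external storage

    def put(self, key: Hashable, value: object) -> None:
        self.memory[key] = value
        self.memory.move_to_end(key)   # mark as most recently used
        self.store.pop(key, None)      # the in-memory copy is now authoritative
        self._evict_if_needed()

    def get(self, key: Hashable) -> Optional[object]:
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key]
        if key in self.store:
            # Activation: reload a passivated entry into memory.
            self.put(key, self.store.pop(key))
            return self.memory[key]
        return None

    def _evict_if_needed(self) -> None:
        while len(self.memory) > self.capacity:
            key, value = self.memory.popitem(last=False)  # least recently used
            self.store[key] = value                       # passivate instead of discarding


cache = PassivatingCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)       # "a" is evicted from memory and passivated
print(cache.get("a"))   # activated again; "b" becomes the passivated entry
```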
 
 

O/C mapper (object to cloud). Leverage Windows Azure without getting dragged down by low-level technicalities. Key features:
* Queue Services as a scalable equivalent of Windows Services.
* Scheduled Services as a cloud equivalent of the task scheduler.
* Strong-typed blob I/O.
* Scalable logs and monitoring.
* Inversion of Control on the cloud.
* Web administration console for cloud services.

4.5
   
  0 reviews  |  5 users  |  54,669 lines of code  |  2 current contributors  |  Analyzed about 5 hours ago
 
 

Disco is an open-source implementation of the Map-Reduce framework for distributed computing. Like the original framework, Disco supports parallel computations over large data sets on unreliable clusters of computers. The Disco core is written in Erlang, a functional language that is designed for building robust fault-tolerant distributed applications. Users of Disco typically write jobs in Python, which makes it possible to express even complex algorithms or data processing tasks in often only tens of lines of code. This means that you can quickly write scripts to process massive amounts of data.

0
 
  0 reviews  |  3 users  |  31,137 lines of code  |  2 current contributors  |  Analyzed 5 minutes ago
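As an illustration of how compact such a Python job can be, here is an inverted index (word to list of documents) written as a map function and a reduce function. The `(line, params)` argument shape is assumed to loosely echo the style of Disco's word-count tutorial, but this code does not import or call Disco: the input format `doc_id<TAB>text` is an assumption, and `run_locally` is a hypothetical single-process driver standing in for the cluster.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, List, Tuple


def map_line(line: str, params: object = None) -> Iterator[Tuple[str, str]]:
    """Map: for a 'doc_id<TAB>text' line, emit (word, doc_id) per word.

    params is unused here; it is kept only to echo the assumed job-function shape.
    """
    doc_id, _, text = line.partition("\t")
    for word in text.lower().split():
        yield word, doc_id


def reduce_pairs(pairs: Iterable[Tuple[str, str]],
                 params: object = None) -> Iterator[Tuple[str, List[str]]]:
    """Reduce: group pairs by word and collect the distinct documents."""
    # groupby only merges adjacent equal keys, so the pairs must be sorted first.
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield word, sorted({doc_id for _, doc_id in group})


def run_locally(lines: Iterable[str]) -> dict:
    """Hypothetical local driver: run every map, then a single reduce."""
    intermediate = [pair for line in lines for pair in map_line(line)]
    return dict(reduce_pairs(intermediate))


index = run_locally([
    "d1\tErlang core",
    "d2\tPython jobs",
    "d3\tPython and Erlang",
])
print(index["erlang"], index["python"])
```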
 
 

Apache Flume is a system for reliably collecting high-throughput data from streaming data sources like logs.

0
 
  0 reviews  |  2 users  |  64,610 lines of code  |  11 current contributors  |  Analyzed 3 days ago
 
 

Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing workflows on a Hadoop cluster.

0
 
  0 reviews  |  2 users  |  58,249 lines of code  |  1 current contributor  |  Analyzed 12 months ago
 
 
 
 

Copyright © 2013 Black Duck Software, Inc. and its contributors. Some Rights Reserved. Unless otherwise marked, this work is licensed under a Creative Commons Attribution 3.0 Unported License. Ohloh ® and the Ohloh logo are trademarks of Black Duck Software, Inc. in the United States and/or other jurisdictions. All other trademarks are the property of their respective holders.