CouchDB is a distributed document database system with bi-directional replication. It makes it simple to build collaborative applications that users can replicate offline, with full interactivity (query, add, update, delete), and later "sync up" with everyone else's changes when back online.
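CouchDB exposes replication over plain HTTP: a POST to a node's /_replicate endpoint with a JSON body naming a source and target database kicks off a sync. A minimal sketch of building such a request body; the host and database names here are hypothetical.

```python
import json

# Replication is triggered by POSTing a JSON body to /_replicate.
# "projects" and "other-node" are illustrative names, not from the source.
payload = {
    "source": "projects",                         # local database to copy from
    "target": "http://other-node:5984/projects",  # remote replica to copy to
    "continuous": True,                           # keep syncing as changes arrive
}
body = json.dumps(payload)
# e.g. POST http://localhost:5984/_replicate with Content-Type: application/json
```

Running the same request with source and target swapped gives the bi-directional replication the blurb describes.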
Hadoop is a framework for running applications on large clusters of commodity hardware. The Hadoop framework transparently provides applications both reliability and data motion. Hadoop implements a computational paradigm named map/reduce, where the application is divided into many small fragments ...
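The map/reduce paradigm itself fits in a few lines: a map step emits (key, value) pairs from each input fragment, a shuffle groups pairs by key, and a reduce step combines each group. A minimal pure-Python sketch of the paradigm (word counting), not Hadoop's actual Java API:

```python
from collections import defaultdict

def map_phase(fragment):
    # Emit (key, value) pairs from one input fragment; here, (word, 1).
    for word in fragment.split():
        yield word.lower(), 1

def reduce_phase(key, values):
    # Combine all values emitted for a single key.
    return key, sum(values)

def mapreduce(fragments):
    # Shuffle: group intermediate pairs by key, then reduce each group.
    groups = defaultdict(list)
    for fragment in fragments:
        for key, value in map_phase(fragment):
            groups[key].append(value)
    return dict(reduce_phase(k, vs) for k, vs in groups.items())

counts = mapreduce(["the quick brown fox", "the lazy dog"])
# counts["the"] == 2
```

On a real cluster each fragment is processed on a different node, which is what lets frameworks like Hadoop scale the same two functions to very large data sets.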
Apache Mahout's goal is to build scalable machine learning libraries. By scalable we mean: scalable to reasonably large data sets. Our core algorithms for clustering, classification, and batch-based collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm. ...
Riak combines a decentralized key-value store, a flexible map/reduce engine, and a friendly HTTP/JSON query interface to provide a database ideally suited for Web applications.
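Riak's HTTP/JSON interface means an application can store and fetch values with ordinary HTTP requests. A sketch, assuming a local node on the default port 8098 and the classic /riak/&lt;bucket&gt;/&lt;key&gt; URL scheme; the requests are only built here, not sent, and the bucket and key names are illustrative.

```python
import json
import urllib.request

# Each value lives at a bucket/key URL on the node's HTTP interface.
base = "http://localhost:8098/riak"

def put_request(bucket, key, value):
    # Store a JSON document under bucket/key with an HTTP PUT.
    return urllib.request.Request(
        f"{base}/{bucket}/{key}",
        data=json.dumps(value).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

def get_request(bucket, key):
    # Fetch the stored value back with an HTTP GET.
    return urllib.request.Request(f"{base}/{bucket}/{key}")

req = put_request("projects", "riak", {"lang": "Erlang"})
```

Sending `req` with `urllib.request.urlopen` against a running node would store the document; a GET to the same URL returns it as JSON.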
Hundreds of functions covering a variety of topics, from statistics to string parsing and from module utilities to network tools. Everyone's pet library accumulates features over time. My Erlang library got big, fast. I often find myself giving out functions from it to other people, and a lot of my other ...
Infinispan is an open source, JVM-based data grid platform: a high-performance, distributed, and highly concurrent data structure. It also supports JTA transactions, eviction, and passivation/overflow to external storage.
An O/C mapper (object to cloud): leverage Windows Azure without getting dragged down by low-level technicalities. Key features: Queue Services as a scalable equivalent of Windows Services; Scheduled Services as a cloud equivalent of the task scheduler; strongly typed blob I/O; scalable logs ...
Disco is an open-source implementation of the map/reduce framework for distributed computing. Like the original framework, Disco supports parallel computations over large data sets on unreliable clusters of computers. The Disco core is written in Erlang, a functional language designed for ...
Apache Flume is a system for reliably collecting high-throughput data from streaming data sources like logs.
Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing workflows on a Hadoop cluster.