Browsing projects by Tag(s)


MalGen
MalGen is a set of scripts that generate large, distributed data sets suitable for testing and benchmarking software designed to perform parallel processing on large data sets. The data sets can be thought of as site-entity log files. After an initial seeding, the scripts allow data generation to be initiated from a single central node and run concurrently on multiple remote nodes of the cluster. The generated data follows certain statistical distributions which we believe present a usable model for such logs.

There are two intended uses for MalGen:
1. to generate a large, possibly distributed, data set for use with analytics;
2. to generate data for use in benchmarking algorithms or applications.

With the first use, records are generated probabilistically, and extra records may be produced so that the entire data set follows the specified distribution. With the second use, strict adherence to the distribution is not necessary, as the user is more interested in generating exactly the specified number of records. Release v0.9 exposes a command-line switch that toggles between following the distribution and generating exactly the specified number of records. When the distribution is followed, the number of records generated is probabilistic, so there is no way to determine accurately how many records each generated file will contain. When the exact number of records is generated, the data may be slightly less suitable for statistical analysis. View MalGen_vX.X_Overview.pdf in the distribution, or download it separately, for more details, including information on using the scripts.

MalStone
2009-06-18: v0.8.2 has just been released. MalStone is a stylized benchmark for data intensive computing that uses records generated by MalGen. The MalStone A-10 and B-10 benchmarks each consist of 10 billion records, and all timestamps fall within a one-year period.
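The trade-off between MalGen's two generation modes can be illustrated with a toy generator. This is a minimal sketch, not MalGen's code: the record layout, the function names, and the geometric "visits per entity" distribution are all hypothetical stand-ins (MalGen's real distributions are described in its Overview PDF). In distribution mode every entity keeps its full list of visits, so the total overshoots the target by a random amount; in exact mode the last entity is truncated so the total is exact, which slightly distorts the distribution.

```python
import random
from dataclasses import dataclass

SECONDS_PER_YEAR = 365 * 24 * 3600


@dataclass(frozen=True)
class Record:
    # Hypothetical layout; MalGen's actual record format is described in
    # MalGen_vX.X_Overview.pdf.
    entity_id: int
    site_id: int
    timestamp: int  # seconds from the start of the year


def visits_for_entity(rng: random.Random, p: float = 0.3) -> int:
    """Geometric number of site visits (at least 1): a stand-in distribution."""
    n = 1
    while rng.random() > p:
        n += 1
    return n


def generate(target: int, num_sites: int, exact: bool, seed: int = 0) -> list[Record]:
    """Generate entity visit records until at least `target` records exist.

    exact=False: every entity keeps its full visit list, so the per-entity
    distribution is preserved but the total may overshoot `target`.
    exact=True: the last entity's list is truncated so the total equals
    `target`, slightly distorting the distribution.
    """
    rng = random.Random(seed)
    records: list[Record] = []
    entity = 0
    while len(records) < target:
        n = visits_for_entity(rng)
        if exact:
            n = min(n, target - len(records))
        for _ in range(n):
            site = rng.randrange(num_sites)
            ts = rng.randrange(SECONDS_PER_YEAR)
            records.append(Record(entity, site, ts))
        entity += 1
    return records
```

With this sketch, `len(generate(1000, 50, exact=True))` is always 1000, while `len(generate(1000, 50, exact=False))` is at least 1000 and depends on the random draws, mirroring the behavior described for the v0.9 switch.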
The MalStone A-10 benchmark computes a ratio for each site w as follows: for each site w, aggregate all entities that visited the site at any time, and compute the percent of visits for which the entity became compromised at any time after the visit. MalStone B-10 is similar, except that the ratio is computed for each week d: for each site w, and for all entities that visited the site in week d or earlier, it computes the percent of visits for which the entity became compromised at any time between the visit and the end of week d. The MalStone package is available on the Downloads tab.

Sample run of MalGen data generation:

Stage           Num records   RAM     Duration
Compromised     100 M         16 GB   60 min
Compromised     500 M         16 GB   190 min
Uncompromised   100 M         4 GB    54 min
Uncompromised   500 M         4 GB    275 min
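The A-10 and B-10 ratios defined above can be pinned down with a small single-machine sketch. It assumes, hypothetically, that each visit is a `(site_id, entity_id, timestamp)` tuple with timestamps in seconds from the start of the year, and that the time each entity became compromised is available as a dictionary. Neither the layout nor the function names come from the MalStone package. "After the visit" is taken to mean strictly later, and week d ends (exclusively) at `(d + 1) * WEEK`.

```python
from collections import defaultdict

WEEK = 7 * 24 * 3600  # seconds

# Hypothetical layout: (site_id, entity_id, timestamp in seconds from the
# start of the year). compromised_at maps entity_id -> time it became compromised.
Visit = tuple[int, int, int]


def malstone_a(visits: list[Visit], compromised_at: dict[int, int]) -> dict[int, float]:
    """Per site: percent of visits whose entity was compromised after the visit."""
    total: dict[int, int] = defaultdict(int)
    bad: dict[int, int] = defaultdict(int)
    for site, entity, t in visits:
        total[site] += 1
        c = compromised_at.get(entity)
        if c is not None and c > t:
            bad[site] += 1
    return {w: 100.0 * bad[w] / total[w] for w in total}


def malstone_b(visits: list[Visit], compromised_at: dict[int, int],
               num_weeks: int) -> list[dict[int, float]]:
    """ratios[d][w]: over visits to w in week d or earlier, percent whose entity
    was compromised after the visit and before the end of week d."""
    ratios = []
    for d in range(num_weeks):
        end = (d + 1) * WEEK  # exclusive end of week d
        total: dict[int, int] = defaultdict(int)
        bad: dict[int, int] = defaultdict(int)
        for site, entity, t in visits:
            if t >= end:
                continue  # visit happened after week d
            total[site] += 1
            c = compromised_at.get(entity)
            if c is not None and t < c < end:
                bad[site] += 1
        ratios.append({w: 100.0 * bad[w] / total[w] for w in total})
    return ratios
```

This rescans every visit for every week, which is fine for illustrating the arithmetic but not for 10 billion records; a real benchmark run would distribute the work across a cluster, for example by partitioning visits by site.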

0 reviews  |  0 users  |  465 lines of code  |  0 current contributors  |  Analyzed 2 days ago

ThriftStore
There are several open-source cloud storage systems currently in use, such as Hadoop, CloudStore, and Sector. Although all of these systems implement a distributed file system providing reliable storage of large data sets, each uses its own client interface to access this data. Additionally, using these client interfaces requires coding in each system's implementation language. With Thrift, we are able to provide a common interface to multiple cloud storage systems that can be accessed from multiple languages. Version 0.6.0 was released on 2009-08-05; on the server side, it associates each open file with the client that opened it. In a multi-client environment, this allows you to shut down some clients completely while others are still running.
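The per-client file tracking described for version 0.6.0 can be sketched as a small server-side registry. This is an illustrative sketch only: ThriftStore's real service is defined through Thrift, and `StorageBackend`, `InMemoryBackend`, `StoreServer`, and their methods are hypothetical names. The backend interface stands in for the "common interface" over systems such as Hadoop or Sector, and an in-memory dictionary replaces a real distributed file system. The sketch also assumes that disconnecting a client flushes (closes) its open files; whether ThriftStore flushes or discards them is not stated here.

```python
import io
from abc import ABC, abstractmethod
from collections import defaultdict
from dataclasses import dataclass


class StorageBackend(ABC):
    """Hypothetical common interface over Hadoop, CloudStore, Sector, etc."""

    @abstractmethod
    def write_file(self, path: str, data: bytes) -> None: ...


class InMemoryBackend(StorageBackend):
    """Toy backend used in place of a real distributed file system."""

    def __init__(self) -> None:
        self.files: dict[str, bytes] = {}

    def write_file(self, path: str, data: bytes) -> None:
        self.files[path] = data


@dataclass
class OpenFile:
    path: str
    buf: io.BytesIO


class StoreServer:
    """Tracks which client opened each file handle."""

    def __init__(self, backend: StorageBackend) -> None:
        self.backend = backend
        self._files: dict[int, OpenFile] = {}
        self._by_client: dict[str, set[int]] = defaultdict(set)
        self._next_handle = 0

    def open_write(self, client: str, path: str) -> int:
        handle = self._next_handle
        self._next_handle += 1
        self._files[handle] = OpenFile(path, io.BytesIO())
        self._by_client[client].add(handle)
        return handle

    def write(self, client: str, handle: int, data: bytes) -> None:
        self._check_owner(client, handle)
        self._files[handle].buf.write(data)

    def close(self, client: str, handle: int) -> None:
        self._check_owner(client, handle)
        self._by_client[client].discard(handle)
        f = self._files.pop(handle)
        self.backend.write_file(f.path, f.buf.getvalue())

    def disconnect(self, client: str) -> None:
        """Close every file this client still has open; other clients are untouched."""
        for handle in list(self._by_client.get(client, ())):
            self.close(client, handle)
        self._by_client.pop(client, None)

    def _check_owner(self, client: str, handle: int) -> None:
        if handle not in self._by_client.get(client, ()):
            raise KeyError(f"handle {handle} is not open for client {client!r}")
```

Keying handles by client is what makes it safe to shut one client down: `disconnect("a")` releases only the handles that client `a` opened, while client `b` can keep writing to its own files.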

0 reviews  |  0 users  |  2,560 lines of code  |  0 current contributors  |  Analyzed 11 days ago

The Open Cloud Consortium (OCC)
The OCC:
- supports the development of standards for cloud computing and frameworks for interoperating between clouds;
- develops benchmarks for cloud computing;
- supports reference implementations for cloud computing, preferably open-source reference implementations;
- manages a testbed for cloud computing called the Open Cloud Testbed;
- sponsors workshops and other events related to cloud computing.

Large Data Cloud Interoperability
This site discusses various open-source projects intended to support interoperability between large data clouds. These projects include:
- Sector JNI: a Java Native Interface / NIO interface to the Sector C++ client API, allowing Java clients to access the Sector distributed file system.
- PySector: a Python extension to the Sector C++ client, allowing Python clients to access the Sector distributed file system.
- PySphere: a Python embedding of the Sector/Sphere MapReduce framework.
- SectorFileSystem: an implementation of the Hadoop File System abstraction that allows Hadoop MapReduce applications to be executed against data stored in the Sector distributed file system.
- ThriftStore: a Thrift-based service providing a common interface over multiple cloud storage systems.

Detailed information can be found at ComponentsOverview.

0 reviews  |  0 users  |  0 current contributors  |  Analyzed 1 day ago

Copyright © 2013 Black Duck Software, Inc. and its contributors. Some Rights Reserved. Unless otherwise marked, this work is licensed under a Creative Commons Attribution 3.0 Unported License. Ohloh® and the Ohloh logo are trademarks of Black Duck Software, Inc. in the United States and/or other jurisdictions. All other trademarks are the property of their respective holders.