Projects tagged ‘data_management’


[6 total]

4 Users
 

A flexible metadata database that uses XML as a common syntax for representing the large number of metadata content standards relevant to ecology. Metacat is thus a generic XML database that allows storage, query, and retrieval of arbitrary XML documents without prior knowledge of the XML schema.
Created about 1 year ago.

3 Users
 

Morpho allows you to create and manage your data, and to share it with others. It was created to provide an easy-to-use, cross-platform application for accessing and manipulating metadata and data (both locally and on the network). Morpho allows ecologists to create metadata (i.e., describe their data in a standardized format) and to build a catalog of data and metadata through which data collections can be queried, edited, and viewed. It also provides the means to access network servers in order to query, view, and retrieve all relevant public ecological data.
Created about 1 year ago.

2 Users
 

The Ecological Metadata Language (EML) project is an open source, community-oriented project dedicated to providing a high-quality metadata specification for describing scientific data relevant to the ecological discipline. These data often consist of observational data on the distribution and abundance of organisms, the processes that control these systems, and the biotic and abiotic environment in which the organisms are embedded. The project consists entirely of volunteer members who donate their time and experience to advance information management for ecology.
Created about 1 year ago.

0 Users

Falkon aims to enable the rapid and efficient execution of many tasks on large compute clusters. Falkon integrates (1) multi-level scheduling to separate resource acquisition from task dispatch, and (2) a streamlined dispatcher. Falkon’s integration of multi-level scheduling and streamlined dispatchers delivers performance not provided by any other system. Microbenchmarks show that Falkon throughput (ranging from 100s to 1000s of tasks/sec) and scalability (to 54K executors and 2M queued tasks) are several orders of magnitude better than other systems used in production Grids.
Furthermore, we have extended Falkon to include data management functionality. Scientific and data-intensive applications often require exploratory analysis of large datasets, which is often carried out on large-scale distributed resources where data locality is crucial to achieving high system throughput and performance. We propose a “data diffusion” approach that acquires resources for data analysis dynamically, schedules computations as close to the data as possible, and replicates data in response to workloads. As demand increases, more resources are acquired and “cached” to allow faster response to subsequent requests; resources are released when demand drops. This approach can provide the benefits of dedicated hardware without the associated high costs, depending on the application workloads and the performance characteristics of the underlying infrastructure. The data diffusion concept is reminiscent of cooperative Web caching and peer-to-peer storage systems. Other data-aware scheduling approaches assume static or dedicated resources, which can be expensive and inefficient if load varies significantly. The challenge for our approach is that storage resources must be co-allocated with computation resources in order to enable the efficient analysis of possibly terabytes of data without prior knowledge of the characteristics of application workloads.
To explore the proposed data diffusion, we have extended Falkon to allow the compute resources to cache data on local disks and to perform task dispatch via a data-aware scheduler. The integration of Falkon and the Swift parallel programming system provides us with access to a large number of applications from astronomy, astrophysics, medicine, and other domains, with varying datasets, workloads, and analysis codes. Large-scale astronomy and medical applications executed under Falkon by the Swift parallel programming system achieve up to a 90% reduction in end-to-end run time, relative to versions that execute tasks via separate scheduler submissions. Data diffusion can decrease application execution times further, by several factors, and improve overall application scalability.
Falkon goals: (1) reduce task dispatch time by using a streamlined dispatcher that eliminates support for features such as multiple queues, priorities, and accounting; (2) use an adaptive provisioner to acquire and release resources as application demand varies; (3) improve application performance and scalability through data diffusion and data-aware scheduling, which leverage the co-located computational and storage resources to replace shared file system I/O with local disk I/O.
For more information on the project, please see the main Falkon site at http://dev.globus.org/wiki/Incubator/Falkon.
Created 12 months ago.
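
The data-aware dispatch idea in the Falkon entry above (send a task to a compute resource that already caches its input, and cache the input on a miss) can be illustrated with a minimal sketch. This is not Falkon's code or API: the `Executor` class, the `dispatch` function, and the file names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Executor:
    """Hypothetical compute resource with a local-disk cache of file names."""
    name: str
    cache: set = field(default_factory=set)
    busy: bool = False


def dispatch(task_file, executors):
    """Pick an idle executor, preferring one that already caches task_file.

    Returns the chosen executor, or None when every executor is busy
    (a real system would queue the task or provision more resources).
    """
    idle = [e for e in executors if not e.busy]
    if not idle:
        return None
    # Prefer a cache hit; otherwise fall back to the first idle executor.
    chosen = next((e for e in idle if task_file in e.cache), idle[0])
    chosen.cache.add(task_file)  # on a miss, the input is copied to local disk
    chosen.busy = True
    return chosen


if __name__ == "__main__":
    pool = [Executor("a"), Executor("b", cache={"x.dat"})]
    print(dispatch("x.dat", pool).name)  # executor "b" holds x.dat
    print(dispatch("y.dat", pool).name)  # miss: first idle executor is used
```

The sketch leaves out resource provisioning, cache eviction, and replication, which the description names as part of data diffusion.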

0 Users
 

OMERO is modern client-server software for visualising, managing, and annotating microscope images and metadata. The OMERO components also provide image importing, archiving, protocol recording, and user administration. OMERO consists of a Java server, several Java client applications, Python and C++ bindings, and a Django-based web application. OMERO is designed, developed, and released by the Open Microscopy Environment, with contributions from Glencoe Software, Inc. OMERO is released under the GNU General Public License (GPL).
Created about 1 year ago.

0 Users

ftools-qgis provides a set of advanced spatial analysis tools designed to extend the functionality of Quantum GIS, a free, open-source GIS. Currently, ftools-qgis features:
1) manageR - Interface to R statistical analysis. Provides advanced statistical functionality within QGIS by loosely coupling QGIS with the R statistical programming language. Allows QGIS layers to be uploaded directly into R, and R operations to be performed on the data from within QGIS. It interfaces with R using RPy, a Python interface to the R programming language.
2) voronoipolygons - Voronoi/Thiessen tessellation. Given a set of points in the plane, there exists an associated set of regions surrounding these points, such that all locations within any given region are closer to that region's point than to any other point. These regions are often referred to as proximity polygons, Voronoi polygons, or Thiessen regions.
Created 9 months ago.
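
The Voronoi property described in the entry above (every location belongs to the region of its nearest point) can be checked with a short sketch. This is not how ftools-qgis builds its polygons: the `nearest_site` function and the sample coordinates are hypothetical, and the code only labels a location with its region rather than constructing polygon geometry.

```python
import math


def nearest_site(location, sites):
    """Return the index of the site closest to location (Euclidean distance).

    The returned index identifies the Voronoi region containing location.
    Ties are resolved in favour of the lowest index.
    """
    return min(range(len(sites)), key=lambda i: math.dist(location, sites[i]))


if __name__ == "__main__":
    sites = [(0, 0), (10, 0), (0, 10)]
    for location in [(1, 1), (9, 1), (2, 8)]:
        print(location, "is in the region of site", nearest_site(location, sites))
```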