Indiana University

Tools

Overview

As part of its mission, the Data to Insight Center is committed to developing and deploying tools for data management, data discovery, data reuse, and metadata creation for researchers. A number of these tools are freely available to researchers.

SEAD: An integrated infrastructure to support data stewardship in sustainability science
HTRC: Transition to Outreach Phase II
Temporal Representation for Scientific Data Provenance Provenance of digital scientific data is an important piece of the metadata of a data object. It can, however, grow voluminous quickly because the granularity level of capture can be high. It can also be quite feature-rich. We propose a representation of the provenance data based on logical time that reduces the feature space. Creating time- and frequency-domain representations of the provenance, we apply clustering, classification and association rule mining to the abstract representations to...
Digital Library Tools The Indiana University Digital Library Program (DLP) is dedicated to the production, maintenance, delivery, and preservation of a wide range of high-quality networked resources for scholars and students at Indiana University and elsewhere.
HathiTrust Research Center Tools The HTRC will offer a suite of tools for computational text analysis. These tools will cover a wide variety of functions ranging from simple statistical analysis of words to complex algorithms relating concepts and meaning.
XML Metadata Concept Catalog (XMC Cat) XMC Cat is a metadata catalog that stores rich metadata describing data objects that are themselves stored in files, storage repositories, or on the web. Its features include adaptability to domain schemata through configuration instead of code changes, support for automatic capture of metadata through the use of curation plugins, and search and browse capabilities through a web-based GUI that is dynamically generated from a domain schema. It can be deployed in different scientific and...
Karma Provenance Collection Tool The Karma tool is a standalone tool that can be added to existing cyberinfrastructure for purposes of collection and representation of provenance data. Karma utilizes a modular architecture that permits support for multiple instrumentation plugins that make it usable in different architectural settings.
The HathiTrust Research Center and tool builders Presentation given by John Unsworth on the HathiTrust Research Center at Corpora Space Workshop II, June 6, 2011.
Data Catalog Data Catalog harvests data product metadata from distributed THREDDS catalogs into an XMC Cat instance. The metadata (including data product location) is then available to applications via the XMC Cat API. Data Catalog metadata harvesting employs a shared-nothing ingest pipeline to allow for the indexing of large catalogs such as NEXRADIII, and has indexed over 17 thousand collections and 2 million files.
Streamflow Streamflow integrates data streams into a standard workflow system through a programming model approach that introduces new workflow semantics that enable scientific workflow designers to incorporate data streams into the experiment without major changes to the infrastructure. It utilizes XBaya as a graphical client program for workflow composition, execution and monitoring.
Sigiri We propose a simple abstraction for interaction with heterogeneous resource managers spanning grid and cloud computing, together with features that make the tool useful for the mid-scale physical or natural scientist. Key strengths of the abstraction are its support for multiple standard job specification languages, its preservation of direct user interaction with the service (which removes the delay that can come through layers of services), and its predictable behaviour under heavy loads.
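To illustrate the kind of abstraction the Sigiri entry describes, the sketch below puts one small interface in front of several resource-manager backends. It is a minimal illustration only, not Sigiri's actual API: the names JobSpec, ResourceManager, InMemoryManager and JobService are all hypothetical, and the backends are in-memory stand-ins that record jobs rather than talking to a real grid or cloud scheduler.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical, minimal job description (not Sigiri's real schema)."""
    executable: str
    arguments: Tuple[str, ...] = ()
    cpus: int = 1


class ResourceManager(ABC):
    """Common interface that every backend (grid, cloud, ...) implements."""

    @abstractmethod
    def submit(self, spec: JobSpec) -> str:
        """Accept a job and return a backend-specific job identifier."""

    @abstractmethod
    def status(self, job_id: str) -> str:
        """Return a coarse status string for a previously submitted job."""


class InMemoryManager(ResourceManager):
    """Stand-in backend: records jobs in a dict instead of running them."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self.jobs: Dict[str, JobSpec] = {}

    def submit(self, spec: JobSpec) -> str:
        job_id = f"{self.prefix}-{len(self.jobs) + 1}"
        self.jobs[job_id] = spec
        return job_id

    def status(self, job_id: str) -> str:
        return "QUEUED" if job_id in self.jobs else "UNKNOWN"


class JobService:
    """Single entry point that routes requests to a named backend."""

    def __init__(self) -> None:
        self._backends: Dict[str, ResourceManager] = {}

    def register(self, name: str, backend: ResourceManager) -> None:
        self._backends[name] = backend

    def submit(self, name: str, spec: JobSpec) -> str:
        return self._backends[name].submit(spec)

    def status(self, name: str, job_id: str) -> str:
        return self._backends[name].status(job_id)


if __name__ == "__main__":
    service = JobService()
    service.register("grid", InMemoryManager("grid"))
    service.register("cloud", InMemoryManager("cloud"))

    job = service.submit("grid", JobSpec("/bin/echo", ("hello",)))
    print(job, service.status("grid", job))
```

Because callers only see JobService, a new backend can be added by implementing the two ResourceManager methods, without changing client code. Real backends would also need to translate JobSpec into whatever job specification language the target resource manager expects.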
D2I/LEAD II Successfully Supports Vortex2

In support of the 2010 Vortex2 campaign, LEAD II successfully executed 214 workflows, used 109,568 CPU hours, generated 215 GB of data and over 9100 2D products. Use the Field Viewer or Mobile Viewer or access the archived data sets.