Indiana University


Karma Provenance Collection Tool

Overview
Provenance (also called lineage or trace) of digital scientific data is critical to broadening the sharing and reuse of scientific data.  Provenance captures the information needed to attribute ownership and to determine, among other things, the quality of a particular data set.  Provenance collection is often a tightly coupled part of a cyberinfrastructure system, but it is better served by a standalone tool.  Karma is such a tool: it can be added to existing cyberinfrastructure to collect and represent provenance data. Karma uses a modular architecture that supports multiple instrumentation plugins, which makes it usable in different architectural settings.

Visualization of provenance data is more useful when it supports manipulating very large structures, displaying different views, and interactivity. This helps a user navigate their experiment information with a mental map of what is going on in the experiment, compare different experiment runs quantitatively, and do model selection through effective collaboration between the user and the discovery system. We developed two plugins for Cytoscape to aid the visual representation and navigation of provenance information.

The Karma Provenance Tool is licensed under the Apache License, Version 2.0 (the "License") (http://www.apache.org/licenses/LICENSE-2.0).  The code is copyrighted by The Trustees of Indiana University.  Karma is a product of the Data to Insight Center of the Pervasive Technology Institute (http://pti.iu.edu) at Indiana University. See Digital Data Provenance for more information.

Features of Latest Release (v3.2.3)

  •  Adds the Karma Adaptor package, one of the collection tools in the Karma provenance collection toolkit, for harvesting provenance from log files.
  •  getDataForwardFlow() returns the downstream provenance of a given data object. In the resulting provenance trace, the input data object is a cause of everything else in the trace. This is analogous to getProvenanceHistory().
  •  Optional OPM extensions (wasExecutedOn and wasConnectedTo) are now supported in graph queries: getWorkflowGraph(), getProvenanceHistory(), and getDataForwardFlow().
  •  Collection support added to getDataProvenanceHistory().
  •  Adds a cache expiration parameter to karma.properties. Defaults to 30 minutes.
  •  Bugfixes:
    •  getProvenanceHistory() - informationDetailLevel is now optional, as specified in the WSDL. Defaults to COARSE if unspecified in the query document.
    •  getWorkflowGraph() - Fixed duplicate "used" dependencies in getOPMUsed(). informationDetailLevel is now optional, as specified in the WSDL. Defaults to COARSE if unspecified in the query document.
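
The idea of a forward-flow query can be illustrated with a small sketch. This is not Karma's implementation or its API: the edge list, the node names, and the functions forward_flow() and history() are hypothetical, and the sketch assumes that provenance is stored as (effect, cause) pairs and that a history query walks the same edges in the upstream direction.

```python
from collections import deque

# Hypothetical toy provenance graph (not Karma data).
# Each pair is (effect, cause): the effect depends on the cause.
EDGES = [
    ("process:p1", "data:raw"),      # p1 used raw
    ("data:clean", "process:p1"),    # clean was generated by p1
    ("process:p2", "data:clean"),    # p2 used clean
    ("data:plot", "process:p2"),     # plot was generated by p2
    ("process:p3", "data:other"),    # unrelated branch
    ("data:report", "process:p3"),
]


def _reachable(start, neighbors):
    """Breadth-first search returning every node reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def forward_flow(edges, data_id):
    """Downstream trace: everything caused, directly or not, by data_id."""
    effects_of = {}
    for effect, cause in edges:
        effects_of.setdefault(cause, []).append(effect)
    return _reachable(data_id, effects_of)


def history(edges, data_id):
    """Upstream trace: everything data_id depends on, directly or not."""
    causes_of = {}
    for effect, cause in edges:
        causes_of.setdefault(effect, []).append(cause)
    return _reachable(data_id, causes_of)


if __name__ == "__main__":
    print(sorted(forward_flow(EDGES, "data:raw")))
    print(sorted(history(EDGES, "data:plot")))
```

In this toy graph the forward flow of data:raw and the history of data:plot contain the same intermediate nodes, reached from opposite ends, while the unrelated branch through process:p3 appears in neither.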


Downloads for v3.2.3

Contact

  • Beth Plale [plale at indiana dot edu]

Project Contributors

Current:

  • Beth Plale, Project Director 
  • Scott Jensen, Senior Researcher
  • You-Wei Cheah
  • Peng Chen
  • Devarshi Ghoshal
  • Yuan Luo

Historical:

  • Yiming Sun, Senior Software Developer
  • Mehmet Aktas, Associated Faculty
  • Bin Cao
  • Dennis Gannon
  • Prajakta Purohit
  • Ed Robertson
  • Yogesh Simmhan
  • Girish Subramanian


Sponsors, August 2010 - present

Related News, Events and Publications:

Professor Beth A. Plale co-leads U.S. involvement in new international Research Data Alliance
U.S. involvement is led by Rensselaer Polytechnic Institute Computer Science Professor Francine Berman and Professor Beth A. Plale of the School of Informatics and Computing at Indiana University.

Provenance from Log Files: a BigData Problem

D2I announces release of Karma v3.2.3
Includes a new package, Karma Adaptor, one of the collection tools in the Karma provenance collection toolkit, for harvesting provenance from log files.

PRAGMA: Building the PRAGMA Multi-Cloud
PRAGMA Students Online Seminar Series, Philip Papadopoulos, San Diego Supercomputer Center, Speaker.

Gigabyte Synthetic Database
Provenance of scientific data is a key piece of the metadata record for the data's ongoing discovery and reuse. Provenance collection systems capture provenance on the fly; however, the protocol between application and provenance tool may not be reliable. Consequently, the provenance record can be partial, partitioned, and simply inaccurate. The Gigabyte Synthetic Database is a noisy data collection generated using the Workflow Emulator Tool (WORKEM) with a number of scientific workflow...

2012 Fall Seminar Series
D2I hosts a series of seminars each semester. Dates, locations, abstracts, bios, slides, archives of talks and general information about the Fall 2012 Seminar Series are available online.

Indiana University Pervasive Technology Institute Report to the Lilly Endowment, Inc. 48 Month Program Report Jun 1 - Nov 30, 2012
Bi-Annual report to the Lilly Endowment, Inc. Search for "Lilly Report" to find all reports.

Provenance Analysis: Towards Quality Provenance

Temporal Representation for Scientific Data Provenance
Provenance of digital scientific data is an important piece of the metadata of a data object. It can, however, grow voluminous quickly because the granularity level of capture can be high. It can also be quite feature rich. We propose a representation of the provenance data based on logical time that reduces the feature space. Creating time and frequency domain representations of the provenance, we apply clustering, classification and association rule mining to the abstract representations to...

Visualization of Network Data Provenance
Visualization facilitates the understanding of scientific data both through exploration and explanation of the visualized data. Provenance also contributes to the understanding of data by containing the contributing factors behind a result. The visualization of provenance, although supported in existing workflow management systems, generally focuses on small or medium-sized provenance data, lacking techniques to deal with big data with high complexity. This paper discusses visualization...