Indiana University

Karma Provenance Collection Tool

Provenance (or lineage, trace) of digital scientific data is critical to broadening the sharing and reuse of that data.  Provenance captures the information needed to attribute ownership and to determine, among other things, the quality of a particular data set.  Provenance collection is often a tightly coupled part of a cyberinfrastructure system, but it is better served by a standalone tool.  Karma is such a tool: it can be added to existing cyberinfrastructure to collect and represent provenance data. Its modular architecture supports multiple instrumentation plugins, which make it usable in different architectural settings.

Visualization of provenance data is more useful with support for manipulating very large structures, for displaying different views, and for interactivity. These capabilities help a user navigate experiment information with a mental map of what is going on in the experiment, compare different experiment runs quantitatively, and perform model selection through effective collaboration between the user and the discovery system. We developed two plugins for Cytoscape to aid the visual representation and navigation of provenance information.

The Karma Provenance Tool is licensed under the Apache License, Version 2.0 (the "License").  The code is copyrighted, and the copyright is owned by The Trustees of Indiana University.  Karma is a product of the Data to Insight Center of the Pervasive Technology Institute at Indiana University. See Digital Data Provenance for more information.

Features of Latest Release (v3.2.3)

  •  Adds the Karma Adaptor package, one of the collection tools in the Karma provenance collection toolkit, which harvests provenance from log files.
  •  getDataForwardFlow() provides the downstream provenance given a data object. The resulting provenance trace has the input data object as the cause for all other things in the provenance trace. This is analogous to getProvenanceHistory().
  •  Optional OPM extensions (wasExecutedOn and wasConnectedTo) are now supported in graph queries: getWorkflowGraph(), getProvenanceHistory(), and getDataForwardFlow().
  •  Collection support added to getDataProvenanceHistory().
  •  Addition of a cache expiration parameter. Defaults to 30 minutes.
  •   Bugfixes:
    •   getProvenanceHistory() - informationDetailLevel is now optional as specified in WSDL. Defaults to COARSE if unspecified in the query document.
    •   getWorkflowGraph() - Fixed duplicate used dependencies in getOPMUsed(). informationDetailLevel is now optional as specified in WSDL. Defaults to COARSE if unspecified in the query document.
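
The relationship between the forward-flow and history queries described above can be illustrated with a small graph traversal. This is a minimal sketch, not Karma's implementation or API: the edge list, the artifact and process names, and the Python functions forward_flow() and provenance_history() are all hypothetical, and they only mimic the direction of getDataForwardFlow() (downstream, the data object is the cause) versus getProvenanceHistory() (upstream, the data object is the effect).

```python
from collections import defaultdict, deque

# Hypothetical (effect, cause) pairs in the spirit of OPM dependencies.
# All names below are invented for illustration only.
EDGES = [
    ("regrid", "raw.nc"),        # process "regrid" used artifact "raw.nc"
    ("grid.nc", "regrid"),       # artifact "grid.nc" was generated by "regrid"
    ("plot", "grid.nc"),
    ("figure.png", "plot"),
    ("stats", "grid.nc"),
    ("summary.csv", "stats"),
]


def build_index(edges):
    """Index the edges in both directions: node -> causes, node -> effects."""
    causes = defaultdict(set)
    effects = defaultdict(set)
    for effect, cause in edges:
        causes[effect].add(cause)
        effects[cause].add(effect)
    return causes, effects


def reachable(start, neighbors):
    """Breadth-first search returning every node reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def forward_flow(data_object, edges=EDGES):
    """Downstream trace: everything that has data_object as a cause."""
    _, effects = build_index(edges)
    return reachable(data_object, effects)


def provenance_history(data_object, edges=EDGES):
    """Upstream trace: everything that contributed to data_object."""
    causes, _ = build_index(edges)
    return reachable(data_object, causes)


if __name__ == "__main__":
    print(sorted(forward_flow("grid.nc")))
    print(sorted(provenance_history("grid.nc")))
```

In this toy graph, the forward flow of "grid.nc" contains the two downstream processes and their outputs, while its history contains only "regrid" and "raw.nc". The two queries are the same traversal run over opposite edge directions.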

Downloads for v3.2.3

Contact Us

  • Beth Plale [plale at indiana dot edu]

Project Contributors


  • Beth Plale, Project Director 
  • Scott Jensen, Senior Researcher
  • You-Wei Cheah
  • Peng Chen
  • Devarshi Ghoshal
  • Yuan Luo


  • Yiming Sun, Senior Software Developer
  • Mehmet Aktas, Associated Faculty
  • Bin Cao
  • Dennis Gannon
  • Prajakta Purohit
  • Ed Robertson
  • Yogesh Simmhan
  • Girish Subramanian


Sponsors, August 2010 - present

Related News, Events and Publications:

  • IU develops Komadu, a new suite of data provenance software tools
    The Indiana University Data to Insight Center (D2I) has released a new suite of software tools, Komadu, designed to help researchers track and verify digital data, a crucial step in computational research.
  • Komadu Provenance Collection Tool
    Komadu is a W3C PROV compliant standalone provenance collection tool that can be added to an existing cyberinfrastructure for the purpose of collecting and visualizing provenance data. Komadu is the successor of Karma and comes with a set of new features and a new API to support easier provenance collection.
  • Provenance Collection of Biodiversity Analysis on PRAGMA Cloud for Data Sharing
  • Professor Beth A. Plale co-leads U.S. involvement in new international Research Data Alliance
    U.S. involvement is led by Rensselaer Polytechnic Institute Computer Science Professor Francine Berman and Professor Beth A. Plale, of the School of Informatics and Computing at Indiana University.
  • Provenance from Log Files: a BigData Problem
  • D2I announces release of Karma v3.2.3
    Includes a new package, Karma Adaptor, which is one of the collection tools that make up the Karma provenance collection toolkit to harvest provenance from log files.
  • PRAGMA: Building the PRAGMA Multi-Cloud
    PRAGMA Students Online Seminar Series, Philip Papadopoulos, San Diego Supercomputer Center, Speaker.
  • Gigabyte Synthetic Database
    Provenance of scientific data is a key piece of the metadata record for the data's ongoing discovery and reuse. Provenance collection systems capture provenance on the fly; however, the protocol between application and provenance tool may not be reliable. Consequently, the provenance record can be partial, partitioned, and simply inaccurate. The Gigabyte Synthetic Database is a noisy data collection generated using the Workflow Emulator Tool (WORKEM) with a number of scientific workflow...
  • 2012 Fall Seminar Series
    D2I hosts a series of seminars each semester. Dates, locations, abstracts, bios, slides, archives of talks and general information are available for the Fall 2012 Seminar Series.
  • Indiana University Pervasive Technology Institute Report to the Lilly Endowment, Inc.
    48 Month Program Report, Jun 1 - Nov 30, 2012. Bi-Annual report to the Lilly Endowment, Inc. Search for "Lilly Report" to find all reports.