Provenance Capture of Unmanaged Workflows with Karma, with Beth Plale
Abstract: For the digital data created as an outcome of scientific discovery to retain its value over time, the data must undergo some level of curation. For the archiving of scientific data to be fully realized, however, curation costs must come down. This will be achieved in part through tools that automate the collection of metadata and provenance. In this talk I present a logical architecture for a standalone provenance system, and the Karma system that implements it. We focus on the implications of unmanaged workflows, particularly for the representation of provenance information. Achieving flexible forms of provenance creation involves tradeoffs in where the burden of effort lies and in the accuracy of the results. Finally, we discuss an evaluation of Karma's performance under two capture scenarios and increasing workloads, and show that the system scales to a mid-range workload.
Bio: Beth Plale is Director of the Data to Insight Center and an Associate Professor in the School of Informatics and Computing at Indiana University Bloomington. Professor Plale did her postdoctoral work at the Georgia Institute of Technology and holds a Ph.D. in computer science from the State University of New York at Binghamton. Plale is an experimental computer scientist whose research is on data cyberinfrastructure and tools in an interdisciplinary research setting. Her research interests are in data provenance, metadata catalogs, automated digital curation, workflow systems in e-Science, and complex event processing. Plale is a recipient of the DOE Early Career award and is an ACM Senior Member and IEEE Member.
This talk was sponsored by the Data to Insight Center.