Indiana University

XML Metadata Concept Catalog (XMC Cat)

D2I is no longer actively supporting XMC Cat, but the source code is still available.


XMC Cat is a metadata catalog that stores rich metadata describing data objects that are themselves stored in files, storage repositories, or on the web. It is an open source web service written in Java that utilizes the Axis2 web service engine and Apache Tomcat. Its features include adaptability to domain schemata through configuration instead of code changes, support for automatic capture of metadata through the use of curation plugins, and search and browse capabilities through a web-based GUI that is dynamically generated from a domain schema. This allows XMC Cat to be deployed in different scientific and educational domains without requiring new code to be written. XMC Cat is currently in use in the LEAD Science Gateway.

Role of Metadata Concepts in XMC Cat

Metadata schemas used in science and education are composed of complex concepts that describe the data products generated by a community. XMC Cat exploits this feature of scientific metadata both to store metadata efficiently and to perform detailed data discovery queries. This concept-based approach also enables the automatic generation of the data search GUI and easier deployment of metadata catalogs in diverse scientific domains. An XML metadata schema (or schemas) is partitioned into the concepts it contains, and metadata can be efficiently ingested and validated incrementally using concepts as the unit of storage. These concepts are also shredded to allow detailed data discovery through a point-and-click search GUI. Using concepts as the unit of metadata storage, together with the shredded metadata, makes both insert and query operations efficient by enabling the rapid rebuilding of the XML metadata in response to detailed data discovery queries. This approach also enables the query interface to adapt dynamically to the native data type of each metadata element, be it numeric, string, temporal, or spatial.
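
The idea of partitioning a metadata document into concepts and shredding each concept into searchable values can be illustrated with a minimal sketch. This is not XMC Cat's actual implementation: the class and method names (ConceptShredder, Row, shred) are hypothetical, and the sketch assumes, purely for illustration, that every direct child of the root element is one concept. It uses only the standard Java DOM API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration; not part of the XMC Cat code base.
final class ConceptShredder {

    /** One shredded leaf value: the concept it belongs to, its element name, and its text. */
    public static final class Row {
        public final String concept;
        public final String element;
        public final String value;

        Row(String concept, String element, String value) {
            this.concept = concept;
            this.element = element;
            this.value = value;
        }

        @Override
        public String toString() {
            return concept + "/" + element + " = " + value;
        }
    }

    /**
     * Assumption: each direct child of the root element is one concept.
     * Every leaf element below a concept becomes one row.
     */
    public static List<Row> shred(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<Row> rows = new ArrayList<>();
            NodeList concepts = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < concepts.getLength(); i++) {
                if (concepts.item(i) instanceof Element) {
                    Element concept = (Element) concepts.item(i);
                    collectLeaves(concept.getTagName(), concept, rows);
                }
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse metadata document", e);
        }
    }

    /** Walks the subtree and records every element that has no child elements. */
    private static void collectLeaves(String concept, Element element, List<Row> rows) {
        boolean hasChildElement = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i) instanceof Element) {
                hasChildElement = true;
                collectLeaves(concept, (Element) children.item(i), rows);
            }
        }
        if (!hasChildElement) {
            rows.add(new Row(concept, element.getTagName(), element.getTextContent().trim()));
        }
    }

    public static void main(String[] args) {
        String xml = "<metadata><theme><keyword>radar</keyword></theme>"
                + "<bounding><west>-87.5</west><east>-85.0</east></bounding></metadata>";
        for (Row row : shred(xml)) {
            System.out.println(row);
        }
    }
}
```

In a catalog built along these lines, the rows would presumably be indexed for search while each concept's XML is stored whole, so that a matching document can be rebuilt from its concepts rather than from individual values.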

Query interfaces customized for the schema are constructed automatically based on the metadata schema for which each community deploys XMC Cat. Additionally, the XML Beans and XSLT code needed to configure XMC Cat for a domain schema can be generated through a point-and-click web interface.
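
As a rough illustration of how a search form could be derived from a schema, the following sketch reads an XML Schema document and assigns each typed element a search-field kind. It is an assumption-laden example, not XMC Cat's generator: SearchFieldSketch, Kind, and fieldsFor are hypothetical names, only a handful of built-in XSD types are recognized, and spatial types are left out because XML Schema has no built-in spatial type.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration; not part of the XMC Cat code base.
final class SearchFieldSketch {

    private static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    /** Hypothetical categories of search field a GUI might render. */
    public enum Kind { NUMERIC, TEMPORAL, STRING }

    /** Maps each xs:element that carries a type attribute to a search-field kind. */
    public static Map<String, Kind> fieldsFor(String xsd) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xsd)));
            Map<String, Kind> fields = new LinkedHashMap<>();
            NodeList elements = doc.getElementsByTagNameNS(XSD_NS, "element");
            for (int i = 0; i < elements.getLength(); i++) {
                Element e = (Element) elements.item(i);
                String type = e.getAttribute("type");
                if (type.isEmpty()) {
                    continue; // container element with an anonymous type, or a ref
                }
                fields.put(e.getAttribute("name"), kindOf(type));
            }
            return fields;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse schema", ex);
        }
    }

    /** Simplification: anything not listed here, including user-defined types, is treated as STRING. */
    static Kind kindOf(String qualifiedType) {
        String local = qualifiedType.substring(qualifiedType.indexOf(':') + 1);
        switch (local) {
            case "decimal":
            case "double":
            case "float":
            case "integer":
            case "int":
            case "long":
                return Kind.NUMERIC;
            case "date":
            case "dateTime":
            case "time":
                return Kind.TEMPORAL;
            default:
                return Kind.STRING;
        }
    }
}
```

A GUI layer could then offer range comparisons for NUMERIC and TEMPORAL fields and text matching for STRING fields.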

License
XMC Cat is an open source tool licensed under the Apache 2.0 license.  A copy of the license is available at http://www.apache.org/licenses/LICENSE-2.0

News
  • D2I's Scott Jensen gave a talk on "Adaptable and Incremental Metadata Capture in e-Science" at University of Chicago's Computation Institute on March 2, 2012.
  • Beth Plale and Scott Jensen, along with DataONE colleagues from University of New Mexico and Oak Ridge National Labs, will be presenting the tutorial M13: Big Data Means Your Metadata Must Work at SC11 in Seattle on Monday November 14th. XMC Cat will be one of the metadata tools discussed.
  • Stop by the Indiana University booth on the exhibition floor at SC11 on November 15th - 17th. See the exciting work D2I is doing, including demos of XMC Cat!
  • Scientific Data Discovery with XMC Cat. Pushing Back on the Data Deluge: Advancements in Metadata, Archival and Workflows. Presented at Supercomputing 2010, Nov 15-19; PPTX
  • XMC Cat: An Adaptive Catalog for Scientific Metadata, Improving Observing Network Coordination:  A Cyberinformatics Forum,  Boulder, CO, US, May 17-18, 2010; PDF; PPTX 
  • National News Story, May 2010 
  • Watch a short video of Scott Jensen describing XMC Cat: http://pti.iu.edu/video/xmccat

Contacting Us
If you have questions or comments on XMC Cat, you can contact us at: xmccat [at] cs [dot] indiana [dot] edu

Get Up and Running With XMC Cat
Installation prerequisites, build instructions, installation instructions for the server and client, additional help on configuration settings, sample code for building your own client tools to work with XMC Cat, and the XMC Cat FAQ can all be found on the Data to Insight Center's wiki. Below are direct links to the relevant wiki pages:

 

Contributors
Scott Jensen
Scott Jensen's research focus is on metadata management (with a particular focus on scientific data), data management, data provenance, services and SOA, XML, XML-Relational storage, search interfaces, and the Semantic Web. His dissertation work focused on identifying the characteristics of XML-based metadata and differences from general XML storage that can be exploited to provide faster query response for scientific data while using a flexible, scalable, and adaptable generic relational database structure that can be applied to varied scientific domains using different metadata schemas and data hierarchies.

Beth Plale
Professor Beth Plale serves as Director of the Data to Insight Center and the Center for Data and Search Informatics for Pervasive Technology Institute. She is an associate professor of Computer Science and Informatics. Plale is a national leader in data and information management and serves on the leadership teams of several major grant-funded projects, including the large NSF-funded LEAD project in cyberinfrastructure for mesoscale meteorology forecasting.

Additional Contributors
Yiming Sun
Yiming Sun is a PhD candidate whose research focuses on the long-term preservation of e-Science experiments and artifacts, reusable preservation objects, data provenance, metadata, cyberinfrastructure, and services. He is also a research staff member at the Data to Insight Center, currently working on the HathiTrust Research Center (HTRC) project.

Shobana Krishnan

Bina Bhaskar
Master's student in the CS department of IUB.

Kavitha Chandrasekar
Kavitha Chandrasekar is a Research Software Engineer at the Data to Insight Center. She has worked on the LEAD II project, running workflows with the Trident Scientific Workflow Workbench. She is currently working as a programmer on the Sustainable Environment Actionable Data (SEAD) project and is also involved in projects on running workflows in the cloud.

Kalani Ruwanpathirana
Kalani is a software analyst in University Information Technology Services (UITS) at Indiana University-Bloomington (IUB).

Bimalee Salpitikorala

 

Publications & Tutorials

 

Sponsors, Oct 2005 - present

 

Related News, Events and Publications:

2012 Fall Seminar Series D2I hosts a series of seminars each semester. Click here for dates, locations, abstracts, bios, slides, archives of talks and general information about the Fall 2012 Seminar Series.
Indiana University Pervasive Technology Institute Report to the Lilly Endowment, Inc. 48 Month Program Report Jun 1 - Nov 30, 2012 Bi-Annual report to the Lilly Endowment, Inc. Search for "Lilly Report" to find all reports.
Adaptable and Incremental Metadata Capture in e-Science

Presented by Scott Jensen, March 2, 2012. Scientific communities are recognizing an increasing need to enable reuse of the deluge (or bonanza) of scientific data currently being generated. Detailed metadata, or 'data about data', is key to preserving the value, as well as enabling the sharing and reuse of data. Communities have developed detailed XML schemata to capture and communicate metadata describing scientific data. Historically however, to the extent metadata has been captured at all...

D2I's Scott Jensen gives invited talk on metadata capture in e-Science at Computation Institute

"Adaptable and Incremental Metadata Capture in e-Science", March 2, 2012, Searle 240A, University of Chicago Computation Institute.

Karma Provenance Collection Tool The Karma tool is a standalone tool that can be added to existing cyberinfrastructure for purposes of collection and representation of provenance data. Karma utilizes a modular architecture that permits support for multiple instrumentation plugins that make it usable in different architectural settings.
XMC Cat Downloads Both the client and server can either be downloaded as binaries or compiled from source code by downloading the source tarball. The source code is configured to be built using Maven 2, and the build script will generate the client, server, and some additional utilities used in XMC Cat. Since XMC Cat is a web service described by a WSDL, clients can also use the tool of their choice to build a client. In the server installation of XMC Cat, we use the XML Beans data binding.
Provenance and Metadata The Data to Insight Center has a strong presence in provenance and metadata for scientific data through numerous funded projects and efforts with collaborators both at Indiana University and at other institutions.
Provenance
As research digital data collections become more accessible, it becomes increasingly important to address the issues of data validity and quality: To record and manage information about where each data object originated, the processes applied to...
Indiana University Pervasive Technology Institute Report to the Lilly Endowment, Inc. 36 Month Program Report Jun 1, 2011 - Nov 30, 2011 Bi-Annual report to the Lilly Endowment, Inc. Search for "Lilly Report" to find all reports.