Research Data Management

This guide provides information on how to better manage and share research data in any discipline.

Documenting Data

Good data documentation helps ensure that your data can be understood and interpreted correctly by any user. It explains how the data was created, the context in which it was collected, the structure and contents of the data, and any manipulations that have been applied to it. Also see: Guide to Writing a "readme" File.

What's important to document? 

  • Context of data collection
  • Data collection methodology
  • Structure and organization of data files
  • Data validation and quality assurance
  • Data manipulations performed during analysis, starting from the raw data
  • Data confidentiality, access and use conditions

Data-level documentation

  • Variable names and descriptions
  • Definition of codes and classification schemes
  • Codes of, and reasons for, missing values
  • Definitions of specialty terminology and acronyms
  • Algorithms used to transform data
  • File format and software used
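
As an illustration of data-level documentation, the items above can be recorded in a simple data dictionary with one row per variable. The sketch below is only one possible layout: the variable names, codes, and missing-value conventions are hypothetical and are not taken from any particular dataset or standard.

```python
import csv
import io

# Hypothetical variables for an illustrative survey dataset.
DATA_DICTIONARY = [
    {"variable": "resp_id", "description": "Anonymous respondent identifier",
     "units": "", "codes": "", "missing": ""},
    {"variable": "age", "description": "Age at time of interview",
     "units": "years", "codes": "", "missing": "-99 = refused"},
    {"variable": "employ", "description": "Employment status",
     "units": "", "codes": "1 = employed; 2 = unemployed; 3 = retired",
     "missing": "-98 = not asked"},
]

FIELDNAMES = ["variable", "description", "units", "codes", "missing"]


def write_data_dictionary(rows, handle):
    """Write a header row followed by one CSV row per documented variable."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


def undocumented_variables(data_columns, rows):
    """Return columns that appear in the data but not in the dictionary."""
    documented = {row["variable"] for row in rows}
    return [col for col in data_columns if col not in documented]


# Write to an in-memory buffer here; a real project would write to a file
# stored alongside the data.
buffer = io.StringIO()
write_data_dictionary(DATA_DICTIONARY, buffer)
print(buffer.getvalue())
```

Keeping the dictionary in a plain CSV file next to the data means it can be opened without special software, and a check such as `undocumented_variables` can flag columns that were added to the data but never described.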

Creating Metadata

Properly describing and documenting data allows users (yourself included) to understand and track important details of the work. Metadata also makes the data easier to search for and retrieve once it is deposited in a data repository.  In a lab setting, much of the content used to describe data is initially collected in a notebook; metadata is a more formal, sharable expression of this information.  Where no appropriate formal metadata standard exists, writing “readme” style metadata is an appropriate strategy.

Metadata is information about data, and describes basic characteristics, such as:

  • Who created the data
  • Contact information
  • What the data file contains
  • When and where (geographic location)  the data was generated
  • Why the data was generated
  • How the data was generated
  • Details about units of measure
  • Abbreviations or codes used in the dataset
  • Instrument, protocol, or survey tool details
  • Provenance and version information

Metadata makes it easier to identify and reuse data at a later date.

Recommended Metadata Elements

  • Title - Name of the project or collection of datasets
  • Creator - Names and institutions of the people who created the data
  • Date - Key dates associated with the data, such as dates covered by the data or date of creation
  • Description - Description of the resource
  • Keywords or subjects - Keywords or subjects describing the content of the data
  • Identifier - Unique number or alphanumeric string used to identify the data
  • Coverage (if applicable) - Geographic coverage
  • Language - Language of the resource
  • Publisher - Entity responsible for making the dataset available
  • Funding Agencies - Organization or agency who funded the research
  • Access restrictions - Where and how your data can be accessed by other researchers
  • Copyright - Copyright date and type
  • Format - What format your data is in
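
Where no formal standard applies, the recommended elements above can be written out as a plain-text "readme". The following is a minimal sketch of that idea, not a prescribed format: the `Element: value` layout, the function names, and the example values are all assumptions made for illustration.

```python
# Element names follow the recommended list above; "Coverage" is treated as
# optional because the list marks it "if applicable".
RECOMMENDED_ELEMENTS = [
    "Title", "Creator", "Date", "Description", "Keywords or subjects",
    "Identifier", "Coverage", "Language", "Publisher", "Funding Agencies",
    "Access restrictions", "Copyright", "Format",
]
OPTIONAL_ELEMENTS = {"Coverage"}


def missing_elements(record):
    """Return recommended, non-optional elements that are absent or blank."""
    return [
        name for name in RECOMMENDED_ELEMENTS
        if name not in OPTIONAL_ELEMENTS
        and not str(record.get(name, "")).strip()
    ]


def render_readme(record):
    """Render the filled-in elements as 'Element: value' lines, in list order."""
    lines = []
    for name in RECOMMENDED_ELEMENTS:
        value = str(record.get(name, "")).strip()
        if value:
            lines.append(f"{name}: {value}")
    return "\n".join(lines) + "\n"


# Hypothetical, partially completed record.
example = {
    "Title": "Example stream temperature survey",
    "Creator": "A. Researcher, Example University",
    "Date": "2020-01-01/2020-12-31",
    "Format": "CSV",
}

print(render_readme(example))
print("Still to document:", ", ".join(missing_elements(example)))
```

Listing the elements that are still blank is a convenient reminder before deposit, since repositories often ask for most of these fields at submission time.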

 

Metadata Redefined

Well-structured metadata not only supports the long-term discovery and preservation of research data, but also allows research data from tens, hundreds, or thousands of researchers to be aggregated and searched simultaneously. This is why domain-specific repositories typically require highly structured metadata with your data submissions: it enables highly granular searches on their aggregated content. This in turn makes your data easier to find.

Metadata can take many different forms, from free text to standardized, structured, machine-readable, extensible content. Specific disciplines, repositories or data centers may guide or even dictate the content and format of metadata, possibly using a formal standard. Because creating standardized metadata can be difficult and time-consuming, another consideration when selecting a standard is the availability of tools that can help generate the metadata (e.g. Morpho allows for easy creation of EML, Nesstar for DDI metadata, etc.).

The Digital Curation Centre provides a catalog of common metadata standards, organized by discipline: http://www.dcc.ac.uk/resources/metadata-standards.

Some specific examples of metadata standards, both general and domain-specific, are:

  • Dublin Core - domain agnostic, basic and widely used metadata standard
  • DDI (Data Documentation Initiative) - common standard for social, behavioral and economic sciences, including survey data
  • EML (Ecological Metadata Language) - specific for ecology disciplines
  • ISO 19115 and FGDC-CSDGM (Federal Geographic Data Committee's Content Standard for Digital Geospatial Metadata) - for describing geospatial information
  • MINSEQE (Minimum Information about a high-throughput SEQuencing Experiment) - Genomics standard
  • FITS (Flexible Image Transport System) - Astronomy digital file standard that includes structured, embedded metadata
  • MIBBI - Minimum Information for Biological and Biomedical Investigations
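
To give a sense of what structured, machine-readable metadata looks like, the sketch below serializes a small record using the fifteen Dublin Core elements and their standard namespace (`http://purl.org/dc/elements/1.1/`). The wrapping `metadata` root element, the function name, and the example values are assumptions made for illustration; a repository will normally specify its own wrapper and required fields.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen elements defined by the Dublin Core element set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def to_dublin_core(record):
    """Serialize a dict of Dublin Core elements to an XML string.

    Values may be a single string or a list of strings (repeated elements).
    The 'metadata' root element is a hypothetical wrapper for this sketch.
    """
    root = ET.Element("metadata")
    for name, values in record.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        if isinstance(values, str):
            values = [values]
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical example record.
record = {
    "title": "Example stream temperature survey",
    "creator": ["A. Researcher", "B. Colleague"],
    "date": "2020",
    "format": "text/csv",
    "language": "en",
}

print(to_dublin_core(record))
```

Because every record uses the same element names, a repository can index thousands of such records and search them by creator, date, or subject in a uniform way.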

Metadata Tools