Research Data Management

This guide provides information on how to better manage and share research data in any discipline.

Confidentiality

It is vital to maintain the confidentiality of research subjects, both for ethical reasons and to ensure continued participation in research.  Sometimes, data resulting from funded research cannot be shared.  Policies such as the Privacy Rule of the Health Insurance Portability and Accountability Act (HIPAA) address this.

Researchers who want to ethically share sensitive and confidential data may want to consider the following:

  • Include a provision for data sharing when obtaining informed consent from research participants.  UCI's Office of Research provides examples and templates for consent forms with provisions for data sharing.
  • Protect privacy by anonymizing data.
  • Evaluate the sensitivity of your data -- researchers should consider whether their data contains direct identifiers, or indirect identifiers that could be combined with other public information to identify research participants.
  • Obtain a confidentiality review -- some data archives, such as the Inter-University Consortium for Political and Social Research (ICPSR), will review your data for the presence of confidential information.
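One rough way to evaluate indirect identifiers is to count how many records share each combination of them. Combinations held by only a few records are the ones most easily linked to a person. The sketch below is illustrative only: the function name `small_groups`, the field names, the sample records, and the threshold `k` are all assumptions, and a check like this does not replace a formal confidentiality review.

```python
from collections import Counter

def small_groups(records, quasi_identifiers, k=5):
    """Return identifier combinations shared by fewer than k records.

    `quasi_identifiers` lists the (hypothetical) indirect-identifier fields
    to combine; `k` is an assumed minimum group size, not a standard.
    """
    counts = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}

# Made-up records for illustration only.
records = [
    {"zip": "92617", "birth_year": 1980, "sex": "F"},
    {"zip": "92617", "birth_year": 1980, "sex": "F"},
    {"zip": "92697", "birth_year": 1955, "sex": "M"},
]

# Flag combinations that occur fewer than 2 times in the sample.
risky = small_groups(records, ["zip", "birth_year", "sex"], k=2)
print(risky)
```

Here the third record is the only one with its combination of ZIP code, birth year, and sex, so it would be flagged for closer attention (for example, by coarsening the ZIP code or birth year before sharing).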

Intellectual Property

Your primary asset is your intellectual work, so it is important to understand and maintain your intellectual property rights.  In your Data Management Plan, you will need to articulate how you are providing permissions or licenses.  This may or may not involve intellectual property rights, depending on the type of data.

  • Data is not copyrightable (although a particular expression can be, such as a table or chart)
  • Data can be licensed.  Some data providers license data to limit how the data can be used (to protect the privacy of study participants or guide downstream use of data)
  • If you want to promote sharing and unlimited use of your data, you can make your data available under a CC0 Declaration to make that explicit. There are other Creative Commons licenses that retain additional protections.

The following are some relevant University of California / UC Irvine policies related to research data, intellectual property, and confidentiality:

Data Licensing

In order to facilitate the reuse of data, it is imperative that others know the terms of use for the database and the data content. Fortunately, the Open Data Commons group (http://opendatacommons.org) has been developing legally binding tools to govern the use of data sets. Using a combination of copyright and contractual standards, they have created three standard licenses. In addition, it is possible to articulate a set of “community norms” that complement the use of formal licenses. While not having the force of law, norms can express the shared beliefs of a community vis-à-vis data sharing and reuse.

The three ODC licenses are:

  1. Public Domain Dedication and License (PDDL): This dedicates the database and its content to the public domain, free for everyone to use as they see fit.
  2. Attribution License (ODC-By): Users are free to use the database and its content in new and different ways, provided they provide attribution to the source of the data and/or the database.
  3. Open Database License (ODC-ODbL): ODbL stipulates that any subsequent use of the database must provide attribution, an unrestricted version of the new product must always be accessible, and any new products made using ODbL material must be distributed using the same terms. It is the most restrictive of all ODC licenses.

Creative Commons (http://www.creativecommons.org/) also has a library of standardized licenses, and some of them apply to data and databases. The ODC-By license, for example, is the equivalent of a Creative Commons Attribution license (CC BY). CC BY licenses, however, require copyright ownership of the underlying work, whereas the ODC-By license applies to works not protected by copyright (such as factual data).

The two CC licenses that are of greatest relevance to data management are:

  1. CC0 (i.e., "CC Zero"): When an owner wishes to waive their copyright and/or database rights, they can use the CC0 mark. It effectively places the database and data into the public domain. It is the functional equivalent of an ODC PDDL license.
  2. Public Domain mark (PDM): It is used to mark works that are in the public domain, and for which there are no known copyright or database restrictions. It is possible to flag factual data as PDM in a database, for example, in order to make it clear it is free to use.

 

Selecting a data license

There is no single right answer as to which license to assign to a database or its content. Note, however, that anything other than an ODC PDDL or CC0 license may cause serious problems for subsequent scientists and other users. This is because of the problem of attribution stacking. When data is extracted from a single data set and used in a research project, it may be possible to keep track of the source of that data. But it is also possible to create a data set derived from hundreds of sources, with each source requiring acknowledgement. Furthermore, the data in those source databases may not have originated with them, but may instead have been drawn from other databases that also demand attribution. Rather than legally requiring that everyone provide attribution to the data, it might be enough to have a community norm that says “if you make extensive use of data from this data set, please credit the authors.”
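To see how attribution requirements accumulate, the following sketch walks a small provenance catalog and collects every upstream data set whose license requires credit. It is a toy illustration under stated assumptions: the catalog structure, the data set names, the function `required_attributions`, and the set `ATTRIBUTION_LICENSES` are all hypothetical, and the code is not a legal analysis of any license.

```python
# Licenses treated here as requiring attribution (an assumption for the sketch).
ATTRIBUTION_LICENSES = {"ODC-By", "ODC-ODbL", "CC BY"}

def required_attributions(name, catalog, seen=None):
    """Collect the named data set and all of its upstream sources
    whose license requires attribution."""
    if seen is None:
        seen = set()
    if name in seen:  # avoid revisiting shared or circular sources
        return set()
    seen.add(name)
    entry = catalog[name]
    credits = {name} if entry["license"] in ATTRIBUTION_LICENSES else set()
    for source in entry["sources"]:
        credits |= required_attributions(source, catalog, seen)
    return credits

# Hypothetical catalog: each data set lists its license and its sources.
catalog = {
    "merged": {"license": "ODC-By", "sources": ["survey_a", "survey_b"]},
    "survey_a": {"license": "ODC-By", "sources": ["census_extract"]},
    "survey_b": {"license": "PDDL", "sources": []},
    "census_extract": {"license": "CC BY", "sources": []},
}

credits = required_attributions("merged", catalog)
print(sorted(credits))
```

Even in this four-entry catalog, a user of the hypothetical "merged" data set owes three separate credits. With hundreds of sources, each with its own upstream chain, the list quickly becomes unmanageable, which is the practical argument for PDDL or CC0 combined with community norms.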

 

Data Ownership at UCI

The ownership of works produced by UC Irvine faculty, students, and non-academic staff is governed by the University of California Office of the President. This blog post summarizes current thinking about data ownership. The precise answer will depend on whether the project was created as part of sponsored research; the employment status of the creator; whether the work was conducted “pursuant to a specific direction or assigned duty…from the University”; and, if deemed to be an “encoded work,” whether substantial university resources were used in the creation of the encoded work.