Research Data Management: Working with Sensitive Data


URL: https://guides.lib.uci.edu/datamanagement

Data Security and Privacy

Institutional Review Board (IRB)

If your research involves human subjects, your work may need to be overseen by UC Irvine's Institutional Review Board (IRB). The IRB's goal is to protect human research participants in both medical and non-medical research projects. You should contact the IRB when you are planning a research project involving human subjects.

Encryption

Encryption is the translation of data into a secret code, and it is one of the most effective ways to achieve data security. To read an encrypted file, you must have access to a secret key or password that enables you to decrypt it. Unencrypted data is called plain text; encrypted data is referred to as cipher text. Discuss your options with the Research Cyberinfrastructure Center.
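To make the terms concrete, here is a minimal Python sketch of a one-time pad, in which each byte of plain text is combined with a byte of a random secret key. It is an illustration of plain text, key, and cipher text only, not a tool for protecting research data: the function names and sample text are made up for this example, and real projects should rely on vetted encryption software chosen with the Research Cyberinfrastructure Center.

```python
import secrets


def encrypt(plain_text: bytes) -> tuple[bytes, bytes]:
    """Return (key, cipher_text) by XOR-ing each byte with a random key byte."""
    # The secret key is random and exactly as long as the data.
    key = secrets.token_bytes(len(plain_text))
    cipher_text = bytes(p ^ k for p, k in zip(plain_text, key))
    return key, cipher_text


def decrypt(cipher_text: bytes, key: bytes) -> bytes:
    """Recover the plain text; XOR with the same key undoes the encryption."""
    return bytes(c ^ k for c, k in zip(cipher_text, key))


# Hypothetical sample text, not real participant data.
plain_text = b"participant 017: follow-up visit completed"
key, cipher_text = encrypt(plain_text)
recovered = decrypt(cipher_text, key)
```

Without the key, the cipher text is unreadable; with it, the original plain text comes back unchanged.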

Health Insurance Portability and Accountability Act (HIPAA)

HIPAA (Health Insurance Portability and Accountability Act of 1996) is United States legislation that provides data privacy and security provisions for safeguarding medical information. The law has gained prominence in recent years with the proliferation of health data breaches caused by cyberattacks and ransomware attacks on health insurers and providers. For details, see HIPAA for Professionals from the U.S. Department of Health & Human Services.

See also: Confidentiality, Intellectual Property, and Licensing

Software for Sensitive Information

The following are tools available to UC Irvine researchers who are collecting and managing patient health or other sensitive information. It is not recommended that you collect sensitive data using Excel. Use Excel only for analysis of de-identified or anonymized data.

REDCap

REDCap (Research Electronic Data Capture) is an application for building and managing online databases. REDCap provides a web-based interface for collecting data with data validation and includes the ability for automated export to statistical packages. The software also includes data logging for HIPAA compliance and the ability for administrators to define access rights on a per-user basis. Data stored in production REDCap databases is not automatically purged, but archiving of completed projects within REDCap is recommended. If the REDCap service is replaced or discontinued, all project owners will be notified and a plan will be devised that allows ample time for owners to export their data. The Institute for Clinical and Translational Sciences is a REDCap partner.

Qualtrics

Qualtrics is an online survey tool with customizable templates, the ability to send and track invitations and reminders, and in-depth reporting. The service includes the ability to generate reports, view statistics, and export data for analysis. Qualtrics may be used to store and transmit Low and Moderate Risk Data, as well as High Risk Data containing protected health information (PHI). It may not be used to store or transmit other types of High Risk Data, that is, High Risk Data that is not PHI.

Eventually, the EEE Survey tool will be retired and replaced with a campus-wide Qualtrics implementation. If you have any questions, please contact OIT.

Storage for Sensitive Data

Data classifications

Data can be classified as either low, moderate, or high risk.

  • Low: Data that is not moderate or high risk and is intended for public disclosure or whose loss of confidentiality, integrity, or availability would have no adverse impact on UCI's mission, safety, finances, or reputation. This includes but is not limited to research data, UCNet IDs, and other information in the public domain.
  • Moderate: Data that is not high risk and is not generally available to the public or whose loss of confidentiality, integrity, or availability could have a mildly adverse impact on UCI's mission, safety, finances, or reputation. This includes but is not limited to unpublished research data, student records and applications, personnel files, non-public policies and contracts, and PTA numbers.
  • High: Data that is protected by law or regulation; data for which UCI is required to self-report to the government and/or notify the individual when it has been inappropriately accessed; or data whose loss of confidentiality, integrity, or availability could have a significantly adverse impact on UCI's mission, safety, finances, or reputation. This includes but is not limited to health information, social security numbers, credit card and financial information, passport and visa numbers, and driver's license numbers.
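As a rough illustration of how these levels might be applied to a whole dataset, the Python sketch below looks up a few of the example data types listed above and reports the highest level found. Two things here are assumptions rather than UCI policy: the rule that a dataset is as risky as its most sensitive element, and the choice to treat unknown data types as High so that they get reviewed. The names `RiskLevel`, `EXAMPLES`, and `dataset_risk` are invented for this example.

```python
from enum import IntEnum


class RiskLevel(IntEnum):
    LOW = 1
    MODERATE = 2
    HIGH = 3


# A few example data types from the list above (not an official or complete mapping).
EXAMPLES = {
    "information in the public domain": RiskLevel.LOW,
    "unpublished research data": RiskLevel.MODERATE,
    "student records": RiskLevel.MODERATE,
    "personnel files": RiskLevel.MODERATE,
    "health information": RiskLevel.HIGH,
    "social security numbers": RiskLevel.HIGH,
    "credit card and financial information": RiskLevel.HIGH,
    "driver's license numbers": RiskLevel.HIGH,
}


def dataset_risk(data_types):
    """Return the highest risk level among the data types in a dataset.

    Assumption: unknown data types are treated as HIGH until someone reviews them.
    An empty dataset is reported as LOW.
    """
    levels = (EXAMPLES.get(t, RiskLevel.HIGH) for t in data_types)
    return max(levels, default=RiskLevel.LOW)


survey = ["unpublished research data", "student records"]
clinical = ["unpublished research data", "health information"]
print(dataset_risk(survey).name)    # MODERATE
print(dataset_risk(clinical).name)  # HIGH
```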

Options for secure storage and backup

  • Campus Research Storage Pool: Will provide a high-performance, reliable, and secure storage service to all campus research groups and users. It will start with approximately 1,000 TB (1 petabyte, or PB) of storage, with expansion to several PB. Each campus research faculty group that requests space will be provided with 1 TB; extended capacity will be provided on a recharge basis. Multiple types of access to the system (web, NFS, ‘drag and drop’, etc.) will be provided for ease of use. Data management and curation services and support will be provided as well. Available in 2019.

  • SRE – Secure Research Environment Pilot Service: SRE is a service being piloted by research groups in the Schools of Social Ecology and Education to provide a FISMA ‘Moderate’ computing and storage environment for research projects that require a higher level of monitoring, logging, and security to meet compliance requirements from funding agencies and UC policies.

Sharing Sensitive Data

Although it may not be possible in all cases, it is a good idea to obtain informed consent from the participants in your study to allow for publication of their anonymized data from the research.

Modifying sensitive data for public release

Sensitive data that contain potentially identifying information -- whether it be human subject data or other types of sensitive data -- will likely need to be modified before they are shared with the public. These modifications are important for protecting participant confidentiality, the location of endangered wildlife, or other relevant interests. However, they may affect the data to the point where reproducibility or additional subsequent research by others is no longer possible. You might consider retaining multiple versions of the data: one that is suitable for public release, and one that is suitable for further research but that is available on a highly restricted basis.
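As a minimal sketch of keeping two versions, the Python below derives a public-release record from a restricted one by dropping fields that identify a person directly. The field names and the sample record are hypothetical; a real project would work from its own codebook and its approved data management plan, and removing these fields alone does not guarantee that a record is anonymous.

```python
# Hypothetical field names for identifying information; adapt to your own codebook.
DIRECT_IDENTIFIER_FIELDS = {
    "name",
    "initials",
    "mailing_address",
    "phone_number",
    "email_address",
    "ssn",
    "date_of_birth",
}


def public_version(record):
    """Return a copy of the record without the identifying fields.

    The original (restricted) record is left unchanged.
    """
    return {
        field: value
        for field, value in record.items()
        if field not in DIRECT_IDENTIFIER_FIELDS
    }


# Made-up example record, kept only in the restricted version.
restricted_record = {
    "name": "Jane Doe",
    "email_address": "jane@example.org",
    "date_of_birth": "1984-05-17",
    "age_group": "30-39",
    "outcome": "improved",
}

public_record = public_version(restricted_record)
print(public_record)  # {'age_group': '30-39', 'outcome': 'improved'}
```

The restricted record stays intact for approved follow-up research, while only the reduced copy is released.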

For protected health information (PHI), the HIPAA Privacy Rule provides two methods for de-identification: the expert determination method and the safe harbor method. See the resource listed below for documentation on these methods from the US Department of Health and Human Services, as well as information on how to satisfy them.

Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule
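To give a sense of what the safe harbor method involves, the Python sketch below applies three of its generalization rules: dates are reduced to the year, ages of 90 and over are grouped into a single category, and ZIP codes are cut to their first three digits, or replaced with 000 when that three-digit area has 20,000 or fewer residents. It does not cover the full list of identifiers that safe harbor requires removing, and it does not handle the further rule that years indicating an age over 89 must also be removed. The function names are invented here, and the set of low-population prefixes is an example only; it would need to be built from current Census data.

```python
from datetime import date

# Example prefixes only; build the real set from current Census population data.
LOW_POPULATION_ZIP3 = {"036", "059"}


def generalize_zip(zip_code: str) -> str:
    """Keep the first three digits, or return '000' for low-population areas."""
    prefix = zip_code.strip()[:3]
    return "000" if prefix in LOW_POPULATION_ZIP3 else prefix


def generalize_date(d: date) -> int:
    """Keep only the year of a date related to an individual."""
    return d.year


def generalize_age(age: int) -> str:
    """Group all ages over 89 into a single '90+' category."""
    return "90+" if age >= 90 else str(age)


print(generalize_zip("92697"))              # 926
print(generalize_date(date(1984, 5, 17)))   # 1984
print(generalize_age(93))                   # 90+
```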

Types of identifying information

Identifying information is classified as one of two types: direct and indirect.

Direct identifiers

These data point directly to an individual and are typically removed from data sets before sharing with the public.

These may include:

  • name
  • initials
  • mailing address
  • phone number
  • email address
  • unique identifying numbers, like Social Security numbers or driver's license numbers
  • vehicle identifiers
  • medical device identifiers
  • web or IP addresses
  • biometric data
  • photographs of the person
  • audio recordings
  • names of relatives
  • dates specific to an individual, like date of birth or marriage

Indirect identifiers

These may seem harmless on their own, but can point to an individual when combined with other data. It has been recommended (see BMJ article reference below) that datasets containing three or more indirect identifiers should be reviewed by an independent researcher or ethics committee to evaluate identification risk. Any indirect information not needed for the analysis should be removed. It may be reasonable to supply some of these types of data in aggregated form (like ranges of annual incomes instead of exact numbers).
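One way to supply data in aggregated form is to replace exact values with ranges. The Python sketch below does this for annual income; the function name, the 25,000 band width, and the 200,000 top category are arbitrary choices for illustration, and suitable bands depend on the dataset.

```python
def income_range(income: int, width: int = 25_000, top: int = 200_000) -> str:
    """Replace an exact annual income with the range it falls in.

    Incomes at or above `top` share one open-ended category, so that
    unusually high values do not single out an individual.
    """
    if income >= top:
        return f"{top}+"
    lower = (income // width) * width
    return f"{lower}-{lower + width - 1}"


print(income_range(61_500))   # 50000-74999
print(income_range(350_000))  # 200000+
```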

Indirect identifiers may include:

  • place of medical treatment or doctor's name
  • gender
  • rare disease or treatment
  • sensitive data like illicit drug use or other "risky behaviors"
  • place of birth
  • socioeconomic data, like workplace, occupation, annual income, and education
  • general geographic indicators, like postal code of residence
  • household and family composition
  • ethnicity
  • birth year or age
  • verbatim responses or transcripts
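The "three or more indirect identifiers" guideline mentioned above can be turned into a quick check on a dataset's column names. This Python sketch is only a screening aid under assumed, hypothetical column names; it cannot judge identification risk itself, which is the job of the independent reviewer or ethics committee.

```python
# Hypothetical column names corresponding to the indirect identifiers listed above.
INDIRECT_IDENTIFIER_COLUMNS = {
    "place_of_treatment",
    "doctor_name",
    "gender",
    "rare_disease",
    "place_of_birth",
    "occupation",
    "annual_income",
    "education",
    "postal_code",
    "household_composition",
    "ethnicity",
    "birth_year",
    "age",
}


def needs_review(columns, threshold=3):
    """Return (flag, found): whether the dataset has at least `threshold`
    indirect identifiers, and which ones were found (sorted by name)."""
    found = sorted(set(columns) & INDIRECT_IDENTIFIER_COLUMNS)
    return len(found) >= threshold, found


flag, found = needs_review(["gender", "ethnicity", "birth_year", "outcome"])
print(flag, found)  # True ['birth_year', 'ethnicity', 'gender']
```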

Resources

"Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule," US Department of Health and Human Services, Office for Civil Rights.

Hrynaszkiewicz, I, Norton, ML, Vickers, AJ and Altman, DG. "Preparing raw clinical data for publication: guidance for journal editors, authors, and peer reviewers." BMJ 2010;340:c181.

"Preparing Data for Sharing" from the Inter-University Consortium for Political and Social Research (ICPSR). (2012). Guide to Social Science Data Preparation and Archiving: Best Practice Throughout the Data Life Cycle (5th ed.). Ann Arbor, MI. 
