Good data organization is the foundation of any research project. Most researchers begin their projects with data stored in spreadsheets. Computers, however, have specific requirements for data organization. To use tools that make analysis more efficient, such as the programming languages R and Python, researchers need to structure their data in a form that computers can read.
In this section, you will learn:
Much of your time as a researcher will be dedicated to the initial 'data wrangling' stage, during which you must organize the data to facilitate proper analysis later on. Learning strategies for effective data organization can improve the formatting of existing data and help plan new data collection methods for more efficient data wrangling.
You need to download some files to follow this lesson:
1. Download the following three files:
2. Place these three files in a folder you can easily find and access on your computer.
Microsoft provides Microsoft Office 365 ProPlus to UC Irvine students at no cost through a campus agreement program (MCCA). This agreement gives current students the latest version of the full Microsoft Office suite for their personally owned computers, smartphones, and tablets, along with 1 TB of OneDrive cloud storage.
Full-time faculty and staff whose departments are enrolled in MCCA licensing are eligible to install Microsoft Office on their personal devices and computers. Note that all installations create an ongoing financial responsibility for your department. Your license may be revoked for non-payment.
To install Microsoft Excel on your personal computing device through UCI, check out the Office of Information Technology's Microsoft 365 page.
If you do not have or do not want to use Microsoft Excel, you can use LibreOffice. It is a free, open source spreadsheet program.
Windows
macOS
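Linux (use the command for your distribution's package manager; it may need to be run as root or with sudo)
Arch Linux: pacman -S libreoffice
Fedora, RHEL, or CentOS: yum install libreoffice
Debian or Ubuntu: apt install libreoffice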
cleaned data - data that has been manipulated post-collection to remove errors or inaccuracies, introduce desired formatting changes, or otherwise prepare the data for analysis
conditional formatting - formatting that is applied to a specific cell or range of cells depending on a set of criteria
CSV (comma separated values) format - a plain text file format in which values are separated by commas
factor - a variable that takes on a limited number of possible values (i.e. categorical data)
metadata - data which describes other data
null value - a value used to record observations missing from a dataset
observation - a single measurement or record of the object being recorded (e.g. the weight of a particular mouse)
plain text - unformatted text
quality assurance - any process which checks data for validity during entry
quality control - any process which removes problematic data from a dataset
raw data - data that has not been manipulated and represents actual recorded values
rich text - formatted text (e.g. text that appears bolded, colored or italicized)
string - a collection of characters (e.g. “thisisastring”)
TSV (tab separated values) format - a plain text file format in which values are separated by tabs
variable - a category of data being collected on the object being recorded (e.g. a mouse’s weight)
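Several of the terms above (CSV, TSV, plain text, string, null value, observation, variable) can be seen together in a short Python sketch. The mouse data, the column names, and the helper function `read_rows` are made up for illustration and are not part of the lesson files; the sketch also assumes that an empty field is how a missing weight was recorded, which is only one possible convention.

```python
import csv
import io

# Hypothetical plain-text data: one row per observation (a mouse),
# one column per variable. The second mouse has no recorded weight.
csv_text = "mouse_id,sex,weight_g\nM01,F,21.4\nM02,M,\n"

# The same data in TSV form: only the separator changes.
tsv_text = csv_text.replace(",", "\t")


def read_rows(text, delimiter):
    """Parse delimited plain text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


csv_rows = read_rows(csv_text, ",")
tsv_rows = read_rows(tsv_text, "\t")

# Both formats carry the same observations.
print(csv_rows == tsv_rows)

# Every value is read in as a string; here an empty string stands in
# for a null value (an assumption made for this example).
missing = [row["mouse_id"] for row in csv_rows if row["weight_g"] == ""]
print(missing)
```

Because both formats are plain text, any formatting applied in a spreadsheet (bold, color, conditional formatting) is not stored in them; only the values themselves are saved.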