DSS Social Sciences Workshop Resources


Data Wrangling with dplyr and tidyr

Objectives:

1. Describe the purpose of an R package and of the dplyr and tidyr packages.
2. Select certain columns in a dataframe with the dplyr function select.
3. Select certain rows in a dataframe according to filtering conditions with the dplyr function filter.
4. Link the output of one dplyr function to the input of another function with the ‘pipe’ operator %>%.
5. Add new columns to a dataframe that are functions of existing columns with mutate.
6. Use the split-apply-combine concept for data analysis.
7. Use group_by and count to split a dataframe into groups of observations.
8. Apply summary statistics to each group with summarize, and then combine the results.
9. Describe the wide and long table formats and the purposes for which each format is useful.
10. Describe the roles of variable names and their associated values when a table is reshaped.
11. Reshape a dataframe from long to wide format and back with the pivot_wider and pivot_longer commands from tidyr.
12. Export a dataframe to a CSV file.
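
As a rough illustration of the dplyr objectives above, the following sketch chains select, filter, and mutate with the pipe, then applies split-apply-combine with group_by, summarize, and count, and finally writes the result to a CSV file. The `households` dataframe and its column names are invented for this example and are not part of the workshop data.

```r
library(dplyr)

# Hypothetical toy data; names and values are invented for illustration
households <- data.frame(
  village = c("A", "A", "B", "B", "C"),
  members = c(3, 7, 4, 10, 6),
  rooms   = c(1, 3, 2, 4, 2)
)

# Choose columns, keep rows matching a condition, and add a derived column,
# passing each result to the next function with the pipe
per_room <- households %>%
  select(village, members, rooms) %>%
  filter(members > 3) %>%
  mutate(people_per_room = members / rooms)

# Split-apply-combine: one summary row per village
by_village <- households %>%
  group_by(village) %>%
  summarize(mean_members = mean(members), n_households = n())

# count() tallies the rows in each group in a single step
village_counts <- households %>% count(village)

# Export to CSV with base R; a temporary file keeps the sketch side-effect free
out_file <- tempfile(fileext = ".csv")
write.csv(by_village, out_file, row.names = FALSE)
```

In a real analysis, `out_file` would typically be a path inside the project's output folder rather than a temporary file.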

Key Points:

1. Use the dplyr package to manipulate dataframes.
2. Use select() to choose variables from a dataframe.
3. Use filter() to choose data based on values.
4. Use group_by() and summarize() to work with subsets of data.
5. Use mutate() to create new variables.
6. Use the tidyr package to change the layout of dataframes.
7. Use pivot_wider() to go from long to wide format.
8. Use pivot_longer() to go from wide to long format.
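
The two reshaping functions can be sketched on a small example. This is a minimal illustration, assuming a made-up `long` dataframe with one row per household and item; the data and column names are hypothetical.

```r
library(tidyr)

# Hypothetical long-format data: one row per household-item pair
long <- data.frame(
  household = c(1, 1, 2, 2),
  item      = c("radio", "bicycle", "radio", "bicycle"),
  owned     = c(TRUE, FALSE, TRUE, TRUE)
)

# Long to wide: the values of `item` become column names,
# and the values of `owned` fill the cells
wide <- pivot_wider(long, names_from = item, values_from = owned)

# Wide to long: the item columns are gathered back into
# a name column (`item`) and a value column (`owned`)
long_again <- pivot_longer(
  wide,
  cols      = c(radio, bicycle),
  names_to  = "item",
  values_to = "owned"
)
```

Here `names_from` and `names_to` handle the variable names, while `values_from` and `values_to` handle their associated values, which is the distinction the reshaping objectives refer to.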
