```{r}
library(tidyverse)
library(skimr)
```
Final project data
I would like you to have identified a dataset for your final project, and tried loading it into R and/or Tableau. This will allow you to identify any data-cleaning or -wrangling issues you will need to address before you get to visualization.
I think everyone is going to need to use R for their process document, so this code will become the first few lines of that Quarto document. For example, this amount of code is sufficient:
---
title: "Final project data"
author: "Professor McNamara"
format: html
editor: visual
embed-resources: true
---
```{r}
#| message: false
GSS <- read_csv("GSS_raw.csv")
skim(GSS)
```
I’d like you to look carefully at the data you are considering for your project, and write some sentences about any issues you see. (No need to fix them yet!)
- Think about the mappings you might want to make in the project. Is the data in the right format for those mappings? If not, will you need a `pivot_longer()` or a `pivot_wider()` to fix it?
- Determine if the variables you have are the appropriate variable type. Does something you want to use as a numeric variable look like a character string?
- See if there are any missing values, and if so, whether they are coded correctly. For example, R likes missing values to be coded `NA`, but some data providers use 9999 to mean missing.
- Are the variable names usable? Do they have spaces in them that will make it hard to refer to them in R?
- For categorical variables, consider the categories. Are they readable? Are they understandable phrases or cryptic codes? Are there the right number of categories, or will you want to reduce them?
These are just examples. You don’t necessarily need to address every single question if it is not appropriate for your data, and you will probably see other things in your particular dataset that aren’t covered here.
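If it helps to see what a few of these checks might look like in code, here is a minimal sketch. It uses a made-up toy dataset rather than a real file: the `survey` tibble, its column names, and the 9999 missing-value code are all assumptions for illustration, so swap in your own data and variable names. The first four steps only diagnose issues. The last step previews what a later fix could look like, which you do not need to do yet.

```r
library(tidyverse)

# Hypothetical toy data standing in for a real project dataset
survey <- tibble(
  `Respondent ID` = c(1, 2, 3),
  `Income 2020`   = c("41000", "9999", "58000"),
  `Income 2021`   = c("43000", "60000", "9999"),
  region          = c("NE", "S", "NE")
)

# Variable names: which ones contain spaces (and so need backticks in R)?
spaced_names <- names(survey)[str_detect(names(survey), " ")]
spaced_names

# Variable types: the income columns were stored as character, not numeric
glimpse(survey)

# Missing values: count cells that use the 9999 code instead of NA
survey |>
  summarize(across(starts_with("Income"), ~ sum(.x == "9999")))

# Categories: how many are there, and are the labels readable?
survey |>
  count(region)

# Preview of a later fix: one row per respondent-year,
# with income as a number and 9999 recoded to NA
survey_long <- survey |>
  pivot_longer(
    cols = starts_with("Income"),
    names_to = "year",
    names_prefix = "Income ",
    values_to = "income"
  ) |>
  mutate(
    year = as.integer(year),
    income = na_if(as.numeric(income), 9999)
  )

survey_long
```

For this assignment, a sentence or two describing what each check shows is enough, for example noting that the income columns are stored as text and that 9999 appears where a value is missing.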