The panel on the lower left is where the action happens. It's called the *console*.
Every time you launch RStudio, it will have the same text at the top of the
console telling you the version of R that you're running. Below that information
is the *prompt*. As its name suggests, this prompt is really a request: a request
for a command. Initially, interacting with R is all about typing commands
and interpreting the output. These commands and their syntax have evolved over
decades (literally) and now provide what many users feel is a fairly natural way
to access data and organize, describe, and invoke statistical computations.

The panel in the upper right contains your *workspace* as well as a history of
the commands that you've previously entered.

Any plots that you generate will show up in the panel in the lower right corner.
This is also where you can browse your files, access help, manage packages, etc.

### R Packages

R is an open-source programming language, meaning that users can contribute
packages that make our lives easier, and we can use them for free. For this lab,
and many others in the future, we will use the following R packages:

- `dplyr`: for data wrangling
- `ggplot2`: for data visualization
- `oilabs`: for data and custom functions with the OpenIntro labs

If these packages are not already available in your R environment,
install them by typing the following three lines of code into
the console of your RStudio session, pressing the enter/return key after each one.
Note that you can check to see which packages (and which versions) are installed by
inspecting the *Packages* tab in the lower right panel of RStudio.

```{r install-packages, message = FALSE, eval=FALSE}
install.packages("dplyr")
install.packages("ggplot2")
install.packages("oilabs")
```

You may need to select a server from which to download; any of them will work.
Next, you need to load these packages in your working environment. We do this with
the `library` function.
Run the following three lines in your console.

```{r load-packages, message = FALSE, eval=TRUE}
library(dplyr)
library(ggplot2)
library(oilabs)
```

Note that you only need to *install* packages once, but you need to *load*
them each time you relaunch RStudio.

### Creating a reproducible lab report

We will be using R Markdown to create reproducible lab reports. See the
following videos describing why and how:

[**Why use R Markdown for Lab Reports?**](https://youtu.be/lNWVQ2oxNho)

[**Using R Markdown for Lab Reports in RStudio**](https://youtu.be/o0h-eVABe9M)

Going forward you should refrain from typing your code directly in the console, and
instead type any code (final correct answer, or anything you're just trying out) in
the R Markdown file and run the chunk using either the Run button on the chunk
(green sideways triangle) or by highlighting the code and clicking Run on the top
right corner of the R Markdown editor. If at any point you need to start over, you
can Run All Chunks above the chunk you're working in by clicking on the down
arrow in the code chunk.

## Dr. Arbuthnot's Baptism Records

To get you started, run the following command to load the data.

```{r load-abrbuthnot-data, eval=TRUE}
data(arbuthnot)
```

You can do this by

- clicking on the green arrow at the top right of the code chunk in the R Markdown (Rmd)
file, or
- putting your cursor on this line, and hitting the **Run** button on the upper right
corner of the pane, or
- hitting `Ctrl-Shift-Enter`, or
- typing the code in the console.

This command instructs R to load some data:
the Arbuthnot baptism counts for boys and girls. You should see that the workspace
area in the upper righthand corner of the RStudio window now lists a data set called
`arbuthnot` that has 82 observations on 3 variables. As you interact with R, you
will create a series of objects.
Sometimes you load them as we have done here, and sometimes you create them yourself as the byproduct of a computation or some analysis you have performed. The Arbuthnot data set refers to Dr. John Arbuthnot, an 18

**A note on piping:** Note that we can read these three lines of code as the following:
*"Take the `arbuthnot` dataset and **pipe** it into the `mutate` function.
Mutate the `arbuthnot` data set by creating a new variable called `total` that is the sum of the variables
called `boys` and `girls`. Then assign the resulting dataset to the object
called `arbuthnot`, i.e. overwrite the old `arbuthnot` dataset with the new one
containing the new variable."*
This is equivalent to going through each row and adding up the `boys`
and `girls` counts for that year and recording that value in a new column called
`total`.
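
To make the reading above concrete, here is a minimal sketch that compares the piped form with the equivalent call written without the pipe. It uses a small toy data frame named `toy` (a hypothetical name, not part of the lab data); only the 1629 row uses counts quoted in this lab, and the other rows are made up for illustration.

```{r sketch-pipe-equivalence}
library(dplyr)

# Toy data: the 1629 row uses the counts quoted in this lab;
# the other two rows are made-up values for illustration only.
toy <- data.frame(
  year  = c(1629, 1630, 1631),
  boys  = c(5218, 5000, 4500),
  girls = c(4683, 4600, 4100)
)

# Piped form: take `toy`, then mutate it.
piped <- toy %>%
  mutate(total = boys + girls)

# Same computation without the pipe: `toy` is passed as the first argument.
nested <- mutate(toy, total = boys + girls)

identical(piped, nested)  # TRUE: both forms give the same data frame
piped$total               # 9901 9600 8600
```

The pipe simply hands the object on its left to the function on its right as the first argument, which is why the two forms agree.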

**Where is the new variable?** When you make changes to variables in your dataset,
click on the name of the dataset again to update it in the data viewer.

You'll see that there is now a new column called `total` that has been tacked on
to the data frame. The special symbol `<-` performs an *assignment*, taking the
output of one line of code and saving it into an object in your workspace. In
this case, you already have an object called `arbuthnot`, so this command updates
that data set with the new mutated column.
We can make a plot of the total number of baptisms per year with the command
```{r plot-total-vs-year}
qplot(x = year, y = total, data = arbuthnot, geom = "line")
```
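
Recent releases of `ggplot2` (3.4.0 and later) mark `qplot` as deprecated, so the command above may print a warning even though it still draws the plot. As a hedged alternative, the sketch below builds the same kind of line plot with `ggplot`, using a toy data frame `toy_totals` (a hypothetical name; only the 1629 total comes from the counts quoted in this lab, the rest are made up).

```{r sketch-ggplot-line}
library(ggplot2)

# Toy totals: 9901 is 5218 + 4683 from 1629; the other values are made up.
toy_totals <- data.frame(
  year  = c(1629, 1630, 1631),
  total = c(9901, 9600, 8600)
)

# Map year to the x axis and total to the y axis, then draw a line.
p <- ggplot(toy_totals, aes(x = year, y = total)) +
  geom_line()
p
```

To apply this to the lab data, replace `toy_totals` with `arbuthnot`.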
Similarly to how we computed the total number of births, we can compute the ratio
of the number of boys to the number of girls baptized in 1629 with
```{r calc-prop-boys-to-girls-numbers}
5218 / 4683
```
or we can act on the complete columns with the expression
```{r calc-prop-boys-to-girls-vars}
arbuthnot <- arbuthnot %>%
  mutate(boy_to_girl_ratio = boys / girls)
```
We can also compute the proportion of newborns that are boys in 1629 with
```{r calc-prop-boys-numbers}
5218 / (5218 + 4683)
```
or we can compute it for all years simultaneously and append it to the dataset:
```{r calc-prop-boys-vars}
arbuthnot <- arbuthnot %>%
  mutate(boy_ratio = boys / total)
```
Note that we are using the new `total` variable we created earlier in our calculations.
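
As a side note, a single `mutate` call can create several variables, and later ones may refer to earlier ones in the same call. The sketch below illustrates this on a toy data frame `toy` (a hypothetical name; the 1629 row uses the counts quoted in this lab, the second row is made up).

```{r sketch-mutate-chain}
library(dplyr)

# Toy data: the first row is 1629 from this lab, the second is made up.
toy <- data.frame(
  year  = c(1629, 1630),
  boys  = c(5218, 5000),
  girls = c(4683, 4600)
)

toy <- toy %>%
  mutate(
    total     = boys + girls,  # created first
    boy_ratio = boys / total   # can use `total` right away
  )

round(toy$boy_ratio, 3)  # 0.527 0.521
```
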
3. Now, generate a plot of the proportion of boys born over time. What do you see?
**Tip:** If you use the up and down arrow keys, you can scroll through your
previous commands, your so-called command history. You can also access it
by clicking on the history tab in the upper right panel. This will save
you a lot of typing in the future.

Finally, in addition to simple mathematical operators like subtraction and
division, you can ask R to make comparisons like greater than, `>`, less than,
`<`, and equality, `==`. For example, we can ask if boys outnumber girls in each
year with the expression
```{r boys-more-than-girls}
arbuthnot <- arbuthnot %>%
  mutate(more_boys = boys > girls)
```
This command adds a new variable to the `arbuthnot` data frame containing the value
`TRUE` if that year had more boys than girls, or `FALSE` if that year
did not (the answer may surprise you). This variable contains a different kind of
data than we have encountered so far. All other columns in the `arbuthnot` data
frame have values that are numerical (the year, the number of boys and girls). Here,
we've asked R to create *logical* data, data where the values are either `TRUE`
or `FALSE`. In general, data analysis will involve many different kinds of data
types, and one reason for using R is that it is able to represent and compute
with many of them.
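
One practical consequence of logical data is that R treats `TRUE` as 1 and `FALSE` as 0 in arithmetic, so logical values can be counted and averaged. A minimal sketch, using a made-up vector `more_boys_toy` (a hypothetical name, not taken from the Arbuthnot data):

```{r sketch-logical-summary}
# Made-up logical values for illustration; not taken from the data.
more_boys_toy <- c(TRUE, FALSE, TRUE, TRUE)

class(more_boys_toy)  # "logical"
sum(more_boys_toy)    # 3: the number of TRUE values
mean(more_boys_toy)   # 0.75: the proportion of TRUE values
```
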
* * *
## More Practice
In the previous few pages, you recreated some of the displays and preliminary
analysis of Arbuthnot's baptism data. Your assignment involves repeating these
steps, but for present day birth records in the United States. Load the
present day data with the following command.
```{r load-present-data}
data(present)
```
The data are stored in a data frame called `present`.
4. What years are included in this data set? What are the dimensions of the
data frame? What are the variable (column) names?
5. How do these counts compare to Arbuthnot's? Are they of a similar magnitude?
6. Make a plot that displays the proportion of boys born over time. What do you see?
Does Arbuthnot's observation about boys being born in greater proportion than girls
hold up in the U.S.? Include the plot in your response. *Hint:* You should be
able to reuse your code from Ex 3 above, just replace the dataframe name.
7. In what year did we see the largest total number of births in the U.S.? *Hint:*
First calculate the totals and save them as a new variable. Then, sort your
dataset in descending order based on the total column. You can do this
interactively in the data viewer by clicking on the arrows next to the
variable names. To include the sorted result in your report you will need
to use two new functions: `arrange` (for sorting) and `desc` (for descending
order). Sample code is provided below.
```{r eval=FALSE}
present %>%
  arrange(desc(total))
```
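
The sample code above assumes that a `total` column already exists. As a rough sketch of the whole sequence, the chunk below computes the totals and then sorts, using a toy data frame `toy_births` whose name and counts are entirely hypothetical (they are not values from `present`).

```{r sketch-arrange-desc}
library(dplyr)

# Hypothetical counts for illustration only; not the real `present` data.
toy_births <- data.frame(
  year  = c(2001, 2002, 2003),
  boys  = c(105, 110, 100),
  girls = c(100, 104, 97)
)

sorted <- toy_births %>%
  mutate(total = boys + girls) %>%  # totals: 205, 214, 197
  arrange(desc(total))              # largest total first

sorted$year  # 2002 2001 2003
```
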
These data come from reports by the Centers for Disease Control. You can learn more about them
by bringing up the help file using the command `?present`.
This is a product of OpenIntro that is released under a
[Creative Commons Attribution-ShareAlike 3.0 Unported](http://creativecommons.org/licenses/by-sa/3.0).
This lab was adapted for OpenIntro by Andrew Bray and Mine Çetinkaya-Rundel
from a lab written by Mark Hansen of UCLA Statistics.

* * *
## Resources for learning R and working in RStudio
That was a short introduction to R and RStudio, but we will provide you with more
functions and a more complete sense of the language as the course progresses.
In this course we will be using R packages called `dplyr` for data wrangling
and `ggplot2` for data visualization. If you are googling for R code, make sure
to also include these package names in your search query. For example, instead
of googling "scatterplot in R", google "scatterplot in R with ggplot2".
These cheatsheets may come in handy throughout the semester:
- [RMarkdown cheatsheet](http://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf)
- [Data wrangling cheatsheet](http://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf)
- [Data visualization cheatsheet](http://www.rstudio.com/wp-content/uploads/2015/12/ggplot2-cheatsheet-2.0.pdf)
Chester Ismay has put together a resource for new users of R, RStudio, and R Markdown
[here](https://ismayc.github.io/rbasics-book). It includes GIF recordings that show
how to work with R Markdown files in RStudio.
Note that some of the code on these cheatsheets may be too advanced for this course;
however, the majority of it will become useful throughout the semester.