Final project
Format of final project
For your final project, you must work together in a group of 4-5 to produce a substantive product in R. What “substantive product” means can vary, but I expect most projects will be:
An R package
This is what I envisioned as the final project for this course when I was developing it. An R package is just a group of functions, data, and documentation that can be easily installed. R packages that make it to CRAN need to be “serious” and useful to more people than just you. But, there are many, many R packages that don’t live on CRAN. For example, many packages live on GitHub. That’s the sort that I’m envisioning. Some broad ideas of types of packages you might want to contemplate:
- a package that is mostly about sharing interesting data with people who might want to use it, like the fivethirtyeight package
- a package that is mostly about packaging up some data analysis you have done, like a research compendium
Of course, you could also think of a new statistical idea that needs to be implemented as an R package and do that, but I think that’s probably outside the scope of the class.
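To make “functions, data, and documentation” a little more concrete, here is a minimal sketch of what one documented function in a package's R/ directory might look like. The function count_words() is made up for this illustration and is not part of any assignment; the #' lines are roxygen2 comments, which devtools::document() turns into help pages.

```r
#' Count the words in each element of a character vector
#'
#' A hypothetical helper, used here only to illustrate roxygen2 comments.
#'
#' @param text A character vector.
#' @return An integer vector with one word count per element of `text`.
#' @examples
#' count_words(c("hello world", "one two three"))
#' @export
count_words <- function(text) {
  stopifnot(is.character(text))
  trimmed <- trimws(text)
  # Split each string on runs of whitespace and count the pieces
  counts <- lengths(strsplit(trimmed, "\\s+"))
  # Strings that are empty after trimming contain zero words
  counts[!nzchar(trimmed)] <- 0L
  counts
}
```

A data package follows the same pattern, except that the roxygen2 comments describe a dataset rather than a function.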
A shiny app
Another possibility for a substantive product made using R is a shiny app. Shiny apps are reactive web applications made using R code. They get used in business quite frequently, because they make it easy to generate a dashboard. For this type of project, you would probably do some cool data analysis with an interactive component. You may want to look through the gallery of examples to get some idea of what Shiny apps can do.
Something else??
I suppose it’s possible there are additional possibilities for final projects. If you have a brilliant idea, talk to me!
Places to find data
This class is a little different from other statistics classes in that you don’t need to find numeric data at all. So, the places you can look for data are broader! Here are a few ideas to get you started:
- Data is Plural tinyletter and associated spreadsheet
- FiveThirtyEight data archive
- Kaggle datasets
- Projects on The Pudding
- Blogs of data scientists like David Robinson, Hilary Parker, and Maelle Salmon
- Projects linked on flowingdata
- IRE and NICAR are good resources for the types of data journalists care about. For example, Energy data sources and Chrys Wu’s resource page.
Some less-relevant places that I point students in other classes:
- Data.gov: 186,000+ datasets!
- Social Explorer is a great interface to Census and American Community Survey data (much more user-friendly than the official government sites). Smith has a site license, but you may need to create an account.
- Gallup Analytics (available through the library databases)
- Data and Story Library (DASL). (This, and more ideas from Robin Lock.)
- Jo Hardin at Pomona College has a nice list of data sources on her website.
- U.S. Census Bureau
- Gapminder, data about the world.
- Nathan Yau’s (old) guide to finding data on the internet
Deliverables
Initial project proposal
First, you just need to tell me a little about your final project idea. What type of project is it?
- Package?
  - data package
  - analysis package
- Shiny app?
- Something else?
Where do you expect to find data? Do you need help finding data? What question will your project answer, or what purpose will it serve?
File structure
If you’re doing an R package for your final project, the file structure deliverable is the skeleton of a package. You can get this by running usethis::create_package("packagename"). (Package names may contain only letters, numbers, and periods, so a hyphenated name will be rejected.) Please edit a few of the initial details, like we did in the packages slide deck. For example, fill in the package authors’ names, the name of your package, etc. To get more of a jump on the project, also create a data directory, and put some data in there!
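As a rough sketch of that workflow, the code below creates a skeleton and saves one dataset into it. It assumes the usethis package is installed. The package name mypackage, the temporary location, and the tiny example_data data frame are all placeholders made up for illustration; for the real project you would use your own path and data, and then edit the DESCRIPTION file by hand to fill in the authors and title.

```r
# Assumes usethis is installed: install.packages("usethis")
library(usethis)

# "mypackage" is a placeholder name. A temporary directory keeps this
# sketch away from your real files; use a real path for the project.
pkg_path <- file.path(tempdir(), "mypackage")

# Create the skeleton: DESCRIPTION, NAMESPACE, and an R/ directory.
# open = FALSE stops RStudio from switching to the new project.
create_package(pkg_path, open = FALSE)

# A tiny made-up dataset standing in for your real data
example_data <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))

# Save it as data/example_data.rda inside the new package.
# with_project() temporarily makes the new package the active project.
with_project(pkg_path, use_data(example_data, overwrite = TRUE))

# See what was created
list.files(pkg_path, recursive = TRUE)
```

When you work inside the package project itself, as you normally would in RStudio, the with_project() wrapper is unnecessary and a plain use_data(example_data) is enough.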
If you’re doing a shiny app, the file structure deliverable is a directory with at least an app.R file in it with the three initial pieces of the app: a server object, a ui object, and a call to shinyApp(), as described in the shiny tutorial. To get more of a jump on the project, also create a data directory, and put some data in there!
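For reference, a minimal app.R with those three pieces might look something like the sketch below. It assumes the shiny package is installed. The slider, the histogram, and the IDs "n" and "hist" are placeholders chosen for illustration, not requirements of the assignment.

```r
# app.R: assumes shiny is installed: install.packages("shiny")
library(shiny)

# ui: what the user sees. The IDs "n" and "hist" are arbitrary names
# chosen for this sketch.
ui <- fluidPage(
  titlePanel("Placeholder app"),
  sliderInput("n", "Number of random draws:", min = 10, max = 500, value = 100),
  plotOutput("hist")
)

# server: how outputs are computed from inputs. The plot is redrawn
# whenever input$n changes.
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = "Random normal draws", xlab = "Value")
  })
}

# The call that ties ui and server together. It is assigned to a name here
# so that running this sketch as a plain script does not start a server;
# in a real app.R the usual form is a bare shinyApp(ui = ui, server = server)
# as the last line.
app <- shinyApp(ui = ui, server = server)
```

To try it from the console, runApp(app) launches the app in a browser window.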
Presentation
For your presentation, I would like you to walk us through the product you created. You can do your presentation however you would like: with slides, by pulling up RStudio and clicking through things, whatever! When I do presentations like this, I try to have screenshots and code on slides as a backup in case of the “curse of the live demo,” although it would be nice to see the code live. I’ll ask your peers to help evaluate your presentation, using this form. It should be clear you’ve practiced, even if you’re doing the (more casual) clickthrough in RStudio.
Presentations should be about 8 minutes long.
Deadlines
Checkpoint | Due Date | Credit | Submission |
---|---|---|---|
Initial project proposal | 4/2 1:30 pm | 5 pts | via GitHub |
File structure | 5/2 1:30 pm | 5 pts | via GitHub |
Presentation | 5/16 or final period | 30 pts | in person |
Final code | 5/24 | 40 pts | via GitHub |
Group Dynamic | 5/24 | 5 pts | via Canvas |