Final project

For your final project, you will do some data analysis and produce a piece of “data communication.” This communication must include some data visualization, plus either writing about data or speaking about data.

You have two deliverables for the project:

  1. The data communication. I imagine this will be one of the following:
    • A written work, such as a piece of data journalism, a blog post, or an executive summary for a company. This will be turned in as either an HTML document (probably easiest to make using Quarto) or a PDF (either rendered from Quarto or created from a Word document).
    • A presentation, either as a recorded video or as a live presentation. This will likely involve slides, created using PowerPoint, Keynote, Quarto, or other slide-creation software. If you do a live presentation, please turn in your slides as an HTML document (easiest to make from Quarto) or a PDF (possible with any software). If you do a recorded video, upload or link to your video! (Slides optional.)
    • Something else?? I’m really flexible here, so if you have a vision for something different, just run it by me. I guess you could do a TikTok or something?
  2. A meta-document of the project. This will be your space to connect the work you did to what you learned in this class. This document will tell me where you found your data, how you cleaned it, and why you made the analysis and visualization decisions that you did. It should be full of citations from the readings from class to back up your decisions. I don’t care what citation format you use (APA, MLA, etc.) but please be consistent. Ask me if you need help determining how to cite something! I would like to know:
    • Why did you choose the topic you did?
    • What is your intended audience? (E.g., the CEO of Uber, readers of the Star Tribune, people subscribed to the r/dataisbeautiful subreddit, etc. Your audience should not be “students in STAT 336” or “Dr. McNamara.”)
    • Where did your data come from? In broad strokes, what did you need to do in order to clean and visualize it?
    • Why did you make the design decisions you did? (E.g., mappings in the visualization, color scheme choices, rounding decisions, specific language in a written piece, images on a PowerPoint slide, etc.)

I also want to see any code you used for your project. Depending on how you create your deliverable (1), you might be submitting a Qmd document that generates (1), or your code might be in your meta-document (2). If you do data cleaning in R, I want to see the code you used. If you use spreadsheet software, you need to submit your original uncleaned dataset as an Excel file, and your “code” will be a description of every spreadsheet operation necessary to reproduce your work. If you make your visualizations in R, I want to see the code you used. If you make visualizations in Tableau, please describe the steps you took in Tableau so that I can reproduce your work.

Here are some rough guidelines for length:

For the data communication,

The meta-document should be at least 500 words, not counting citations.

When I’m grading, I’ll be looking for the following things:

  1. For the data communication, I’ll be looking mostly to see if your finished product looks finished, and whether it seems appropriate for the audience you describe in your meta-document. I’ll also look for:
    • Titles and axis labels on plots. These should be polished, not the default variable names that ggplot2 or Tableau sticks in when you don’t specify them.
    • Encoding choices. Did you stick with default colors, or make them more appropriate for your audience? Does the visualization you chose make sense for the data?
    • Consistency across the product. If you have multiple visualizations, do they hang together? (E.g., are the color choices consistent across plots, or do they change seemingly without reason? Are axes consistent for comparison? If there are tables in addition to visualizations, do they appear visually consistent with the plots? In a presentation, are the headings consistent across slides?)
    • Typos/copyediting. This goes for writing in a written piece or on slides in a presentation, titles on graphs, etc. Remember, RStudio has spellcheck; you just need to click the green checkmark button!
  2. For the meta-document, I will be considering whether your analysis decisions make sense and are well-justified.
    • Are all your observations the same type of thing? Did you exclude cases that should be excluded? Did you include all the cases that should be included?
    • Did you do appropriate data cleaning? Are there any obvious mistakes in your coding?
    • Are the analysis decisions well-documented? Can I reproduce your analysis, either with code or software?
    • Are your visualization and communication decisions backed up by citations? For example, if you chose not to round numbers, do you have a reason why? If you chose a non-standard color scheme, do you explain it?
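
As a rough illustration of the “polished titles and axis labels” item above, here is a minimal sketch in R. It assumes you are using ggplot2, and it uses the mpg dataset that ships with ggplot2 as a stand-in for your own data. The label wording and the choice of variables are only examples, not requirements.

```r
library(ggplot2)

# mpg ships with ggplot2; it stands in here for your own cleaned dataset.
p <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  # Replace the default variable names (displ, hwy, drv) with reader-facing text.
  labs(
    title = "Cars with larger engines tend to get fewer highway miles per gallon",
    x = "Engine displacement (liters)",
    y = "Highway fuel economy (miles per gallon)",
    color = "Drive train"
  ) +
  # Spell out the coded legend values (4, f, r) for a general audience.
  scale_color_discrete(
    labels = c("4" = "Four-wheel", "f" = "Front-wheel", "r" = "Rear-wheel")
  )

p
```

Without the labs() call, the axes and legend would show the raw column names, which is the default behavior the grading item refers to.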