Your assignment is to create a choropleth map of an issue. It should incorporate a shapefile (such as states or zipcodes) and some data you believe it makes journalistic sense to map. You cannot use the Happiness polling data I demoed on, but you can use the state shapefile with a different Gallup question. You could also choose to go further and get zipcode shapefiles and visualize BLL data, or any other shapefile/data connection you can dream up!
Like the data visualization assignment, I would like a little text to support the map. Think of a headline/title, and 2-3 sentences explaining the lede/nutgraph of your mapped story. This can be submitted as part of the same document, or separately.
When you have geographic data! Maps are very popular in data journalism because almost all readers have some geographic context, so they can “see themselves” in the map.
When the geographic information doesn’t add anything to the story. Matthew Ericson has some thoughts about when maps shouldn’t be maps.
There are three main types: points, lines, and polygons.
We’ll be focusing on polygons, but most of this stuff generalizes pretty well to the other types of geographic data.
I grabbed some polling data from Gallup for us to map. I chose the response about experiencing happiness, but you could choose a different question if one interests you more.
To get this data, go to the Gallup website. I think this only works from on campus. If that link doesn’t work, try going to the library guide for Data Journalism and clicking through the link there.
Once you’re on the site, you can browse however you’d like. I recommend choosing:
If you are working on the server, there is a three-part process to getting data into R/RStudio. You need to:
If you followed the instructions above, you’ve downloaded the data. Now, you’ll need to upload it to the RStudio server.
The final step is to load the data into R. There are many ways to do this. RStudio has a convenience wizard that I like to use when I make my first attempt to load in data. I’ll often refine the code myself later, but using the wizard makes it less of a guess-and-check process.
To use this,
Here’s the code I ended up with:
library(readxl)
GallupAnalytics_Export_20180405_102840 <- read_excel("Downloads/GallupAnalytics_Export_20180405_102840.xlsx", skip=6)
Now, we have a flat, “tidy” file of people’s happiness in the States. But, it doesn’t have any polygons.
There’s one more piece of data you will need to make a map, and that’s a “shapefile” (basically, a set of outlines of whatever polygon you’re interested in).
In this case, we need the outlines of the states to be able to plot them. I got my shapefile by googling “census states shapefile.” The first result was this, so I scrolled down to State and clicked that.
There are files from every year, because some boundaries change over time (like voting districts). States are pretty stable, so it doesn’t really matter which year you grab. You can choose the resolution you want, but for our purposes we don’t need anything too fine-grained. I chose 1:500,000 (the “500k” file).
Just like flat files, you need to Download, Upload, and Load your data to get R to know about it. Hopefully you’ve completed the Download part, and have a zip file.
When you Upload to RStudio, upload the zip file as-is, and RStudio should automatically un-zip it into a folder containing a bunch of similarly named files with weird file extensions. Different spatial analysis packages use different numbers of those files, so it’s a good idea to just leave all of them there. Once you have that folder, you can load the data.
The Load part is going to be different for this special (and spatial) data type. We’ll use the old-school way, using readOGR, but there’s a new package called sf that is gaining traction as well. Here’s how readOGR works:
library(rgdal)
states_rgdal <- readOGR("Downloads/cb_2015_us_state_500k/", layer="cb_2015_us_state_500k")
It looks a little repetitive here, but that first quoted string is the filepath to my folder. I stuck my folder right in my “working directory” (check yours by running getwd() in your Console), but you can store the folder anywhere you want as long as you have the filepath correct. If I were working on my local computer, that filepath might be “/Users/amelia/Documents/cb_2015_us_state_500k” (note that filepaths are different on Windows and Mac).
The second argument to the function is the name of the files themselves. To make my life easier, I always just leave these the same as the name of the folder. Notice there’s no trailing slash here. And, even if I had used the full filepath for the first argument, I’d use the same name for this one.
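If readOGR complains that it cannot open the data source, it can help to check the two arguments separately before loading. The lines below are a minimal sketch using only base R; the folder location is the one from my example and is an assumption about where you uploaded your files, and shp_dir, layer_name, and shp_file are just illustrative names.

```r
# Assumed folder location; adjust to wherever you uploaded the shapefile.
shp_dir <- file.path("Downloads", "cb_2015_us_state_500k")

# The layer name is the file name without an extension.
# Here it matches the folder name, so we can reuse it.
layer_name <- basename(shp_dir)

# The .shp file we expect to find inside the folder.
shp_file <- file.path(shp_dir, paste0(layer_name, ".shp"))

getwd()                # relative filepaths are resolved from here
list.files(shp_dir)    # should show the .shp, .shx, .dbf, .prj files, etc.
file.exists(shp_file)  # TRUE if the folder path and layer name line up
```

If file.exists() comes back FALSE, the problem is the filepath or the layer name, not the shapefile itself.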
If you’re on dev-rstudio.edu, you can use the sf package. It reads in shapefiles differently:
library(sf)
states_sf <- st_read("Downloads/cb_2015_us_state_500k/")
Notice that this looks a lot more like a “flat” dataset, but with a geometry column containing the specification for the polygon shape.
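To see what “flat, plus a geometry column” means without loading any files, here is a small sketch that builds a toy sf object by hand. The two shapes, the place names, and the Yes values are all made up for illustration; only the structure mirrors what st_read gives you.

```r
library(sf)

# Two made-up polygons. Each ring must end on the point where it started.
square   <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
triangle <- st_polygon(list(rbind(c(2, 0), c(3, 0), c(2.5, 1), c(2, 0))))

# An sf object is a data frame with one extra list-column holding the shapes.
toy_sf <- st_sf(
  NAME = c("Squareland", "Triangleland"),  # hypothetical place names
  Yes = c(50, 55),                         # hypothetical poll values
  geometry = st_sfc(square, triangle)
)

class(toy_sf)  # includes both "sf" and "data.frame"
names(toy_sf)  # ordinary columns plus "geometry"
```

Because it is still a data frame, the usual dplyr verbs work on it, which is why the join is simpler for sf than for rgdal.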
In order to use a shapefile with another dataset (like the Gallup data we found), we need to join them together.
The way you do the join depends on what datatype your shapefile is in. If you used rgdal, you have to join on the data “slot” (sort of like a variable, but it can contain a whole dataset!). Just follow this code:
library(dplyr)
states_rgdal@data <- left_join(states_rgdal@data, GallupAnalytics_Export_20180405_102840, by = c("NAME" = "Geography"))
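If you used sf, it’s easier:
# Join the Gallup data (loaded above with read_excel) onto the shapes by state name
states_sf <- states_sf %>%
  left_join(GallupAnalytics_Export_20180405_102840, by = c("NAME" = "Geography"))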
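A left join quietly fills in NA for any shape whose name has no match in the data, so it is worth comparing the two name columns before (or after) joining. This is a minimal sketch with made-up name vectors; with the real objects you would pass the NAME column of the shapefile and the Geography column of the Gallup data instead, as noted in the comments.

```r
# Toy stand-ins for the two name columns (values chosen for illustration).
# With the real data these would be, for example:
#   shape_names  <- as.character(states_sf$NAME)
#   gallup_names <- GallupAnalytics_Export_20180405_102840$Geography
shape_names  <- c("Iowa", "Minnesota", "Puerto Rico")
gallup_names <- c("Iowa", "Minnesota", "Wisconsin")

# Shapes that will end up with NA values after the left join
unmatched_shapes <- setdiff(shape_names, gallup_names)

# Rows of data that have no polygon and so will be dropped from the map
unmatched_data <- setdiff(gallup_names, shape_names)

unmatched_shapes
unmatched_data
```

Mismatches usually come from places that appear in one source but not the other, or from small spelling differences in the names.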
Now that everything is joined, we can plot! Both data types have some generic base plotting that works okay for checking that your data is there, but it’s not that pretty.
plot(states_rgdal)
plot(states_sf["Yes"])
Leaflet is a JavaScript library for interactive maps. A bunch of people worked to make an R package that works with leaflet, but you can use leaflet in many more situations (for example, if you do data visualization in d3.js, it’s easy to integrate with leaflet).
# install.packages("leaflet")
library(leaflet)
pal <- colorNumeric(
palette = "Greens",
domain = states_rgdal$Yes
)
m <- leaflet(data = states_rgdal) %>%
  addProviderTiles("Stamen.Watercolor") %>%
  setView(lng = -98.35, lat = 39.8, zoom = 3) %>%
  addPolygons(stroke = FALSE, fillOpacity = 0.5, smoothFactor = 0.5, color = ~pal(Yes)) %>%
  addLegend("bottomright", pal = pal, values = ~Yes,
    title = "Percent of people reporting happiness",
    opacity = 1
  )
pal <- colorNumeric(
palette = "Blues",
domain = states_sf$Yes
)
leaflet(data = states_sf) %>%
  addProviderTiles("Stamen.Watercolor") %>%
  setView(lng = -98.35, lat = 39.8, zoom = 4) %>%
  addPolygons(stroke = FALSE, fillOpacity = 0.5, smoothFactor = 0.5, color = ~pal(Yes)) %>%
  addLegend("bottomright", pal = pal, values = ~Yes,
    title = "Percent of people reporting happiness",
    opacity = 1
  )
There are tons of things you can change! Lots of information is available on the RStudio page for leaflet.
I recommend checking out ?addProviderTiles in particular. The Stamen Toner map is a very simple, black and white basemap I like.
The colors from RColorBrewer are based on ColorBrewer. You can see all the available palettes by using display.brewer.all().
library(RColorBrewer)
display.brewer.all(type="seq")
You can customize your legend; check out ?addLegend to see options. In particular, you might want to adjust the bins.
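One way to get a binned legend is to swap the continuous palette for a binned one. The sketch below uses colorBin from the leaflet package in place of colorNumeric; the domain of 40 to 60 and the choice of four bins are made-up values for illustration, so substitute the range of your own Yes column.

```r
library(leaflet)

# A binned palette: values are grouped into ranges and each range gets one color.
# The domain and number of bins here are assumptions, not taken from the Gallup data.
pal_bin <- colorBin(
  palette = "Greens",
  domain = c(40, 60),
  bins = 4
)

# The palette is a function: give it values, get back hex color strings.
example_colors <- pal_bin(c(42, 58))
example_colors
```

You could then pass pal_bin wherever the maps above use pal, in both addPolygons and addLegend, and the legend will show one swatch per bin instead of a continuous gradient.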
The easiest way is probably just to “knit” your RMarkdown document. Another option is to save the map as a standalone HTML file:
library(htmlwidgets)
saveWidget(m, file="m.html")