Finding data

As the week progresses, we’ll have many opportunities to work with data. This is usually more exciting and fruitful when you work with data you have a vested interest in or some contextual knowledge about.

Searching for data

If you have an idea of what you’re looking for, Google Dataset Search can be a good place to start. It is a version of Google that searches only for datasets. It works much like regular Google, so you can search for phrases with or without quotation marks around them, and exclude results with the - operator. For example, I often find Figshare results less than useful, so I add -site:figshare.com to remove them.
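To make the search syntax concrete, here is a minimal sketch that assembles a query string using quotation marks for exact phrases and -site: for exclusions. The function name build_query and the example search terms are made up for illustration; the output is simply text you could paste into the search box.

```python
def build_query(terms, exclude_sites=()):
    """Combine search terms and excluded sites into one query string."""
    parts = []
    for term in terms:
        # Wrap multi-word phrases in quotation marks so they are searched as a phrase.
        parts.append(f'"{term}"' if " " in term else term)
    for site in exclude_sites:
        # A leading minus sign asks the search engine to drop results from that site.
        parts.append(f"-site:{site}")
    return " ".join(parts)


# Hypothetical example: look for bike share ridership data, skipping Figshare.
query = build_query(["bike share", "ridership"], exclude_sites=["figshare.com"])
print(query)
```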

If you have no idea where to start, the Data is Plural newsletter and associated spreadsheet can help spark ideas. I will often use Ctrl+F to search for a word within the spreadsheet, or just scroll through looking for phrases that catch my eye. Newer datasets are at the bottom of the spreadsheet.

More places to look:

If you already have data somewhere on your computer, that’s fine as well. “Rectangular” data is usually the simplest to work with: think of anything that could be stored in an Excel spreadsheet or CSV file. Text data would also work.
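As a rough illustration of what “rectangular” means in practice, the sketch below reads a small CSV into rows and columns using only Python’s standard library. The sample values are made-up placeholders, not a real dataset, and read_rectangular is a hypothetical helper name.

```python
import csv
import io

# Made-up placeholder data standing in for a real CSV file.
SAMPLE_CSV = """city,year,riders
Lincoln,2022,1500
Omaha,2022,4200
Omaha,2023,
"""


def read_rectangular(handle):
    """Return (column names, rows) from an open CSV file handle."""
    reader = csv.DictReader(handle)
    rows = list(reader)  # each row is a dict keyed by column name
    return reader.fieldnames, rows


columns, rows = read_rectangular(io.StringIO(SAMPLE_CSV))
print(columns)    # the column names from the header line
print(len(rows))  # one entry per data row
```

For a file on disk, the same function should work with something like open("my_data.csv", newline=""), where my_data.csv is a stand-in for your own file name.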

Considering data

When you have found a dataset that interests you, consider:

  1. How F.A.I.R. (findable, accessible, interoperable, reusable) is it?
  2. Who collected the data? Why?
  3. When was it collected? Does it still feel relevant?
  4. Where was it collected? What geographic region or other area does it represent?
  5. Is it licensed for reuse?
  6. What are the observational units in the data? How many observational units are there? What level of aggregation is present in the data?
  7. What are the variables? What are the variable types?
  8. What might be missing from this dataset?
  9. Is there a question you think you could answer with the data?
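
Questions 6 and 7, and part of question 8, can be given a quick first pass in code. The sketch below counts rows, guesses whether each variable is numeric or text, and counts empty cells. It assumes a CSV with a header row; the sample values are made up, and guess_type and profile are hypothetical helper names. The remaining questions need reading the documentation and some judgment rather than code.

```python
import csv
import io

# Made-up placeholder data standing in for a real CSV file.
SAMPLE_CSV = """city,year,riders
Lincoln,2022,1500
Omaha,2022,4200
Omaha,2023,
"""


def guess_type(values):
    """Crudely label a column as 'numeric', 'text', or 'empty'."""
    present = [v for v in values if v != ""]
    if not present:
        return "empty"
    try:
        for v in present:
            float(v)  # raises ValueError for anything that is not a number
    except ValueError:
        return "text"
    return "numeric"


def profile(handle):
    """Return (row count, per-column summary) for an open CSV file handle."""
    reader = csv.DictReader(handle)
    rows = list(reader)
    summary = {}
    for col in reader.fieldnames or []:
        values = [row[col] for row in rows]
        summary[col] = {
            "type": guess_type(values),
            "missing": sum(1 for v in values if v == ""),
        }
    return len(rows), summary


n_rows, summary = profile(io.StringIO(SAMPLE_CSV))
print(n_rows)
for name, info in summary.items():
    print(name, info)
```

The row count only matches the number of observational units when each row is a single unit, so it is worth checking the level of aggregation before trusting it. Empty cells are also only one kind of missingness: whole groups, places, or years can be absent without leaving any blank behind.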