Datasets for Class Exercises

This is a list of data sets that we will provide for various class assignments.

With each, we try to provide at least some background.

A Quick EDA Example with the ATUS data

This is a quick example of doing Exploratory Data Analysis (EDA) using the ATUS data set (see ATUS: American Time Usage Survey). Note: my goal here is to make some quick visualizations to figure out what better visualizations to make (or what deeper analysis to do). I’ll start with some vague questions: how much does the amount people sleep vary? Who sleeps more or less? What do people do instead?
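As a rough illustration of the first two questions, here is a minimal sketch that summarizes sleep time per group using only the Python standard library. The field names (`age_group`, `sleep_minutes`) and all values are made up for illustration; they are assumptions, not the actual ATUS column names or data.

```python
from collections import defaultdict
from statistics import mean, stdev

# Toy records: hypothetical field names and made-up values, NOT real ATUS data.
respondents = [
    {"age_group": "15-24", "sleep_minutes": 560},
    {"age_group": "15-24", "sleep_minutes": 600},
    {"age_group": "25-64", "sleep_minutes": 480},
    {"age_group": "25-64", "sleep_minutes": 520},
    {"age_group": "65+", "sleep_minutes": 530},
    {"age_group": "65+", "sleep_minutes": 550},
]


def sleep_summary(rows, group_field="age_group", value_field="sleep_minutes"):
    """Return count, mean, and sample standard deviation of sleep per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_field]].append(row[value_field])
    return {
        group: {
            "n": len(values),
            "mean": mean(values),
            # stdev needs at least two values; report 0.0 for singleton groups.
            "stdev": stdev(values) if len(values) > 1 else 0.0,
        }
        for group, values in groups.items()
    }


summary = sleep_summary(respondents)

# Overall spread across everyone: a first answer to "how much variance is there?"
overall_stdev = stdev([row["sleep_minutes"] for row in respondents])

for group, stats in summary.items():
    print(group, stats)
print("overall stdev:", overall_stdev)
```

Numbers like these are only a starting point: they suggest which groupings might be worth plotting in more detail.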

ATUS: American Time Usage Survey

The American Time Use Survey (ATUS) is a large data collection effort from the U.S. Bureau of Labor Statistics. They provide the data files in a very detailed form on their data files page. One thing that makes this data interesting is that it is collected and documented with a great deal of statistical care, so that it can be used correctly. Another interesting thing: it is a massive data set on a familiar topic.

Data Sets from old Design Challenges

Here are some pointers to the old Design Challenges that provide data sets.

Fall 2021 - Design Challenge 1 Data Sets: Spotify; College Scorecard (heavily processed); Census Data by County; Time Usage Survey

Fall 2020 - DC1 Data Sets: Census Data; Airline On-Time Performance; Time Usage (how people spend their time); IMDB movie data

Fall 2019 - DC1 Data Sets: Airline ontime performance; Census Data by Country; Beijing Air Quality

Fall 2018 - DC1 Approved Data Sets - this year we had a diverse and long list with lots of options.

Census Data Set

The “Census Data by County” data set aggregates many different quantities of interest over the counties of the US. The data file: (census_counties.csv 4.6mb). The USDA provides this data as 4 separate sheets (on this page). Any one of them could tell an interesting story, but together they provide a very rich and complex data set full of stories. We have (well, Young Wu, the 765 TA in 2020, has) joined the 4 spreadsheets together (joining by the “FIPS Code” column), creating a single file.
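To make the idea of joining sheets on a shared key concrete, here is a minimal sketch using only the Python standard library. It is not the script that was actually used to build census_counties.csv. Only the “FIPS Code” column name comes from the description above; the sheet names, the other column names, and all values are toy placeholders.

```python
from functools import reduce


def join_on_key(left, right, key="FIPS Code"):
    """Inner-join two lists of row dicts on a shared key column."""
    right_by_key = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Merge the columns of both rows; the key column appears once.
            joined.append({**row, **match})
    return joined


def join_all(tables, key="FIPS Code"):
    """Join any number of tables, left to right, on the same key."""
    return reduce(lambda a, b: join_on_key(a, b, key), tables)


# Toy stand-ins for the four sheets (hypothetical columns, made-up values).
# FIPS codes are kept as strings so that leading zeros are not lost.
population = [{"FIPS Code": "00001", "Population": 100},
              {"FIPS Code": "00002", "Population": 200}]
education = [{"FIPS Code": "00001", "College %": 30.0},
             {"FIPS Code": "00002", "College %": 40.0}]
poverty = [{"FIPS Code": "00001", "Poverty %": 10.0},
           {"FIPS Code": "00002", "Poverty %": 12.0}]
unemployment = [{"FIPS Code": "00001", "Unemployment %": 4.0},
                {"FIPS Code": "00003", "Unemployment %": 5.0}]

combined = join_all([population, education, poverty, unemployment])
print(combined)
```

Because this sketch does an inner join, a county that is missing from any one sheet (here the toy codes "00002" and "00003") drops out of the result. Whether the real joined file keeps or drops such rows is not stated here.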

AidData data set

The “AidData” data set was suggested by Prof. Enrico Bertini as one that is good for use in Visualization class assignments. We will use it for a number of assignments in class. The AidData data set: (aiddata.xlsx 4.8mb). The “Reduced” AidData data set: (aiddata_reduced.csv 0.5mb). The data set that we will use is a version provided by Prof. Bertini. I’ll refer to this as the “AidData” data set. As this version is still quite large, we have provided a further reduced data set.