Design Exercise 1: Questions from Visualizations and Data

In this exercise, you will look at some data and some visualizations and think about the kinds of questions that you can ask/answer (tasks that you can do). We will ask you to make some lists of questions and enter them in a Canvas Assignment DE01: Questions from Data and Vis (due Tue, Sep 20).

Note: this assignment is due Tue, Sep 20 (the same day as a discussion) because we want you to do it before lecture on the next day.

Description

We emphasize that the point of a visualization is to “help someone do something”.

Another way to look at this is that visualizations help answer questions, although they also serve to inspire new ones (actually, that is itself a task/question: “What other questions should I ask?”).

In this exercise, I would like to get you to think about the questions that:

  1. You might want to ask of the data
  2. A visualization might answer
  3. A visualization inspires you to ask (but might not answer)

One thing we need to keep in mind: we (usually) don’t know what the designer’s intent was. They might have intended to answer one question, but also answered a lot of others.

The idea of using a visualization (or some preliminary analysis) to explore data in order to better formulate questions is known as “Exploratory Data Analysis.”

Note: We will ask you to turn in your answers as a Canvas Survey. I recommend that you write your answers off-line in a text editor and then enter them into Canvas.

Part A

I want you to consider the AidData data set. Be warned: it is somewhat complicated, and it will come back (you will be asked to make visualizations from it), so spending some time with it may be useful. Read the description, have an initial look at it using whatever tools you might have at your disposal, look at the visualizations provided in the description, …

Then… make two lists of questions. One list of questions should be ones that probably don’t need a visualization - because they have a simple answer that could be expressed in text or as a number (even if they require some statistical analysis). One list should be questions that probably do “need” a visualization. This isn’t a hard and fast “rule”, but try to group them.

Asides:

  • Even if a question doesn’t need to have a visualization to answer it, it still might benefit from having a visualization.
  • Even questions that do “need” a visualization can often be answered with a short sentence / answer.
  • You may not know how to make a visualization to answer the question (now) - that’s OK, we’re not asking you to answer these questions.

Rules for this assignment:

  • This should be a simple numbered list. Each question should be a sentence (or phrase) - it doesn’t need to be much longer.
  • You need to give 3-5 questions for each, but feel free to give a few extras.
  • You cannot use “find more questions” as a question.
  • Try to make your questions be diverse.
  • You cannot use the examples.
  • You should ask questions that could conceivably be answered using the data. (you don’t need to check, but make reasonable assumptions)
  • Try to come up with questions that are more interesting and complex than the examples.

Questions (to be answered as type-ins on Canvas):

A.1 List questions (at least 3-5) that probably don’t need a visualization. For example, “What country does the US send the most aid to?”

A.2 List questions (at least 3-5) that would probably benefit from a visualization. For example, “How is the aid given by the US distributed around the globe?”

A.3 Food for thought question (you might not be able to answer it yet): what visualizations might you want to make to help you figure out what are good questions to ask?

Part B

This assignment will be more interesting if you don’t look ahead (look at Visualization 1 for the first questions, and look at the others afterwards).

In this part, we are going to do a similar exercise, starting with 3 visualizations made from the same data set (the Census Data Set, if you’re curious). They are effectively an answer to an assignment from last year (but they are not an actual student solution). In that assignment, students were asked to identify stories in the data and make visualizations that present them. The course staff created these visualizations (they are designed to be used as a prop for critique, rather than as great visualizations).

B.1 Make a list of questions (at least 3-5) that the 1st visualization can answer. If you think there was one the designer was aiming for, put that first. (same rules as above) Try to come up with questions that benefit from having a visualization (rather than just the table of numbers).

B.2 Make a list of questions (at least 3-5) that the visualization inspires you to ask next - questions for which you would probably need to make another visualization. Try to pick questions that would “need” a visualization to answer (rather than ones with a specific textual answer), and that cannot be readily answered with this visualization.

Now, look at the two other visualizations. Remember, these are all from the same data set, and were designed to answer similar questions. For these, try to come up with 3-5 questions - but this is admittedly harder.

B.3 Make a list of questions that the 2nd visualization can answer that would be difficult to answer with the 1st or the 3rd.

B.4 Make a list of questions that all three of the visualizations could answer.

Not for this week… Visualizations 1 and 3 are, in some ways, meant to “tell a similar story” (with different levels of effectiveness). What are questions that you can answer with either one, but that are much easier to answer with one than with the other?

Details not important for the assignment

The visualizations provided have the data aggregated by state, so that all 3 show the same data points.

The actual data was provided by county. You can see visualizations of the more detailed data as (Census-22-maps.pdf 3.1mb) and (Census-22-scatter-2.pdf 0.3mb). It doesn’t make sense to show this detail for the bar chart.

The aggregation of the data (how the counties are averaged to compute the state values) is wrong in these visualizations. The county values are simply averaged, without accounting for their varying populations, so the numbers in these visualizations are off. Consider this as “fake data” used to learn about visualization.
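To make the aggregation problem described above concrete, here is a minimal sketch in Python. The county populations and values are invented for illustration (they are not from the Census data), and the sketch assumes the value being aggregated is a per-person quantity, so that weighting by population is the appropriate correction.

```python
# Hypothetical county records for one state: (population, value).
# These numbers are made up for illustration; they are NOT from the Census data.
counties = [
    (1_000_000, 10.0),  # one large county
    (50_000, 30.0),     # two small counties
    (50_000, 50.0),
]


def simple_mean(records):
    """Average the county values, treating every county equally."""
    return sum(value for _, value in records) / len(records)


def weighted_mean(records):
    """Average the county values, weighting each county by its population."""
    total_population = sum(population for population, _ in records)
    weighted_sum = sum(population * value for population, value in records)
    return weighted_sum / total_population


print("simple mean:  ", simple_mean(counties))
print("weighted mean:", weighted_mean(counties))
```

With these made-up numbers, the simple mean is pulled toward the two small counties, while the population-weighted mean stays close to the value of the large county where most people live.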

And if you’re wondering: these are (intentionally) not great visualizations; so we can use them in the future for thinking about how to improve them!

Administrative Rules

This exercise is turned in as a Canvas Survey, where you must put the 6 lists into type-in boxes. Be warned: you can effectively turn things in only once. I recommend typing your answers offline, and then copy-pasting them into the form.

As a survey, Canvas will automatically give you full credit for completing the survey. We use this score to keep track of your consistency at completing assignments. If you turn things in late, or give answers that are clearly inadequate, we will take back some of this credit. The late penalty is severe: we really want you to do this exercise before the lecture on the next day.

We will separately keep track of the quality of answers so that we can reward excellent ones.