For week 50 I picked a dataset that I thought was nice and simple, something easy as we approach the holidays and the end of the year. Turns out, this dataset was a bit more challenging than I expected.
Before I get into the lessons, I want to thank Carl Allchin, Assistant Head Coach at the Data School, for filling in for me during Viz Review. I spent the last four days in New York for work and couldn’t fit the webinar into my schedule. I am sure everyone valued his comments and suggestions.
While I was in New York I had the pleasure of running a #MakeoverMonday live event at the #NYCTUG, hosted by Spotify. It was a great excuse to spend a Monday evening playing with data, meeting new people and working on a music-themed viz.
Thanks Skylar and Jacob for the invite!
Now let’s focus our attention on this week’s dataset, which is about land use by food type. A simple bar chart, or so I thought. Turns out the data was easy to misinterpret.
LESSON 1: GET TO KNOW AND UNDERSTAND YOUR DATA
With MakeoverMonday datasets it can be tempting to jump straight into visualization mode. After all, the data prep has been done for you, so why wait?
One big advantage of visualizing data is not building a dashboard, but understanding the data better by visually identifying trends, clusters, outliers, changes over time, correlations and other patterns. Looking at raw numbers doesn’t give us the same insights as quickly, so building a few charts gives us the chance to get a better understanding of what we’re dealing with.
That first phase of working with a new dataset shouldn’t focus on what the end result will look like. It should be time dedicated to understanding the data. That is a crucial step for representing the results accurately later on.
This week, a number of submissions added the different food types together to represent the land area required to produce 1g of protein. This is incorrect.
The original visualization states in its subtitle: ‘Average land use area needed to produce one unit of protein by food type, measured in meters squared, per gram of protein over a crop’s annual cycle or the average animal’s lifetime.’
This does NOT mean that all of the bars taken together are the land area required to produce a gram of protein. Each bar is a separate rate for one food type, so we also cannot assign percentages of the total to the individual food types. The way to read the original chart can be put into a plain English sentence, something like ‘To produce a gram of protein from beef, we need 1.02 sqm, while producing the same amount of protein using rice requires only 0.02 sqm’.
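To make the distinction concrete, here is a small Python sketch using only the two values quoted above (beef 1.02 sqm and rice 0.02 sqm per gram of protein). The dictionary and the `land_needed` helper are hypothetical names for illustration, and 100 g is an arbitrary example quantity, not a figure from the dataset. It shows which operations make sense on per-gram rates and which one does not.

```python
# Land use per gram of protein (sqm/g): the two values quoted in the text.
# These are rates for alternative foods, not parts of one whole.
land_use_per_gram = {
    "beef": 1.02,
    "rice": 0.02,
}


def land_needed(food: str, grams_of_protein: float) -> float:
    """Land area (sqm) needed to get this much protein from one food type."""
    return land_use_per_gram[food] * grams_of_protein


# Valid: compare food types against each other.
ratio = land_use_per_gram["beef"] / land_use_per_gram["rice"]
print(f"Beef needs about {ratio:.0f}x the land of rice per gram of protein")

# Valid: scale one food's rate by a quantity (100 g is an arbitrary example).
for food in land_use_per_gram:
    print(f"{food}: {land_needed(food, 100):.1f} sqm for 100 g of protein")

# Not meaningful: summing the rates across foods and computing shares of that
# "total". sum(land_use_per_gram.values()) is NOT the land needed for 1 g of
# protein, so percentages of it don't describe anything real.
```

Comparing or scaling a single food's rate answers a real question; adding the rates together mixes alternatives that would never be produced together to make the same gram of protein.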
Getting things right is important, so the fundamental part of any analysis and visualization exercise should always be to gain a good understanding of the data we’re dealing with.
What I recommend is to start not with the end game in mind but by simply creating a bunch of different charts with the fields you have available. See how the data behaves when you use different dimensions, how things change over time, where things are happening, and so on. Don’t think about your final dashboard or viz. Think about the data, learn what’s contained in the dataset and how things are related.
This process will likely help you find an interesting story too.
Also make sure to read the article to understand the context of the original chart. What do the descriptions tell you? What questions did the authors have and which ones did they answer with the data?
Also try to test your own assumptions. Read your conclusions out loud and ask yourself whether they are correct, whether they are logical, and whether they can be made in the first place.
All of these steps and many more should be in the toolkit of any data analyst to ensure we don’t just visualize data, but really probe it, test hypotheses, work through the twists and turns, find insights, put them into context and ask others to challenge our assumptions.
We have a complete chapter dedicated to analytical skills in our Makeover Monday book. Having strong foundations is important for being successful in the field of data analysis.
LESSON 2: MATCHING YOUR VIZ TO THE TOPIC
Simple vizzes often work best and Andy and I regularly advocate for keeping things nice and simple. This week we saw a lot of bar charts and quite a few charts with squares to compare land sizes. These are great choices for representing surface or land area because we can see length and width and imagine ourselves outside looking at cropland or pasture.
It is much more difficult to relate circular shapes to land size, so packed bubble charts don’t make the data nearly as easy to understand.

Aside from that, rectangles and squares are a better match for the topic: fields and pasture, paddocks and sheds are usually rectangular or square, not perfect circles. Your audience will find it more logical to associate these intuitive shapes with land size than to relate circles back to the dataset.