I had been reading lots of stories recently about how bad the flu has been this year in the US. Every year the CDC creates a vaccine that it “thinks” will work against the predicted strain. Of course they encourage everyone to get vaccinated. But predictions are just that…predictions. And this year, the CDC was way off. The number of flu cases really spiked for the 2017-18 flu season. So I was curious to see what the Makeover Monday crowd would do with the data.
When I prepared the data, I downloaded it EXACTLY how the CDC provided it, knowing full well that it would cause people some challenges, which I’ll review in this week’s lessons.
LESSON 1: PROPERLY SORTING THE DATA
The data this week was seasonal, that is, the data doesn’t start on January 1. Rather, the data starts, generally, around week 40 each year, hence why they name the seasons 2016-17, 2017-18, etc. I was quite surprised that so many people made the mistake of assuming the data started January 1. I can’t count the number of times I replied to tweets telling people that they had the weeks sorted wrong. Here are a few examples of what I’m referring to.
What happened is many people used the week column and visualized the data by season. They assumed week 1 was the first week, but it actually represents the week of the year, not the week of the season. To align all of the weeks properly, you should create a couple of calculations.
STEP 1: Create a date field from the Year and Week fields.
STEP 2: Create a calculation to get the first date for each season.
To put this into a simple sentence, this calculation says “For each season, return the first date.” Now that you have the first date for each season, you can create a calculation to normalize the weeks.
STEP 3: Create a calculation to count the number of week into the season
Note: Tableau will put this into the measures section of the data pane because it returns a number. Just drag it to the dimensions pane. This isn’t a field we want to aggregate.
STEP 4: You can now compare the seasons according to how far each week is into the season.
LESSON 2: CHECKING FOR OUTLIERS
As I mentioned, I downloaded the data “as-is” from the CDC. For some strange reason, the percentage fields at the end were suddenly really large. Note how 2017-18 values are so much higher than 2016-17.
Think about it, these are percentages. How can 500%+ of the population have the flu? Not only does this make the data incomparable, think about how aggregation would work. You can’t add or average the percentages together if you want yearly data. You need to create a calculation to properly aggregate the data. We have a column for the flu cases (ILITOTAL) and we have a column for the TOTAL PATIENTS. A simple aggregation calc will return the correct, weighted percentages regardless of the level of dimensions.
There are two lessons here:
- If the data looks wrong, it probably is and you should investigate it further.
- Creating your own calculations that aggregate the data properly will rarely serve you wrong.