When less isn't always more
So often we are told less is more; keep it simple; white space is good. These are all statements I completely agree with; however, when designing my Makeover Monday this week I found myself challenging them and approaching the visualisation from a slightly different perspective.
Before you start wondering whether #Data17 has sent my visualisation crazy, let's clear things up a little. I am not talking about designing a cluttered Makeover Monday visualisation; instead I am talking about the peril that, by keeping things simple, we rely solely on aggregated or high-level measures to complete our analysis.
This week's Makeover Monday data set provided us with obesity rates for States in the USA for a 5-year period, 2011 through 2015. When looking into the data set, not only did you have access to overall obesity rates, you also had access to category level obesity rates including age, gender, income and education. The incredible depth of data immediately engaged me and energised my inner geek; it also got me remembering one of my pet hates: how, in modern day analysis, so many people quote aggregated measures that judge a large population of people through a single number.
There have been 2 prime examples of this in recent months: Brexit and the US election. In the case of Brexit, 51.9% of British voters chose to leave the European Union; however, this aggregated percentage hides a wide variation within the voting population of the United Kingdom. How would you feel if you were one of the 73% of 18-24 year olds or 62% of Scottish voters who actually voted to remain in the European Union? On a similar note, and I know this is slightly more ironic due to Clinton winning the popular vote, 55% of 18-29 year olds voted for Clinton whereas only 37% of the same age range voted for Trump.
The challenge I faced
So back to Makeover Monday. I wanted to tread the fine line between visualising State level obesity rates and allowing people to interact with the finer detail of the data set. When initially analysing the data set, what really struck me was that even the healthiest State in 2015, Colorado, with a 20.2% obesity rate, had pockets of its population that were less healthy; for example, Coloradans educated to a less than high school level had an obesity rate of 28.2%, almost that of the overall median State obesity rate (29.8%). The risk I faced was that, even for an individual year's data, visualising every category and sub-category obesity rate meant 1080 marks: the ideal recipe for data overload and an over-crowded visualisation. (Image obtained from http://www.josebaldaia.com/)
So how did I meet the challenge of visualising 1080 marks whilst still allowing people to understand the aggregated obesity rates as well as the individual category and sub-category obesity rates?
Firstly I took inspiration from a blog by Bora Beran (Trellis Charts with Tableau 8 | Bora Beran), going all the way back to Tableau 8, in which he shared how to create a Trellis Chart (I know this has been done by many others since then). By creating a Trellis Chart I could visualise all 54 States in a smaller screen area than listing them would need; I will get to the reason why I didn't do the obvious and plot them on a filled map as per the original visualisation.
A quick guide as to how I created the Trellis Chart
A Trellis Chart is a way of visualising a dimension within an x by y grid, replicating the same visualisation for each entry within the dimension but placing the entries in a set of rows and columns. In this instance there were 54 States, so the Trellis Chart visualised the States in a grid of 8 rows by 7 columns. To create the Trellis Chart I created 3 calculations, as per Bora's blog post:
The first calculates how many unique data items are being visualised, in this instance the 54 States. (Thanks to a suggestion by Rodrigo Calloni (@tableauing), I wanted to put the State abbreviation on labels rather than the full State name, hence the CountD uses [State Abbr] rather than [State].)
The second answers: based on the 54 data items being visualised, how many columns should there be?
The third answers: based on the 54 data items being visualised, how many rows should there be?
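The calculations themselves are not reproduced in the text here, so the following is only a minimal Python sketch of the grid logic and not the Tableau calculated fields. It assumes the common trellis approach of using the rounded square root of the item count as the number of columns, which does give the 8 rows by 7 columns described for 54 items. The function name `trellis_position` is hypothetical.

```python
import math

def trellis_position(index, n_items):
    """Return the zero-based (row, column) for the 1-based `index` of `n_items`.

    Assumption: the column count is the rounded square root of the item count.
    """
    n_columns = round(math.sqrt(n_items))   # 54 items -> 7 columns
    row = (index - 1) // n_columns          # fill each row left to right
    column = (index - 1) % n_columns
    return row, column

n_states = 54
positions = [trellis_position(i, n_states) for i in range(1, n_states + 1)]
n_rows = max(r for r, _ in positions) + 1
n_cols = max(c for _, c in positions) + 1
print(n_rows, n_cols)  # prints: 8 7
```

Because items fill each row from left to right, sorting the States by obesity rate descending before indexing puts the most obese State in the top left and the least obese in the last occupied cell.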
The calculations were the complex part of the process; the visualisation was the easy part. Place the [Columns] calculated field on the Columns shelf and the [Rows] calculated field on the Rows shelf. The calculated fields will automatically be table calculations, but you need to set them to discrete rather than continuous.
Next I dropped [State Abbr] onto the detail marks card and changed both the [Columns] and [Rows] table calculations to compute using [State Abbr].
This created the 8 x 7 grid.
A couple of small steps and the Trellis Chart will be complete (albeit a Trellis visualising blocks of colour rather than the traditional 1 line chart per section of the trellis). I placed [Year] (2015) and [Category] (Total) in the filters and [% Obesity] onto the Colour marks card. Set the borders of the colour to white and place [State Abbr] onto the Text marks card. Taking inspiration from Matt Francis, I then changed the colour range to blue and orange and reversed the colours so the least obese States show as blue and the most obese as orange. Lastly, sort [State Abbr] on the detail marks card to order by [% Obesity] descending. This is important as it will ensure the most obese States are in the top left of the visualisation and the least obese in the bottom right.
So that is the aggregated, total State obesity rates sorted, but how about the detailed category and sub-category data?
Visualising through a 'block chart' - I really am not sure if that's the right technical term for it but it works for me!
Now that I had a coloured grid visualising State obesity rates, I chose to replicate this grid but increase the number of elements within it to visualise each of the categories and sub-categories for a State; I refer to this as a block chart, but can someone please tell me if there is actually a technical term for it?!
The block chart works by replicating the order of the original grid, i.e. highest obesity in the top left, lowest obesity in the bottom right, but rather than showing a single square for each State, the square is split into 4 rows, 1 for each category, and each category row is then split into one block per sub-category. For gender this means the row is split into 2 blocks (Male and Female), whereas the age row is split into 6 blocks (18-24, 25-34, 35-44, 45-54, 55-64 and 65 or older).
Creating the block chart took a little bit of creativity but was actually quite simple:
Replicate the [Columns] and [Rows] of the previous Trellis Chart and place [State Abbr] onto detail and sort by [% Obesity] descending.
Add [Category] onto the Rows shelf and [Sub Category] onto detail (I chose to order the sub categories to make them easier to read)
Place [Number of Records] onto the Columns shelf and make it a summed table calculation, calculating by Pane (Across). This is the creative bit of the visualisation that creates 1 row per category and 1 'block' per sub-category.
Colour the visualisation by [% Obesity].
Lastly, set the colour range of both the 1st visualisation and this block chart to be the same to ensure the colours are consistent (I set it to be 10% to 45%).
This will create the following visualisation:
The individual blocks of the 2nd visualisation do not need the State labelled on them (although it is added into the Tooltip), as the user is informed that the order of the block chart is the same as that of the State level obesity rate visualisation. I think this provides the user with a really visual way of engaging with over 1,000 marks, avoiding the need to create large tables of numbers.
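The running-sum step in the block chart recipe can be illustrated outside Tableau. This is a minimal Python sketch, not the Tableau table calculation itself, and it assumes each State and sub-category pair contributes exactly one record for the chosen year. Only the gender and age sub-categories named above are included, and the function name `block_positions` is hypothetical.

```python
from itertools import accumulate

# Sub-categories per category, limited to the two categories spelled out in the post.
sub_categories = {
    "Gender": ["Male", "Female"],
    "Age": ["18-24", "25-34", "35-44", "45-54", "55-64", "65 or older"],
}

def block_positions(subs):
    """Pair each sub-category with the running sum of one record per sub-category.

    Assumption: one record per sub-category, so the running sum is 1, 2, 3, ...
    and acts as the horizontal position of each block within its category row.
    """
    running = accumulate(1 for _ in subs)
    return list(zip(subs, running))

for category, subs in sub_categories.items():
    print(category, block_positions(subs))
```

Each category becomes one row, and the running sum places each sub-category one step further along that row, which is why the gender row ends at position 2 and the age row at position 6.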
I did receive one further bit of inspiration from Marcela Janowska (@ekomarcek): by adding a highlight action onto the 1st State level visualisation, the user is able to click on a State and the detail for that State will be highlighted in the detailed 'block chart'.
The full visualisation for my Week 41 of Makeover Monday can be found on my Tableau Public profile:
I hope you found the walkthrough of how I created my Makeover Monday visualisation useful, but more importantly I hope I shared an example of how a large volume of data can be visualised in a way that doesn't overpower the user, allowing them to interact with both State level aggregated data and the detail of category and sub-category data.