This chart has been posted in a variety of places over the last week (@Flowingdata). It is purportedly from UCSB’s newspaper, and clearly (hopefully?) an earlier version of the graphic was submitted for printing in error. It got me thinking a little about how to display this information in the best way.
As with any charting decision, you have to understand what the primary message is, but also design the chart so that other information can be extracted. In this case the primary message is that the public perception of drug use by UCSB students far exceeds actual student use, but there are other tidbits we can take away as well.
I can’t discuss this chart without also touching on methodology – e.g. for ‘actual’ use did they conduct drug testing on students? Probably not, I bet they just asked the students – see my post on explaining your methodology.
So what would the chart look like if we just redrew it to show the real charts? The current design (albeit with the right data) is a pretty good choice – using the concept of small-multiples (otherwise known as trellis or panel charts) works well in this case. So leaving the chart in this form (nine individual bar charts), we can perhaps surmise that this was the intended chart.
While the data is represented accurately, allowing each bar chart to have different vertical axis scales means that you lose data comprehension – for example, I am interested in comparing how the public thought the frequency of use of cocaine varied – i.e. how many people thought UCSB students used it a few times a month, through to daily use. Because the axes have different scales, I have to resort to using the data labels.
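To make the shared-scale idea concrete, here is a minimal sketch of picking one vertical axis limit for every panel instead of letting each panel scale itself. The panel names and percentages are hypothetical placeholders, not the survey figures, and `shared_axis_max` is a helper name I made up for illustration.

```python
# Hypothetical placeholder percentages, for illustration only.
# These are NOT the figures from the UCSB chart.
panels = {
    "drug_a": {"perceived": [30.0, 45.0, 20.0], "actual": [50.0, 15.0, 2.0]},
    "drug_b": {"perceived": [40.0, 12.0, 3.0], "actual": [4.0, 1.0, 0.5]},
}


def shared_axis_max(panels, headroom=1.1):
    """Return one upper axis limit that fits every bar in every panel."""
    largest = max(
        value
        for series in panels.values()
        for values in series.values()
        for value in values
    )
    # A little headroom keeps the tallest bar from touching the panel edge.
    return largest * headroom


y_max = shared_axis_max(panels)
for name in panels:
    print(f"{name}: y-axis from 0 to {y_max:.1f}")
```

Most charting tools can do this for you. In matplotlib, for example, passing `sharey=True` to `plt.subplots` gives every panel the same vertical axis.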
I’m still not sure about it though – it isn’t as visually appealing as before, even if it provides more information. Now part of that I’m going to attribute to the choice of time spans – the first time category spans 9 days, the second 19 days, and the third 30 days. However, is ‘daily’ (i.e. every day for the 30ish days in the month) that functionally different from 29 times a month?
A better choice of time categories, one that makes more sense to the public both when answering the questions and when reading the results, may have been the less numeric but more descriptive ‘a few times a month’, ‘a few times a week’, and ‘pretty much daily’. When discussing the choice of charts, it’s easy to overlook that perhaps the study should have been designed differently from the outset.
So perhaps there’s a different chart style that preserves the data richness, but is also visually appealing?
Oh, yes. It’s a bunch of the much-maligned pie charts – 18 of them in fact. I think pie charts are fine when you’re just comparing two data points (in this case the percent answering yes or no to the ‘1-9 times a month’ question, for example). One great thing about this design is that all of the charts are now on the same scale, so comparing vertically has become easier – i.e. does the public think that cocaine use or alcohol use is more common?
I’ve also switched the order of the series around – I think it’s more natural to read from left to right “what’s the perception?”, “what’s the reality?” I’ve removed shading from the other pie slice – by deemphasizing this, you reduce the complexity. Equally, you limit some criticism of pie charts – that it’s difficult to compare areas with each other – now you can easily use the angle of the slice for comparison.
Finally we have a bubble chart – quickly knocked together in Tableau Public. Now we are forced to compare areas, and while this (somewhat falsely) emphasizes the difference between perception and actuality, I prefer the pie chart version.
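As a rough illustration of why comparing areas is harder than it looks, here is a small sketch that assumes each bubble is sized so its area is proportional to the value it represents. The two values are hypothetical placeholders, and `bubble_radius` is just an illustrative helper, not anything from Tableau.

```python
import math


def bubble_radius(value, area_per_unit=1.0):
    """Radius of a circle whose area equals value * area_per_unit."""
    return math.sqrt(value * area_per_unit / math.pi)


# Hypothetical placeholder percentages, not the survey figures.
perceived = 40.0
actual = 10.0

value_ratio = perceived / actual
radius_ratio = bubble_radius(perceived) / bubble_radius(actual)

print(f"value ratio:  {value_ratio:.1f}")
print(f"radius ratio: {radius_ratio:.1f}")
```

Under that assumption, a value four times larger gets a bubble only twice as wide, so a reader who judges by width and a reader who judges by area will come away with different impressions of the gap.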
Which do you prefer? Given the original choice of time slices, how would you design this chart?