I love data visualization, but it's not the only way of giving your data a voice.
From pen and paper through Excel, Google Viz, R, and D3, visualization brings out the flavor of your data. I believe it should play a role at every stage of data analysis: use it to picture the process of data collection, to understand how data moves through your system, to clean data, to find patterns, and to show results. But visualization is not the only way of giving your data a voice.
Statements, models, and anecdotes are three other means of communicating the message of your data -- both to others and to yourself, for your own understanding. Let’s study how each of these could be applied by using a very simple example.
The case study
Research you’ve done has found that, by taking into account the weather forecast, a supermarket would be better able to predict (and thus satisfy) customer demand. You’ve looked at several key product lines, you’ve run some heavy machine learning tasks, and your hard drive and head are filled with numbers. It’s time to take the message to the C-level and get approval for a new ordering process to go into production. What do you present?
Statement
By taking weather forecasts into account we will capture $20 million in sales we would have missed.
It’s simple and quantitative, and this grabs the attention of the C-level. But don’t think that simple statements like this are only useful in presenting your findings; hold on to pithy thoughts as you explore the data, and use them to motivate your team and maintain your footing as you move through massive analytic tasks.
While powerful -- and probably the thing you should lead with in this scenario -- a statement like this invites a question: how? I would recommend a graph at this point -- perhaps a time series of the volume of demand for weather-affected products, with the temperature shown on a secondary axis -- but then back that graph up with something else.
Models
Despite several hundred years of success in general science, models are somewhat unfashionable in data science. Using machine learning bypasses the need to construct models of cause and effect, and this can make it much quicker to get to results. But the brain deals very well with models of cause and effect and much less well with probability trees. Even if you arrived at your results through machine learning, that doesn’t mean the results can’t be described in model terms.
Here’s a simple example: “For every one degree above 25 degrees, our customers buy $100,000 extra ice cream per day, but often we run out.” A model is quantitative, usually implies some causal connection, and it’s battle-tested by hundreds of years of science.
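To make the point concrete, the ice cream sentence above can be written down as a tiny function. This is only a sketch of that one sentence, not of any real analysis: the function name and parameter names are invented for illustration, and it assumes the effect is linear in the number of degrees above the threshold (the sentence does not say what happens for fractions of a degree, or which temperature scale is meant).

```python
def extra_ice_cream_sales(temperature, threshold=25.0, dollars_per_degree=100_000):
    """Estimated extra daily ice cream sales, in dollars, at a given temperature.

    Hypothetical illustration of the model sentence in the text:
    nothing extra at or below the threshold, and a fixed dollar
    amount for each degree above it (linearity is an assumption).
    """
    degrees_above = max(0.0, temperature - threshold)
    return degrees_above * dollars_per_degree


# Print the estimate for a cool day, the threshold itself, and a hot day.
for temperature in (22, 25, 28):
    print(temperature, extra_ice_cream_sales(temperature))
```

Three degrees above the threshold gives $300,000 of extra demand per day under these assumptions. The whole model fits in two lines of arithmetic, which is exactly why it is easy to say out loud in a meeting.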
Anecdote
The plural of anecdote is not data. Nonetheless, datasets are often replete with anecdotes that can aid your understanding. Try this: “During the heat wave last May, we ran out of ice cream entirely in 20 stores. Our staff told us they were frequently being asked to look in the back room.” Maybe your CEO even tried to buy ice cream that week. This is a story that’s easy to relate to, and it also shows a direct link between your data and the satisfaction of your customers.
Anecdotes might seem anathema to data, but they’re not. In fact, pretty much every visualization is just an anecdote writ large and colored in. In the context of big data, it is extremely rare that an entire dataset can be encapsulated in a single visualization. Drawing on just a portion of the data produces results that are anecdotal in the sense that they can’t encompass everything.
Use of anecdote is common in journalism -- including data journalism -- and drawing on this technique can not only help the C-level understand what you’re doing but also aid your own thinking. Use anecdotes, but use them with caution, being mindful that each one misses huge swathes of data.
— James Robinson, CTO, OpenSignal.com