Investigative journalists often start an investigation by writing the headline, and then look for the evidence to prove it. That approach serves the data science community well.
Mark Hunter, an American journalist in Paris, says you should start any investigation by writing the headline. For example:
Corruption in the school system has destroyed parents’ hopes that their children will lead better lives.
Now, write the first paragraph of the story. All this should be done before you have any idea if it's true or not.
Following scientific example
Why? It's a method taken from scientific experiments. You have a hypothesis about the world -- for example, gravity bends light rays in a certain way -- so you test it by, say, measuring the apparent positions of stars during a solar eclipse. Should you find out during the investigation that your headline or story isn't true, then you write a new one. The great thing is, if it's no longer an interesting story, you can save time and money by dropping the investigation.
If you start by investigating at random, you don't really know where you're going, which makes it harder to get there. It's the same for businesses analyzing their data. I think the job of a good data scientist is exactly the same as the job of a good investigative data journalist. One uses data to investigate governments and businesses. The other uses data to investigate a company and its competitors. Both will talk to individuals to fill out the details and the human side of the story.
As every advertising campaign about big data says, the key value proposition is making better decisions. A good way to start a data science project is with the decision.
What are the two actions to choose between?
It's not just about the decision; it's also how you will communicate it. What's a headline that the CEO would find interesting, if it were true?
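To make that concrete, here is a minimal sketch in Python of writing the headline first and then testing it. Everything in it is an assumption for illustration: the `Headline` class, the `decide` function, the free-shipping claim, the threshold and the order values are all made up, and a real analysis would need far more data and a proper significance test.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Headline:
    """A claim written before looking at the data (hypothetical structure)."""
    claim: str
    threshold: float       # minimum lift that would make the headline true
    action_if_true: str    # the two actions the decision chooses between
    action_if_false: str


def decide(headline, treated, control):
    """Compare two groups and pick the action the evidence supports."""
    lift = mean(treated) - mean(control)
    supported = lift >= headline.threshold
    action = headline.action_if_true if supported else headline.action_if_false
    return supported, lift, action


# Illustrative headline, agreed with the decision-maker up front.
headline = Headline(
    claim="Free shipping raises average order value by at least 5",
    threshold=5.0,
    action_if_true="roll out free shipping",
    action_if_false="keep charging for shipping",
)

# Made-up order values, for illustration only.
with_free_shipping = [52.0, 61.0, 58.0, 49.0, 60.0]
without_free_shipping = [50.0, 47.0, 53.0, 51.0, 49.0]

supported, lift, action = decide(headline, with_free_shipping, without_free_shipping)
print(f"{headline.claim}: supported={supported}, lift={lift:.1f}, action={action}")
```

If the data does not support the headline, the sketch points to the other action, which mirrors the journalist's rule: write a new headline or drop the story.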
Back to Mark, who, with funding from UNESCO, has written a useful manual called "Story-Based Inquiry."
Here's to more businesses investigating themselves!
User Rank: Exabyte Executive 11/30/2012 | 7:28:25 PM
Re: Data Scientific Method @Daniel Yes, it's about time. Innovation in data is competitive leverage. Investing in such innovations and applying the right strategies will poise leading organizations for successful growth. We clearly see today how data fuels all economic channels.
Re: Data Scientific Method "Buzz and hype" is exactly that, not much substance. It is hard for me to relate to buzz and hype for a technology that I've been involved with since way before it was called Big Data. Suddenly, enterprises are waking up and seeing all the hidden value in their data assets. Long overdue, but I'll take the awareness all the buzz and hype has brought. Viva la Data!
User Rank: Exabyte Executive 11/29/2012 | 2:07:39 PM
Re: Data Scientific Method @Daniel: I think that every part of the organization has to have that kind of mindset -- awareness of what is emerging, knowing that it could one day become an opportunity. I was involved in social media ahead of the trend, for example, and when it suddenly shifted from "emerging" to "hot" I was in a very good position.
Doubly so because social media has a strong connection with big data tech. The old business question of "how do we get value from our social media" can be answered when data science is applied to it. In my experience, the marketing industry is just starting to become aware of what's possible with data science.
Do you think the buzz and hype of big data has hit the levels social media has/had, or is it still to come?
User Rank: Exabyte Executive 11/27/2012 | 9:58:34 PM
Re: The CEO's appetite is limited. What about letting the CEOs write the headlines?
"How would you like to save money/make more money?"
Most of the time, they have an idea of how they could save or make more money. Then ask for details, and ask what thresholds would prove or disprove their theories.
Then test their theories. It should hold their interest because it is in part their own work. And because they decided the factors of success or failure in advance, there is less room for bias to colour the results.
Re: Data Scientific Method From a Data Science perspective, I think it is important to work with all data sources available at the time of the engagement. So if emergent data becomes part of the equation, all the better. Besides, what is considered "emergent" today may be perfectly feasible tomorrow. The data science space is moving that fast. It is truly a whirlwind and I'm kinda loving it.
User Rank: Blogger 11/27/2012 | 6:34:26 AM
Re: Data Scientific Method @Technetronic Starting with that hypothesis alone unbinds you from the daily grind and allows ambitions to be set high. Your final, proven hypothesis might not be the hallowed ground you had imagined, but it could well be over in that direction.
Re: Data Scientific Method Good question. I think that emergent data (classes of information that were previously impossible, or at least impractical, to gather but now are made feasible by advances in information technology) has an important place today in increasing the value of corporate data assets. With commodity pricing for data storage and machine learning technology to make sense of the volume and velocity of the data, emergent data will flourish.