In The Pitfalls of Big Data Prediction Analysis, I cited a situation where big data predictive analytics algorithms did not work. So, what can be done to limit -- or, if possible, prevent -- incorrect outcomes from big data prediction algorithms? The answer: "Go to ground."
The issue with big numbers
When you read future-thought pieces like GE's Industrial Internet report, you can see just how large big data is expected to get. As the size of data grows, so do the dollars, and billions turn to trillions. With the pace of change, the size of the Internet information pie, and the proliferation of information and expectations for data-driven decision-making, errors will be made.
When you have trillions of data elements to deal with, being 99.999 percent accurate can still leave you with tens of millions of incorrect prediction elements -- any one of which could be catastrophic.
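To make that arithmetic concrete, here is a back-of-the-envelope sketch (the two-trillion figure is illustrative, not from the article):

```python
# Illustrative arithmetic: even 99.999 percent accuracy leaves millions
# of absolute errors once the data set reaches trillion-element scale.
data_elements = 2 * 10**12        # two trillion data elements (assumed)
errors_per_million = 10           # 99.999% accurate => 10 errors per million

expected_errors = data_elements * errors_per_million // 1_000_000
print(f"{expected_errors:,} incorrect prediction elements")
# prints "20,000,000 incorrect prediction elements"
```

Integer arithmetic is used here so the result is exact: at two trillion elements, a 0.001 percent error rate still means 20 million wrong predictions.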
A little history
Back in 1985, I was studying remote sensing using Landsat imagery. Landsat images are satellite sensor images of the earth's surface, built from the intensity values of light hitting a sensor array on the satellite. We looked at them in three areas of the light spectrum -- red, blue, and yellow. A mini-computer the size of four filing cabinets was used to crunch the numbers and align the images to geospatial reference points.
We built composite images, and then analyzed what we "saw." The data often required filter modifications, and we also changed the relative pixel intensities to highlight features of interest. So what? All of this can now be done with any paint program on an iPad. But back in 1985, this was "big data."
The challenge was that while we were studying what we thought were seagrass beds in 10 to 30 feet of water, we could not tell for certain, as the images were not always photographically and geospatially aligned. So, what did we do to get our analysis and predictions right? We had to "go to ground."
We went out to the field and looked with our God-given eyes, seeing where we were right and where we were wrong. We then recalibrated our prediction algorithms and analyses, and made better delineations of the features in the imagery.
Fast forward to present day, and we have more computing power and storage at our fingertips than we could ever conceive of back in 1985. We also have more data. However, we have less time to get things right. Why?
The expectation is that data-driven decisions can be made quickly with the right prediction algorithm. That may be the case theoretically, but without going to ground and doing some real-life assessment (using your God-given eyes to see whether what is actually there is being represented by the big data), there is a chance, however slim, of getting it wrong. Getting it wrong with big data, which includes trillions of data elements, means a lot of errors.
One of the skills of a data scientist is figuring out how to sample and evaluate the real-world "truth" behind what is being seen in the big data analysis. The challenge is to do this quickly, effectively, and with skill, to ensure the errors are either extremely small (less than 0.001 percent) or non-existent.
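One minimal way to "go to ground" at scale is to field-check a random sample of predictions and use the sample to estimate the true error rate. The sketch below assumes hypothetical stand-ins (`predictions` and `check_on_ground`) for your own data and verification process; it is not from the article:

```python
import random

def estimate_error_rate(predictions, check_on_ground, sample_size=500, seed=42):
    """Estimate the real-world error rate by spot-checking a random
    sample of predictions against ground truth.

    `check_on_ground` is a hypothetical callable that returns True when
    a prediction matches what is actually observed in the field.
    """
    rng = random.Random(seed)  # fixed seed so the spot-check is repeatable
    sample = rng.sample(predictions, min(sample_size, len(predictions)))
    wrong = sum(1 for p in sample if not check_on_ground(p))
    return wrong / len(sample)

# Toy example: pretend every 50th prediction (2 percent) is wrong on the ground.
predictions = list(range(10_000))
ground_truth = lambda p: p % 50 != 0
rate = estimate_error_rate(predictions, ground_truth)
print(f"estimated error rate: {rate:.1%}")
```

Because the sample is random, the estimate will hover near the true 2 percent rate; in practice you would also attach a confidence interval before deciding whether recalibration is needed.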
User Rank: Exabyte Executive 2/25/2013 | 11:23:48 AM
Re: What are data analytics to do about getting it wrong? It is a good article. Validation of the analytics methodology is indeed very important. It is important to make sure the input of the model is correct, and that the output of the model makes sense.
User Rank: Blogger 12/13/2012 | 2:54:41 AM
Re: How am I doin'? But how much connection is set up there @legalcio? There needs to be a way to capture that information digitally to analyze it in full effect. Terry's example in this instance is very binary: are you in the right place or are you not. Other samples will be more based on perception... is the missing piece here an easy way for frontline staff to report in, and then sentiment-analyze the language used to homogenize results, creating a complete, constantly updated loop?
User Rank: Petabyte Pathfinder 12/13/2012 | 12:59:11 AM
Re: How am I doin'? I like where you're going with this. A collaboration of sorts on these fronts is ideal. Going with just one, i.e., just big data, might result in one-sided conclusions. I do not see why both cannot be used together.
User Rank: Blogger 12/12/2012 | 8:54:33 AM
Re: How am I doin'? @Saul Well he was mayor for 3 terms and quite popular, despite the fact that the city was not doing very well then. I was still too young to vote then, but as I recall Koch was known to be blunt, so people would not feel the need to be overly polite to spare his feelings.
User Rank: Blogger 12/12/2012 | 5:08:28 AM
Re: How am I doin'? @Ariella, who do you think gets a more honest opinion? Would that mayor have people give their frank opinions, or be polite to his face and keep true sentiment tucked away, ready to be unveiled on polling day?
That's what the anonymity of online buys you, the chance to be honest without recriminations... for better or worse.
But a lot of that might depend on where he is based, and how shy/unconfrontational most people are there.
User Rank: Exabyte Executive 12/10/2012 | 3:17:41 PM
Re: How am I doin'? The numbers can't be discounted @Saul, but it depends on the product. Utilizing big data to analyze investment trends might be enough for a marketing campaign to investors of a certain wealth range. If big data is used to increase retail sales at the branch level then you've got branch personnel who can check the customer's pulse on the ground.
User Rank: Blogger 12/10/2012 | 3:13:39 PM
Re: How am I doin'? That might make sense for a mayor, @legalcio -- but will it always apply? A mayor needs to know how people feel about them, what is being said, etc. And there is a vested interest for that mayor to know this 'on the ground' opinion.
When it comes to financial systems, might it just be the case that the numbers coming in are as close to 'on the ground' as you'll need to get?