Without a real business imperative, collecting big data is effectively pointless. Follow a practical set of steps to prepare big data for business intelligence work.
A truth is starting to dawn across enterprise IT: Big data collection without a business point is just a database-filling exercise. Organizations are coming to appreciate that, once big data is integrated and mined, the correlative and predictive analysis that follows must be carried out with an end goal in mind.
Getting the steps right
The problem is that big data analysis is a multistep process: creating a hypothesis, applying analytic formulae to sample data, and refining those formulae. Only when the formulae are properly worked out can you run them across the main batches of prepared data. Many products can help, including Pervasive Software, KXEN, Quiterian, FuzzyLogix and Revolution.
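As a rough illustration of that workflow, here is a minimal Python sketch. Everything in it is an assumption made for the example: the records are synthetic, the "formula" is a simple weighted visit count, and the names score, hit_rate, refine_on_sample and run_on_full are hypothetical. None of it reflects how the products named above work.

```python
# Hypothesis (assumed for illustration): frequent visitors are likelier to convert.

def score(record, weight):
    # Hypothetical formula: a weighted visit count used as a propensity score.
    return weight * record["visits"]

def hit_rate(records, weight, threshold=5.0):
    # Share of flagged records that actually converted.
    flagged = [r for r in records if score(r, weight) >= threshold]
    if not flagged:
        return 0.0
    return sum(r["converted"] for r in flagged) / len(flagged)

def refine_on_sample(sample, candidate_weights):
    # Refinement step: keep the weight that performs best on the sample.
    return max(candidate_weights, key=lambda w: hit_rate(sample, w))

def run_on_full(records, weight, threshold=5.0):
    # Final step: apply the refined formula to the full prepared batch.
    return [r["id"] for r in records if score(r, weight) >= threshold]

# Synthetic data: in this toy set, only records with 7+ visits converted.
records = [
    {"id": i, "visits": i % 10, "converted": i % 10 >= 7}
    for i in range(1000)
]
sample = records[::7]  # simple systematic sample, roughly one record in seven

best_weight = refine_on_sample(sample, [0.5, 1.0, 0.75])
targets = run_on_full(records, best_weight)
print(best_weight, len(targets))
```

The point of the structure is that the cheap, repeatable experimentation happens on the sample, and the full data set is touched only once the formula has settled.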
The irony is that the results are not that useful -- unless you are very clear about what you are doing and why. Hopefully, a few pragmatic suggestions can help you stay on the right path with your big data exercise. Predictive analytics is fun when you understand what you might be seeking.
Analysis makes big data relevant
What results from big data is ultimately a list of relevant possible targets -- e.g., prospects, next best products to offer, risky candidates to avoid, stocks to target, or web pages that need enhancing. This process repeats over time as the data refreshes and the analysis improves.
However, finding relevant targets in vast raw content is only part of the process of tapping big data's value. Meaning emerges from that list only when it is combined, enhanced, and integrated with your BI packages. After all, these systems have access to traditional data sources, such as your CRM application, financial package, or inventory control system -- and they can do useful things with that data.
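To make the idea of combining a target list with traditional sources concrete, the sketch below joins a hypothetical list of scored prospects against a hypothetical CRM extract. The customer IDs, field names and the enrich function are all invented for illustration; a real BI package would do this through its own data integration layer.

```python
# Hypothetical output of the big data analysis: customer id -> propensity score.
target_scores = {"C001": 0.91, "C007": 0.84, "C042": 0.77}

# Hypothetical extract from a CRM application.
crm_rows = [
    {"customer_id": "C001", "name": "Acme Ltd", "region": "North", "open_orders": 3},
    {"customer_id": "C007", "name": "Birch & Co", "region": "South", "open_orders": 0},
    {"customer_id": "C013", "name": "Cedar Inc", "region": "North", "open_orders": 1},
]

def enrich(scores, rows):
    # Index the CRM rows by customer id so each target is a single lookup.
    by_id = {row["customer_id"]: row for row in rows}
    enriched = []
    for customer_id, score in scores.items():
        row = by_id.get(customer_id)
        if row is None:
            continue  # target with no CRM match; skipped in this sketch
        enriched.append({**row, "score": score})
    # Highest-scoring targets first, ready for a report or dashboard.
    return sorted(enriched, key=lambda r: r["score"], reverse=True)

enriched = enrich(target_scores, crm_rows)
for row in enriched:
    print(row["name"], row["region"], row["score"])
```

Targets that have no match in the CRM data (C042 here) are dropped silently in this sketch; in practice you would probably want to report them, since an unmatched target may point to a data quality gap.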
Visualization makes big data meaningful
Meaning is often best derived from visualizing the shape of the data you want, such as time series charts or dynamic cross-tabs. Standard BI products integrate disparate sources and present the result as highly visual dashboards and reports, which makes them ideal for this type of endeavor. Even better is an environment that helps communicate this meaning visually, as insights, to users. A platform that scales and can transform content from print to Web to mobile interfaces is ideal, and that is where you should direct your efforts.
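For readers unfamiliar with cross-tabs, the following sketch shows the underlying idea in plain Python: rows of data are pivoted into a grid of one attribute against another. The sales figures and the cross_tab and render helpers are hypothetical; a BI product would build the same grid interactively and render it graphically.

```python
from collections import defaultdict

# Hypothetical sales rows, as they might arrive from an integrated source.
rows = [
    {"region": "North", "quarter": "Q1", "sales": 120},
    {"region": "North", "quarter": "Q1", "sales": 80},
    {"region": "North", "quarter": "Q2", "sales": 150},
    {"region": "South", "quarter": "Q1", "sales": 90},
    {"region": "South", "quarter": "Q2", "sales": 60},
    {"region": "South", "quarter": "Q2", "sales": 40},
]

def cross_tab(data, row_key, col_key, value_key):
    # Sum value_key for every (row_key, col_key) combination.
    table = defaultdict(lambda: defaultdict(int))
    for r in data:
        table[r[row_key]][r[col_key]] += r[value_key]
    return {name: dict(cells) for name, cells in table.items()}

def render(table):
    # Plain-text grid: one line per row value, one column per column value.
    cols = sorted({c for cells in table.values() for c in cells})
    lines = [" " * 8 + "".join(f"{c:>8}" for c in cols)]
    for name in sorted(table):
        cells = "".join(f"{table[name].get(c, 0):>8}" for c in cols)
        lines.append(f"{name:<8}" + cells)
    return "\n".join(lines)

pivot = cross_tab(rows, "region", "quarter", "sales")
print(render(pivot))
```

Swapping the row and column keys, or choosing a different value to sum, gives a different view of the same data, which is what makes a cross-tab "dynamic" in a BI tool.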
Big data's contribution is best delivered to the business in the form of visualizations. That's great news, because your proven BI platforms can take this mass of information and produce interactive and/or mobile reports and operational and performance dashboards that management needs to run the enterprise better.
And it's the best way to avoid turning your incredibly valuable big data initiative into a data-for-data's-sake box-ticking exercise.
User Rank: Exabyte Executive 12/19/2012 | 2:03:34 PM
Re: to boldly go @JeffMorris Great points. Visualization benefits greatly from a multi-step, multi-analysis approach that cuts across disciplines, software, data... When you unify many aspects of Big Data, those simplified tasks provide the ROI opportunities.
Re: Data for data's sake Reading this must make data teams nervous - especially if they have cost constraints on their storage. If they can't afford a 'delete nothing' approach - every time they sacrifice seemingly unimportant data for space they must be wondering "what if I just deleted next year's goldmine?"
User Rank: Exabyte Executive 12/5/2012 | 12:41:35 PM
Re: to boldly go Good analysis is absolutely the most important goal for any of what we're doing in the big data space.
What's great is that visualization helps analysis in ways that spreadsheets cannot. I remember a course at the University of Southern California in 2005, funded by the Dept of Defense, to come up with new ways to process information. The winning project took sentences from Shakespeare (or any text) and visualized them as birds; longer wings meant longer sentences, and you could quickly see outliers.
The point is that sites like InformationIsBeautiful are driving ways to see outliers and trends, and that's a hugely important part of modern analysis.
User Rank: Exabyte Executive 12/4/2012 | 5:04:13 PM
Re: to boldly go Looking at the three key elements you highlight, Jeff, I'm going to put analysis as the priority. I think mining the data will become easier; we'll be able to either narrow down the pools or expand them, but good analysis provides the leverage and competitive advantage.
Re: Data for data's sake @Saul, for supervised machine learning (predictive), being careful in the selection of your feature set is entirely warranted. In fact, fine-tuning the model's results often entails going back, deselecting certain features, and selecting others. But with unsupervised learning (clustering, for example), many times we do in fact go where "no man has gone before," as so aptly put by @Jeff. I'm not advocating collecting meaningless data, but I've seen cases where clustering revealed some rather surprising correlations in the training set. Data collection is really give-and-take when engaging in the Data Science process.
to boldly go Even the Starship Enterprise had the goal to “boldly go where no man has gone before.” This is basically the equivalent of having the goal of searching for unknown unknowns. And using Venn diagrams and other clustering visuals will help facilitate those elusive epiphanies, which may trigger the more structured analytic lifecycle that we’re describing. Eventually you’re going to need to share your find with someone.
Re: Data for data's sake You'll forgive me @Daniel - but many people say to be selective about the data you pull in (i.e. don't spend on storage you don't really need). But this cluster-based analysis really works best in a 'save everything' scenario. Is this clash of messages just a symptom of the relative immaturity of the big data conversation?
Re: Data for data's sake Unsupervised learning using clustering is more about finding "unknown unknowns." An unknown unknown is a question you did not know you should ask. Such discoveries expand your awareness. An effective pathway to unknown unknowns comes through cluster analysis. The clusters tell you how strong a relationship exists between data attributes. But such analysis doesn't tell you what the clusters mean. It just identifies that a cluster is forming and how strong it is. Domain experts definitely need to be part of the solution, since interpreting clusters in data sets requires contemplation of the foundations of the business.