Getting to the Real Point of Big Data

Jeff Morris
Comments
MDMConsult, User Rank: Exabyte Executive
12/19/2012 | 2:03:34 PM


Re: to boldly go
@JeffMorris Great points. Visualization benefits greatly from a multi-step, multi-analysis approach that cuts across disciplines, software, and data. When you unify many aspects of Big Data, those simplified tasks provide the ROI opportunities.

legalcio, User Rank: Exabyte Executive
12/5/2012 | 5:29:36 PM


Re: to boldly go
@saul I think that analysis ascends in the priority scale. It's got to be about delivering actionable information to someone seeking an answer to a question.

Saul Sherry, User Rank: Blogger
12/5/2012 | 1:27:52 PM


Re: Data for data's sake
Reading this must make data teams nervous, especially if they have cost constraints on their storage. If they can't afford a 'delete nothing' approach, every time they sacrifice seemingly unimportant data for space they must be wondering, "What if I just deleted next year's goldmine?"

technetronic, User Rank: Exabyte Executive
12/5/2012 | 12:41:35 PM


Re: to boldly go
Good analysis is absolutely the most important goal for any of what we're doing in the big data space.

What's great is that visualization helps analysis in ways that spreadsheets cannot. I remember a 2005 course at the University of Southern California, funded by the Dept of Defense, to come up with new ways to process information. The winning project took sentences from Shakespeare (or any text) and visualized them as birds; longer wings meant longer sentences, and you could quickly see outliers.

The point is that sites like InformationIsBeautiful are driving ways to see outliers and trends, and that's a hugely important part of modern analysis.
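The sentence-length idea above can be roughed out without any drawing at all. This is only a sketch, not the USC project's method: it assumes sentences end at `.`, `!` or `?`, counts whitespace-separated words, and flags anything more than a chosen number of standard deviations above the mean. The function names, the threshold, and the sample text are all made up for illustration.

```python
import re
import statistics


def sentence_lengths(text):
    """Split text on ., ! or ? and return (sentence, word_count) pairs."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return [(s, len(s.split())) for s in sentences]


def long_outliers(text, threshold=1.5):
    """Return sentences whose word count sits more than `threshold`
    population standard deviations above the mean word count."""
    pairs = sentence_lengths(text)
    counts = [n for _, n in pairs]
    if len(counts) < 2:
        return []
    mean = statistics.mean(counts)
    spread = statistics.pstdev(counts)
    if spread == 0:
        return []  # every sentence has the same length, so nothing stands out
    return [(s, n) for s, n in pairs if (n - mean) / spread > threshold]


# Hypothetical sample text, lightly adapted so it splits cleanly.
sample = (
    "To be or not to be. That is the question. "
    "Whether tis nobler in the mind to suffer the slings and arrows of "
    "outrageous fortune or to take arms against a sea of troubles and by "
    "opposing end them. To die. To sleep."
)

outliers = long_outliers(sample)
for sentence, words in outliers:
    print(words, "words:", sentence[:40] + "...")
```

A picture (birds or otherwise) would map each word count to a visual size; the flagged sentences are the ones whose "wings" would stick out.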

Saul Sherry, User Rank: Blogger
12/5/2012 | 7:03:13 AM


Re: to boldly go
@legalcio do you think that angle is specific to your situation, or do you think analysis would be the priority across different verticals?

legalcio, User Rank: Exabyte Executive
12/4/2012 | 5:04:13 PM


Re: to boldly go
Looking at the three key elements you highlight, Jeff, I'm going to put analysis as the priority. I think mining the data will become easier; we'll be able to either narrow down the pools or expand them, but good analysis provides the leverage and competitive advantage.

Daniel Gutierrez, User Rank: Blogger
12/4/2012 | 4:53:06 PM


Re: Data for data's sake
@Saul, for supervised machine learning (predictive), being careful in the selection of your feature set is entirely warranted. In fact, fine-tuning the model's results often entails going back, deselecting certain features, and selecting others. But with unsupervised learning (clustering, for example), many times we do in fact go where "no man has gone before," as so aptly put by @Jeff. I'm not advocating collecting meaningless data, but I've seen cases where clustering revealed some rather surprising correlations in the training set. Data collection is really give-and-take when engaging the Data Science process.

Jeff Morris, User Rank: Blogger
12/4/2012 | 3:54:15 PM


to boldly go
Even the Starship Enterprise had the goal to “boldly go where no man has gone before.” This is basically the equivalent of having the goal of searching for unknown unknowns. And using Venn diagrams and other clustering visuals will help facilitate those elusive epiphanies, which may trigger the more structured analytic lifecycle that we’re describing. Eventually you’re going to need to share your find with someone.

Saul Sherry, User Rank: Blogger
12/4/2012 | 3:29:48 PM


Re: Data for data's sake
You'll forgive me, @Daniel, but many people are saying to be selective in what data you pull in (i.e. don't spend on storage you don't really need). But this cluster-based analysis really works best in a 'save everything' scenario. Is this opposition of messages just a symptom of the relative immaturity of the big data conversation?

 

Daniel Gutierrez, User Rank: Blogger
12/4/2012 | 2:01:12 PM


Re: Data for data's sake
Unsupervised learning using clustering is more about finding "unknown unknowns." An unknown unknown is a question you did not know you should ask. Such discoveries expand your awareness. An effective pathway to unknown unknowns comes through cluster analysis. The clusters tell you how strong a relationship exists between data attributes. But such analysis doesn't tell you what the clusters mean. It just identifies that a cluster is forming and how strong it is. Domain experts definitely need to be part of the solution, since identifying clusters in data sets requires contemplation of the foundations of the business.
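For readers who want to see what "a cluster is forming and how strong it is" can look like in practice, here is a minimal sketch using plain k-means. It is one common clustering technique, not necessarily the one meant above. The data points (imagined as spend and visit figures), the function names, and the "tightness" score (mean distance to the cluster centre, one crude stand-in for strength) are all assumptions for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means on a list of equal-length numeric tuples."""
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic start: take k points spread evenly through the input.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids


def tightness(points, labels, centroids):
    """Mean distance from each cluster's members to its centroid.
    Smaller values mean a tighter (arguably 'stronger') cluster."""
    result = {}
    for c, centre in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            result[c] = sum(math.dist(p, centre) for p in members) / len(members)
    return result


# Hypothetical records: (monthly spend, monthly visits), already scaled.
points = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9),
          (8.0, 8.2), (7.9, 8.1), (8.3, 7.8)]

labels, centroids = kmeans(points, k=2)
scores = tightness(points, labels, centroids)
print(labels)
print(scores)
```

The output says which records group together and how tightly, and nothing more. What the two groups mean for the business is exactly the part that still needs a domain expert.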
