Blogs
Saul Sherry, Editor, 5/17/2013
James Robinson, co-founder of OpenSignal, tells us why it takes two to get great visualizations.
Christian Prokopp, Data Scientist, Rangespan, 5/9/2013
RCFile (Record Columnar File), the state-of-the-art big data storage format in Hive, is about to be challenged by ORC (Optimized Row Columnar).
Christian Prokopp, Data Scientist, Rangespan, 5/6/2013
Are sequence files or RCFile (Record Columnar File) the best way to store big data in Hive?
Chris Taylor, Director of Marketing, TIBCO Software, 5/2/2013
Big data is a polarizing topic whenever it comes up in discussions -- which is often. Like any hot trend, the combination of excitement, ignorance, and opportunism creates noise that leads to a healthy amount of skepticism, cynicism, and a few other isms as well.
James M. Connolly, US Correspondent, 4/30/2013
Having built out and marketed a big data platform long before big data became a buzzword -- 12 years ago, in fact -- placed MarkLogic Corp. in a pretty good position, and it has resulted in a $25 million investment now that interest in big data is really big.
Yves de Montcheuil, VP Marketing, Talend, 4/29/2013
Big data there is. To master it you must learn, but of dark data, beware you must.
Saul Sherry
Video: Visualization Is a Team Sport

5|17|13   |   1:23   |   (10) comments


At The Big Data Show, we caught up with James Robinson of OpenSignal, who encourages a team approach to visualizations. One reason is that it sometimes takes a graphic designer or project manager to push the technically minded visualization producer to go the extra mile.
Saul Sherry
Big Data Explained: What Is HDFS?

Part of 9   |  
See complete series
4|4|13   |   1:05   |   (13) comments


Big data is awash with acronyms at the moment, none more widely used than HDFS. Let's cut to the chase... it stands for Hadoop Distributed File System.

This is the system of distributing files that allows Hadoop to work on huge data sets at speed. It splits files into blocks, spreads those blocks across different servers, and stores duplicate copies of each block on separate machines.

Let's see why with an example.

Sarianne works in the financial markets, and runs a lot of predictive models to make sure her investments are minimum risk.

Because HDFS stores the data blocks on separate servers, her queries through Hadoop can run quickly -- the computation on each block can happen at the same time, rather than queuing up one behind another.

As an added benefit, if one server fails (as one is bound to, given the number of servers and disk drives needed to run big data projects) it won't stop Sarianne's models from pulling the data they need, because HDFS duplicated those blocks -- meaning Hadoop can still return Sarianne's results in double-quick time.
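
To make the idea concrete, here is a toy sketch in Python of splitting a file into blocks, copying each block to more than one server, and reading the file back after a server fails. It only models the idea: the function names, the server names, and the round-robin placement rule are all assumptions made for illustration, not how HDFS itself is implemented.

```python
def store_file(data, servers, block_size=8, replication=2):
    """Split `data` into blocks and copy each block to `replication` servers.

    Returns (storage, num_blocks), where storage maps
    server name -> {block_id: block_bytes}.
    The round-robin placement is a toy rule, not the real HDFS policy.
    """
    if replication > len(servers):
        raise ValueError("need at least as many servers as replicas")
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    storage = {name: {} for name in servers}
    for block_id, block in enumerate(blocks):
        for r in range(replication):
            # Consecutive offsets guarantee the replicas land on distinct servers.
            target = servers[(block_id + r) % len(servers)]
            storage[target][block_id] = block
    return storage, len(blocks)


def read_file(storage, num_blocks, failed=frozenset()):
    """Reassemble the file, taking each block from any server that is still up."""
    parts = []
    for block_id in range(num_blocks):
        for name, held in storage.items():
            if name not in failed and block_id in held:
                parts.append(held[block_id])
                break
        else:
            # No live server holds this block: the file cannot be rebuilt.
            raise OSError(f"block {block_id} has no surviving replica")
    return b"".join(parts)


# Hypothetical data and server names, for demonstration only.
data = b"prices for Sarianne's risk models"
servers = ["server-a", "server-b", "server-c"]
storage, num_blocks = store_file(data, servers)

# One server going down does not stop the read.
recovered = read_file(storage, num_blocks, failed={"server-b"})
print(recovered == data)
```

With two copies of every block on different servers, any single failure leaves at least one copy reachable. Losing two servers at once can still lose a block in this sketch, which is why the number of copies matters.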

Saul Sherry
Big Data in Use: OpenSignal's Telecommunications Solution

3|18|13   |   1:46   |   (3) comments


In the first of a series of interviews with business leaders who leverage big data, we talk to James Robinson, CTO and co-founder of OpenSignal.

OpenSignal combines big data technologies and sensor data from mobile phones to give insight to both mobile consumers and telecommunications giants. Robinson is also a contributing writer on Big Data Republic.

Saul Sherry
Big Data Explained: What Is Hadoop?

Part of 9   |  
See complete series
3|5|13   |   1:13   |   (9) comments


Hadoop is the open-source software framework that quickly became almost synonymous with big data. But what does it actually do?

Whereas traditional data queries were run on one server, Hadoop enables you to run data queries across a large number of machines. By spreading the computational load across many servers, Hadoop enables you to deal with big data in a timely fashion.

An example:

Tobias runs an online DVD store -- and he wants to increase sales by recommending products to customers as they check out. But he doesn't just want to recommend bestsellers; he wants a smart system that recommends based on the buyer's demographics and taste.

That's where Hadoop helps out. It enables Tobias to spot patterns across all of his customers' data, based on age, sex, genre preference, actor preference, period of production, and many other defining elements, and to apply those patterns to each customer at checkout. He can access this information quickly, because different elements of the search can be carried out individually and simultaneously, instead of having to take place on a single machine.

Using MapReduce (as discussed in a previous video), Hadoop then returns the results of these queries in a way that can guide the customer and increase Tobias's revenue.
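
As a rough sketch of what "individually and simultaneously" means, the Python below deals a handful of order records out into slices, counts each slice on its own worker, and then merges the partial counts. Threads on one machine stand in for Hadoop's separate servers, and the order data, age bands, and function names are all invented for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical order records: (customer age band, genre bought).
ORDERS = [
    ("18-29", "sci-fi"), ("18-29", "sci-fi"), ("18-29", "comedy"),
    ("30-44", "drama"), ("30-44", "drama"), ("30-44", "sci-fi"),
    ("45-60", "western"), ("45-60", "drama"), ("45-60", "western"),
]


def partition(records, num_workers):
    """Deal the records out into `num_workers` roughly equal slices."""
    return [records[i::num_workers] for i in range(num_workers)]


def count_slice(records):
    """Work done by one worker: count (age band, genre) pairs in its slice."""
    return Counter(records)


def top_genre_by_age_band(records, num_workers=3):
    """Count every slice concurrently, merge the counts, keep the top genre per band."""
    slices = partition(records, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(count_slice, slices))

    # Merge the partial counts produced by each worker.
    totals = Counter()
    for partial in partials:
        totals.update(partial)

    # For each age band, remember the genre with the highest total.
    best = {}
    for (age_band, genre), n in totals.items():
        if age_band not in best or n > best[age_band][1]:
            best[age_band] = (genre, n)
    return {age_band: genre for age_band, (genre, _) in best.items()}


print(top_genre_by_age_band(ORDERS))
```

The answer does not depend on how many workers share the job; only the time taken does, which is the point of spreading the load.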

Saul Sherry
Big Data Explained: What Is MapReduce?

Part of 9   |  
See complete series
2|26|13   |   1:16   |   (7) comments


I want to tackle Hadoop, but before we get there, we're going to need to explore MapReduce. MapReduce is a programming model for processing large datasets, and the clue to its function is in its name.

When you want to pull certain information from your datasets, it "maps" out the relevant information for your query.

Then it "reduces" the information down, sorts it based on any rules you've applied, and gives you just the data you were after.

An example:

Virginia is a medical researcher looking to carry out research on diabetes patients. For the purposes of her study, she wants to see any geographical concentrations of diabetes patients who are male, between the ages of 40 and 50, and who smoke.

The map in the MapReduce model finds the records that fit Virginia's needs.

Then the reduce function begins -- aggregating the geographical data in these records and producing an ordered list of the cities with the largest populations of the defined type. This simple process allows Virginia to identify areas of concentration for further study.
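
Virginia's query can be sketched in a few lines of plain Python. This runs on one machine with no Hadoop involved, and the patient records, field names, and cities are made up for the example. It also assumes "between 40 and 50" includes both ends.

```python
from collections import defaultdict

# Hypothetical patient records; every field name and value here is invented.
PATIENTS = [
    {"city": "Leeds", "sex": "M", "age": 45, "smoker": True, "diabetic": True},
    {"city": "Leeds", "sex": "M", "age": 50, "smoker": True, "diabetic": True},
    {"city": "Leeds", "sex": "F", "age": 44, "smoker": True, "diabetic": True},
    {"city": "Bristol", "sex": "M", "age": 41, "smoker": True, "diabetic": True},
    {"city": "Bristol", "sex": "M", "age": 39, "smoker": True, "diabetic": True},
    {"city": "Bristol", "sex": "M", "age": 47, "smoker": False, "diabetic": True},
    {"city": "York", "sex": "M", "age": 48, "smoker": True, "diabetic": False},
]


def map_patient(record):
    """Map step: emit (city, 1) for each record that matches the study criteria."""
    if (record["diabetic"] and record["sex"] == "M"
            and 40 <= record["age"] <= 50 and record["smoker"]):
        yield record["city"], 1


def shuffle(pairs):
    """Group the emitted values by key, ready for the reduce step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_city(city, ones):
    """Reduce step: collapse all the values for one city into a single count."""
    return city, sum(ones)


def concentrations(records):
    """Run map, shuffle, and reduce, then order cities by matching patients."""
    pairs = (pair for record in records for pair in map_patient(record))
    counts = [reduce_city(city, ones) for city, ones in shuffle(pairs).items()]
    # Highest count first; ties are broken alphabetically by city.
    return sorted(counts, key=lambda item: (-item[1], item[0]))


print(concentrations(PATIENTS))
```

Each call to `map_patient` looks at one record on its own and each call to `reduce_city` looks at one city on its own, which is what lets a framework hand those calls out to many machines.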

MapReduce itself is pretty straightforward, but once we start ramping up the amount and types of data used we will need Hadoop's help -- which is where things get a bit more complex.

Saul Sherry
Big Data Explained: What Is Volume?

Part of 9   |  
See complete series
11|28|12   |   1:44   |   (23) comments


Today we're going to take a look at the V that makes big data big: Volume.

It's no secret we're inundated with data these days, from mobile devices, machines, social media, transactions, satellites… pretty much everything is throwing data out. And technology has reached a point that allows us to capture and keep everything, too.

Why would we bother?

Because controlling such a vast quantity of data can reveal information and patterns about the people and objects that we otherwise can't see.

An example:

John runs a tradeshow and wants to make it a unique and repeatable experience for all his attendees.

Tim is an attendee at the show, and has been for five years. John's company has been tracking Tim's every data point with the show for that whole time -- from his online activity before the show, to checking in to his hotel, scanning his ticket as he enters the show, the stands and sessions he has attended in previous years, and even what he has had for lunch.

Keeping hold of all of this data on Tim means John can present him with a really personalized experience -- with a dedicated map and timetable guiding Tim to the content he has a history of making a beeline for, and even getting him a voucher for his favorite vegetarian lunch!

That's a lot of data, but what makes this really big data is that John's company has been collecting this information from every one of the attendees at every one of its shows -- allowing it to offer this personalized and highly valued experience to everyone.

With good management of the volume of data, big data allows organizations to grow and experiment based on previous encounters.