Zubin Dowlaty, Head of Innovation & Development, 3/3/2014
Big data analytics is making big waves across all facets of industry, with adoption stories and use-cases reaching new zeniths. This pace can be attributed to the information explosion that is leading to unprecedented levels of focus on the ability to store, manage, and analyze data.
Daniel D. Gutierrez, Data Scientist, 1/23/2014
Computational marketing is an emerging field that uses the big data technology stack to orchestrate complex marketing strategies. Computation is the means to give form and function to stores of information too large for any one person to analyze.
Dustin Eastman, Webscale Engineer, Rackspace, 12/20/2013
Companies like Starbucks have made the collection and mining of big datasets child's play through both mobile apps and a process known as "gamification."
But the question is, just how do they entice people to give up that data? Through appealing to our own human nature, and competitive spirit.
Robert Plant, Associate Professor, School of Business Administration, University of Miami, 11/27/2013
Customers have been historically viewed through several lenses. One is a "process" lens, which reflects their linear relationship with a firm. This relationship can be broken down into the sub-stages of the customer-firm relationship: customer acquisition; customer product selection choices; the sales transaction processes; and the customer ...
Ariella Brown, Technology Blogger, 11/14/2013
Big data algorithms make it possible for sites like Amazon and Netflix to offer selections tailored to each person's taste. Now, they've been adapted to solve the problem of having to go through countless too loose and too tight options to find the bra that's just right.
Online lingerie sellers like True&Co and HerRoom use data to offer women results ...
John Edwards, Technology Journalist & Author, 11/12/2013
Finance is emerging as an area where big data has the potential to exert transformational changes. Pivotal technologies like Hadoop, NoSQL, and Storm now enable investment analysts and their employers to scrutinize non-traditional data sets for the first time and in exciting ways.
With the help of big data tools, analysts and staff can work with both ...
Think of mobile big data and you instantly get an image of millions of sensors on phones and tablets creating innumerable data streams -- but what about those who analyze big data while mobile?
Simplicity is key to mobile big data analysis
Speaking to Catherine Gluckstein, President of SumAll, I decided to explore just how far an organization might ...
Roderick Morris, Senior Vice President of Marketing & Operations, Opower, 10/24/2013
The average email user receives 75 emails per day. What if you started receiving 3,000 times as much? It's an intimidating, likely even terrifying, thought. But there is also almost certainly important new information within that deluge of emails. So, how would you go about pinpointing and consuming the most important insights?
Last month, ESPN rated the Sacramento Kings as the worst sporting franchise in the US. Yesterday I watched the Kings' president and chief operating officer stand up and declare that other teams will be following in their wake.
As part of Tibco's Tucon event in Las Vegas, Chris Granger was outlining his "Architecture of a Turn Around" plan for the ...
George Frey, VP Marketing, emarsys, 10/7/2013
Analytics has been around in one form or another for more than a century, but there is a consensus in the industry that its immediate, tangible impact on business has been limited. In a world where every large organization has analytics of some kind, where will we find the real competitive advantage?
Enterprise Apps Today recently discussed a Gartner report ...
We've put together this short video to help you show your team just what you mean by "data warehouse."
A data warehouse is a collection of data, specifically designed to let you run reports and analytics.
Tobias has brought in a data specialist to get his online movie retail business into the big time, but he's confused about what the specialist keeps referring to as a data warehouse. He's got his database set up. Isn't that his data warehouse?
Technically, no. A database is simply the way we store any kind of data, whereas a data warehouse is specifically set up for running reports and analysis. It's data, but it has specific, definable business value.
This will allow Tobias to tailor his storage solution. His data warehouse takes pride of place on faster, more expensive disks in more accessible forms. The rest of his data is collected for potential use or out of legal obligation on more standard and cheaper disks.
Data scraping is the act of getting hold of data that lacks inherent structure, usually because it was never meant to be collected in a way that would bring value to your business.
Let's explore this concept with an example.
Tobias is looking to expand his online DVD shop's selection of classic cult films. As part of his research to select stock, he wants to pull in a load of data from online review spaces like Amazon and Rotten Tomatoes.
All this data, from star ratings to personal opinion, was added for human consumption. There's no easy "export as XML" function on Amazon review lists. That is where data scraping comes in. In essence, it's about Tobias identifying the elements he needs and running a program to export them into a database of his own. Of course, he'll need to match up the ratings across stars, percentages, scores, etc., as well as figure out what kind of sentiment analysis he wants to carry out on written comments -- but that's up to him.
Now, this "meant for human eyes" data is computer readable, and Tobias can begin munging and rearranging that data until it suits his analytical needs.
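What Tobias is doing can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not production scraping code: the HTML snippet, the `review`/`stars`/`text` class names, and the fields are all invented for this example, and real review sites have messier markup (and terms of service to respect).

```python
from html.parser import HTMLParser

# Hypothetical review markup -- real pages are far less tidy than this.
SAMPLE_HTML = """
<div class="review"><span class="stars">4</span><p class="text">A cult classic.</p></div>
<div class="review"><span class="stars">2</span><p class="text">Overrated.</p></div>
"""

class ReviewScraper(HTMLParser):
    """Pull (stars, text) pairs out of 'meant for human eyes' HTML."""

    def __init__(self):
        super().__init__()
        self.reviews = []   # collected records, one dict per review
        self._field = None  # which field the parser is currently inside

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        if tag == "span" and cls == "stars":
            self._field = "stars"
        elif tag == "p" and cls == "text":
            self._field = "text"

    def handle_data(self, data):
        if self._field == "stars":
            self.reviews.append({"stars": int(data), "text": ""})
        elif self._field == "text":
            self.reviews[-1]["text"] = data
        self._field = None

scraper = ReviewScraper()
scraper.feed(SAMPLE_HTML)
print(scraper.reviews)
```

The output is now a structured list of records that Tobias can load into his own database and munge as he pleases.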
Data quality can be measured and assessed in a number of ways.
Let's explore some main criteria with an example.
Tobias has decided to create a data governance plan for his wildly successful online DVD shop. Before he does so, he'll need to decide how to define quality data.
There's validity, which is all about making sure new data conforms to the defined rules of your business. We all know how irritating it can be to get incomplete phone numbers, postcodes where names should be, or prices in the unique identifier fields.
There's consistency, which Tobias can use to make sure he has the right information on his customers. If one entry says a customer lives in Perth, Australia but a ZIP code in Beverly Hills appears elsewhere, an alarm bell should be ringing.
There's accuracy, which is incredibly hard for Tobias to achieve, because there isn't really a go-to, faultless version of his data out there. Having said that, if he had an external database of postal codes and their relevant geographic locations, he'd have a better chance of figuring out if that customer really lived in WA6000 or 90210.
There's completeness. In Tobias's case, a problem there would look like blank cells in his records. That's very difficult to fix, short of just making stuff up.
There are many more criteria for checking data quality. Tobias can put a data governance plan in place to improve data input, but there are no easy criteria to use once bad data has been collected and stored.
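The validity, accuracy, and completeness checks described above lend themselves to simple automated rules at the point of input. Here's a minimal Python sketch; the customer records, field names, and the stand-in reference list of postcodes are all hypothetical.

```python
import re

# Hypothetical customer records -- field names are illustrative only.
customers = [
    {"name": "Alice", "phone": "0123 456 789", "postcode": "WA6000"},
    {"name": "Bob",   "phone": "not a number", "postcode": ""},
]

# A stand-in for the external postcode database mentioned above.
KNOWN_POSTCODES = {"WA6000", "90210"}

def quality_report(record):
    """Score one record against simple validity/accuracy/completeness rules."""
    issues = []
    # Validity: phone numbers should contain only digits and spaces.
    if not re.fullmatch(r"[\d ]+", record["phone"]):
        issues.append("invalid phone")
    # Accuracy: check the postcode against an external reference list.
    if record["postcode"] and record["postcode"] not in KNOWN_POSTCODES:
        issues.append("unknown postcode")
    # Completeness: flag any blank fields.
    if not all(record.values()):
        issues.append("missing fields")
    return issues

for c in customers:
    print(c["name"], quality_report(c))
```

Run at input time, checks like these stop bad data before it's stored, which is exactly when a governance plan has the most leverage.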
Semi-supervised learning sits halfway between supervised and unsupervised learning, combining both labelled and unlabelled data in the learning process.
Here's a simple example of how it might work for a business.
We're back with Tobias and his online DVD business. We've already seen him use unsupervised learning to split his mailing list into two groups so that each can receive messaging more likely to result in a sale. Now, he wants to define these groups to a higher degree of accuracy. Semi-supervised learning allows him to do this by combining labelled data (which can be expensive or time consuming to come by) with unlabelled data (which is more readily available and generally cheaper) to train the algorithm.
Using his labelled data as an anchor point, semi-supervised learning can then spot where clusters of the unlabelled data points can fit around it. What this means is he essentially has access to a bigger training set of data... and more data means more accuracy. Adjusting his groupings, he will be able to more accurately split his data into clusters in the hope that those marketing newsletters will yield even more sales.
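Here's a toy version of that anchoring idea in Python. Two labelled customers seed the groups, and each unlabelled point takes the label of its nearest labelled neighbour, joining the labelled set as it goes. The features (average spend, visits per month) and the simple nearest-neighbour rule are assumptions for illustration; real semi-supervised methods such as label propagation are considerably more sophisticated.

```python
import math

# Two labelled anchor customers, keyed by (avg spend, visits per month).
# Feature values and group names are invented for this sketch.
labelled = {(1.0, 1.0): "bargain", (9.0, 9.0): "blockbuster"}

# Plenty of unlabelled customers -- cheaper and easier to come by.
unlabelled = [(1.5, 0.8), (8.5, 9.2), (2.0, 1.5), (9.5, 8.0)]

def nearest_anchor(point, anchors):
    """Find the labelled point closest to this unlabelled one."""
    return min(anchors, key=lambda a: math.dist(point, a))

for point in unlabelled:
    anchor = nearest_anchor(point, labelled)
    labelled[point] = labelled[anchor]  # propagate the anchor's label

print(labelled)
```

Note that newly labelled points immediately become anchors themselves, which is how the small labelled set effectively grows into a much bigger training set.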
We're unearthing more insight from the Big Data Show earlier this year -- today featuring Amanda Kahlow, CEO & Founder at 6Sense Insights Inc.
Amanda's approach to big data veers away from the vendor trap of getting overly invested in the storage, processing, and querying steps. She's more keen for businesses to bring together their interactive channels (and maybe some third party data) to see who is going to buy, when, and how much.
Amanda's pragmatic message sets a great goal for businesses looking to leverage their big data -- understand your customers before they have to tell you who they are.
Unsupervised learning allows us to apply labels to data that was previously undefined.
Let's explore this with an example.
The last time we saw Tobias, he was using machine learning on a suggestion engine to get more products sold based on which films his customers were viewing. He also uses machine learning -- more specifically, unsupervised learning -- in his marketing campaigns. In this simple example, he'll send two versions of his newsletter: one featuring full-price blockbuster Hollywood films and one featuring sale-price world cinema.
It would take him months to sort through his customer list manually and determine who should get which version. To divide his customer data quickly into two coherent sets, he can use unsupervised learning algorithms, which will define the two groups for him by sorting them into clusters. This will allow him to increase the chances of his newsletters finding a receptive audience.
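The clustering step can be sketched with a bare-bones k-means loop in Python. This is a deliberately tiny example, assuming a single invented feature (average spend per order) and two clusters; a real job would use more features and a library implementation, and would guard against a cluster emptying out.

```python
# Invented average-spend figures for a handful of customers.
spends = [4.0, 5.0, 4.5, 20.0, 22.0, 19.5]

# Crude initialisation: start the two centroids at the extremes.
centroids = [min(spends), max(spends)]

for _ in range(10):  # a few refinement passes is plenty for this toy data
    groups = {0: [], 1: []}
    # Assignment step: each customer joins the nearest centroid's cluster.
    for s in spends:
        nearest = min((0, 1), key=lambda i: abs(s - centroids[i]))
        groups[nearest].append(s)
    # Update step: move each centroid to the mean of its cluster.
    centroids = [sum(g) / len(g) for g in groups.values()]

print(groups)  # budget buyers vs. big spenders
```

No one told the algorithm what the groups mean; it simply found two coherent clusters, which Tobias can then interpret as the audiences for his two newsletters.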
Machine Learning is a form of artificial intelligence that can be used to automate a lot of big data processes.
Here's an example.
By using machine learning technology, customers of Tobias's online movie store get a more personalized, evolving service. Based on the pages and products viewed, a customer on the site is presented with potential films he or she might like to purchase. This is based on the machine learning engine spotting correlations in the data of customers with similar demographics who have viewed similar pages, and recommending potential purchases from their purchase history.
As this is an automated system that "learns," Tobias doesn't need to constantly tweak the algorithms. The tool keeps learning, so when a new purchasing trend emerges among his customers, it makes recommendations based on the new patterns it has recognized.
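The "customers like you also bought" idea can be illustrated with a tiny co-occurrence sketch in Python. The purchase histories and film titles are invented, and real recommendation engines weight by demographics, recency, and much more; this just shows the core counting step.

```python
from collections import Counter

# Hypothetical purchase histories, one set of titles per customer.
histories = [
    {"Alien", "Blade Runner", "Dune"},
    {"Alien", "Blade Runner"},
    {"Alien", "Dune"},
]

def recommend(viewed, histories):
    """Rank titles by how often they appear alongside the viewed one."""
    counts = Counter()
    for history in histories:
        if viewed in history:
            counts.update(history - {viewed})  # what similar buyers bought
    return [title for title, _ in counts.most_common()]

recs = recommend("Alien", histories)
print(recs)
```

As new purchases flow in, the counts update themselves, which is the "learning without constant tweaking" the paragraph above describes.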
Sentiment analysis is one way data miners can take the legwork out of understanding the meanings and feelings behind statements made in social media and other forums.
Let's take a look at how it works with an example.
Fatima works on the marketing team for a company developing casual games. It launched its flagship game, Disgruntled Dogs, seven days ago -- and Fatima wants to know how the game has been received.
The sales team can pull sales data, which is a good indicator of success, but analyzing conversations will give Fatima a more nuanced idea of what people do and don't like. Gathering together all mentions of her company, the game's title, and a few key characters and elements of the game results in a pool of over 1,000,000 mentions -- in the first week. This is a well talked-about game!
The technology Fatima has chosen is effective but relatively simple. It aims to "read" the words and the way they are used, based on a set of semantic rules, to determine the feelings expressed -- in this case via the polarities "Positive" or "Negative."
Oh dear, it seems people have an overly negative view of Disgruntled Dogs. Still, it's better that Fatima can find this out quickly, and it gives her company the chance to alter the way the game works or change messaging to make sure this negative sentiment doesn't result in poor sales.
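A bare-bones version of that polarity scoring looks like this in Python. The word lists and example mentions are invented; real sentiment tools apply much richer semantic rules (negation, intensifiers, sarcasm) than counting positive and negative words.

```python
# Tiny illustrative lexicons -- real tools ship with thousands of entries.
POSITIVE = {"love", "fun", "great", "addictive"}
NEGATIVE = {"boring", "broken", "hate", "crashes"}

def polarity(mention):
    """Classify one mention as Positive, Negative, or Neutral."""
    words = mention.lower().split()
    score = (sum(w in POSITIVE for w in words)
             - sum(w in NEGATIVE for w in words))
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"

print(polarity("I love this game, great fun"))          # Positive
print(polarity("the game crashes and levels are boring"))  # Negative
```

Run over a million mentions, even a crude classifier like this gives Fatima a fast, rough read on how Disgruntled Dogs is landing.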
Jamie Turner, CTO at Postcode Anywhere, caught up with us at The Big Data Show this spring to give us his take on big data for SMEs.
He makes some really interesting points. One that sticks with me: "Not using elaborate stuff like multinationals and governments" is probably an underdelivered message in this space. It's easy to salivate at the opportunities exploited by Walmart and Tesco, but having a clear understanding of your own business needs (and not Walmart's or Tesco's) will be the best way to reap rewards.
Sam Zindel, Data Strategist at iCrossing Digital Marketing, filled us in on how supermarket delivery giant Ocado uses big data to identify and serve specific content to vegetarians once they spot them on their website.
This was part of Sam's talk at The Big Data Show about putting the customer at the heart of digital marketing, as well as making the most of the data you already have.
On the opening day of the Big Data Show, Mike Cornwell, CEO of The IDM, was generous enough to give us some time to discuss his afternoon panel session. He also offered a word of caution on the state of marketing data. We're all getting excited about big data, but it seems most people still can't deal with their small data in the best possible manner.
"Just knowing enough to find some insight from information and using it intelligently for marketing still seems to be beyond a lot of organizations," he said.
Does this resonate with your business? Have you got the small data figured out before you invest time in de-siloing and bringing more information together?
Continuing our series of interviews with businesses leveraging big data, we talk to James Gill, CEO of GoSquared.
GoSquared offers real-time web analytics, using big data technologies to surface the analytical data that counts. Marketing managers and IT departments benefit from GoSquared's ability to pick out the most actionable insights as they happen.
In the first of a series of interviews with business leaders who leverage big data, we talk to James Robinson, CTO and co-founder of OpenSignal.
OpenSignal combines big data technologies and sensor data from mobile phones to give insight to both mobile consumers and telecommunications giants. Robinson is also a contributing writer on Big Data Republic.
Hadoop is the open-source software framework that quickly became almost synonymous with big data. But what does it actually do?
Whereas traditional data queries were run on one server, Hadoop enables you to run data queries across a large number of machines. By spreading the computational load across many servers, Hadoop enables you to deal with big data in a timely fashion.
Tobias runs an online DVD store -- and he wants to increase sales by recommending products to customers as they check out. But he doesn't just want to recommend bestsellers, he wants a smart system that recommends based on the buyer's demographics and taste.
That's where Hadoop helps out. For each customer, Hadoop enables Tobias to spot patterns across all of his customers' data, based on age, sex, genre preference, actor preference, period of production, and many other defining elements. He can access this information quickly, because different elements of the search can be carried out individually and simultaneously, instead of having to take place on a single machine.
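The pattern Hadoop distributes across many servers is MapReduce, and it can be sketched on a single machine in plain Python. The viewing logs below are invented; the point is the three phases -- map, shuffle, reduce -- where in Hadoop the map and reduce steps each run in parallel on many machines over their own slice of the data.

```python
from collections import defaultdict

# Hypothetical (customer, genre) viewing records.
views = [("alice", "sci-fi"), ("bob", "horror"), ("alice", "sci-fi"),
         ("carol", "sci-fi"), ("bob", "horror")]

# Map: emit a (key, 1) pair per record. In Hadoop, each server runs
# this over its own chunk of the data simultaneously.
mapped = [(genre, 1) for _, genre in views]

# Shuffle: group the pairs by key so each key's counts end up together.
grouped = defaultdict(list)
for genre, count in mapped:
    grouped[genre].append(count)

# Reduce: sum each key's counts -- also parallelised in Hadoop.
totals = {genre: sum(counts) for genre, counts in grouped.items()}
print(totals)
```

Swap genre counts for the demographic and taste correlations Tobias wants, and spread the map and reduce phases over a cluster, and you have the shape of his Hadoop job.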
Today we're going to take a look at the V that allows big data to be immediate and reactive: Velocity.
As well as having to master the sheer volume and variety of information within big data, organizations also have to be able to contend with the speed at which all of this data is generated. Real benefit can be gained by pouncing on this data in real-time -- affecting outcomes while they are still forming.
What kind of benefit?
Well, as we've already established, data can take many different forms. How working on this stream of real-time big data will benefit you will depend on your industry. For this example I'll focus on the financial services sector.
Andy is in charge of online security for a big bank, trying to make sure his customers' money is safe. When he can detect fraud after the event, it's fairly useless, but if he can spot it as it happens, it can be priceless. If a malicious computerized attack is started on Andy's bank, it will be generating thousands of events every second -- but Andy has put the right system in place to detect these events by comparing them to the way actual, normal customers behave. And it happens in real time, so alarms are going off to let him know.
Many fraudsters will access online banking and go directly to the transfer section of a website without first checking balances and transactions. That clickstream is foreign and unfamiliar to the complex event processing engine and thus gets flagged.
In this way the bank can stamp down on the illegal activity as it happens, rather than chasing up after the event.
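A toy version of that clickstream rule can be written in a few lines of Python. The page names and the single-rule check are invented for illustration; a real complex event processing engine evaluates many such patterns against thousands of events per second.

```python
# Pages a normal customer usually visits before transferring money.
NORMAL_PRELUDE = {"balance", "transactions"}

def is_suspicious(clickstream):
    """Flag sessions that reach 'transfer' before any normal browsing."""
    for page in clickstream:
        if page == "transfer":
            return True   # jumped straight to transfer -- unfamiliar pattern
        if page in NORMAL_PRELUDE:
            return False  # looks like an ordinary customer journey
    return False

print(is_suspicious(["login", "transfer"]))             # True: flag it
print(is_suspicious(["login", "balance", "transfer"]))  # False: normal
```

Applied to each session as its events arrive, a check like this is what lets Andy's alarms go off while the attack is still in progress rather than after the money has gone.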
There's plenty of talk about big data's three V's: volume, velocity, and variety. But what exactly do these terms mean?
We're going to take a quick trip through one of these today: Variety.
This exciting concept within big data gives you the opportunity to gain insight by combining a variety of data sets that would not traditionally sit together. By enabling you to link up your traditional analytical data sets with many different types of information, a new world of analytical possibilities is opened.
So what's so exciting about this?
Well, it allows you to collate data sets that don't obviously relate to each other. Data experts can then analyse this collated data, to spot patterns or create new insights you would previously have been blind to. Variety, when tackled well in big data, allows you to see new revelations in the data your organization already produces.
An example: Judith is a brand manager. She loves her job and is very good at it, but she knows she would benefit from being able to listen even more closely to the voice of her customer.
Taking traditional financial information, Judith can already see the performance of her brand. It doesn't take a data scientist to see which week did well, and which week did badly. But it won't tell her why.
Harnessing variety in data, Judith's data team can create relations between this data and what's being said on social media about her brand, as well as in text-input fields on customer satisfaction surveys. These disparate sets of data can be brought together, contextualized, and visualized in a way that gives Judith clues as to what her brand has done to influence customer behavior.
Judith now has the vision to generate hypotheses on ways to amplify positive results and mitigate negative trends.
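The mechanics of relating those disparate sets can be sketched very simply in Python: join the two sources on a shared key (here, the week) and read them side by side. The sales figures and sentiment scores below are invented for illustration; in practice this is a job for a data team with proper ETL tooling.

```python
# Hypothetical weekly brand performance from the finance system.
sales = {"week 1": 120_000, "week 2": 80_000}

# Hypothetical net sentiment from social media and survey free text
# (positive numbers mean favourable buzz).
sentiment = {"week 1": 0.6, "week 2": -0.4}

# Join the two sources on their shared key: the week.
combined = {week: {"sales": sales[week], "sentiment": sentiment[week]}
            for week in sales}

for week, row in combined.items():
    note = "positive buzz" if row["sentiment"] > 0 else "negative buzz"
    print(week, row["sales"], note)
```

Seeing the bad sales week sitting next to the negative buzz is what turns "which week did badly" into a testable hypothesis about why.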