Slowly, ever so slowly, medical data is becoming freer to exploit in the pursuit of efficient healthcare.
We see this progress everywhere. My parents' generation would have been aghast at having any fewer than three television channels, and my kids (when I get round to having them) will most likely be aghast at even the concept of a ...
Robert Plant, Associate Professor, School of Business Administration, University of Miami, 11/20/2013
In 2010, the United States government enacted the Patient Protection and Affordable Care Act, intended to combine the best of public and private insurance coverage for the population and thereby control and reduce healthcare costs. The act, better known as "Obamacare," has not been without controversy, and almost brought the country to default on ...
Robert Plant, Associate Professor, School of Business Administration, University of Miami, 11/6/2013
Big data has thus far been mainly connected to consumer-oriented domains, but its role in professional domains has been growing.
In medicine, big data has been utilized to create a persuasive case for evidence-based healthcare; treatment of chronic diseases and health management in conjunction with data from pharmaceutical companies; wellness ...
Roderick Morris, Senior Vice President of Marketing & Operations, Opower, 10/24/2013
The average email user receives 75 emails per day. What if you started receiving 3,000 times as many? It's an intimidating, likely even terrifying, thought. But there is also almost certainly important new information within that deluge of emails. So, how would you go about pinpointing and consuming the most important insights?
IBM Watson strikes again: This time, IBM has paired its "cognitive computing system" with MD Anderson Cancer Center, based in Houston. Watson will aid that institution's "moon shots" initiative to find cures for eight kinds of cancer while also helping Anderson's clinicians improve their treatment of cancer patients. Initially, Watson will focus on ...
One of the factors keeping doctors from getting a complete picture of a patient's health condition is lack of patient cooperation. Patients are often advised by doctors to regularly record measurements such as glucose and blood pressure levels and chart their home and work readings, but compliance is really low.
According to Dr. Eric Topol, in his ...
After a rather aggressive pilot program last year, the US government seems poised to perform large-scale, random HIPAA audits. Needless to say, this can be a frightening prospect for healthcare IT, but being audited is not the end of the world. The key to a successful audit is to have a strategy in place before the audit occurs.
What's at ...
Robert Plant, Associate Professor, School of Business Administration, University of Miami, 9/18/2013
For many organizations, closing the gap between a conceptual understanding of big data and a genuine capability for execution comes down to their willingness to embrace the challenge of managing the process.
In part this may be driven by the internal limitations of their organizational structures, their existing technology functions, and their desires to ...
In an ideal world, useful and pragmatic data governance would be born from the need to perform great analytics. It's not. It's pretty much down to heavy regulation -- but there may well be a change in that influencing trend.
Insurance driving innovation
In a recent telephone chat with Eric Carpenter, director of enterprise applications at RAND ...
Despite all the hoopla about clinical and business intelligence (C&BI) applications, the use of these tools by hospitals and healthcare systems is still in an early phase, a new report indicates.
Only 46 percent of the 529 respondents to a HIMSS Analytics survey said they were using C&BI, and the majority of those indicated they were still learning ...
Supervised learning's main difference from its unsupervised counterpart is the presence of a "training" set of data used to prepare an algorithm before it is unleashed on a "live" data set.
Let's explore this with a simple example.
Remember Michael, the medical researcher?
He now needs a set of patients to test a new treatment on. To find the right patients for the test, he uses a smaller training set of data (in which he has already identified the correct patients for his test) and creates an algorithm that can pick these candidates out based on the data held on them -- a combination of age, weight, addictions, previous ailments, genetic dispositions, and existing medication needs.
This is possible because the data set already contains all of these classifications. If it didn't, he'd need to use unsupervised learning to identify the inherent classifications in the data.
Once he is happy with the results returned from the training data set, he can unleash it on the wider set of patients, as well as applying it to the data from any patients to be captured in the future -- and go about trying to improve their lives.
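A minimal sketch of Michael's workflow, assuming hypothetical patient features and a simple nearest-neighbour rule standing in for whatever algorithm a real project would use: the algorithm is "trained" on a small hand-labelled set, then applied to unseen patients.

```python
# Sketch of supervised learning: a 1-nearest-neighbour classifier
# trained on hand-labelled patient records (features are hypothetical).
# Feature vectors: (age, weight_kg, prior_ailments); labels mark trial eligibility.

def distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(training_set, candidate):
    """Label a candidate with the label of its nearest training example."""
    nearest = min(training_set, key=lambda ex: distance(ex[0], candidate))
    return nearest[1]

# Training set: patients Michael has already classified himself.
training = [
    ((45, 82, 1), "eligible"),
    ((47, 90, 2), "eligible"),
    ((23, 60, 0), "not eligible"),
    ((70, 75, 4), "not eligible"),
]

# "Live" patients, unseen during training.
live_patients = [(46, 85, 1), (25, 58, 0)]
labels = [predict(training, p) for p in live_patients]
print(labels)  # → ['eligible', 'not eligible']
```

This only works because every training record already carries a label; with unlabelled data, Michael would be back in unsupervised territory, as the article notes.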
Hive allows users to take advantage of Hadoop using a language similar to SQL, something most relational database developers have in their toolkit.
Let's examine how it helps with an example.
Michael is a medical researcher who has experience running relational databases, but knows that real insight could be found by accessing more data. After years of lobbying, he's managed to create a project that combines data from a variety of hospitals.
The upside is he has much more data to experiment with, the downside is that to get results quickly, he's having to use Hadoop, something he is unfamiliar with.
However, by leveraging Hive, he can write instructions in Hive Query Language (HQL), which isn't a huge leap from the SQL he knows so well. That means less time learning a new language, and more time looking for correlations that can help patients recover more quickly.
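As an illustration, here is the sort of HQL Michael might write; the table and column names are hypothetical, but the syntax is close enough to standard SQL that a relational developer can read it at a glance:

```sql
-- Hypothetical query: average recovery time per treatment,
-- computed across the combined multi-hospital data set.
SELECT treatment, AVG(recovery_days) AS avg_recovery
FROM patient_records
GROUP BY treatment
ORDER BY avg_recovery;
```

Behind the scenes, Hive compiles a statement like this into jobs that run across the Hadoop cluster, so Michael gets Hadoop's scale without leaving a SQL-like syntax.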
At last week's Big Data Show we were lucky enough to speak to Lauren Walker, Sales Leader at IBM Big Data Solutions, who gave us a great message from her real-time analytics talk: Babies, Brains, and Buses.
This case study focused on big data's ability to improve the survival rate of premature babies by combining machine information and human content in real time.
I want to tackle Hadoop, but before we get there, we're going to need to explore MapReduce. MapReduce is a programming model for processing large datasets, and the clue to its function is in its name.
When you want to pull certain information from your datasets, it "maps" out the relevant information for your query.
Then it "reduces" the information down, sorts it based on any rules you've applied, and gives you just the data you were after.
Virginia is a medical researcher looking to carry out research on diabetes patients. For the purposes of her study, she wants to see any geographical concentrations of diabetes patients who are male, between the ages of 40 and 50, and who smoke.
The map step in the MapReduce model finds the records that fit Virginia's criteria.
Then begins the reduce function -- aggregating geographical data of these records and providing an ordered list of cities with the highest population of the defined type. This simple process has allowed Virginia to identify areas of concentration for further study.
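Virginia's two-phase query can be sketched in a few lines. This is a toy in-memory version, with made-up patient records, purely to show the shape of a map phase (emit a key/value pair per matching record) and a reduce phase (aggregate by key):

```python
# Toy MapReduce pass for Virginia's query (records are hypothetical).
# Map: emit (city, 1) for each male smoker aged 40-50 with diabetes.
# Reduce: sum the counts per city, then rank cities by count.

from collections import defaultdict

patients = [
    {"city": "Miami",  "sex": "M", "age": 44, "smoker": True, "diabetic": True},
    {"city": "Miami",  "sex": "M", "age": 48, "smoker": True, "diabetic": True},
    {"city": "Austin", "sex": "M", "age": 41, "smoker": True, "diabetic": True},
    {"city": "Miami",  "sex": "F", "age": 45, "smoker": True, "diabetic": True},
    {"city": "Austin", "sex": "M", "age": 62, "smoker": True, "diabetic": True},
]

def map_phase(record):
    """Emit a (city, 1) pair for each record matching Virginia's criteria."""
    if (record["diabetic"] and record["sex"] == "M"
            and 40 <= record["age"] <= 50 and record["smoker"]):
        yield (record["city"], 1)

def reduce_phase(pairs):
    """Aggregate counts per city and order cities by descending count."""
    totals = defaultdict(int)
    for city, count in pairs:
        totals[city] += count
    return sorted(totals.items(), key=lambda kv: -kv[1])

pairs = [p for record in patients for p in map_phase(record)]
print(reduce_phase(pairs))  # → [('Miami', 2), ('Austin', 1)]
```

In a real Hadoop job the map and reduce functions have the same roles, but the framework shards the records across many machines and shuffles the emitted pairs between the two phases.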
MapReduce itself is pretty straightforward, but once we start ramping up the amount and types of data used we will need Hadoop's help -- which is where things get a bit more complex.