So many problems are the result of the limitations of the human mind, amazing as our brains are. A prime example is one of the biggest challenges of a rapidly globalizing, connected world: managing the flow, prioritization, and buffering of data across the Internet and other networks.
In all of our hand-coded brilliance (and I say that seriously), we created the Transmission Control Protocol (TCP) as the way to manage the volume and speed of data flows. While TCP has made the Internet possible to date, it is showing signs of wear and tear.
We know we need to do better than we do today at managing the Internet's fast-approaching big data problems. Hand-coded protocols are inherently resource-wasting and, as it turns out, much slower than what a machine can learn to do. In response, MIT recently announced Remy:
Remy is a computer program that figures out how computers can best cooperate to share a network.
Remy creates end-to-end congestion-control algorithms that plug into the Transmission Control Protocol (TCP). These computer-generated algorithms can achieve higher performance and greater fairness than the most sophisticated human-designed schemes.
This comes as no surprise to big data experts who see machine learning as the answer to rapidly rising volume, velocity, and variety of big data. In a recent post, Machine Learning: Our Life Ring in the Sea of Big Data, TIBCO CTO Matt Quinn talked about three powerful drivers for machine learning:
A surge of data being liberated from places where it was previously hidden (a.k.a. big data's volume challenge)
A need for automation that manages the complexity of Big Data in an environment where humans have no time to intervene (a.k.a. big data's velocity challenge)
An absolute requirement to create adaptable, less fragile systems that can manage the combination of structured and unstructured data without having a human write complex code and rules with each change (a.k.a. big data's variety challenge)
Remy helps to solve all three of these problems with its machine learning algorithms. Surges of data create the need for complicated buffering schemes, as computers need to hold enough fast-moving data to assemble complete files that can be viewed, processed, or sent. Knowing in advance how to size buffers is a tricky business, and one that Remy can perform far more efficiently. Rather than make assumptions, Remy has a perfect memory and a predictive capability that allow it to size buffers on the fly as efficiently as possible, so that the most restrictive points on a network (the end points where storage and processing are done) can run at maximum speed.
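To make the idea concrete, here is a minimal sketch of sizing a buffer on the fly from observed traffic rather than from a hand-tuned constant. All names are illustrative assumptions, not Remy's actual code; it uses the classic bandwidth-delay-product rule of thumb for how much in-flight data an end point must hold.

```python
class AdaptiveBuffer:
    """Tracks smoothed throughput and round-trip time, then sizes the
    buffer to the bandwidth-delay product (BDP) instead of guessing."""

    def __init__(self, alpha=0.125):
        self.alpha = alpha        # EWMA smoothing factor
        self.throughput = 0.0     # bytes/second, smoothed
        self.rtt = 0.0            # seconds, smoothed

    def observe(self, bytes_received, interval, rtt_sample):
        rate = bytes_received / interval
        if self.throughput == 0.0:
            # Seed the averages with the first measurement.
            self.throughput, self.rtt = rate, rtt_sample
        else:
            # Exponentially weighted moving averages adapt as traffic shifts.
            self.throughput = (1 - self.alpha) * self.throughput + self.alpha * rate
            self.rtt = (1 - self.alpha) * self.rtt + self.alpha * rtt_sample

    def buffer_size(self):
        # Bandwidth-delay product, rounded up to the next 4 KB page.
        bdp = self.throughput * self.rtt
        return max(4096, int(-(-bdp // 4096) * 4096))

buf = AdaptiveBuffer()
# 1.25 MB observed in one second, 50 ms round-trip time.
buf.observe(bytes_received=1_250_000, interval=1.0, rtt_sample=0.05)
print(buf.buffer_size())  # -> 65536 (BDP of ~62.5 KB, rounded up to 4 KB pages)
```

A fixed buffer has to be sized for the worst case; an adaptive one like this shrinks and grows with the traffic it actually sees, which is the kind of on-the-fly decision described above.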
Moving away from instability
The variety inherent in big data is also one of Remy's strengths. There are so many sources and destinations for data today, as mobility and the Internet of Things drive enormous variability. All of that variability introduces significant instability into the system, as connections start and stop constantly under unpredictable data conditions. Here's where the real difference lies: Remy doesn't need to have perfect knowledge at the outset, as a hand-written TCP scheme does. Given the goals of the system, machine learning systems like Remy move us beyond initial assumptions and create their own algorithms for managing a network's end points to achieve those goals... algorithms that humans don't need to understand, and quite possibly couldn't understand.
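The shape of such a machine-generated controller can be sketched as a lookup table mapping observed network state to sender actions, which is the spirit of Remy's published design. The state variables and action fields below mirror that design loosely, but every number in the table is invented for illustration; in reality the table is produced offline by Remy's optimizer, not written by hand.

```python
from dataclasses import dataclass

@dataclass
class Action:
    window_multiple: float   # multiply the congestion window by this...
    window_increment: int    # ...then add this many packets
    min_send_gap_ms: float   # pacing: minimum time between sends

# Each rule covers a region of observed state: (smoothed ack spacing in ms,
# ratio of current RTT to minimum RTT). A human never tunes these numbers;
# the optimizer searches for the table that best meets the stated goals.
RULES = [
    ((5.0, 1.2), Action(1.0, 3, 0.5)),                     # uncongested: grow
    ((20.0, 2.0), Action(0.9, 1, 2.0)),                    # mild queueing: ease off
    ((float("inf"), float("inf")), Action(0.5, 0, 8.0)),   # congested: back off hard
]

def choose_action(ack_ewma_ms, rtt_ratio):
    """Return the first rule whose region contains the observed state."""
    for (ack_bound, rtt_bound), action in RULES:
        if ack_ewma_ms <= ack_bound and rtt_ratio <= rtt_bound:
            return action
    return RULES[-1][1]

def next_window(cwnd, ack_ewma_ms, rtt_ratio):
    a = choose_action(ack_ewma_ms, rtt_ratio)
    return max(1, int(cwnd * a.window_multiple) + a.window_increment)

print(next_window(50, 3.0, 1.1))   # light load -> 53, window grows
print(next_window(50, 40.0, 3.0))  # heavy load -> 25, window shrinks
```

The point of the sketch is that the "algorithm" is just data: the rules can be regenerated whenever the goals or the network change, without a human rewriting the protocol.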
Remy represents what will be seen as an early example of how our best designs can be improved upon by increasingly smart algorithms that allow machines to learn what humans desire (in this case, faster and more efficient networks). There will be those who see this as a step toward the Terminator's SkyNet, but that's too simplistic.
Critical need to govern data
Alongside the need to manage big data's "big three" challenges of volume, velocity, and variety is the need to govern these systems so that the human perspective is ever-present. Sometimes lost in the shuffle (and awe... and hyperbole) of big data is the need for organizations to understand the sources, storage, and destinations of data so that what's being pulled, managed, and sent has the right level of oversight and control.
This goes well beyond the need to prevent the "SkyNet Syndrome" and involves the need for strong information security, protections against invasions of privacy and data "creepiness," protection of personal data, and a way to know data's level of accuracy and freshness (recency, but also durability).
As this example shows, machine learning is the answer to the big data challenges we're seeing across every industry and, increasingly, in every corner of the planet. If you think about it, machines have given humans the ability to overcome challenges of hunger, transportation, connectivity, and every other obstacle we've encountered as our world developed. It should come as no surprise that machine learning is the solution to big data as well.
Re: Machine Learning Saves the Internet From Itself We've already seen some of this PR spin when assigning blame to systems for problems such as network and cloud services outages. It usually goes something like this: "We had a backup system that failed at the same time as the primary system did for completely different reasons, and when we tried to restore it the system only saw empty files. It was a million to one shot, and it shouldn't happen again." The "human" element in that case is that they throw an unnamed network engineer under the bus for a faulty design.
That said, as much as we've talked about computers replacing people for decades, history shows that it's more a case of computers supplementing people. In the machine learning cases that we have discussed in this series this month, the common thread seems to be that what the computer does isn't that different from what its human educators did; it just does the same things faster than we can.
User Rank: Bit Player 7/31/2013 | 10:44:31 AM
Re: Machine Learning Saves the Internet From Itself I am all for using machines and technology to make human life easier. But there comes a time when a human needs to sit down and analyze something with his or her brain, and not the brain of a machine sitting on a desk or in a closet. Even as we automate (seemingly everything) we still need human interaction with nearly every step of the process to ensure the automation is occurring how we want it. It is a necessary checks-and-balances system that needs to be in place to ensure we are getting the analysis we want out of our automation from our machines. I totally agree, humans cannot sit and sift through "a quadrillion gigabytes" of data, so we need machine automation to do that. But the humans should be overseeing that machine automation to make sure we get the outputs and analysis we need.
User Rank: Bit Player 7/31/2013 | 10:39:10 AM
Re: TCP, yeah you know me! I was thinking the same thing after reading that article. The fact that we have improvements on the way for our networking foundation is very exciting - a little scary, but mostly exciting.
Sounds like Remy can be a marketable improvement over our current and beloved TCP, which does have its flaws and its faults, but we have always looked past them (mostly because we had no choice and because it always seemed to work, eventually).
Remy seems like it can be faster and more reliable, which, in an information age with everyone talking about data, are buzzwords that will have everyone excited. As long as we include security (confidentiality and integrity) in reliability - I am all for it.
User Rank: Exabyte Executive 7/31/2013 | 9:06:13 AM
Re: TCP, yeah you know me! I don't think we'll lose that much slack Saul. Immediate forecasts will continue to be a game of chance, but we'll be much better at analyzing longer term trends. Tsunamis are seismic events, and maybe we can leverage Big Data to better predict what's going on under the earth as well as on top of it.
Re: TCP, yeah you know me! One hypothetical concern @legalcio. If we chase this much efficiency and rely on weather forecasts, is there a chance we will lose the slack we rely on when weather forecasting occasionally gets it wrong (or worse, an unannounced tsunami or tropical storm appears)?
Re: Machine Learning Saves the Internet From Itself For sure @AlphaEdge, there's the trouble that there's a potential PR get-out clause that organisations can use, pointing at 'Machine Learning' (when addressing an audience uneducated in ML) and saying 'the machines did it'.
That's OUR SkyNet: not machines taking over, but us apportioning blame to them.
User Rank: Exabyte Executive 7/30/2013 | 8:58:23 AM
Re: Machine Learning Saves the Internet From Itself @Saul, are you saying that decision makers can have the machine learning algorithm generate output, but the final decision has to be made by them?
Re: Machine Learning Saves the Internet From Itself Is this similar to the drone-based "I didn't shoot them, this button did" argument? Sure machine learning will do the dredging, but the goals will be set by us meat sacks. And any insight which needs to be "acted on" will be too.