Qubole offers more features than EMR, which makes it an interesting offering. However, for regular use, cost becomes an important metric, and it is tied to the efficient use of resources. This last post highlights what Qubole's pricing and scaling ability mean for its customers compared to EMR.
Qubole and EMR costs
A Qubole cluster starts twice as fast, reads from and writes to S3 faster, and offers an advanced web interface for data exploration, job design, and scheduling. However, Qubole costs more than EMR on an hourly basis (we ignore the compute hours included in its package fees for the moment to keep the comparison simple). As an approximation, you can expect to pay a premium of $0.15/hour for Qubole and $0.12/hour for EMR per m1.xlarge compute instance. For other instance types, the premium scales with their computing power relative to an m1.xlarge instance.
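To make the per-instance premiums concrete, here is a minimal Python sketch that scales the approximate m1.xlarge figures quoted above by an instance type's relative compute power. The function `cluster_premium_per_hour` and its `relative_power` factor are illustrative assumptions, not Qubole or AWS APIs, and the sketch covers only the service premium, not the underlying EC2 price.

```python
# Approximate hourly service premiums per m1.xlarge instance, as quoted above.
QUBOLE_PREMIUM_M1_XLARGE = 0.15  # $/hour
EMR_PREMIUM_M1_XLARGE = 0.12     # $/hour


def cluster_premium_per_hour(base_premium: float, relative_power: float, nodes: int) -> float:
    """Hourly service premium for a whole cluster (EC2 price excluded).

    relative_power is the instance type's compute power relative to an
    m1.xlarge (1.0 means m1.xlarge); it is an assumed input, not an AWS value.
    """
    if relative_power <= 0 or nodes < 0:
        raise ValueError("relative_power must be positive and nodes non-negative")
    return base_premium * relative_power * nodes


if __name__ == "__main__":
    for name, premium in (("Qubole", QUBOLE_PREMIUM_M1_XLARGE), ("EMR", EMR_PREMIUM_M1_XLARGE)):
        # Ten m1.xlarge nodes, i.e. relative_power=1.0.
        print(name, round(cluster_premium_per_hour(premium, 1.0, 10), 2))
```

For a ten-node m1.xlarge cluster this works out to $1.50/hour for Qubole versus $1.20/hour for EMR, a $0.30/hour difference before any savings from shutting idle clusters down.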
Surprisingly, the additional $0.03/hour can quickly turn into a saving by the end of the day, because Qubole's cluster management is smarter than EMR's. Qubole detects when a cluster is not needed and shuts it down for you after a timeout period, so you save both the Qubole and the EC2 costs during quiet periods, e.g. on holidays, at night, or between scheduled jobs. Furthermore, you don't have to do anything to bring up a new cluster: for your Hadoop jobs and Hive queries, the cluster appears to be always on, and whenever one is needed, Qubole creates it on demand.
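Whether the higher premium pays off depends on how many hours a day the cluster actually has to be up. The sketch below compares one day of cost under assumptions that are ours, not measurements from the article: the EMR cluster is left running around the clock, and the Qubole cluster is billed only while busy plus the idle timeout before each automatic shutdown. The EC2 price is a hypothetical input, the function names are illustrative, and rounding to full billing hours is ignored.

```python
HOURS_PER_DAY = 24
QUBOLE_PREMIUM = 0.15  # $/hour per m1.xlarge, approximate
EMR_PREMIUM = 0.12     # $/hour per m1.xlarge, approximate


def emr_daily_cost(nodes: int, ec2_price: float, premium: float = EMR_PREMIUM) -> float:
    """Assumes the EMR cluster stays up all day, whether busy or idle."""
    return HOURS_PER_DAY * nodes * (ec2_price + premium)


def qubole_daily_cost(nodes: int, ec2_price: float, active_hours: float,
                      idle_timeout_hours: float = 0.0, shutdowns: int = 0,
                      premium: float = QUBOLE_PREMIUM) -> float:
    """Assumes you pay only while the cluster is up: busy hours plus the idle
    timeout that elapses before each automatic shutdown (capped at one day)."""
    up_hours = min(HOURS_PER_DAY, active_hours + shutdowns * idle_timeout_hours)
    return up_hours * nodes * (ec2_price + premium)


def break_even_active_hours(ec2_price: float, emr_premium: float = EMR_PREMIUM,
                            qubole_premium: float = QUBOLE_PREMIUM) -> float:
    """Daily up-time (busy hours plus idle timeouts) below which the
    auto-shutdown cluster is cheaper; the node count cancels out."""
    return HOURS_PER_DAY * (ec2_price + emr_premium) / (ec2_price + qubole_premium)


if __name__ == "__main__":
    ec2 = 0.48  # hypothetical EC2 on-demand price per m1.xlarge hour
    print(round(emr_daily_cost(10, ec2), 2))
    print(round(qubole_daily_cost(10, ec2, active_hours=8, idle_timeout_hours=0.5, shutdowns=2), 2))
    print(round(break_even_active_hours(ec2), 1))
```

With a hypothetical EC2 price of $0.48/hour, a ten-node cluster that is busy 8 hours a day and shuts down twice after a 30-minute timeout costs $56.70 on Qubole versus $144.00 for an always-on EMR cluster. Under these assumptions the break-even point is roughly 22.9 hours of up-time per day.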
Qubole doesn't stop there. It lets you define the minimum and maximum size of your cluster and scales it according to load, so you get the best of both worlds: performance and cost savings. Additionally, you can choose whether auto-scaling adds spot instances, on-demand instances, or a hybrid of both.
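The article does not describe Qubole's actual scaling algorithm, so the following is only a minimal sketch of the general idea of load-based sizing within user-defined bounds. The rule (enough nodes to run every pending task at once) and the names `target_cluster_size` and `slots_per_node` are hypothetical.

```python
import math


def target_cluster_size(pending_tasks: int, slots_per_node: int,
                        min_nodes: int, max_nodes: int) -> int:
    """Hypothetical sizing rule: request enough worker nodes to run every
    pending task at once, but never leave the [min_nodes, max_nodes] range."""
    if pending_tasks < 0 or slots_per_node <= 0:
        raise ValueError("pending_tasks must be >= 0 and slots_per_node > 0")
    if min_nodes < 0 or max_nodes < min_nodes:
        raise ValueError("need 0 <= min_nodes <= max_nodes")
    wanted = math.ceil(pending_tasks / slots_per_node)
    # Clamp: the minimum keeps baseline capacity, the maximum caps spending.
    return max(min_nodes, min(max_nodes, wanted))


if __name__ == "__main__":
    for pending in (0, 100, 1000):
        print(pending, target_cluster_size(pending, slots_per_node=8, min_nodes=2, max_nodes=20))
```

With 8 task slots per node and bounds of 2 to 20 nodes, 0, 100, and 1000 pending tasks map to 2, 13, and 20 nodes respectively. The minimum guarantees some capacity at all times, while the maximum is what bounds the bill during a spike.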
You can also set the maximum price you are willing to pay for spot instances and, with the hybrid approach, the percentage of added nodes that should be spot instances. These few choices let you deploy a powerful, efficiently scaling cluster that saves costs while guaranteeing the compute power needed to meet service level agreements. Qubole also makes sure that your HDFS data remains intact while scaling, and tries to exploit data locality and caching at the same time.
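A hybrid policy can be sketched as splitting each scale-up between on-demand and spot instances. This is a minimal illustration under our own assumptions: the spot share is rounded down, a spot request is fulfilled only while the market price does not exceed the bid, and unfulfilled spot nodes are simply not added. The article does not say what Qubole does when spot prices exceed the bid, and `plan_scale_up` and `ScaleUpPlan` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleUpPlan:
    on_demand: int  # nodes requested as on-demand instances (guaranteed capacity)
    spot: int       # spot nodes actually obtained


def plan_scale_up(extra_nodes: int, spot_percentage: int,
                  max_spot_bid: float, spot_market_price: float) -> ScaleUpPlan:
    """Split the nodes added by auto-scaling between on-demand and spot
    instances under a hybrid policy (hypothetical helper)."""
    if extra_nodes < 0 or not 0 <= spot_percentage <= 100:
        raise ValueError("extra_nodes must be >= 0 and spot_percentage in [0, 100]")
    # Integer arithmetic avoids float rounding; the spot share is rounded down.
    spot_requested = extra_nodes * spot_percentage // 100
    on_demand = extra_nodes - spot_requested
    # A spot request is only fulfilled while the market price does not exceed the bid.
    spot_granted = spot_requested if spot_market_price <= max_spot_bid else 0
    return ScaleUpPlan(on_demand=on_demand, spot=spot_granted)


if __name__ == "__main__":
    print(plan_scale_up(10, 50, max_spot_bid=0.10, spot_market_price=0.05))
    print(plan_scale_up(10, 50, max_spot_bid=0.10, spot_market_price=0.20))
```

Adding 10 nodes with a 50% spot share and a $0.10 bid yields 5 on-demand plus 5 spot nodes when the market price is $0.05, but only the 5 on-demand nodes when it rises to $0.20. In this sketch the on-demand share is what provides the guaranteed capacity, and the spot share is the opportunistic, cheaper part.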
(Figure: Qubole Hadoop cluster node settings)
These scaling features can translate into major cost savings or speedups if your workloads are spiky, so that a fixed-size cluster would sometimes sit idle and waste money and at other times be under-provisioned.
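The trade-off between idle and under-provisioned clusters can be illustrated by counting node-hours over a day. The sketch below uses a made-up hourly demand profile, assumes the cluster tracks demand perfectly within its bounds, and treats a fixed-size cluster as the special case where the minimum equals the maximum; `node_hours` and `unmet_node_hours` are illustrative names.

```python
def node_hours(hourly_demand: list[int], min_nodes: int, max_nodes: int) -> int:
    """Node-hours billed when the cluster follows demand within
    [min_nodes, max_nodes]; a fixed-size cluster is min_nodes == max_nodes."""
    return sum(max(min_nodes, min(max_nodes, d)) for d in hourly_demand)


def unmet_node_hours(hourly_demand: list[int], max_nodes: int) -> int:
    """Node-hours of demand above the cluster's maximum size, i.e. work that waits."""
    return sum(max(0, d - max_nodes) for d in hourly_demand)


if __name__ == "__main__":
    # Hypothetical day: idle at night, moderate load, a two-hour peak, then quiet.
    demand = [0] * 8 + [4] * 4 + [20] * 2 + [4] * 6 + [0] * 4
    print("fixed 20 nodes:", node_hours(demand, 20, 20))
    print("autoscaled 2-20:", node_hours(demand, 2, 20))
    print("fixed 8 nodes:", node_hours(demand, 8, 8), "unmet:", unmet_node_hours(demand, 8))
```

On this profile, a cluster fixed at the peak size of 20 nodes bills 480 node-hours, while one that scales between 2 and 20 nodes bills 104. A cluster fixed at 8 nodes bills 192 node-hours and still leaves 24 node-hours of peak demand waiting, which is the under-provisioned case.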
In summary, Qubole truly provides Hadoop, and specifically Hive, as a service: a feature-rich service that promises cost savings and optimal resource utilization. It sets itself apart from EMR with intelligent cluster scaling, unique features such as integrating MongoDB with Hive, and a web interface that makes data exploration faster, easier, and cheaper. Its cost is justified considering the expense of building anything similarly capable and elastic on EC2 or on your own hardware.
Qubole does come with limited documentation, and it can be frustrating when you get stuck because of it. On the upside, in my experience support requests were answered very quickly, commonly by the highly competent founders and engineers behind the service. This is even more impressive considering that I was using the free trial period, which includes cluster time free of charge.
Qubole is well financed, and I expect it to be a company to watch over the next few years. Its primary threat is probably not being out-innovated by Amazon, since the ingenuity it has displayed so far seems hard for AWS to match. More likely is an interesting race between Qubole and its competitors in the space, such as Infochimps or Mortar. This competition will change the way we think about and use big data infrastructure, and it will let companies finally focus on extracting value from data instead of being preoccupied with and hindered by the complexity of technologies like Hadoop or Storm.
User Rank: Exabyte Executive 9/2/2013 | 4:09:08 AM
Re: Hadoop
Any effort to reduce cost with the right strategy can encourage companies to adopt Hadoop. Evolving trends and big data growth can spur new returns on a company's products, solutions, and technologies.
User Rank: Petabyte Pathfinder 8/21/2013 | 9:21:36 PM
@Christian Nice article, thanks for sharing it. That was great!
Qubole may not be the biggest Hadoop service player in the cloud (I think that title goes to Amazon EMR), but a year after launching, the startup appears to be doing great, growing fast, and now competing with cloud leader Amazon for the big market.
Hadoop is getting much more popular these days; the open source software is a top choice for most IT infrastructures. Maintaining a Hadoop system is a daunting and quite expensive task, but adopting a managed Hadoop service could be a good idea: it means that an IT department can get the benefits of Hadoop without making a huge investment in massive IT resources.
User Rank: Exabyte Executive 8/20/2013 | 11:29:48 AM
Hadoop
There are also huge opportunities in rewriting client-server applications to run on cloud stacks, and in building and managing a new class of smaller, cloud-compatible data centers for organizations to use and outperform.