Benchmarking Apache Hive 13 for Enterprise Hadoop

Have you seen Hortonworks’ latest benchmarks of Apache Hive 13 for Hadoop?

Introduced in 2008, Apache Hive has been the de facto SQL solution in Hadoop. By 2012, SQL had become a key battleground for Hadoop, and many vendors started to publish benchmarks showing massive performance advantages for their solutions over Hive. Each of these vendors predicted that Hive would eventually be supplanted by the proprietary solution they were pushing.

The concerns about Hive’s performance were real. Hadoop in 2012 was a purely batch platform and no work had ever been done within Hive to address low-latency or interactive workloads. The big question remained: was it possible to make Hive fast natively in Hadoop, or did people really need to abandon Hadoop and bolt on a foreign SQL engine strictly to satisfy the one use case of interactive query?

For Hortonworks the choice was obvious. The core of Hortonworks’ philosophy is 100% community-led open source and solutions that run 100% in Hadoop. Bolting a solution on the side for one use case creates major operational headaches and would have been a major disservice to our customers. At the same time, Hadoop needed to move beyond purely batch and into interactive and real-time use cases. The introduction of YARN in Hadoop 2 meant interactive query could be developed natively in Hadoop rather than as a bolt-on.

Hadoop Apache Hive 13 Benchmarks

You can read the entire article here.

