Description
Abstract
SQL query processing for analytics over Hadoop data has recently gained significant traction. Among the many systems providing some SQL support over Hadoop, Hive is the first native Hadoop system that uses an underlying framework such as MapReduce or Tez to process SQL-like statements. Impala, on the other hand, represents the emerging class of SQL-on-Hadoop systems that exploit a shared-nothing parallel database architecture over Hadoop. Both systems optimize their data ingestion via columnar storage, and promote different file formats: ORC and Parquet. In this paper, we compare the performance of these two systems by conducting a set of cluster experiments using a TPC-H-like benchmark and two TPC-DS-inspired workloads. We also closely study the I/O efficiency of their columnar formats using a set of micro-benchmarks. Our results show that Impala is 3.3X to 4.4X faster than Hive on MapReduce and 2.1X to 2.8X faster than Hive on Tez for the overall TPC-H experiments. Impala is also 8.2X to 10X faster than Hive on MapReduce and about 4.3X faster than Hive on Tez for the TPC-DS-inspired experiments. Through detailed analysis of the experimental results, we identify the reasons for this performance gap and examine the strengths and limitations of each system.
Introduction
Enterprises are using Hadoop as a central data repository for all their data coming from various sources, including operational systems, social media and the web, sensors and smart devices, as well as their applications. Various Hadoop frameworks are used to manage and run deep analytics in order to gain actionable insights from the data, including text analytics on unstructured text, log analysis over semi-structured data, as well as relational-like SQL processing over semi-structured and structured data.
Year: 2014
By: Avrilia Floratou, Umar Farooq Minhas, Fatma Özcan
File Information: English Language / 12 Pages / Size: 208 KB