Name: Hive – A Petabyte Scale Data Warehouse Using Hadoop
SKU: 769
Availability: InStock

Hive – A Petabyte Scale Data Warehouse Using Hadoop

Description

ABSTRACT

The size of data sets being collected and analyzed in the industry for business intelligence is growing rapidly, making traditional warehousing solutions prohibitively expensive. Hadoop [1] is a popular open-source map-reduce implementation that is used at companies such as Yahoo and Facebook to store and process extremely large data sets on commodity hardware. However, the map-reduce programming model is very low level and requires developers to write custom programs that are hard to maintain and reuse. In this paper, we present Hive, an open-source data warehousing solution built on top of Hadoop. Hive supports queries expressed in HiveQL, a SQL-like declarative language; these queries are compiled into map-reduce jobs that are executed using Hadoop. In addition, HiveQL enables users to plug custom map-reduce scripts into queries. The language includes a type system with support for tables containing primitive types, collections like arrays and maps, and nested compositions of the same. The underlying IO libraries can be extended to query data in custom formats. Hive also includes a system catalog, the Metastore, that contains schemas and statistics, which are useful in data exploration, query optimization and query compilation. At Facebook, the Hive warehouse contains tens of thousands of tables, stores over 700TB of data, and is used extensively for both reporting and ad-hoc analyses by more than 200 users per month.
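To make the idea of compiling a declarative query into map-reduce phases more concrete, the following is a minimal Python sketch of how a simple grouped count could be expressed as a map step, a shuffle step and a reduce step. It is an illustration only and not Hive's actual compiler or execution engine; the function names (`map_phase`, `shuffle`, `reduce_phase`, `count_by`) and the sample `page_views` rows are hypothetical.

```python
from collections import defaultdict

# Illustrative only: this is NOT how Hive is implemented. It sketches how a
# query of the form
#   SELECT page, COUNT(*) FROM page_views GROUP BY page
# could be broken into map, shuffle and reduce steps.

def map_phase(rows, key_col):
    """Map: emit a (group key, 1) pair for every input row."""
    for row in rows:
        yield row[key_col], 1

def shuffle(pairs):
    """Shuffle: collect all emitted values under their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the values for each key, giving the count per group."""
    return {key: sum(values) for key, values in groups.items()}

def count_by(rows, key_col):
    """Run the three steps in sequence for a grouped count."""
    return reduce_phase(shuffle(map_phase(rows, key_col)))

# Hypothetical sample rows. The list-valued 'tags' column and dict-valued
# 'props' column stand in for the array and map collection types.
page_views = [
    {"user": "alice", "page": "/home", "tags": ["a", "b"], "props": {"ref": "ad"}},
    {"user": "bob", "page": "/home", "tags": [], "props": {}},
    {"user": "alice", "page": "/about", "tags": ["c"], "props": {"ref": "mail"}},
]

counts = count_by(page_views, "page")
```

In a real deployment the map and reduce steps would run in parallel across many machines, with the framework performing the shuffle; here everything runs in a single process to keep the structure visible.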

INTRODUCTION
Scalable analysis on large data sets has been core to the functions of a number of teams at Facebook, both engineering and non-engineering. Apart from ad hoc analysis and business intelligence applications used by analysts across the company, a number of Facebook products are also based on analytics. These products range from simple reporting applications like Insights for the Facebook Ad Network to more advanced kinds such as Facebook’s Lexicon product [2]. As a result, a flexible infrastructure that caters to the needs of these diverse applications and users, and that also scales up in a cost-effective manner with the ever-increasing amounts of data being generated on Facebook, is critical. Hive and Hadoop are the technologies that we have used to address these requirements at Facebook.

Publisher: IEEE

Year: 2010

By: Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Ning Zhang, Suresh Antony, Hao Liu and Raghotham Murthy

File Information: English / 10 pages / Size: 399 K



Related products

Load Balancing of Distributed Systems Based on Multiple Ant Colonies Optimization

Applying Data Mining Techniques to the Selection of Cost Drivers in Activity-Based Costing

Energy Efficient D2D Communications: A Perspective of Mechanism Design

Big-Data Computing: Creating revolutionary breakthroughs in commerce, science, and society
