A Comparison of Approaches to Large-Scale Data Analysis

Free!

Product Description

ABSTRACT
There is currently considerable enthusiasm around the MapReduce (MR) paradigm for large-scale data analysis [17]. Although the basic control flow of this framework has existed in parallel SQL database management systems (DBMS) for over 20 years, some have called MR a dramatically new computing model [8, 17]. In this paper, we describe and compare both paradigms. Furthermore, we evaluate both kinds of systems in terms of performance and development complexity. To this end, we define a benchmark consisting of a collection of tasks that we have run on an open source version of MR as well as on two parallel DBMSs. For each task, we measure each system’s performance for various degrees of parallelism on a cluster of 100 nodes. Our results reveal some interesting trade-offs. Although the process to load data into and tune the execution of parallel DBMSs took much longer than the MR system, the observed performance of these DBMSs was strikingly better. We speculate about the causes of the dramatic performance difference and consider implementation concepts that future systems should take from both kinds of architectures.

INTRODUCTION
Recently, the trade press has been filled with news of the revolution of “cluster computing”. This paradigm entails harnessing large numbers of (low-end) processors working in parallel to solve a computing problem. In effect, this suggests constructing a data center by lining up a large number of low-end servers instead of deploying a smaller set of high-end servers. With this rise of interest in clusters has come a proliferation of tools for programming them. One of the earliest and best known such tools is MapReduce.

Year: 2009

By: Andrew Pavlo, Erik Paulson, Alexander Rasin

File Information: English, 14 pages, 215 KB


