
Overcoming Hadoop Scaling Limitations through Distributed Task Execution

ABSTRACT

Data-driven programming models such as MapReduce have gained popularity in large-scale data processing. Although considerable effort on the Hadoop implementation and on framework decoupling (e.g. YARN, Mesos) has allowed Hadoop to scale to tens of thousands of commodity cluster processors, the centralized designs of the resource manager, the task scheduler, and the metadata management of the HDFS file system adversely affect Hadoop's scalability to tomorrow's extreme-scale data centers. This paper aims to address the YARN scaling issues through a distributed task execution framework, MATRIX, which was originally designed to schedule the execution of data-intensive scientific applications of many-task computing on supercomputers. We propose to leverage the distributed design principles of MATRIX to schedule arbitrary data processing applications in the cloud. We compare MATRIX with YARN in processing typical Hadoop workloads, such as WordCount, TeraSort, Grep and RandomWriter, and the Ligand application in bioinformatics, on the Amazon cloud. Experimental results show that MATRIX outperforms YARN by 1.27X for the typical workloads, and by 2.04X for the real application. We also run and simulate MATRIX with fine-grained sub-second workloads. With the simulation results showing an efficiency of 86.8% at 64K cores for the 150 ms workload, we show that MATRIX has the potential to enable Hadoop to scale to extreme-scale data centers for fine-grained workloads.

INTRODUCTION
Applications in the Cloud domain (e.g. Yahoo! weather [1], Google Search Index [2], Amazon Online Streaming [3], and Facebook Photo Gallery [4]) are evolving to be data-intensive, processing large volumes of data for interactive tasks. This trend has shifted the programming paradigm from compute-centric to data-driven. Data-driven programming models [5], in most cases, decompose applications into embarrassingly parallel tasks that are structured as a Directed Acyclic Graph (DAG) [6]. In an application DAG, the vertices are the discrete tasks, and the edges represent the data flows from one task to another.
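To make the DAG description concrete, the following is a minimal sketch in Python, not the MATRIX or YARN scheduler. It assumes a toy word-count application with hypothetical task names (`map_a`, `map_b`, `reduce`): each vertex is a task, each edge carries one task's output to the task that consumes it, and tasks run in a dependency-respecting order.

```python
from collections import Counter
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical tasks (vertices). The inputs are made up for illustration.
def map_a():
    return Counter("a b a".split())

def map_b():
    return Counter("b c".split())

def reduce_counts(left, right):
    # Merge the partial counts produced by the two map tasks.
    return left + right

tasks = {"map_a": map_a, "map_b": map_b, "reduce": reduce_counts}

# Edges (data flows): each task maps to the tasks whose output it consumes.
deps = {"map_a": [], "map_b": [], "reduce": ["map_a", "map_b"]}

def run_dag(tasks, deps):
    """Run every task after all of its predecessors, passing their outputs along."""
    results = {}
    for name in TopologicalSorter(deps).static_order():
        inputs = [results[d] for d in deps[name]]
        results[name] = tasks[name](*inputs)
    return results

results = run_dag(tasks, deps)
print(dict(results["reduce"]))
```

The two map tasks have no edge between them, so a real scheduler could run them in parallel; this sketch runs them sequentially for simplicity.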

Publisher: IEEE

Year: 2015

Authors: Ke Wang, Ning Liu, Iman Sadooghi, Xi Yang, Xiaobing Zhou, Tonglin Li, Michael Lang, Xian-He Sun, Ioan Raicu

File information: English, 10 pages, 928 KB