Description
ABSTRACT
Data-driven programming models such as MapReduce have gained popularity in large-scale data processing. Although considerable effort on the Hadoop implementation and on framework decoupling (e.g. YARN, Mesos) has allowed Hadoop to scale to tens of thousands of commodity cluster processors, the centralized designs of the resource manager, the task scheduler, and the metadata management of the HDFS file system adversely affect Hadoop’s scalability to tomorrow’s extreme-scale data centers. This paper aims to address the YARN scaling issues through a distributed task execution framework, MATRIX, which was originally designed to schedule the execution of data-intensive scientific applications of many-task computing on supercomputers. We propose to leverage the distributed design of MATRIX to schedule arbitrary data processing applications in the cloud. We compare MATRIX with YARN in processing typical Hadoop workloads, such as WordCount, TeraSort, Grep, and RandomWriter, and the Ligand application in bioinformatics, on the Amazon cloud. Experimental results show that MATRIX outperforms YARN by 1.27X for the typical workloads and by 2.04X for the real application. We also run and simulate MATRIX with fine-grained sub-second workloads. With simulation results showing an efficiency of 86.8% at 64K cores for the 150 ms workload, we show that MATRIX has the potential to enable Hadoop to scale to extreme-scale data centers for fine-grained workloads.
INTRODUCTION
Applications in the cloud domain (e.g. Yahoo! weather [1], Google Search Index [2], Amazon Online Streaming [3], and Facebook Photo Gallery [4]) are becoming data-intensive, processing large volumes of data for interactive tasks. This trend has shifted the programming paradigm from compute-centric to data-driven. Data-driven programming models [5], in most cases, decompose applications into embarrassingly parallel tasks that are structured as a Directed Acyclic Graph (DAG) [6]. In an application DAG, the vertices are the discrete tasks, and the edges represent the data flows from one task to another.
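As a rough illustration of the application DAG described above, the following Python sketch represents tasks as vertices and data flows as edges, then groups tasks into waves that could run in parallel. The task names (`split`, `map_0`, `map_1`, `reduce`) and the helper `execution_waves` are hypothetical and chosen only for illustration; this is not how MATRIX or YARN schedules tasks, merely one way to traverse a DAG with the Python standard library.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical WordCount-style application. Each key is a task (vertex);
# each value is the set of tasks whose output it consumes (incoming edges).
dag = {
    "split": set(),
    "map_0": {"split"},
    "map_1": {"split"},
    "reduce": {"map_0", "map_1"},
}


def execution_waves(dependencies):
    """Group tasks into waves; tasks within one wave do not depend on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose inputs are all available
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished, unlocking its successors
    return waves


waves = execution_waves(dag)
for index, wave in enumerate(waves):
    print(index, wave)
```

In this toy graph the two map tasks share a wave because no data flows between them, which is the sense in which such tasks are embarrassingly parallel.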
Publisher: IEEE
Year: 2015
By: Ke Wang, Ning Liu, Iman Sadooghi, Xi Yang, Xiaobing Zhou, Tonglin Li, Michael Lang, Xian-He Sun, Ioan Raicu
File Information: English language / 10 pages / Size: 928 K