A review of research on MapReduce scheduling algorithms in Hadoop | IEEE Conference Publication | IEEE Xplore

Abstract:

Big data has created an era of tera in which bulk volumes of data are being collected at escalating rates. Owing to increases in storage capacity, processing power, and data availability, the volume of global data is growing into the zettabyte range. Hadoop is one of the technologies in the big data landscape; it analyzes data using the Hadoop Distributed File System (HDFS) and MapReduce. Job scheduling is important for the efficient management of cluster resources. Hadoop schedulers are pluggable components that assign resources to jobs. Among the available schedulers, the most prominent are the default FIFO scheduler, the Fair scheduler, and the Capacity scheduler. This paper presents a comprehensive survey of job scheduling algorithms, together with a comparative parametric analysis that emphasizes the key points these schedulers have in common.
Date of Conference: 15-16 May 2015
Date Added to IEEE Xplore: 06 July 2015
Conference Location: Greater Noida, India

I. Introduction

Big data [1] refers to massive collections of data whose processing depends on open-source frameworks such as Hadoop and MapReduce. Such data cannot be processed using traditional data-processing tools such as relational databases and Structured Query Language (SQL). Specifically, big data refers to the creation, storage, retrieval, and analysis of data characterized by the five V's: volume, velocity, variety, veracity, and value. According to one report [2], Facebook processes more than 500 TB of data daily. Many similar reports on big data statistics [3] shed light on the challenges of big data.

References

[1] [Online]. Available: http://en.wikipedia.org/wiki/Big_data
[2] [Online]. Available: http://www.cnet.com/news/facebook-processes-more-than-500-tb-of-data-daily/
[3] [Online]. Available: http://wikibon.org/blog/big-data-statistics/
[4] [Online]. Available: http://en.wikipedia.org/wiki/Apache_Hadoops
[5] [Online]. Available: http://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
[6] Jeffrey Dean and Sanjay Ghemawat, "MapReduce: simplified data processing on large clusters", Commun. ACM, vol. 51, no. 1, pp. 107-113, 2008.
[7] [Online]. Available: http://hadoop.apache.org/docs/r1.2.1/fair_scheduler.html
[8] [Online]. Available: http://hadoop.apache.org/docs/r1.2.1/capacity_scheduler.html
[9] Matei Zaharia et al., "Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling on large clusters", 5th European Conf. on Computer Systems (EuroSys), pp. 265-278, 2010.
[10] Thomas Sandholm and Kevin Lai, "Dynamic proportional share scheduling in Hadoop", Job Scheduling Strategies for Parallel Processing (JSSPP), vol. 6253, pp. 110-131, 2010.
[11] [Online]. Available: http://wiki.apache.org/hadoop/JobTracker
[12] Mohammad Hammoud and Majd F. Sakr, "Locality-aware reduce task scheduling for MapReduce", 3rd Int. Conf. on Cloud Computing Technology and Science (CloudCom), IEEE, pp. 570-576, 2011.
[13] [Online]. Available: http://wiki.apache.org/hadoop/TaskTracker
[14] Mohammad Hammoud, M. Suhail Rehman and Majd F. Sakr, "Center-of-Gravity reduce task scheduling to lower MapReduce network traffic", 5th Int. Conf. on Cloud Computing (CLOUD), IEEE, pp. 49-58, 2012.
[15] Kumar K Arun, Vamshi Krishna Konishetty, Kaladhar Voruganti and G V Prabhakara Rao, "CASH: Context Aware Scheduler for Hadoop", Int. Conf. on Advances in Computing, Communications and Informatics (ICACCI12), 2012.
[16] Xiaohong Zhang, Zhiyong Zhong, Shengzhong Feng, Bibo Tul and Jianping Fan, "Improving data locality of MapReduce by scheduling in homogeneous computing environments", 9th Int. Symp. on Parallel and Distributed Processing with Applications, pp. 120-126, 2011.
[17] Xiaohong Zhang and Yang Ding, "A distribution aware scheduling method in MapReduce", IEEE Symp. on Electrical Electronics Engineering (EEESYM), pp. 128-131, 2012.
[18] Yanrong Zhao et al., "TDWS: a job scheduling algorithm based on MapReduce", 7th Int. Conf. on Networking Architecture and Storage (NAS), pp. 313-319, 2012.
[19] Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Randy Katz and Ion Stoica, "Improving MapReduce performance in heterogeneous environments", 8th USENIX Symp. on Operating Systems Design and Implementation, pp. 29-42, 2008.
[20] Quan Chen, Daqiang Zhang, Minyi Guo, Qianni Deng and Song Guo, "SAMR: A Self-Adaptive MapReduce scheduling algorithm in heterogeneous environment", 10th Int. Conf. on Computer and Information Technology (CIT), pp. 2736-2743, 2010.
[21] Quan Chen et al., "HAT: history-based auto-tuning MapReduce in heterogeneous environments", The Journal of Supercomputing, Springer, pp. 1038-1054, 2011.
[22] Xiaoyu Sun, "An Enhanced Self-Adaptive MapReduce scheduling algorithm", 2012.
[23] Mark Yong, Nitin Garegrat and Shiwali Mohan, "Towards a resource aware scheduler in Hadoop", Proc. ICWS, pp. 102-109, 2009.
[24] Radheshyam Nanduri, Nitesh Maheshwari, Reddy Raja and Vasudeva Varma, "Job aware scheduling algorithm for MapReduce Framework", 3rd Int. Conf. on Cloud Computing Technology and Science, pp. 724-729, 2011.
[25] Hong Mao, Shengqiu Hu, Zhenzhong Zhang, Limin Xiao and Li Ruan, "A load-driven task scheduler with adaptive DSC for MapReduce", Int. Conf. on Green Computing and Communications (GreenCom), IEEE/ACM, pp. 28-33, 2011.
[26] Yi Yao, Jianzhe Tai, Bo Sheng and Ningfang Mi, "LsPS: a job size-based scheduler for efficient task assignments in Hadoop", IEEE Trans. Cloud Computing, pp. 1-14, 2013.
[27] Zhe Wang, Zhengdong Zhu, Pengfei Zheng, Qiang Liu and Xiaoshe Dong, "A new scheduler strategy for heterogenous workload-aware in Hadoop", 8th Annual ChinaGrid Conference (ChinaGrid), pp. 80-85, 2013.
[28] Jorda Polo et al., "Performance-driven task co-scheduling for MapReduce environments", Network Operations and Management Symposium (NOMS), pp. 373-380, 2010.
[29] Jorda Polo et al., "Performance management of accelerated MapReduce workloads in heterogeneous clusters", 39th Int. Conf. on Parallel Processing (ICPP), 2010.
[30] Qinghui Tang, Sandeep Kumar S. Gupta and Georgios Varsamopoulos, "Energy-efficient thermal-aware task scheduling for homogeneous High-Performance Computing data centers: a cyber-physical approach", IEEE Trans. Parallel and Distributed Systems, vol. 19, no. 11, pp. 1458-1472, November 2008.