
Research and implementation of big data preprocessing system based on Hadoop


Abstract:


As data volumes keep growing in the Internet era, the storage, analysis, and processing of big data are becoming prominent topics in academia and industry. Typical big data processing platforms adopt the MapReduce programming model to run applications. For example, Hadoop deploys and executes a job as follows: it first collects data and stores it in a distributed storage system, that is, on the storage nodes of a cluster. Then, the compute nodes read the data from the storage nodes and perform map operations. Lastly, the compute nodes communicate with each other and obtain the computation results by performing reduce operations. While data is being collected and stored, the storage nodes mainly perform I/O operations; hence, their computing resources are not fully utilized. This paper proposes a big data preprocessing system based on the Hadoop platform. The main idea is to start computation earlier, during the data collection and storage phase, by using idle computing resources, provided that I/O performance is not affected. This can reduce the amount of data transferred from disk and over the network, as well as the runtime of applications. Experiments with WordCount, a typical big data processing application, indicate that the system can improve the performance of Hadoop applications.
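The effect described in the abstract can be illustrated with a small single-process sketch of WordCount. This is not the paper's implementation and does not use any Hadoop API; it only assumes that preprocessing means counting words per block while the block is being stored, so that fewer records need to be read and shuffled later. All function names (`preprocess_block`, `baseline_wordcount`, `preprocessed_wordcount`) are hypothetical, and whitespace tokenization is a simplifying assumption.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def tokenize(line: str) -> List[str]:
    """Split a line into lowercase words on whitespace (simplifying assumption)."""
    return line.lower().split()


def preprocess_block(lines: Iterable[str]) -> Counter:
    """Hypothetical ingestion-time step: count words in one block as it is stored."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(tokenize(line))
    return counts


def baseline_wordcount(blocks: List[List[str]]) -> Tuple[Counter, int]:
    """Plain map/reduce: map emits one (word, 1) record per word occurrence."""
    pairs = [(word, 1) for block in blocks for line in block for word in tokenize(line)]
    totals: Counter = Counter()
    for word, n in pairs:
        totals[word] += n
    # len(pairs) stands in for the number of records moved between map and reduce.
    return totals, len(pairs)


def preprocessed_wordcount(blocks: List[List[str]]) -> Tuple[Counter, int]:
    """Reduce over per-block partial counts produced at ingestion time."""
    partials = [preprocess_block(block) for block in blocks]
    totals: Counter = Counter()
    for partial in partials:
        totals.update(partial)  # Counter.update adds counts together
    # Each block contributes one record per distinct word, not per occurrence.
    return totals, sum(len(partial) for partial in partials)
```

For two toy blocks, `[["a b a", "b a"], ["c a c"]]`, both functions return the same totals, while the baseline moves 8 records and the preprocessed variant moves 4. The saving depends entirely on how often words repeat within a block, and the sketch says nothing about whether the extra counting would interfere with I/O on a real storage node.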

Published in: 2016 IEEE International Conference on Big Data Analysis (ICBDA)

Date of Conference: 12-14 March 2016

Date Added to IEEE Xplore: 14 July 2016


DOI: 10.1109/ICBDA.2016.7509802

Conference Location: Hangzhou, China
