Big Data Quality Assessment Model for Unstructured Data

Abstract:

Big Data has gained enormous momentum over the past few years because of the tremendous volume of data generated and processed across diverse application domains. It is now estimated that 80% of all generated data is unstructured. Evaluating the quality of Big Data has been identified as essential to guaranteeing data quality dimensions such as completeness and accuracy. Current initiatives for unstructured data quality evaluation are still under investigation. In this paper, we propose a quality evaluation model to handle the quality of Unstructured Big Data (UBD). The model first captures and discovers key properties of unstructured big data and its characteristics, and provides comprehensive mechanisms to sample and profile the UBD dataset and to extract features and characteristics from heterogeneous data types in different formats. A Data Quality repository manages the relationships between data quality dimensions, quality metrics, feature extraction methods, mining methodologies, data types, and data domains. An analysis of the samples provides a data profile of the UBD. This profile is extended to a quality profile that contains the quality mapping with selected features for quality assessment. We developed a UBD quality assessment model that handles all the processes from UBD profiling exploration to the quality report. The model provides an initial blueprint for quality estimation of unstructured Big Data. It also states a set of quality characteristics and indicators that can be used to outline an initial data quality schema of UBD.

Published in: 2018 International Conference on Innovations in Information Technology (IIT)

Date of Conference: 18-19 November 2018

Date Added to IEEE Xplore: 10 January 2019

ISBN Information:

Print on Demand (PoD) ISSN: 2325-5498

DOI: 10.1109/INNOVATIONS.2018.8605945

Conference Location: Al Ain, United Arab Emirates

I. Introduction

Big data is commonly defined as the way we gather, store, manipulate, analyze, and gain insight from fast-growing heterogeneous data. Most newly generated data is unstructured, owing to the growth of mobile and human-generated data from social media, which combines text, pictures, audio, and video in an unstructured way. According to industry analysts, unstructured data is growing faster than all other types of data. It will increase by as much as 800 percent during the next five years, according to the survey in [1]. This creates an urgent need to automatically characterize and categorize such data. These classifications are strongly coupled with the semantic meaning of what the data represents. In many cases, the data arrives in a format and quality state that make it impossible to process immediately as is, and even when it can be processed, the results cannot guarantee valuable analysis and insights.

[1] R. Arsenault, "The Benefits of Utilizing Unstructured Data", Aberdeen.
