I. Introduction
One of the technological challenges faced by data analytics is the enormous amount of data. This challenge is well known, and the term "big data" was coined to bring attention to it and to spur the development of new solutions. However, in many important application areas, an excess of data is not the problem; quite the opposite, there is not enough data available. There are several reasons for this: the data may be inherently scarce (rare diseases, faults in complex systems, rare grammatical structures), difficult to obtain (due to proprietary systems, confidentiality of business contracts, or privacy of records), expensive to acquire (requiring costly equipment or a significant investment of human or material resources), or the distribution of the events of interest may be highly imbalanced (fraud detection, outlier detection, distributions with long tails).

For machine learning approaches, the lack of data causes problems in model selection, reliable performance estimation, development of specialized algorithms, and tuning of learning model parameters. While certain problems caused by scarce data stem from underrepresentation of the underlying phenomenon and cannot be solved, others can be alleviated by generating artificial data similar to the original. For example, similar artificial data sets can be of great help in parameter tuning, development of specialized solutions, simulations, and imbalanced problems, as they prevent overfitting to the original data set yet allow a sound comparison of different approaches.