
A comparison of extreme learning machines and back-propagation trained feed-forward networks processing the MNIST database



Abstract:

This paper compares the classification performance and training times of feed-forward neural networks with one hidden layer trained with two network weight optimisation methods. The first weight optimisation method used the extreme learning machine (ELM) algorithm. The second used the back-propagation (BP) algorithm. Using identical network topologies, the two weight optimisation methods were directly compared on the MNIST handwritten digit recognition database. Our results show that, while the ELM weight optimisation method was much faster to train for a given network topology, a much larger number of hidden units was required to reach a performance level comparable to the BP algorithm. When the extra computation due to the larger number of hidden units was taken into account for the ELM network, the computation times of the two methods to achieve a similar performance level were not so different.
Date of Conference: 19-24 April 2015
Date Added to IEEE Xplore: 06 August 2015
Electronic ISBN: 978-1-4673-6997-8


Conference Location: South Brisbane, QLD, Australia

1. Introduction

Feed-forward neural networks using random weights were first suggested by Schmidt et al. in 1992 [1], but were not a widely used method until Huang et al. popularised them as Extreme Learning Machines (ELM) [2], [3] in 2005. The ELM is a multi-layer feed-forward neural network topology and algorithm that offers fast training and flexible nonlinearity for function regression and classification tasks. Its principal benefit is that the network parameters are calculated in a single pass during the training process, which offers a significant reduction in training time over conventional back-propagation-trained feed-forward networks. In its standard form it has an input layer that is fully connected to a hidden layer with conventional non-linear activation functions. The hidden layer is fully connected to an output layer with linear activation functions. The number of hidden units is often much greater than the number of input units, with a fan-out of 5 to 20 hidden units per input element frequently used. A key feature of ELMs is that the weights connecting the input layer to the hidden layer are set to random values, usually uniformly distributed in some predefined range. This reduces training to the problem of determining the hidden-to-output weights, which can be solved in a single pass. By randomly projecting the inputs into a much higher-dimensional space, the algorithm can find a hyperplane which approximates a desired regression function, or represents a linearly separable classification problem [4].
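The paper itself does not include code, but the training scheme described above can be illustrated with a minimal NumPy sketch. The activation function (tanh), the uniform weight range, the regularisation term, and the data in the usage example below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def train_elm(X, T, n_hidden, rng=None, reg=1e-6):
    """Train a single-hidden-layer ELM.

    X : (n_samples, n_inputs) training inputs
    T : (n_samples, n_outputs) one-hot target matrix
    Returns the fixed random input weights/biases and the solved output weights.
    """
    rng = np.random.default_rng(rng)
    n_inputs = X.shape[1]

    # Input-to-hidden weights are drawn at random and never updated.
    W_in = rng.uniform(-1.0, 1.0, size=(n_inputs, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)

    # Non-linear hidden activations: a random projection to a higher dimension.
    H = np.tanh(X @ W_in + b)

    # Hidden-to-output weights are found in a single pass by regularised
    # least squares (equivalently, via the Moore-Penrose pseudoinverse of H).
    W_out = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ T)
    return W_in, b, W_out

def predict_elm(X, W_in, b, W_out):
    # Linear output layer applied to the fixed random hidden features.
    return np.tanh(X @ W_in + b) @ W_out

# Hypothetical usage: 784 inputs (MNIST-sized vectors), 10 classes, and a
# fan-out of roughly 5 hidden units per input element.
X = np.random.rand(1000, 784)
T = np.eye(10)[np.random.randint(0, 10, 1000)]
W_in, b, W_out = train_elm(X, T, n_hidden=4000)
predictions = predict_elm(X, W_in, b, W_out).argmax(axis=1)
```

Because the only learned parameters are obtained from one linear solve, training cost is dominated by forming and factorising the hidden-layer Gram matrix, which is why larger hidden layers erode the speed advantage noted in the abstract.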
