I. INTRODUCTION
In many developed countries, traffic congestion is a major issue. Research over the last 20 years has concentrated on building models to develop an Intelligent Transport System (ITS). Past research includes a variety of machine learning models, such as k-nearest neighbours and support vector machines; however, multiple studies have shown that nonparametric models like neural networks perform better at predicting complex time-series data, such as road traffic flow [1]. Furthermore, due to advances in computing power and algorithm development by Hinton et al. [2], the depth of neural networks is increasing, leading to superior performance. Deep neural networks (DNNs) [3] are now feasible and more efficient for large, complex data [4]. They are favoured over shallow learners because their long computational chain of layers [6] allows them to efficiently extract complex latent patterns embedded within the data [5]. Despite this, neural networks designed for road traffic prediction are predominantly shallow learners with only one hidden layer [7]. Therefore, research into deep architectures to improve prediction accuracy for road traffic flow is now both possible and needed.