I. Introduction
The wide penetration and deployment of communication infrastructures and embedded devices are moving Internet of Things (IoT) technologies beyond personal applications. Geographically distributed and hierarchical IoT systems for applications such as smart cities and nations, pervasive healthcare, and Intelligent Transportation Systems (ITS) are fast becoming a reality [1], [2], [3], [4]. Billions of interconnected devices, communicating over different networks and protocols, are expected to generate large amounts of data [5]. The hierarchical nature of such large-scale systems makes scalability and sustainability difficult to manage. Edge intelligence is therefore becoming critical to building scalable and reliable distributed systems for future sustainable ITS. This calls for new distributed machine learning techniques that can effectively manage and leverage the large amounts of data collected over heterogeneous devices and networks across hierarchical topologies [6].