I. Introduction
Over the past few decades, wireless communication systems have evolved from the 2nd Generation (e.g., GSM) to the 3rd Generation (e.g., CDMA, WiMAX) and, in recent years, to the 4th Generation (e.g., LTE). Meanwhile, 5th Generation techniques are already on the agenda [1]. This evolution has led to the coexistence of a variety of communication standards, and multi-mode operation will become an important trend in the global wireless communications market. Under these circumstances, the traditional approach of defining the radio in hardware can no longer satisfy the demand. Software defined radio (SDR) has therefore emerged, in which most of the signal processing is implemented in software and consequently requires far more computing power. A traditional uniprocessor, or even a multiprocessor with a few processing cores, cannot meet this demand for massive computation. In addition, because it is increasingly difficult to keep shrinking semiconductor feature sizes, it is also increasingly difficult to obtain higher computing performance by raising the clock frequency. As a result, multi-processor parallel processing, which uses several processors to process data concurrently, has become an inevitable trend. This approach can greatly boost computing performance. In particular, heterogeneous multi-processors can cooperate to carry out different types of tasks [2]. In general, a system is called multi-core if it integrates from several up to somewhat more than a dozen cores, which are often connected by a bus. If dozens or even hundreds of cores are grouped together, it is called many-core. The many-core parallel processing (MPP) architecture is a prominent solution for heavy-load computation in digital signal processing (DSP) [3], [4]. In the MPP architecture, however, the traditional bus-based interconnection suffers from low bandwidth.
When one processor holds the bus token, the other dozens of cores are excluded from using the bus, which makes inter-core communication the bottleneck in the many-core scenario. Therefore, how to organize, connect, and schedule a large number of processing cores to meet the computational requirements is an urgent problem in digital signal processing for wireless communications.