I. Introduction
The rapidly growing population of users (UEs) and the evolution of mobile applications have imposed stringent requirements on wireless communications. To meet these requirements, base stations (BSs) are deployed in close proximity to the UEs, leading to a dense network architecture [1]. For dense deployments of UEs and BSs, the cloud radio access network (C-RAN), which pools the baseband units (BBUs) of distributed BSs into a central location, is a promising architecture that can boost network capacity and spectrum efficiency through resource sharing and centralized processing [2]. In a dense C-RAN, interference is a critical issue, which can be managed by UE scheduling [3] and beamforming design [4]. The centralized architecture of C-RAN also allows cooperation across BSs to mitigate or exploit the interference, which is referred to as the coordinated multipoint (CoMP) technique [5]. However, the coordination becomes much more complex as more BSs are involved, owing to the need for precise synchronization among BSs, the heavy traffic burden on the backhaul links, complex signal processing, etc. This limits the coordination to only a small subset of BSs in practice. Hence, it is necessary to cluster a subset of BSs to perform CoMP in a dense C-RAN [6]. Note that both UE scheduling and BS clustering are executed at layer 2, while beamforming design belongs to layer 1 (the physical layer). These cross-layer variables are coupled with each other, which makes the joint design problem difficult to solve. Nevertheless, cross-layer design is of practical value, since the layer 1 and layer 2 hardware is going to be integrated in the BBU, and the performance can be further enhanced by jointly optimizing the coupled variables.