I. Introduction
Wi-Fi-based indoor positioning has attracted extensive attention because of its low deployment cost and high positioning accuracy. It falls into two main categories: range-based methods, e.g., time of arrival (TOA), and range-free methods, e.g., fingerprinting. Compared with range-based methods, fingerprint positioning methods are widely used because they are simple to implement[1]. However, fluctuations in the received signal strength indicator (RSSI) can significantly degrade localization accuracy, so fingerprint localization is often combined with machine learning or deep learning to improve robustness. The convolutional neural network (CNN), a typical deep learning network, has been widely used for this purpose. It usually relies on a convolution-pooling structure with general-purpose filters to extract features[2]. S. Aikawa et al. modeled the adjacency relationship between access points (APs) as a two-dimensional model and used it to construct a CNN model for fingerprint localization[3]. G. Cerar et al. constructed an improved CNN model based on channel state information (CSI) from multiple-input multiple-output (MIMO) systems[4]. However, although the intermediate layers contain a large amount of usable information, these existing studies do not exploit it. Hence, we propose a fusion network that considers both spatial features and intermediate-layer features, making fuller use of the extracted features to improve localization performance.
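To make the idea of reusing intermediate-layer features concrete, the following is a minimal NumPy sketch, not the network proposed in this paper. It assumes the RSSI fingerprint has been arranged as a small two-dimensional map, passes it through a convolution-pooling-convolution chain, and concatenates the intermediate feature map with the final one. The function names, kernel sizes, the ReLU activation, and fusion by concatenation are all illustrative assumptions.

```python
import numpy as np

def conv2d_valid(x, k):
    """'Valid' 2-D cross-correlation of a single-channel map x with kernel k."""
    kh, kw = k.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def max_pool2(x):
    """Non-overlapping 2x2 max pooling; an odd trailing row/column is dropped."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def fused_features(rssi_map, k1, k2):
    """Concatenate intermediate-layer and final-layer features (hypothetical fusion)."""
    mid = np.maximum(conv2d_valid(rssi_map, k1), 0.0)   # intermediate layer (ReLU assumed)
    pooled = max_pool2(mid)                             # pooling discards part of `mid`
    deep = np.maximum(conv2d_valid(pooled, k2), 0.0)    # final convolutional layer
    return np.concatenate([mid.ravel(), deep.ravel()])  # fusion by concatenation

# Illustrative input: random values standing in for an 8x8 RSSI map.
rng = np.random.default_rng(0)
rssi_map = rng.normal(size=(8, 8))
features = fused_features(rssi_map, rng.normal(size=(3, 3)), rng.normal(size=(2, 2)))
print(features.shape)  # (40,): 36 intermediate values plus 4 final values
```

In this sketch the fused vector keeps the 36 intermediate activations that a plain convolution-pooling pipeline would reduce to 4 values before the output stage, which is the kind of information the intermediate layers can contribute.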