I. Introduction
As the main source of energy in China, coal plays an indispensable role in the healthy and stable development of the economy [1]. As China’s industrial structure shifts toward optimized energy consumption under market-economy conditions, China’s coal consumption and its fluctuations are being shaped by new and complex factors whose relevance has yet to be established. This makes China’s future coal consumption extremely difficult to forecast and analyze. Against this background, this paper analyzes the main factors influencing China’s coal consumption in light of current conditions. Based on that analysis, it forecasts coal consumption over the next five years and draws insights from the forecast.
Many scholars have studied the factors influencing China’s coal consumption and how to forecast it. Men et al. [2] used an optimal weighted combination model to predict the structure of coal consumption and concluded that China’s total coal consumption would increase slightly in the future. Wang et al. [3] constructed a model covering both the supply side and the demand side of coal consumption and applied VAR models and Granger causality tests; their results showed a unidirectional causal relationship between economic development and per capita coal consumption. Liu et al. [4] built multiple regression, time series, and neural network models based on regression and time series analysis, but forecast only the price of coal. Li Runchen [5] used an LSTM model to model coal consumption and predict its pattern of change over time. Fan et al. [6] combined linear regression analysis, a grey prediction model, and a neural network model, optimized the combination, and built a combined model of the relationship between energy consumption and GDP, but one that requires GDP to be given in advance.
In summary, many scholars have used multiple linear regression methods to analyze the factors influencing their subject of study and to forecast coal consumption. However, traditional empirical models and multiple regression analysis are not sufficiently precise as forecasting models: they mostly establish correlations and trends rather than accurate numerical values. This paper therefore builds and trains models with machine learning methods, constructs a new indicator system, and adopts XGBoost for its strong nonlinear mapping capability. Existing research [7] shows that, compared with single classifiers such as logistic regression and decision trees, ensemble learning, which trains a set of individual learners on the training data and combines them through some strategy, achieves higher accuracy and better robustness. XGBoost is such an ensemble: it combines multiple CART trees, each added along the gradient descent direction of the loss function, which makes it suitable for processing complex data such as the annual statistics published on the website of the National Bureau of Statistics of China used in this paper. Beyond the traditional influencing factors, this paper also analyzes the collected influencing-factor data and evaluates the accuracy of the forecasting model with metrics such as the root mean square error (RMSE), bringing the forecast coal consumption closer to the actual values and improving forecasting accuracy.
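To make the boosting idea concrete, the following is a minimal sketch of plain gradient boosting with depth-1 CART regression trees (stumps) under squared-error loss, together with the RMSE metric, RMSE = sqrt((1/n) Σ (y_i − ŷ_i)^2). It is not XGBoost itself: XGBoost additionally uses second-order gradient information, explicit regularization of tree complexity, and deeper trees. The function names (`fit_stump`, `boost`, `predict`, `rmse`) and the toy data are hypothetical illustrations, not the model or the National Bureau of Statistics data used in this paper.

```python
import math
from typing import List, Tuple

# A depth-1 CART regression tree ("stump"): (feature index, threshold, left value, right value)
Stump = Tuple[int, float, float, float]


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Fit a stump to targets r by minimizing the squared error of the split."""
    best, best_sse = None, math.inf
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct feature values
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
            if sse < best_sse:
                best, best_sse = (j, t, lv, rv), sse
    if best is None:  # every feature is constant: predict the mean everywhere
        m = sum(r) / len(r)
        best = (0, math.inf, m, m)
    return best


def predict_stump(s: Stump, row: List[float]) -> float:
    j, t, lv, rv = s
    return lv if row[j] <= t else rv


def boost(X, y, n_rounds=100, learning_rate=0.1):
    """Plain gradient boosting for squared loss L = (y - F)^2 / 2.

    The negative gradient of L with respect to F is the residual y - F,
    so each new stump is fitted to the current residuals.
    """
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        s = fit_stump(X, residuals)
        stumps.append(s)
        # Shrink each tree's contribution by the learning rate
        pred = [pi + learning_rate * predict_stump(s, row) for pi, row in zip(pred, X)]
    return base, learning_rate, stumps


def predict(model, X):
    base, lr, stumps = model
    return [base + lr * sum(predict_stump(s, row) for s in stumps) for row in X]


def rmse(y_true, y_pred):
    """Root mean square error: sqrt(mean((y - y_hat)^2))."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)


if __name__ == "__main__":
    # Hypothetical toy data: one feature, target y = 2x + 1 (not real statistics)
    X = [[float(x)] for x in range(10)]
    y = [2.0 * x + 1.0 for x in range(10)]
    model = boost(X, y, n_rounds=200)
    print("baseline RMSE:", rmse(y, [sum(y) / len(y)] * len(y)))
    print("boosted RMSE: ", rmse(y, predict(model, X)))
```

Each round fits a new tree to what the current ensemble still gets wrong, and the learning rate shrinks each tree's contribution; with squared loss this guarantees the training error never increases from one round to the next. In practice a library implementation such as XGBoost would replace this sketch, and RMSE would be computed on held-out years rather than on the training data.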
Using relevant data from 2000 to 2021 as a benchmark, this paper forecasts trends in China’s coal consumption in the context of industrial restructuring and technological innovation, including energy-use efficiency and industrial technological progress. The results can also provide partial guidance for production in China’s coal industry and for the formulation of government energy policy.