Introduction
In recent years, with the continuous development of China’s economy, the market demand for hazardous chemicals has increased. Approximately 95% of hazardous materials are consumed at locations other than where they are produced, and more than 50% are transported by road [1]. In 2018, the volume of hazardous chemicals transported by road in China reached 1.86 billion tons. The rapid increase in transport frequency has led to a significant rise in the number of hazardous chemical road transport accidents. In addition, because hazardous materials are flammable, explosive, corrosive and poisonous, accidents often cause serious secondary harm and a series of social problems, such as damage to the ecological environment, casualties and property losses. According to the U.S. Department of Transportation’s report on hazmat accidents (2009–2018), 145,971 (87.90%) of the 166,065 hazmat accidents occurred in transit, and the number of highway-related incidents is increasing every year (12,730 in 2009 and 17,923 in 2018) [2]. In China, from 2006 to 2017, 3,974 incidents involving the transport of hazardous materials resulted in 5,203 deaths, meaning that on average more than one person was killed every day in China as a result of a hazardous materials incident [3]. On March 1, 2014, two tanker trucks carrying methanol collided in a road tunnel in Shanxi Province, causing a fire that killed 40 people. On June 13, 2020, a vehicle transporting liquefied petroleum gas exploded in Zhejiang Province, causing nearby houses and factories to collapse, killing 20 people and injuring 175 others [1].
Over the past decades, hazardous material transportation has been a very active area of research. However, most studies have focused on the direct costs of hazardous material transport or on quantifying the potential losses that can result from an accident [4]–[6]. Research on the factors influencing the severity of hazardous material road transport accidents has been limited, and most of it has merely described accident characteristics or explored the relationship between features and severity using statistical methods, which can quantitatively describe the functional relationship between a phenomenon and certain factors. Andersson, an early pioneer in the study of hazmat accidents, used statistical methods to analyze 570 hazmat accidents from 1986 to 1987 and determined that the type of hazardous material, road type, type of transport truck, and location had a significant impact on accident severity [7]. Yang et al. [8] conducted a statistical survey of hazmat road transport accidents that occurred in China from 2000 to 2008 and found that 46.6% of the accidents were caused by poor road conditions, 13.7% by driver error and 9% by mismanagement. Zhang et al. [9] found that 1,632 accidents involving hazmat trucks occurred in China from 2006 to 2010; the majority resulted in hazardous material spills, followed by explosions (15.1%) and fires (5.3%), and leakage was often the cause of a subsequent explosion or fire. Shen et al. [10] analyzed 708 accidents involving hazardous materials on Chinese highways from 2004 to 2011 and found that 56% of them resulted in hazardous material spills and that vehicle defects and human error were the main causes.
Ma et al. [11] used an ordered logit model to estimate the probabilities of hazmat accidents at different severity levels and applied elasticity analysis to identify the factors that significantly influence severity: illegal behavior, unsafe driving behavior, accident responsibility, vehicle problems, vehicle type, weather, lighting, road level, and regional distribution. Duan [12] analyzed hazardous chemical accidents in China from 2000 to 2006 and found that the more developed southeast coastal areas had a higher incidence of accidents and deaths than other regions. Poku-Boansi et al. [13] found that vehicle speed, the presence of a spill and the population density along the accident road had a significant effect on the severity of road transport accidents involving dangerous goods. Xing et al. [14] established a random parameters ordered probit model to explore the influence of contributing factors on accident severity; their results indicated that higher injury severity may be related to hazmat type, mishandling, driver fatigue, speeding, tunnels, slopes, county roads, dry roads, winter, darkness, involvement of more than two vehicles, rear-end collisions, and explosions. Fabiano et al. [15] showed that road alignment, meteorological factors and the traffic frequency of transport vehicles significantly affect the risk of road transport of dangerous goods. Azimi et al. [16] employed a random parameters logit model to study the injury severity of large-truck rollover crashes in Florida and found that crashes tend to be more severe when hazardous materials are spilled.
Statistical models have been used successfully to explore the factors influencing the severity of traffic accidents. A statistical model rests on an a priori hypothesis about the relationship between the variables of interest and, given the statistical characteristics of the data (such as the data collection method and the quantities to be estimated), determines the effect of the independent variables on the dependent variable. In practice, however, the a priori assumptions may not reflect the true relationships among the variables, leading to inappropriate inferences [17]. In addition, related studies have pointed out that statistical models are better suited to data with smaller sample sizes and a narrower range of features [18], [19].
In contrast, machine learning models are nonparametric tools: they do not assume a functional relationship between endogenous and exogenous variables and make few or no assumptions about the explanatory variables. These models are more adaptable, can process high-dimensional data quickly, and generally benefit from larger sample sizes. Furthermore, they can classify the dependent variable by identifying the most influential explanatory variables [20], [21]. Current machine learning research in this area focuses on decision trees (DTs), random forests (RFs), artificial neural networks (ANNs), support vector machines (SVMs), etc. [22]–[24]. However, ANNs and SVMs operate largely as black boxes, making it difficult to inspect their decision process and to directly obtain the differences in the effects of individual features on accident severity [25]. Tree-based algorithms are a common approach in machine learning and have progressed from single decision trees to random forests based on bagging and then to gradient boosting trees. In this line of development, eXtreme Gradient Boosting (XGBoost) improves the basic framework of a gradient boosting machine (GBM) through system optimizations, including parallel computation, and algorithmic enhancements [26]. In addition, a regularization term in its objective counteracts the inherent tendency of tree models to overfit. As a result, XGBoost has proven capable of solving a variety of classification problems and is widely recognized among researchers for its accuracy, simplicity, and interpretability. Soleimani et al. [27] used XGBoost to determine the relative importance of the variables in the decision to close a crossing, based on accident data from 18,485 road-rail grade crossings in the United States; the model accuracy was 0.991, higher than that of decision trees (0.984) and random forests (0.987).
Bahador et al. [28] applied XGBoost and SHapley Additive exPlanations (SHAP) to real-time accident detection and characterization. Their results showed that XGBoost can robustly detect accidents with 99% accuracy, a 79% detection rate and a 0.16% false alarm rate, and that characteristics such as speed, population, network, land use, and weather conditions have a significant impact on accident probability. Ma et al. [26] conducted a spatial analysis of the leading factors for the 3,146 traffic fatalities that occurred in Los Angeles in 2010–2012 using a framework that combines XGBoost and grid analysis, and identified eight factors as the most influential; in descending order, these were drunk driving, party involvement, rear-end collisions, lighting conditions, pedestrian involvement, motorcycle involvement, day of the week, and time of day. Zhang et al. [29] used the GS-XGBoost algorithm to model the hierarchical relationship between material properties and their deep semantics within the same image, an approach that has been applied in scenarios such as large-scale product image retrieval, robotics, and industrial inspection. Shi et al. [30] applied XGBoost to urban fire incident prediction.
In general, the literature on the factors influencing the severity of hazardous material road transport accidents is limited and has mainly focused on describing accidents with statistical methods; few studies have applied machine learning algorithms to this problem. In addition, the small sample sizes of previous studies and their failure to account for inter-regional variability have left knowledge gaps in identifying key influencing factors and predicting crash severity. The purpose of this paper is to analyze the factors influencing the severity of hazardous material road transport accidents in seven regions of China. Accident severity was divided into three levels, property damage only, injury and fatal, according to the casualties involved. The nonparametric machine learning algorithm XGBoost was applied for data preprocessing and for exploring the key risk features, and its performance was compared with that of other common classification algorithms; the comparison showed that XGBoost achieved higher classification accuracy than the alternatives. The knowledge gained from this study can provide a theoretical basis for the government and transport enterprises to formulate effective preventive measures, rescue programs and material reserve plans that minimize social problems such as casualties, property damage and environmental pollution.
The remainder of this paper is organized as follows:
Section II provides an introduction to the XGBoost algorithm. Section III describes the data sources and processing procedures. Section IV presents the results of the model assessment and data analysis, and improvement recommendations are made based on the results of the data analysis.
Methodology
A. XGBoost
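XGBoost is an optimized C++ implementation of a GBM [26], [31], [32]. To control model complexity, a regularization term is added to the training loss, so the objective function of XGBoost is expressed as:\begin{equation*} Obj=\sum \nolimits _{i=1}^{m} {l\left ({y_{i},\hat {y}_{i} }\right)} +\sum \nolimits _{k=1}^{K} {\Omega \left ({f_{k} }\right)}\tag{1}\end{equation*}
where $m$ is the number of samples, $l$ is a differentiable loss measuring the difference between the true label $y_{i}$ and the prediction $\hat{y}_{i}$, $K$ is the number of trees, $f_{k}$ is the $k$-th tree, and $\Omega$ penalizes the complexity of a tree.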
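Training is additive: after the $t$-th iteration, the prediction for sample $i$ is the prediction of the first $t-1$ trees plus the output of the new tree $f_{t}$:\begin{equation*} \hat {y}_{i}^{\left ({t }\right)}=\sum \nolimits _{k=1}^{t-1} f_{k}\left ({x_{i} }\right)+f_{t}\left ({x_{i} }\right) =\hat {y}_{i}^{\left ({t-1 }\right)}+f_{t}\left ({x_{i} }\right)\tag{2}\end{equation*}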
Substituting (2) into (1), the objective at iteration $t$ depends on the $t-1$ trees that have already been built and on the new tree $f_{t}$:\begin{equation*} Obj=\sum \nolimits _{i=1}^{m} l\left ({y_{i},\hat {y}_{i}^{\left ({t-1 }\right)}+f_{t}\left ({x_{i} }\right) }\right) +\sum \nolimits _{k=1}^{t-1} \Omega \left ({f_{k} }\right) +\Omega \left ({f_{t} }\right)\tag{3}\end{equation*}
Expanding the loss to second order around $\hat{y}_{i}^{(t-1)}$ with Taylor’s formula gives:\begin{align*} Obj\approx &\sum \nolimits _{i=1}^{m} \left [{ l\left ({y_{i},\hat {y}_{i}^{\left ({t-1 }\right)} }\right)+f_{t}\left ({x_{i} }\right)g_{i}+ \frac {1}{2}{(f_{t}\left ({x_{i} }\right))}^{2}h_{i} }\right] \\& +\sum \nolimits _{k=1}^{t-1} \Omega \left ({f_{k} }\right) +\Omega \left ({f_{t} }\right)\tag{4}\end{align*}
where $g_{i}=\partial l(y_{i},\hat{y}_{i}^{(t-1)})/\partial \hat{y}_{i}^{(t-1)}$ and $h_{i}=\partial^{2} l(y_{i},\hat{y}_{i}^{(t-1)})/\partial (\hat{y}_{i}^{(t-1)})^{2}$ are the first- and second-order derivatives of the loss with respect to the previous prediction.
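To make $g_{i}$ and $h_{i}$ concrete, the sketch below computes them for the binary logistic loss. This is an illustrative assumption: the study has three severity classes, for which a multiclass (softmax) loss would be needed instead. The helper `taylor_objective` (a hypothetical name) evaluates the loss part of the second-order expansion with the constant terms dropped, so it can be compared with the exact change in loss for a small tree output.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Logistic function: maps a raw margin to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_grad_hess(y: List[int], margin: List[float]) -> Tuple[List[float], List[float]]:
    """Return g_i and h_i of the binary log loss
    l(y, yhat) = -[y*log(p) + (1 - y)*log(1 - p)], with p = sigmoid(yhat),
    differentiated with respect to the previous raw prediction yhat^(t-1)."""
    g: List[float] = []
    h: List[float] = []
    for yi, m in zip(y, margin):
        p = sigmoid(m)
        g.append(p - yi)         # first derivative of the loss
        h.append(p * (1.0 - p))  # second derivative of the loss
    return g, h


def taylor_objective(f_t: List[float], y: List[int], margin: List[float]) -> float:
    """Second-order approximation of the change in total loss when the new tree
    adds f_t(x_i) to each margin (the loss part of Eq. (5); Omega omitted)."""
    g, h = logistic_grad_hess(y, margin)
    return sum(f * gi + 0.5 * f * f * hi for f, gi, hi in zip(f_t, g, h))


# Exact vs. approximated loss change for one positive sample, margin 0 -> 0.01.
exact = math.log(1 + math.exp(-0.01)) - math.log(2.0)
approx = taylor_objective([0.01], [1], [0.0])
```

For a small step the two values agree to many decimal places, which is why XGBoost can optimize the simpler quadratic form in (5) instead of the original loss.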
The terms $l(y_{i},\hat{y}_{i}^{(t-1)})$ and $\sum_{k=1}^{t-1}\Omega(f_{k})$ are constants at iteration $t$ and do not affect the choice of $f_{t}$; removing them gives\begin{equation*} Obj=\sum \nolimits _{i=1}^{m} \left [{ f_{t}\left ({x_{i} }\right)g_{i}+\frac {1}{2}{(f_{t}\left ({x_{i} }\right))}^{2}h_{i} }\right] +\Omega \left ({f_{t} }\right)\tag{5}\end{equation*}
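The tree is then expressed in terms of its structure according to (6), \begin{equation*} f_{t}\left ({x_{i} }\right)=w_{q\left ({x_{i} }\right)}\tag{6}\end{equation*}
where $q(x_{i})$ maps sample $x_{i}$ to the index of the leaf it falls into and $w$ is the vector of leaf weights.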
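If a tree contains a total of $T$ leaves, its complexity is defined as\begin{equation*} \Omega \left ({f_{t} }\right)=\gamma T+\frac {1}{2}\lambda \sum \nolimits _{j=1}^{T} w_{j}^{2}\tag{7}\end{equation*}
where $\gamma$ penalizes the number of leaves and $\lambda$ penalizes large leaf weights.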
Substituting (6) and (7) into (5) and defining the set of samples on the leaf with index $j$ as $I_{j}=\{i\mid q(x_{i})=j\}$, the objective can be regrouped by leaf:\begin{equation*} Obj=\sum \nolimits _{j=1}^{T} \left [{ w_{j}\sum \nolimits _{i\in I_{j}} g_{i} \!+\!\frac {1}{2}w_{j}^{2}\left({\sum \nolimits _{i\in I_{j}} h_{i} \!+\!\lambda }\right) }\right] \!+\!\gamma T\tag{8}\end{equation*}
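Because (8) is a sum of independent quadratics in the leaf weights, minimizing it leaf by leaf gives the standard XGBoost closed forms, which the text does not spell out: with $G_{j}=\sum_{i\in I_{j}} g_{i}$ and $H_{j}=\sum_{i\in I_{j}} h_{i}$, the optimal weight is $w_{j}^{*}=-G_{j}/(H_{j}+\lambda)$, the minimum objective is $-\frac{1}{2}\sum_{j} G_{j}^{2}/(H_{j}+\lambda)+\gamma T$, and splitting a leaf into left and right children reduces it by $\frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda}+\frac{G_R^2}{H_R+\lambda}-\frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right]-\gamma$. The sketch below evaluates these formulas on hand-supplied gradients. It illustrates the scoring rule only and is not XGBoost’s actual split-finding code.

```python
from typing import Sequence, Tuple


def leaf_weight(G: float, H: float, lam: float) -> float:
    """Optimal leaf weight w_j* = -G_j / (H_j + lambda), the minimizer of Eq. (8) for one leaf."""
    return -G / (H + lam)


def structure_score(leaves: Sequence[Tuple[float, float]], lam: float, gamma: float) -> float:
    """Eq. (8) at the optimal weights: -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T.
    `leaves` holds one (G_j, H_j) pair per leaf."""
    return -0.5 * sum(G * G / (H + lam) for G, H in leaves) + gamma * len(leaves)


def split_gain(g: Sequence[float], h: Sequence[float], go_left: Sequence[bool],
               lam: float, gamma: float) -> float:
    """Reduction of the structure score when one leaf is split into two children."""
    GL = sum(gi for gi, left in zip(g, go_left) if left)
    HL = sum(hi for hi, left in zip(h, go_left) if left)
    G, H = sum(g), sum(h)
    GR, HR = G - GL, H - HL
    return 0.5 * (GL * GL / (HL + lam) + GR * GR / (HR + lam) - G * G / (H + lam)) - gamma


# Four samples at margin 0 under logistic loss: two positives (g=-0.5), two negatives (g=+0.5).
g = [-0.5, -0.5, 0.5, 0.5]
h = [0.25, 0.25, 0.25, 0.25]
left = [True, True, False, False]  # a candidate split that separates the classes
```

A split is worth making only when its gain is positive, so a larger $\gamma$ acts as a minimum-gain threshold that prunes weak splits.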
B. Cross-Validation
In
C. Model Assessment Indicators
1) Confusion Matrix
The confusion matrix and its associated metrics, namely accuracy, true positive rate (TPR), false positive rate (FPR), precision, recall, F-score, the receiver operating characteristic (ROC) curve, and the area under the ROC curve (AUC), were used to evaluate the model in this study [34].
A confusion matrix is a table layout that visualizes the performance of an algorithm. Each column of the matrix represents the instances in a predicted class, while each row represents the instances in an actual class, making it easy to see the numbers of false positives (FP), false negatives (FN), true positives (TP), and true negatives (TN). This allows a more detailed analysis than the proportion of correct predictions (accuracy) alone. The four outcomes and the formulas for calculating the assessment indicators are shown in Figure 1.
Confusion matrix and formulas for calculating accuracy, TPR, FPR, precision, recall, and F-score.
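Because Figure 1 is an image, the standard definitions it refers to are restated here for reference. F-score is assumed to be the balanced F1 score, which matches the "harmonic mean" description below. For the three severity levels, each formula is presumably applied one class at a time, treating that class as positive and the other two as negative.

```latex
\begin{align*}
\text{Accuracy} &= \frac{TP+TN}{TP+TN+FP+FN}, &
\text{TPR} = \text{Recall} &= \frac{TP}{TP+FN}, \\
\text{FPR} &= \frac{FP}{FP+TN}, &
\text{Precision} &= \frac{TP}{TP+FP}, \\
F\text{-score} &= \frac{2\cdot \text{Precision}\cdot \text{Recall}}{\text{Precision}+\text{Recall}}. &&
\end{align*}
```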
2) Accuracy
Accuracy is a composite metric that reflects the proportion of all samples that are correctly predicted and is one of the most commonly used metrics for assessing predictive performance in classification tasks. In general, the higher the accuracy is, the better the classifier.
3) True Positive Rate (TPR) and False Positive Rate (FPR)
TPR is the proportion of actually positive samples that the classifier predicts as positive, i.e., TP/(TP + FN); it measures the classifier’s ability to identify positive examples. FPR is the proportion of actually negative samples that the classifier incorrectly predicts as positive, i.e., FP/(FP + TN).
4) Precision, Recall and F-Score
Precision is a measure of exactness: the proportion of samples predicted to be positive that are actually positive, i.e., TP/(TP + FP).
Recall is a measure of completeness: the number of correctly predicted positive samples as a proportion of the number of actual positive samples, which makes it identical to TPR. The F-score is the harmonic mean of precision and recall (sensitivity). Precision, recall and F-score range from 0 to 1; values close to 1 are best and values close to 0 are worst [35].
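The per-class indicators can be read directly off a multiclass confusion matrix. The sketch below follows the layout described above (rows are actual classes, columns are predicted classes) and computes the metrics one-vs-rest for the O/I/F severity levels. The matrix values are hypothetical and are not the study’s results.

```python
from typing import Dict, List, Sequence


def accuracy(cm: List[List[int]]) -> float:
    """Share of all samples on the diagonal, i.e. correctly classified."""
    return sum(cm[k][k] for k in range(len(cm))) / sum(sum(row) for row in cm)


def one_vs_rest_metrics(cm: List[List[int]],
                        labels: Sequence[str] = ("O", "I", "F")) -> Dict[str, Dict[str, float]]:
    """Per-class TPR, FPR, precision, recall and F1.

    cm[r][c] counts samples whose actual class is r and predicted class is c."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    result: Dict[str, Dict[str, float]] = {}
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # actual k, predicted another class
        fp = sum(cm[r][k] for r in range(n)) - tp  # predicted k, actually another class
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        result[labels[k]] = {
            "TPR": recall,
            "FPR": fp / (fp + tn) if fp + tn else 0.0,
            "precision": precision,
            "recall": recall,
            "F1": f1,
        }
    return result


# Hypothetical 3x3 matrix for illustration only.
cm = [[8, 1, 1],
      [2, 6, 2],
      [0, 1, 4]]
metrics = one_vs_rest_metrics(cm)
```

Averaging the per-class values (macro averaging) or weighting them by class size are two common ways to summarize such a table in a single number.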
5) Receiver Operating Characteristic (ROC) Curve and Area Under the ROC Curve (AUC)
The ROC curve plots FPR on the horizontal axis against TPR on the vertical axis as the classification threshold varies, and thus reflects the trade-off between sensitivity (TPR) and specificity (1 − FPR). The AUC summarizes the curve as a single value between 0 and 1, where 0.5 corresponds to random guessing; the larger the AUC is, the better the diagnostic performance [36].
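The sketch below shows how an ROC curve and its AUC can be built from predicted scores for one class treated as positive (a one-vs-rest view, assumed here for the three severity levels). It lowers the threshold through each distinct score, records the resulting (FPR, TPR) point, and integrates with the trapezoidal rule. Libraries such as scikit-learn provide equivalent routines. The function names here are illustrative.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points obtained by lowering the threshold through each distinct score.
    y_true uses 1 for the positive class and 0 otherwise."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples tied at the same score cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under a ROC curve whose points are ordered by FPR."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

An AUC of 1.0 means every positive sample is scored above every negative one; 0.5 means the scores carry no ranking information.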
Data
A. Data Collection
The data used in this paper were collected by the Dangerous Chemicals Registration Center of the Ministry of Emergency Management of the People’s Republic of China and cover road transport accidents involving hazardous materials in seven regions of China over the five-year period from 2015 to 2019. Based on the content of the raw data, and with reference to the factors affecting the safety of road transport of dangerous goods listed in the European Agreement concerning the International Carriage of Dangerous Goods by Road (EUR), Highway Routing of Hazardous Materials: Guidelines for Applying Criteria (U.S.), and Regulations on the Administration of Dangerous Chemicals Safety (CN), 19 features were initially selected as the independent variables of the model. These features are accident forms (direct accident form: DAF, final accident form: FAF), driver attributes (qualification: QU, fatigue: FA), vehicle attributes (vehicle type: VT, vehicle safety status: VSS, device security status: DSS, moving state: MS), road attributes (road type: RT, road alignment: RA, traffic signal: TS, intersection: INT, segment type: ST), environmental attributes (surface condition: SC, season: SEA, month: MON, time of day: TOD, weather: WEA), and type of hazardous materials: HM. The severity level of an accident was determined by the number of casualties and divided into three levels (property damage only: O, injury: I, and fatal: F).
B. Data Preprocessing
Because hazardous material road transport incidents are complex and information about them is not collected in a specialized way, the raw data inevitably have shortcomings, and preprocessing is required before the data can be used in the model. The preprocessing in this study involved data cleaning and data formatting.
1) Data Cleaning
Highly correlated features were removed from the dataset [37]. Highly correlated features (correlation coefficient above 0.5) include features that are strongly correlated with the target and features that are strongly correlated with each other.
The results of the correlation analysis are displayed in Figure 2. For each pair of features with a correlation coefficient greater than 0.5, one feature was removed to alleviate the correlation problem and reduce the computational cost. In total, two highly correlated features were excluded, reducing the number of features to 17. The cleaned dataset and its attribute descriptions are listed in Table 1, and the number of hazardous material road transport accidents in each region of China is shown in Figure 3.
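The pairwise pruning rule can be sketched as follows. The text does not say which correlation coefficient was used or which member of a pair was dropped. This sketch assumes Pearson correlation on integer-coded features and keeps the first feature of each pair in column order. It covers only feature-to-feature pairs, not correlation with the target, and the column values are toy data.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def prune_correlated(features: Dict[str, Sequence[float]], threshold: float = 0.5) -> List[str]:
    """Drop the later feature of every pair whose |r| exceeds the threshold."""
    names = list(features)
    dropped = set()
    for a, b in combinations(names, 2):
        if a in dropped or b in dropped:
            continue  # a dropped feature no longer needs to be compared
        if abs(pearson(features[a], features[b])) > threshold:
            dropped.add(b)
    return [name for name in names if name not in dropped]


# Toy integer-coded columns (not the study's data): MON moves in lockstep with SEA.
features = {
    "SEA": [1, 2, 3, 4, 1, 2],
    "MON": [2, 5, 8, 11, 2, 5],
    "WEA": [1, 1, 2, 1, 2, 2],
}
kept = prune_correlated(features)
```

Pearson correlation on arbitrary category codes is only a rough screen; for nominal variables, association measures such as Cramér’s V are a common alternative.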
2) Data Formatting
Most of the features collected in this study are not ordinal but nominal categorical variables, for which dummy variables are needed to convey the information to the algorithm as accurately as possible [38]. Season, road type, and the other nominal variables were therefore converted to dummy variables using one-hot encoding. For example, in the raw dataset, SEA has four labels: spring, summer, fall and winter. After one-hot encoding, four dummy variables, SEA_1, SEA_2, SEA_3, and SEA_4, indicate the season in which an accident occurred. In this way, the 17 categorical features were converted into 91 dummy variables.
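A dependency-free sketch of the encoding step follows. In practice `pandas.get_dummies` or scikit-learn’s `OneHotEncoder` does the same job. The numeric season codes (1 = spring … 4 = winter) are an assumption for illustration, and `one_hot` and `one_hot_all` are hypothetical helper names.

```python
from typing import Dict, List, Sequence


def one_hot(column: str, values: Sequence[int], categories: Sequence[int]) -> Dict[str, List[int]]:
    """Expand one nominal column into 0/1 dummy columns named <column>_<category>."""
    return {f"{column}_{c}": [1 if v == c else 0 for v in values] for c in categories}


def one_hot_all(data: Dict[str, Sequence[int]],
                categories: Dict[str, Sequence[int]]) -> Dict[str, List[int]]:
    """Encode every column; the output width is the sum of the category counts."""
    out: Dict[str, List[int]] = {}
    for column, values in data.items():
        out.update(one_hot(column, values, categories[column]))
    return out


# Hypothetical coding: SEA 1=spring, 2=summer, 3=fall, 4=winter; five accidents.
sea = [1, 3, 4, 2, 3]
dummies = one_hot("SEA", sea, categories=[1, 2, 3, 4])
```

Because each feature contributes one column per category, the total width of the encoded dataset (91 in this study) equals the sum of the category counts of the 17 features.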
Results and Discussion
A. XGBoost Model
The experiment was run on a computer with 8 GB of RAM, an Intel(R) Core(TM) i3-3110M CPU, and the Windows 10 operating system. The code was written in Python 3.8.2.
1) Model Performance Assessment
To further test the performance of XGBoost, five popular models, namely logistic regression (LR), multilayer perceptron (MLP), random parameters logit model (RPLM), random forest (RF) and SVM, were used for comparison, and 10-fold cross-validation was used to stabilize the results. The results are shown in Figure 4.
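The following is a minimal sketch of 10-fold cross-validation. It assumes stratified folds, which the text does not state, so that each fold keeps roughly the overall O/I/F proportions. `fit` is a hypothetical callback that trains any of the compared models and returns a prediction function. `majority_baseline` is a trivial stand-in model used only to show the interface.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def stratified_kfold(labels: Sequence[str], k: int = 10,
                     seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs, dealing each class round-robin across folds."""
    rng = random.Random(seed)
    by_class: Dict[str, List[int]] = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for cls in sorted(by_class):
        idx = by_class[cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[(j + offset) % k].append(i)
        offset += len(idx)  # continue dealing where the previous class stopped
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def cross_val_accuracy(X: Sequence, y: Sequence[str],
                       fit: Callable[[List, List[str]], Callable], k: int = 10) -> float:
    """Mean test accuracy over k folds; fit(X_train, y_train) returns a predict function."""
    scores = []
    for train, test in stratified_kfold(y, k):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def majority_baseline(X_train: List, y_train: List[str]) -> Callable:
    """Trivial model that always predicts the most frequent training label."""
    top = max(set(y_train), key=y_train.count)
    return lambda x: top


# Hypothetical labels: 40 property-damage-only, 50 injury, 10 fatal accidents.
y = ["O"] * 40 + ["I"] * 50 + ["F"] * 10
splits = list(stratified_kfold(y, k=10))
```

Reporting the mean over the ten held-out folds, rather than one train/test split, is what reduces the variance of the comparison between models.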
2) XGBoost Performance Analysis
The results describing the performance of the classifier for the seven regions, calculated from the confusion matrix, are shown in Table 2.
East China, Northwest China and Central China have the most hazardous material road transport accidents in China, and the model performs better for these regions than for the others. This may be because the other regions have fewer accident records, which suggests that the model may not achieve the desired predictive accuracy when the dataset is too small.
B. Feature Importance
The combination of feature importance and XGBoost’s decision rules allows a more definitive and comprehensive exploration of the main features affecting the severity of hazardous material road transport accidents in each region, from which specific, effective measures can be proposed to enhance the safety of hazardous material road transport. The main features affecting accident severity in different regions are listed in Figure 5. Table 3 lists the occurrences of accidents (property damage only, injury, and fatal) with and without the relevant features. These results are discussed in detail in the following subsection.
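The text does not state which importance measure underlies Figure 5. One common choice is gain importance, the average loss reduction of the splits that use a feature. The xgboost Python package exposes this as `Booster.get_score(importance_type="gain")`. The sketch below computes it from hypothetical (feature, gain) split records. It then sums dummy-level scores back to their parent feature. That is an assumed way to go from the 91 dummy variables to feature-level rankings such as HM or SC.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def gain_importance(splits: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average gain per split for each feature, normalized to sum to 1,
    sorted from most to least important."""
    total: Dict[str, float] = defaultdict(float)
    count: Dict[str, int] = defaultdict(int)
    for feature, gain in splits:
        total[feature] += gain
        count[feature] += 1
    avg = {f: total[f] / count[f] for f in total}
    norm = sum(avg.values())
    return dict(sorted(((f, v / norm) for f, v in avg.items()),
                       key=lambda kv: kv[1], reverse=True))


def aggregate_dummies(importance: Dict[str, float]) -> Dict[str, float]:
    """Sum dummy-level scores (e.g. HM_3, HM_8) into their parent feature (HM)."""
    parent: Dict[str, float] = defaultdict(float)
    for name, score in importance.items():
        parent[name.rsplit("_", 1)[0]] += score
    return dict(sorted(parent.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical (feature, gain) records taken from the split nodes of all trees.
splits = [("HM_3", 4.0), ("SC_1", 2.0), ("HM_3", 2.0), ("FA_1", 1.0), ("HM_8", 0.5)]
imp = gain_importance(splits)
by_feature = aggregate_dummies(imp)
```

Average gain can favor rarely used but decisive splits. Total gain (`importance_type="total_gain"`) or SHAP values give complementary views.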
C. Feature Analysis
The impact of each characteristic on the severity of hazardous material road transport accidents in the local area is analyzed based on the main risk characteristics of each region.
1) East China
The following results can be obtained from Figure 5 and Table 3. In East China, the features that have the greatest influence on the severity of hazardous material road transport accidents include HM, SC, MS, FA, and TOD (in order of importance).
Road transport accidents involving Class III and Class VIII hazardous materials accounted for 78% of all accidents, and the probabilities of injury and fatal accidents were higher than those for other types of hazardous materials (I: 48% VS 38%; F: 8% VS 7%). Frequent transport may be the underlying cause, and the flammable, explosive and corrosive properties of these substances increase the likelihood of a serious accident [9].
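Each "with vs. without" comparison in Table 3, such as the one above, is a pair of conditional severity distributions. The sketch below shows one way to compute them from accident records. The record layout, the `severity_shares` helper and the toy records are assumptions for illustration, not the study’s data.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Mapping

LEVELS = ("O", "I", "F")  # property damage only, injury, fatal


def severity_shares(records: Iterable[Mapping[str, object]], feature: str,
                    in_group: Callable[[object], bool]) -> Dict[str, Dict[str, float]]:
    """Share of each severity level among accidents that satisfy the condition
    on `feature` ("with") and among those that do not ("without")."""
    with_group: Counter = Counter()
    without_group: Counter = Counter()
    for r in records:
        target = with_group if in_group(r[feature]) else without_group
        target[r["severity"]] += 1

    def shares(c: Counter) -> Dict[str, float]:
        n = sum(c.values())
        return {lvl: (c[lvl] / n if n else 0.0) for lvl in LEVELS}

    return {"with": shares(with_group), "without": shares(without_group)}


# Toy records; HM holds the hazard class, and the group is Class III or VIII.
records = [
    {"HM": 3, "severity": "I"}, {"HM": 8, "severity": "F"},
    {"HM": 3, "severity": "O"}, {"HM": 3, "severity": "I"},
    {"HM": 2, "severity": "O"}, {"HM": 9, "severity": "I"},
]
table = severity_shares(records, "HM", lambda hm: hm in (3, 8))
```

Reading across a row of the result gives pairs in the "I: x% VS y%; F: x% VS y%" form used throughout this section.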
Accidents occurring on dry pavement accounted for 83% of the total. The casualty probability of accidents on dry pavement was significantly lower than that under other road surface conditions (I: 41% VS 66%; F: 6% VS 14%), mainly because the high friction coefficient of dry pavement allows drivers to avoid an accident in time. On wet pavement, by contrast, the lower friction coefficient weakens the adhesion between the vehicle and the pavement and makes the vehicle harder to control, which exacerbates the severity of an accident [14]. East China is located in China’s eastern coastal and southeastern regions and has a temperate and subtropical monsoon climate, with more rain in summer and more snow in winter, which further exacerbates this situation.
Sixty-one percent of the total number of accidents occurred while the vehicle was traveling straight ahead. The probability of a fatal crash occurring when the vehicle is traveling straight ahead is less than that of other moving states (F: 6% VS 10%). The primary reason is that when going straight, the driver is already relatively well acquainted with the surrounding environment and can deal with a potential accident that is happening in time. However, when turning, avoiding or going downhill, the road transport environment is relatively complicated and unfamiliar; these conditions not only increase the accident rate but also increase the severity of an accident if the driver fails to respond in a timely manner [16].
Twenty-three percent of all accidents in the region occurred with drivers who were fatigued while driving. Injuries and fatalities are more likely to occur in a fatigued state than in a non-fatigued state (I: 64% VS 40%; F: 12% VS 6%). The reason for this is that when fatigued, drivers are slow to become aware and react. When an accident is imminent, a fatigued driver is unable to assess the danger or take the correct avoidance measures in time. This dramatically increases the number of potential fatalities in traffic accidents [39].
Accidents during the 15:00 and 16:00 hours accounted for 15% of the total number of accidents, which was significantly higher than the average, and the rate of fatal accidents in these hours was higher than at other times of day (F: 9% VS 7%). One main reason is that East China strictly controls the transportation times of hazardous materials, which reduces accident rates at night and in the early morning. In addition, after driving for a long time, truck drivers become fatigued at 15:00–16:00 and lose the ability to judge the surrounding driving environment, thus increasing the severity of accidents [40].
2) North China
From Figure 5 and Table 3, we can reach the following findings. In North China, DAF, RT, FA, RA, SEA, and HM are identified as key determinants of severity in accidents, in that order.
In 78% of the accidents, the direct accident form was a spill, a rollover or a two-vehicle rear-end collision. Accidents with these direct accident forms were less likely to result in fatalities than those with other direct accident forms (F: 2% VS 3%). Possible causes include the following. Compared with these forms, an explosion leaves very little time for people to escape. Vehicle falls generally occur on treacherous roads or at bridges, making rescue difficult and thus increasing accident severity. Multivehicle accidents involve more people, which in turn increases the potential number of fatalities; moreover, in such accidents, people who are not initially injured and flee their vehicles immediately are still at risk [10].
Fifteen percent of all crashes occurred on urban roads. The probabilities of injury and fatal outcomes for hazardous material road transport accidents that occurred on urban roads were lower than those for other road types (I: 40% VS 62%; F: 0% VS 2%), which is related to the strict regulation of when vehicles transporting hazardous materials may enter urban roads. The more serious accidents on highways and on national and provincial roads can be attributed to high traffic speeds, complicated traffic mixes and the number of parties involved. On rural roads, weaker police control and a poor road transport environment increase the probability of fatal accidents [11].
Twenty percent of the total accidents occurred when drivers were fatigued. Accidents were more likely to be fatal when the driver was fatigued (F: 6% VS 1%). This finding may be due to the fact that fatigued drivers are slower to become aware and react [41].
The casualty probability for hazardous material road transport accidents on straight and curved roads was significantly higher than that for other road alignments (I: 64% VS 35%; F: 2% VS 0%). Furthermore, 84% of accidents occurred on straight and curved roads, reflecting the fact that these are the main road alignments in North China. Driving on straight roads for long periods can cause visual fatigue or feel so comfortable that drivers become negligent, which can lead to serious accidents. On curves, the inertia of a large mass of liquid in a tank can easily cause the vehicle to overturn while turning, leading to casualties [15].
Fall and winter have higher fatality rates than spring and summer (F: 3% VS 1%). In North China, the need for heating in fall and winter significantly increases the demand for hazardous materials and the frequency of their transport, which in turn raises the potential fatality rate [10]. Moreover, the colder temperatures in fall and winter challenge the technical safety of vehicles and equipment [3].
Accidents involving the transport of Class III hazardous materials have a higher probability of casualties than those of other types of hazardous materials (I: 60% VS 58%; F: 3% VS 0%). This finding may be attributed to the larger proportion of Class III hazardous materials (63%).
3) Central China
The following can be derived from Figure 5 and Table 3. In Central China, DAF, RT, FAF, FA, and RA, in that order, are critical features in determining the severity of road transport accidents related to hazardous materials.
When direct accident forms involve spills, fires, and two-vehicle collisions, the occurrence probability of injury or death is lower than that in other direct accident forms (I: 29% VS 69%; F: 5% VS 10%). The reasons for this result are similar to those in the previous section (North China). However, only 36% of the accidents in this region involved the above direct accident forms.
Accidents on highways accounted for 37% of all accidents in the region and were more likely to result in fatalities than accidents on other types of roads (F: 14% VS 5%). This finding might be attributed to the larger number of vehicles on highways, which makes multivehicle accidents more likely; the more parties involved in an accident, the more people are exposed and the higher the fatality rate. Moreover, higher speeds are generally associated with higher mortality [42].
Eighty-five percent of accidents in the region ended in a spill as the final accident form. Accidents whose final form was a spill had a much lower probability of fatalities than those with other final accident forms (F: 6% VS 23%). This may be because accidents ending in a spill are mostly caused by equipment failure and do not involve other vehicles. In addition, leaks leave accident participants more time to escape than rollovers, fires, or explosions.
In 27% of the accidents in the region, drivers were fatigued at the time of the accident. Fatigued driving was associated with higher rates of injury and death (I: 71% VS 49%; F: 11% VS 7%), which again reflects the impaired condition of fatigued drivers.
The influence of road alignment on accident severity is mainly reflected in the probability of fatal accidents. Fatalities are approximately 2.75 times more likely to occur on straight roads than on other road alignments (F: 11% VS 4%), mainly because drivers are more relaxed when driving on straight roads, making them more prone to drowsy or careless driving [43]. Worse still, accidents on such road alignments accounted for 56% of the region’s accidents.
4) South China
From Figure 5 and Table 3, we can reach the following findings. In South China, the severity of hazardous material road transport accidents basically depends on FAF, VT, FA, RT, and SEA, in that order.
Accidents in which the final accident form was a spill were more likely to involve injuries but significantly less likely to result in fatalities than those with other final accident forms (I: 50% VS 19%; F: 5% VS 24%); the reasons are consistent with those discussed for Central China [9]. In addition, 85% of the accidents ended in a spill.
Tanker trucks accounted for 87% of all hazardous material road transport vehicles. Accidents involving tanker trucks had a significantly higher probability of injury and a lower probability of fatality than those involving other vehicle types (I: 49% VS 21%; F: 7% VS 11%). The reasons are as follows: tankers are the main vehicles used to transport hazardous materials, their regulation is becoming more systematic, their design and manufacture are more sophisticated, and their safety level is increasing [44]. The other vehicles are mostly illegal transport vehicles that evade regulation; their equipment does not meet safety standards, and their drivers lack knowledge of hazardous chemical road transport and rescue, which increases the probability of fatal accidents [11].
Twenty-four percent of drivers were fatigued at the time of the accident. Fatigued drivers were more likely than non-fatigued drivers to be involved in both injury and fatal accidents (I: 56% VS 42%; F: 21% VS 4%), reflecting the impaired condition of fatigued drivers.
Accidents on highways accounted for 61% of all accidents in the region, and the probabilities of injury and fatal outcomes for crashes on highways were greater than those for other road types (I: 50% VS 38%; F: 10% VS 4%). Possible explanations are as follows. First, because the highway road environment is better, drivers tend to increase their speed without realizing it. Second, the larger number of vehicles on highways makes multivehicle accidents more likely; the more parties involved in an accident, the more people are exposed and the higher the fatality rate [16].
In summer, accidents were more likely to be fatal (F: 16% VS 3%), and 35% of accidents occurred in the summer. These findings are mainly due to the fact that summer is the main season for road transport of hazardous materials, which increases the possibility of accidents due to frequent transport. Additionally, high temperatures and heavy rainfall have a great impact on the transport environment, the technical safety of vehicles and equipment and the attention of drivers, further increasing the chances of serious accidents [45].
5) Southwest China
The following results can be obtained from Figure 5 and Table 3. In the Southwest, features that have a significant impact on the severity of road transport accidents involving hazardous materials include the DAF, FA, SEA, HM, and SC, in order of importance.
When the direct accident form was a spill, accidents were significantly less severe than those with other direct accident forms, and none were fatal (I: 10% VS 70%; F: 0% VS 7%). This may be because spills are usually caused by aging equipment or minor impacts, which do not readily lead to serious accidents. However, accidents whose direct form was a spill accounted for only 13% of all accidents in the region.
According to our results, fatigue has a substantial effect on the incidence of fatal accidents. The fatality rate in a fatigued state was 3.5 times that in a non-fatigued state (F: 14% VS 4%); the interpretation is the same as given previously. Eighteen percent of drivers were fatigued at the time of an accident.
The season in which an accident occurs has a substantial impact on fatalities. Fatal accidents were more likely in summer than in other seasons (F: 9% VS 4%), and 36% of accidents in the region occurred during the summer months. This is primarily because summer is the season with the most frequent transportation of hazardous materials: traffic volume is high, and accidents involve more parties and more people, which increases the probability of fatal accidents [18], [46]. In addition, summer precipitation in Southwest China is frequent and heavy, occurring on 78% of days and amounting to approximately 300 mm. Persistent heavy rain degrades road conditions, impairs the driver’s ability to observe the surroundings, increases driving tension, and reduces the driver’s control of the vehicle, making a serious accident more likely [45].
The probability of a fatal accident involving Class III hazardous materials was significantly smaller than that of other types of hazardous materials (F: 4% VS 9%). In the Southwest, 64% of accidents involved the transport of Class III hazardous materials. Unlike in other regions, fatal accidents in the Southwest were less likely when Class III hazardous materials were transported than when other types were, which may be attributable to the region’s strict regulation of Class III hazardous material transport.
Road surface conditions are strongly associated with the probability of fatal accidents. Fatal crashes were more likely on wet pavement than under other road surface conditions (F: 21% VS 4%), and 12% of accidents in this region occurred on wet roads [45]. In addition to the climatic reasons mentioned above, the complex geography of the Southwest makes rescue difficult, which also contributes to the high rate of fatal accidents.
6) Northwest China
From Figure 5 and Table 3, we can draw the following conclusions. In Northwest China, the severity of hazardous material road transport accidents is mainly influenced by DAF, RT, FAF, SC, FA, and SEA (in order of importance).
Accidents whose direct form was a fire, rollover or two-vehicle rear-end collision accounted for 55% of the total number of accidents in the region. Fatal accidents were less likely under these direct accident forms than under other direct accident forms (F: 8% VS 15%), for reasons analogous to those given for the North China region.
Accidents on highways and rural roads accounted for 56% of the total number of accidents in the region, and such accidents were significantly more likely to be fatal than those on other road types (F: 14% VS 7%). Possible reasons include high speeds, heavy traffic and complex road conditions on highways, as well as lax transport management and poor road infrastructure on rural roads, which also hinder timely rescue [47].
Spills accounted for the largest proportion of final accident forms (86%) and were far less likely to be fatal than other final accident forms (F: 7% VS 39%). An explanation of this result can be found in the section on the Central China region.
Eighty-eight percent of accidents in this region occurred on dry road surfaces. Accidents on dry roads were significantly less likely to be fatal than those on wet, waterlogged or icy roads (F: 8% VS 35%). Possible reasons are as follows: the Northwest has extensive plateau and mountainous terrain with poor road conditions, so when the road surface is wet or icy, drivers have less control and less opportunity to correct the vehicle’s path. Additionally, lower levels of emergency response and medical care increase the probability of fatal accidents [45].
Twenty-two percent of accidents in the region occurred when drivers were fatigued. Injuries and fatalities were more likely to occur in fatigued conditions than in non-fatigued conditions (I: 66% VS 47%; F: 17% VS 10%) for the same reasons as before.
Fifty-two percent of all accidents in the region occurred during the autumn and winter months. Fatalities were more likely in autumn and winter than in spring and summer (F: 15% VS 7%). This is probably because the rugged terrain and mountainous roads of the Northwest become more hazardous in these seasons. In addition, the harsh natural environment in autumn and winter makes vehicles and equipment more likely to break down, increasing the likelihood of dangerous accidents [45].
7) Northeast China
The following results can be observed from Figure 5 and Table 3. In Northeast China, FAF, SEA, DAF, HM, and MON are the key features, in order of importance, in distinguishing the severity of hazardous material road transport accidents.
Accidents where the final accident form was a spill accounted for 72% of the total number of accidents, and the fatality rate was significantly lower than that of other final accident forms (F: 8% VS 44%). The reasons for this result are similar to those described above for Central China.
Accidents occurring during the winter months accounted for 22% of the total number of accidents in the region. Contrary to common expectations, the probability of a fatal accident in winter was close to zero, whereas the probability of a fatal accident in other seasons was 10%. This is probably because drivers understand the harshness of the Northeast’s winter environment, the difficulty of rescue, and the severity of a potential accident, so they drive more cautiously, reducing the chance of a serious accident.
Direct accident forms involving spills, fires, and two-vehicle rear-end collisions were less likely to result in fatalities than other direct accident forms (F: 3% VS 12%). Explanations for this result can be found in the sections on the North and Southwest China regions. These three direct accident forms accounted for 49% of the accidents in the region.
Among all hazardous material road transport accidents in the Northeast, 62% involved Class III hazardous materials, a far larger share than any other class of hazardous material. This predominance may be one reason why the probability of fatal accidents was higher for Class III hazardous materials than for other types (F: 13% VS 0%).
Eleven percent of road transport accidents involving hazardous materials occurred in March, and the fatality rate in March was much higher than in other months (F: 29% VS 5%). This may be because the hazardous material industry in the Northeast resumes operations in March, and drivers may not yet be familiar enough with their vehicles and routes to handle changes in the driving environment [3]. In addition, the excitement of returning to work may cause drivers to overlook the particular risks of hazardous material road transport, let their guard down, and engage in unsafe behaviors such as speeding.
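Comparisons such as March versus all other months, summer versus other seasons, or Class III versus other hazardous materials collapse a multi-level feature into one level against the rest before the fatal shares are compared. The Python sketch below illustrates that one-vs-rest grouping; the (feature value, severity) record layout, the function name and the toy records are hypothetical, not the study's data.

```python
from collections import Counter


def one_vs_rest_fatal_share(records, level):
    """Compare the fatal share for records at `level` with the fatal share for all other records.

    records: iterable of (feature_value, severity) pairs; severity is "PDO", "injury" or "fatal".
    Returns (share_at_level, share_elsewhere); a side with no records gives float("nan").
    """
    totals, fatals = Counter(), Counter()
    for value, severity in records:
        side = "level" if value == level else "rest"
        totals[side] += 1
        if severity == "fatal":
            fatals[side] += 1

    def share(side):
        return fatals[side] / totals[side] if totals[side] else float("nan")

    return share("level"), share("rest")


# Hypothetical toy records: (month of accident, severity).
toy_months = [(3, "fatal"), (3, "PDO"), (1, "PDO"), (5, "injury"),
              (7, "fatal"), (8, "PDO"), (11, "PDO"), (12, "PDO")]
march, other_months = one_vs_rest_fatal_share(toy_months, level=3)
print(march, round(other_months, 3))  # 0.5 0.167
```

The same function applies to seasons or hazardous material classes by changing the feature value stored in each record and the `level` argument.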
D. Proposals to Improve Safety in the Transport of Hazardous Materials by Road
Based on the results of the above analysis, we make recommendations for each of the seven regions on how to improve the safety of hazardous material road transport.
1) East China
East China, with its relatively dense transportation network and population, should first establish laws and regulations on safe distances for the relevant industries, and companies should avoid densely populated areas such as residential areas when choosing routes [9].
The transport of Class III and Class VIII hazardous chemicals should be strictly controlled: specific transport plans and workflows should be formulated, safety education should be carried out, and the safety supervision of transport enterprises should be strengthened [3].
Transport companies should gather information from various sources (e.g., weather and road conditions) to adjust routes and schedules in a timely manner and avoid driving in rain or snow or on slippery roads. Near curves, ramps, and other special road alignments, road designers should install signs that warn drivers of the upcoming alignment and remind them to reduce speed and remain alert [42].
Truck manufacturers are encouraged to develop new safety equipment and detection instruments. For example, if driver monitoring devices are added to the in-vehicle system, the driver’s driving time, mental state and operating behavior can be monitored in real time, and the driver can be alerted promptly to any unsafe behavior, reducing fatigue-related accidents and their severity [20].
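The real-time monitoring of driving time described above could, for example, raise an alert once continuous driving passes a limit unless the driver has taken a sufficiently long break. The Python sketch below assumes a hypothetical log of driving and rest segments; the 240-minute driving limit and 20-minute minimum rest are illustrative default parameters, not values from this study, and should be replaced with those of the applicable regulation.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    state: str      # "driving" or "rest" (hypothetical log format)
    minutes: float  # duration of this segment


def fatigue_alerts(segments, max_drive=240.0, min_rest=20.0):
    """Return indexes of driving segments during which continuous driving exceeds max_drive.

    A rest of at least min_rest minutes resets the continuous-driving clock;
    shorter stops do not.
    """
    alerts = []
    continuous = 0.0
    for i, seg in enumerate(segments):
        if seg.state == "rest":
            if seg.minutes >= min_rest:
                continuous = 0.0
        elif seg.state == "driving":
            continuous += seg.minutes
            if continuous > max_drive:
                alerts.append(i)
        else:
            raise ValueError(f"unknown state: {seg.state!r}")
    return alerts


log = [Segment("driving", 150), Segment("rest", 10), Segment("driving", 100),
       Segment("rest", 30), Segment("driving", 200)]
print(fatigue_alerts(log))  # [2]
```

Because only a rest of at least `min_rest` minutes resets the clock, the 10-minute stop in the example does not prevent the alert on the third segment, while the 30-minute break before the last segment does.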
Other options include adding mobile hazardous material checkpoints at suitable locations on roads and enforcing mandatory limits on vehicle travel times.
Traffic control, such as checks for drowsy driving, should be strengthened between 15:00 and 16:00.
2) North China
Recommendations on the treatment of the FA, RA, and HM factors can be found in the section on the East China region.
Road authorities and transport companies should invest more in real-time monitoring and early warning equipment and establish monitoring and early warning systems. A coordination mechanism should be established among transport enterprises, road authorities, fire departments, and environmental protection and health authorities to enable efficient emergency rescue and planning [42].
Standards restricting the hours during which vehicles transporting hazardous materials may use urban roads should be introduced and strictly enforced [48]. Speed limits should be more strictly enforced on expressways, national and provincial roads, and rural roads, and more road infrastructure should be installed on rural roads.
Road designers should avoid long, straight road sections where possible, or add bulges at regular intervals on such sections to keep reminding drivers to stay alert [49].
The frequency of safety inspections of transport vehicles and equipment should be increased during the autumn and winter seasons. Additionally, traffic control should be strengthened. Recommendations on the transport of Class III hazardous chemicals can be found in the section on the East China region.
3) Central China
Recommendations for dealing with the DAF, FA, and RA factors can be found in the sections on the East China and North China regions.
Regarding the final accident form, advanced technologies such as global positioning systems (GPS), geographic information systems (GIS), electronic billing and mobile networks can be used to set up monitoring systems that track how an accident develops and provide basic information for formulating rescue plans in a timely manner [50].
It is also necessary to develop effective training programs on road transport accidents involving hazardous chemicals, to raise staff awareness of risks and knowledge of the characteristics of hazardous materials, and to improve response capacity [10]. In addition, a linkage mechanism should be established among transport enterprises, road management departments and emergency rescue organizations [50].
4) South China
For suggestions on the handling of the FAF, FA, RT, and SEA factors, please refer to the sections on East China, North China, and Central China.
Regarding vehicle type, the overloading of tankers should be regulated more strictly, because overloaded vehicles have greater inertia and reduced maneuverability. Load detection devices can be installed at load inspection sites for hazardous material transport vehicles, with the data uploaded in real time [51]. When a vehicle is overloaded, the relevant supervisors can be notified, preventing overloaded transport that could cause hazardous material transport accidents.
The region should also raise the cost of illegal vehicle modification and illegal transport, and increase the frequency of inspections and the level of supervision on rural roads.
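The load detection and notification flow described above can be sketched as a simple threshold check. The Python example below is hypothetical: the `LoadReading` fields, the `tolerance` parameter and the `send` callback (a stand-in for the real-time upload and notification channel) are assumptions for illustration, not part of any system cited here.

```python
from dataclasses import dataclass


@dataclass
class LoadReading:
    vehicle_id: str      # hypothetical vehicle identifier
    measured_kg: float   # load reported by the detection device
    rated_kg: float      # approved load capacity of the vehicle


def overloaded(readings, tolerance=0.0):
    """Return readings whose measured load exceeds the rated load by more than `tolerance` (a fraction)."""
    return [r for r in readings if r.measured_kg > r.rated_kg * (1.0 + tolerance)]


def notify_supervisors(readings, send, tolerance=0.0):
    """Call `send(message)` once per overloaded vehicle and return the number of alerts sent.

    `send` is a placeholder for the real-time upload or messaging channel.
    """
    flagged = overloaded(readings, tolerance)
    for r in flagged:
        excess = r.measured_kg - r.rated_kg
        send(f"{r.vehicle_id}: overloaded by {excess:.0f} kg "
             f"({r.measured_kg:.0f}/{r.rated_kg:.0f} kg)")
    return len(flagged)


readings = [LoadReading("T-001", 31000, 30000), LoadReading("T-002", 29500, 30000)]
messages = []
notify_supervisors(readings, messages.append)
print(messages)  # ['T-001: overloaded by 1000 kg (31000/30000 kg)']
```

Passing the notification channel in as a callback keeps the overload rule separate from how alerts are delivered, so the same check could feed an SMS gateway, a dashboard or a log.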
5) Southwest China
Recommendations for dealing with DAF and FA can be found in the section on the East China region.
In summer, transport enterprises should comprehensively collect information from various sources, plan routes and schedules accordingly, avoid travel on rainy days and over steep terrain, and strengthen driver training to improve drivers’ safety awareness and ability to handle emergencies.
Companies should tighten their load management of Class IV hazardous chemicals to avoid exposure to wet conditions during transport [52].
Recommendations for wet road surface conditions can be taken from the suggestions for summer transport management.
6) Northwest China
Recommendations on the treatment of RT, FAF, FA, and DAF can be obtained from the sections on the relevant regions above.
Route planning should be undertaken cautiously to minimize driving on wet, waterlogged, icy and snowy roads. If necessary, additional vehicle anti-skid equipment can be installed.
In the autumn and winter seasons, transport enterprises need to maintain vehicles properly to cope with the harsh natural environment and rugged terrain of the Northwest region. Additionally, the frequency of vehicle inspections should be increased to ensure that vehicles and equipment remain in good working order, and stockpiles of emergency supplies and equipment should be increased [15].
7) Northeast China
Proposals for the DAF, FAF, and HM factors can be obtained from the sections on the abovementioned regions.
The Northeast region should strengthen the supervision of road transport of hazardous materials in spring, summer and autumn.
Transport companies should intensify training on driving skills, emergency operations and safety awareness before work resumes in March. Traffic authorities should increase the frequency of traffic control and inspections, targeting speeding, fatigued driving and illegal transport [12].
Conclusion
This paper proposed the use of XGBoost to develop a ternary classification model that distinguishes property-damage-only, injury, and fatal accidents. On the basis of this model, we explored the factors influencing the severity of hazardous material road transport accidents in seven regions of China. To validate the proposed model, four popular models, LR, MLP, RPLM and SVM, were applied to the same data, and XGBoost achieved better prediction accuracy than the others. XGBoost was then used to assess the importance of the factors influencing accident severity in each region and to analyze why these factors affect severity differently across regions. The distribution of hazardous material road transport accidents in the data varied from region to region, and XGBoost performed well for regions with a large amount of data (East China). This suggests that more data are needed to obtain more robust results.
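Comparing XGBoost with LR, MLP, RPLM and SVM rests on how well each model predicts the three severity classes. As a hedged illustration, the Python sketch below scores predicted labels against true labels with overall accuracy and with macro-averaged F1; macro-F1 is shown only as an example of a class-balanced metric, useful because fatal accidents are a minority class and accuracy alone can hide poor performance on them. The labels are toy values, not the study's results.

```python
from collections import Counter

CLASSES = ("PDO", "injury", "fatal")  # assumed labels for the three severity classes


def _check(y_true, y_pred):
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and equal in length")


def accuracy(y_true, y_pred):
    """Fraction of records whose predicted class equals the true class."""
    _check(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, classes=CLASSES):
    """Unweighted mean of per-class F1 scores, so each severity class counts equally."""
    _check(y_true, y_pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed
    f1_scores = []
    for c in classes:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1_scores.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1_scores) / len(f1_scores)


y_true = ["PDO", "PDO", "injury", "injury", "fatal", "PDO"]
y_pred = ["PDO", "injury", "injury", "injury", "PDO", "PDO"]
print(round(accuracy(y_true, y_pred), 3), round(macro_f1(y_true, y_pred), 3))  # 0.667 0.489
```

In this toy case accuracy is about 0.67 while macro-F1 is about 0.49, because the single fatal accident is never predicted correctly.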
The accident analysis showed that the factors determining the severity of hazardous material road transport accidents differed among regions. The importance of the same factor also varied by region, as did the reasons why that factor affects accident severity. Based on the main influencing factors and causes identified in this study, targeted recommendations and countermeasures were provided for each region to address problems in the road transport of dangerous goods.
Nevertheless, this study is subject to several limitations. First, although data were collected for 1411 hazardous material road transport accidents, the sample is relatively small compared with the datasets used in studies of other types of traffic accidents. Second, because of the special nature of road transport accidents involving hazardous materials, the accident investigation cycle is relatively long; some accidents were still under investigation at the time of data collection, so more comprehensive information was not available. Finally, data quality is limited by the expertise of the data collectors, given the shortage of specialists in hazardous material road transport research in China. Future studies will expand the sample size and the number of features, and adopt a more rational data preprocessing approach to improve data quality and enable a more comprehensive and rigorous study.