I. Introduction
With the prevalence of ensemble and deep learning, a large number of ensemble or deep classifiers [1]–[24] have been developed to serve various goals such as image classification [2]–[4], sentiment analysis [7]–[12], text analysis or classification [14]–[18], and sarcasm identification [22], [23]. Typical examples include random forest (RF) [1], the deep convolutional neural network (CNN) [2], an ensemble pruning approach [8], the deep Takagi–Sugeno–Kang fuzzy classifier (D-TSK-FC) [13], the enhanced long short-term memory (ELSTM) [9], a two-stage topic extraction framework [17], and a three-layer stacked bidirectional long short-term memory architecture [23].
Among them, as an effective means of eXplainable artificial intelligence (XAI), various interpretable regression-based classification models, trained on samples with integer [25], [26] or one-hot-vector [27] label sets, have been attracting increasing attention from both academic and industrial communities [13], [28]–[36]. Although the interpretability of a classification model can take diverse forms (e.g., visual interpretability and linguistic interpretability), linear regression models [25], [26] usually offer clear feature-importance-based interpretability but poor classification performance. In contrast, deep or wide interpretable fuzzy classifiers [13], [29]–[32] guarantee linguistic interpretability while achieving promising classification performance and enhanced generalization capability. Typical examples include the deep TSK fuzzy classifier (DSA-FC) [29], wide learning (WL)-TSK [31], the knowledge adversarial training method (KAT) [32], the deep high-order TSK fuzzy classifier (DHO-TSK) [30], D-TSK-FC [13], and the faster convergence and concise interpretability (FCCI)-TSK [36].
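To make the regression-based classification idea concrete, the following is a minimal sketch, assuming a least-squares regressor fitted to a one-hot label matrix on hypothetical toy data. The learned coefficient matrix directly provides the feature-importance-based interpretability noted above for linear models, and prediction simply picks the class with the largest regressed output. This is an illustration only, not the implementation of any cited classifier.

```python
# Minimal sketch: regression-based classification with one-hot labels.
# Hypothetical toy data; not the method of any cited reference.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 samples, 4 features, 3 classes.
X = rng.normal(size=(100, 4))
y = rng.integers(0, 3, size=100)

# One-hot-vector label matrix Y (shape: n_samples x n_classes).
Y = np.eye(3)[y]

# Least-squares fit of Y on X (with a bias column), solved via the
# pseudo-inverse: W = argmin_W ||Xb W - Y||^2.
Xb = np.hstack([X, np.ones((X.shape[0], 1))])
W = np.linalg.pinv(Xb) @ Y

# Prediction: regress, then take the class with the largest output.
y_pred = np.argmax(Xb @ W, axis=1)
print("training accuracy:", np.mean(y_pred == y))

# Interpretability: each entry of W[:-1] links one feature to one class,
# so |W| serves directly as a feature-importance map.
print("feature-importance magnitudes:\n", np.abs(W[:-1]))
```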