Automated Brain Tumor Detection From Magnetic Resonance Images Using Fine-Tuned EfficientNet-B4 Convolutional Neural Network


Abstract:

Brain tumors are pathological conditions characterized by abnormal cell growth within the brain, which can disrupt normal brain function and pose significant challenges in diagnosis and treatment. Magnetic Resonance Imaging (MRI) is crucial for identifying these tumors, but manual detection is time-consuming and error-prone. This study proposes a novel approach using deep convolutional neural networks (DCNNs) with EfficientNet-B4 as the base model, fine-tuned with customized layers for brain tumor detection. The proposed model achieves an overall accuracy of 99.33% for the detection of brain tumors on the publicly available Brain Tumor Detection 2020 Kaggle dataset. We also conducted a comprehensive ablation study to evaluate the impact of various components on performance, including layer modifications and changes in batch size, optimizer, loss function, and learning rate. This analysis helped to identify the optimal configuration, further enhancing the model’s robustness and classification accuracy. We verified the robustness of the model through K-Fold cross-validation and a blind test on an independent dataset. Hyperparameter optimization was conducted using Bayesian Optimization, further enhancing the model’s performance. Comparative analysis against other deep learning (DL) algorithms shows the efficacy of our approach, with EfficientNet-B4 surpassing all other models, including the other EfficientNet variants (B0, B1, B2, B3, B5, B6, and B7). Our experimental results demonstrate the superiority of the fine-tuned EfficientNet-B4 model over other convolutional neural network (CNN) architectures, including VGG19, ResNet50, and ResNet101, in terms of recall/sensitivity, classification accuracy, F1-score, F2-score, and area under the curve (AUC).
These findings underscore the potential of DL-based approaches, particularly the fine-tuned EfficientNet models, in improving the efficiency and acc...
Published in: IEEE Access (Volume: 12)
Page(s): 112181 - 112195
Date of Publication: 13 August 2024
Electronic ISSN: 2169-3536

SECTION I.

Introduction

Brain tumors are abnormal tissue growths in the brain that disrupt normal functions and can lead to severe damage or death [1]. Over 150 types of brain tumors have been identified, classified as either primary or metastatic. Primary brain tumors can be glial or non-glial, benign or malignant, and originate within the brain or its vicinity. Metastatic brain tumors [2], which originate from cancerous cells elsewhere in the body, account for a significant portion of cases, with lung cancer being the primary source in up to 40% of instances [3]. Advancements in diagnostic technologies, and surgical and radiation treatments have improved survival rates and enhanced quality of life for patients post-diagnosis [4].

Despite advancements in diagnostic technologies, surgery, and radiation treatments improving survival rates and quality of life, brain tumors remain a critical health concern. In 2023, approximately 25,400 malignant brain or spinal cord tumors are expected to be diagnosed in the United States, with 18,760 fatalities [5]. Brain tumors rank tenth [6] among all causes of mortality, with a 5-year relative survival rate of around 36%. For children under 15, the five-year relative survival rate is nearly 75%, but for those aged 40 and above, it drops to 21% [7].

Doctors traditionally detect brain tumors by analyzing MRI images [8]. However, interpreting these images quickly and accurately is challenging due to numerous abnormalities and noisy data. This necessitates automated computer-aided diagnostic (CAD) methods [9] to support physicians in promptly identifying these life-threatening tumors, thereby preserving the invaluable lives of people.

Artificial intelligence (AI) is a branch of computer science focused on endowing computers with intelligence comparable to that of humans, enabling them to think, learn, and solve problems [10]. AI is crucial in detecting and treating brain tumors, particularly in brain tumor surgery, which involves complex processes [11]. Neurosurgical approaches have been transformed by subgroups of AI algorithms such as machine learning (ML) and DL [12]. These algorithms cover feature extraction [13], selection [14], reduction [15], and classification [16], in addition to data pre-processing [17]. This has increased confidence in brain tumor diagnosis among neurosurgeons [18].

Deep learning [19], particularly CNNs [20], is crucial for achieving promising results in applications such as pattern categorization, object detection, and voice recognition. Traditional ML algorithms, like K-Nearest Neighbor (k-NN) [21], Naive Bayes [22], support vector machine (SVM) [23], and decision tree [24], and DL algorithms, like custom CNNs, ResNets [25], GoogleNet [26], and VGGNets [27], are used to diagnose diseases in the healthcare sector. However, existing approaches suffer from deficiencies such as inadequate precision, bulky and slow models, high computational cost, and privacy concerns. Additionally, the sheer volume of data in the healthcare domain makes it difficult to share medical information openly, and current methods have lower precision and recall, resulting in lower efficiency and potentially causing delays in patient treatment [28].

Recent studies have increasingly employed DL techniques to enhance the efficacy of CAD in medical diagnosis, particularly in the investigation of brain cancer. These methods have become integral in the healthcare sector, serving as valuable assets in diagnosing various critical conditions, such as brain diseases and skin cancer [29], [30]. DL approaches, particularly those involving transfer learning [31] and fine-tuning [32], are extensively utilized for identifying brain tumors. Transfer learning, a DL technique where models pretrained on large datasets are fine-tuned for specific tasks, has proven effective. By using pretrained models, the feature extraction process becomes more efficient, allowing for better detection of hidden patterns in MRI images, even with smaller datasets.

Several studies have explored using DL and ML techniques for brain tumor detection. Kang et al. presented a method utilizing features from pre-trained DCNN and ML classifiers across three dataset sizes. An SVM classifier with a radial basis function kernel achieved maximum accuracy compared to other classifiers [33]. However, the model’s performance on smaller datasets was not sufficiently addressed. Khan et al. proposed a brain tumor segmentation and classification system employing a fine-tuned VGG19 CNN model with synthetic data augmentation. This method demonstrated superior efficiency and accuracy, assisting radiologists in categorizing tumors into benign and malignant [34]. Yet, the approach’s dependency on synthetic augmentation may limit its applicability to real-world datasets.

A three-dimensional DNN model for tumor extraction and classification using a 3D CNN and transfer learning with VGG19 achieved high-precision segmentation and classification performance on multiple BraTS datasets [35]. However, integrating 3D CNNs with transfer learning is complex and requires substantial computational resources. Another study proposed a decision support system using image processing techniques and a feed-forward neural network, achieving 95.80% accuracy in categorizing malignant and benign MRI images [36]. The system’s high sensitivity and specificity highlight its potential for aiding physicians, but it may face challenges in generalizing across diverse datasets.

Preethi and Aishwarya introduced a multi-stage brain tumor classification model combining gray-level co-occurrence matrix (GLCM) and wavelet-based GLCM for feature extraction, achieving 92% accuracy [37]. This model’s reliance on complex feature extraction techniques may pose scalability issues for larger datasets. Rajan and Sundar developed a hybrid energy-efficient technique for automatic tumor detection, achieving 98% accuracy through seven lengthy phases [38]. Despite its high accuracy, the model’s extended computation time due to multiple techniques is a significant drawback.

These studies highlight the strengths and limitations of existing approaches. The primary objective of this research is to conduct comprehensive experiments utilizing deep convolutional neural networks (DCNNs), transfer learning, and fine-tuning for automated brain tumor detection. Our proposed method aims to address the challenges of existing models by leveraging the efficiency of transfer learning and the accuracy of fine-tuned DCNNs, providing a robust solution for brain tumor identification in MRI images.

The key research contributions of this study are given as follows:

  • A new deep learning methodology is introduced for binary classification of MRI brain images into tumor or non-tumor categories. This approach optimizes the classification performance of pre-trained EfficientNet models (EfficientNetB0 to EfficientNetB7) by fine-tuning custom final layers while keeping the convolutional base frozen, utilizing publicly available MRI Brain Tumor Detection datasets to improve classification accuracy.

  • The fine-tuned EfficientNetB4 proposed in this study is characterized by its computational efficiency, lightweight architecture, and robust generalization capability on unknown test data.

  • Demonstrated performance enhancements of the fine-tuned EfficientNetB4 model in contrast to cutting-edge techniques across various evaluation metrics including recall/sensitivity, precision, F1 score, specificity, and accuracy.

  • Employed Bayesian Optimization to find the optimal hyperparameters, significantly improving the model’s performance and efficiency.

  • An ablation study was conducted to systematically analyze the impact of different model components and hyper-parameters on model performance, providing insights into the optimal configuration for EfficientNet variants in the context of brain tumor classification.

  • Ensured robustness and reliability of the model through K-Fold cross-validation, preventing overfitting and validating the model’s performance.

  • Conducted a blind test to assess the model’s performance on an independent dataset, verifying its generalization capabilities.

The remainder of the paper is organized as follows. Section II outlines the proposed methodology. Section III details the experimental setup. Section IV presents the performance evaluation measures. Section V presents and discusses the results. Section VI compares the proposed model with recent cutting-edge techniques. Finally, the paper closes with conclusions and suggestions for future research directions.

SECTION II.

Proposed Methodology

This section describes the method used to classify MRI brain images into tumor and non-tumor categories. The workflow of the proposed method is shown as a block diagram in Figure 1. Using transfer learning with pre-trained EfficientNets, the approach fine-tunes eight models, EfficientNetB0 through EfficientNetB7, on MRI series from an MRI brain tumor detection dataset for both feature extraction and detection. These models are chosen over other cutting-edge pre-trained DCNN architectures [39] because of their computational efficiency, low FLOPS requirement during inference, and superior top-1 and top-5 accuracy on ImageNet [40]. Transfer learning and fine-tuning depend on several hyper-parameters for optimization and training. The optimizer plays a pivotal role in reducing the overall loss and improving accuracy by adjusting the network’s weights and biases. In ML, a loss function measures how well an algorithm fits the available data; the optimizer gradually reduces this loss, and with it the prediction error. For this problem, the Adam optimizer [41] and the binary cross-entropy loss function [42] are employed. The following subsections describe each stage in detail.

FIGURE 1. Workflow of the proposed method.

A. EfficientNet Baseline Model

EfficientNet, created by the Google Brain Team [43], is a family of CNN models. The underlying research focused on scaling the network, demonstrating that jointly adjusting network width, depth, and resolution can substantially improve performance. By scaling a baseline network in this way, the authors introduced a series of models that achieve better efficiency and accuracy than previously employed CNNs. EfficientNet has shown remarkable performance in large-scale visual recognition tasks, particularly on the ImageNet dataset, delivering high accuracy and consistency. The EfficientNet architectures are nearly 6 times faster and 8 times smaller for inference when compared to established approaches like VGGNets, GoogleNet, ResNets, Xception [44], and InceptionResNet [45]. EfficientNet employs a compound scaling technique to generate the different models within the family. Network depth refers to the number of layers, while the width of a convolutional layer corresponds to the number of filters it contains. Resolution is determined by the height and width of the input image. Equations (1)–(5), presented by the authors, define the scaling of depth, width, and resolution with respect to $\phi$ [43].

\begin{align*}
& \text{Depth:} \quad D = \theta^{\phi} \tag{1}\\
& \text{Width:} \quad W = \lambda^{\phi} \tag{2}\\
& \text{Resolution:} \quad R = \mu^{\phi} \tag{3}\\
& \text{s.t.} \quad \theta \cdot \lambda^{2} \cdot \mu^{2} \approx 2 \tag{4}\\
& \theta \geq 1, \ \lambda \geq 1, \ \mu \geq 1 \tag{5}
\end{align*}

where $\theta$, $\lambda$, and $\mu$ are constants obtained through grid search hyper-parameter tuning. The compound coefficient $\phi$ is a user-defined parameter that governs how much additional resource is allocated to scaling the model. It enables the adjustment of network width, depth, and resolution, balancing accuracy against memory requirements according to the available resources. In contrast to traditional DCNNs, EfficientNet scales every dimension with a fixed set of scaling coefficients, and it surpasses other cutting-edge models trained on the ImageNet dataset. Even when employing transfer learning techniques, EfficientNet consistently delivers excellent outcomes, showing its applicability beyond the ImageNet dataset. The family is provided at eight scales, B0 through B7, with accuracy and parameter count both increasing along the series. As a result of these advances, users and developers can deploy more robust DL models across a variety of platforms and for a wide range of applications.
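To make the compound scaling rule concrete, the following minimal sketch evaluates Equations (1)–(3) for a few values of $\phi$ and checks the constraint in Equation (4). The base constants are an assumption: they are the grid-search values reported in the original EfficientNet paper and are not stated in this article. The function names are hypothetical.

```python
# Compound scaling sketch in the notation of Equations (1)-(5).
# ASSUMPTION: these base constants come from the original EfficientNet paper;
# this article does not state them.
THETA = 1.2    # depth base
LAMBDA = 1.1   # width base
MU = 1.15      # resolution base


def compound_scale(phi):
    """Return the depth, width and resolution multipliers for coefficient phi."""
    return {
        "depth": THETA ** phi,
        "width": LAMBDA ** phi,
        "resolution": MU ** phi,
    }


def constraint_value():
    """Left-hand side of Equation (4); it should be close to 2."""
    return THETA * LAMBDA ** 2 * MU ** 2


print(f"theta * lambda^2 * mu^2 = {constraint_value():.3f}")
for phi in range(4):
    scaled = compound_scale(phi)
    print(phi, {name: round(value, 3) for name, value in scaled.items()})
```

With these constants, each unit increase of $\phi$ roughly doubles the computational cost, which is why the constraint is fixed near 2.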

The typical structure of EfficientNets consists of a stem block, seven intermediate blocks, and a final layer. As depicted in Figure 2, every block comprises a different number of modules, with the number increasing progressively from EfficientNetB0 to B7. Every variant in the EfficientNet family has its own depth and parameter configuration. EfficientNetB0, the most basic version, has 5.3 million parameters and 237 layers, whereas EfficientNetB7 has 66 million parameters and 813 layers. The EfficientNet architecture incorporates mobile inverted bottleneck convolution (MBConv) layers, as in MnasNet [46] and MobileNetV2 [47]. EfficientNets are designed to accept images with pixel intensity values between 0 and 255, as image normalization is performed automatically within the model.

FIGURE 2. EfficientNet B4 architecture.

This study concentrates on using EfficientNets’ eight distinct versions (EfficientNetB0 through EfficientNetB7) as the basis for classifying MRI brain images as tumors or non-tumors. The selection criteria for the different versions of EfficientNet are based on factors like the size of the dataset, batch size, training and evaluation resources of the model, network parameters, and model complexity. The higher versions of EfficientNets, including EfficientNetB5 to EfficientNetB7, are characterized by increased network depth and parameters. While these larger models have a higher capacity for learning complex patterns, they are also more prone to overfitting and require greater computational resources, including more RAM and GPU power, for training. To address these challenges and ensure efficient model training, we employed data augmentation techniques and regularization methods. By using all eight versions, we aimed to comprehensively evaluate their performance and determine the most effective configuration for MRI brain image classification.

B. Transfer Learning

Unlike conventional ML methods, CNNs extract low-level and high-level feature maps automatically, directly from their layers. The extracted features are transformed into a one-dimensional (1-D) feature vector, which is then fed into one or more fully connected (FC) layers for classification. Despite the notable successes of CNNs, a primary limitation is that they need large datasets to train efficiently and to mitigate overfitting. In domains such as medical imaging, however, gathering a substantial amount of annotated data is often impractical, and much of the existing data is not publicly available. Transfer learning addresses this issue by transferring knowledge from models trained on large benchmark datasets, such as ImageNet, to problems with fewer data samples, such as classifying brain tumors from MRI images. The fundamental concept of transfer learning, depicted in Figure 3, acknowledges the discrepancy between the source dataset and the target dataset, such as MRI images. Because of this discrepancy, a pre-trained CNN architecture cannot be applied unchanged and still be expected to generalize well to unknown test data. Rather, the layers of the pre-trained model are modified empirically to adapt to the features present in the target-domain images. Fine-tuning entails retraining the weights of selected top layers of DCNN architectures that were initially trained on huge datasets for different tasks. It can involve unfreezing some or all layers of the CNN base [48], [49]; alternatively, pre-trained models can act as fixed feature extractors whose features are passed to SVMs or other classifiers [50]. This work applies transfer learning with pre-trained EfficientNet models and fine-tunes the variants from EfficientNetB0 to B7. To extract features and carry out classification, these models are fine-tuned on MRI series taken from the brain tumor detection dataset. The following sections elaborate on the methodology used to fine-tune the classification part of the pre-trained EfficientNet models, the experimental setups for training and assessing the model, and its performance on unknown test data.

FIGURE 3. General concept of transfer learning.

C. Fine-Tuning of Pre-Trained EfficientNet Models

In this work, transfer learning is applied to eight variants of pre-trained EfficientNets, EfficientNetB0 through EfficientNetB7, all initially trained on the ImageNet benchmark dataset. These models are fine-tuned on MRI sequences from the brain tumor detection dataset. The architecture of the fine-tuned EfficientNetB4 is depicted in Figure 4. The first step is to initialize the base model with the pre-trained ImageNet weights. To reduce dimensionality, a Global Average Pooling (GAP) layer is added on top of the EfficientNet backbone, while the convolutional base of each block remains unmodified. The GAP layer also streamlines the network by reducing the number of parameters while maintaining accuracy. A flattening layer follows, and its output is passed to a dense layer with 128 hidden units and rectified linear unit (ReLU) activation. To address overfitting, a dropout layer with a rate of 30% follows this hidden layer. Next, a dense layer with a single unit applies a linear transformation, with its own weights and biases, to the resulting features, and a sigmoid function converts the output into a probability, serving as the final classifier. The entire architecture is subsequently re-trained using the brain tumor detection dataset.

FIGURE 4. Architecture of fine-tuned EfficientNetB4.
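The data flow through the custom layers described in this subsection can be sketched as a plain NumPy forward pass. This is an illustration only, not the authors' Keras implementation: the feature-map shape of 7 x 7 x 1792 is an assumption about what an EfficientNetB4 backbone yields for a 224 x 224 input, and all weights are random placeholders rather than trained values.

```python
import numpy as np

rng = np.random.default_rng(0)

# ASSUMPTION: backbone output of shape (batch, 7, 7, 1792); random values
# stand in for the features a real EfficientNetB4 base would produce.
features = rng.standard_normal((4, 7, 7, 1792))

# Hypothetical, randomly initialised head weights (a real model learns these).
w_hidden = rng.standard_normal((1792, 128)) * 0.01
b_hidden = np.zeros(128)
w_out = rng.standard_normal((128, 1)) * 0.01
b_out = np.zeros(1)


def head_forward(x, training=False, drop_rate=0.3):
    """GAP -> Dense(128, ReLU) -> Dropout(0.3) -> Dense(1) -> sigmoid."""
    pooled = x.mean(axis=(1, 2))                 # global average pooling
    hidden = np.maximum(pooled @ w_hidden + b_hidden, 0.0)  # dense + ReLU
    if training:
        # Inverted dropout: zero about 30% of units, rescale the survivors.
        mask = rng.random(hidden.shape) >= drop_rate
        hidden = hidden * mask / (1.0 - drop_rate)
    logits = hidden @ w_out + b_out              # single output unit
    return 1.0 / (1.0 + np.exp(-logits))         # sigmoid probability


probs = head_forward(features)
print(probs.shape)
```

Because GAP already yields one vector per image, the flattening layer mentioned in the text leaves the shape unchanged, so it is omitted from the sketch.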

SECTION III.

Experimental Setup

This section begins with an overview of the brain tumor detection dataset used in this work. It then describes the data augmentation techniques employed to address data limitations and improve the robustness of the model, followed by the system and software requirements for training and evaluating the model. Finally, the optimization of the hyper-parameters is addressed.

A. Dataset

The dataset used in this work is the publicly available Brain Tumor Detection 2020 dataset from Kaggle [51]. It comprises 3000 images: 1500 containing tumors and 1500 depicting normal brain scans. This distribution establishes two classes: the “normal” class, representing images without tumors, and the “tumor” class, representing images depicting brain tumors. The dataset was divided into three sets: 2400 images (1200 normal and 1200 tumor) for training, 300 images (150 normal and 150 tumor) for validation, and 300 images (150 normal and 150 tumor) for testing. The images were resized to a resolution of $224 \times 224$ pixels. Sample images from the dataset are shown in Figure 5 and Figure 6. The details of the blind test using an independent dataset are provided in the “BLIND TEST EVALUATION” section.
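A class-balanced split with these counts could be produced as in the sketch below. The file names, the seed, and the function name `stratified_split` are hypothetical, and the article does not say how the authors actually partitioned the images.

```python
import random


def stratified_split(items_by_class, n_train, n_val, n_test, seed=0):
    """Split every class into train/val/test using fixed per-class counts."""
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for label, items in items_by_class.items():
        items = list(items)
        if len(items) != n_train + n_val + n_test:
            raise ValueError(f"class {label!r} has {len(items)} items")
        rng.shuffle(items)
        splits["train"] += [(x, label) for x in items[:n_train]]
        splits["val"] += [(x, label) for x in items[n_train:n_train + n_val]]
        splits["test"] += [(x, label) for x in items[n_train + n_val:]]
    return splits


# Hypothetical file names standing in for the 1500 images of each class.
data = {
    "normal": [f"normal_{i}.jpg" for i in range(1500)],
    "tumor": [f"tumor_{i}.jpg" for i in range(1500)],
}
splits = stratified_split(data, n_train=1200, n_val=150, n_test=150)
print({name: len(rows) for name, rows in splits.items()})
# prints {'train': 2400, 'val': 300, 'test': 300}
```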

FIGURE 5. Sample tumor images.

FIGURE 6. Sample non-tumor images.

B. Data Augmentation

To address data availability limitations, we employed various data augmentation techniques to expand the size of our training dataset artificially. Data augmentation involves generating new training samples by applying random transformations to the existing images, enhancing the model’s generalization ability. Specifically, we applied the following augmentation techniques: randomly zooming into images within a range of 85% to 115% of the original size (zoom range), randomly shifting images horizontally within a range of -20% to 20% (width shift range), randomly shifting images vertically within a range of -20% to 20% (height shift range), and randomly shearing images within a range of -15% to 15% (shear range). By employing these data augmentation techniques, we aimed to improve the robustness and generalization capability of our deep learning model, ultimately enhancing its performance in brain tumor detection.
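As an illustration of how one random transformation could be drawn from the stated ranges, consider the sketch below. It only samples the parameters and converts the shifts to pixels; applying them to an image is left to the imaging library. The names `AUGMENT_RANGES`, `sample_augmentation`, and `shift_in_pixels` are hypothetical, and the uniform sampling is an assumption rather than a detail given in the text.

```python
import random

# Ranges as stated in the text, expressed as fractions.
AUGMENT_RANGES = {
    "zoom": (0.85, 1.15),
    "width_shift": (-0.20, 0.20),
    "height_shift": (-0.20, 0.20),
    "shear": (-0.15, 0.15),
}


def sample_augmentation(rng):
    """Draw one random transformation setting, uniformly within each range."""
    return {name: rng.uniform(low, high)
            for name, (low, high) in AUGMENT_RANGES.items()}


def shift_in_pixels(params, width=224, height=224):
    """Convert the fractional shifts to pixel offsets for the given image size."""
    return (round(params["width_shift"] * width),
            round(params["height_shift"] * height))


rng = random.Random(42)
params = sample_augmentation(rng)
print(params)
print(shift_in_pixels(params))
```

A fresh draw for every image in every epoch means the network rarely sees exactly the same input twice, which is the source of the regularizing effect.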

C. System and Software Requirements

The model proposed in this study was applied to an openly available dataset, leveraging a fine-tuned EfficientNet architecture developed in Python utilizing TensorFlow and Keras libraries. The training was conducted on a computer system featuring an Intel Core i5-12450H CPU operating at 2.00 GHz, with a 64-bit operating system and 16 GB of RAM, along with a 512GB SSD. Additionally, an NVIDIA GeForce RTX 3050 GPU was utilized for experimental purposes. Table 1 summarizes the details.

TABLE 1 System and Software Requirements

Initially, the pre-trained EfficientNet-B4 network was imported from Keras, with the initial layers of the base model frozen. Subsequently, fine-tuning was executed on the proposed final layers with brain tumor MRI images, and the entire model underwent re-training. To validate our experiment, comparisons were conducted with other CNN models, with specific validation procedures outlined in Section VI on the dataset used.

D. Hyper-Parameter Optimization and Settings

This section discusses the hyper-parameter tuning process and the final settings used to optimize the performance of the deep learning model for binary classification. Several hyper-parameters, including the optimizer, learning rate, batch size, loss function, and number of epochs, are tuned empirically to find the best configuration for training the model. Hyper-parameter optimization aims to maximize the performance of a given model, and various methods can be employed for this purpose. The most common method is Grid Search [52], which exhaustively searches through a predefined subset of the hyper-parameter space. However, as more hyper-parameters are added, the number of possible combinations increases exponentially, making this process extremely time-consuming. An alternative method is Random Search [53], which tries random combinations of parameters. This approach can result in high variance in outcomes due to its randomness. Neither Grid Search nor Random Search uses information from previously evaluated hyper-parameter sets to inform future searches.

In this study, we employed Bayesian Optimization [54], a more sophisticated technique for obtaining optimal hyper-parameters. Bayesian Optimization has been shown to outperform Grid Search in previous studies. Unlike Grid Search, Bayesian Optimization finds good hyper-parameters in fewer iterations by using a surrogate model fitted to the observations of the actual model. The hyper-parameters considered for optimization were the number of units in the dense layer, dropout rate, optimizer type, learning rate, and batch size. The objective function was defined as the negative validation accuracy, which we aimed to minimize. The search space and the best hyper-parameters, identified after 72 iterations of Bayesian Optimization, are given in Table 2.

TABLE 2 Search Space and Best Hyperparameters Identified

The main goal of training is to minimize the loss, since a smaller loss indicates a model that fits the data better. Cross-entropy (CE) is used to measure the difference between the expected and predicted values. Equation (6) gives the loss for binary classification, where $X$ is the true binary label (0 or 1) and $Q$ is the predicted probability [42].

\begin{equation*}
CE = -\left( X \log(Q) + (1 - X) \log(1 - Q) \right) \tag{6}
\end{equation*}
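A small worked example may help in reading Equation (6): a confident correct prediction yields a small loss, while a confident wrong one is penalized heavily. The clipping constant `eps` is an assumption added for numerical safety and is not part of the equation.

```python
import math


def binary_cross_entropy(x, q, eps=1e-7):
    """Equation (6) for one sample: x is the true label, q the predicted probability."""
    q = min(max(q, eps), 1.0 - eps)  # clip to avoid log(0); an added safeguard
    return -(x * math.log(q) + (1 - x) * math.log(1.0 - q))


def mean_bce(labels, probs):
    """Average the per-sample loss over a batch."""
    return sum(binary_cross_entropy(x, q) for x, q in zip(labels, probs)) / len(labels)


print(round(binary_cross_entropy(1, 0.9), 4))  # prints 0.1054
print(round(binary_cross_entropy(1, 0.1), 4))  # prints 2.3026
print(round(mean_bce([1, 0], [0.9, 0.1]), 4))  # prints 0.1054
```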

The binary cross-entropy loss function was chosen because the task is binary classification, making it the appropriate loss for optimizing the model’s output probabilities to match the true labels. The Adam optimizer is used for all eight EfficientNet models, initialized with a learning rate of 0.001. Adam was chosen for its simple implementation, efficient memory use, and fast learning. It was preferred over other optimization techniques such as RMSProp [55] and SGD [56] because of its strong record in DL applications for medical imaging analysis, where its efficiency and quick convergence make it a suitable choice. A batch size of 32 was used, which allowed efficient training without exhausting the available memory. To provide further regularization during fine-tuning while retaining the ImageNet weights, a dropout rate of 0.3 is chosen. During training, images from the training set are fed in mini-batches of 32, and the fine-tuned EfficientNet is trained for 100 epochs. During every epoch, ten percent of the training set images are designated for validation to assess the trained model’s performance and guard against overfitting. All selected EfficientNet variants, from B0 to B7, are trained and tested using the same hyper-parameter settings and experimental setup. A summary of the optimized values for all hyper-parameters employed throughout the experiments is presented in Table 3.

TABLE 3 Hyper-Parameters and Values
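For readers unfamiliar with Adam, the sketch below shows a single-parameter version of its update rule with the learning rate of 0.001 used in this work. The values of `beta1`, `beta2`, and `eps` are assumptions (the defaults proposed in the Adam paper), since the text only specifies the learning rate, and the quadratic objective is a toy stand-in for the network loss.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; returns (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction for step t >= 1
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 101):
    grad = 2 * (w - 3)
    w, m, v = adam_step(w, grad, m, v, t)
print(round(w, 4))
```

Each step moves the parameter by roughly the learning rate regardless of the gradient's scale, which is one reason Adam tends to need little tuning.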

SECTION IV.

Performance Evaluation Metrics

The confusion matrix (CM) [57] is a conventional tool for illustrating the predictive accuracy of a trained model on given testing data. It has the same number of rows and columns, which correspond to the predicted class labels and the ground truth labels. Each cell holds the number of testing samples that were classified correctly or incorrectly for that combination. True Positive (TP) denotes positive cases that are correctly predicted as positive, while True Negative (TN) denotes negative cases that are correctly predicted as negative. False Positives (FP), also called Type-1 errors, occur when an image is erroneously labeled as positive despite being negative, whereas False Negatives (FN), also called Type-2 errors, are positive instances that are erroneously labeled as negative. AI-based models are evaluated with metrics including precision, recall/sensitivity, accuracy, F1 score, F2 score, and specificity. Each evaluation metric is described in the following subsections.
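The four cells of a binary confusion matrix can be counted directly from paired labels, as in this minimal sketch. The label lists are made-up illustrative values, not results from the study, and label 1 is assumed to denote the tumor (positive) class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for binary labels."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["TP" if truth == positive else "FP"] += 1
        else:
            counts["FN" if truth == positive else "TN"] += 1
    return counts


# Made-up labels for illustration: 1 = tumor, 0 = normal.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))
# prints {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
```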

A. Precision

Precision is the proportion of positive predictions that are truly positive; it quantifies the accuracy of the positive predictions made by the model. It is computed using the formula below.

\begin{equation*}
\text{Precision} = \frac{TP}{TP + FP} \tag{7}
\end{equation*}

Precision is crucial in brain tumor detection because it indicates how reliably the model identifies tumor cases without falsely flagging healthy tissue. High precision ensures reliable predictions, aiding medical professionals in making informed decisions about patient care and treatment strategies.

B. Recall/Sensitivity

Sensitivity, also referred to as recall or the true positive rate (TPR), is the proportion of actual positive instances that are correctly identified as positive. It can be computed using the following formula.
\begin{equation*}Recall/Sensitivity/TPR = \frac {TP}{TP + FN} \tag {8}\end{equation*}
This metric is crucial in medical diagnostics such as brain tumor classification because it directly reflects how many actual tumors are found. High sensitivity means that the model detects most true positive cases, thereby minimizing the risk of missing actual instances of brain tumors, which is essential in healthcare applications where early and accurate detection enables timely intervention and treatment. Moreover, a higher sensitivity on unseen data reflects the enhanced reliability of the model.

C. Specificity

Specificity, often termed the true negative rate (TNR), is the proportion of actual negative instances that are accurately identified as negative. It quantifies the model’s ability to correctly classify negative cases. The formula to compute specificity is given below.
\begin{equation*}Specificity/TNR = \frac {TN}{TN + FP} \tag {9}\end{equation*}
This metric is crucial in medical diagnosis tasks like brain tumor detection because it evaluates the model’s ability to correctly identify true negative cases, which helps in distinguishing healthy individuals or non-tumor cases from those with tumors.

D. F1-Score

The F1-score, also called the F-measure, provides a balanced evaluation of model performance by computing the harmonic mean of recall and precision, offering a single unified measure of model effectiveness. The computation formula for the F1 score is given in the equation below.
\begin{equation*}F1\text {-}score = \frac {2 \times {Precision} \times {Recall}}{{Precision} + {Recall}} \tag {10}\end{equation*}

In medical diagnosis tasks like brain tumor detection, a high F1 score indicates that the model identifies positive (tumor) cases accurately while keeping both FP and FN low. Because the F1 score does not account for true negatives, it is best read alongside specificity. This balance is crucial for ensuring reliable and accurate classification results in clinical settings.

E. F2-Score

The F2 score is a widely used metric in binary classification to assess the effectiveness of a machine learning model, particularly when the goal is to prioritize recall over precision. It is a special case of the F-beta score, whose configuration parameter beta defaults to 1.0 (which gives the F1 score, or F-measure). Adjusting beta gives more or less weight to recall relative to precision: a smaller beta (e.g., 0.5) emphasizes precision, while a larger beta (e.g., 2.0) emphasizes recall. The F2 score uses beta = 2, making it a valuable tool for assessing model performance in scenarios where correctly identifying positive cases (recall) is crucial.
\begin{equation*}F2\text {-}score = \frac {5 \times {Precision} \times {Recall}}{4 \times {Precision} + {Recall}} \tag {11}\end{equation*}

The F2 score is crucial in brain tumor detection because it weights recall more heavily than precision. It assesses a model’s ability to identify all positive cases, prioritizing sensitivity and capturing as many true positive cases as possible, so that a missed tumor is penalized more than a false alarm.
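The relationship between Equations (10) and (11) can be illustrated with the general F-beta formula, (1 + beta^2) x Precision x Recall / (beta^2 x Precision + Recall). The sketch below is a plain-Python illustration; the function name and the precision and recall values are made-up assumptions, not results from this study.

```python
def f_beta(precision, recall, beta=1.0):
    """General F-beta score; beta > 1 favors recall, beta < 1 favors precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing is predicted correctly
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Made-up operating point where recall exceeds precision.
precision, recall = 0.80, 0.95

f_half = f_beta(precision, recall, beta=0.5)  # emphasizes precision
f1 = f_beta(precision, recall, beta=1.0)      # Equation (10)
f2 = f_beta(precision, recall, beta=2.0)      # Equation (11)

print(f"F0.5={f_half:.4f}  F1={f1:.4f}  F2={f2:.4f}")
```

With recall higher than precision, the score increases as beta grows, which is why the F2 score rewards a model that rarely misses a tumor even if it raises some false alarms.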

F. Accuracy

The model’s accuracy quantifies the proportion of accurately predicted labels among all labels. It provides insight into the correct predictions achieved during testing. Accuracy is computed using the formula given below.
\begin{equation*}Accuracy = \frac {TP + TN}{TP + TN + FP + FN} \tag {12}\end{equation*}

SECTION V.

Results and Discussion

This section presents the outcomes of evaluating the fine-tuned EfficientNet versions for the binary classification task, along with an ablation study to understand the impact of different model components and hyperparameters on performance. By modifying various components, such as batch size, loss function, optimizer, learning rate, and layers, a more robust architecture with higher classification accuracy can be achieved; an ablation study was therefore conducted on these components. To ensure the robustness and reliability of the model, K-Fold cross-validation was performed, and a blind test was conducted to assess the model’s performance on an independent dataset. The following subsections detail these evaluations and their outcomes.

A. Evaluation Metrics, Confusion Matrix and Receiver Operating Characteristic (ROC) Curve

The predicted results of the fine-tuned models on unseen test instances are summarized in Table 4, outlining accuracy, sensitivity or recall, precision, specificity, F1-score, and F2-score metrics. These findings demonstrate that the proposed fine-tuned EfficientNetB4 performs strongly on every evaluation metric. The curves illustrating the model’s training and test accuracy are depicted in Figure 7. As the number of epochs increases, both the training and test accuracy of the proposed model rise consistently. Figure 8 shows the curves for training and test loss; as the number of epochs increases, both decrease consistently.

TABLE 4 Performance Comparison of Fine-Tuned Models
FIGURE 7. - Accuracy plot of the proposed model.

FIGURE 8. - Loss plot of the proposed model.

The proposed model’s performance was assessed using a confusion matrix (CM) to identify correctly classified and misclassified data. Figure 9 illustrates that the model correctly identified 148 of the 150 normal brain MRI images in the Non-Tumor class (Class 0), with the remaining two misclassified as tumor. In the Tumor class (Class 1), the model identified all 150 tumor images correctly.

FIGURE 9. - Confusion matrix of the proposed model.
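As a check on the reported metrics, Equations (7)-(12) can be applied to the counts read from the confusion matrix. The sketch below assumes that the two missed non-tumor images are false positives and that no tumor image was missed (FN = 0); the variable names are illustrative.

```python
# Counts taken from the confusion-matrix description (tumor = positive class).
tp, fn = 150, 0   # tumor images: all detected (assumed FN = 0)
tn, fp = 148, 2   # non-tumor images: two flagged as tumor

precision = tp / (tp + fp)                                 # Equation (7)
recall = tp / (tp + fn)                                    # Equation (8)
specificity = tn / (tn + fp)                               # Equation (9)
f1 = 2 * precision * recall / (precision + recall)         # Equation (10)
f2 = 5 * precision * recall / (4 * precision + recall)     # Equation (11)
accuracy = (tp + tn) / (tp + tn + fp + fn)                 # Equation (12)

metrics = {
    "accuracy": accuracy,
    "recall/sensitivity": recall,
    "precision": precision,
    "specificity": specificity,
    "F1-score": f1,
    "F2-score": f2,
}
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

Under these assumptions the computed values agree with the accuracy, recall, precision, specificity, F1-score and F2-score quoted in the conclusion.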

The effectiveness of our brain tumor detection model is illustrated by the ROC plot shown in Figure 10. The ROC curve plots the true positive rate against the false positive rate across classification thresholds, and the area under it (AUC) is an important evaluation metric for classifiers, signifying the extent of separation between classes. A higher AUC value indicates a superior ability to differentiate between individuals affected and unaffected by the condition, and an AUC close to one signifies excellent proficiency. Notably, EfficientNet-B4 achieved an AUC value of 1, indicating its remarkable efficacy. The precision-recall curve is shown in Figure 11. It shows the trade-off between precision and recall, with higher curves indicating better performance. The proposed EfficientNet-B4 demonstrated high precision and recall values across various thresholds, showing its efficacy in correctly detecting brain tumors from MRI images.
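One way to see what an AUC of 1 means is the pairwise interpretation: the AUC equals the probability that a randomly chosen positive image receives a higher score than a randomly chosen negative image, with ties counted as one half. The sketch below computes it that way in plain Python; the function name and the score lists are made-up assumptions, not outputs of the proposed model.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


# Made-up scores for three tumor (1) and three non-tumor (0) images.
y_true = [1, 1, 1, 0, 0, 0]
separable = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]     # every tumor outranks every non-tumor
overlapping = [0.9, 0.4, 0.7, 0.8, 0.3, 0.2]   # one non-tumor outranks two tumors

auc_separable = roc_auc(y_true, separable)
auc_overlapping = roc_auc(y_true, overlapping)
print(auc_separable, round(auc_overlapping, 4))
```

An AUC of 1 therefore requires that every tumor image be scored above every non-tumor image, independent of the decision threshold.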

FIGURE 10. - ROC curve of the proposed model.

FIGURE 11. - Precision-Recall curve of the proposed model.

B. Ablation Study

An ablation study was conducted to assess the impact of different model components and hyperparameters on the performance of the proposed EfficientNetB4 model. This study involved modifying various elements of the model and observing the resulting changes in performance metrics.

1) Case Study 1: Layer Modifications

In this case study, the accuracy, precision, recall/sensitivity, specificity, F1-score, and F2-score values were recorded after removing different layers to understand their impact on the model’s performance. First, the model was evaluated without the dropout layer to assess its effect on regularization and performance. Next, the dense layer was removed to understand its contribution to model accuracy and generalization. After that, the effect of removing the global average pooling layer was analyzed. Finally, the model’s performance without the flatten layer was tested. The results are summarized in Table 5.

TABLE 5 Ablation Case Study 1: Layer Modifications

2) Case Study 2: Batch Size Modifications

The effect of different batch sizes (16, 32, 64) on the model’s performance was analyzed by recording the evaluation metrics for each batch size. Batch sizes of 32 and 64 gave identical, high accuracy, whereas accuracy dropped for a batch size of 16. Results are shown in Table 6.

TABLE 6 Ablation Case Study 2: Batch Size Modifications

3) Case Study 3: Loss Function Modifications

Three loss functions were tested: binary cross-entropy, hinge, and mean squared error. The performance metrics were compared across these loss functions. The binary cross-entropy and mean squared error loss functions gave identical accuracy, higher than that of the hinge loss function. Results are shown in Table 7.

TABLE 7 Ablation Case Study 3: Loss Function Modifications

4) Case Study 4: Optimizer Modifications

The model was trained using different optimizers, including Adam, SGD, RMSprop, and Adagrad. The evaluation metrics for each optimizer were recorded to identify the best-performing optimizer for this task. The Adam optimizer gave the highest accuracy of the four. The results are summarized in Table 8.

TABLE 8 Ablation Case Study 4: Optimizer Modifications

5) Case Study 5: Learning Rate Modifications

Learning rates of 0.01, 0.001, and 0.0001 were tested to determine the optimal learning rate for training the model. Results are shown in Table 9.

TABLE 9 Ablation Case Study 5: Learning Rate Modifications

The ablation study indicates that each component and hyperparameter plays a crucial role in the performance of the proposed EfficientNetB4 model. A summary of optimized values is presented in Table 3. By systematically evaluating different configurations, we identified the optimal setup for accurately detecting brain tumors from MRI images. This comprehensive analysis underscores the robustness and adaptability of the proposed model, ensuring its efficacy in clinical applications.
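The one-factor-at-a-time structure of Case Studies 2 to 5 can be sketched as a small configuration generator. This is an illustration under assumptions rather than the authors’ experiment code: the baseline is assumed to be the configuration described in the training setup (batch size 32, binary cross-entropy, Adam, learning rate 0.001), the string identifiers are hypothetical labels, and the layer removals of Case Study 1 are not modeled.

```python
# Assumed baseline configuration (values taken from the training setup).
baseline = {
    "batch_size": 32,
    "loss": "binary_crossentropy",
    "optimizer": "adam",
    "learning_rate": 0.001,
}

# Values examined in Case Studies 2 to 5; identifiers are hypothetical labels.
search_space = {
    "batch_size": [16, 32, 64],
    "loss": ["binary_crossentropy", "hinge", "mean_squared_error"],
    "optimizer": ["adam", "sgd", "rmsprop", "adagrad"],
    "learning_rate": [0.01, 0.001, 0.0001],
}


def ablation_configs(baseline, search_space):
    """Return the baseline plus every variant that changes exactly one factor."""
    configs = [dict(baseline)]
    for factor, values in search_space.items():
        for value in values:
            if value == baseline[factor]:
                continue  # the baseline itself is already included once
            variant = dict(baseline)
            variant[factor] = value
            configs.append(variant)
    return configs


configs = ablation_configs(baseline, search_space)
for config in configs:
    print(config)
```

Changing one factor at a time keeps each comparison attributable to a single component, at the cost of not exploring interactions between factors.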

C. K-Fold Cross-Validation

To ensure that our model does not suffer from overfitting and to validate its performance, we performed K-Fold Cross-Validation [58] using 5-fold and 10-fold splits. This technique allows for a robust evaluation by partitioning the dataset into $k$ subsets and iteratively training the model on $k-1$ subsets while using the remaining subset for validation. The results from each iteration are then averaged to provide a comprehensive assessment of the model’s performance. In the 5-Fold Cross-Validation, the dataset was divided into 5 subsets. The model was trained and validated 5 times, each time using a different subset as the validation set and the remaining 4 subsets for training. The results are summarized in Table 10. Similarly, for the 10-Fold Cross-Validation, the dataset was divided into 10 subsets. The model was trained and validated 10 times, using each subset once as the validation set and the remaining 9 subsets for training. The results are summarized in Table 11.
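The partition-train-validate-average procedure described above can be sketched in plain Python as follows. The function names, the fixed shuffle seed, the sample count of 100, and the placeholder `dummy_evaluate` callback are all illustrative assumptions; in practice the callback would train the model on the training indices and return a validation score.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, val_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


def cross_validate(n_samples, k, evaluate):
    """Average the score returned by evaluate(train, val) over all k folds."""
    scores = [evaluate(train, val) for train, val in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


def dummy_evaluate(train_idx, val_idx):
    """Placeholder score: the fraction of samples held out for validation."""
    return len(val_idx) / (len(train_idx) + len(val_idx))


mean_5 = cross_validate(100, 5, dummy_evaluate)
mean_10 = cross_validate(100, 10, dummy_evaluate)
print(mean_5, mean_10)
```

Each sample appears in exactly one validation fold, so every image contributes to the averaged estimate once.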

TABLE 10 5-Fold Cross Validation Results
TABLE 11 10-Fold Cross Validation Results

The results of the K-Fold Cross-Validation indicate consistent performance across multiple splits of the dataset. The model maintained high accuracy and low loss, demonstrating its robustness and generalizability. The main advantage of using 5-fold cross-validation is its lower computational cost compared to higher values of k, while still providing a robust estimate of model performance. On the other hand, 10-fold cross-validation often provides a more accurate estimate by reducing the bias of the error-rate estimate, although at a higher computational expense. Both techniques produced promising results in our experiments. The 10-Fold Cross-Validation showed slightly better performance, with a higher average accuracy and lower average loss, further confirming that the model does not overfit to any particular subset of the data.

D. Blind Test Evaluation

To assess the robustness and generalizability of our trained model, we performed a blind test using an independent dataset for brain tumor detection [59]. This dataset consisted of 98 non-tumor images and 155 tumor images, which were not seen by the model during the training phase. The confusion matrix indicates that the model correctly classified 93 non-tumor images and 155 tumor images, with 5 false positives and no false negatives. Results are shown in Table 12.

TABLE 12 Blind Test Evaluation

The blind test results demonstrate the high performance of our model on an unseen dataset, confirming its potential effectiveness in real-world scenarios. The high accuracy (98.02%) and recall/sensitivity (100%) indicate that the model is highly reliable in detecting tumor cases, with no false negatives. This is critical in medical diagnostics, where missing a positive case could have severe implications. The precision (96.88%) and F1 score (98.41%) further validate the model’s capability to correctly identify tumor images, with a balanced trade-off between precision and recall. The specificity (94.90%) shows that the model is also effective in identifying non-tumor cases, though there is a small rate of false positives (5 cases out of 98 non-tumor images). The F2 score (99.36%), which emphasizes recall over precision, aligns with the goal of maximizing the detection of tumor cases.
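As a worked check, the blind-test percentages follow from the stated counts (TP = 155, TN = 93, FP = 5, FN = 0) when substituted into Equations (7)-(12); the intermediate decimal 0.96875 is simply 155/160.
\begin{equation*}\begin{aligned}
Accuracy &= \frac {155 + 93}{155 + 93 + 5 + 0} = \frac {248}{253} \approx 98.02\% \\
Precision &= \frac {155}{155 + 5} = \frac {155}{160} \approx 96.88\% \\
Recall/Sensitivity &= \frac {155}{155 + 0} = 100\% \\
Specificity &= \frac {93}{93 + 5} = \frac {93}{98} \approx 94.90\% \\
F1\text {-}score &= \frac {2 \times 0.96875 \times 1}{0.96875 + 1} \approx 98.41\% \\
F2\text {-}score &= \frac {5 \times 0.96875 \times 1}{4 \times 0.96875 + 1} \approx 99.36\%
\end{aligned}\end{equation*}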

In conclusion, the blind test results underscore the model’s robustness and reliability in detecting brain tumors, with strong performance across all key metrics. These findings suggest that the model can be confidently applied in clinical settings for brain tumor diagnosis, potentially aiding in timely and accurate medical decision-making.

SECTION VI.

Comparison of Proposed Model With Recent Cutting-Edge Techniques

This section provides a comparative analysis of the performance achieved by the proposed fine-tuned EfficientNetB4 model against various cutting-edge techniques. In this work, the performance and efficiency of different CNN models: VGG19, ResNet50, ResNet101, and variants of EfficientNet such as EfficientNet B0, EfficientNet B1, EfficientNet B2, EfficientNet B3, EfficientNet B4, EfficientNet B5, EfficientNet B6 and EfficientNet B7 were compared. Each DCNN utilized a consistent set of parameters, as outlined in Table 3, with features adjusted based on the convolution layer depth and the fully connected layers. The performance metrics of the fine-tuned models employed in this work are presented in Table 4.

In the initial study, we utilized EfficientNet, a sophisticated DNN devised by Google AI, along with our suggested layers, to explore the transfer learning methodology for brain tumor detection in MRI images. Through fine-tuning, EfficientNet-B4 attained the highest testing accuracy of 99.33%, surpassing the other EfficientNet variants and the other convolutional neural networks discussed subsequently. The transfer learning approach was then tested using the VGG19 architecture, created by the Visual Geometry Group, which yielded a test accuracy of 97.67%. Next, a pre-trained version of ResNet50, created by the Microsoft team, was fine-tuned for the same task and achieved 98.67% accuracy on the test dataset. Finally, a fine-tuned ResNet101 obtained an accuracy of 98.33%. The comparison of results for each fine-tuned architecture, as presented in Table 4, shows that the proposed EfficientNet-B4 model outperformed all other CNNs in accuracy, albeit by small margins. Out of the eleven CNN designs, the proposed model achieved the highest accuracy, demonstrating improved generalization capabilities for images of brain tumors.

Table 13 presents a comprehensive performance evaluation of both the current study and other recent research endeavors that have employed ML and DL methods for the detection of brain tumors. It’s essential to note that direct comparison among the following studies was not conducted due to variations in pre-processing methods, training and validation techniques, and computational resources. However, it’s important to highlight that the proposed model demonstrated outstanding results, achieving an impressive overall accuracy of 99.33%.

TABLE 13 Comparison of the Proposed Model With Recent Cutting-Edge Techniques

SECTION VII.

Conclusion and Future Work

The utilization of MRI for brain tumor detection has surged in popularity owing to the escalating demand for practical and precise analysis of extensive medical information. Brain tumors, being a life-threatening ailment, pose challenges due to time-consuming manual detection relying on medical experts’ expertise. An automated diagnostic system is imperative for detecting abnormalities in MRI images. The proposed approach, which involves fine-tuning the pre-trained EfficientNetB4 as its basis, surpasses numerous cutting-edge techniques tackling identical classification tasks. It achieves an outstanding overall test accuracy, recall/sensitivity, precision, specificity, F1-score and F2-score of 99.33%, 100%, 98.68%, 98.67%, 99.34% and 99.73%, respectively.

To ensure the robustness of the model and prevent overfitting, K-Fold cross-validation was employed, and a blind test was performed to assess model performance on an independent dataset. Additionally, hyperparameter optimization was conducted using Bayesian Optimization to identify the optimal configuration. A comprehensive ablation study was performed to evaluate the impact of various components on the model’s performance, including testing different batch sizes, removing layers (such as Dropout, Dense, Global Average Pooling, and Flatten layers), changing optimizers (Adam, SGD, RMSprop, Adagrad), loss functions (binary cross-entropy, hinge, mean squared error), and learning rates (0.01, 0.001, 0.0001), which helped identify the optimal configuration and further enhanced the model’s robustness and classification accuracy.

Looking ahead, there is potential to explore transformer-based models for MRI brain image classification. Such architectures can extract feature maps rich in information and may help to reduce the complexity of the network to some extent. Furthermore, extending the application of the proposed approach to other medical images like computed tomography (CT), X-ray, and ultrasound will lay the groundwork for forthcoming research initiatives.
