
Decision-Level Fusion Classification of Ovarian CT Benign and Malignant Tumors Based on Radiomics and Deep Learning of Dual Views




Abstract:

Ovarian cancer is one of the most prevalent malignant tumors of the female reproductive system, and its early diagnosis has always posed a challenge. Computed tomography (CT) is widely used in clinical management, and computer algorithms can extract rich detail from CT images, giving it a vital role in the early diagnosis of ovarian cancer. This research aims to develop an ovarian benign-malignant classification model based on radiomics and deep learning of dual views. A retrospective analysis of CT images from 135 ovarian tumor patients was conducted, using the StratifiedKFold method (K = 5) for cross-validation. Radiomics features were extracted from the CT data and input into an automated machine learning (A-ML) framework. Meanwhile, a deep learning (DL) model called Dual-View Global Representation and Local Cross Transformer (D_GR_LCT) was proposed for ovarian tumor classification, using a global-local parallel analysis approach with end-to-end training. The radiomics results indicate the superiority of 3D input over 2D, with an average AUC-ROC of 88.35% and an average AUC-PR of 88.73%. Comparative experiments demonstrate that the chosen parameter settings enhance model performance. The DL model achieves an average AUC-ROC and AUC-PR of 88.15% and 85.17%, respectively, validated by ablation and comparative experiments. At the decision level, the fusion of the radiomics and DL models by the stacking method achieves an average AUC-ROC and AUC-PR of 91.35% and 90.20%, respectively, outperforming the individual models. Models based on radiomics and dual-view DL are therefore recommended for early identification and screening of ovarian cancer in clinical practice.
Published in: IEEE Access ( Volume: 12)
Page(s): 102381 - 102395
Date of Publication: 19 July 2024
Electronic ISSN: 2169-3536



SECTION I.

Introduction

Ovarian cancer (OC) is a prevalent gynecologic cancer, ranking behind cervical and uterine cancers [1]. The incidence and mortality rate of OC in China have both steadily increased over the past 30 years, and this trend is projected to continue for the next 30 years [2]. Most ovarian tumors are diagnosed at mid to advanced stages due to the lack of early visible symptoms and reliable screening tests. Current clinical methods for early diagnosis of OC include pelvic examination, imaging tests (such as ultrasound or CT scan of the abdomen and pelvis), blood tests to assess organ function, tumor marker tests such as CA-125, genetic tests, etc. However, these diagnostic procedures are laborious, time-consuming, costly, and require highly skilled examiners and physicians. Ultimately, patients may still fail to obtain precise diagnostic results. Therefore, the primary challenge facing OC lies in accurately and efficiently distinguishing between cancer patients and normal/benign patients in the early stages without imposing additional burdens on clinical practice and patients [3], [4].

Radiomics, an emerging research method, was first proposed by Lambin et al. [5]. This method reveals the relationship between tumor biological features, heterogeneity, and image data by extracting high-throughput image features. Doctors can utilize it to develop descriptive and predictive models that aid them in making more precise diagnoses [6]. Medical scanning imaging is widely used as a clinical management method for patients in most hospitals, which means that radiomics can be tightly integrated with clinical practice. Radiomics extracts a significant amount of analyzable data from medical images through computer algorithms to quantitatively capture features such as the shape, size, volume, and texture of a tumor or normal tissue region. Ultimately, these features are utilized to obtain valuable diagnostic or prognostic information to support clinical decision-making without burdening existing workflows significantly [7], [8].

Compared to the gold standard of pathological biopsy, radiomics offers a non-invasive alternative that reduces patient discomfort, improves work efficiency, and lowers the financial burden on patients, providing a safer means of assessing their condition [9]. Additionally, biopsies have limited ability to characterize the spatial and temporal heterogeneity of lesions, making them inferior to radiomics in this respect [10].

In recent years, several reviews have summarized the developing experience of radiomics for OC [11], [12], [13]. Additionally, numerous OC radiomics models have been proposed and applied to various medical scenarios, including predicting postoperative recurrence [14], early detection and diagnosis [15], [16], [17], assessing patient survival [18], preoperative classification [19], [20], [21], [22], cancer typing [23], [24], chemotherapy sensitivity [25], and prognosis [26].

Deep learning (DL), originally developed for image analysis, has shown remarkable performance in diverse image processing tasks, including registration, segmentation, feature extraction, and classification. It excels at extracting latent information from medical images, making it a powerful tool for lesion detection and classification in, for example, lung cancer [27], thyroid cancer [28], and breast cancer [29]. Numerous studies have established DL as the most effective method for computer-aided diagnosis (CAD) [30], [31]. However, traditional DL models typically rely on a single-view input, which may underestimate or even neglect the spatial correlation between tumor locations, particularly on small datasets. To address this limitation, recent research has shifted towards multi-view/dual-view approaches. In medical imaging, "multi-view" refers to images obtained from different angles or planes. For CT scans, the views typically include axial (transverse), coronal, and sagittal: 1) the axial view, acquired along the transverse plane, provides a detailed perspective of different anatomical levels and aids accurate localization and measurement of lesions; 2) the coronal view, reconstructed along the anterior-posterior direction, offers information about the width and thickness of organs; 3) the sagittal view, reconstructed along the left-right direction, reveals the depth and position of organs and helps determine their spatial relationships. Multi-view approaches allow DL models to learn the features of the region of interest (ROI) thoroughly, enhancing performance and accuracy in medical image analysis tasks.

For example:

  • A case study on liver cancer utilized a Deep Multi-view Comparative Learning (DMCL) approach for cancer subtype identification [32].

  • A paper published in Medical Image Analysis designed attention-enhanced deep neural networks that jointly analyze bone scintigraphy from anterior and posterior views to automatically diagnose the absence or presence of bone metastasis [33].

  • Chen et al. used local and global transformation modules to model dependencies within and between mammograms, accurately identifying lesion regions and computing features from unregistered multiple mammograms [34].

  • Chen et al. constructed a multi-view local co-occurrence and global consistency learning model using two mammographic views (main and auxiliary) as inputs, further improving the generalization of mammogram classification [35].

  • Gao et al. proposed a new method (MuVAL) for multi-view synthesis learning of CT images using an attention mechanism to accurately predict residual lesions after ovarian cystectomy [36].

Research on dual-view DL for ovarian tumors remains underdeveloped, especially for distinguishing ovarian cysts.

Radiomics features are extracted from medical imaging data without the large datasets that DL requires, and the method has strong biological and clinical interpretability [37]. However, it demands high data quality and standardization and is influenced by the acquisition equipment and parameters. DL, on the other hand, has its own advantages. First, it automatically learns and extracts features without hand-crafted algorithms. Second, it acquires multi-level representations, which aids understanding and analysis of image data. Third, it processes large-scale image data in parallel, improving efficiency. Fourth, it enables end-to-end learning, simplifying the analysis pipeline and enhancing model performance. Lastly, it generalizes well to different types and styles of image data. The deep features extracted by DL models have demonstrated powerful representational capability and robustness to interference, quantifying high-level semantic information in the data. However, DL requires large amounts of annotated data and computational resources and lacks interpretability [38]. In general, the strong interpretability of radiomics features can compensate for the shortcomings of deep features, while deep features provide deeper semantic information that supports radiomics research. Research combining radiomics and DL is therefore receiving increasing attention.

In recent years, there has been an increasing number of studies focusing on the classification of OC by combining radiomics and DL [39]. However, there is still a lack of research on the classification of benign and malignant ovarian tumors based on radiomics & dual-view DL.

The purpose of this paper is to construct a model for the precise classification of benign and malignant ovarian tumors based on radiomics and DL. We propose a model that combines radiomics and dual-view DL, using the stacking method to exploit the strengths of each approach. The results show that this strategy effectively fuses the decision levels of the radiomics and DL models, achieving precise classification of benign and malignant ovarian tumors. The innovations of this study are:

  1. Regarding radiomics, we used PyRadiomics and an improved automated machine learning (A-ML) framework to carry out experiments, successfully achieving precise differentiation of benign and malignant tumors in contrast-enhanced ovarian CT.

  2. In the field of DL, we have developed a novel model for ovarian tumor classification known as Dual-view Global Representation and Local Cross Transformer (D_GR_LCT). The model takes dual views (axial view and coronal view) as input for the first time and employs a global-local parallel analysis approach to assess the global representation and local information within ROI. To generate local information, we have developed a new Cross Attention Transformer (CAttnT) to facilitate the exchange of information between different view features.

  3. In this paper, the dual-view DL model is integrated with radiomics for the first time to achieve ovarian tumor classification. The stacking method was employed for decision-making between the two models. The final results demonstrate that the combined model outperforms the individual models (radiomics/DL).

SECTION II.

Dataset Preparation

A. Data Acquisition

This study focuses on arterial-phase contrast-enhanced CT data of patients with ovarian tumors. A retrospective analysis was conducted on clinical data from 135 patients with confirmed ovarian tumors at Nanfang Hospital in Guangzhou, Guangdong Province, between June 2011 and August 2018. The data were acquired with a Siemens SOMATOM Definition scanner using the following parameters: tube voltage of 120 kVp, tube current of 122-673 mA, collimation widths of 19.2, 28.8, and 80 mm, minimum reconstructable thicknesses of 0.6, 0.625, and 1.2 mm, data acquisition diameter of 500 mm, and exposure time of 500 ms. The slice thickness ranged from 0.6 to 5 mm, and the image resolution was $512\times512$.

B. Outlining the ROI

ITK-SNAP, an open-source medical image processing and visualization tool, is used in medical research and clinical practice to support accurate image analysis and diagnosis. In this study, imaging experts used ITK-SNAP to delineate the ovarian tumor ROI layer by layer in each patient's images. Figure 1 shows a 3D reconstruction of the delineated region.

FIGURE 1. Stereo image after 3D reconstruction.

C. Raw Data Analysis

The experimental datasets comprise raw images stored in DICOM (dcm) format and masks stored in NIfTI (nii) format. ITK-SNAP visualizes the three views (axial, coronal, sagittal) of a patient's ROI, and the ROI size varies across views. To extract information from the maximum ROI, this paper statistically analyzed the distribution of the maximum ROI across all patients, as shown in Figure 2. The results show that 73.33% of patients had their maximum ROI in the axial view, 24.45% in the coronal view, and 2.22% in the sagittal view. Therefore, this paper selected the axial slice, which has the highest proportion, as the input for 2D radiomics, and selected the maximum ROIs of the axial and coronal views as the inputs for dual-view DL.

FIGURE 2. Distribution of the maximum ROI.

SECTION III.

Research Methodology

All experiments in this research were written and executed in Python (3.10.9). The experiments were conducted on a computer with an Intel Core (TM) i5-6500 CPU running at 3.20GHz, 8GB of RAM, and the Windows 7 Professional (x64) operating system. Additionally, the experiments were also performed on a server equipped with an NVIDIA Titan X graphics card for GPU acceleration.

A. Division of the Data Set

In this study, the dataset was divided into training and independent testing sets using the StratifiedKFold (K = 5) method to preserve the class proportions in each split. Figure 3 demonstrates the application of StratifiedKFold to stratify the dataset, and the number of benign/malignant samples in each fold's training and testing sets is shown in Table 1.
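Such a split can be reproduced with scikit-learn's StratifiedKFold; the sketch below is illustrative only, with placeholder feature and label arrays (the 70/65 benign-malignant split is assumed, not the paper's actual class counts).

```python
# Minimal sketch of the stratified five-fold split described above.
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.random.default_rng(0).normal(size=(135, 10))  # one feature row per patient
y = np.array([0] * 70 + [1] * 65)                    # 0 = benign, 1 = malignant (placeholder)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y), start=1):
    # Each fold preserves the benign/malignant ratio of the full cohort.
    print(f"fold {fold}: train={len(train_idx)}, test={len(test_idx)}, "
          f"malignant in test={int(y[test_idx].sum())}")
```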

TABLE 1. Dataset of OC patients.

FIGURE 3. Schematic diagram of StratifiedKFold for stratifying the dataset.

B. Evaluation Index

The Receiver Operating Characteristic (ROC) curve and the Precision-Recall (PR) curve are widely utilized metrics for evaluating binary classification models. The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR), while the PR curve plots precision against recall. The Area Under the Curve (AUC) for ROC (AUC-ROC) and for PR (AUC-PR) serve as indicators of classification performance. A higher AUC-ROC indicates a stronger ability to differentiate positive from negative cases, whereas a higher AUC-PR indicates better performance across thresholds, effectively balancing precision and recall. Referring to [35], [40], and [41], AUC-ROC and AUC-PR were adopted as the evaluation indexes in this paper. TPR, FPR, Precision, and Recall are calculated as follows:
\begin{align*} \mathrm{TPR}&=\frac{\mathrm{TP}}{\mathrm{TP+FN}} \tag{1}\\ \mathrm{FPR}&=\frac{\mathrm{FP}}{\mathrm{FP+TN}} \tag{2}\\ \mathrm{Precision}&=\frac{\mathrm{TP}}{\mathrm{TP+FP}} \tag{3}\\ \mathrm{Recall}&=\frac{\mathrm{TP}}{\mathrm{TP+FN}} \tag{4}\end{align*}
where TP, TN, FP, and FN denote the counts of true positive, true negative, false positive, and false negative samples, respectively.
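Both areas can be computed directly from predicted malignancy probabilities; the following minimal sketch uses scikit-learn with small illustrative arrays, not results from this study.

```python
# AUC-ROC and AUC-PR (average precision) for a binary classifier.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])                    # ground-truth labels
y_score = np.array([0.1, 0.4, 0.8, 0.65, 0.9, 0.3, 0.55, 0.2])  # predicted probabilities

auc_roc = roc_auc_score(y_true, y_score)           # area under the TPR-FPR curve
auc_pr = average_precision_score(y_true, y_score)  # area under the precision-recall curve
print(f"AUC-ROC={auc_roc:.4f}, AUC-PR={auc_pr:.4f}")
```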

C. Radiomics

1) Radiomics Feature Extraction

Based on the distribution shown in Figure 2, we extract radiomics features from the maximum-ROI slice along the axial axis for the 2D radiomics experiment, which also serves as a comparative study. Both 3D and 2D radiomics experiments are then conducted to determine the optimal configuration.

We used the PyRadiomics toolkit [42], which supports 2D and 3D feature extraction, to extract radiomics features from each patient's CT images. For legibility, ROI is used as a general term for both the 2D region of interest and the 3D volume of interest. The extracted features encompass shape (3D/2D), first-order statistical features, gray-level co-occurrence matrix (GLCM), gray-level run-length matrix (GLRLM), gray-level size-zone matrix (GLSZM), gray-level dependence matrix (GLDM), and neighborhood gray-tone difference matrix (NGTDM) features [43]. Table 2 displays the dimensions of the extracted radiomics features.

TABLE 2. The dimension of extracted radiomics features.

To enrich the extracted features, following [44], [45], and [46], image preprocessing was applied to the original input images; specifically, Laplacian of Gaussian (LoG) and wavelet filtering were added. Image resampling uses b-spline interpolation (sitkBSpline), while mask resampling uses nearest-neighbor interpolation (sitkNearestNeighbor). Since shape descriptors are independent of gray values and are extracted from the label mask, shape features are calculated only on the original image; the other radiomics features are calculated on both the original and the derived (LoG, wavelet) images. Table 3 lists the specific settings and feature dimensions; "Other Features" refers to the radiomics features in Table 2, excluding shape features.

TABLE 3. Parameter setting for image preprocessing.

The wavelet transform is a filtering technique that decomposes an image into detail information at different scales. The wavelet basis functions available in PyRadiomics include haar, dmey, sym, db, coif, bior, and rbio. The Coiflet wavelet is smoother than the alternatives, making it suitable for extracting low-frequency information from medical images, while its good frequency localization preserves image detail; this paper therefore uses the default Coiflet basis function "coif1". Through the wavelet transform, an image is decomposed into sub-bands of different frequencies, each representing detail at a different scale. In the 2D experiments, the LL sub-band captures the overall contour and structure, HH captures fine texture and edge information, LH captures horizontal low-frequency and vertical high-frequency information, and HL captures horizontal high-frequency and vertical low-frequency information. In the 3D experiments, decomposing along all three axes yields the higher-order sub-bands LLH, LHL, LHH, HLL, HLH, HHL, and HHH (plus LLL).

In the LoG algorithm, sigma is a standard deviation parameter used to adjust the size of the Gaussian kernel to control the smoothing effect. Increasing the sigma value will enhance the smoothing effect and reduce the impact of noise, but it may also result in some loss of image details. Conversely, decreasing the sigma value can preserve more details but may not effectively remove noise. Therefore, we chose five different sigma values to obtain a more comprehensive radiomics feature set.

Finally, a total of 919-dimensional radiomics features were extracted from the 2D input, while 1288-dimensional radiomics features were extracted from the 3D data.
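A minimal sketch of this extraction setup with PyRadiomics is given below; the file paths are placeholders, and the resampled spacing and the specific sigma values are assumptions, since the paper does not state them.

```python
# Sketch of 3D radiomics extraction with LoG and coif1 wavelet derived images.
from radiomics import featureextractor

settings = {
    "interpolator": "sitkBSpline",             # image resampling, as in Table 3
    "resampledPixelSpacing": [1.0, 1.0, 1.0],  # assumed isotropic spacing
}
extractor = featureextractor.RadiomicsFeatureExtractor(**settings)
# Derived images on top of the original: LoG with five sigmas (values assumed)
# and the coif1 wavelet; masks are resampled with nearest-neighbor internally.
extractor.enableImageTypeByName("LoG", customArgs={"sigma": [1.0, 2.0, 3.0, 4.0, 5.0]})
extractor.enableImageTypeByName("Wavelet", customArgs={"wavelet": "coif1"})

features = extractor.execute("patient001_ct.nii.gz", "patient001_mask.nii.gz")
radiomic_values = {k: v for k, v in features.items() if not k.startswith("diagnostics")}
print(len(radiomic_values), "features extracted")
```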

2) Construction of Radiomics

In this paper, we updated the automated machine learning (Auto-ML) approach of [47] and established a new framework (A-ML) for radiomics feature analysis of ovarian tumors, as shown in Figure 4. The pipeline comprises preprocessing, feature selection, oversampling, and classification, ultimately achieving automatic classification of benign and malignant tumors.

FIGURE 4. The flow of A-ML.

Within A-ML, the first step involves three preprocessing operations: outlier detection, normalization, and analysis of variance (ANOVA) [48]. Next, five common feature selection methods are employed for dimensionality reduction. The Synthetic Minority Over-sampling Technique (SMOTE) [49] and the Random Over Sampling Examples (ROSE) algorithm [50] are then applied to address class imbalance, improve learning on minority samples, and stabilize model performance. Finally, Logistic Regression (LR), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN) perform the binary classification of benign and malignant tumors; the "NO" option at each stage indicates that the corresponding method is skipped. The A-ML model takes the 3D and 2D radiomics features as inputs, and a random search with 5-fold cross-validation is used to find the optimal parameters during training. The model's performance is then assessed independently on the test set.

The only difference between A-ML and Auto-ML [47] lies in the dimension used for feature selection: A-ML derives the candidate dimensions from the number of extracted radiomics features rather than using a fixed dimension. This avoids an arbitrary choice of dimension and respects differences between datasets.
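As a concrete illustration of one candidate configuration inside A-ML, the sketch below chains ANOVA-based selection, SMOTE oversampling, and an SVM, tuned by random search with 5-fold cross-validation; the stage choices, candidate dimensions, and data are assumptions for demonstration, not the framework's actual implementation.

```python
# One illustrative A-ML candidate pipeline, mirroring the stages in Figure 4.
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(108, 200))   # placeholder radiomics feature matrix
y = rng.integers(0, 2, size=108)  # placeholder labels

pipe = Pipeline([
    ("scale", StandardScaler()),         # normalization
    ("select", SelectKBest(f_classif)),  # ANOVA-based feature selection
    ("smote", SMOTE(random_state=0)),    # minority-class oversampling (train folds only)
    ("clf", SVC()),                      # one of the LR/SVM/KNN candidates
])
param_dist = {
    "select__k": [20, 50, 100],          # candidate dimensions tied to the feature count
    "clf__C": np.logspace(-2, 2, 10),
    "clf__gamma": ["scale", "auto"],
}
search = RandomizedSearchCV(
    pipe, param_dist, n_iter=10, scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 4))
```

Using imblearn's Pipeline ensures SMOTE is applied only when fitting on training folds, never to validation data.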

Consequently, a total of 108 ($(1+5)\times(1+2)\times3\times2$) radiomics models were developed for the experimental classification of ovarian benign and malignant tumors, encompassing both 3D and 2D inputs. The overall classification process of this radiomics experiment is depicted in Figure 5.

FIGURE 5. The flowchart of classification for radiomics.

D. DL Model

Effective early detection can reduce cancer mortality while minimizing unnecessary surgical interventions caused by false-positive screening. Artificial intelligence-assisted tools built on DL can streamline the evaluation process of OC screening for radiologists, enhancing their efficiency and diagnostic accuracy. Radiologists analyze the imaging findings of OC patients by observing different slices of CT images to identify potentially heterogeneous tumor regions and draw diagnostic conclusions together with clinical information. To mimic this process, we developed a DL model named Dual-view Global Representation and Local Cross Transformer (D_GR_LCT). It takes dual views of the same ovary as input and integrates the global representation with the local information generated by the local cross-attention transformer, enabling a global-local analysis that better distinguishes benign from malignant ovarian tumors.

1) Data Preprocessing

In Figure 6, 3D images of benign and malignant ovarian tumors are presented. There are distinct morphological differences between benign and malignant tumors, each containing different information. Furthermore, as shown in Figure 2, the largest ROI is mostly distributed in the axial and coronal planes. Therefore, this study utilizes axial and coronal views as the original inputs for the DL model.

FIGURE 6. Stereo image of ovarian benign and malignant tumors.

To extract effective features from the ROI while ignoring surrounding interference, we developed a Detection and Crop of Tumor Region (DCTR) module that detects the tumor area and extracts the ROI. The processing steps are: 1) use the findContours function in OpenCV to find contours in the mask image and obtain their coordinates; 2) crop the original ROI from the original image based on these coordinates and resize it to $512\times512$; 3) finally, enhance the brightness of the processed ROI for use as input to the DL model. Figure 7 shows an axial view of a benign case before and after processing by the DCTR module, and a sketch of these steps follows the figure.

FIGURE 7. A benign case (axial view) before and after preprocessing.
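A hedged sketch of the three DCTR steps with OpenCV follows; the exact module is not published, so the function layout and the brightness-enhancement factors are assumptions.

```python
# Illustrative DCTR steps: contour detection, crop, resize, brighten.
import cv2
import numpy as np

def dctr(image: np.ndarray, mask: np.ndarray, size: int = 512) -> np.ndarray:
    """Detect the tumor contour in `mask`, crop it from `image`, resize, and brighten."""
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    roi = image[y:y + h, x:x + w]
    roi = cv2.resize(roi, (size, size), interpolation=cv2.INTER_LINEAR)
    # Step 3: simple brightness enhancement (alpha/beta values are illustrative).
    return cv2.convertScaleAbs(roi, alpha=1.2, beta=10)
```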

The data augmentation methods employed in this paper include vertical flip, horizontal flip, affine transformation (rotate 20, translate_percent 0.1, shear 20, and scale ranging from 0.8 to 1.2), and elastic transformation (alpha 10, sigma 15), each applied with probability 0.5. Figure 8 illustrates the visual effects of eight augmentation outcomes on the training set, with the basic parameters annotated in the figure. Vertical and horizontal flips enhance data diversity; affine transformation simulates images at various angles and positions through rotation, translation, scaling, and shearing; and elastic transformation simulates deformation, making the model more robust. After applying these augmentations, the training set grows to eight times its original size.

FIGURE 8. Data augmentation process visualization.
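The listed parameters map naturally onto the Albumentations API, so a plausible sketch of the pipeline is shown below; the library choice is an assumption, as the paper does not name its augmentation toolkit.

```python
# Candidate augmentation pipeline matching the parameters listed above.
import albumentations as A

augment = A.Compose([
    A.VerticalFlip(p=0.5),
    A.HorizontalFlip(p=0.5),
    A.Affine(rotate=20, translate_percent=0.1, shear=20,
             scale=(0.8, 1.2), p=0.5),
    A.ElasticTransform(alpha=10, sigma=15, p=0.5),
])
# Usage: augmented_roi = augment(image=roi)["image"]
```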

2) Construction of DL Model

The overall structure of the proposed D_GR_LCT model is illustrated in Figure 9. The model is constructed as follows: first, the inputs are the axial-view and coronal-view images of ovarian patients processed by the DCTR module, and the features of these two views ($\mathrm{U}_{\mathrm{A}}$ and $\mathrm{U}_{\mathrm{C}}$ in Figure 9) are extracted by the backbone (convnext_nano/densenet121/resnet101/tf_efficientnetv2_s). The basic deep features extracted by the backbone ($\mathrm{U}_{\mathrm{Axial}}$ and $\mathrm{U}_{\mathrm{Coronal}}$ in Figure 10) are then processed simultaneously by the Global Representation (GR) module and the Local Cross Transformers (LCT) module to generate the global representation and local information. Finally, both are fed into a Multi-Layer Perceptron (MLP) classification layer, which outputs the final prediction.

FIGURE 9. Overall structure of the proposed model D_GR_LCT.

FIGURE 10. Structure of LCT module.

To capture the global information of the image, we developed the GR module shown in Figure 9. This module combines the features $\mathrm{U}_{\mathrm{Axial}}$ and $\mathrm{U}_{\mathrm{Coronal}}$ of the dual views extracted by the backbone and performs feature dimensionality reduction by generalized mean pooling (GMP):
\begin{align*} \mathrm{GR}\left({\mathrm{U}_{\mathrm{Axial}},\mathrm{U}_{\mathrm{Coronal}}}\right)=\mathrm{GMP}\left({\mathrm{Concat}(\mathrm{U}_{\mathrm{Axial}},\mathrm{U}_{\mathrm{Coronal}})}\right) \tag{5}\end{align*}

The global representation information produced by the GR module is derived from the entire image, addressing the limitations of the LCT module in capturing local information.
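A minimal PyTorch sketch of Eq. (5) is given below; the GeM pooling exponent p=3 is a common default and an assumption here, as the paper does not state its value.

```python
# GR module: concatenate dual-view feature maps, then generalized-mean pooling.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeM(nn.Module):
    """Generalized-mean pooling; p=3 is a common default (assumed, not from the paper)."""
    def __init__(self, p: float = 3.0, eps: float = 1e-6):
        super().__init__()
        self.p = nn.Parameter(torch.tensor(p))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        x = x.clamp(min=self.eps).pow(self.p)
        return F.adaptive_avg_pool2d(x, 1).pow(1.0 / self.p).flatten(1)  # (B, C)

def gr_module(u_axial: torch.Tensor, u_coronal: torch.Tensor, gem: GeM) -> torch.Tensor:
    """Eq. (5): GMP(Concat(U_Axial, U_Coronal)) -> one global vector per sample."""
    return gem(torch.cat([u_axial, u_coronal], dim=1))  # (B, 2C)

# Toy usage with backbone-like feature maps of shape (B, C, H, W).
u_a, u_c = torch.randn(2, 64, 7, 7), torch.randn(2, 64, 7, 7)
print(gr_module(u_a, u_c, GeM()).shape)  # torch.Size([2, 128])
```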

To aid understanding, the LCT module is presented separately in Figure 10. The LCT module generates the local information exchanged between the dual views. To better learn the dependency between the views, it incorporates the concept of a Cross Transformer and adopts an attention mechanism following the Cross-shaped Window Self-attention of CSWin [51], which forms horizontal and vertical stripes of a cross-shaped window, widening each token's receptive field and enhancing its context-modeling ability. To let information flow between the two views, we design a Cross Attention Transformer (CAttnT) module that applies horizontal and vertical cross attention, respectively. Unlike the original cross-shaped window self-attention, CAttnT performs cross attention between the two LCT inputs ($\mathrm{U}_{\mathrm{Axial}}$ and $\mathrm{U}_{\mathrm{Coronal}}$): for the horizontal heads, the queries Q generated from one view's horizontal stripes are exchanged with the other view's, and the vertical heads do the same for the vertical stripes. In this way, CAttnT realizes cross attention between the dual views by exchanging the Q generated from their respective horizontal and vertical stripes.

Assuming that, after applying CAttnT, A and C represent the outputs for $U_{\mathrm{Axial}}$ and $U_{\mathrm{Coronal}}$, respectively, the output of the LCT module is defined as:
\begin{align*} \mathrm{LCT}\left({U_{\mathrm{Axial}},U_{\mathrm{Coronal}}}\right)&=\mathrm{Concat}\left({\mathrm{GAP}(\mathrm{A}),\mathrm{GAP}(\mathrm{C})}\right) \\ \mathrm{A}&=\mathrm{Concat}(A_{1},\ldots,A_{k},\ldots,A_{K})W^{O} \\ \mathrm{C}&=\mathrm{Concat}(C_{1},\ldots,C_{k},\ldots,C_{K})W^{O},\quad k=1,\ldots,K \tag{6}\end{align*}

$W^{O}\in\mathrm{R}^{C\times C}$ denotes the projection matrix, and the output dimension is set to C. K was set to 2, 4, and 8, with the final value determined by the optimal AUC. GAP stands for global average pooling, and LN stands for layer normalization.

For the cross attention of $\mathrm{U}_{\mathrm{Axial}}$ and $\mathrm{U}_{\mathrm{Coronal}}$, $\mathrm{U}_{\mathrm{Axial}}$ is uniformly divided into non-overlapping horizontal stripes of equal height $\left[\mathrm{U}_{\mathrm{Axial}}^{1},\ldots,\mathrm{U}_{\mathrm{Axial}}^{x},\ldots,\mathrm{U}_{\mathrm{Axial}}^{X}\right]$ and non-overlapping vertical stripes of equal width $\left[\mathrm{U}_{\mathrm{Axial}}^{1},\ldots,\mathrm{U}_{\mathrm{Axial}}^{y},\ldots,\mathrm{U}_{\mathrm{Axial}}^{Y}\right]$, and $\mathrm{U}_{\mathrm{Coronal}}$ is partitioned in the same way. A dynamic stripe width (SW) controls the division and adjusts the balance between the model's learning ability and computational complexity. In this paper, SW was set to 1, 2, and 4, and its final value was determined in the same way as K.
\begin{align*} \left[\mathrm{U}_{\mathrm{Axial}}^{1},\ldots,\mathrm{U}_{\mathrm{Axial}}^{x},\ldots,\mathrm{U}_{\mathrm{Axial}}^{X}\right]&=\mathrm{U}_{\mathrm{Axial}}, &\quad \mathrm{U}_{\mathrm{Axial}}^{x}&\in\mathrm{R}^{(sw\times W)\times C}, &\quad X&=H/sw \\ \left[\mathrm{U}_{\mathrm{Axial}}^{1},\ldots,\mathrm{U}_{\mathrm{Axial}}^{y},\ldots,\mathrm{U}_{\mathrm{Axial}}^{Y}\right]&=\mathrm{U}_{\mathrm{Axial}}, &\quad \mathrm{U}_{\mathrm{Axial}}^{y}&\in\mathrm{R}^{(sw\times H)\times C}, &\quad Y&=W/sw \\ \left[\mathrm{U}_{\mathrm{Coronal}}^{1},\ldots,\mathrm{U}_{\mathrm{Coronal}}^{x},\ldots,\mathrm{U}_{\mathrm{Coronal}}^{X}\right]&=\mathrm{U}_{\mathrm{Coronal}}, &\quad \mathrm{U}_{\mathrm{Coronal}}^{x}&\in\mathrm{R}^{(sw\times W)\times C}, &\quad X&=H/sw \\ \left[\mathrm{U}_{\mathrm{Coronal}}^{1},\ldots,\mathrm{U}_{\mathrm{Coronal}}^{y},\ldots,\mathrm{U}_{\mathrm{Coronal}}^{Y}\right]&=\mathrm{U}_{\mathrm{Coronal}}, &\quad \mathrm{U}_{\mathrm{Coronal}}^{y}&\in\mathrm{R}^{(sw\times H)\times C}, &\quad Y&=W/sw \tag{7}\end{align*}

Assuming that the projected query (Q), key (K), and value (V) dimensions of the $k^{\mathrm{th}}$ head are $d_{k}$, the output of the $k^{\mathrm{th}}$ head of $\mathrm{U}_{\mathrm{Axial}}$ after cross attention is defined as:
\begin{align*} A_{k}=\begin{cases} \left[A_{k}^{1},\ldots,A_{k}^{x},\ldots,A_{k}^{X}\right], & k=1,\ldots,K/2 \\ \left[A_{k}^{1},\ldots,A_{k}^{y},\ldots,A_{k}^{Y}\right], & k=K/2+1,\ldots,K \end{cases} \tag{8}\end{align*}
\begin{align*} A_{k}^{x}&=\mathrm{CAttnT}(Q_{\mathrm{Coronal}}^{x},K_{\mathrm{Axial}}^{x},V_{\mathrm{Axial}}^{x}) \\ A_{k}^{y}&=\mathrm{CAttnT}(Q_{\mathrm{Coronal}}^{y},K_{\mathrm{Axial}}^{y},V_{\mathrm{Axial}}^{y}) \\ Q_{\mathrm{Coronal}}^{x}&=U_{\mathrm{Coronal}}^{x}W_{k}^{Q},\quad K_{\mathrm{Axial}}^{x}=U_{\mathrm{Axial}}^{x}W_{k}^{K},\quad V_{\mathrm{Axial}}^{x}=U_{\mathrm{Axial}}^{x}W_{k}^{V} \\ Q_{\mathrm{Coronal}}^{y}&=U_{\mathrm{Coronal}}^{y}W_{k}^{Q},\quad K_{\mathrm{Axial}}^{y}=U_{\mathrm{Axial}}^{y}W_{k}^{K},\quad V_{\mathrm{Axial}}^{y}=U_{\mathrm{Axial}}^{y}W_{k}^{V}\end{align*}

$W_{k}^{Q}\in\mathrm{R}^{C\times d_{k}}$, $W_{k}^{K}\in\mathrm{R}^{C\times d_{k}}$, and $W_{k}^{V}\in\mathrm{R}^{C\times d_{k}}$ denote the projection matrices of Q, K, and V of the $k^{\mathrm{th}}$ head, respectively, and $d_{k}=C/K$ is the channel dimension per head. The Q, K, and V of CAttnT come from the features of different views, and CAttnT is calculated as:
\begin{align*} \mathrm{CAttnT}\left({Q_{\mathrm{Coronal}}^{x},K_{\mathrm{Axial}}^{x},V_{\mathrm{Axial}}^{x}}\right)=\mathrm{softmax}\left({\frac{Q_{\mathrm{Coronal}}^{x}\left({K_{\mathrm{Axial}}^{x}}\right)^{T}}{\sqrt{d_{k}}}}\right)V_{\mathrm{Axial}}^{x} \tag{9}\end{align*}
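The core of Eq. (9) reduces to scaled dot-product attention whose queries come from the opposite view; the PyTorch sketch below shows this for a single head and stripe, omitting the stripe partitioning and projection matrices for brevity.

```python
# Cross attention per Eq. (9): queries from one view, keys/values from the other.
import math
import torch

def cattn_t(q_other: torch.Tensor, k_self: torch.Tensor, v_self: torch.Tensor) -> torch.Tensor:
    """Scaled dot-product attention with queries taken from the opposite view."""
    d_k = q_other.size(-1)
    attn = torch.softmax(q_other @ k_self.transpose(-2, -1) / math.sqrt(d_k), dim=-1)
    return attn @ v_self

# One horizontal stripe: coronal-view queries attend over the axial view.
q_coronal = torch.randn(2, 16, 32)  # (batch, tokens in stripe, d_k)
k_axial, v_axial = torch.randn(2, 16, 32), torch.randn(2, 16, 32)
a_k_x = cattn_t(q_coronal, k_axial, v_axial)  # analogous to A_k^x in Eq. (9)
```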

We demonstrate the efficacy of the GR module and the LCT module in the ablation experiments. Figure 11 shows how the outputs after cross attention ($\mathrm{A}_{k}$ and $\mathrm{C}_{k}$) are obtained.

FIGURE 11. The diagram for calculating $\mathrm{A}_{k}$ and $\mathrm{C}_{k}$.

3) Single-View Model

To demonstrate the superior effectiveness of the double view over the single view, we compared the model performance between S_GR_LT (the single view only) and D_GR_LCT. The structure of the S_GR_LT model is illustrated in Figure 12, where the attention mechanism in the LT module does not include the cross part.

FIGURE 12. Overall architecture of S_GR_LT.

4) Implementation Details

All DL experiments in this paper were conducted using PyTorch. Each batch consisted of five samples, and the model was trained for 20 epochs. The AdamW optimizer [52] was used to minimize the binary cross-entropy (BCE) loss with a learning rate (lr) of 0.0001 and a weight decay of 0.01. OneCycleLR was employed to schedule the lr dynamically during training, with a maximum learning rate of 0.0001 and the proportion of the cycle spent increasing the lr set to 0.1.
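These settings map directly onto PyTorch's AdamW and OneCycleLR; in the sketch below, the stand-in model, the step count, and reading "proportion of lr increase" as pct_start are assumptions for illustration.

```python
# Optimization setup matching the stated hyperparameters.
import torch

model = torch.nn.Linear(16, 1)  # stand-in for D_GR_LCT
criterion = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
# 20 epochs x 27 steps/epoch (batch size 5 on ~135 samples) is illustrative.
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-4, total_steps=20 * 27, pct_start=0.1)
```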

E. Fusion Model Based on Radiomics & DL Model

Sections III-C and III-D developed radiomics methods and DL technology to distinguish benign from malignant ovarian tumors. To fully leverage the advantages of both approaches, this paper integrates radiomics and the DL model. Building upon [53] and [54], we propose a decision-level fusion method that employs stacking to merge the decision layers of the radiomics and DL models. The outputs of the two models are used as inputs, optimal parameters for the LR, SVM, and KNN meta-learners are determined on the training set, and predictions for the test set are then produced by the combined model. The fusion method is illustrated in Figure 13.
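A minimal sketch of this stacking step follows; the probability arrays are placeholders for the out-of-fold outputs of the radiomics and DL models, and LR stands in for the three candidate meta-learners.

```python
# Decision-level stacking: base-model probabilities become meta-features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
p_rad_train, p_dl_train = rng.random(108), rng.random(108)  # placeholder base outputs
y_train = rng.integers(0, 2, size=108)                      # placeholder labels

meta_X = np.column_stack([p_rad_train, p_dl_train])   # decision-level features
meta_clf = LogisticRegression().fit(meta_X, y_train)  # SVM/KNN are tuned analogously
# Test time: meta_clf.predict_proba(np.column_stack([p_rad_test, p_dl_test]))[:, 1]
```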

FIGURE 13. Fusion at the decision level.

SECTION IV.

Results and Analysis

This paper employs unified evaluation metrics, specifically AUC-ROC and AUC-PR.

A. Results Based on Radiomics Only

Table 4 presents the results of 2D and 3D radiomics tests for each fold, including mean and standard deviation values. It is evident from the table that, on average, 3D radiomics outperforms 2D radiomics. Specifically, the average AUC-ROC for 3D radiomics is 88.35% with a standard deviation of 9.7%, and the average AUC-PR is 88.73% with a standard deviation of 8.61%. Compared to 2D radiomics, there is an average improvement of 3.98% and 2.09% in AUC-ROC and AUC-PR, respectively. Therefore, the use of radiomics with 3D ovarian tumor data demonstrates superior performance, as illustrated in Figure 14, which shows the ROC and PR curves for 3D radiomics tests in each fold.

TABLE 4. Results of radiomic tests in 2D and 3D.

FIGURE 14. The test ROC and PR curves of each fold of 3D radiomics data.

1) Comparative Experiments

To assess the impact of adding LoG and Wavelet to the radiomics experiment, we used the radiomics features extracted with the Original parameter as the control. Table 5 presents the results of this control experiment. We compared the feature counts and classification performance before and after adding LoG and Wavelet, with the statistics shown in Table 6. Table 6 makes clear that introducing LoG and Wavelet significantly enriches the radiomics features: the 3D features increased from 105 to 1288, and the 2D features from 100 to 919. LoG and Wavelet also had a clear positive impact on classification: for 3D input, they increased AUC-ROC by 7.2% and AUC-PR by 3.89%.

TABLE 5. Results of radiomics comparative experiment.

TABLE 6. Analysis of results.

2) Summary Based on Radiomics

Based on the original arterial-phase contrast-enhanced CT images of ovarian tumors, this paper conducted four radiomics experiments to achieve early diagnosis and prediction of OC through quantitative analysis and mining of CT data. The experimental process was as follows: first, after analyzing the original images, we used 3D and 2D (axial) data as inputs. LoG and Wavelet image preprocessing were added to the original images to enrich the extracted multi-class radiomics features. After feature preprocessing, A-ML was used to train and test on the split dataset, automatically selecting the best combination model.

Through comparative experiments, we have confirmed the effectiveness of incorporating LoG and Wavelet. These methods not only enriched radiomics features but also further consolidated the data foundation for classifying benign and malignant ovarian tumors. At the same time, we compared the impact of different input data (3D/2D) on ovarian tumor classification when extracting radiomics features. The results showed that both 2D and 3D achieved AUC-PR values exceeding 85%, with AUC-ROC values of 84.37% for 2D and 88.35% for 3D.

In conclusion, this paper demonstrates the effectiveness of radiomics methods in classifying ovarian tumors, with results indicating that using 3D input yields better outcomes than using 2D input.

B. Results Based on DL Model Only

During the experiments, we used the bootstrap method to estimate the 95% confidence interval (CI) of AUC-ROC and AUC-PR, as described in [55]. By repeatedly resampling (2000 iterations by default) and recomputing the AUCs and CIs, this method allows us to assess the stability of classification performance. The mean, lower bound, and upper bound of the 95% CI for AUC-ROC and AUC-PR are displayed in Table 7. The test results demonstrate that our DL model (D_GR_LCT) significantly outperforms the single-view model (S_GR_LT): on the test set, D_GR_LCT achieves an average AUC-ROC of 88.15% and an average AUC-PR of 85.17%, versus 81.61% and 77.72% for S_GR_LT.

TABLE 7. Test results of single/double-view models.

Therefore, the test results show that our dual-view model D_GR_LCT significantly outperforms the single-view model S_GR_LT on both metrics, with average improvements of 6.54% in AUC-ROC and 7.45% in AUC-PR. Figure 15 illustrates the test ROC and PR curves for each fold of the DL model (D_GR_LCT).

FIGURE 15. The test ROC and PR curves of each fold of the DL model (D_GR_LCT).

1) Ablation Experiments

To assess the effectiveness of each component of the proposed dual-view model (D_GR_LCT), four ablation experiments were conducted: ① S-Backbone, ② D-GR, ③ D-LT, and ④ D-LCT, each evaluated on the classification of benign and malignant ovarian tumors. For reference, the single-view and dual-view models (⑤ S_GR_LT and ⑥ D_GR_LCT) are also included, as shown in Table 8. The ablation results indicate:

  1. The GR and LT modules in the single-view model (S_GR_LT) increased AUC-ROC by an average of 3.87% and AUC-PR by 4.47% compared with S-Backbone, confirming the effectiveness of the GR and LT modules in the single-view setting.

  2. Comparing D-LT with D-LCT revealed that our proposed CAttnT module improved AUC-ROC by 1.23% and AUC-PR by 0.97% on average when extracting local representation information, demonstrating the effectiveness of the CAttnT module.

  3. When comparing S-Backbone, D-GR, and D-LCT, both D-GR and D-LCT improved on S-Backbone: their respective average AUC-ROC values were 77.74%, 83.05%, and 82.62%, and their average AUC-PR values were 73.25%, 81.75%, and 80.05%. This indicates that both components are effective, although the improvement from a single component is not substantial enough to fully account for the information in ovarian tumors.

  4. The combination of the GR module and LCT module positively impacted double-view model performance as evidenced by an average increase in AUC-ROC to 88.15% and AUC-PR to 85.17% for D_GR_LCT.

TABLE 8. Test results of ablation experiments.

Therefore, the ablation experiment confirms the effectiveness of all components of D_GR_LCT.

2) Comparison Experiments

Apart from the ablation experiments, we compared the proposed D_GR_LCT with four commonly used classification models: ① ConvNext_tiny [56], ② MobileNetv3_small [57], ③ EfficientNet_b2 [58], and ④ EfficientNet_b3. As Table 9 shows, the D_GR_LCT model proposed in this paper outperforms all four.

TABLE 9. Test results of comparison experiments.

3) Summary Based on DL Model

In the context of DL, this paper focuses on the dual-view (Axial and Coronal) as the research object, utilizing Backbone to extract basic deep features and designing GR and LCT modules. The LCT module includes the CAttnT module to achieve cross-attention, ultimately outputting classification results through MLP. We named this model D_GR_LCT. Additionally, to compare the differences between single-view (Axial) models and dual-view models, a single-view DL model was also designed. Furthermore, we conducted ablation experiments and comparative experiments for the D_GR_LCT model, demonstrating the superiority of our proposed model in terms of effectiveness and performance.

All experimental results are based on the average values after five-fold cross-validation, with specific data detailed in Table 10.

TABLE 10. Average test results based on DL.

C. Results Based on Radiomics & DL

In the current study, with 3D data input into A-ML, radiomics achieved an average test AUC-ROC of 88.35% and AUC-PR of 88.73%. The best DL model identified in this study is our proposed D_GR_LCT, which achieved corresponding average test results of 88.15% AUC-ROC and 85.17% AUC-PR.

By combining the decision-level outputs of the best results from both radiomics and the DL model, Table 11 displays the fused results, with Figure 16 providing a visualization of the fusion outcomes. It can be observed from these representations that our proposed decision-level fusion method is effective for analyzing ovarian CT datasets in this experiment, yielding an impressive AUC-ROC as high as 91.35% and AUC-PR as high as 90.20%. The fused results demonstrate an average improvement of 3% for AUC-ROC and 1.47% for AUC-PR over radiomics alone; similarly, they show an average improvement of 3.2% for AUC-ROC and 5.03% for AUC-PR over the DL model.

TABLE 11. Test results of fusion model.

FIGURE 16. The visualization of the fusion outcomes.

In conclusion, the classification model proposed in this paper for distinguishing between benign and malignant ovarian tumors has produced significant outcomes in the fields of Radiomics, DL, and Radiomics & DL.

  1. A total of 1288-dimensional radiomics features were extracted when 3D data was used as input, including shape, first-order, second-order, and high-order texture features. To enrich the image information, LoG and Wavelet preprocessing were added. The radiomics feature set was then input into A-ML to automatically determine the optimal result, and experiments were designed to validate the effectiveness of these preprocessing operations.

  2. In terms of DL, this paper introduces the D_GR_LCT model — a dual-view (i.e., axial and coronal) global-local parallel analysis method based on ovarian CT images. The classification task is accomplished by extracting global representation and local information from different views. To capture dependencies between dual views and enhance information exchange among different view features, we have developed a CAttnT module for extracting local information. Comparative and ablation experiments have been conducted to demonstrate the effectiveness of our proposed model in this paper.

  3. To integrate the decision-level information of radiomics and the DL model (D_GR_LCT), this study further developed and implemented a comprehensive classification model for distinguishing between benign and malignant ovarian tumors using the Stacking method, as shown in Figure 13. The final experimental results demonstrate that the classification outcomes of the fused decision level are more accurate than those obtained from a single radiomics or a single DL model for benign and malignant ovarian tumors, thus directly validating the effectiveness of the proposed fusion method.

SECTION V.

Conclusion

OC is the most lethal gynecologic cancer, whereas benign ovarian tumors have a good prognosis; early detection of ovarian tumors is therefore vital. Currently, classifying ovarian tumors as benign or malignant relies heavily on pathological biopsy and imaging examinations, whose results can be subjective and poorly reproducible. To mitigate these factors and improve patient outcomes, this paper proposes a model for classifying benign and malignant ovarian tumors based on radiomics and DL.

In radiomics, different experiments were designed to obtain the best models. In DL, the maximum-ROI information was fully exploited to train on dual views in an end-to-end manner. A global-local parallel analysis method was adopted: after parallel analysis of the global representation and local features of the dual-view ovarian CT images, the benign-malignant classification task was completed using dual views as input for the first time. Moreover, this paper employs the stacking method for decision-level fusion, which produces more precise classification results than the individual radiomics or DL models.

However, some limitations of the current work require further improvement. To bring the proposed model into clinical practice, we will follow related research on dual-view classification of benign and malignant ovarian tumors. Additionally, given potential variations in acquisition equipment and parameters, the robustness of the model must be verified on additional data, such as multi-center datasets, before clinical use.

Overall, the proposed model is promising but requires further refinement before it can be widely implemented in clinical settings. Therefore, in our future work, we will continue to optimize the network structure and are eager to apply it in a wider range of medical scenarios.

Conflicts of Interest

None declared.
