
Single Low-Dose CT Image Denoising Using a Generative Adversarial Network With Modified U-Net Generator and Multi-Level Discriminator

Results of removing noise from the low-dose lung CT image.


Abstract:

Low-dose CT (LDCT) images have been widely applied in the medical imaging field due to the potential risk of exposing patients to X-ray radiation. Given that reducing the radiation dose may result in increased noise and artifacts, methods that can eliminate the noise and artifacts in the LDCT image have drawn increasing attention and produced impressive results over the past decades. However, recently proposed methods mostly suffer from residual noise, over-smoothed structures, or false lesions derived from noise. In this paper, we propose a generative adversarial network (GAN) with a novel architecture and loss function for restoring the LDCT image. Firstly, the inception-residual block and residual mapping are incorporated into the U-Net structure. The modified U-Net is applied as the generator of the GAN so that the noise features can be eliminated during forward propagation. Secondly, a novel multi-level joint discriminator is designed by concatenating multiple convolutional neural networks (CNNs), where the output of each deconvolutional layer in the generator is compared with the corresponding down-sampled ground truth image. With this discriminator, the adversarial training can be sensitive to noise and artifacts at different scales. Thirdly, we define a novel loss function consisting of the least square adversarial loss, the VGG-based perceptual loss, the MSE-based pixel loss and the noise loss, so that the differences in pixel values, visual perception and noise distribution are comprehensively considered to optimize the network. Experimental results on both simulated and official simulated clinical images demonstrate that the proposed method provides superior performance to the state-of-the-art methods in noise removal, structure preservation and false lesion elimination.


Published in: IEEE Access (Volume: 8)

Page(s): 133470 - 133487

Date of Publication: 02 July 2020

Electronic ISSN: 2169-3536

DOI: 10.1109/ACCESS.2020.3006512


CCBY - IEEE is not the copyright holder of this material. Please follow the instructions via https://creativecommons.org/licenses/by/4.0/ to obtain full-text articles and stipulations in the API documentation.

SECTION I.

Introduction

X-ray computed tomography (CT) plays an important role in medical imaging and has been widely used in modern clinical institutions. Due to the potential risk of genetic damage or cancer caused by patients’ exposure to radiation [1], [2], there is increasing concern about the radiation dose delivered to patients, leading to the well-known guiding principle of as low as reasonably achievable (ALARA). However, decreasing the radiation dose leads to extra noise and artifacts in the reconstructed image, degrading the diagnostic information. Therefore, many methods for removing noise and improving the quality of low-dose CT (LDCT) images have been proposed in the past decades, which can be generally divided into three categories: 1) sinogram domain filtering [3]–[6], 2) iterative reconstruction [7]–[10], and 3) image processing [11]–[15].

Sinogram domain filtering methods treat the 2-D sinogram signals as 2-D images and apply image processing methods to remove noise from them, such as the penalized weighted least-squares (PWLS) algorithm [4], bilateral filtering [5], and structural adaptive filtering [16]. In [6], Liu et al. attempted to remove noise through a 3-D representation-based feature decomposition of the projected attenuation component and the noise component using a well-designed composite dictionary containing atoms with discriminative features. The sinogram filtration methods perform well when the characteristics of noise in the data domain are well known. However, these methods lose structure and spatial resolution in the reconstructed images when small edges are filtered out. Moreover, the raw data are not always accessible for commercial scanners.

A wide range of iterative reconstruction (IR) algorithms have been proposed for LDCT reconstruction over the past decade and adopted by many major CT vendors [17]. In these algorithms, the scanner geometry and physical properties of the imaging process [7] are first simulated. Then statistical noise models [18] and prior information in the image domain, such as total variation (TV) and its variants [9], [19], as well as dictionary learning [10], [20], [21], are incorporated with the system model into an objective function whose optimum is taken as the reconstructed image. These algorithms improve the image quality considerably, but they still suffer from lost details and remaining artifacts. In addition, they incur a high computational cost due to their iterative nature.

Unlike the sinogram filtration and iterative reconstruction methods, image processing algorithms can be applied directly to LDCT images. For example, the well-known non-local means (NLM) method [22], which calculates the weighted average of similar neighbourhoods, was adapted by Ma et al. for CT image denoising [11]. The block-matching 3D (BM3D) algorithm [23] groups similar 2D CT image patches into 3D arrays, applies a 3D transform and filters the corresponding coefficients so as to remove noise efficiently in various image denoising tasks [13], [24]. Stemming from sparse representation, Chen et al. [12] proposed to adapt a patch-based K-SVD algorithm [25] to suppress mottled noise and streak artifacts in abdomen CT images. Although these methods can improve the image quality effectively, over-smoothing and residual errors still exist in the resulting image because the noise is non-uniformly distributed in CT images.

Deep learning (DL) methods have been shown to provide superior performance in many image-based applications [26] and have attracted increasing attention in the medical image processing field. Chen et al. [27] proposed a noise reduction method for low-dose CT by training a deep convolutional neural network (CNN) patch-by-patch. They later improved this work by integrating an autoencoder, a deconvolution network and residual blocks into the so-called residual encoder-decoder convolutional neural network (RED-CNN) [28]. Kang et al. [29] proposed a wavelet network by combining a deep CNN with a directional wavelet approach, leading to greater noise reduction and shorter reconstruction time. Gholizadeh-Ansari et al. [30] proposed a deep residual network with dilated convolution that can pass the signal to the higher layers through identity mappings and achieved good results with fewer layers and lower computational cost. Chen et al. [31] unfolded an existing iterative framework for CT reconstruction into a deep-learning network where the key parameters were learned from training samples. In [32], Liu et al. adopted stacked sparse denoising autoencoders to construct a low-dose CT restoration network, which was capable of not only suppressing noise but also recovering structural details. Yin et al. [33] proposed a domain progressive 3D residual convolution network (DP-ResNet) for the LDCT imaging procedure, containing a sinogram domain network (SD-net), filtered back projection (FBP) and an image domain network (ID-net), which could improve the performance of noise removal significantly.

Very recently, as the generative adversarial network (GAN) has become one of the most impressive variants of CNN for computer vision and pattern recognition [34], there have been increasing attempts to use GANs for image noise removal [35]–[37]. Specifically for LDCT restoration, Wolterink et al. [38] proposed to use a conditional GAN (CGAN) [39] where a network consisting of seven consecutive convolutional layers with small convolutional kernels was used as the generator, while the discriminator was a network aiming at differentiating real routine-dose CT images from denoised low-dose CT images by optimizing the cross-entropy loss. Nie et al. [40] employed a 3D fully convolutional network (FCN) together with residual connections as the generator and utilized an additional image gradient difference term in the loss function. In [41], the authors revealed that using the mean squared error (MSE) as the loss function would overlook subtle image features critical for human perception. To solve this problem, Yang et al. [42] proposed to replace the JS divergence by the Wasserstein distance to measure the differences between the distributions of generated images and ground truths. They also introduced the distances between features extracted by a pre-trained VGG-19 network [43] as the perceptual loss of the GAN. With the same Wasserstein adversarial loss and perceptual loss, Shan et al. [44] proposed a 2D Conveying Path-based Convolutional Encoder-decoder (CPCE) network and extended it to a 3D model, resulting in better performance in noise suppression and structure preservation. Liao et al. [45] incorporated a feature pyramid network (FPN)-based discriminator and a differentially modulated focus map into the least squares GAN (LSGAN) [46], outperforming other methods in correcting cone-beam artifacts in the image. Yi et al. [47] introduced a sharpness loss in addition to the adversarial loss and perceptual loss to ensure the final sharpness of the image and the faithful reconstruction of low-contrast regions. Du et al. [48] attempted to inject visual attention knowledge into the learning process of the GAN to provide a powerful prior on the noise distribution, so that the network would not only pay special attention to noisy regions and surrounding structures but also explicitly assess the local consistency of the recovered regions.

Although these GAN-based denoising methods can provide convincing performance, they suffer from the following drawbacks: 1) noise is transferred into the decoding blocks along the shortcut connections from the corresponding encoding blocks, leaving a large amount of noise in the generated image and even producing false lesions from the noise; 2) only the result from the final decoding block of the generator is sent to the discriminator, ignoring the impact of the results from previous decoding blocks on the final result; and 3) the Wasserstein distance is useful for stabilizing GAN training but not good enough for improving the image quality [49].

To overcome the shortcomings of the existing LDCT image denoising methods, in this work we propose a generative adversarial network with a novel architecture. As shown in Fig. 1, we modify the U-Net [50] as the generator by adding inception-residual blocks on each of the shortcut connections and replacing the concatenation by residual mapping. For the discriminator, we propose a novel architecture that combines the discriminating results from multiple CNNs, each of which independently distinguishes the output of a deconvolutional layer from the correspondingly down-sampled real normal-dose CT image, so that the adversarial training is sensitive to noise and artifacts at different scales. To further improve the performance of the proposed network, we define a loss function consisting of the following parts: 1) the least square loss as the adversarial loss, which can improve the stability of the training process; 2) the Euclidean distances between features extracted by the VGG-19 network as the perceptual loss, to make the generated image more similar to the true image under human visual perception; 3) the MSE between the generated image and the ground truth as the content loss, to bring the generated image closer to the true image at the pixel level; and 4) the MSE between the removed noise and the real noise as the noise loss, to make sure the noise is removed more precisely. The experimental results illustrate that our proposed network performs better than the state-of-the-art methods on two different public CT image datasets with respect to various evaluation criteria.

FIGURE 1.

The overall structure of the proposed GAN for LDCT image restoration.


The rest of this paper is organized as follows. We introduce some background knowledge in Section II. The architecture of our proposed network is described in Section III. The experimental results are then presented and discussed in Section IV. Finally, we conclude our work in Section V.

SECTION II.

Background

A. Noise Reduction Model

Given a normal-dose CT (NDCT) image $I_{ND}\in R^{w\times h}$ of size $w\times h$, the generation of the corresponding LDCT image $I_{LD}\in R^{w\times h}$ can be simplified as follows:

$\begin{equation*} I_{LD}=N(I_{ND}),\tag{1}\end{equation*}$

where $N:R^{w\times h} \rightarrow R^{w\times h}$ represents the degradation caused by quantum noise and other factors. Typically, the denoising process is defined as:

$\begin{equation*} I_{ND} = N^{-1}(I_{LD}),\tag{2}\end{equation*}$

where $N^{-1}$ denotes the inverse function of $N$. Due to the complicated reconstruction scheme from the noisy photon measurements to the LDCT image, it is not clear how the LDCT image is related to the NDCT image, making it difficult to estimate the noise modeling function $N$ as well as the denoising function $N^{-1}$. However, deep-learning-based methods provide an alternative approach by ignoring the noise model and learning a neural network as a mapping function:

$\begin{equation*} \widehat {I_{ND}}=\mathcal {F}(I_{LD};\Theta),\tag{3}\end{equation*}$

where $\Theta$ denotes the parameters of the optimal network $\mathcal {F}$. Therefore, noise reduction actually aims to solve the following problem:

$\begin{equation*} \mathcal {F}(I_{LD};\Theta)=\arg \min _{\theta }\left \|{ f(I_{LD};\theta)-I_{ND} }\right \|^{2}_{2},\tag{4}\end{equation*}$

where $f$ is the network with the parameters $\theta$ trained using deep learning methods.

B. Least Square Generative Adversarial Network (LSGAN)

In contrast to the original GAN, which employs a sigmoid cross-entropy loss and suffers from vanishing gradients and unstable training, LSGAN uses the least square loss with an $a$-$b$ coding scheme for the discriminator, where $a$ and $b$ denote the labels for generated and real data, which are commonly set to 0 and 1, respectively. The objective functions for LSGAN can be defined as:

$\begin{align*} \min _{D}V_{LSGAN}(D)=&\frac {1}{2}\mathrm {E}_{x\sim p_{data}(x)}[(D(x)-b)^{2}] \\&+\,\frac {1}{2}\mathrm {E}_{z\sim p_{z}(z)}[(D(G(z))-a)^{2}] \\ \min _{G}V_{LSGAN}(G)=&\frac {1}{2}\mathrm {E}_{z\sim p_{z}(z)}[(D(G(z))-c)^{2}],\tag{5}\end{align*}$

where $x$ is the true data, $p_{data}(x)$ is the probability distribution of $x$, $z$ denotes the noise data used to generate the predicted data $G(z)$, $p_{z}$ is the probability distribution of $z$, and $\mathrm {E}$ is the expectation of the data under the given probability distribution. $c$ represents the value that $G$ expects $D$ to believe for the generated data, which is usually set to 1. Therefore, the generator $G$ could generate samples as close as possible to the ground truth.
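As a minimal illustration of Eq. 5, the two least square objectives can be written in a few lines of NumPy. The function names and the sample scores below are hypothetical and serve only to make the formula concrete; the labels follow the common choice $a=0$, $b=1$ and $c=1$ mentioned above.

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake, a=0.0, b=1.0):
    """Discriminator objective of Eq. 5: real scores are pulled to b, fake scores to a."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return 0.5 * np.mean((d_real - b) ** 2) + 0.5 * np.mean((d_fake - a) ** 2)

def lsgan_g_loss(d_fake, c=1.0):
    """Generator objective of Eq. 5: fake scores are pulled to c."""
    d_fake = np.asarray(d_fake, dtype=float)
    return 0.5 * np.mean((d_fake - c) ** 2)

# Hypothetical discriminator outputs for a batch of two samples.
d_real = [0.9, 1.1]
d_fake = [0.2, 0.0]
print(lsgan_d_loss(d_real, d_fake))  # small: D already separates real from fake
print(lsgan_g_loss(d_fake))          # large: G has not fooled D yet
```

Because the penalty grows quadratically with the distance from the target label, samples that lie far from the decision boundary still produce a gradient, which is the property that motivates the least square loss.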

Inspired by the model of noise reduction with a neural network and the advantages of LSGAN, we make use of LSGAN to remove noise in the LDCT image. The details of the proposed LSGAN architecture are discussed in the following section.

SECTION III.

Network Architecture

A. Objective

As discussed in the above section, we take advantage of the LSGAN as the basic structure of our proposed network. Fig. 1 shows the overall architecture of the proposed method, where the generator $G$ is trained to transform an LDCT image $x_{l}$ (Fig. 2(a)) into a de-noised image (Fig. 2(c)) similar to the corresponding NDCT image $x_{n}$ (Fig. 2(b)). The discriminator $D$, consisting of the sub-discriminators $D_{k}$, is used for the adversarial training, while the perceptual network, the MSE between $x_{n}$ and $G(x_{l})$, and the MSE between the noise in $x_{l}$ and the noise estimated from $G(x_{l})$ are used for additional perceptual and structural normalization. Similar to Eq. 5, the adversarial objective functions for the proposed model can be expressed as:

$\begin{align*} \min _{D}\mathcal {L}_{A}(D;G)=&\sum _{k=1}^{N} \min _{D}\mathcal {L}_{A}(D_{k};G) \\=&\sum _{k=1}^{N} \mathrm {E}_{x_{n}\sim p(x_{n})}\left [{ \left \|{ D_{k}(x_{n})-1 }\right \|^{2} }\right] \\&+\, \mathrm {E}_{x_{l}\sim p(x_{l})}\left [{ \left \|{ D_{k}(G(x_{l})) }\right \|^{2} }\right],\tag{6}\\ \min _{G}\mathcal {L}_{A}(G;D)=&\sum _{k=1}^{N} \min _{G}\mathcal {L}_{A}(G;D_{k}) \\=&\sum _{k=1}^{N} \mathrm {E}_{x_{l}\sim p(x_{l})}\left [{ \left \|{ D_{k}(G(x_{l}))-1 }\right \|^{2} }\right],\tag{7}\end{align*}$

where $D_{k}$ is the $k$-th sub-discriminator used for the training at the $k$-th scale, which will be detailed in Sec. III-C. $\mathrm {E}_{x_{n}\sim p(x_{n})}$ and $\mathrm {E}_{x_{l}\sim p(x_{l})}$ denote the expectation over the sampled true NDCT image data $x_{n}$ and the sampled input LDCT image data $x_{l}$, respectively. The generator $G$ tries to synthesize a virtual NDCT image as “real” as possible to “fool” the discriminator $D$, while $D$ is trained to differentiate the true NDCT image $x_{n}$ from the generated “fake” NDCT image $G(x_{l})$.
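The summation over scales in Eqs. 6 and 7 can be sketched as follows. This is an illustrative NumPy version, not the training code: the function names and the scores are hypothetical, and the second expectation of Eq. 6 is assumed to sit inside the sum over $k$, which the printed equation leaves ambiguous.

```python
import numpy as np

def adv_loss_d(real_scores, fake_scores):
    """Eq. 6: sum over scales k of E[(D_k(x_n) - 1)^2] + E[D_k(G(x_l))^2]."""
    total = 0.0
    for real_k, fake_k in zip(real_scores, fake_scores):
        real_k = np.asarray(real_k, dtype=float)
        fake_k = np.asarray(fake_k, dtype=float)
        total += np.mean((real_k - 1.0) ** 2) + np.mean(fake_k ** 2)
    return total

def adv_loss_g(fake_scores):
    """Eq. 7: sum over scales k of E[(D_k(G(x_l)) - 1)^2]."""
    return sum(np.mean((np.asarray(f, dtype=float) - 1.0) ** 2) for f in fake_scores)

# Hypothetical outputs of N = 2 sub-discriminators for a batch of two images.
real_scores = [[1.0, 0.8], [0.9, 0.9]]
fake_scores = [[0.0, 0.2], [0.1, 0.1]]
print(adv_loss_d(real_scores, fake_scores))
print(adv_loss_g(fake_scores))
```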

FIGURE 2.

The training image pairs of the proposed network. (a) is the input noisy image, (b) is the ground truth image without noise, and (c) is the “fake” de-noised image generated by the proposed network.


It is widely recognized that using only the adversarial loss is far from enough for noise removal. Therefore, we follow the approach proposed in [43] and apply VGG-19 to calculate the perceptual loss for better training performance. The perceptual loss addresses the situation in which two images look the same to human beings but are quite different mathematically [51]. The perceptual loss is calculated as follows:

$\begin{equation*} \mathcal {L}_{p}=\frac {1}{N_{i}} \left \|{ \phi ^{(i)}(x_{n})-\phi ^{(i)}(G(x_{l})) }\right \|_{1},\tag{8}\end{equation*}$

where $\phi ^{(i)}(\cdot)$ represents the feature maps extracted by the $i$-th layer of the pre-trained VGG-19 net $\phi$, and $N_{i}$ is the number of elements in $\phi ^{(i)}(\cdot)$. In this work, we use the original pre-trained VGG-19 network and set $i=8$ because it empirically works well. Feature maps from the layers lower than 8 are too shallow to reflect the global information while those higher than 8 are too deep to preserve the structure details.
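Once the feature maps are available, Eq. 8 reduces to a mean absolute difference. The sketch below uses random arrays as stand-ins for the VGG-19 features, since loading the pre-trained network is outside its scope; the function name and array shapes are assumptions made for illustration.

```python
import numpy as np

def perceptual_loss(feat_real, feat_fake):
    """Eq. 8: L1 distance between two feature maps divided by the element count N_i."""
    feat_real = np.asarray(feat_real, dtype=float)
    feat_fake = np.asarray(feat_fake, dtype=float)
    return np.abs(feat_real - feat_fake).sum() / feat_real.size

# Stand-in feature maps; in practice they would come from layer i = 8 of VGG-19.
rng = np.random.default_rng(0)
feat_real = rng.random((4, 4, 8))
feat_fake = feat_real + 0.5   # every feature shifted by 0.5
print(perceptual_loss(feat_real, feat_fake))  # close to 0.5
print(perceptual_loss(feat_real, feat_real))  # identical features give zero loss
```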

Together with the above two loss functions, the traditional mean square error (MSE) between the generated image $G(x_{l})$ and ground truth $x_{n}$ is used as the content loss which is formulated as follows:

$\begin{equation*} \mathcal {L}_{c}=\frac {1}{N_{x}}(x_{n}-G(x_{l}))^{2},\tag{9}\end{equation*}$

where $N_{x}$ is the total number of pixels in the image. The reason we use MSE (L2-norm) as the loss function instead of the mean absolute error (MAE, L1-norm) is that the MSE loss is more sensitive to the subtle differences between the recovered image and the ground truth caused by noise or nodules.

Furthermore, we propose to measure the difference between the noise map of LDCT image and the noise removed by the generator, which is computed as:

$\begin{equation*} \mathcal {L}_{n}=\frac {1}{N_{x}}(\left |{x_{n}-x_{l} }\right |-\left |{G(x_{l})-x_{l} }\right |)^{2},\tag{10}\end{equation*}$

where $N_{x}$, $x_{n}$ and $x_{l}$ are the same as noted above. By adding the noise loss to the total loss function, the network directly takes the amount of noise into consideration. The generator attempts to produce an image for which the noise removed from the input image is as close as possible to the noise added to the ground truth image, so that the output is as similar to the ground truth image as possible. To the best of our knowledge, this is the first attempt to include the noise amount in the loss function for medical image restoration.
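The content loss of Eq. 9 and the noise loss of Eq. 10 can be illustrated together on a toy example. The sketch assumes that both expressions are summed over all pixels before dividing by $N_{x}$, i.e. that they are per-pixel means; the function names and the constant-offset "noise" are hypothetical simplifications.

```python
import numpy as np

def content_loss(x_n, g_out):
    """Eq. 9: mean squared error between the NDCT image and the generator output."""
    return np.mean((x_n - g_out) ** 2)

def noise_loss(x_n, x_l, g_out):
    """Eq. 10: MSE between the true noise magnitude |x_n - x_l|
    and the removed noise magnitude |G(x_l) - x_l|."""
    return np.mean((np.abs(x_n - x_l) - np.abs(g_out - x_l)) ** 2)

x_n = np.zeros((4, 4))   # toy clean image
x_l = x_n + 0.2          # toy noisy image (constant offset, not real CT noise)
g_out = x_n + 0.1        # hypothetical output that removes half of the offset

print(content_loss(x_n, g_out))     # about 0.01
print(noise_loss(x_n, x_l, g_out))  # about 0.01
print(noise_loss(x_n, x_l, x_n))    # 0.0 when the removed noise equals the true noise
```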

Given the above losses, the objective of our proposed network is:

$\begin{align*}\mathcal {L}=&\arg \min _{G}\min _{D}(\lambda _{1}\mathcal {L}_{A}(D;G)+\lambda _{2}\mathcal {L}_{A}(G;D)+\lambda _{3}\mathcal {L}_{p}(G) \\&\qquad \qquad \qquad \qquad \qquad {+\,\lambda _{4}\mathcal {L}_{c}(G)+\lambda _{5}\mathcal {L}_{n}(G)),} \tag{11}\end{align*}$

where $\lambda _{1}$, $\lambda _{2}$, $\lambda _{3}$, $\lambda _{4}$ and $\lambda _{5}$ denote the weights of different losses so that the training could be balanced.
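The weighted combination in Eq. 11 can be sketched as below, using the weights reported in Sec. IV-B as defaults. Splitting the objective into a generator-side and a discriminator-side function is an assumption that mirrors the alternating updates of Sec. III-D; the function names and the input loss values are purely illustrative.

```python
def generator_objective(adv_g, perceptual, content, noise,
                        lam2=0.1, lam3=0.01, lam4=0.001, lam5=0.001):
    """Terms of Eq. 11 that depend on G, combined as a weighted sum."""
    return lam2 * adv_g + lam3 * perceptual + lam4 * content + lam5 * noise

def discriminator_objective(adv_d, lam1=0.1):
    """Term of Eq. 11 that is minimized with respect to D."""
    return lam1 * adv_d

# Hypothetical loss values for one training batch.
print(generator_objective(adv_g=1.0, perceptual=2.0, content=3.0, noise=4.0))
print(discriminator_objective(adv_d=0.5))
```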

B. Generator

Fig. 3 shows the architecture of the proposed generator, where the U-Net [50] is used as the basic structure since it can recover fine-grained details well in the generated image. However, noise in each convolution layer is transferred to the deconvolution layer without any “filtration” in the shortcut connection, leaving much noise in the generated image. To solve this problem, we apply an inception-residual block to each shortcut connection and change the connection mode from concatenation to residual mapping so as to filter out as much noise as possible.

FIGURE 3.

The structure of the proposed generator.


1) Inception-Residual Block

The inception-residual block was proposed by Szegedy et al. in [52] and is well known for its ability to reflect the inception structure of an image. It combines the advantages of both the inception architecture [53] and residual connections [54] so that it can capture multi-scale visual features and remove noise features while retaining high computational efficiency. In this work, we propose to use 4 inception-residual blocks with different sizes to process 4 different convolution layers before connecting them to the corresponding deconvolution layers. Fig. 4(a) shows the Inception-ResNet-A block used on the first two convolution-deconvolution shortcut connections, where the parameters are modified to fit the sizes of the first two convolution layers in the proposed U-Net structure. Fig. 4(b) shows the Inception-ResNet-B block used on the last two convolution-deconvolution shortcut connections, where we also adjust the parameters to make the block consistent with the sizes of the last two convolution layers in the U-Net structure. The reason for using different blocks for different shortcut connections is that the noise decreases as the U-Net goes deeper, so applying an overly complicated inception block to a relatively simple input is unnecessary and would over-smooth the image.

FIGURE 4.

The structure of the proposed inception-residual blocks used in the generator. (a) is the Inception-ResNet-A module used on the shortcut connection routines for the first two convolution layers, (b) is the Inception-ResNet-B module used on the shortcut connection routines for the last two convolution layers.


2) Residual Mapping

In a traditional U-Net, each convolution layer is concatenated with the corresponding deconvolution layer, which usually leads to model degradation and parameter explosion, reducing the training efficiency and accuracy. Inspired by the outstanding denoising performance of ResNet50 and its variants [54], we change the mode of the shortcut connection so that the proposed generator can produce an image with less noise. As shown in Fig. 5, the convolution layer processed by the inception-residual block is added directly to the corresponding deconvolution layer, which is mathematically expressed as follows:

$\begin{equation*} SC_{i}=Conv_{K-i}+Deconv_{i},\quad i=1,2,3,4\tag{12}\end{equation*}$

where $SC_{i}$ is the result of the $i$-th shortcut connection, which is the sum of the $i$-th deconvolution layer and the corresponding $(K-i)$-th convolution layer. $K$ is the total number of the convolution or deconvolution layers. By changing the connection mode, the network could go deeper than the conventional U-Net so that more noise could be removed from the generated image.
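The practical difference between the two connection modes is visible in the tensor shapes. In the sketch below, the 96 channels follow the filter number given in Sec. IV-B, while the $64\times 64$ spatial size and the random feature values are arbitrary assumptions; the inception-residual block that processes the convolution features before the addition is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
conv_feat = rng.random((1, 64, 64, 96))    # encoder features (batch, height, width, channels)
deconv_feat = rng.random((1, 64, 64, 96))  # decoder features of the same shape

concat = np.concatenate([conv_feat, deconv_feat], axis=-1)  # traditional U-Net shortcut
residual = conv_feat + deconv_feat                           # residual mapping of Eq. 12

print(concat.shape)    # (1, 64, 64, 192)
print(residual.shape)  # (1, 64, 64, 96)
```

Concatenation doubles the number of input channels of the following layer, whereas the element-wise sum keeps it unchanged, so the following layer needs half as many input weights.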

FIGURE 5.

The structure of the proposed residual mapping used for connecting the convolution layer and the corresponding deconvolution layer. (a) is the concatenation connection used in the traditional U-Net structure, (b) is the residual mapping connection used in the modified U-Net structure as generator.


Some key parameters of the layers in the proposed generator are listed in Table 1. With the proposed inception-residual blocks on the shortcut connections and the residual mapping mode of connection, the modified U-Net performs well in removing noise. The experiments comparing the contributions of different parts of the generator will be described in Sec. IV.

TABLE 1 Parameters of Some Key Layers in the Proposed Generator Structure

TABLE 2 Distribution of the Samples in the Simulated Dataset

C. Discriminator

Multi-layer convolutional neural networks (CNNs) are usually used as the discriminators of traditional LSGANs, where the inputs are both the generated images and the ground truth images. Considering that convolution only preserves the image structure [55], differentiating the generated image from the ground truth using a single CNN would lose the discrimination of image details, resulting in a loss of details in the generated image. To solve this problem, we propose a multi-level joint discriminator consisting of several sub-discriminators. As shown in Fig. 6, every individual sub-discriminator is an $n$-layer CNN and takes a deconvolution layer of the generator and the correspondingly down-sampled ground truth image as inputs. Directly comparing the generated image and the ground truth image with a CNN could only reflect their perceptual similarity in general, omitting the measurement of how well image details are restored. Therefore, we first calculate the difference between the input image and the ground truth image, and the difference between the input image and the generated image, which reflect the details lost from the ground truth image and the details recovered by the generator, respectively. Then the two differences are compared in the adversarial loss to measure the ability of the generator to recover the details lost in the LDCT image relative to the NDCT image at the corresponding level. Finally, the results of these sub-discriminators are fused to derive the final discriminating result, which can be mathematically expressed as follows:

$\begin{equation*} D = \frac {1}{N}(D_{1}+D_{2}+\cdots +D_{N}),\tag{13}\end{equation*}$

where $D_{n},n=1,2, {\dots },N$ represent the discrimination score between the $n$-th deconvolution layer and the corresponding “down-sampled” ground truth image. $D$ is the final score of the whole discriminator. The lower the value is, the more similar the deconvolution layer and the image are. Since the “down-sampling” could preserve more details from the image than the convolution operation, the image generated by iteratively “up-sampling” the deconvolution layers that are close to the corresponding “down-sampled” ground truth could be close to the ground truth image with respect to the details.
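A rough sketch of the two ingredients of Eq. 13 follows: building the down-sampled ground truth images and averaging the sub-discriminator scores. The down-sampling operator is not specified in the text, so average pooling is used here purely as an assumption; the function names, the toy image and the scores are likewise hypothetical.

```python
import numpy as np

def downsample(img, factor):
    """Average-pool a 2D image by an integer factor (pooling type is an assumption)."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def joint_score(sub_scores):
    """Eq. 13: arithmetic mean of the N sub-discriminator scores."""
    return float(np.mean(sub_scores))

x_n = np.arange(64, dtype=float).reshape(8, 8)     # toy ground truth image
pyramid = [downsample(x_n, f) for f in (1, 2, 4)]  # one target per discriminator level
print([p.shape for p in pyramid])                  # [(8, 8), (4, 4), (2, 2)]
print(joint_score([0.2, 0.4, 0.6]))                # mean of three hypothetical scores
```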

FIGURE 6.

The structure of the proposed multi-level discriminator.


D. Network Training

The pair-wise 2D slices of the recovered image $G(x_{l})$ from the generator $G$ and the ground truth image $x_{n}$ are fed to the pre-trained VGG-19 network to extract features and calculate the perceptual loss $\mathcal {L}_{p}$. Together with the content loss $\mathcal {L}_{c}$, the noise loss $\mathcal {L}_{n}$, and the adversarial loss $\mathcal {L}_{A}$ from the initial discriminator network, the objective loss is computed according to Eq. 11 and back-propagated to update the weights of $G$ while the parameters of the discriminator network are fixed. After that, we calculate the objective loss using the image generated by the updated generator and the ground truth image. The reconstruction error is used to update the weights of $D$ only, while the parameters of $G$ are kept unchanged. The training of $G$ and $D$ is performed alternately until convergence.

SECTION IV.

Experiments

To evaluate the effectiveness of the proposed method, we apply it to both simulated low-dose CT images and official simulated low-dose CT images and compare its performance with several state-of-the-art image reconstruction algorithms, including K-SVD [25], BM3D [23], KAIST-NET [29], RED-CNN [28], WGAN-VGG [42] and SAGAN [47]. For quantitative analysis, the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) are used as the evaluation metrics. The PSNR is commonly used to measure the pixel-wise differences between two images, whereas the SSIM reflects the differences in human visual perception. For qualitative analysis, a reader study is performed on 10 groups of images in terms of artifact reduction, noise suppression, contrast retention, lesion discrimination and overall quality. In the following, we introduce the datasets, the parameter settings and the implementation of the experiments, and discuss the experimental results of comparing the proposed method with state-of-the-art methods with respect to different evaluation metrics.
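For reference, the PSNR follows directly from the mean squared error. The sketch below is a generic NumPy version and not the evaluation code used in the experiments; the function name, the `data_range` argument and the toy images are assumptions. SSIM is left out because it relies on windowed local statistics that do not fit in a few lines.

```python
import numpy as np

def psnr(reference, test, data_range):
    """Peak signal-to-noise ratio in dB for images with the given dynamic range."""
    reference = np.asarray(reference, dtype=float)
    test = np.asarray(test, dtype=float)
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)

reference = np.zeros((8, 8))
test = np.full((8, 8), 0.1)                   # uniform error of 0.1, so MSE is about 0.01
print(psnr(reference, test, data_range=1.0))  # close to 20 dB
```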

A. Experimental Datasets

1) Simulated Dataset

For the simulated noisy data, 4036 normal-dose CT images with the size of $512\times 512$ are downloaded from The Cancer Imaging Archive (TCIA) [56], covering different parts of the human body for diversity. A fan-beam geometry with 937 detectors and 1200 views is used to transform every NDCT image into a sinogram. The corresponding LDCT images are then produced by adding noise to these simulated sinograms. Usually, the photons hitting the detector are treated as Poisson distributed, while the electrical noise in low-dose cases is regarded as Gaussian distributed. However, we discard the electrical noise for simplicity, and the projection measurements $N$ from a low-dose CT scan can therefore be expressed as:

$\begin{equation*} N \sim \text {Poisson}(N_{0}\text {exp}(-y)),\tag{14}\end{equation*}$

where $N_{0}$ is the X-ray source intensity called blank flux, and $y$ denotes the sinogram data.

In our experiments, the blank scan flux $N_{0}$ is set to $1 \times 10^{4}$ and $1 \times 10^{5}$ to simulate the effects of different dose levels. As shown in Fig. 2, for each blank scan flux level, 4036 low-dose CT images are generated from the normal-dose ones. Since we perform single image restoration, we do not make use of the information across the 2D CT image series. Therefore, 3740 of the normal-dose CT images and the corresponding simulated low-dose images are randomly selected as the training set, while the remaining image pairs are used as the testing set.
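The noise model of Eq. 14 can be simulated in a few lines. The sketch below is an illustration under stated assumptions rather than the simulation pipeline itself: the constant toy sinogram, the clamping of zero counts, and the log transform back to line integrals are not described in the text, and the forward projection and reconstruction steps are omitted.

```python
import numpy as np

def add_poisson_noise(sinogram, n0, rng):
    """Draw photon counts following Eq. 14 and convert them back to a noisy sinogram."""
    counts = rng.poisson(n0 * np.exp(-sinogram))  # N ~ Poisson(N0 * exp(-y))
    counts = np.maximum(counts, 1)                # avoid log(0) in photon-starved bins (assumption)
    return -np.log(counts / n0)                   # noisy line integrals

rng = np.random.default_rng(0)
sinogram = np.full((1200, 937), 2.0)              # toy sinogram: 1200 views x 937 detectors
noisy_1e5 = add_poisson_noise(sinogram, 1e5, rng)
noisy_1e4 = add_poisson_noise(sinogram, 1e4, rng)

# A lower blank scan flux means fewer photons and therefore stronger noise.
print(noisy_1e5.std(), noisy_1e4.std())
```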

2) Official Simulated Clinical Data

For the official simulated clinical data, we use the dataset authorized by Mayo Clinic for “the 2016 NIH-AAPM-Mayo Clinic Low Dose CT Grand Challenge”, which contains 2378 full-dose and quarter-dose $512\times 512$ CT images from 10 patients [57]. Cross-validation is utilized in the testing phase; that is, the full-dose and quarter-dose CT image pairs from one patient are used as the testing set while the image pairs from the other 9 patients are used as the training set, and this is repeated for each patient. Since we perform single image restoration as discussed above, every image from the patients is treated as an independent case, so the number of images is large enough for network training.
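The patient-wise cross-validation described above can be sketched as a simple generator of folds. The patient identifiers are hypothetical placeholders, not the identifiers used in the challenge dataset.

```python
def leave_one_patient_out(patient_ids):
    """Yield (test_patient, train_patients) pairs, holding out one patient per fold."""
    for held_out in patient_ids:
        train = [p for p in patient_ids if p != held_out]
        yield held_out, train

patients = [f"P{i:02d}" for i in range(1, 11)]  # hypothetical IDs for the 10 patients
folds = list(leave_one_patient_out(patients))
print(len(folds))        # 10 folds, one per patient
print(len(folds[0][1]))  # 9 training patients in each fold
```

Splitting by patient rather than by image keeps slices of the same patient from appearing in both the training and testing sets.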

B. Parameter Setting

In our experiments, several parameter combinations are evaluated and the parameters are empirically set as follows. The base learning rate is set to $10^{-4}$ and slowly decreased to $10^{-5}$. The convolution and deconvolution kernels are initialized from random Gaussian distributions with zero mean and standard deviation 0.01. The number of filters in the last layer is set to 1 and in all other layers to 96. The kernel size of all layers is set to $5 \times 5$. The strides of convolution and deconvolution are set to 1 with no padding. In the loss function given by Eq. 11, $\lambda _{1}$ and $\lambda _{2}$ are set to 0.1, $\lambda _{3}$ is set to 0.01, and $\lambda _{4}$ and $\lambda _{5}$ are set to 0.001. Note that these parameters are so far set empirically; finding a principled way to choose them is left to future work.
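To make the effect of the kernel size, stride and padding settings concrete, the sketch below applies the standard output-size formulas for an unpadded convolution and an unpadded transposed convolution. The function names are hypothetical, and a single convolution/deconvolution pair is used only for illustration, since the number of layers is not given in this section.

```python
def conv_output_size(size, kernel=5, stride=1):
    """Spatial size after a convolution with no padding."""
    return (size - kernel) // stride + 1


def deconv_output_size(size, kernel=5, stride=1):
    """Spatial size after a transposed convolution with no padding."""
    return (size - 1) * stride + kernel


size = 512
size = conv_output_size(size)
print(size)  # prints "508"
size = deconv_output_size(size)
print(size)  # prints "512"
```

With $5 \times 5$ kernels and stride 1, each unpadded convolution removes 4 pixels per dimension and each matching deconvolution restores them, so a symmetric encoder-decoder returns to the input size.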

All the experiments are performed in Python with the TensorFlow and Keras libraries on an Intel Xeon Silver 4110 2.1 GHz PC with 32 GB of RAM and an NVIDIA TITAN Xp graphics processing unit with 12 GB of memory.

C. Comparator Methods

Several state-of-the-art methods are compared with our proposed method, including K-SVD [25], BM3D [23], KAIST-Net [29], RED-CNN [28], WGAN-VGG [42] and SAGAN [47]. Dictionary learning (K-SVD) [25] and BM3D [23] are two of the most popular image-based denoising methods and are already widely applied to LDCT. KAIST-Net [29] is one of the most recently proposed CNN-based LDCT denoising methods and can be considered a deepened variant of the lightweight CNN model [27]. RED-CNN [28] successfully applies the U-Net structure [50] to medical image denoising, replacing the pooling/unpooling layers of U-Net with convolution/deconvolution pairs. WGAN-VGG [42] and SAGAN [47] are both state-of-the-art image reconstruction methods based on the GAN structure. WGAN-VGG adopts the WGAN structure to generate the de-noised image and combines the perceptual loss calculated by the VGG network with the WGAN adversarial loss to preserve the image content after de-noising. SAGAN uses a U-Net structure with long skip connections as the generator of the GAN and proposes a sharpness detection network to calculate a sharpness loss that complements the perceptual loss and adversarial loss. KAIST-Net, RED-CNN, WGAN-VGG and SAGAN are fine-tuned using our training samples, and their hyper-parameters are adjusted according to our experimental experience.

D. Implementation of Experiments

The experiments are implemented as follows:

1) the test LDCT images in the different datasets are processed by the proposed method, as well as by the comparator state-of-the-art methods, to provide reconstructed NDCT images;
2) a blind reader study is performed on 10 groups of images for qualitative analysis. The images processed by different methods are sent to two radiologists, who independently score every image in terms of artifact reduction, noise suppression, contrast retention, lesion discrimination and overall quality on a five-point scale (1 = unacceptable and 5 = excellent). The mean and standard deviation of the scores from the two radiologists are calculated as the final evaluation results;
3) for quantitative analysis, the de-noised images $I_{r}$ are first compared with the ground truth $G$ to compute the root mean squared error (RMSE):
$\begin{equation*} \text {RMSE}=\sqrt {\frac {1}{m\times n}\sum _{i=0}^{m-1}\sum _{j=0}^{n-1}[I_{r}(i,j)-G(i,j)]^{2}},\tag{15}\end{equation*}$
where $m$ and $n$ represent the width and height of the image. Then we calculate the peak signal-to-noise ratio (PSNR) as follows:
$\begin{equation*} \text {PSNR}=10\cdot \log _{10}\left({\frac {\text {MAX}_{I_{r}}^{2}}{\text {MSE}}}\right),\tag{16}\end{equation*}$
where $\text {MAX}_{I_{r}}$ is the maximum possible pixel value of the image, set to 255 in our experiments since the pixels are represented using 8 bits per sample, and MSE is the mean squared error, i.e., the square of the RMSE in Eq. 15. PSNR is used to evaluate the performance of the proposed method in removing noise, while the RMSE is used to assess its ability to preserve small nodules without discarding them as noise;
4) the structural similarity index (SSIM) is used to evaluate the performance of the proposed method in reconstructing images $I_{r}$ that are perceptually similar to the ground truth images $G$. SSIM is mathematically defined as:
$\begin{equation*} \text {SSIM}(I_{r},G)=\frac {(2\mu _{I_{r}}\mu _{G}+C_{1})(2\sigma _{I_{r}G}+C_{2})} {(\mu _{I_{r}}^{2}+\mu _{G}^{2}+C_{1})(\sigma _{I_{r}}^{2}+\sigma _{G}^{2}+C_{2})},\tag{17}\end{equation*}$
where $\mu _{I_{r}}$ and $\mu _{G}$ are the local means, $\sigma _{I_{r}}$ and $\sigma _{G}$ the local standard deviations, and $\sigma _{I_{r}G}$ the cross-covariance of images $I_{r}$ and $G$. $C_{1}=(k_{1}L)^{2}$ and $C_{2}=(k_{2}L)^{2}$ are constants that stabilize the division when the denominator is weak, where $L$ is the dynamic range of the pixel values, set to 255, and $k_{1}$ and $k_{2}$ are set to 0.01 and 0.03 in our experiments.
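The three metrics in Eqs. 15 to 17 can be sketched in plain Python as follows. This is an illustrative sketch, not the evaluation code used in the paper: images are assumed to be lists of rows, the function names are hypothetical, and `ssim_global` evaluates Eq. 17 once over the whole image instead of over local windows, which is a simplification.

```python
import math


def _flatten(image):
    """Turn a 2-D image (list of rows) into a flat list of floats."""
    return [float(v) for row in image for v in row]


def rmse(restored, truth):
    """Root mean squared error of Eq. 15."""
    r, g = _flatten(restored), _flatten(truth)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r, g)) / len(r))


def psnr(restored, truth, max_value=255.0):
    """PSNR of Eq. 16 in dB, with MSE taken as the square of the RMSE."""
    mse = rmse(restored, truth) ** 2
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def ssim_global(restored, truth, k1=0.01, k2=0.03, dynamic_range=255.0):
    """Eq. 17 evaluated with one window covering the whole image (simplification)."""
    r, g = _flatten(restored), _flatten(truth)
    n = len(r)
    mu_r, mu_g = sum(r) / n, sum(g) / n
    var_r = sum((a - mu_r) ** 2 for a in r) / n
    var_g = sum((b - mu_g) ** 2 for b in g) / n
    cov = sum((a - mu_r) * (b - mu_g) for a, b in zip(r, g)) / n
    c1, c2 = (k1 * dynamic_range) ** 2, (k2 * dynamic_range) ** 2
    return ((2 * mu_r * mu_g + c1) * (2 * cov + c2)) / (
        (mu_r ** 2 + mu_g ** 2 + c1) * (var_r + var_g + c2))


# Hypothetical 2 x 2 example, not data from the paper.
truth = [[10, 20], [30, 40]]
restored = [[12, 18], [33, 40]]
print(round(rmse(restored, truth), 4))  # prints "2.0616"
```

For a perfect reconstruction these functions return an RMSE of 0, an infinite PSNR and an SSIM of 1, which are the ground-truth reference values for these metrics.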

E. Examinations of Design Options

Table 3 and Table 4 report the performance of using different modules in the proposed network on the testing data from the two datasets. Fig. 7 shows an example LDCT image processed by the network with different strategies. The comparisons demonstrate that the inception-residual blocks (IRB), the residual mapping (RM), the multi-level joint discriminator (MLJD) and the combination of adversarial loss, perceptual loss, MSE loss and noise loss (CL) can significantly improve the image restoration in terms of PSNR and SSIM. Fig. 8 shows the absolute differences between the images processed by different methods (Fig. 7(b) to (g)) and the normal-dose CT image (Fig. 7(a)), where it can be observed more clearly that the proposed method provides the smallest difference, indicating that it preserves the most details and suppresses the most noise and artifacts.

TABLE 3 Quantitative Results (Mean±Sd of PSNR, SSIM and MSE) Associated With Different Modules in the Proposed Network for the Images in the Simulated Testing Dataset

TABLE 4 Quantitative Results (Mean±Sd of PSNR, SSIM and MSE) Associated With Different Modules in the Proposed Network for the Images in the MAYO Clinical Testing Dataset

FIGURE 7.

Results of removing noise from the low-dose lung CT image by the proposed network with different modules. (a) NDCT, (b) LDCT, (c) U-Net+VGG, (d) U-Net(IRB)+VGG, (e) U-Net(IRB+RM)+VGG, (f) U-Net(IRB+RM)+MLJD, (g) U-Net(IRB+RM)+MLJD+CL.

FIGURE 8.

Absolute differences between the NDCT image and the de-noised images in Fig. 7. Brighter color represents larger difference. (a) LDCT, (b) U-Net+VGG, (c) U-Net(IRB)+VGG, (d) U-Net(IRB+RM)+VGG, (e) U-Net(IRB+RM)+MLJD, (f) U-Net(IRB+RM)+MLJD+CL.

Fig. 9 shows the pixel-wise similarity between the generated NDCT images and the ground truth NDCT images over the training process of the proposed method. It demonstrates that the proposed network converges after about 500 training iterations.

FIGURE 9.

Pixel-wise similarity between the generated NDCT images and the ground truth NDCT images over the training process of the proposed method.

F. Comparisons With Other Models

1) Simulated Data

Two representative slices from the testing set are used to demonstrate the performance of our proposed method. Fig. 10 shows the de-noising results of different methods on a $512\times 512$ chest image with the noise level $N_{0}=1\times 10^{4}$. Fig. 11 shows the zoomed $128\times 128$ region-of-interest (ROI) marked by the red rectangle in Fig. 10. All these methods remove noise from the image to different extents. However, K-SVD and BM3D cannot eliminate the streaking artifacts adjacent to regions with high attenuation, such as the bones marked by the red arrows in Fig. 11(c) and (d). KAIST-Net and RED-CNN remove most of the noise but suffer somewhat from over-smoothing: a small nodule, marked by the blue arrows in Fig. 11(e) and (f), is mostly filtered out as noise by these two methods. This is mainly because they use the MSE as the network loss, which makes the network focus on eliminating pixel-wise differences while overlooking the perceptual preservation of whole structures. WGAN-VGG and SAGAN eliminate the noise while preserving structures, but they are likely to generate “extra” artifacts from the noise in flat regions. As shown by the green arrows in Fig. 11(g) and (h), several noise spots that should be removed are turned into “fake” nodules by WGAN-VGG and SAGAN. In fact, most GAN-based medical image restoration methods suffer from this problem, because noise that passes through the generator without suppression can be mistaken for nodules. The proposed method adds the inception-residual blocks and residual mapping to the U-Net structure to restrain the noise while reflecting the perceptual structures at different scales. Furthermore, the proposed multi-level joint discriminator distinguishes the generated image from the ground truth image at multiple scales, leading to a stronger constraint on detail reproduction. Therefore, as shown in Fig. 11(i), it achieves the best balance between noise removal and structure preservation without introducing unnecessary artifacts. Fig. 12 illustrates the absolute differences between the images processed by different methods (Fig. 10(b) to (i)) and the normal-dose CT image (Fig. 10(a)), where it can be observed more clearly that the proposed method provides the smallest difference, indicating that it preserves the most details and suppresses the most noise and artifacts.
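The zoomed ROIs and absolute-difference maps used in these comparisons amount to two simple image operations, sketched below on images stored as lists of rows. The function names, the ROI position and the toy images are hypothetical and only illustrate the operations.

```python
def crop_roi(image, top, left, size=128):
    """Return a size x size region of interest from a 2-D image (list of rows)."""
    return [row[left:left + size] for row in image[top:top + size]]


def absolute_difference(restored, truth):
    """Pixel-wise |restored - truth|; larger values mean larger deviation."""
    return [[abs(a - b) for a, b in zip(r_row, t_row)]
            for r_row, t_row in zip(restored, truth)]


# Hypothetical 512 x 512 images: a flat ground truth and a copy with one bright pixel.
truth = [[0] * 512 for _ in range(512)]
restored = [row[:] for row in truth]
restored[200][300] = 7
diff_roi = crop_roi(absolute_difference(restored, truth), top=150, left=250)
print(len(diff_roi), len(diff_roi[0]), diff_roi[50][50])  # prints "128 128 7"
```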

FIGURE 10.

Results of removing noise from the simulated low-dose lung CT image with noise level $N_{0}=1\times 10^{4}$ . (a) NDCT, (b) LDCT, (c) K-SVD, (d) BM3D, (e) KAIST-Net, (f) RED-CNN, (g) WGAN-VGG, (h) SAGAN, (i) the proposed network.

FIGURE 11.

Zoomed ROI images of the red rectangles in Fig. 10. (a) NDCT, (b) LDCT, (c) K-SVD, (d) BM3D, (e) KAIST-Net, (f) RED-CNN, (g) WGAN-VGG, (h) SAGAN, (i) the proposed network. The arrows indicate three regions containing features revealed differently by the competing algorithms.

FIGURE 12.

Absolute differences between the NDCT image and the de-noised images in Fig. 10. Brighter color represents larger difference. (a) LDCT, (b) K-SVD, (c) BM3D, (d) KAIST-Net, (e) RED-CNN, (f) WGAN-VGG, (g) SAGAN, (h) the proposed network.

Fig. 13 shows the de-noising results for another $512\times 512$ abdominal image with noise level $N_{0}=1\times 10^{5}$, and Fig. 14 shows the zoomed $128\times 128$ ROI marked by the red rectangle in Fig. 13. Since there are more organs than in Fig. 10, more structures are obscured, resulting in a more deteriorated image. As shown in Fig. 14(c) and (d), K-SVD and BM3D introduce severe artifacts in the region close to the spine (indicated by the red arrows) because they incorporate much of the noise into the reconstruction. To remove that large amount of noise, KAIST-Net and RED-CNN pay the price of eliminating tissue textures, as highlighted by the blue arrows in Fig. 14(e) and (f), which hinders subsequent interpretation of the image. As a GAN-based method, the proposed method also suffers from fake structures generated by the network. However, as indicated by the green arrow in Fig. 14(i), it performs better than WGAN-VGG and SAGAN because the inception-residual blocks on the shortcut connections and the multi-level discriminator restrain the noise during image generation.

FIGURE 13.

Results of removing noise from the simulated low-dose abdominal CT image with noise level $N_{0}=1\times 10^{5}$ . (a) NDCT, (b) LDCT, (c) K-SVD, (d) BM3D, (e) KAIST-Net, (f) RED-CNN, (g) WGAN-VGG, (h) SAGAN, (i) the proposed network.

FIGURE 14.

Zoomed ROI images of the red rectangles in Fig. 13. (a) NDCT, (b) LDCT, (c) K-SVD, (d) BM3D, (e) KAIST-Net, (f) RED-CNN, (g) WGAN-VGG, (h) SAGAN, (i) the proposed network. The arrows indicate three regions containing features revealed differently by the competing algorithms.

To further show the merits of the proposed method, the mean and standard deviation of the subjective quality scores (as described in Step 2 in Section IV-D) of the images produced by different methods are shown in Table 5. It can be seen that K-SVD and BM3D provide good noise suppression scores but low artifact reduction, contrast retention and lesion discrimination scores. KAIST-Net and RED-CNN give high scores in both noise suppression and artifact reduction, but lower scores in contrast retention. WGAN-VGG and SAGAN give good results in noise suppression, contrast retention and lesion discrimination but relatively low scores in artifact reduction. The proposed method provides scores of 3.63 ± 0.27, 3.53 ± 0.22, 3.23 ± 0.20 and 3.26 ± 0.24 for noise suppression, artifact reduction, contrast retention and lesion discrimination over the testing dataset. Given the subjective scores of the NDCT images, which are 3.65 ± 0.25, 3.58 ± 0.23, 3.27 ± 0.24 and 3.28 ± 0.26 respectively, the proposed method performs closest to the ground truth NDCT images statistically. Compared with the state-of-the-art methods, which already provide good performance, our proposed method further improves the image quality: the recovered images have less noise and fewer artifacts while preserving more contrast, so that different lesion regions can be better discriminated. Therefore, the overall quality score of the images from the proposed method (3.31 ± 0.23) is statistically closer to that of the ground truth NDCT images (3.40 ± 0.25) than those of the comparators.

TABLE 5 Subjective Quality Scores (Mean±Sd) for Different Algorithms

For quantitative evaluation, Table 6 shows the mean and standard deviation of the PSNR, SSIM and RMSE obtained by applying different methods to all the testing images at the different noise levels. According to Eq. 15 and Eq. 17, the RMSE and SSIM values of the NDCT images, i.e., the ground truth images, are 0 and 1 respectively. Using these values as benchmarks, we find that the proposed method provides the highest PSNR and the SSIM and RMSE values closest to the benchmarks at all noise levels, which confirms our previous qualitative observations. The p-values indicate that the higher PSNR and SSIM and the lower RMSE values are statistically significant, which means that the better performance of the proposed method holds over the whole testing dataset.
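The paper does not state which statistical test produced the p-values. Purely as an illustration of how a paired comparison of per-image scores could be tested, the sketch below implements an exact two-sided sign-flip permutation test. The choice of test, the function name and the PSNR values are all assumptions, not the authors' procedure or data.

```python
from itertools import product


def paired_sign_flip_p_value(scores_a, scores_b):
    """Exact two-sided sign-flip permutation test on paired per-image scores.

    Enumerates all 2**n sign patterns, so it is only practical for small n.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical per-image PSNR values (dB) for two methods on 8 test images.
psnr_a = [32.5, 33.0, 31.5, 34.0, 32.0, 33.5, 31.0, 32.5]
psnr_b = [31.0, 32.0, 30.5, 32.5, 31.0, 32.0, 30.0, 31.5]
print(paired_sign_flip_p_value(psnr_a, psnr_b))  # prints "0.0078125"
```

In this toy case method A is better on every image, so only the all-positive and all-negative sign patterns are as extreme as the observed one, giving a p-value of 2/256.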

TABLE 6 Quantitative Results (Mean±Sd of PSNR, SSIM and MSE) Associated With Different Algorithms for the Images in the Simulated Testing Dataset

2) Official Simulated Clinical Data

Fig. 15 shows a representative $512\times 512$ slice from the MAYO clinical CT scans and the corresponding images restored by different methods. Fig. 16 shows the zoomed $128\times 128$ ROIs marked by the red rectangles in Fig. 15. Similar to the results for the simulated images, the proposed method outperforms the comparators with respect to noise removal and structure preservation, as shown in Fig. 15. In the zoomed regions in Fig. 16, K-SVD and BM3D leave some streaking artifacts and blur the low-contrast lesions, as shown by the green arrows in Fig. 16(c) and (d). KAIST-Net and RED-CNN over-smooth some low-contrast lesions in the flat regions marked by the blue arrows in Fig. 16(e) and (f), while WGAN-VGG and SAGAN mistake noise for lesions, as indicated by the red circles in Fig. 16(g) and (h). The proposed method mostly avoids these problems, as shown in Fig. 16(i).

FIGURE 15.

Results of removing noise from the MAYO clinical low-dose CT image. (a) NDCT, (b) LDCT, (c) K-SVD, (d) BM3D, (e) KAIST-Net, (f) RED-CNN, (g) WGAN-VGG, (h) SAGAN, (i) the proposed network.

FIGURE 16.

Zoomed ROI images of the red rectangles in Fig. 15. (a) NDCT, (b) LDCT, (c) K-SVD, (d) BM3D, (e) KAIST-Net, (f) RED-CNN, (g) WGAN-VGG, (h) SAGAN, (i) the proposed network. The arrows indicate three regions containing features revealed differently by the competing algorithms.

Table 7 shows the mean and standard deviation of the subjective quality scores given by two experienced radiologists to the LDCT images processed by different methods. The proposed method gets the highest scores in artifact reduction, contrast retention and lesion discrimination and the second highest score in noise suppression, and therefore achieves a higher overall score than the other methods.
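As a small illustration of how such mean ± standard deviation entries can be obtained from the two readers' five-point scores, consider the sketch below. The scores are hypothetical, and pooling both readers before computing a sample standard deviation is an assumption, since the exact aggregation is not specified.

```python
from statistics import mean, stdev


def summarize_scores(reader_a, reader_b):
    """Pool two readers' five-point scores and return (mean, sample sd)."""
    pooled = list(reader_a) + list(reader_b)
    return mean(pooled), stdev(pooled)


# Hypothetical scores for 10 image groups; not data from the paper.
reader_a = [4, 3, 4, 3, 4, 4, 3, 4, 3, 4]
reader_b = [3, 4, 4, 3, 3, 4, 4, 4, 3, 3]
m, s = summarize_scores(reader_a, reader_b)
print(f"{m:.2f} ± {s:.2f}")
```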

TABLE 7 Subjective Quality Scores (Mean±Sd) for Different Algorithms

To quantitatively evaluate our proposed method, Table 8 summarizes the mean and standard deviation of the PSNR, SSIM and RMSE of all the testing LDCT images restored by different methods. The proposed method achieves the best SSIM and the second best PSNR and RMSE (very close to the best values, obtained by RED-CNN), which agrees with the aforementioned qualitative observations.

TABLE 8 Quantitative Results (Mean±Sd of PSNR, SSIM and MSE) Associated With Different Algorithms for the Images in the MAYO Clinical Testing Dataset

G. Discussions

The main goal of the proposed method is to restore the LDCT image so that it is as close to the NDCT image as possible. As described in the sections above, a GAN is adopted to recover the image, using a modified U-Net with inception-residual blocks in the shortcut connections as the generator, a multi-level joint discriminator consisting of multiple CNNs, and a novel loss function that integrates the least square loss, VGG loss, MSE loss and noise loss. Compared with state-of-the-art CT image de-noising methods, this modified GAN provides better visual image quality, with more structural detail, less noise and fewer artifacts.

The experimental results have demonstrated that, since the noise is non-uniformly distributed in CT images, the traditional image-patch-based de-noising methods, represented by K-SVD and BM3D, are likely to suffer from streaking artifacts adjacent to high-attenuation regions. In contrast, the deep learning based methods can provide higher image quality because they learn structures and contents from a large amount of image data. Comparing the results of KAIST-Net and RED-CNN with those of WGAN-VGG, SAGAN and the proposed method, we find that the GAN-based methods avoid the over-smoothing problem that usually occurs in MSE-based deep neural networks. This can be mainly attributed to the ability of the adversarial loss and VGG loss to preserve visual image details. However, WGAN-VGG and SAGAN preserve the detailed structures at the expense of turning some noise into lesions, especially when the noise is adjacent to lesion regions, so the generated images look natural but are severely distorted for medical diagnosis. One reason for this phenomenon is that the generators in WGAN-VGG and SAGAN cannot eliminate the noise during forward propagation. The CNN generator in WGAN-VGG is likely to capture noise features together with structure features. The U-Net used in SAGAN performs better than the traditional CNN by using a multi-scale sampling strategy, but the noise still passes through the shortcut connections. The proposed method adds inception-residual blocks on each shortcut connection of the U-Net, so the noise is strongly reduced while the visual structures are well preserved. Another reason for the “false lesion” phenomenon is that the discriminators used in both WGAN-VGG and SAGAN only measure the similarity between the generated image and the ground truth image at one scale, missing noise that is easily confused with lesions or structures of tiny size. The proposed method improves on previous work by simultaneously computing the differences between the output of every deconvolutional layer and the correspondingly down-sampled ground truth image as the loss of the whole network, which further avoids mimicking noise as lesions. Moreover, all the methods are evaluated on lung images and abdominal images from different datasets, and the p-values indicate the statistical significance of the PSNR, SSIM and RMSE differences, so the better performance of the proposed method is robust.
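To make the multi-scale comparison concrete, the sketch below builds the set of down-sampled ground truth images against which outputs at different decoder levels could be compared. The $2\times 2$ average pooling, the factor of two between levels, the number of levels and the function names are all assumptions for illustration. This section does not specify how the ground truth is down-sampled, and the discriminator networks themselves are not modeled here.

```python
def downsample_2x(image):
    """Halve each dimension by 2 x 2 average pooling (assumed down-sampling rule)."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [[(image[2 * i][2 * j] + image[2 * i][2 * j + 1]
              + image[2 * i + 1][2 * j] + image[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(cols)]
            for i in range(rows)]


def ground_truth_pyramid(image, levels):
    """Return [full resolution, 1/2, 1/4, ...] copies of the ground truth image."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample_2x(pyramid[-1]))
    return pyramid


# Hypothetical 8 x 8 ground truth image with pixel value 8 * row + column.
truth = [[8 * i + j for j in range(8)] for i in range(8)]
pyramid = ground_truth_pyramid(truth, levels=3)
print([len(level) for level in pyramid])  # prints "[8, 4, 2]"
```

Each pyramid level would be paired with the generator output of the same spatial size, so that one discriminator judges each scale.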

Although the proposed method can generate images quite close to the ground truth NDCT image, there is still noise in the generated image. The main reason is that the noise present in the input image is so visually similar to real structures that its features are extracted and transferred into the output image by the network, even with the proposed inception-residual blocks in the U-Net. This is in fact a common problem for all deep learning based de-noising methods. How to learn the noise model and distinguish it from the structure model is a question that needs deeper research. Furthermore, our proposed network is an image-domain post-processing de-noising method, so it suffers from the information loss introduced in the input image during FBP reconstruction. A possibly better way to exploit the capability of deep neural networks to learn data patterns is to design a network that maps the sinogram of the LDCT image to one close to that of the NDCT image. This could be an interesting direction for our future research.

Another problem of the proposed method, which is also common to deep learning based methods, is that the generalization capability of the network is not as high as that of model-based image processing methods. This shows in two aspects: 1) the network needs to be re-trained for data with different types of noise, even though we have trained the proposed network with a variety of medical images at different noise levels; and 2) the hyper-parameters, including the kernel sizes of the convolutional layers and the coefficients of the different terms in the loss function, need to be adjusted carefully when the dataset is changed.

H. Running Time

Table 9 shows the average running time of the different methods for recovering the LDCT images. On average, our proposed method takes 1.84 seconds to process one $512\times 512$ LDCT image, which is faster than WGAN-VGG and SAGAN and slightly slower than K-SVD, BM3D, KAIST-Net and RED-CNN. Given the better image quality provided by the proposed method, the computational cost is acceptable. It could be further reduced in practical applications with better hardware support.

TABLE 9 Average Running Time of Different Algorithms for Recovering the LDCT Images

SECTION V.

Conclusion

In conclusion, we have proposed an LSGAN with a novel architecture for low-dose CT image de-noising. We incorporated the inception-residual block and residual mapping into the U-Net structure and applied it as the generator of the GAN. We proposed a novel multi-level joint discriminator to distinguish the output of each deconvolutional layer in the generator from the corresponding down-sampled ground truth image. The least square adversarial loss, VGG-19 based perceptual loss, MSE based pixel loss and the noise loss were combined as the loss for optimizing the whole network. Experimental results on both simulated and official simulated clinical images have shown that the proposed method can effectively remove noise and artifacts from the image while preserving structures and eliminating false lesions. Therefore, the proposed method outperformed the state-of-the-art methods in both visual quality and quantitative assessments.

ACKNOWLEDGMENT

The authors would like to thank H. Wang, Q. Hu, and Y. Wang for helpful discussions and fruitful feedback along the way. Jianning Chi would also like to thank the editor, associate editor, and referees for comments and suggestions which greatly improved this paper.
