Journals & Magazines >IEEE Access >Volume: 8

SlimRGBD: A Geographic Information Photography Noise Reduction System for Aerial Remote Sensing

Abstract:

Over the past ten years, civil drone technology has developed rapidly, and unmanned aerial vehicles (UAVs) have come into wide use across many industries. In aerial remote sensing in particular, UAVs make it possible to quickly capture geographic information for remote areas that previously received little attention. However, UAV aerial photography is strongly affected by the weather: images captured by drones in rainy weather contain noise. This paper addresses how to remove noise from aerial images. Multi-channel pruning is used to prune the RnResNet network and, on this basis, a new anti-convergence-convolution neural network noise reduction system is proposed that can run on UAV onboard embedded equipment. The system is used to eliminate noise in aerial images. This type of denoiser avoids the high power consumption and low efficiency of current neural network denoisers, which gives it certain advantages.

Published in: IEEE Access ( Volume: 8)

Page(s): 15144 - 15158

Date of Publication: 14 January 2020

Electronic ISSN: 2169-3536

DOI: 10.1109/ACCESS.2020.2966497

CCBY - IEEE is not the copyright holder of this material. Please follow the instructions via https://creativecommons.org/licenses/by/4.0/ to obtain full-text articles and stipulations in the API documentation.

SECTION I.

Introduction

UAV technology has been part of aviation since its early days and has a history of more than 100 years. At present, many countries treat drone technology as a frontier technology. Civil drone applications, however, are less than a decade old. In modern society, remote sensing has become an important means of acquiring information about the geographical environment and how it changes [1]. With the advent of the information age, the demand for remote sensing data has increased dramatically in many countries. Existing satellite and aerial remote sensing technologies are well suited to obtaining large-scale, macroscopic geographic information [2]. However, they struggle to serve the many applications that require both high resolution and fast update times. Using drones as remote sensing platforms for aerial photography and ground observation offers a new technical approach for such rapid-response applications [3]. With the development and improvement of low-altitude drone photogrammetry, a large number of experiments have shown that the accuracy of UAV-based topographic mapping can meet the requirements of 1:2000 topographic maps.

As an aerial remote sensing platform, the UAV has the following advantages: (1) Fast maneuvering response. A low-altitude drone system has a short lift-off time, is simple to operate and convenient to transport, and can quickly reach the monitoring area. It flies fast and captures images quickly [4]. (2) High-resolution images and high-precision positioning. The spatial resolution of the images acquired by the system reaches the decimeter level. These high-resolution digital images can be used to build high-resolution 3D landscape maps [1]. (3) Low cost of use. UAVs are inexpensive to design and produce because they carry no flight crew equipment, voice communications, or crew safety equipment. The widespread use of digital technology allows a highly integrated design, which keeps production costs low. Tooling and materials are also inexpensive because safety requirements can be relaxed appropriately, which allows extensive use of composite materials and new manufacturing processes [5]. (4) UAVs can undertake high-risk or high-tech flight missions. Pilots and researchers can work safely on the ground. Flights do not put personnel at risk through human error or flight measurement failure. When real-time information is being studied, the number of people who can work on it is not limited. Data can be transmitted in real time over long or continuous periods, with real-time, dynamic fidelity. The UAV remote sensing system shows its unique advantages especially in environmental monitoring of areas that vehicles and ships cannot reach, environmental monitoring in toxic areas, disaster monitoring, and command and rescue [6].

However, most drone aerial surveys are constrained by weather and lighting conditions. In good weather with sufficient sunlight, the quality of remote sensing images is high. Images captured in rain or in weak light are of poor quality, because lighting is the main factor that determines image quality [7]. Under low light, the image taken by the drone camera is likely to be noisy. Image noise mainly refers to the rough parts of the image produced by the CCD (or CMOS) sensor as it receives light as its input signal and outputs it. The noise in digital photos taken by drones may go unnoticed when the image is shrunk on a personal computer, but when the image is magnified, colors appear that are not present in the scene (false color). This false color is image noise. The noise generated by the drone camera has the following causes: (1) Image noise caused by long exposure. This phenomenon mainly occurs in low-light shooting, where isolated bright spots appear in the dark areas of the image. The reason is that the CCD cannot handle the heavy workload caused by the slow shutter speed, so some specific pixels lose control. (2) Image noise generated by compressing an image in JPEG format. Since JPEG images still appear natural after the image size is reduced, special methods can be used to reduce the image data. In doing so, JPEG processes neighboring pixels above and below together as a unit. Therefore, especially at the edges of these pixel units, unnatural transitions to the adjacent unit occur. Image noise generated by JPEG compression is also called block noise. The higher the compression ratio, the more obvious the noise. Although the noise becomes invisible after the image is shrunk, the color compensation remains very obvious. Such noise can be avoided by using the highest possible image quality or by recording images in a format other than JPEG. (3) Image noise caused by blur filtering. Like JPEG compression, blur filtering introduces noise while an image is being processed. Sometimes the noise is generated during the internal processing of a digital camera, and sometimes when the image is processed by retouching software. For smaller images, noise is produced when the image is processed to appear sharper and to emphasize its color edges [8].

GANs have proven to be outstanding at image noise reduction. However, common GAN-based denoising methods demand substantial hardware resources, whereas the memory and computing power of UAV embedded devices are limited. It is a big challenge to use a GAN to denoise images on embedded devices with limited computing power while preserving superior noise reduction performance. The goal of this study is to develop a model that allows drones to produce realistic images even when the captured aerial images are noisy. The main challenges are twofold: first, the model should be flexible and robust enough to handle the same image corrupted by different levels of noise; second, the denoised image must look real and visually pleasing. To address these challenges, we propose a new computationally lightweight model, SlimRGBD (Slim ReCNN-GAN Blind Denoising), for image noise reduction on UAVs.

The rest of this paper is organized as follows. Section II reviews previous research on image noise reduction, its shortcomings, and the problems this paper tries to solve. Section III describes the model design and system principle of the image denoiser developed in this study. Section IV describes the experimental process in detail and compares the methods used in this study with previous state-of-the-art methods. Section V summarizes the characteristics of this study and the work to be carried out in the future.

SECTION II.

Related Work

A. Lightweight Convolutional Neural Network for UAV Visual Recognition

UAVs, or general-purpose UAVs, whose computer vision is enabled by onboard cameras and embedded systems, are popular in a wide range of applications. However, because the memory and computing power of embedded devices are limited, real-time scene analysis through target detection on a drone platform is very challenging. To reduce the memory footprint of ResNet-like convolutional network architectures, Pierre Stock et al. proposed a vector quantization method. Its goal is to preserve the quality of the reconstruction of the network outputs rather than of the weights. The advantage of this method is that it minimizes the reconstruction error for in-domain inputs and does not require any labeled data. Pierre Stock et al. also used byte-aligned codebooks to produce a compressed network that supports efficient inference on a CPU [9]. The method was validated by quantizing a high-performance ResNet-50 model to a memory size of 5 MB (a compression factor of 20) while maintaining 76.1% top-1 accuracy on ImageNet object classification. The method can easily be adapted to simultaneously compress a ResNet trained on ImageNet and transfer it to other domains. However, it gives no consideration to nonlinearity, and a certain amount of reconstruction error remains.

Power consumption generally increases with the amount of computation an application performs. UAV applications usually require low power consumption to preserve the endurance of the drone. To address this, Pengyi Zhang et al. proposed a highly efficient target detector for UAVs based on channel pruning of the convolutional layers [10]. They enforced channel-level sparsity in the convolutional layers by applying L1 regularization to the channel scale factors, and pruned the feature channels that carry less information to obtain a slim target detector. Based on this method, they proposed SlimYOLOv3, which has fewer trainable parameters and floating point operations (FLOPs) than the original YOLOv3 and is a promising real-time target detection solution for drones. They evaluated SlimYOLOv3 on the VisDrone2018-Det benchmark dataset. Compared with its unpruned counterpart, SlimYOLOv3 achieved convincing results, with FLOPs reduced by approximately 90.8% and parameter size reduced by approximately 92.0%. Its running speed is twice that of YOLOv3 [11], and its detection accuracy remains comparable. This indicates that applying L1 regularization to the channel scale factors to enforce channel-level sparsity, and then pruning the less informative feature channels, yields a more efficient and faster convolutional neural network. Such a network is better suited to running deep learning tasks such as image denoising on a drone.
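As a rough illustration of the channel-pruning idea described above, the following Python sketch ranks channels by the absolute value of a per-channel scale factor and keeps only the largest ones. The function names `select_channels` and `prune_conv_weights`, the `prune_ratio` parameter, and the example values are assumptions made for illustration; this is not the SlimYOLOv3 implementation.

```python
import numpy as np

def select_channels(scale_factors, prune_ratio):
    """Return the sorted indices of the channels to keep.

    scale_factors: 1-D array of per-channel scale factors that L1
                   regularization has pushed toward zero (assumed given).
    prune_ratio:   fraction of channels to remove, in [0, 1).
    """
    gammas = np.abs(np.asarray(scale_factors, dtype=float))
    n_prune = int(len(gammas) * prune_ratio)
    # Channels with the smallest |scale factor| carry the least information.
    order = np.argsort(gammas, kind="stable")
    return np.sort(order[n_prune:])

def prune_conv_weights(weights, keep):
    """Drop pruned output channels from a weight of shape (out, in, kh, kw)."""
    return weights[keep]

# Hypothetical scale factors for a layer with six output channels.
gammas = np.array([0.90, 0.01, 0.45, 0.002, 0.30, 0.05])
keep = select_channels(gammas, prune_ratio=0.5)
weights = np.random.default_rng(0).normal(size=(6, 3, 3, 3))
pruned = prune_conv_weights(weights, keep)
print(keep, pruned.shape)
```

In a real network the input channels of the following layer would have to be pruned to match, and the slimmed network is usually fine-tuned afterwards; both steps are omitted here.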

B. Advantages and Disadvantages of Mainstream Methods for Image Noise Reduction

Existing image denoising methods mostly assume additive white Gaussian noise of known intensity. In real noisy images, however, the noise model is usually unknown and may be more complicated. In response to this problem, Fengyuan Zhu et al. proposed a blind image denoising algorithm that recovers clean images from noisy images with unknown noise models [12]. To model the empirical noise of an image, their method introduces a mixture of Gaussians, which is flexible enough to approximate different continuous distributions. The blind image denoising problem is then recast as a learning problem. First, a two-layer structural model of noisy patches is established, with the clean patches treated as latent variables. To control the complexity of the noisy patch model, Fengyuan Zhu et al. proposed a new Bayesian nonparametric prior, the dependent Dirichlet process tree, to build the model [12]. Then, a variational inference algorithm is derived to estimate the model parameters and recover the clean patches. The method is applied to both synthetic and real noisy images with different noise models.

Convolutional neural networks have long been a research hotspot for image denoising, but their performance is still unsatisfactory in most applications. The synthetic noise distributions on which these networks are trained do not accurately reflect the noise captured by an image sensor, and datasets of clean and noisy image pairs are mostly used for benchmarking or for specific applications. The main reason is that the model easily overfits the simplified AWGN model, which deviates seriously from complex real noise. To improve the generalization ability of deep CNN denoisers, Shi Guo et al. proposed a method for training a convolutional blind denoising network (CBDNet) [13]. To further provide an interactive strategy for conveniently correcting the denoising result, they embedded into CBDNet a noise estimation sub-network with asymmetric learning that suppresses under-estimation of the noise level. The main findings of that work are twofold. First, realistic noise models, including heterogeneous Gaussian noise and the ISP pipeline, are key to making models learned from synthetic images applicable to real noisy photos. Second, training on both synthetic and real noisy images improves the denoising performance of the network. Benoit Brummer et al. introduced the Natural Image Noise Dataset (NIND), a dataset of SLR-like images with different ISO noise levels that is large enough to train models for blind denoising over a wide range of noise [14]. They demonstrated a denoising model trained on NIND and showed that it is significantly better than BM3D on ISO noise from unseen images, even when it is applied to images from a different type of camera. Most of Benoit Brummer's experiments were conducted on a single U-Net [15], [16], but modern methods such as conditional generative adversarial networks (GANs) may yield better performance.

Kai Zhang et al. went a step further by studying the construction of feed-forward denoising convolutional neural networks (DnCNNs), which bring very deep architectures, learning algorithms, and regularization methods into image denoising [17]. Specifically, residual learning and batch normalization are used to speed up training and boost denoising performance. Their DnCNN model can handle Gaussian denoising at unknown noise levels (i.e., blind Gaussian denoising). With the residual learning strategy, DnCNN implicitly removes the latent clean image in the hidden layers. This property motivated them to train a single DnCNN model for several common image denoising tasks, such as hybrid noise denoising, single-image super-resolution, and JPEG image denoising. The method not only achieves good denoising performance both quantitatively and qualitatively, but also runs efficiently on devices with a GPU [17].

Discriminative learning methods such as DnCNN achieve state-of-the-art denoising results, but they are not suitable for blind denoising when paired training data are lacking. To tackle this problem, Jingwen Chen et al. proposed a two-step solution: (1) a Generative Adversarial Network (GAN) is trained to estimate the noise distribution of the input noisy images and to generate noise samples; (2) the noise blocks generated in the first step are used to construct a paired training dataset, which is then used to train a deep convolutional neural network (CNN) for denoising [18]. Their aim was to improve blind image denoising with a deep learning-based approach in the absence of paired training data. On this basis, they proposed the GCBD algorithm [18]: the GAN learns the noise distribution, a paired training dataset is built, and the CNN is trained for denoising. One limitation of this approach is that it assumes zero-mean additive noise. This type of noise is common in the natural environment and covers a wide variety of noise.

Majed El Helou et al. proposed a theoretically grounded blind and universal image denoiser for Gaussian noise, the blind universal image fusion denoising network (BUIFD) [8]. The method, called fusion denoising, generalizes strongly to unseen noise levels. Their approach improves the PSNR of real-world grayscale image denoising by up to 0.7 dB at the training noise levels. It also improves on state-of-the-art color image denoising at every single noise level, by 0.1 dB on average.

SECTION III.

System Model and Definitions

We propose SlimRGBD (Slim ReCNN-GAN Blind Denoising) for UAV image noise reduction. As the name suggests, the model combines ReCNN, neural network pruning of the ReCNN, a GAN, blind noise reduction, and other techniques. Figure 1 shows a simple framework diagram of the whole method. This section explains in detail how these techniques are applied in SlimRGBD.

FIGURE 1.

SlimRGBD method framework. “Generator” is a neural network for generating noise-reduced images, and “Discriminator” is a convolutional neural network for determining the quality of noise-reduced images. “Generator” and “Discriminator” have different neural network architectures.

A. Generative Adversarial Network

The main inspiration for the GAN comes from the zero-sum game in game theory [19]. Applied to deep neural networks, a generator G and a discriminator D play against each other continuously, so that G learns the distribution of the data. For image generation, G can produce realistic images from random numbers once training is complete [20]. The roles of G and D are as follows: (1) G is a generative network that receives random noise z (a random number) and generates an image from it; (2) D is a discriminative network that judges whether a picture is "real". Its input x represents a picture, and its output D(x) is the probability that x is a real picture. If D(x) is 1, the picture is certainly real, and if D(x) is 0, it cannot be a real picture [21].

Compared with traditional models, a GAN has two different networks rather than a single network, and it is trained adversarially: the gradient updates for G come from the discriminator D rather than directly from the data samples [22]. Unlike other generative models (Boltzmann machines and GSNs [23]), a GAN uses only backpropagation and does not need complex Markov chains. Compared with other models, GANs can produce clearer, more realistic samples [24]. GANs are trained in an unsupervised fashion and can be widely used in unsupervised and semi-supervised learning. Compared with variational autoencoders (VAEs), GANs do not introduce any deterministic bias. Variational methods introduce a deterministic bias because they optimize a lower bound on the log-likelihood rather than the likelihood itself, which seems to make samples generated by VAEs blurrier than those generated by GANs. Unlike VAEs, GANs have no variational lower bound [25]. If the discriminator is well trained, the generator can perfectly learn the distribution of the training samples. In other words, GANs are asymptotically consistent, whereas VAEs are biased. When GANs are applied to tasks such as image style transfer, super-resolution, image inpainting, and denoising, they avoid the difficulty of designing a loss function: as long as a reference is available, the discriminator can be used directly and the rest is left to adversarial training. Training a GAN requires reaching a Nash equilibrium, which gradient descent sometimes achieves and sometimes does not. No reliable method for finding the Nash equilibrium is known yet, so training a GAN is less stable than training a VAE or PixelRNN, although in practice it is much more stable than training a Boltzmann machine [25].

B. Noise Model

Most classic and practical image denoising models can be formulated as the following problem [26]: $\begin{equation*} \tilde {m} =\arg \;\min \limits _{m} \frac {1}{2~\mu }\left \|{ {n-m} }\right \|^{2}+\xi \cdot W\left ({m }\right)\tag{1}\end{equation*}$

The first term $\frac {1}{2~\mu }\left \|{ {n-m} }\right \|^{2}$ is a data fidelity term that depends on the noise level $\mu$ , and the second term $W\left ({m }\right)$ is a regularization term that encodes a generally predefined image prior. $\xi$ is the hyperparameter that balances the two terms. The discriminative denoising model employed in this work is intended to learn a nonlinear mapping function $m=F\left ({n }\right)$ , parameterized by $P$ , that predicts the latent clean image $m$ from the noisy image $n$ . Therefore, the solution of equation (1) is given by: $\begin{equation*} \tilde {m} =G\left ({{n,\mu,\xi,W;P} }\right)\tag{2}\end{equation*}$
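To make the roles of $\mu$ and $\xi$ in equation (1) concrete, the following Python sketch solves the problem in closed form for a 1-D signal. The quadratic smoothness prior $W(m)=\frac{1}{2}\|Dm\|^{2}$ (with $D$ the forward-difference operator), the function names, and all numerical values are assumptions chosen for illustration; the model in this paper learns its priors from data instead.

```python
import numpy as np

def denoise_quadratic_prior(n, mu, xi):
    """Minimize (1/(2*mu))*||n - m||^2 + xi*W(m) with W(m) = 0.5*||D m||^2.

    D is the forward-difference operator, so W penalizes rough signals.
    This hand-crafted prior is an illustrative assumption only.
    """
    n = np.asarray(n, dtype=float)
    size = n.size
    # Forward-difference matrix of shape (size - 1, size).
    D = np.diff(np.eye(size), axis=0)
    # Setting the gradient to zero gives (I/mu + xi * D^T D) m = n/mu.
    A = np.eye(size) / mu + xi * D.T @ D
    return np.linalg.solve(A, n / mu)

def roughness(signal):
    """Sum of squared differences between neighboring samples."""
    return float(np.sum(np.diff(signal) ** 2))

rng = np.random.default_rng(0)
clean = np.linspace(0.0, 1.0, 50)
noisy = clean + 0.1 * rng.normal(size=clean.size)
m = denoise_quadratic_prior(noisy, mu=1.0, xi=5.0)
print(roughness(noisy), roughness(m))
```

A larger $\xi$ or $\mu$ gives the prior more weight relative to the data term and produces a smoother estimate.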

The key to implementing this framework is the predefined image prior. This observation prompted us to learn image priors directly from the datasets. In particular, two data-driven image priors are learned, one at the feature level and one at the pixel level. Before a paired training dataset is constructed, approximate noise blocks need to be extracted from the given noisy images. These blocks are then used to train the GAN for noise modeling and noise data generation.

Properly training the GAN to simulate unknown noise is an important step, because the noise distribution is estimated better from noise-dominated data. Equation (2) requires a predefined noise level $\mu$ , so a denoising model trained for one level is not flexible enough to handle different noise levels. To achieve blind image denoising, this study incorporates noise level information by learning image priors in the feature-level space [26]. In particular, this study trains several discriminators on the fused feature-level and pixel-level outputs of the local and global paths so that the model can cope with images at different noise levels, as shown in Figs. 2 and 3.

FIGURE 2.

The dotted rectangle is applied to build the loss function to learn the feature-class prior.

FIGURE 3.

Constructing a loss function with a dashed rectangle to learn pixel-class priors.

The perceptual discriminator stabilizes and improves the performance of the GAN by embedding the convolutional part of a pre-trained deep classification network. Specifically, the features that the pre-trained network extracts from the output image are concatenated with the output of the previous layer and then processed by learnable convolution blocks. In this study, three-step convolutional blocks are used for spatial downsampling, and RnResNet is used for image feature extraction. The final classification is computed from each activation in the feature map. Since the effective receptive field of each activation corresponds to an image block of the original input image, the discriminator actually predicts a label for each image block. By restricting its attention to the structure within local image blocks, the patch-based discriminator is well suited to modeling high frequencies in image denoising.

To reduce the influence of the image background on training, a set of approximate noise blocks must be extracted from image regions with a weak background. In this way, the noise distribution becomes the main target of model training, which makes the GAN model more accurate. Under the assumption that the noise has zero mean, approximate noise data can be obtained by subtracting its average value from each relatively smooth patch in the noisy image. A smooth patch here means a region whose interior is very similar throughout.

Based on the above, this research uses a fast smooth-patch search algorithm. Let $u_{i}$ denote a global patch of size $k\times k$ and $v_{j}^{i}$ a local patch of size $l\times l$ inside $u_{i}$ . Each $u_{i}$ is obtained by scanning the entire noisy image with stride $e_{g}$ , and each $v_{j}^{i}$ is obtained by scanning $u_{i}$ with stride $e_{l}$ . Whether $u_{i}$ is a smooth patch is determined by how far the mean and the variance of each $v_{j}^{i}$ deviate from those of $u_{i}$ . Specifically, two constraints are defined [18]: $\begin{equation*} \left |{ {Avg\left ({{v_{j}^{i}} }\right)-Avg\left ({{u_{i}} }\right)} }\right |\le \phi \cdot Avg\left ({{u_{i}} }\right)\tag{3}\end{equation*}$ and $\begin{equation*} \left |{ {Var\left ({{v_{j}^{i}} }\right)-Var\left ({{u_{i}} }\right)} }\right |\le \varphi \cdot Var\left ({{u_{i}} }\right)\tag{4}\end{equation*}$ where $Avg\left ({\cdot }\right)$ computes the mean, $Var\left ({\cdot }\right)$ computes the variance, and $\phi,\varphi \in \left ({{0,1} }\right)$ . If both constraints are met for every $j$ , $u_{i}$ is considered a smooth patch and added to the set $E$ .
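The search described by constraints (3) and (4) can be sketched in Python as follows. This is a minimal sketch for a single-channel image with non-negative intensities; the function names and the tiny test image are assumptions made for illustration, and the deterministic fine texture merely stands in for sensor noise.

```python
import numpy as np

def is_smooth_patch(u, l, e_l, phi, varphi):
    """Check constraints (3) and (4) for every local patch v_j inside u."""
    k = u.shape[0]
    avg_u, var_u = u.mean(), u.var()
    for r in range(0, k - l + 1, e_l):
        for c in range(0, k - l + 1, e_l):
            v = u[r:r + l, c:c + l]
            if abs(v.mean() - avg_u) > phi * avg_u:       # constraint (3)
                return False
            if abs(v.var() - var_u) > varphi * var_u:     # constraint (4)
                return False
    return True

def find_smooth_patches(image, k, l, e_g, e_l, phi, varphi):
    """Scan a 2-D image with stride e_g and return the set E of smooth patches."""
    h, w = image.shape
    E = []
    for r in range(0, h - k + 1, e_g):
        for c in range(0, w - k + 1, e_g):
            u = image[r:r + k, c:c + k]
            if is_smooth_patch(u, l, e_l, phi, varphi):
                E.append(u.copy())
    return E

# Left half: flat region with a fine texture. Right half: a step edge.
texture = np.indices((16, 16)).sum(axis=0) % 2
flat = 100.0 + texture
edge = np.full((16, 16), 100.0)
edge[:, 8:] = 200.0
image = np.hstack([flat, edge])

E = find_smooth_patches(image, k=16, l=8, e_g=16, e_l=8, phi=0.1, varphi=0.5)
print(len(E))
```

Only the flat half passes both constraints; in the edge half the local means (100 and 200) deviate from the global mean of 150 by more than $\phi \cdot Avg(u_{i})$ .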

Once $E=\left \{{{e_{1},e_{2},\cdots,e_{t}} }\right \}$ has been obtained by applying the algorithm to all noisy images, the set of approximate noise blocks $R=\left \{{{r_{1},r_{2},\cdots,r_{t}} }\right \}$ is derived as $r_{i} =e_{i} -Avg\left ({{e_{i}} }\right)$ . The devices used in this research usually produce high-resolution images, which contain a large number of smooth areas that meet these requirements, such as rooms, agriculture and rivers. Therefore, enough smooth patches can be found even in a limited number of images; in other words, enough noise blocks can be extracted to train the GAN model in the following steps.
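The mean-subtraction step is short enough to write out directly. In this Python sketch the function name and the synthetic smooth patch are assumptions made for illustration.

```python
import numpy as np

def extract_noise_blocks(smooth_patches):
    """Compute r_i = e_i - Avg(e_i) for every smooth patch e_i."""
    return [e - e.mean() for e in smooth_patches]

# Hypothetical smooth patch: constant background plus zero-mean noise.
rng = np.random.default_rng(0)
patch = 120.0 + rng.normal(0.0, 3.0, size=(16, 16))
noise_block = extract_noise_blocks([patch])[0]
print(noise_block.mean(), noise_block.std())
```

Each resulting block has zero mean by construction, which matches the zero-mean noise assumption; any background structure left in the patch would leak into the block, which is why only smooth patches are used.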

When too few noisy input images are available, the number of noise blocks extracted in the previous step is very limited. In this case, training a deep RnResNet on these blocks alone would give unsatisfactory results. To improve the denoising performance of the model, one approach is to model the noise distribution of the extracted blocks and generate more noise data (in other words, any number of samples with more diversity) for RnResNet training.

This study minimizes the aforementioned loss with respect to the discriminator and maximizes it with respect to the conversion network. This can be achieved with the training strategy of generative adversarial networks. As a framework for estimating generative models, a GAN is able to learn complex distributions. Moreover, a GAN can generate noise samples through forward propagation alone, without involving other components, and it can be trained with the back-propagation algorithm. In this study, a GAN is used to estimate the noise distribution of the set of approximate noise blocks. Because WGAN [27] improves GAN training and generates high-quality samples, this research uses WGAN-GP [28], an improved version of WGAN, to learn the noise distribution. The loss function for this task is $\begin{align*} Loss_{GAN} ={}& \mathop {E}\limits _{\tilde {m} \sim D_{g}} \left [{ W\left ({\tilde {m}}\right) }\right ]-\mathop {E}\limits _{m\sim D_{r}} \left [{ W\left ({m}\right) }\right ] \\ &+\mathop {E}\limits _{\hat {m} \sim D_{\hat {m}}} \left [{ \left ({\left \|{ \nabla _{\hat {m}} W\left ({\hat {m}}\right) }\right \|_{2} -1}\right)^{2} }\right ]\tag{5}\end{align*}$ where $D_{r}$ is the distribution on $R$ , $D_{g}$ is the distribution of the generator, and $D_{\hat {m}}$ is defined as the uniform distribution along straight lines between pairs of points sampled from $D_{r}$ and $D_{g}$ . The trained noise model is used to generate noise samples that augment $R$ , which ultimately yields the larger dataset $R^{\prime }=\left \{{{r_{1}^{\prime },r_{2}^{\prime },\cdots,r_{w}^{\prime } \;} }\right \}$ .
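The three terms of the WGAN-GP critic loss (the mean critic score on generated samples, minus the mean score on real noise blocks, plus a gradient penalty) can be sketched in plain NumPy. To keep the gradient analytic, the critic here is a toy function $\tanh(\langle m, w\rangle)$ on flattened blocks; this critic, the function names, and the random data are assumptions made for illustration and are not the network used in this study.

```python
import numpy as np

def critic(m, w):
    """Toy critic W(m) = tanh(<m, w>) for a batch of flattened blocks."""
    return np.tanh(m @ w)

def critic_grad(m, w):
    """Analytic gradient of the toy critic with respect to each input block."""
    s = 1.0 - np.tanh(m @ w) ** 2            # derivative of tanh
    return s[:, None] * w[None, :]

def wgan_gp_loss(real, fake, w, rng):
    """E[W(fake)] - E[W(real)] + E[(||grad W(m_hat)||_2 - 1)^2]."""
    eps = rng.uniform(size=(real.shape[0], 1))
    # Points drawn uniformly on the lines between real and generated blocks.
    m_hat = eps * real + (1.0 - eps) * fake
    grad_norm = np.linalg.norm(critic_grad(m_hat, w), axis=1)
    penalty = np.mean((grad_norm - 1.0) ** 2)
    return critic(fake, w).mean() - critic(real, w).mean() + penalty

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(8, 16))    # stand-in for noise blocks from R
fake = rng.normal(0.5, 1.0, size=(8, 16))    # stand-in for generator output
w = rng.normal(size=16) / 4.0
loss = wgan_gp_loss(real, fake, w, rng)
print(loss)
```

With a real network the gradient would come from automatic differentiation, and the penalty term is commonly multiplied by a weighting coefficient; the sketch uses a weight of 1.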

C. Deep ResNet Retrofit and Use

Many previous studies have trained CNNs on large paired datasets covering different noise levels to solve the image denoising problem, and have achieved significant results. A CNN can implicitly learn the underlying noise model from paired training data, which reduces the dependence on hand-crafted image priors [29]. Therefore, the ResNet [29] module is used in our framework.

The prediction of the clean image is obtained as $y_{i} -R\left ({{y_{i};\Theta } }\right)$ (as shown in Fig. 4). The number of filters in the last block equals the number of output channels, and every other block contains 64 filters. To train the ResNet, a paired training dataset must first be established. Let $V'$ be the set obtained by noise modeling with the GAN (the augmented noise set, denoted $R^{\prime }$ in Section III-B). Another set of images is divided into small patches of size $k\times k$ , forming the set $X=\left \{{{x_{1},x_{2},\cdots x_{e}} }\right \}$ . Noise blocks from $V^{\prime }$ are randomly added to the patches in $X$ to obtain $Y=\left \{{{y_{1},y_{2},\cdots y_{f}} }\right \}$ , where $y_{i} =x_{j} +v_{k}^{\prime }$ . The sets $X$ and $Y$ form a paired training dataset $\left \{{{\textrm {X,Y}} }\right \}$ . In practice, this dataset is built during the training of the denoising network: in each training epoch, the combinations of $x_{j}$ and $v_{k}^{\prime }$ are changed and a new dataset $\left \{{{\textrm {X},\textrm {Y}^{`}} }\right \}$ is obtained, which acts as data augmentation for the next step [30].
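The per-epoch pairing described above can be sketched in Python as follows. The function name `make_paired_epoch`, the array shapes, and the random stand-in data are assumptions made for illustration.

```python
import numpy as np

def make_paired_epoch(clean_patches, noise_blocks, rng):
    """Pair every clean patch x_j with a randomly drawn noise block v'_k.

    clean_patches: array of shape (num_clean, k, k), the set X.
    noise_blocks:  array of shape (num_noise, k, k), the set V'.
    Returns Y with y = x_j + v'_k; call once per epoch to redraw the pairs.
    """
    idx = rng.integers(0, len(noise_blocks), size=len(clean_patches))
    return clean_patches + noise_blocks[idx]

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(4, 8, 8))           # stand-in clean patches
V_prime = rng.normal(0.0, 0.05, size=(10, 8, 8))    # stand-in generated noise
for epoch in range(2):
    Y = make_paired_epoch(X, V_prime, rng)
    print(epoch, float(np.abs(Y - X).mean()))
```

Because the pairs are redrawn on every call, the network sees different noisy versions of the same clean patch across epochs. The residual target for training is then `Y - X`, and a clean estimate is recovered by subtracting the predicted residual from the noisy input.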

FIGURE 4.

Convolutional network architecture of the DnResNet. The input on the left is the noise image dataset $y_{i}$ and the output result on the right is $R\left ({{y_{i};\Theta } }\right)$ which is the difference between the input and the potentially clean image.

Once the paired noisy-clean training dataset is established, DnResNet can be trained to perform the final denoising (as shown in Fig. 5). In this experiment, a network structure similar to DnCNN is employed. DnResNet is treated as a single residual block that predicts the residual image, that is, the difference between the noisy input image and the latent clean image [31], [32]. The loss function to be minimized is $\begin{equation*} Loss_{DnResNet\left ({\Theta }\right)} =\frac {1}{2\theta }\sum \limits _{i=1}^\theta {\left \|{ {R^{2}\left ({{y_{i};\Theta } }\right)-\left ({{y_{i} -x_{i}} }\right)^{2}} }\right \|_{F}^{\frac {1}{2}}}\tag{6}\end{equation*}$ where $\Theta$ denotes the parameters of DnResNet, $\theta$ is the size of the training image dataset, $y_{i}$ is a noisy image, and $x_{i}$ is the corresponding clean image. Batch normalization, ReLU, and residual learning are also used to improve the training of the deep network.

FIGURE 5.

Convolutional network architecture of the image denoising network proposed in this study. The noisy input image is first processed by N residual blocks (a deep residual network) to compute low-level and pixel-level features, which are then split into two paths that learn local and global features. The model then fuses the two paths to produce the final output.

The image noise reduction network in this paper consists of three parts: a stack of residual blocks (as shown in Fig. 4 and Fig. 5) that extracts low-level features of the input image, and two asymmetric paths that extract local and global features, respectively. The architecture then fuses these two paths to produce the final output.

The noisy input image is first processed by a 16-layer residual network with skip connections to extract low-level features (Fig. 4). The "pre-activation" residual block is used because it is easier to train and generalizes better than the original ResNet block. In all residual blocks, a fixed kernel size and zero padding are used to preserve the spatial size of the input, and the number of features is kept at 32. In addition, a skip connection is added between the input features and the output of the last residual block. As a result, large spatial support can be used to extract complex patterns [33].

The encoded features are further processed by two asymmetric networks for local and global feature extraction. The local path is fully convolutional and consists of two residual blocks, as shown in Fig. 4. It is designed to learn local features while preserving spatial information. Because the output image shares many structures with the input image, the residual connections make it easy to learn the identity function, which is an attractive property for a transformation network. The global path uses two fully connected layers to learn global features, each followed by a ReLU activation. A global average pooling layer ensures that the model can handle images of any resolution [34], [35]. Finally, the global information is summarized as a fixed-dimension vector and used to normalize the local features produced by the local path. The local and global features are then fused into a common set of features that is fed to a convolutional layer to produce the output [36].
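The role of global average pooling in handling arbitrary resolutions can be shown with a small NumPy sketch. It is illustrative only: the fully connected layers are omitted, and fusing by channel-wise concatenation is an assumption, since the text does not specify the fusion operator.

```python
import numpy as np

def global_average_pool(features):
    """Collapse (H, W, C) features to a length-C vector, for any H and W."""
    return features.mean(axis=(0, 1))

def fuse(local_features, global_vector):
    """Tile the global vector over space and concatenate it on the channel axis.

    Concatenation is an illustrative choice of fusion, not the paper's stated one.
    """
    h, w, _ = local_features.shape
    tiled = np.broadcast_to(global_vector, (h, w, global_vector.shape[0]))
    return np.concatenate([local_features, tiled], axis=-1)

small = np.ones((8, 8, 32))       # 32 feature channels, as in the residual blocks
large = np.ones((50, 70, 32))
g_small = global_average_pool(small)
g_large = global_average_pool(large)
fused = fuse(large, g_large)
```

Both inputs produce a 32-element global vector regardless of spatial size, which is why the global path can be followed by fully connected layers of fixed width.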

D. Model Pruning

When a deep model is deployed on a resource-constrained device, model compression is a useful tool for adjusting the resources the model requires. Existing model compression methods mainly include model pruning [37], knowledge distillation [38], [39], parameter quantization and dynamic computation. In this section, we discuss model pruning methods in detail.

The representative process of incremental model pruning is shown in Fig. 6. The components removed from a deep model by pruning can be individual neural connections or whole network structures. Weight pruning removes the less important connections, namely those with small weights. It is conceptually easy to understand, but the resulting irregular network architecture makes the pruned model difficult to store and to accelerate. In practice, weight pruning may not be suitable for real applications unless a dedicated software library or dedicated hardware is designed to support the pruned model. Unlike weight pruning, structured pruning is more likely to produce a regular and manageable network architecture. To estimate the importance of structures for structured pruning, researchers use sparse training [40] with structured sparsity regularization, including structured sparsity learning and sparsity on channel-wise scaling factors. Liu et al. proposed a simple but effective channel pruning method called network slimming [40]. They directly use the scaling factors in the batch normalization (BN) layers as channel-wise scaling factors and train the network with L1 regularization on these factors to obtain channel-wise sparsity. They applied network slimming to prune CNN-based image classifiers and significantly reduced model size and the number of computational operations [40]. Channel pruning is coarse-grained but effective and, more importantly, the pruned model is convenient to deploy without dedicated hardware or software [41]. In this paper, we follow Liu's work and extend it to a coarse-grained neural architecture search method to find efficient deep image noise reducers using the pruned RnResNet.

FIGURE 6.

Representative process of incremental model pruning. There are four iterative steps: (1) assessing the importance of each component in the pre-trained deep model; (2) removing components that are less important to model inference; (3) fine-tuning the pruned model to compensate for potential temporary performance degradation; and (4) evaluating the fine-tuned model to determine whether the pruned model is suitable for deployment. An incremental pruning strategy is preferred, to prevent over-pruning.
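The four steps in this caption form a simple loop, sketched below in plain Python. All callables and the toy "model" (a list of channel importances) are hypothetical placeholders chosen for illustration; a real pipeline would plug in actual scoring, pruning, fine-tuning and evaluation routines.

```python
def incremental_prune(model, score_importance, remove_least_important,
                      fine_tune, evaluate, min_score, max_rounds=10):
    """Prune a little at a time and stop before quality drops below min_score."""
    best = model
    for _ in range(max_rounds):
        importance = score_importance(best)                     # step 1: assess
        candidate = remove_least_important(best, importance)    # step 2: remove
        candidate = fine_tune(candidate)                        # step 3: fine-tune
        if evaluate(candidate) < min_score:                     # step 4: evaluate
            break                                               # over-pruned: keep previous model
        best = candidate
    return best

def drop_smallest(channels, importance):
    """Toy pruning step: remove the single least important channel."""
    weakest = importance.index(min(importance))
    return [c for i, c in enumerate(channels) if i != weakest]

toy_model = [0.9, 0.05, 0.6, 0.01, 0.3]   # toy channel importances
pruned = incremental_prune(
    toy_model,
    score_importance=lambda m: list(m),
    remove_least_important=drop_smallest,
    fine_tune=lambda m: m,                # no-op stand-in for fine-tuning
    evaluate=sum,                         # stand-in quality score
    min_score=1.7,
)
```

Because each round removes only a small part of the model and is checked before being accepted, the loop stops at the last configuration that still meets the quality bar.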

Manually designing the network architecture of a deep image noise reducer does not guarantee that every component plays an important role in forward inference. An efficient deep image noise reducer can instead be learned by performing channel pruning on the convolutional layers. Specifically, the goal of this paper is to search for a more compact and efficient configuration of convolutional channels that reduces the number of trainable parameters and FLOPs. To this end, channel pruning is applied to obtain SlimRGBD, following the procedure shown in Fig. 7.

FIGURE 7.

Iterative process of learning an efficient deep image denoiser through sparse training and channel pruning in SlimRGBD.

After sparsity training, this study introduces a global threshold $\xi$ to decide whether a feature channel is pruned. The global threshold $\xi$ is set to the $n$th percentile of the absolute values $|\gamma|$ of all channel scale factors, which controls the pruning ratio. In addition, a local safety threshold is introduced to prevent over-pruning of any single convolutional layer and to maintain the integrity of the network connections. The local safety threshold is set layer by layer to the $k$th percentile of all $|\gamma|$ in that layer. A feature channel is pruned when its scale factor is smaller than the minimum of the global threshold and the local safety threshold.

In SlimRGBD, several special connections between layers need to be handled carefully. During pruning, the max-pooling and upsampling layers are skipped because they are independent of the channel number. Initially, a pruning mask is constructed for every convolutional layer from the global threshold and the local safety threshold. For path layers, the pruning masks of their incoming layers are concatenated in order and the concatenated mask is used as their pruning mask. The shortcut layer in SlimRGBD is similar to residual learning in ResNet, so all layers connected to a shortcut layer must have the same channel number. To match the feature channels of the layers connected by a shortcut layer, the pruning masks of all connected layers are traversed and combined with an OR operation to generate the final pruning mask for these layers.

After channel pruning, the pruned model should be fine-tuned to compensate for the potential temporary degradation. In fine-grained object detection tasks, detection performance is often sensitive to channel pruning. Fine-tuning is therefore important for the pruned model to recover from a potential loss of performance.
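The thresholding and mask-merging rules can be sketched with NumPy as follows. This is a hedged illustration: the function names and the toy scale factors are hypothetical, the percentiles (50 and 75) are arbitrary example values, and the masks are assumed to be keep-masks (True means the channel survives), so that OR-ing them keeps a channel if any connected layer needs it.

```python
import numpy as np

def channel_keep_masks(gammas_per_layer, global_pct, local_pct):
    """Return one boolean keep-mask per layer.

    A channel is pruned when |gamma| < min(global threshold, local safety
    threshold), so the local threshold can only protect channels.
    """
    all_abs = np.concatenate([np.abs(g) for g in gammas_per_layer])
    global_thr = np.percentile(all_abs, global_pct)
    masks = []
    for g in gammas_per_layer:
        local_thr = np.percentile(np.abs(g), local_pct)
        masks.append(np.abs(g) >= min(global_thr, local_thr))
    return masks

def merge_shortcut_masks(masks):
    """OR the keep-masks of layers joined by a shortcut so channel counts match."""
    return np.logical_or.reduce(masks)

layer_a = np.array([0.9, 0.8, 0.7, 0.6])      # uniformly important layer
layer_b = np.array([0.01, 0.02, 0.03, 0.04])  # uniformly weak layer
masks = channel_keep_masks([layer_a, layer_b], global_pct=50, local_pct=75)
merged = merge_shortcut_masks([np.array([True, False, False, True]),
                               np.array([False, False, True, True])])
```

In this toy case every channel of `layer_b` lies below the global threshold, yet the local safety threshold keeps its strongest channel, so the layer is never removed entirely.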

E. Sparse Training

Channel-wise sparsity in a deep model facilitates channel pruning and indicates how many less important channels may later be removed. To facilitate channel pruning, we assign a scale factor to each channel, where the absolute value of the scale factor represents the channel importance. Specifically, except for the detection header, each convolutional layer in SlimRGBD is followed by a BN layer to accelerate convergence and improve generalization [42]. The BN layer uses mini-batch statistics to normalize the convolutional features, as formulated in equation (7). $\begin{equation*} y=\gamma \times \frac {x-\bar {x}}{\sqrt {{\sigma ^{2}+\varepsilon } }}+\beta\tag{7}\end{equation*}$
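Equation (7) is short enough to write out directly. The sketch below applies it per channel to a toy batch of shape (N, C); the input values and the choice of `eps` are illustrative assumptions.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Eq. (7) applied per channel to a batch shaped (N, C)."""
    mean = x.mean(axis=0)                 # mini-batch mean per channel
    var = x.var(axis=0)                   # mini-batch variance per channel
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.array([[1.0, 10.0],
              [3.0, 30.0]])
gamma = np.array([2.0, 0.0])              # second channel has a zero scale factor
beta = np.array([0.0, 1.0])
y = batch_norm(x, gamma, beta)
```

The second channel outputs the constant $\beta$ whatever the input is, which is why a scale factor near zero marks a channel that carries little information and is a candidate for pruning.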

Here, $\bar {x}$ and $\sigma ^{2}$ are the mean and variance of the input elements in the mini-batch, and $\gamma$ and $\beta$ are the trainable scale factor and bias. Naturally, we use the trainable scale factor in the BN layer directly as an indicator of channel importance. To distinguish important channels from unimportant ones effectively, we perform channel-wise sparsity training by imposing L1 regularization on $\gamma$ [43]. The objective of sparse training is given by equation (8). $\begin{equation*} L=loss_{GCBD} +\alpha \sum \limits _{\gamma \in \Gamma } {f\left ({\gamma }\right)}\tag{8}\end{equation*}$ where $f\left ({\gamma }\right)=\left |{ \gamma }\right |$ is the L1 norm and $\alpha$ is the penalty factor that balances the two loss terms.
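A minimal sketch of the objective in equation (8) and of the subgradient that the L1 term contributes for each scale factor is given below. The function names and the value of `alpha` are illustrative assumptions; the paper does not state the penalty factor it used.

```python
def sparse_training_loss(task_loss, gammas, alpha):
    """Eq. (8): task loss plus alpha times the L1 norm of all BN scale factors."""
    return task_loss + alpha * sum(abs(g) for g in gammas)

def l1_subgradient(gamma, alpha):
    """Subgradient of alpha * |gamma|: a constant pull toward zero."""
    if gamma > 0:
        return alpha
    if gamma < 0:
        return -alpha
    return 0.0

total = sparse_training_loss(task_loss=1.0, gammas=[0.5, -0.25, 0.0], alpha=0.5)
pull = l1_subgradient(-0.25, alpha=0.5)
```

Because the pull has constant magnitude regardless of how small $|\gamma|$ already is, unimportant scale factors are driven all the way toward zero, which produces the sparsity that the pruning thresholds rely on.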

SECTION IV.

Experiment and Analysis

A. Datasets

The model is trained on LY_Datasets, a remote sensing image dataset. In the present study, all images are randomly cropped into $64 \times 64$ patches for training. Noisy input images are produced by adding noise with level $\sigma$ in the range [10, 80], and the corresponding clean image is used as the ground truth.
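The patch preparation can be sketched as below. This is an illustration under assumptions: the noise is taken to be zero-mean Gaussian with standard deviation $\sigma$ drawn uniformly from [10, 80], the function name is hypothetical, and a random array stands in for an LY_Datasets image.

```python
import numpy as np

def random_training_pair(image, rng, patch=64, sigma_range=(10, 80)):
    """Crop a random patch and add noise of a randomly chosen level sigma."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - patch + 1)
    left = rng.integers(0, w - patch + 1)
    clean = image[top:top + patch, left:left + patch].astype(np.float64)
    sigma = rng.uniform(*sigma_range)          # noise level for this patch
    # Assumption: additive zero-mean Gaussian noise.
    noisy = clean + rng.normal(0.0, sigma, size=clean.shape)
    return noisy, clean, sigma

rng = np.random.default_rng(0)
image = rng.uniform(0, 255, size=(256, 256))   # stand-in for a dataset image
noisy, clean, sigma = random_training_pair(image, rng)
```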

B. Experiment

For the proposed SlimRGBD, a set of clean images is combined with noise generated by the GAN to construct a paired noisy-clean training dataset. To simulate the processing of large images in practice, noise is also added to a second set of high-resolution clean images, forming a noisy image dataset on which SlimRGBD is evaluated with synthesized data.

In the image noise extraction procedure, the parameters $d,h,s_{g},s_{l} \;\textrm {and}\;\gamma$ are set to 64, 16, 32, 16, 0.1 and 0.25, respectively. For GAN noise modeling, this article generally follows the parameter settings of DCGAN [44], [45]. RnResNet is trained for 50 epochs with an initial learning rate of 0.0001 using the Adam optimizer [46]. Fig. 8 shows the experimental flow of the study, and Fig. 9 compares training with Adam against training with SGD [47].

FIGURE 8.

Experimental process.

FIGURE 9.

Under two gradient-based optimization algorithms ((a) SGD, (b) Adam), four specific image denoising models are trained with different combinations of residual learning (RL) and batch normalization (BN). The two panels show Gaussian denoising results at a noise level of 20, evaluated on 80 natural images from LY_Datasets.

The competing methods compared in this study include BM3D [48], EPLL [49], NCSR [50], WNNM [51], Multiscale [50], DnCNN [17] and the proposed SlimRGBD. In particular, to reveal the limitations of discriminative learning methods in blind denoising, a blind model of DnResNet for Gaussian image denoising is included in the evaluation. Specifically, DnCNN is trained with accurate Gaussian noise data at different noise levels and achieves state-of-the-art blind Gaussian denoising results.

The SlimRGBD in this paper uses $3\times 3$ kernels for the transform network and for all convolutional layers in both discriminator networks. Each convolutional layer is followed by a batch normalization layer to stabilize and accelerate the training of the deep network. The model was trained with the Adam optimizer, using stochastic optimization with a batch size of 64 for 45 epochs, and converged in approximately 43 hours. The initial learning rate is 0.0001, and a cosine-shaped learning rate schedule smoothly decays it during training. The implementation is based on TensorFlow and runs on a single GTX 1080Ti GPU.
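A cosine-shaped learning rate schedule can be written in a few lines. The sketch below uses the common half-cosine decay from the initial rate down to a final rate; the exact variant used in the paper is not stated, so the formula, the per-epoch granularity and the `final_lr` of zero are assumptions.

```python
import math

def cosine_learning_rate(epoch, total_epochs=45, initial_lr=1e-4, final_lr=0.0):
    """Half-cosine decay from initial_lr (epoch 0) to final_lr (last epoch)."""
    progress = min(max(epoch / total_epochs, 0.0), 1.0)
    return final_lr + 0.5 * (initial_lr - final_lr) * (1.0 + math.cos(math.pi * progress))

schedule = [cosine_learning_rate(e) for e in range(46)]
```

The rate changes slowly at the start and at the end of training and fastest in the middle, which avoids the abrupt jumps of a step schedule.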

C. Integrated Noise Assessment

In this part, different types of zero-mean synthetic noise are generated and added to LY_Datasets images to evaluate all competing methods. In this evaluation, all methods except DnCNN and SlimRGBD are provided with the true noise level (i.e., the standard deviation $\sigma$).

This section evaluates the SlimRGBD method on synthetic and real data and compares it with several representative methods. The experiments (1) evaluate the accuracy of GAN noise modeling and compare SlimRGBD with state-of-the-art denoising methods, especially the discriminative learning method DnCNN, on the blind Gaussian denoising task; (2) use mixed noise to show that SlimRGBD can handle noise more complex than Gaussian noise; and (3) discuss the choice of noise modeling method, show noise samples, and explain why a GAN is selected rather than a traditional method such as GMM. Extensive experiments demonstrate the superiority of SlimRGBD in blind image denoising.

Since Gaussian noise is one of the most widely studied noise types, blind Gaussian denoising experiments are essential. Table 1 shows the results for all compared methods. Although it is given no noise information, SlimRGBD still outperforms BM3D, EPLL, WNNM and Multiscale. In particular, SlimRGBD and DnCNN achieve comparable results. This is notable because DnCNN is trained on accurate data, whereas SlimRGBD is trained on approximate data generated by the GAN. The experiment thus demonstrates the accuracy of GAN noise modeling.
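The comparison in Table 1 is reported in PSNR. For reference, PSNR is computed from the mean squared error between a reference image and its estimate; the sketch below assumes 8-bit images with a peak value of 255, and the toy arrays are illustrative.

```python
import math
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return math.inf                    # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

reference = np.full((4, 4), 100.0)
estimate = np.full((4, 4), 110.0)          # constant error of 10, so MSE = 100
value = psnr(reference, estimate)
```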

TABLE 1 Results of PSNR (dB) for All Comparison Methods on LY_Datasets in the Synthetic Noise Denoising Environment

In addition to Gaussian noise, we further evaluated the methods in a more complex denoising setting using hybrid (mixed) noise. The mixed noise used in the experiment includes 10% uniform noise, 20% Gaussian noise, and 70% Gaussian noise. Table 1 shows the quantitative results. Here SlimRGBD again performs better than EPLL, BM3D, Multiscale and WNNM, which further demonstrates its superiority in blind denoising. In particular, DnCNN does not perform well because paired training data are not available. In contrast, the proposed SlimRGBD uses the GAN to estimate the noise distribution of the noisy images, which solves the lack of training data and yields a significant denoising effect, as shown in Fig. 10 (left).

Fig. 10 (right) shows the power consumption, relative to the UAV power consumption baseline, when each of four intelligent noise reduction algorithms is loaded. The RGBD noise reducer without pruning consumes a large amount of power, and the SlimRGBD algorithm proposed in this study consumes less power than DnCNN in some cases.

In both cases above (mixed noise and Gaussian noise), the proposed method performs well. Specifically, the distribution learned by the GAN from the extracted noise blocks deviates from the ground-truth distribution by about 0.25% in mean and 1.14% in standard deviation. Table 2 lists the time spent by the various noise reduction methods when tested on a DJI Phantom 4 Pro V2.0. Although our method takes more time than some traditional methods, it is the fastest among the neural network methods.
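One plausible way to generate such a mixed noise field is to draw, for every pixel, one of three zero-mean components with probabilities 0.1, 0.2 and 0.7. The sketch below follows that reading; the per-pixel mixing and all component parameters (`uniform_half_width`, `sigma_a`, `sigma_b`) are hypothetical, since the text gives only the mixing proportions.

```python
import numpy as np

def mixed_noise(shape, rng, uniform_half_width=30.0, sigma_a=25.0, sigma_b=5.0,
                weights=(0.1, 0.2, 0.7)):
    """Per-pixel mixture: uniform, Gaussian(sigma_a) and Gaussian(sigma_b) noise."""
    component = rng.choice(3, size=shape, p=weights)   # which source each pixel uses
    uniform = rng.uniform(-uniform_half_width, uniform_half_width, size=shape)
    gauss_a = rng.normal(0.0, sigma_a, size=shape)
    gauss_b = rng.normal(0.0, sigma_b, size=shape)
    noise = np.where(component == 0, uniform,
                     np.where(component == 1, gauss_a, gauss_b))
    return noise, component

rng = np.random.default_rng(0)
noise, component = mixed_noise((200, 200), rng)
fractions = [float(np.mean(component == i)) for i in range(3)]
```

Every component is zero-mean, so the mixture is zero-mean as well, while its heavy-tailed shape differs from any single Gaussian.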

TABLE 2 Time Spent Testing Various Noise Reduction Methods on DJI Phantom 4 Pro V2.0 (in Seconds)

FIGURE 10.

Left: Noise sensitivity curve for the SlimRGBD model of noise level training. The average PSNR results at different input noise levels were evaluated on LY_Datasets. Right: Power consumption, relative to the UAV power consumption baseline, when each of four intelligent noise reduction algorithms is loaded. The RGBD noise reducer without pruning consumes a large amount of power, and the SlimRGBD algorithm proposed in this study consumes less power than DnCNN in some cases.

D. SLIMRGBD Performance and Application Analysis

The most vital part of the proposed system is noise modeling, which involves extracting noise blocks and learning the noise distribution with the GAN. This section first studies the effect of the first step and then discusses the accuracy of GAN-based noise modeling.

To check the effectiveness of GAN noise modeling, accurately synthesized noise data are used as input to train the SlimRGBD system. In addition, SlimRGBD learns the noise distribution well and generates good samples when dealing with complex real-world noise (see Figs. 11, 12, 13, 14). These results suggest that using a GAN to model noise is accurate.

FIGURE 11.

The first set of test samples. Tests 1 and 2 are two original drone aerial images randomly drawn from LY_Datasets. The shaded squares in the figure are the regions shown in Fig. 12.

FIGURE 12.

The second set of test samples. Tests 3 and 4 are two original drone aerial images randomly drawn from LY_Datasets. The shaded squares in the figure are the regions shown in Fig. 13.

FIGURE 13.

Comparison of housing areas when evaluating real-noise denoising. The compared regions are taken from the shaded squares in Fig. 10 and are enlarged for better visual comparison. Panels: (a) original image; (b) EPLL; (c) NCSR; (d) WNNM; (e) Multiscale; (f) DnCNN; (g) GCBD; (h) result after SlimRGBD processing. The comparison shows that SlimRGBD is a very competitive image denoising method.

FIGURE 14.

Comparison of agricultural and forestry areas when evaluating real-noise denoising. The compared regions are taken from the shaded squares in Fig. 12 and are enlarged for better visual comparison. Panels: (a) original image; (b) EPLL; (c) NCSR; (d) WNNM; (e) Multiscale; (f) DnCNN; (g) GCBD; (h) result after SlimRGBD processing. The comparison shows that SlimRGBD is a very competitive image denoising method.

SECTION V.

Conclusion and Future Work

This study attempts to improve blind image denoising with a deep learning approach that does not require paired training data. In the proposed SlimRGBD, a GAN is used to learn the noise distribution and to construct a paired training dataset, which is then used to train the DnResNet for denoising. Extensive experiments demonstrate the superiority of our method. In future work, we will integrate our work into more embedded platforms to improve the image noise reduction performance of other devices.

ACKNOWLEDGMENT

The authors would like to thank all the anonymous reviewers for their insightful comments and constructive suggestions, which helped to improve the quality of this paper.
