Journals & Magazines >IEEE Access >Volume: 7

UAV Image High Fidelity Compression Algorithm Based on Generative Adversarial Networks Under Complex Disaster Conditions

The work schematic diagram of region of interest generative adversarial networks.

Abstract:

This paper proposes an improved image high fidelity compression algorithm based on the generative adversarial networks (GANs) to deal with the problem that the UAV image ...Show More

Topic: Recent Advances in Video Coding and Security

Metadata

Abstract:

This paper proposes an improved image high fidelity compression algorithm based on the generative adversarial networks (GANs) to deal with the problem that the UAV image has a large amount of data which is not conducive to post-processing. By adding an encoder in front of the generator, the disaster area image transmitted by UAV is compressed to meet the requirements of the generator. After the compressed image is trained together with the real image through the discriminator, the quality of the compressed image is constantly improved. This image compression algorithm can fully synthesize the codes of non-major areas such as trees and rivers in the image, and try to retain the codes of important areas such as houses and roads. The experimental results show that the proposed compression method in this paper has a higher compression ratio than the traditional compression method for the disaster area image, and can obtain images with strong sense of hierarchy.

Topic: Recent Advances in Video Coding and Security

The work schematic diagram of region of interest generative adversarial networks.

Published in: IEEE Access ( Volume: 7)

Page(s): 91980 - 91991

Date of Publication: 09 July 2019

Electronic ISSN: 2169-3536

DOI: 10.1109/ACCESS.2019.2927809

Funding Agency:

Contents

CCBY - IEEE is not the copyright holder of this material. Please follow the instructions via https://creativecommons.org/licenses/by/4.0/ to obtain full-text articles and stipulations in the API documentation.

SECTION I.

Introduction

In recent years, a variety of natural and man-made disasters have occurred frequently, seriously affecting people’s lives in disaster areas. Although the damage caused by disasters can be understood through various channels, people still lack a clear understanding of the overall impact of disasters [1]. With the continuous development of uav technology, it has been applied in many fields such as disaster assessment, environmental detection, traffic management and aerial photography recording [2]. The purpose of applying uav to complex disaster relief is to replace the traditional and inefficient manual rescue, and complete various tasks with high efficiency and high reliability. The first priority for disaster relief is to transmit high-definition disaster images, and eliminating redundant data is the key to high-quality transmission. In this paper, a high fidelity image compression method is proposed to reduce the pressure of storage device capacity, and facilitate the later image processing [3].

Compared with other methods, video and image information give people an intuitive feeling, and people mainly obtain external information through them. However, video and image contain a large number of invalid data, mainly involving time redundancy, space redundancy and coding redundancy [4]. Image compression is the basis of processing and transmission. If the image is not compressed in the early stage, it will increase the difficulty of image stitching in the later stage and affect the transmission efficiency. This may hinder people from using the effective data of video and image [5]. How to transmit clear high-fidelity images with limited bandwidth and get high quality compressed data has become the focus of research. Image compression has become a frontier science in the field of computer vision [6].

Compression technology can be roughly divided into three stages, each of which produces a number of creative results. The first compression coding stage is mainly used to remove redundancy, such as PCM [7] and transformation coding method [8]. The emergence of arithmetic coding [9] marks the beginning of the second generation of image compression coding, followed by dictionary coding [10] and lossless compression algorithm. The third generation compression technology mainly includes fractal image, wavelet transform [11] and other coding algorithms, and has been developing till now. The image recognition and synthesis method are used to compress the data, mainly including the pixel coding, prediction coding [12] and other main technologies. In recent years, deep learning [13]–[16] has been widely used in data analysis [17], image recognition [18], speech processing [19] and other fields. Compression based on neural network structure is a hot topic in current research. The latest image compression algorithm combines the neural network structure. The compressed size is only about half of the size of the traditional algorithm compression, and the visual quality is not affected [20]. Convolutional neural network is a common network structure in the field of image compression [21].

The compression of disaster image should consider not only the details of each part, but also ensure that the image has high fidelity [22]. The generative adversarial networks [23] provides a new idea for image compression. The generative adversarial networks was proposed by Goodfellow et al. based on the relevant principles of game theory. The core of generative adversarial networks is to generative model and discriminative model. The role of generative model is to generate new samples, while the role of discriminative model is to verify the authenticity of samples and determine the quality of generative samples. The optimization of network structure refers to the strategy of “binary minimax game” [24]. During the training process, keep the parameters of one model unchanged, change the parameters of another model, and repeat them alternately. Finally, the data generated by the model is close to the original sample. The application of generative adversarial networks has expanded from the initial image generation to various fields of computer vision [25], such as image recognition and video prediction.

SECTION II.

Related Works

A. Image Compression Based on Deep Learning

Image compression is an important research method in the field of digital image processing, which has a broad application in video transmission, image redundancy removal and image stitching. Jiang [26] proposed a compression framework based on CNNs, the two CNNS are seamlessly integrated into an end-to-end compression framework, and has accurate reconstruction of decoded images. Yoo et al. [27] proposed a two-step framework for reducing blocking artifacts in different regions based on increment of inter-block correlation, which classifies the coded image into flat regions and edge regions. Sun and Cham [28] put forward the solution with the maximum posterior criterion. The distortion caused by coding is modeled as spatially related gaussian noise, and the original image is modeled as a high-order markov random field based on expert frame field. Prakash et al. [29] proposed a new CNN architecture specifically for image compression, which generates a label map that highlights semantically annotated areas. Cavigelli’s et al. [30] took inspiration from deep neural networks developed for semantic segmentation and proposed a neural network with hierarchical skip connections and a multi-scale loss function for compression artifact suppression. These papers provide some inspiration for the algorithm design in this paper.

Although these algorithms utilize the powerful computational features of deep learning and save a lot of bandwidth, they ignore the comparison with the original image and result in low fidelity of partially compressed images. On the contrary, the proposed framework makes use of the characteristics of the generative adversarial networks to compare the compressed image with the original image in order to improve the visual quality of the image.

The framework of image compression includes several modules: encoder, quantizer, inverse quantizer, decoder and entropy coding [31]. The function of the encoder is to convert the image into the compression feature, and the decoder is to recover the original image from the compression feature. The encoder and decoder can be designed by convolution, pooling, nonlinear and other modules. Taking a three-channel picture of 768*512 as the input of the encoder, after forward processing, the compression feature occupying 96*64*192 data units can be obtained. If the compressed data are floating point numbers, the quality of the restored image will be the highest. But a floating-point number takes up 32 bits, and the number of bits per pixel is much larger after compression. The technique of quantization is used to convert floating point numbers into integer or binary numbers. At the decoding end, inverse quantization technology can be used to restore the transformed feature data to a floating point number, which can reduce the influence of quantization on the accuracy of the neural network and improve the quality of the restored image.

B. The Structure of Generative Adversarial Networks

Image compression based on generative adversarial networks is a novel image compression method. Generative adversarial networks is a method of modeling according to the characteristics of training samples. The model contains two networks. the generator obtains the distribution of data in the sample and continuously generates new samples closer to the real sample. The discriminator is usually a binary classifier, which determines whether the input content is real data or generative samples, and finds the difference between generative samples and real samples. As a training framework, adversarial network does not require a new neural network structure. Some mature network models, such as RNN [32] or CNN [33], can be selected for each component of the framework. The basic structure of generative adversarial networks is shown in Fig. 1.

FIGURE 1.

The model of generative adversarial networks.

Show All

In order to learn the generative distribution in the data set, the generator $G$ takes the prior distribution of random noise $P_{Z}$ as input to generate the sample distribution $P_{G}\left ({z }\right)$ approximate to the real data distribution $P_{data}(x)$ , and try to make the performance $D(G(z))$ of the generative data $G\left ({z }\right)$ on the discriminator $D$ consistent with the performance $D(x)$ of the real sample $x$ , that is, the scatter metric of Jensen-Shannon [34] between $x$ and $G(z)$ is the minimum. The divergence of Jensen-Shannon is used to measure the difference between two probability distributions. It is the deformation of KL divergence. The discriminator constantly extracts the characteristics of real samples, so that the output probability $D(G(z))$ tends to 0 and $D(x)$ tends to 1. Finally, the discriminator cannot distinguish whether the data belongs to real samples or generative samples. The mini-batch stochastic gradient descent training is used to generate the antagonistic network. The discriminator $D$ is updated by the stochastic gradient ascending method, and the generator $G$ is updated by the stochastic gradient descent method. The process of training is the process of game between the two, and the model eventually tends to the global optimum. The total loss function of $G$ and $D$ can be expressed as

$\begin{align*} L\left ({D,G }\right)=&E_{x\sim P_{data}\left ({x }\right)}\left [{ logD\left ({x }\right) }\right] \\&+\,E_{x\sim P_{noise}\left ({x }\right)}[log(1-D(G(z)))]\to \mathop {min}\limits _{G} \mathop {max}\limits _{D}. \\\tag{1}\end{align*}$ View Source

$G(z)$ is the image generated from using the input noise $z$ , and $x$ is the image in the real sample. $E$ represents the expected value of the function. $x\sim P_{data}(x)$ means that $x$ obeys the distribution of data. $x\sim P_{noise}(x)$ means that $x$ obeys the distribution of noise variable.

GAN does not have fixed conditions, so on this basis a conditional generative adversarial networks [35] is designed(CGAN). In the modeling of generator $G$ and discriminator $D$ , CGAN uses conditional variables and additional information to add conditions to the model, which can supervise data generation and confrontation. Equivalent to converting unsupervised GAN into supervised CGAN, this condition variable $y$ can be any type of data, such as category label, used to repair part of the data of the image. Every data in the CGAN model is associated with additional information $s$ to a certain extent, in which the binary group $(x,s)$ obeys the joint distribution $P_{x,s}$ . The additional information $s$ is integrated into the generator $G(z,s)$ and the discriminator $D(z,s)$ as part of the input layer. In the generative network, input noise $z$ and additional information $s$ constitute the joint hidden layer representation, and the structure is shown in Fig. 2. The total loss function of $G$ and $D$ can be expressed as

$\begin{align*} V\left ({D,G }\right)=&E_{x\sim P_{data}(x)}\left [{ logD\left ({x,s }\right) }\right] \\&+\,{E}_{x\sim P_{noise}\left ({x }\right)}[log(1-D(G(z,s)))]\to \mathop {min}\limits _{G} \mathop {max}\limits _{D}. \\\tag{2}\end{align*}$ View Source

FIGURE 2.

The model of conditional generative adversarial networks.

Show All

s is additional information and i represents the uncompressed original image. $E$ represents the expected value of the function. $x\sim P_{data}(x)$ means that $x$ obeys the distribution of data. $x\sim P_{noise}(x)$ means that $x$ obeys the distribution of noise variable.

C. Semantic Segmentation of Disaster Images by Fully Convolutional Networks

Fully convolutional networks (FCN) was first proposed in for solving the problem of semantic segmentation [36]. FCN mainly includes convolution layer, pooling layer and deconvolution layer, and its basic structure is shown in Fig. 3.

FIGURE 3.

The basic structure of FCN.

Show All

The convolution operation [37] is used to extract features, and the image matrix ${X}$ is convolved with a set of filters ${F}$ . After convolution, the results as the input to nonlinear activation function ${f(.)}$ , the formula is as follows:

$\begin{equation*} {Y}_{i^{\prime }j^{\prime }}={f}\left ({\sum \nolimits _{i=1}^{m}\sum \nolimits _{j=1}^{n} {{F_{ij}{X}}_{i+{i}^{\prime },{j}+{j}^{\prime }}+{b}} }\right)\!.\tag{3}\end{equation*}$ View Source

${m}\times {n}$ is the size of the convolution kernel, ${F}_{ij}$ is the parameter of the convolution kernel, ${X}_{i+{i}^{\prime },{j}+{j}'}$ are the input of the convolution layer, ${b}$ is the bias, ${Y}_{i'j'}$ is the output of the convolution layer. ${f(.)}$ is the nonlinear activation function, and can be expressed as follows:

$\begin{equation*} f(x_{ij})={max}\{0,{x}_{ij}\}.\tag{4}\end{equation*}$ View Source

The pooling layer reduces the dimension of image features and retains the most critical information. Max-pooling is used instead of mean-pooling because max-pooling retains more texture information, and mean-pooling retains more background information. It calculates the maximum value of the region within the range of ${m}\times {n}$ and the output ${Y}_{i'j'}$ of the pooling layer can be expressed as follows:

$\begin{equation*} Y_{i'j'}=max {X}_{i^{\prime }+{i},{j}^{\prime }+{j}}, \quad {i}\in \left \{{1,{m} }\right \},~{j}\in \left \{{1,{n} }\right \}.\tag{5}\end{equation*}$ View Source

The deconvolution is the inverse operation of the convolution operation, which merely reverses the steps in the convolution transformation process. A prediction can be made for each pixel while the spatial information in the original input image is retained. The deconvolution is an up-sampling operation. The output size after the deconvolution operation is as follows:

$\begin{equation*} {O}=\left ({{I}-1 }\right)\times {s}+{p}.\tag{6}\end{equation*}$ View Source

${I}$ is the input of deconvolution, ${s}$ is the sliding stride of deconvolution kernel, ${p}$ is the size of deconvolution kernel, and $O$ is the output size.

The softmax classifier as the last layer in the network, which is used for the final classification and normalization. The formula of softmax function is as follows:

$\begin{equation*} {h}_{i}=\frac {e^{X_{i}}}{\sum \nolimits _{k=1}^{K} {e}^{X_{k}}}.\tag{7}\end{equation*}$ View Source

${X}_{i}$ represents the input sample of the output layer, ${K}$ represents the total number of classes in the sample, and ${h}_{i}$ represents the possibility that the classifier classifies the image as class ${i}$ .

SECTION III.

High Fidelity Image Compression of UAV Based on Based on Generative Adversarial Networks Improvement

A. The Basic Working Principle of Compression

The model of uav image compression used in complex disaster conditions is designed based on generative adversarial networks. It has the basic structure of image compression, including encoder $E$ , decoder $G$ , quantizer $Q$ and inverse quantizer $\hat {Q}$ . The encoder $E$ converts the image to a compression form $w$ with less bit storage space. The decoder $G$ (also used as a generator) restores the information of the image by forming a reconstruction $G(Q(E(i)))$ . The quantizer $Q$ transmits the gradient back to the input data of the discriminator, and the inverse quantizer $\hat {Q}$ reconstructs the image based on the data provided by the decoder $G$ . The effect of compression needs to consider both visual quality and compression rate. It can be expressed by the formula $\sqrt {\mu {(E[L(i,G(z))])}^{2}+\delta {H(\hat {w})}^{2}}$ , where $L$ is a loss function to measure the visual similarity of the real image and the compressed image. In the formula, entropy $H(\hat {w})$ is the average information output of each gray level, and can represent the minimum bit rate $w$ required for the gray level information in the image. The entropy can adjust the code rate of the compressed image through the boundary ${log}_{2}(Layer)dim(\hat {w})$ . Weight $\delta$ and $\mu$ are used to adjust the compression effect of the model. The design of formula $\sqrt {\mu {(E[L(i,G(z))])}^{2}+\delta {H(\hat {w})}^{2}}$ may cause that the visual quality is not optimal after image compressed, but the compression rate is also guaranteed as the evaluation of the compression effect. The value of the final formula tends to a certain range, and the image will achieve the better compression effect.

The decoder $G$ generates the decoded image representation according to the vector $z$ , and the discriminator finds the difference between the compressed image $\hat {i}$ and the real image $i$ . Compression can be optimized by solving the saddle point of the function

$\begin{align*}&\hspace{-0.5pc}\mathop {min}\limits _{E,G} \mathop {max}\limits _{D} E\left [{ f\left ({D\left ({i }\right) }\right) }\right]+E\left [{ g\left ({D\left ({G\left ({z }\right) }\right) }\right) }\right] \\&\qquad\qquad\qquad\qquad {+\,\sqrt {\mu {(E[L(i,G(z))])}^{2}+\delta {(H(\hat {w}))}^{2}}.} \tag{8}\end{align*}$ View Source

$E$ represents the expected value of the function. It is different from the Encoder $E$ . $i$ represents the uncompressed original image. $z$ represents latent vector, and it is a combination of $v$ and $\hat {w}$ . $v$ is the extracted noise. $\hat {w}$ is the quantized representation of the latent feature map $w$ . $G$ represents a generator and $D$ represents a discriminator. $L$ is a loss function for measuring the similarity between the original image $i$ and the compressed image $\hat {i}$ . Weight $\delta$ and $\mu$ are used to adjust the compression effect of the model. $f$ and $g$ are scalar functions. Here makes $f\left ({y }\right)={(y-1)}^{2}$ and $g\left ({y }\right)=y^{2}$ . $H(\hat {w})$ is the average information output of each gray level.

Since the last one term of formula (8) are not related to discriminator $D$ , it will not affect the ability of discriminator to estimate samples. So we can rewrite equation (8) as follows:

$\begin{equation*} \mathop {min}\limits _{E,G} \ell _{GAN}+\sqrt {\mu {(E[L(i,G(z))])}^{2}+\delta {(H(\hat {w}))}^{2}}.\tag{9}\end{equation*}$ View Source

$\ell _{GAN}$ represents the minimum divergence on the generator $G$ . The description of other variables has been explained in equation (8).

The vector $z$ in the formula contains $\hat {w}$ , which stores the information of disaster area image $i$ . A key factor to improve the image compression quality is to set $\delta$ close to 0 and make the number of Layer or the dimension of $\hat {w}$ large enough, so that $\hat {w}$ can contain more code rate and the model can almost reconstruct the content of the disaster area image losslessly. At this time, the divergence between $p_{i}$ and $P_{G(z)}$ tends to be zero, and the code rate loss caused by the adversarial network will not affect the overall quality of the disaster image.

The work of compression is to remove redundant data, but it also needs to ensure the fidelity of compressed images and produce compressed images more and more real [38]. For the damaged roads in the disaster image, encoder $E$ cannot effectively obtain the data of this part, and generator $G$ needs to use the extracted noise $v$ and compression representation $\hat {w}$ to preserve the road texture as much as possible, rather than simply synthesizing the data of this part. If the road in the image looks like a fuzzy gray stripe, it will not provide accurate information to disaster relief workers. The generator $G$ can use vectors to repair images. The variables in the formula are not fixed, but still have some association with the content of the original image. In this way, the underlying structure of the image remains consistent, that is, some information is shared between the input and output, which is conducive to improve the quality of the compressed image. At the end of the iteration, the discrimination accuracy on the data set is calculated. Once the discrimination accuracy is saturated, the training is stopped to prevent the model from overfitting.

The above design is not able to judge the regions of important and background in the image, and will not make additional processing for local data. Encoder $E$ and generator $G$ automatically handle the balance between the compression quality of the whole image and the removal of redundant data without any information guidance. However, the scene studied in this paper is relatively special. It is based on the complex disaster conditions, and has no high requirements for the visual quality of some regions in the image. It is hoped that the amount of data can be reduced as much as possible for post-processing, so that the amount of data in unimportant regions can be greatly reduced. Referring to the structural design of CGAN, this paper makes further improvement to the compression structure. The additional information $s$ of the image $i$ is semantic labels in a complex disaster scenario. The model needs to provide relevant information to the encoder $E$ and the generative network $G$ , and needs to combine semantics in the encoding or decoding of images. This design is called region of interest generative compression, which guides network confrontation according to semantic labels. Region of interest generative compression can specify important regions in disaster images, and establish compression quality standards for these regions during compression, or specify decompression requirements for certain regions during image reconstruction.

Providing semantic labels to the structure requires semantic segmentation of disaster images [39]. Different parts of the images are divided into different classes according to semantics [40], and the images provided are RGB three-channel images. The image is divided into nine classes, each with a corresponding RGB value. These 9 different RGB values are one-hot coding, and the output is a 9-channel encoded image. Each channel represents a specific class. Table 1 is the specific encoding scheme.

TABLE 1 Image Encoding Scheme

Before using generator to decode, bit allocation needs to be combined with heat map. Roads, houses and other areas are set as 1, while rivers, mountains, trees and other areas are set as 0, which is converted into a binary heatmap $m$ with the same spatial dimension as $\hat {w}$ . The region of 0 corresponds to the region that should be completely synthesized (i.e., background region), and the region of 1 corresponds to the region that should be retained (i.e., important region). The synthesized region still contains a small amount of data from the original image $i$ . The original disaster image and semantic segmentation image in the data set are shown in Fig. 4.

FIGURE 4.

Original disaster image and semantic segmentation image. Fig (a) is original disaster image and Fig (b) is semantic segmentation image. Each region of the image has a different label. Some regions are labeled as region of interest, and some regions are labeled as region of background.

Show All

The semantic $s$ and the image encoding are stored in different spaces, so the feature extractor extracts the data from it to the encoder. The data contains information to distinguish the region of interest and background, and then the generator $G$ receives the data for further processing. The algorithm uses the semantic labels to guide the confrontation between networks. The compressed item $\hat {w}$ that should be synthesized in the background region is mostly set to 0, but a small part of data is also reserved for optimizing the visual effect. The algorithm uses the heat map to encode only the region of interest corresponding to item $\hat {w}$ , which greatly reduces the bit rate required for image storage. The bit rate of the compressed item is much higher than that of the semantic label and heat map information, and the semantic label will not increase the data storage obviously. This method can save a lot of code rate of the disaster area image.

With the addition of semantic labels, there is a large difference in the amount of coding between the region of interest and the background region, which may lead to the lack of authenticity of the compressed image. We improve formula (9) and added the energy function of MRF to the loss function on the basis of GAN loss function. With the addition of the energy function, the pixels of the region of interest maintain continuity with the pixels of the region of background in image features at the maximum probability. The enhancement of image features between the region of interest and background makes accelerate the loss function, and further improves the training speed of the model. The energy function of MRF is defined as

$\begin{equation*} K\left ({x }\right)=\beta \sum \nolimits _{j,k} x_{j} y_{k}.\tag{10}\end{equation*}$ View Source

$x_{j}$ is the pixel point of the region of interest, $y_{k}$ is the pixel point of the region of background, and $\beta$ is the weight parameter that makes the region of interest of the image consistent with the background region. The loss function of the final model is

$\begin{align*}&\hspace{-0.5pc}\mathop {min}\limits _{E,G} \mathop {max}\limits _{D} E\left [{ K\left ({i }\right)+f(D(i)) }\right]+E\left [{ g\left ({D\left ({G\left ({z }\right) }\right) }\right) }\right] \\&\qquad\qquad\qquad\quad {+\,\sqrt {\mu {(E[L(i,G(z))])}^{2}+\delta {(H(\hat {w}))}^{2}}.} \tag{11}\end{align*}$ View Source

$K$ is the energy function of MRF. $f$ and $g$ are scalar functions. $L$ is a loss function for measuring the similarity between the original image $i$ and the compressed image $\hat {i}$ . The description of other variables has been explained in equation (8).

In essence, the compression problem about the region of interest is still to remove redundant data from the image, and its working diagram is shown in Fig. 5. Firstly, the image is randomly selected from the data set and divided by semantic segmentation. Each part is marked and the corresponding features are extracted. The labels generated by the currently standardized image and the additional information after semantic segmentation are taken as the input of the encoder, and the output generates the encoding representation, which is then compressed by the quantizer. The compression representation is combined with the heat map to form a vector with conditional information. The data of the region of interest in the disaster image are reserved according to the vector, and other regions are synthesized as far as possible. The generative vector serves as the input of the generator, which generates the compressed image and gives it to the discriminator for judgment. The current standardized image and the compressed image generated by the generator are taken as the input of the discriminator, and the error result is used to make the judgment. Then it is fed back to the generator, and the weight of the generator and the discriminator is updated.

FIGURE 5.

The work schematic diagram of region of interest generative adversarial networks.

Show All

B. Generator Model and Discriminator Model

The model of generator is essentially a decoder, which receives the vector generated by the encoder as input. It adopts the deconvolution neural network structure and consists of four convolution layers. The size of convolution kernel of each layer is $5\times 5$ , and the number of convolution kernel in the four layers is 512, 256, 128 and 64. Relu is adopted as the activation function in the structure.

The model of discriminator adopts convolutional neural network structure, including four convolutional layers and one fully connected layer. The convolution layer is used to compress the image, and then the full connection layer is used to classify the image to judge the authenticity of the image. The size of the convolution kernel of each convolution layer is $5\times 5$ , and the number of the convolution kernel is 64, 128, 256 and 512. Finally, a vector of 1024 is generated and processed by the sigmoid function to obtain a value in the range of 0 to 1, which is the probability of being identified as the original image. The model structure of generator and discriminator is shown in Fig. 6.

FIGURE 6.

The model structures of generator and discriminator. The left side of Fig. 6. is the model structure of generator, and the right side is the model structure of discriminator.

Show All

SECTION IV.

Experiments and Data Analysis

In order to verify the feasibility of the compression algorithm, the program is written based on python. GPU is used to enhance the training speed, and Tensorflow is used to realize deep learning algorithm. In the experiment, the disaster images collected by the uav over the years are used as the training set, and some images that do not meet the training conditions are removed. There are about 5,400 images in the training set. A large number of experimental data show that after the training set reaches 5400 images, adding more images can only bring insignificant parameter optimization, and some generative redundant parameters are not required for disaster scenarios. The test set also adopts the disaster area image collected by uav, but it is different from the training set. There are about 2500 images in the test set, and some images are shown in Fig. 7.

FIGURE 7.

Part of the images in the test set. Fig (a) is fire image, and the image name in the test set is fire0054. Fig (b) and (c) are earthquake disaster images, and the image names in the test set are seismic0983 and seismic1106 respectively. Fig (d) and (e) are flood images, and the image names in the test set are flood1417 and flood1643 respectively. Fig (f) is explosion disaster image, and the image name in the test set is blast2412.

Show All

Adam gradient is selected to update method, and all samples in the training set 128 times are trained. The batch size of each processing in the training set is 1. The learning rate of generator and discriminator is 2e-4. In the network structure, the generator takes a 128 dimensional random sampling evenly distribution between 0 and 1 as input, and outputs a $64\times 64$ compressed image; the discriminator takes a $64\times 64$ compressed image and an original image as input, and the output is used to distinguish whether input image is compressed or original.

Firstly, the experiment needs to conduct semantic segmentation of the disaster image, analyze the key part of the information in the image, and use it as conditional information after processing. The semantic segmentation of some disaster images in the data set is shown in Fig. 8. After continuous training, the discriminator cannot distinguish whether the image is original or compressed, the compression quality is constantly improved.

FIGURE 8.

Semantic segmentation results of some disaster images. Different parts of the image are semantically divided into different classes. Fig (a) is the semantic segmentation of fire image. Fig (b) and (c) are the semantic segmentation of earthquake disaster image. Fig (d) and (e) are the semantic segmentation of the flood image. Fig (d) and (e) are the semantic segmentation of the explosion disaster image. These six images are semantic annotation images of Fig. 7.

Show All

In order to further evaluate the effect of the proposed compression framework, the compression algorithm of this paper is compared with the standard compression methods (e.g., JPEG and JPEG2000). JPEG2000 is an image compression standard based on wavelet transform. Compared with JPEG, JPEG2000 has higher compression ratio and no block-blur defect. The proposed compression framework draws on some creative designs of Jiang’s [26], Yoo’s [27], Sun’s [28], Prakash’s [29] and Cavigelli ’s [30], so the compression effects of these algorithms will also be used for comparison to prove that the design of the region of interest does enhance the compression effect of disaster images. In order to facilitate display, a disaster image is selected from the data set, and the compressed image of the improved algorithm is compared with that of other algorithms.

The contrast effect is shown in Fig. 9. In image directly compressed by JPEG, significant blockiness occurs and the texture is blurred. Some artifacts appear when Yoo’s [27] and Sun’s [28] reconstruct the image. Prakash’s [29], Cavigelli’s [30] and JPEG2000 have better visual quality, but the edge of the area produce blurring effects. Jiang’s [26] obtains better PSNR and SSIM, but the edge grayscale of the image changes drastically, and color saturation of the image is relatively low, which will lead to the compressed image looking uncoordinated. GAN has a higher visual quality, but cannot synthesize blocky or blurred spots. GAN does not highlight important areas such as houses. Compared to others compression algorithms, proposed compression algorithm retains more edge and texture details, and the resulting image has higher fidelity.

FIGURE 9.

Comparison results of image compression for flood1643 in the case of BPP = 0.3. From left to right and top to bottom: Original image, the proposed algorithm (PSNR = 28.74 dB, MS-SSIM = 0.908), GAN (PSNR = 27.17 dB, MS-SSIM = 0.904), JPEG2000 (PSNR = 28.33 dB, MS-SSIM = 0.897), JPEG (PSNR = 25.14 dB, MS-SSIM = 0.827), Jiang’s (PSNR = 28.45 dB, MS-SSIM = 0.899), Yoo’s (PSNR = 26.27 dB, MS-SSIM = 0.862), Sun’s (PSNR = 26.15 dB, MS-SSIM = 0.843), Prakash’s (PSNR = 27.10 dB, MS-SSIM = 0.887) and Cavigelli’s (PSNR = 26.59 dB, MS-SSIM = 0.876).

Show All

The traditional compression algorithm has obvious defects such as block effect in image reconstruction. The compression algorithm in this paper can retain the details of the road, house and other objects, and still be more similar to the original image after compression. Both the generator and discriminator use 64 epochs for training. The loss value decreases exponentially over 64 epochs, from 0.9 to about 0.5. Under different epochs, the loss value of the generator and discriminator of the algorithm is shown in Fig. 10.

FIGURE 10.

Training loss value of generators and discriminators under different epochs. Fig (a) is about the generator loss value. Fig (b) is about the discriminator loss value. Limited by the calculation conditions, the image is basically stable considering that the epoch is within 64. The value of epoch ranges from 0 to 64. The range of loss value is between 0.4 and 1.

Show All

The proposed compression algorithm preserves more image detail with a higher compression ratio, so that the image quality is not greatly damaged. We use MS-SSIM to further evaluate the performance of the algorithm. MS-SSIM is used to evaluate the visual quality of image compression algorithm. The higher the value is, the better the image reconstruction [41] will be. The comparison results are shown in Fig. 11.

FIGURE 11.

Compressed quality evaluation results of nine algorithms. BPP is used to measure image resolution. The higher the BPP, the more colors are available. The measurement range of BPP is 0–1, and the value range of MS-SSIM is 0–1. The MS-SSIM of the all algorithms is basically stable when BBP is between 0.6 and 1.

Show All

From the six graphs in Fig. 11, it can be seen that when the BPP reaches about 0.9, the proposed compression algorithm, GAN, Jiang’s [26], and JPEG2000 remain around 0.95, while Yoo’s [27], Sun’s [28], Prakash’s [29], and Cavigelli’s [30] remain around 0.90. JPEG remains around 0.85. Under the same BPP, the MS-SSIM of the proposed compression algorithm is slightly better than other algorithms. The reconstruction of the proposed algorithm and GAN algorithm is based on the distribution of data samples, and fidelity is more outstanding.

The values of PSNR and MS-SSIM for all compression methods are obtained by running the original author’s source code and adjusting the parameters to the optimal value. Table 2 shows PSNR values of 6 disaster images in the test set compressed by various algorithms under BPP = 0.4. Table 3 shows MS-SSIM values of 6 disaster images in the test set compressed by various algorithms under BPP = 0.4. It can be seen that the proposed algorithm is the highest in terms of PSNR, and the average value reaches 29.72. The average value of JPEG2000 and Jiang’s [26] also exceeds 29. In terms of the MS-SSIM, the proposed algorithm and GAN algorithm are superior to other algorithms.

TABLE 2 PSNR Results of 9 Contrast Algorithms for BPP = 0.4 for Multiple Different Disaster Images

TABLE 3 MS-SSIM Results of 9 Contrast Algorithms for BPP = 0.4 for Multiple Different Disaster Images

Finally, the four image datasets CIFAR-10, HKU-IS, ECSSD and Cityscapes are used to evaluate the performance of the algorithm, and the effectiveness of the proposed algorithm is further verified. The classes used to provide semantic labels in these four data sets are different from the disaster image datasets, and the experiment has additionally dealt with them. Considering the two indexes of PSNR and MS-SSIM, it can be seen from Table 4 and 5 that compared with other algorithms, the PSNR and MS-SSIM values of the proposed algorithm remain the largest. It shows that the proposed algorithm has certain universality.

TABLE 4 Average PSNR Results of JPEG2000, GAN and Ours for BPP = 0.2,0.3,0.4 for CIFAR-10, HKU-IS, ECSSD and Cityscapes

TABLE 5 Average MS-SSIM Results of JPEG2000, GAN and Ours for BPP = 0.2,0.3,04 for CIFAR-10, HKU-IS, ECSSD and Cityscapes

In this paper, the algorithm can seamlessly merge the content of regions of interest and regions of synthesis when compressed. In addition, compared with the same network structure without synthesis, the proposed network structure greatly reduces BPP. When the object has repetitive structure, the visual quality is basically not damaged, which can generate images with high fidelity and save a lot of code rate. At the same code rate, the visual quality of the proposed algorithm is even better than some mature algorithms.

SECTION V.

Conclusion

Firstly, this paper introduces the content of generative adversarial networks algorithm and analyses how the algorithm can be applied to image compression under complex disaster conditions. The generator generates the decompressed image, and the discriminator discriminates the difference between the compressed image and the real image. They compete with each other to produce images with better compression effect. By semantic labels, additional information is added to the model, which can preserve the data in important areas and compress the redundant data in non-critical areas on a large scale. Experimental results show that under the premise of ensuring image quality, the algorithm in this paper can indeed reduce the redundancy of data, and is significantly superior to common compression algorithms in terms of visual quality and image compression performance. It has a high fidelity and can provide convenience for later target detection. All regions of interest in disaster images are compressed in a similar way, but each region in the actual scene has different visual quality requirements. Therefore, it is the main content of the next step to consider setting different priorities for each region in the image and increasing the compression weight.

ACKNOWLEDGMENT

The authors would like to appreciate all anonymous reviewers for their insightful comments and constructive suggestions to polish this paper in high quality.

UAV Image High Fidelity Compression Algorithm Based on Generative Adversarial Networks Under Complex Disaster Conditions

Abstract:

Metadata

Abstract:

Funding Agency:

Introduction

Related Works

A. Image Compression Based on Deep Learning

B. The Structure of Generative Adversarial Networks

C. Semantic Segmentation of Disaster Images by Fully Convolutional Networks

High Fidelity Image Compression of UAV Based on Based on Generative Adversarial Networks Improvement

A. The Basic Working Principle of Compression

B. Generator Model and Discriminator Model

Experiments and Data Analysis

Conclusion

ACKNOWLEDGMENT

References

IEEE Account

Purchase Details

Profile Information

Need Help?

UAV Image High Fidelity Compression Algorithm Based on Generative Adversarial Networks Under Complex Disaster Conditions

Alerts

Abstract:

Metadata

Abstract:

Funding Agency:

Introduction

Related Works

A. Image Compression Based on Deep Learning

B. The Structure of Generative Adversarial Networks

C. Semantic Segmentation of Disaster Images by Fully Convolutional Networks

High Fidelity Image Compression of UAV Based on Based on Generative Adversarial Networks Improvement

A. The Basic Working Principle of Compression

B. Generator Model and Discriminator Model

Experiments and Data Analysis

Conclusion

ACKNOWLEDGMENT

References