Introduction
In recent years, natural and man-made disasters have occurred frequently, seriously affecting people's lives in the affected areas. Although the damage caused by disasters can be learned through various channels, people still lack a clear understanding of their overall impact [1]. With the continuous development of UAV technology, UAVs have been applied in many fields such as disaster assessment, environmental monitoring, traffic management and aerial photography [2]. The purpose of applying UAVs to complex disaster relief is to replace slow and inefficient manual reconnaissance and to complete tasks with high efficiency and reliability. The first priority in disaster relief is to transmit high-definition disaster images, and eliminating redundant data is the key to high-quality transmission. In this paper, a high-fidelity image compression method is proposed to reduce the pressure on storage capacity and to facilitate later image processing [3].
Compared with other media, video and images give people an intuitive impression of a scene, and people obtain most external information through them. However, video and images contain a large amount of redundant data, mainly temporal redundancy, spatial redundancy and coding redundancy [4]. Image compression is the basis of processing and transmission: if an image is not compressed early on, it increases the difficulty of later image stitching and reduces transmission efficiency, which may prevent people from exploiting the useful content of video and images [5]. How to transmit clear, high-fidelity images over limited bandwidth and obtain high-quality compressed data has therefore become a research focus, and image compression has become a frontier topic in computer vision [6].
Compression technology can be roughly divided into three stages, each of which produced a number of creative results. The first generation of compression coding was mainly aimed at removing redundancy, for example PCM [7] and transform coding [8]. The emergence of arithmetic coding [9] marked the beginning of the second generation of image compression coding, followed by dictionary coding [10] and lossless compression algorithms. The third generation of compression technology, which is still developing, mainly includes fractal image coding, wavelet transform coding [11] and related algorithms; image recognition and synthesis methods are used to compress the data, with pixel coding and predictive coding [12] as the main techniques. In recent years, deep learning [13]–[16] has been widely used in data analysis [17], image recognition [18], speech processing [19] and other fields. Compression based on neural network structures is a hot research topic: the latest image compression algorithms that incorporate neural networks produce files only about half the size of those produced by traditional algorithms, without affecting visual quality [20]. The convolutional neural network is a common network structure in the field of image compression [21].
The compression of disaster images must not only preserve the details of each part but also ensure that the image has high fidelity [22]. Generative adversarial networks [23] provide a new idea for image compression. The generative adversarial network was proposed by Goodfellow et al. based on principles from game theory. Its core consists of a generative model and a discriminative model: the generative model produces new samples, while the discriminative model verifies the authenticity of samples and judges the quality of the generated ones. The optimization of the network follows the strategy of a “two-player minimax game” [24]: during training, the parameters of one model are kept fixed while the parameters of the other are updated, and the two steps alternate until the data generated by the model are close to the original samples. The application of generative adversarial networks has expanded from the initial image generation to various fields of computer vision [25], such as image recognition and video prediction.
Related Works
A. Image Compression Based on Deep Learning
Image compression is an important research topic in digital image processing, with broad applications in video transmission, redundancy removal and image stitching. Jiang [26] proposed a compression framework in which two CNNs are seamlessly integrated into an end-to-end pipeline and achieve accurate reconstruction of decoded images. Yoo et al. [27] proposed a two-step framework for reducing blocking artifacts in different regions based on increasing inter-block correlation, which classifies the coded image into flat regions and edge regions. Sun and Cham [28] put forward a solution based on the maximum a posteriori criterion: the distortion caused by coding is modeled as spatially correlated Gaussian noise, and the original image is modeled as a high-order Markov random field based on the fields-of-experts framework. Prakash et al. [29] proposed a new CNN architecture specifically for image compression, which generates a map that highlights semantically salient regions. Cavigelli et al. [30] took inspiration from deep neural networks developed for semantic segmentation and proposed a network with hierarchical skip connections and a multi-scale loss function for compression artifact suppression. These papers provide inspiration for the algorithm designed in this paper.
Although these algorithms exploit the powerful representational capacity of deep learning and save considerable bandwidth, they do not compare the reconstruction against the original image, which results in low fidelity in parts of the compressed image. In contrast, the proposed framework uses the adversarial structure of generative adversarial networks to compare the compressed image with the original image in order to improve visual quality.
The framework of image compression includes several modules: encoder, quantizer, inverse quantizer, decoder and entropy coder [31]. The encoder converts the image into a compact feature representation, and the decoder recovers the original image from that representation; both can be built from convolution, pooling, nonlinearity and other modules. Taking a three-channel 768×512 image as the input of the encoder, forward processing yields a compressed feature occupying 96×64×192 data units. If the compressed data were kept as floating-point numbers, the quality of the restored image would be highest, but a floating-point number occupies 32 bits, making the number of bits per pixel after compression far too large. Quantization is therefore used to convert floating-point numbers into integer or binary numbers. At the decoding end, inverse quantization restores the quantized features to floating-point values, which reduces the influence of quantization on the accuracy of the neural network and improves the quality of the restored image.
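For intuition, the following minimal sketch (our illustration, not the authors' code) shows a uniform scalar quantizer and its inverse, assuming the features are floats normalized to [0, 1] and an 8-bit integer code:

```python
import numpy as np

# Uniform scalar quantization: map float features in [0, 1] to 8-bit
# integers, then map them back at the decoding end. The parameters
# (8 bits, [0, 1] range) are illustrative assumptions.
def quantize(features, bits=8):
    levels = 2 ** bits - 1
    return np.round(np.clip(features, 0.0, 1.0) * levels).astype(np.uint8)

def dequantize(codes, bits=8):
    levels = 2 ** bits - 1
    return codes.astype(np.float32) / levels

features = np.random.rand(96, 64, 192).astype(np.float32)
restored = dequantize(quantize(features))  # 8 bits per value instead of 32
```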
B. The Structure of Generative Adversarial Networks
Image compression based on generative adversarial networks is a novel image compression method. A generative adversarial network models the distribution underlying the training samples. The model contains two networks: the generator learns the distribution of the data and continuously generates new samples closer to the real ones, while the discriminator, usually a binary classifier, determines whether its input is real data or a generated sample and identifies the differences between the two. As a training framework, the adversarial network does not require a new neural network structure; mature network models such as an RNN [32] or CNN [33] can be chosen for each component of the framework. The basic structure of a generative adversarial network is shown in Fig. 1.
In order to learn the generator's distribution over the data set, the generator G maps a noise vector z drawn from a prior distribution P_{noise} into the data space, while the discriminator D outputs the probability that its input came from the real data rather than from G. The two networks are trained against each other through the value function \begin{align*} L\left ({D,G }\right)=&E_{x\sim P_{data}\left ({x }\right)}\left [{ \log D\left ({x }\right) }\right] \\&+\,E_{z\sim P_{noise}\left ({z }\right)}[\log (1-D(G(z)))]\to \mathop {min}\limits _{G} \mathop {max}\limits _{D}. \\\tag{1}\end{align*}
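As a concrete illustration of formula (1) (our sketch, not code from the paper), the value function can be estimated on a mini-batch given the discriminator's outputs for real and generated samples:

```python
import numpy as np

# d_real and d_fake hold the discriminator outputs D(x) and D(G(z))
# for a mini-batch, as probability arrays in (0, 1).
def gan_value(d_real, d_fake, eps=1e-8):
    # E[log D(x)] + E[log(1 - D(G(z)))]
    return np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps))

# The discriminator ascends this value while the generator descends it.
```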
The basic GAN imposes no condition on what it generates, so on this basis the conditional generative adversarial network (CGAN) [35] was designed. In the modeling of the generator G and the discriminator D, additional information s is supplied to both networks as a condition: \begin{align*} V\left ({D,G }\right)=&E_{x\sim P_{data}(x)}\left [{ \log D\left ({x,s }\right) }\right] \\&+\,{E}_{z\sim P_{noise}\left ({z }\right)}[\log (1-D(G(z,s)))]\to \mathop {min}\limits _{G} \mathop {max}\limits _{D}. \\\tag{2}\end{align*}
Here s is the additional conditional information, and i denotes the uncompressed original image used later in this paper.
C. Semantic Segmentation of Disaster Images by Fully Convolutional Networks
Fully convolutional networks (FCN) were first proposed for solving the problem of semantic segmentation [36]. An FCN mainly consists of convolution layers, pooling layers and deconvolution layers; its basic structure is shown in Fig. 3.
The convolution operation [37] is used to extract features: the image matrix X is convolved with an m×n kernel F, a bias b is added, and the result passes through an activation function f, \begin{equation*} {Y}_{i^{\prime }j^{\prime }}={f}\left ({\sum \nolimits _{i=1}^{m}\sum \nolimits _{j=1}^{n} {{F_{ij}{X}}_{i+{i}^{\prime },{j}+{j}^{\prime }}+{b}} }\right)\!.\tag{3}\end{equation*}
The activation function f is the ReLU function: \begin{equation*} f(x_{ij})=\max \{0,{x}_{ij}\}.\tag{4}\end{equation*}
The pooling layer reduces the dimension of the image features while retaining the most critical information. Max-pooling is used instead of mean-pooling because max-pooling retains more texture information, whereas mean-pooling retains more background information. It takes the maximum value over each m×n region: \begin{equation*} Y_{i'j'}=\max _{i,j} {X}_{i^{\prime }+{i},{j}^{\prime }+{j}}, \quad {i}\in \left \{{1,\ldots,{m} }\right \},~{j}\in \left \{{1,\ldots,{n} }\right \}.\tag{5}\end{equation*}
The deconvolution is the inverse operation of convolution, reversing the steps of the convolution transformation, so that a prediction can be made for each pixel while the spatial information of the original input image is retained. Deconvolution is an up-sampling operation; denoting the input size by I, the stride by s and a term determined by the kernel and padding by p, the output size after deconvolution is \begin{equation*} {O}=\left ({{I}-1 }\right)\times {s}+{p}.\tag{6}\end{equation*}
The softmax classifier is the last layer in the network and performs the final classification and normalization. The softmax function is \begin{equation*} {h}_{i}=\frac {e^{X_{i}}}{\sum \nolimits _{k=1}^{K} {e}^{X_{k}}}.\tag{7}\end{equation*}
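To make formulas (3)–(7) concrete, the following minimal NumPy sketch (our illustration, with arbitrary toy sizes) implements the convolution, ReLU, max-pooling and softmax operations:

```python
import numpy as np

def conv2d(X, F, b=0.0):
    """Formula (3): valid convolution of image X with kernel F plus bias,
    followed by the ReLU of formula (4)."""
    m, n = F.shape
    H, W = X.shape
    Y = np.empty((H - m + 1, W - n + 1))
    for i2 in range(Y.shape[0]):
        for j2 in range(Y.shape[1]):
            Y[i2, j2] = np.sum(F * X[i2:i2 + m, j2:j2 + n]) + b
    return np.maximum(Y, 0.0)  # ReLU, formula (4)

def max_pool(X, m=2, n=2):
    """Formula (5): maximum over each non-overlapping m x n region."""
    H, W = X.shape[0] // m * m, X.shape[1] // n * n
    return X[:H, :W].reshape(H // m, m, W // n, n).max(axis=(1, 3))

def softmax(x):
    """Formula (7): normalized exponentials over K classes."""
    e = np.exp(x - x.max())  # shifted for numerical stability
    return e / e.sum()

X = np.random.rand(8, 8)
F = np.random.rand(3, 3)
print(softmax(max_pool(conv2d(X, F)).ravel()[:4]))
```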
High-Fidelity UAV Image Compression Based on Improved Generative Adversarial Networks
A. The Basic Working Principle of Compression
The model of UAV image compression used in complex disaster conditions is designed based on generative adversarial networks. It has the basic structure of image compression, including an encoder E, a quantizer and a decoder, where the decoder is implemented as the generator G of the adversarial network and is paired with a discriminator D.
The decoder, realized by the generator G, reconstructs the image from the quantized representation \hat{w} produced by the encoder E and the quantizer. E and G are trained jointly against the discriminator D by optimizing \begin{align*}&\hspace{-0.5pc}\mathop {min}\limits _{E,G} \mathop {max}\limits _{D} E\left [{ f\left ({D\left ({i }\right) }\right) }\right]+E\left [{ g\left ({D\left ({G\left ({z }\right) }\right) }\right) }\right] \\&\qquad\qquad\qquad\qquad {+\,\sqrt {\mu {(E[L(i,G(z))])}^{2}+\delta {(H(\hat {w}))}^{2}}.} \tag{8}\end{align*} Here f and g are the loss functions applied to the discriminator outputs, i is the original image, z is the conditional vector formed from \hat{w}, L is a distortion measure between the original and reconstructed images, H(\hat{w}) is the entropy of the compressed representation, and \mu and \delta are weighting coefficients.
Since the last term of formula (8) does not involve the discriminator D, the adversarial terms can be abbreviated as \ell_{GAN} and the objective for the encoder and generator written as \begin{equation*} \mathop {min}\limits _{E,G} \ell _{GAN}+\sqrt {\mu {(E[L(i,G(z))])}^{2}+\delta {(H(\hat {w}))}^{2}}.\tag{9}\end{equation*}
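As a small illustration (our sketch, not the authors' implementation), formula (9) can be assembled from mini-batch estimates of its three terms, with \mu and \delta as the weighting coefficients:

```python
import numpy as np

def compression_objective(gan_loss, distortion, entropy, mu, delta):
    """Formula (9): l_GAN plus the combined distortion/rate penalty.

    gan_loss   -- estimate of l_GAN on a mini-batch
    distortion -- estimate of E[L(i, G(z))], e.g., mean squared error
    entropy    -- estimate of H(w_hat), the entropy of the code
    """
    return gan_loss + np.sqrt((mu * distortion) ** 2 + (delta * entropy) ** 2)
```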
The vector \hat{w} is the quantized representation produced by the encoder and the quantizer; its entropy H(\hat{w}) bounds the achievable code rate, so the second term of formula (9) trades reconstruction distortion against code rate.
The work of compression is to remove redundant data, but it must also preserve the fidelity of compressed images and make them look increasingly realistic [38]. For the damaged roads in a disaster image, for example, the encoder must retain enough structural detail that the damage remains recognizable after reconstruction.
The design described so far cannot judge which regions of the image are important and which are background, and performs no additional processing of local data. The encoder therefore needs extra guidance: semantic labels are supplied to the network so that regions of interest can be preserved preferentially while background regions are compressed more aggressively.
Providing semantic labels to the structure requires semantic segmentation of the disaster images [39]. Different parts of the images are divided into different classes according to semantics [40]; the input images are RGB three-channel images. Each image is divided into nine classes, each with a corresponding RGB value. These nine RGB values are one-hot encoded, and the output is a 9-channel encoded image in which each channel represents a specific class. Table 1 gives the specific encoding scheme.
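The following minimal sketch illustrates this one-hot encoding; the class names and RGB values below are placeholders, since the actual scheme is the one defined in Table 1:

```python
import numpy as np

# Hypothetical class colors standing in for the Table 1 scheme.
CLASS_COLORS = np.array([
    [128,  64, 128],  # road      (placeholder value)
    [ 70,  70,  70],  # house     (placeholder value)
    [  0,   0, 142],  # river     (placeholder value)
    [152, 251, 152],  # tree      (placeholder value)
    [ 70, 130, 180],  # mountain  (placeholder value)
    [220,  20,  60],  # person    (placeholder class)
    [255, 255,   0],  # fire      (placeholder class)
    [102, 102, 156],  # rubble    (placeholder class)
    [  0,   0,   0],  # other     (placeholder class)
])

def rgb_to_onehot(seg_rgb, class_colors=CLASS_COLORS):
    """Convert an H x W x 3 semantic map into an H x W x 9 one-hot map."""
    h, w, _ = seg_rgb.shape
    onehot = np.zeros((h, w, len(class_colors)), dtype=np.uint8)
    for c, color in enumerate(class_colors):
        onehot[..., c] = np.all(seg_rgb == color, axis=-1)
    return onehot
```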
Before the generator decodes, bit allocation is combined with the heat map: roads, houses and other important areas are set to 1, while rivers, mountains, trees and other background areas are set to 0, yielding a binary heat map that guides how many bits each region receives.
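Continuing the sketch above (same placeholder class ordering), the binary heat map can be derived from the one-hot map as follows:

```python
import numpy as np

# Channels treated as regions of interest (placeholder indices: road, house).
ROI_CHANNELS = [0, 1]

def onehot_to_heatmap(onehot, roi_channels=ROI_CHANNELS):
    """Binary heat map: 1 for regions of interest, 0 for background."""
    return onehot[..., roi_channels].any(axis=-1).astype(np.uint8)
```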
Original disaster image and its semantic segmentation. Fig (a) is the original disaster image and Fig (b) is the semantic segmentation image. Each region of the image has a different label: some regions are labeled as regions of interest, and others as background regions.
The semantic label map serves as the additional information s of formula (2): it is fed to the network together with the image, so that the model knows which pixels belong to regions of interest.
With the addition of semantic labels, there is a large difference in the amount of coding between the region of interest and the background region, which may cause the compressed image to lack authenticity. We therefore improve formula (9) by adding the energy function of a Markov random field (MRF) to the GAN loss. With this energy term, the pixels of the region of interest maintain continuity with the pixels of the background region in the image features with maximum probability. Strengthening the image features between the region of interest and the background accelerates the convergence of the loss function and further improves the training speed of the model. The energy function of the MRF is defined as \begin{equation*} K\left ({x }\right)=\beta \sum \nolimits _{j,k} x_{j} y_{k}.\tag{10}\end{equation*}
Adding this energy term to formula (8) yields the improved objective \begin{align*}&\hspace{-0.5pc}\mathop {min}\limits _{E,G} \mathop {max}\limits _{D} E\left [{ K\left ({i }\right)+f(D(i)) }\right]+E\left [{ g\left ({D\left ({G\left ({z }\right) }\right) }\right) }\right] \\&\qquad\qquad\qquad\quad {+\,\sqrt {\mu {(E[L(i,G(z))])}^{2}+\delta {(H(\hat {w}))}^{2}}.} \tag{11}\end{align*}
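As a numerical illustration (our own sketch, not the authors' code), formula (10) for two neighboring feature vectors x and y can be evaluated as:

```python
import numpy as np

def mrf_energy(x, y, beta):
    """Formula (10): beta * sum over j, k of x_j * y_k."""
    return beta * np.sum(np.outer(x, y))
```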
In essence, the compression of the region of interest still amounts to removing redundant data from the image; the working diagram is shown in Fig. 5. First, an image is randomly selected from the data set and divided by semantic segmentation; each part is labeled and the corresponding features are extracted. The currently normalized image and the additional label information from semantic segmentation are taken as the input of the encoder, which outputs a coded representation that is then compressed by the quantizer. The compressed representation is combined with the heat map to form a vector carrying the conditional information: the data of the region of interest in the disaster image are preserved according to this vector, while the other regions are synthesized as far as possible. The vector serves as the input of the generator, which produces the compressed image and passes it to the discriminator for judgment. The discriminator takes the current normalized image and the generated compressed image as input; the resulting error is fed back to the generator, and the weights of the generator and the discriminator are updated.
B. Generator Model and Discriminator Model
The generator is essentially a decoder: it receives the vector produced by the encoder as input. It adopts a deconvolutional neural network structure consisting of four deconvolution layers, whose kernel sizes and strides are chosen so that the feature map is progressively up-sampled to the original image resolution.
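As an illustration of this structure, the following tf.keras sketch builds a four-layer deconvolutional generator; the kernel sizes, strides, channel counts and output resolution are our assumptions, not values from the paper:

```python
import tensorflow as tf

def build_generator(latent_dim=128):
    """Sketch of a four-layer deconvolutional generator (hyperparameters assumed)."""
    return tf.keras.Sequential([
        tf.keras.layers.Dense(8 * 8 * 256, input_shape=(latent_dim,)),
        tf.keras.layers.Reshape((8, 8, 256)),
        # Four deconvolution layers, each doubling the spatial resolution.
        tf.keras.layers.Conv2DTranspose(128, 5, strides=2, padding="same", activation="relu"),
        tf.keras.layers.Conv2DTranspose(64, 5, strides=2, padding="same", activation="relu"),
        tf.keras.layers.Conv2DTranspose(32, 5, strides=2, padding="same", activation="relu"),
        tf.keras.layers.Conv2DTranspose(3, 5, strides=2, padding="same", activation="tanh"),
    ])
```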
The discriminator adopts a convolutional neural network structure with four convolution layers and one fully connected layer. The convolution layers successively reduce the spatial resolution of the input image, and the fully connected layer then classifies the result to judge whether the image is real or generated.
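A matching tf.keras sketch of the discriminator follows; again, the kernel sizes, strides, channel counts and input resolution are our assumptions:

```python
import tensorflow as tf

def build_discriminator(input_shape=(128, 128, 3)):
    """Sketch of a discriminator: four conv layers plus one dense layer."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 5, strides=2, padding="same",
                               activation=tf.nn.leaky_relu, input_shape=input_shape),
        tf.keras.layers.Conv2D(64, 5, strides=2, padding="same", activation=tf.nn.leaky_relu),
        tf.keras.layers.Conv2D(128, 5, strides=2, padding="same", activation=tf.nn.leaky_relu),
        tf.keras.layers.Conv2D(256, 5, strides=2, padding="same", activation=tf.nn.leaky_relu),
        tf.keras.layers.Flatten(),
        # Single sigmoid unit: probability that the input image is real.
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
```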
Experiments and Data Analysis
In order to verify the feasibility of the compression algorithm, the program is written in Python: the GPU is used to accelerate training, and TensorFlow is used to implement the deep learning algorithm. In the experiment, disaster images collected by UAVs over the years are used as the training set, after removing images that do not meet the training conditions; the training set contains about 5400 images. Extensive experiments show that once the training set exceeds 5400 images, adding more images brings only insignificant parameter improvements, and some of the resulting redundant parameters are not needed for disaster scenarios. The test set also consists of disaster-area images collected by UAVs, but it is disjoint from the training set and contains about 2500 images; some of them are shown in Fig. 7.
Part of the images in the test set. Fig (a) is a fire image, named fire0054 in the test set. Fig (b) and (c) are earthquake disaster images, named seismic0983 and seismic1106 respectively. Fig (d) and (e) are flood images, named flood1417 and flood1643 respectively. Fig (f) is an explosion disaster image, named blast2412.
The Adam optimizer is selected as the update method, and the network is trained over all samples of the training set for 128 passes with a batch size of 1. The learning rate of both the generator and the discriminator is 2e-4. In the network structure, the generator takes a 128-dimensional vector sampled from a uniform distribution on [0, 1] as input and outputs a reconstructed image.
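In TensorFlow, the stated hyperparameters correspond to a setup along these lines (a sketch under the stated settings; the variable names are ours):

```python
import tensorflow as tf

# Optimizers and latent input matching the hyperparameters above.
generator_optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4)
discriminator_optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4)

BATCH_SIZE = 1
LATENT_DIM = 128
# 128-dimensional input sampled uniformly from [0, 1].
z = tf.random.uniform((BATCH_SIZE, LATENT_DIM), minval=0.0, maxval=1.0)
```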
Firstly, the experiment performs semantic segmentation of the disaster images, identifies the key information in each image, and uses it as conditional information after processing. The semantic segmentation of some disaster images in the data set is shown in Fig. 8. As training proceeds, the discriminator becomes unable to distinguish whether an image is original or compressed, and the compression quality improves continuously.
Semantic segmentation results of some disaster images. Different parts of each image are semantically divided into different classes. Fig (a) is the semantic segmentation of the fire image. Fig (b) and (c) are the semantic segmentation of the earthquake disaster images. Fig (d) and (e) are the semantic segmentation of the flood images. Fig (f) is the semantic segmentation of the explosion disaster image. These six images are the semantic annotations of the images in Fig. 7.
In order to further evaluate the proposed compression framework, the algorithm of this paper is compared with standard compression methods (e.g., JPEG and JPEG2000). JPEG2000 is an image compression standard based on the wavelet transform; compared with JPEG, it achieves a higher compression ratio and avoids blocking artifacts. The proposed framework draws on some creative designs from Jiang's [26], Yoo's [27], Sun's [28], Prakash's [29] and Cavigelli's [30] methods, so the compression results of these algorithms are also used for comparison, to verify that the region-of-interest design indeed enhances the compression of disaster images. For ease of display, one disaster image is selected from the data set, and the compressed image produced by the improved algorithm is compared with those of the other algorithms.
The comparison is shown in Fig. 9. In the image directly compressed by JPEG, significant blockiness occurs and the texture is blurred. Some artifacts appear when Yoo's [27] and Sun's [28] methods reconstruct the image. Prakash's [29], Cavigelli's [30] and JPEG2000 give better visual quality, but the edges of regions show blurring effects. Jiang's [26] obtains better PSNR and SSIM, but the gray levels at image edges change drastically and the color saturation is relatively low, which makes the compressed image look uncoordinated. The plain GAN achieves higher visual quality without synthesizing blocky or blurred spots, but it does not highlight important areas such as houses. Compared with the other compression algorithms, the proposed algorithm retains more edge and texture details, and the resulting image has higher fidelity.
Comparison results of image compression for flood1643 in the case of BPP = 0.3. From left to right and top to bottom: Original image, the proposed algorithm (PSNR = 28.74 dB, MS-SSIM = 0.908), GAN (PSNR = 27.17 dB, MS-SSIM = 0.904), JPEG2000 (PSNR = 28.33 dB, MS-SSIM = 0.897), JPEG (PSNR = 25.14 dB, MS-SSIM = 0.827), Jiang’s (PSNR = 28.45 dB, MS-SSIM = 0.899), Yoo’s (PSNR = 26.27 dB, MS-SSIM = 0.862), Sun’s (PSNR = 26.15 dB, MS-SSIM = 0.843), Prakash’s (PSNR = 27.10 dB, MS-SSIM = 0.887) and Cavigelli’s (PSNR = 26.59 dB, MS-SSIM = 0.876).
Traditional compression algorithms show obvious defects such as blocking artifacts in the reconstructed image. The compression algorithm in this paper retains the details of roads, houses and other objects, and the compressed image remains close to the original. Both the generator and discriminator are trained for 64 epochs, over which the loss value decreases roughly exponentially from 0.9 to about 0.5. The loss values of the generator and discriminator at different epochs are shown in Fig. 10.
Training loss of the generator and discriminator at different epochs. Fig (a) shows the generator loss and Fig (b) the discriminator loss. Limited by the computational budget, training is restricted to 64 epochs, within which the curves are basically stable; the epoch ranges from 0 to 64 and the loss value from 0.4 to 1.
The proposed compression algorithm preserves more image detail at a higher compression ratio, so that image quality is not greatly damaged. We use MS-SSIM to further evaluate the performance of the algorithm: MS-SSIM measures the visual quality of a compressed image, and the higher its value, the better the reconstruction [41]. The comparison results are shown in Fig. 11.
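In TensorFlow, MS-SSIM between an original and a compressed image can be computed as follows (a sketch assuming float image tensors scaled to [0, 1]):

```python
import tensorflow as tf

# original and compressed: tensors of shape [batch, height, width, 3]
# with values in [0, 1]; images must be large enough for the multiple
# downsampling scales used by MS-SSIM.
def ms_ssim(original, compressed):
    return tf.image.ssim_multiscale(original, compressed, max_val=1.0)
```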
Compressed-quality evaluation results of nine algorithms. BPP (bits per pixel) measures the code rate of the compressed image: the higher the BPP, the more bits are spent on each pixel. BPP ranges over 0–1 here, and MS-SSIM takes values in 0–1. The MS-SSIM of all algorithms is basically stable when BPP is between 0.6 and 1.
From the six graphs in Fig. 11, it can be seen that when BPP reaches about 0.9, the proposed compression algorithm, GAN, Jiang's [26] and JPEG2000 remain around 0.95, while Yoo's [27], Sun's [28], Prakash's [29] and Cavigelli's [30] remain around 0.90, and JPEG remains around 0.85. At the same BPP, the MS-SSIM of the proposed compression algorithm is slightly better than that of the other algorithms. The reconstructions of the proposed algorithm and the GAN algorithm are based on the distribution of the data samples, so their fidelity is more outstanding.
The PSNR and MS-SSIM values of all compression methods are obtained by running the original authors' source code with parameters tuned to their optimal values. Table 2 shows the PSNR values, and Table 3 the MS-SSIM values, of six disaster images from the test set compressed by the various algorithms at BPP = 0.4. The proposed algorithm achieves the highest PSNR, with an average of 29.72 dB; the averages of JPEG2000 and Jiang's [26] also exceed 29 dB. In terms of MS-SSIM, the proposed algorithm and the GAN algorithm are superior to the other algorithms.
Finally, four public image datasets, CIFAR-10, HKU-IS, ECSSD and Cityscapes, are used to evaluate the algorithm and further verify its effectiveness. The classes used to provide semantic labels in these four datasets differ from those of the disaster image dataset, so they were additionally preprocessed for the experiment. Considering both PSNR and MS-SSIM, Tables 4 and 5 show that the proposed algorithm attains the largest PSNR and MS-SSIM values among the compared algorithms, indicating that it has a certain universality.
The proposed algorithm can seamlessly merge the content of preserved regions of interest and synthesized regions during compression. In addition, compared with the same network structure without synthesis, the proposed structure greatly reduces BPP. When objects have repetitive structure, visual quality is essentially preserved, so the algorithm generates high-fidelity images while saving a large share of the bit rate. At the same bit rate, the visual quality of the proposed algorithm even exceeds that of some mature algorithms.
Conclusion
This paper first introduced the generative adversarial network algorithm and analyzed how it can be applied to image compression under complex disaster conditions. The generator produces the decompressed image, and the discriminator measures the difference between the compressed image and the real image; competing with each other, they yield images with better compression quality. Through semantic labels, additional information is added to the model, which preserves the data in important areas and compresses the redundant data of non-critical areas on a large scale. Experimental results show that, while ensuring image quality, the proposed algorithm indeed reduces data redundancy and is significantly superior to common compression algorithms in visual quality and compression performance. It achieves high fidelity and provides convenience for later target detection. At present, all regions of interest in a disaster image are compressed in the same way, but each region in a real scene has different visual-quality requirements; setting different priorities and compression weights for each region is therefore the main direction of future work.
ACKNOWLEDGMENT
The authors would like to thank all the anonymous reviewers for their insightful comments and constructive suggestions, which helped improve the quality of this paper.