
ViT-R50 GAN: Vision Transformers Hybrid Model based Generative Adversarial Networks for Image Generation


Abstract:

In recent years, GANs have demonstrated tremendous potential in image generation. The Transformer, which originated in the NLP field, is gradually being applied to computer vision, and the Vision Transformer (ViT) performs well on image classification problems. In this paper, we design a ViT-based GAN architecture for image generation. We found that a Transformer-based generator performs poorly because it uses the same attention matrix for every channel. To overcome this problem, we increase the number of heads to generate more attention matrices; we name this component enhanced multi-head attention, and it replaces the multi-head attention in the Transformer. Second, our discriminator uses a hybrid model of ResNet50 and ViT, where ResNet50 performs feature extraction, making the discriminator perform better. Experiments show that our architecture performs well on image generation tasks.
Date of Conference: 06-08 January 2023
Date Added to IEEE Xplore: 02 June 2023
Conference Location: Guangzhou, China

I. Introduction

Since the Generative Adversarial Network (GAN) [1] was proposed, it has raised the image generation task to a new level, because a GAN improves its modeling ability through a continuous game between generator and discriminator. Traditional GANs usually use fully connected neural networks and are difficult to train. This changed with DC-GAN [2], which introduced convolutional neural networks (CNNs) [11] into the generator and discriminator: the discriminator replaces pooling layers with strided convolutions, and the generator uses four fractionally-strided convolutions to complete the generation process from random noise to images. Compared with the original GAN, DC-GAN almost entirely replaces fully connected layers with convolutional layers, and the discriminator is nearly symmetric to the generator. With the stronger fitting and expressive ability of CNNs, subsequent GANs produce vivid images and greatly improve the diversity of generated images.
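As a rough illustration of the fractionally-strided (transposed) convolution pipeline described above, the sketch below computes the spatial-size progression of a DC-GAN-style generator. The hyperparameters (4×4 kernels, stride 2, padding 1, a 4×4 starting feature map grown to a 64×64 image) are assumptions matching the common DC-GAN configuration, not values taken from this paper.

```python
def deconv_out_size(size_in, kernel=4, stride=2, padding=1):
    """Output size of a fractionally-strided (transposed) convolution:
    out = (in - 1) * stride - 2 * padding + kernel."""
    return (size_in - 1) * stride - 2 * padding + kernel

# The noise vector is first projected to a small (here 4x4) feature map;
# four fractionally-strided convolutions then each double the spatial size,
# yielding the final image resolution.
size = 4
sizes = [size]
for _ in range(4):
    size = deconv_out_size(size)
    sizes.append(size)

print(sizes)  # [4, 8, 16, 32, 64]
```

With these settings each layer exactly doubles the resolution, which is why four layers suffice to go from a 4×4 projection of the noise vector to a 64×64 image.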
