Integrating Multiple Receptive Fields Through Grouped Active Convolution


Abstract:

Convolutional networks have achieved great success in various vision tasks, largely owing to extensive research on network structure. In this study, instead of focusing on architectures, we focus on the convolution unit itself. The existing convolution unit has a fixed shape and can only observe restricted receptive fields. In earlier work, we proposed the active convolution unit (ACU), which can freely define its shape and learn it by itself. In this paper, we provide a detailed analysis of the previously proposed unit and show that it is an efficient representation of a sparse weight convolution. Furthermore, we extend the ACU to a grouped ACU, which can observe multiple receptive fields in one layer. We found that the accuracy of a naive grouped convolution degrades as the number of groups increases, whereas the proposed unit retains accuracy even as the number of parameters decreases. Based on this result, we suggest a depthwise ACU (DACU), and various experiments show that our unit is efficient and can replace existing convolutions.
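
To make the idea concrete, below is a minimal PyTorch sketch of a grouped active convolution written from the abstract's description alone: each channel group owns a learnable set of fractional sampling positions (its "shape"), bilinear interpolation keeps those positions differentiable, and a grouped 1x1 convolution weights the sampled values. The class name GroupedActiveConv2d, the 3x3 initialization, and the use of F.affine_grid/F.grid_sample are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (illustrative, NOT the authors' code) of grouped active
# convolution: each channel group samples the input at its own learnable
# fractional positions, then a grouped 1x1 convolution mixes the samples.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GroupedActiveConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, groups=2):
        super().__init__()
        assert in_ch % groups == 0 and out_ch % groups == 0
        self.groups = groups
        self.num_points = 9  # this sketch initializes from a 3x3 neighborhood
        # Learnable sampling positions (in pixels): one point set per group,
        # so each group can learn a different receptive field.
        ys, xs = torch.meshgrid(torch.arange(-1.0, 2.0),
                                torch.arange(-1.0, 2.0), indexing="ij")
        base = torch.stack([xs, ys], dim=-1).reshape(1, self.num_points, 2)
        self.positions = nn.Parameter(base.repeat(groups, 1, 1))
        # Group-wise 1x1 convolution that weights the sampled copies.
        self.mix = nn.Conv2d(in_ch * self.num_points, out_ch,
                             kernel_size=1, groups=groups)

    def forward(self, x):
        n, c, h, w = x.shape
        cg = c // self.groups
        group_outs = []
        for g in range(self.groups):
            xg = x[:, g * cg:(g + 1) * cg]
            sampled = []
            for k in range(self.num_points):
                dx = self.positions[g, k, 0]
                dy = self.positions[g, k, 1]
                # Identity affine transform shifted by (dx, dy) pixels,
                # expressed in grid_sample's normalized [-1, 1] coordinates;
                # bilinear interpolation keeps the positions differentiable.
                theta = x.new_zeros(n, 2, 3)
                theta[:, 0, 0] = 1.0
                theta[:, 1, 1] = 1.0
                theta[:, 0, 2] = 2.0 * dx / max(w - 1, 1)
                theta[:, 1, 2] = 2.0 * dy / max(h - 1, 1)
                grid = F.affine_grid(theta, list(xg.shape),
                                     align_corners=True)
                sampled.append(F.grid_sample(xg, grid, mode="bilinear",
                                             align_corners=True))
            group_outs.append(torch.cat(sampled, dim=1))
        return self.mix(torch.cat(group_outs, dim=1))


# Drop-in use in place of a fixed-shape 3x3 convolution:
layer = GroupedActiveConv2d(in_ch=32, out_ch=64, groups=4)
y = layer(torch.randn(2, 32, 16, 16))  # -> shape (2, 64, 16, 16)
```

As a rough correspondence with the abstract, setting groups equal to in_ch would give a depthwise variant in the spirit of the DACU.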
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence ( Volume: 43, Issue: 11, 01 November 2021)
Page(s): 3892 - 3903
Date of Publication: 19 May 2020

PubMed ID: 32750767


1 Introduction

Convolutional neural networks (CNNs) have become a major topic in deep learning, especially for visual recognition tasks. After their great success at the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) of 2012 [1], many efforts have been made to improve accuracy while reducing the computational budget of CNNs. The major focus of this research has been the design of network architectures [2], [3], [4], [5], [6], [7]. Recently, attempts have been made to automatically generate efficient network architectures [8], [9], and the generated networks achieved better results than conventional hand-designed networks. This approach is still slow and difficult to train with feasible amounts of resources, but it will influence how networks are designed. In such studies, the basic components can be considered more important factors than the overall network construction.

References
1. O. Russakovsky et al., "ImageNet large scale visual recognition challenge", Int. J. Comput. Vis., vol. 115, no. 3, pp. 211-252, Dec. 2015.
2. K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition", Proc. Int. Conf. Learn. Representations, pp. 1-14, 2015.
3. C. Szegedy, S. Ioffe, V. Vanhoucke and A. A. Alemi, "Inception-v4, Inception-ResNet and the impact of residual connections on learning", Proc. 31st AAAI Conf. Artif. Intell., 2017.
4. C. Szegedy et al., "Going deeper with convolutions", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 1-9, 2015.
5. C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens and Z. Wojna, "Rethinking the inception architecture for computer vision", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 2818-2826, 2016.
6. K. He, X. Zhang, S. Ren and J. Sun, "Deep residual learning for image recognition", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 770-778, 2016.
7. K. He, X. Zhang, S. Ren and J. Sun, "Identity mappings in deep residual networks", Proc. Eur. Conf. Comput. Vis., pp. 630-645, 2016.
8. B. Zoph and Q. V. Le, "Neural architecture search with reinforcement learning", Proc. Int. Conf. Learn. Representations, 2017.
9. B. Zoph, V. Vasudevan, J. Shlens and Q. V. Le, "Learning transferable architectures for scalable image recognition", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 8697-8710, 2018.
10. V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines", Proc. 27th Int. Conf. Mach. Learn., pp. 807-814, 2010.
11. A. L. Maas, A. Y. Hannun and A. Y. Ng, "Rectifier nonlinearities improve neural network acoustic models", Proc. 30th Int. Conf. Mach. Learn., vol. 30, no. 1, 2013.
12. B. Xu, N. Wang, T. Chen and M. Li, "Empirical evaluation of rectified activations in convolutional network", Proc. Int. Conf. Mach. Learn. Workshop, 2015.
13. K. He, X. Zhang, S. Ren and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification", Proc. IEEE Int. Conf. Comput. Vis., pp. 1026-1034, 2015.
14. D.-A. Clevert, T. Unterthiner and S. Hochreiter, "Fast and accurate deep network learning by exponential linear units (ELUs)", Proc. Int. Conf. Learn. Representations, 2016.
15. B. Graham, "Fractional max-pooling", CoRR, 2014.
16. C.-Y. Lee, P. W. Gallagher and Z. Tu, "Generalizing pooling functions in convolutional neural networks: Mixed, gated, and tree", Proc. Int. Conf. Artif. Intell. Statist., pp. 464-472, 2016.
17. K. He, X. Zhang, S. Ren and J. Sun, "Spatial pyramid pooling in deep convolutional networks for visual recognition", Proc. Eur. Conf. Comput. Vis., pp. 346-361, 2014.
18. R. Girshick, "Fast R-CNN", Proc. IEEE Int. Conf. Comput. Vis., pp. 1440-1448, 2015.
19. D. E. Worrall, S. J. Garbin, D. Turmukhambetov and G. J. Brostow, "Harmonic networks: Deep translation and rotation equivariance", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 5028-5037, 2017.
20. Y. Zhou, Q. Ye, Q. Qiu and J. Jiao, "Oriented response networks", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 4961-4970, 2017.
21. L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy and A. L. Yuille, "Semantic image segmentation with deep convolutional nets and fully connected CRFs", Proc. Int. Conf. Learn. Representations, 2015.
22. F. Yu and V. Koltun, "Multi-scale context aggregation by dilated convolutions", Proc. Int. Conf. Learn. Representations, 2016.
23. X. Jia, B. De Brabandere, T. Tuytelaars and L. V. Gool, "Dynamic filter networks", Proc. 30th Int. Conf. Neural Inf. Process. Syst., pp. 667-675, 2016.
24. J. Dai et al., "Deformable convolutional networks", Proc. IEEE Int. Conf. Comput. Vis., pp. 764-773, 2017.
25. Y. He, M. Keuper, B. Schiele and M. Fritz, "Learning dilation factors for semantic segmentation of street scenes", Proc. German Conf. Pattern Recognit., pp. 41-51, 2017.
26. Y. Jeon and J. Kim, "Active convolution: Learning the shape of convolution for image classification", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 1846-1854, 2017.
27. L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy and A. L. Yuille, "DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs", IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 4, pp. 834-848, Apr. 2018.
28. L.-C. Chen, G. Papandreou, F. Schroff and H. Adam, "Rethinking atrous convolution for semantic image segmentation", CoRR, 2017.
29. S. Xie, R. Girshick, P. Dollár, Z. Tu and K. He, "Aggregated residual transformations for deep neural networks", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 5987-5995, 2017.
30. X. Zhang, X. Zhou, M. Lin and J. Sun, "ShuffleNet: An extremely efficient convolutional neural network for mobile devices", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 6848-6856, 2018.
