I. Introduction
In traditional image classification research, the target images are usually analyzed at a coarse-grained level, such as distinguishing between broad categories like birds, airplanes, and cars. With social progress and the growing need for intelligent systems, there is increasing demand for fine-grained image classification in various fields, for example ecological protection, public security [1], pedestrian re-identification [2], and other artificial intelligence scenarios. This class of tasks divides objects of the same class into subcategories (e.g., a more detailed differentiation among birds) and is called fine-grained image classification.