I. Introduction
Existing object recognition methods based on machine learning mainly rely on supervised learning by allowing the model to learn the characteristics of objects through labeled data. According to current research results [1], a larger amount of data generally can bring better results. However, besides the fact that it is unpractical to annotate all the data, the high cost of human labeling and the emerging of new objects must also be considered.