I. Introduction
Age estimation is an important and active research topic in both clinical and radiological medicine [1], [2]. In clinical medicine, age estimation plays a crucial role in diagnosing endocrinological disorders, including accelerated or delayed development in adolescents [3]. It also helps optimize the timing of pediatric orthopedic surgeries for children approaching puberty, such as procedures for correcting leg length discrepancies or spinal deformities [4]. Manual age estimation methods, such as those described in [5] and [6], rely on visual inspection of growth plate ossification in bone radiographs. These methods compare the observed growth patterns with reference data to estimate the age of an individual from hand or wrist radiographs. Furthermore, building on the Greulich and Pyle (GP) and Tanner-Whitehouse 2 (TW2) methods [7], many studies [8], [9], [10], [11], [12], [13], [14], [15] have summarized manual methods of knee age prediction based on knee X-ray examinations or magnetic resonance imaging (MRI). Specifically, knee images yield valuable information from three regions: the distal femur, proximal tibia, and proximal fibula [16]. These regions reveal the ossification stage of the growth plates, which is crucial for age estimation [12]. The process involves categorizing the individual slices of a sample into distinct growth stages based on an assessment of skeletal maturation in the knee. The age of the sample is then estimated by aggregating the results obtained from the individual slices. However, conventional approaches to knee age assessment rely on manual analysis performed by expert radiologists, a task that requires extensive professional training. Furthermore, determining the stage of every image slice in each sample is a complex task that often demands substantial time and labor.
When the age of a sample is estimated manually, the age of each slice must be predicted separately, so different slices from the same sample are not correlated at the image level. In addition, even experienced radiologists find knee age assessment tedious, time-consuming, and subjective. These factors can lead to significant subjective errors, reduced prediction accuracy, and decreased efficiency.