1. Introduction
Geometry Problem Solving (GPS) aims to obtain the answer to a problem from a given geometric diagram and textual problem description. It has drawn growing attention recently [4], [15], [22], [31] due to its application prospects in intelligent education for high schools [2], [20]. Different from general question answering (QA) tasks, GPS requires a model to possess the abilities of symbolic abstraction, logical reasoning, and algebraic calculation simultaneously [6], [24], making it a challenging task even for large multimodal models (LMMs) such as GPT-4V [37]. Therefore, recent works attempt to combine the procedural power of symbolic models with the general power of neural models. Among these, symbolic-based approaches [22], [25], [29], [32] first parse the geometric diagram and problem text into formal language representations, and then iteratively predict and apply predefined theorem rules to obtain the final answer. Neural-based approaches [4], [5], [40] tend to transform the original problem into multi-modal features and feed them into generative models to produce an executable program sequence that yields the answer. However, both suffer from the following two limitations, which hinder their application in practical scenarios.
Output examples of two mainstream GPS methods. Case 1 and Case 2 are taken from Inter-GPS [22] and NGS [4], respectively. Content with a red background is regarded as inexplicable.