1. Introduction
Realistic rendering of city-scale scenes is a crucial component of many real-world applications, including aerial surveying, virtual reality, film production, and gaming. While NeRF [22] has made notable advances in rendering objects and small-scale scenes, only a few early attempts [30], [33], [37] have sought to extend NeRF and its variants to larger, city-scale scenes. Owing to the paucity of benchmark datasets, the complexity and challenges of city-scale neural rendering have not been thoroughly investigated.