Motivation and Challenges
With the rapid development of foundation models (FMs) in recent years, exemplified by large language models (LLMs), artificial intelligence (AI) research is shifting from specialized models tailored to particular tasks toward FMs capable of addressing a wide variety of downstream tasks. FMs are pre-trained on vast amounts of multi-modal data, and their enormous parameter counts endow them with emergent abilities. Moreover, through zero-shot or few-shot learning, FMs can rapidly adapt to diverse tasks and achieve performance approaching that of specialized models. However, the further development of FMs faces two significant challenges.

The first challenge lies in the limited sources, quality, and scale of the multi-modal training data collected and curated from Internet content. Meanwhile, most of the data in today's networks remain dispersed across wireless devices and have yet to be extracted and exploited. The second challenge arises from the exponentially growing parameter sizes of FMs, which demand training and inference on large-scale computing clusters composed of high-performance GPUs, as outlined in Table 1. This trend leads to considerable energy consumption and hardware expenses, impeding the sustainable development of FMs. Hugging Face's study of the 176-billion-parameter BLOOM model highlights the high energy demands of FMs: BLOOM's initial training consumed 433,000 kWh of electricity, equal to the yearly usage of 117 households.