I. Introduction
Protein engineering enables the modification and optimization of protein sequences or structures for diverse applications, revolutionizing our ability to manipulate biological systems at the molecular level [1], [2]. Deep learning approaches offer higher efficiency and better performance in protein engineering tasks by processing vast amounts of protein data, capturing complex patterns in sequences, predicting protein properties and structures with increasing accuracy, and facilitating rapid exploration of promising protein designs [3]–[5]. In protein engineering, protein sequences serve as the fundamental data format and are often referred to as the "language of life sciences" because of their role in encoding biological information [6], [7]. The sequential similarity between protein sequences and natural language has led to the parallel development of two families of foundation models: protein language models [8] and large language models (LLMs) [9].

Because LLMs have demonstrated strong capabilities in text understanding [10], recent research has begun exploring their potential for protein understanding through multi-modal large models [11], [12]. Previous attempts integrate protein sequences, or protein structures represented as graphs, with textual content through additional encoders [13], [14], as depicted in Fig. 1(a). However, these approaches fail to fully exploit the intrinsic connections between protein sequences and natural language, resulting in higher model complexity and sub-optimal performance. This limitation motivates a model that can directly understand and process protein sequences without relying on external encoders, potentially improving both efficiency and performance in protein engineering tasks.

To address this gap, we present TourSynbio-7B, the first multi-modal large model designed for protein engineering tasks without external protein encoders, as shown in Fig. 1(b). TourSynbio-7B is built on InternLM2-7B [15] through post-training and instruction fine-tuning on ProteinLM-Dataset [16], demonstrating that LLMs themselves can learn to understand proteins expressed as language.