I. Introduction
The introduction of robots and artificial intelligence into everyday life has opened a new era of possibilities and challenges [1]. Among the many areas affected, industrial sectors such as manufacturing, logistics and warehousing, energy, automotive, construction, and agriculture [2] have been significantly transformed, particularly in how work is performed. The emergence of human-robot collaboration has changed the landscape of these sectors, allowing humans and robots to work together toward common goals. Integrating robots into human-occupied spaces has improved productivity, efficiency, and safety [3], while also relieving humans of tedious, repetitive, and hazardous tasks.
Despite these advantages, ensuring that humans and robots can safely share the same workspace remains a significant challenge [4]. A large part of the problem is that humans and robots have different capabilities. Humans have complex cognitive abilities and can understand, interpret, and react to their surroundings in ways that robots cannot. Robots, by contrast, are programmed to perform specific tasks and may therefore fail to anticipate and respond to human behavior accurately and efficiently. Furthermore, the dynamic nature of human behavior, combined with the rigid, programmed nature of robotic behavior, complicates the issue further [5]. As a result, accurately predicting human motion is crucial for enhancing the safety of human-robot collaboration. Robots that can anticipate human movements and adapt their actions in real time could substantially improve industrial operations. Despite advances in machine learning and robotics, to our knowledge no existing method uses different behavioral styles to predict the future movements of human joints.
To address this problem, our work applies a novel solution: a transformer-based multi-style network (MSN) that predicts human joint movements in two dimensions on a dataset collected in an industrial setting. The MSN we employ consists of two sub-networks: style proposal and stylized prediction. The idea behind this approach is to produce multiple predictions for each human joint, one per style category. The network contains multiple style channels, each corresponding to a specific action style. With this design, we aim to capture the inherent diversity of human motion and thereby provide a more accurate and comprehensive prediction of future joint positions. By exploring the benefits and potential of this method in detail, we aim to pave the way for safer and more efficient human-robot collaboration, bringing us one step closer to a future in which humans and robots work side by side seamlessly and safely.
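To make the two-stage data flow concrete, the following is a minimal illustrative sketch, not the network used in this work. It assumes that style proposal yields one candidate 2D endpoint per style channel and that stylized prediction fills in the intermediate positions. Both learned, transformer-based sub-networks are replaced here by hand-written stand-ins (fixed offsets and linear interpolation), and every name (`propose_style_endpoints`, `stylized_prediction`, `multi_style_predict`, `style_offsets`) is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # 2D position of a single joint


def propose_style_endpoints(observed: List[Point],
                            style_offsets: List[Point]) -> List[Point]:
    """Stand-in for style proposal (assumption: one endpoint per style).

    A real model would learn these proposals; here each style channel is
    just a fixed offset from the last observed position.
    """
    last_x, last_y = observed[-1]
    return [(last_x + dx, last_y + dy) for dx, dy in style_offsets]


def stylized_prediction(observed: List[Point], endpoint: Point,
                        horizon: int) -> List[Point]:
    """Stand-in for stylized prediction.

    Linearly interpolates from the last observed position to the proposed
    endpoint; a learned sub-network would replace this step.
    """
    x0, y0 = observed[-1]
    x1, y1 = endpoint
    return [(x0 + (x1 - x0) * t / horizon, y0 + (y1 - y0) * t / horizon)
            for t in range(1, horizon + 1)]


def multi_style_predict(observed: List[Point], style_offsets: List[Point],
                        horizon: int) -> List[List[Point]]:
    """Return one future trajectory per style channel."""
    endpoints = propose_style_endpoints(observed, style_offsets)
    return [stylized_prediction(observed, e, horizon) for e in endpoints]


# Hypothetical wrist joint moving to the right, with three made-up styles:
# keep moving right, veer diagonally, or stop.
observed_wrist = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
style_offsets = [(4.0, 0.0), (2.0, 2.0), (0.0, 0.0)]
predictions = multi_style_predict(observed_wrist, style_offsets, horizon=4)
```

The point of the sketch is the output shape: a single observed joint history yields several candidate futures, one per style channel, rather than a single averaged trajectory.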