Introduction
Science, technology, engineering, and mathematics (STEM) education has been adopted as a priority for students worldwide [1]. The National Science Foundation report on science and engineering indicators reflects the growth of the STEM labor force and education [2]. The report shows that the investment of the United States in STEM research and development has increased significantly in recent years. Many other countries are taking steps to ensure their students have access to high-quality STEM education. For example, in 2015, Australia launched its “National STEM School Education Strategy 2016-2026,” focusing on foundation skills, developing mathematical, scientific, and digital literacy, and promoting problem-solving. This program has reported promising outcomes around the proposed initiatives [3]. Likewise, Finland launched the “National STEM Strategy and Action Plan in 2021” to boost STEM education, research, and careers [4]. With the increasing demand, different trainers and professionals should be able to instruct all these concepts and materials at multiple educational levels. Then, a need emerged for qualified professionals to design and teach materials on STEM effectively in different educational institutions [5]. Teaching methods must be tailored to engage young learners constructively. With technological advancements and the widespread use of mobile devices, there are challenges and opportunities to innovate traditional learning methods. Among these, virtual reality (VR) has emerged as a promising educational tool to enhance learning in STEM disciplines [6].
Immersive virtual reality (IVR) makes it possible to explore different concepts and manipulate reality within a simulated environment. An affordance is a characteristic of the environment that, when perceived, offers an agent the opportunity for action based on the agent’s capabilities [7]. In this sense, IVR has two main affordances: the sense of presence and agency. The sense of presence, defined as the “sense of being there” [8], immerses users in a 3D simulated world. Agency establishes the sense of ownership over one’s actions within the environment and enables potential interactive learning scenarios [9]. Through IVR, users can be immersed in a 3D simulation, for example, by being teleported to a historical architectural site to explore unknown places [10] or placed in front of a hazardous procedure in a construction environment [11]. Modern VR head-mounted displays (HMDs), such as HTC VIVE or Meta Quest, allow users to experience high immersion. Unlike mobile VR (e.g., Google Cardboard or Samsung Gear VR), HTC VIVE and Meta Quest can be considered high-end HMDs because they include various accessories and features designed to control the user’s immersion. High-end HMDs enable users to move and interact with the virtual environment in six degrees of freedom (6 DOF), consisting of three translational movements (left/right, up/down, and forward/backward) and three rotational movements (roll, pitch, and yaw) [12]. By leveraging these hardware capabilities, experiences designed across different levels of immersion can make IVR a distinctive medium for immersive learning. Immersive learning occurs when a student experiences a technological, narrative, and challenge-based state of deep mental involvement within a simulated reality isolated from the real world [13].
The benefits of IVR in education and STEM have been explored, but the conclusions remain unsettled. Findings about whether the use of IVR in education is practical or necessary have been contradictory. Researchers have highlighted the need to investigate IVR’s potential advantages in STEM, particularly in higher education, as previous analyses have shown more significant effects on learning outcomes in K-12 scenarios compared to higher education settings [14], [15]. Despite this, the technology may still be adopted, and instructors must develop competencies to integrate IVR into the educational curriculum effectively. Designers and developers must understand how IVR can be implemented, including developer tools, recommended frameworks, possible learning approaches, devices, and expected learning outcomes. Moreover, unlike available third-party VR applications, customized IVR solutions can enable instructors to create experiences tailored to the specific needs of their students and educational contexts. Therefore, it is crucial for designers and developers to be aware of current trends in IVR and the best practices for implementing these experiences to teach STEM concepts effectively. To address this issue, we systematically reviewed the design and development of IVR experiences and their different effects (advantages and disadvantages) on learning and user experience in higher education.
We divide this review paper into the following sections. In Section II, we discuss related reviews, findings, limitations, and opportunities. In Section III, we introduce the conceptual framework for categorizing and understanding the reviewed papers and their findings. In Section IV, we present the methodology used in the conducted review. In Section V, we compile all results and metadata from the systematic review and categorize the papers based on the proposed conceptual framework. In Section VI, we delimit the discussion around the findings and address the research questions. Finally, we present our conclusions in Section VII.
Related Works
Researchers have recently shown interest in using IVR for education and training. They have conducted several reviews and surveys to establish the current state and identify opportunities and gaps [16], [17], [18]. In education, these surveys have disclosed various relationships and recommendations regarding IVR usage in the classroom. IVR has shown advancements in learning, evidenced by students’ positive attitudes, engagement, learning outcomes, and performance across different STEM fields [17]. However, few authors have grounded their IVR designs or activities in theoretical learning frameworks or evaluated knowledge acquisition and skill development [14], [16]. Radianti et al. [16] examined VR in education concerning learning content, VR design elements, and learning theories as a foundation for successful VR-based learning. However, their study focused solely on papers published between 2016 and 2018, suggesting that any conclusions regarding the adoption and utilization of IVR may have changed since then. Notably, their findings showed that most papers (68%) did not include learning theories as the foundation of their VR design. Won et al. [19] classified the design elements used and their level of integration for IVR in education, identifying patterns in the use of VR affordances in these studies. They reviewed 219 studies, categorizing design features based on learning tasks and context. However, the authors did not provide details on how the reviewed experiences were implemented or whether the authors of those papers developed a customized IVR application. Lui et al. [14] mapped different design approaches for IVR experiences in higher education based on learning theories. They outlined various strategies for designing IVR educational experiences, considering factors affecting learning outcomes and cognitive load. However, their review did not provide details on how the IVR experiences reviewed were leveraged or which methodologies those studies followed. 
Additionally, their review focused narrowly on science-related topics, disregarding other STEM concepts.
This review focuses on the design and development of IVR learning experiences, particularly their implemented features (e.g., haptic feedback, realistic hands, virtual avatars) and their effects, including advantages and disadvantages, on learning outcomes (e.g., cognitive load, motivation) and user experience (e.g., usability, presence) in higher education STEM-related concepts. We focused this review exclusively on STEM concepts due to their relevance to technological advancements. Additionally, technologies such as VR have the potential to enhance the instruction and learning of STEM topics due to their affordances. Lui et al. [20] noted mixed results on the impact of IVR on learning outcomes in higher education, contrasting with the benefits observed in K-12 and high school education. Therefore, this review centers on higher education to provide insights into how customized IVR experiences can enhance student learning in these settings. Previous reviews have reported findings around the design choices and the delimited VR features to enhance learning in science fields [14], [16], as well as other topics, including training and health [19]. Some reviews have also reported incorporating learning theories [14] and the focus of the state-of-the-art literature on learning through VR. However, an analysis of the development of these applications, specifically focusing on high-end HMDs, has not been detailed. Finally, the advantages and disadvantages of the designed IVR experiences for STEM concepts in learning and usability warrant further discussion.
Conceptual Framework: Embodiment Level, Immersion, and Formative Assessment
Three taxonomies or frameworks inform our conceptual framework: taxonomy of embodiment [21], framework for immersion [22], [23], and framework for learning assessment [24]. We summarized the framework in Figure 1.
Johnson-Glenberg and Megowan-Romanowicz [21] proposed a taxonomy of embodiment based on three factors: sensorimotor engagement (SE), gestural congruency (GC), and immersion (IM). Sensorimotor engagement measures physical involvement in learning, gestural congruency assesses how well gestures match learning content, and immersion refers to the learner’s feeling of being inside the experience. They propose four degrees of embodiment:
First-degree: Little to no sensorimotor engagement, gestural congruency, and immersion;
Second-degree: Low to moderate sensorimotor engagement, some gestural congruency, and moderate immersion;
Third-degree: Moderate to high sensorimotor engagement, high gestural congruency, and high immersion; and
Fourth-degree: High sensorimotor engagement, high gestural congruency, and high immersion.
Embodied learning is grounded in the theory that using bodily actions and interactions in VR can enhance learning [25]. The degree of embodiment has been used to clarify how the designed IVR environment is composed in relation to the instructional topic and the integration of VR affordances. Johnson-Glenberg et al. [26] evaluated different degrees of embodiment by comparing PC and VR with varying interaction levels. They found that the low-embodied VR group performed significantly worse than the high-embodied VR group, primarily due to the lower agency offered in the low-embodied VR. Chatain et al. [27] conducted a study on varying degrees of embodiment in a math lesson based on the taxonomy but did not find differences in learning outcomes across different degrees of embodiment. These contrasting results provide an opportunity to evaluate how embodied IVR applications can be designed to promote active learning.
Shavelson et al. [24] outlined a conceptual framework for the embedded formative assessment. The authors define an embedded assessment as a formative evaluation to diagnose students’ understanding and actions during a lesson, thereby enhancing their learning achievement, motivation, and conceptual change. The framework is divided into four types of knowledge and reasoning necessary for achievements in learning: declarative (knowing facts and concepts), procedural (knowing steps to accomplish tasks), schematic (connecting and explaining knowledge), and strategic (knowing when and where to apply knowledge). The relevance of formative assessment in STEM has been explored [28], [29]. When designing a lesson, regardless of the medium, different objectives and expected outcomes are defined. The lesson should provide learners with sufficient tools and scaffolding to construct new knowledge, making assessment a central element of a learning environment aimed at improving students’ learning by tracking their progress [30].
Formative assessment assists the learning process and is often referred to as “assessment for learning.” It involves seeking and interpreting evidence to help learners and their teachers determine the learners’ current level of understanding, identify their learning goals, and decide on the best strategies to achieve those goals [29]. This approach can also be applied to lessons using immersive technology, such as VR [31], [32]. Given the importance of formative assessment in learning, we consider it necessary to include an overview of how the authors of customized IVR experiences implemented their assessments and the objectives they aimed to achieve.
Dede et al. [22], [23] described the main characteristics of immersion for learning in a VR scenario, including sensory, actional, narrative, and social features. These features refer to how the design of immersive experiences is managed to leverage practical learning applications. They provide capabilities to explore novel actions (actional), trigger semantic associations through symbolism (narrative), enhance the sense of presence through immersive devices (sensory), and support collaboration and work with peers (social). However, Dede et al. [22] stated that immersive experiences could potentially provide learning through constructivist approaches solely for materials that require 3D to be explained (e.g., understanding the elliptical orbits of the solar system) or where embodied cognition can be applied (e.g., empathy through first-person experiences). Other mediums, such as 2D simulations, non-immersive environments, and traditional non-digital elements, could be practical or even more efficient than IVR, depending on the concept being taught. Therefore, an overview of how immersion is integrated into customized learning experiences can provide insights into the enhancements authors aim to achieve in their designed lessons. Dede’s immersive interface design framework can be adapted to identify the technological and pedagogical features used in these immersive learning experiences.
We adapted these frameworks to understand the composition and enhancement of IVR experiences. The IVR affordances and their integration into the reviewed design examples provide guidelines on effective ways to develop and customize IVR for learning. With our conceptual framework, we aim to demonstrate the level of integration achieved by authors in their designs and developments, specifically examining patterns in these developments and their potential benefits or drawbacks for learning. Based on the previously discussed frameworks and objectives, this review investigates the following research questions:
RQ1: What are the development features and methodological approaches for designing and evaluating IVR learning experiences?
RQ2: How are the customized IVR applications classified in terms of degree of embodiment, immersion, and type of learning?
RQ3: How do the targeted STEM topics in the customized IVR experiences effectively improve learning outcomes?
RQ4: What are the reported advantages and disadvantages of learning outcomes and user experience when using customized IVR experiences?
Methodology
A. Search Strategy
We aimed to explore the current trends in the design of IVR in STEM. We searched recent publications and studies in multidisciplinary databases according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [33]. We searched seven scientific databases: IEEE Xplore, ACM, Wiley, Springer, Elsevier, Taylor & Francis, and Web of Science. Each database platform provides different search methods and advanced tools to access the relevant literature according to the delimited keywords. We refined the search with Boolean operators (e.g., “AND,” “OR,” and “NOT”) to narrow down the results from the consulted databases. The search string applied to each database is based on the main keywords that align with the targeted publications and the intended review. The search string is as follows:
(“immersive virtual reality” OR “virtual reality”) AND “higher education” AND (design
The selected keywords relate to our objective regarding the articles’ focus and content. The term “higher education” is fixed because it denotes our main population group of interest, as are the terms “immersive virtual reality” and “virtual reality.” The term “virtual reality” is often used in different contexts and has referred to slightly different technologies over the years, such as computer-based interactions, 3D simulations (e.g., Second Life), and high-end HMD experiences. To account for this possible ambiguity around VR, we opted to include the Boolean operator “
B. Inclusion and Exclusion Criteria
Consumer-grade VR HMDs, such as HTC VIVE, Sony PlayStation VR, and Oculus Rift CV1, began to be announced in 2016, significantly boosting interest in VR trends [34]. By this time, HMDs had extended the visualization aspects of VR through a more interactive experience (e.g., isolation through HMD and controllers) to potentially enhance immersion. Additionally, in 2016, the Horizon expert panel [35] anticipated widespread adoption of virtual and augmented reality in higher education within two years; however, this horizon was extended to five years with the emergence of the mixed reality paradigm. We considered peer-reviewed studies published from January 1, 2016, to August 31, 2023. The literature review presents the current state of designed and developed IVR systems and their advantages and disadvantages in terms of learning outcomes and user experience. We defined the inclusion criteria that fit our research objectives as follows:
Papers published between 01/2016-08/2023;
Paper with full text available;
Peer-reviewed papers;
Samples related to higher education at any level (e.g., undergraduate or graduate students);
STEM-related IVR lesson:
The authors aim to teach/train around a concept; and
Authors present the development of an instructional IVR experience, including:
Evaluation of one or multiple components of the developed tool with conditions or comparison with traditional methods; and
Discuss the advantages/disadvantages of using VR and the designed features (e.g., haptic feedback, pedagogical agents).
C. Exclusion Process and Screening of the Papers
From the listed strategies, we identified 2175 studies for consideration. After eliminating duplicates and entries with incomplete metadata (e.g., those lacking authors or with one-word titles), we were left with 2012 papers. Subsequently, we filtered these papers with an automated Python script, using the relevant keywords outlined in the specified categories (see Table 1). The script scored each paper based on the occurrence of words from all categories in the paper metadata (e.g., title, keywords, abstract, journal, series, and others). We discarded papers with a resulting score of zero. Furthermore, we implemented a string match filter to retain only papers containing at least one specified term, such as VR, virtual reality, immer*, headset, head mount, head-mount, HMD, mixed reality, extended reality, and XR. Following this process, we screened the titles and abstracts of 713 filtered papers to validate their eligibility. Remaining irrelevant entries were then excluded by manually reading the full texts.
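The scoring and string-match steps described above can be illustrated with a minimal sketch. It is not the script used in this review: the category keywords below are hypothetical placeholders (the actual lists are those in Table 1), and the function names `keyword_score`, `has_required_term`, and `filter_papers` are assumptions introduced for illustration. Only the required terms are taken from the text.

```python
import re

# Hypothetical category keywords; the real lists are given in Table 1.
CATEGORY_KEYWORDS = {
    "technology": ["virtual reality", "immersive"],
    "education": ["higher education", "undergraduate", "learning"],
    "design": ["design", "development"],
}

# Terms of the string-match filter named in the text; "immer*" is a prefix wildcard.
REQUIRED_TERMS = [
    "VR", "virtual reality", "immer*", "headset", "head mount",
    "head-mount", "HMD", "mixed reality", "extended reality", "XR",
]

METADATA_FIELDS = ("title", "keywords", "abstract", "journal", "series")


def metadata_text(paper):
    """Join the available metadata fields into one lowercase string."""
    return " ".join(str(paper.get(f, "")) for f in METADATA_FIELDS).lower()


def keyword_score(paper, categories=CATEGORY_KEYWORDS):
    """Count occurrences of every category keyword in the metadata."""
    text = metadata_text(paper)
    return sum(text.count(kw.lower()) for kws in categories.values() for kw in kws)


def term_pattern(term):
    """Build a word-boundary regex; a trailing '*' matches any word ending."""
    if term.endswith("*"):
        return r"\b" + re.escape(term[:-1].lower()) + r"\w*"
    return r"\b" + re.escape(term.lower()) + r"\b"


def has_required_term(paper, terms=REQUIRED_TERMS):
    """True if at least one required term appears in the metadata."""
    text = metadata_text(paper)
    return any(re.search(term_pattern(t), text) for t in terms)


def filter_papers(papers):
    """Keep papers with a non-zero score that also contain a required term."""
    return [p for p in papers if keyword_score(p) > 0 and has_required_term(p)]
```

In this sketch, a paper whose metadata mentions "immersive VR" passes both steps, while a paper about learning design with no VR-related term obtains a positive score but is removed by the string-match filter.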
Two of the authors examined the 625 papers against the exclusion criteria (see Table 2) by reading the titles and abstracts. After a voting process followed by a discussion phase, the authors selected 88 of the 625 papers for full-text reading to assess their eligibility for inclusion in the systematic review. Exclusion criteria comprised the absence of conducted user studies, lack of focus on learning outcomes or STEM topics, non-use of high-end HMDs, absence of details about the design and development of the VR tool, and usage of third-party VR solutions for their studies. We compiled information regarding the learning design, virtual reality device, prominent features developed for the IVR experience, targeted population, addressed STEM topics, findings, and discussed advantages/disadvantages of the developed tools. The selection and filtering of the studies included in this systematic review are shown in Figure 2.
The literature identification and screening process flow chart is based on the PRISMA guidelines.
D. Risk of Bias Assessment
We conducted a thorough risk of bias assessment by cross-checking all the authors’ choices. We adopted the Cochrane Collaboration’s RoB 2 (Risk of Bias version 2) tool for randomized studies to assess the “intention-to-treat” effect [36]. Bias was categorized into five domains: (1) bias arising from the randomization process, (2) bias due to deviations from intended interventions, (3) bias due to missing outcome data, (4) bias in measurement of the outcome, and (5) bias in the selection of the reported result. Each study was scored with an indicator of risk: “low risk,” “some concerns,” or “high risk.” For non-randomized studies that focused on validating interventions rather than conducting comparison studies, we used the ROBINS-E (Risk of Bias in Non-randomized Studies of Exposures) tool [37]. In these cases, bias was categorized into seven domains: (1) bias due to confounding, (2) bias in the selection of participants, (3) bias in classification of exposures, (4) bias due to deviations from intended exposures, (5) bias due to missing data, (6) bias in measurement of outcomes, and (7) bias in the selection of the reported result. Following the guidelines of RoB 2 and ROBINS-E, we answered the signaling questions and used the tools’ algorithms to estimate the level of risk for each domain and the overall risk. The risk of bias assessment for all studies is provided in the supplementary material.
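As an illustration of how domain-level judgments can be combined into an overall judgment, the following sketch applies a simplified version of the RoB 2 overall rule: low risk only if every domain is low, high risk if any domain is high, and some concerns otherwise. In the tool, a study with some concerns in multiple domains may also be rated high risk at the reviewers' discretion; the `concerns_threshold` parameter is an assumption that stands in for that judgment, and the function name is hypothetical. The sketch covers only the three-level RoB 2 scale, not ROBINS-E.

```python
LEVELS = ("low risk", "some concerns", "high risk")


def overall_risk(domain_judgements, concerns_threshold=3):
    """Combine per-domain judgements into an overall judgement.

    domain_judgements: dict mapping a domain name to one of LEVELS.
    concerns_threshold: assumed number of "some concerns" domains at which
        the study is escalated to "high risk" (a stand-in for reviewer
        judgement, not a value prescribed by the tool).
    """
    values = list(domain_judgements.values())
    for value in values:
        if value not in LEVELS:
            raise ValueError(f"unknown judgement: {value!r}")

    if "high risk" in values:
        return "high risk"

    n_concerns = values.count("some concerns")
    if n_concerns >= concerns_threshold:
        return "high risk"
    if n_concerns > 0:
        return "some concerns"
    return "low risk"


# Example with the five RoB 2 domains listed above (judgements are made up).
example = {
    "randomization process": "low risk",
    "deviations from intended interventions": "some concerns",
    "missing outcome data": "low risk",
    "measurement of the outcome": "low risk",
    "selection of the reported result": "low risk",
}
result = overall_risk(example)
```

Making the escalation threshold an explicit parameter keeps the subjective part of the rule visible instead of hiding it inside the function.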
E. Categorization of the Papers
1) Overview
We identified and categorized studies following suggested content analysis guidelines [38]. We recognized essential characteristics of the papers’ content based on each research question’s objectives. Encoded categories guided the review’s findings, focusing on the papers’ description of the design and development of IVR experiences (RQ1). The design should include decisions that align with the authors’ reasoning for choosing IVR and fit the learning content. We categorized the papers according to the proposed conceptual framework, considering the degree of embodiment, level of immersion, and learning type targeted in the designed IVR experiences (RQ2). Other observations included the diverse STEM topics addressed in the developed IVR tools and their effectiveness regarding learning outcomes (RQ3). When considering design, authors had to choose topics to evaluate or emphasize, such as complex invisible phenomena or challenging-to-access materials to be replicated in 3D. Additionally, we classified the discussed advantages and disadvantages of the implemented IVR designs. The authors drew on different arguments from previous literature, highlighting various benefits (e.g., VR increases motivation) and limitations (e.g., possible cognitive overhead) of IVR for learning. Consequently, we compiled the observed advantages and disadvantages of the developed IVR tools as measured through students’ perceptions, using self-reported metrics (quantitative methods) or interviews and think-aloud protocols (qualitative approaches) (RQ4).
2) Conceptual Framework Categorization
Based on the proposed conceptual framework (see Figure 1), we classified the reported papers according to the delimited constructs. This classification relies on the reported design, development, and results from each paper, including elements such as the selected point of view (POV), environment layout, interactions (with controllers or hand tracking), and multimodal feedback (e.g., haptic or audio feedback), assessments used and user feedback and self-reported ratings. For the embodiment degree, we considered factors such as physical involvement, required gestures or bodily movements, and user engagement with the content (e.g., 360-degree environments). We categorized the papers by the degree (low, medium, or high) of sensorimotor engagement, gestural congruency, and immersion, assigning them to one of the four levels described in Section III. For immersion, we assessed integration based on sensory, actional, narrative, and social features. Sensory immersion was classified by the level (high to low) of representational fidelity, graphics quality, and multimodal feedback integration. Actional immersion was evaluated by the extent of user interaction with the environment, from passive viewing to active control and modification. Narrative immersion was assessed by including context or storylines, such as assigned roles, missions, achievements, and difficulty variations. Social immersion was determined by the presence of peer interaction and collaboration components. For learning type, we examined the knowledge assessment described in the user studies. Papers were categorized based on the knowledge types outlined in Section III, and each paper was labeled according to one or more of these categories.
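The categorization above can be represented as one coding record per paper. The sketch below is an illustrative data structure, not the instrument used in this review: the field names, the three-level `Level` scale for sensory and actional immersion, and the boolean encoding of narrative and social immersion are assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Iterable


class Level(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class Knowledge(Enum):
    DECLARATIVE = "declarative"
    PROCEDURAL = "procedural"
    SCHEMATIC = "schematic"
    STRATEGIC = "strategic"


@dataclass(frozen=True)
class PaperCoding:
    """One reviewed paper coded against the conceptual framework."""
    paper_id: str
    # Embodiment factors and the resulting degree (1 to 4).
    sensorimotor_engagement: Level
    gestural_congruency: Level
    immersion: Level
    embodiment_degree: int
    # Immersion features (boolean narrative/social is an assumption).
    sensory: Level
    actional: Level
    narrative: bool
    social: bool
    # A paper may target one or more knowledge types.
    knowledge_types: FrozenSet[Knowledge]

    def __post_init__(self):
        if self.embodiment_degree not in (1, 2, 3, 4):
            raise ValueError("embodiment_degree must be between 1 and 4")
        if not self.knowledge_types:
            raise ValueError("at least one knowledge type is required")


def knowledge_counts(codings: Iterable[PaperCoding]) -> Counter:
    """Count how many papers were labeled with each knowledge type."""
    return Counter(k for c in codings for k in c.knowledge_types)
```

Because a paper can carry several knowledge labels, the counts returned by `knowledge_counts` may sum to more than the number of papers.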
Results
In this section, we compile the results of the systematic review according to the research questions and objectives. We included 30 papers. Of these, 56.67% were published in scientific journals, 36.67% as conference papers, and 6.67% as book chapters. Among publication venues, the “British Journal of Educational Technology (BJET)” was the most frequent (3 articles), followed by “The Journal of Computer-Assisted Learning (JCAL)” (2 articles). The distribution of the included publications across the databases we used to retrieve them is shown in Figure 3.
The papers originated from institutions in the USA (23.33%), Germany (13.33%), China (10.00%), Spain (6.67%), Taiwan (6.67%), Canada (6.67%), Australia (6.67%), and the Czech Republic, Belgium, Malaysia, New Zealand, Italy, Austria, Denmark, and Thailand (3.33% each). The country data is taken from the authors’ affiliations; we assigned each paper the most frequent country among its listed authors. The year distribution of the papers is summarized in Figure 4. The years 2021 and 2023 had the most published papers, with ten and nine, respectively. The data reflects the trend of using high-end HMDs for these learning activities in recent years. Notably, none of the included papers was published in 2022.
A. Methodological Approaches and Metrics (RQ1)
In terms of research methods and how the authors explored IVR experiences and their effects, the results show that authors used mixed (46.67%), quantitative (40%), or qualitative (13.33%) methodologies. Only 8 of the 30 papers explicitly stated their research methodology; for the remaining papers, we inferred the intended methodology from the procedures described in their user studies. In Table 3, the selected papers are classified based on their intended research methods.
Because the reviewed papers include user studies, their authors aimed to answer specific research questions. To achieve this, they utilized various metrics to assess the usage of IVR and its potential impact on the students’ experience and performance. In Table 4, we categorize and describe the metrics used. Some authors relied on previously validated surveys to measure users’ self-perceived aspects such as presence, immersion, engagement, simulation sickness, and usability [52], [53], [54], [62]. Other authors preferred to design their own questionnaires to target the measurements that fit their user studies [39], [52], [53], [54], [58]. The most straightforward way authors reported the analysis of certain behaviors or emotions, such as engagement, was through a single question such as “How engaging did you find the game?” [26], quantifying users’ self-perceived experiences through standardized scales (e.g., Likert scale).
The authors measured learning performance in various ways, often using self-designed surveys tailored to the targeted learning content [55], [57], [59], as presented in Table 4. Standardized surveys that address learning of a specific topic are seldom reported; one example is the Geologic Block Cross-sectioning Test (GBCT), used to evaluate knowledge around earthquakes [59]. Bagher et al. [59] explored the use of IVR to enhance students’ learning experience and performance in drawing earthquake location cross-sections in 3D, focusing on learning performance in geosciences. Self-developed knowledge questionnaires and semi-structured interviews were also used to assess learning.
Among the less used but promising metrics are multimodal measurements of student performance, including video screen recording [40], performance video recording [20], [40], [43], eye-tracking [20], and physiological measurements [60]. Video recordings capture participants’ activities in the virtual space on the HMD through video streams and allow analysis from different perspectives [91]. With eye tracking, the authors can determine what participants are focusing on, for example, to help students who struggle with specific parts of the 3D simulation [20]. The authors aimed to register user actions to understand the interactions with the implemented features [40], user attention [20], and sensor data to identify any cognitive workload during the intervention [20]. This reflects one advantage of developing applications tailored to specific research objectives: direct access to the source code allows designers to implement mechanisms for tracking learners’ performance, such as embedding data logging in the interactions of the IVR application. An example is reported by Santos-Torres et al. [52], who recorded aspects such as the number of errors and the overall completion time of the task when using the HMD.
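To make the idea of embedded performance tracking concrete, the following sketch logs interaction events, counts errors, and measures total task time, the two measures mentioned for Santos-Torres et al. [52]. It is a generic illustration written in Python rather than the scripting language of any particular game engine; the class `TaskLogger`, its methods, and the event names are hypothetical and do not reproduce any reviewed implementation.

```python
import json
import time


class TaskLogger:
    """Record timestamped interaction events for one participant and task."""

    def __init__(self, participant_id, task_name, clock=time.monotonic):
        # The clock is injectable so the logger can be tested deterministically.
        self._clock = clock
        self.participant_id = participant_id
        self.task_name = task_name
        self.errors = 0
        self.events = []
        self.duration_s = None
        self._start = clock()

    def record(self, name, is_error=False):
        """Store an event with its time offset (seconds) from task start."""
        self.events.append(
            {"t": self._clock() - self._start, "name": name, "error": is_error}
        )
        if is_error:
            self.errors += 1

    def finish(self):
        """Stop timing and return a summary of the task."""
        self.duration_s = self._clock() - self._start
        return self.summary()

    def summary(self):
        return {
            "participant_id": self.participant_id,
            "task": self.task_name,
            "errors": self.errors,
            "duration_s": self.duration_s,
            "n_events": len(self.events),
        }

    def to_json(self):
        """Serialize the summary and raw events for later analysis."""
        return json.dumps({"summary": self.summary(), "events": self.events})
```

A typical use would call `record` from each interaction handler (e.g., when a learner grabs or misplaces an object) and `finish` when the task ends, then export the JSON alongside questionnaire data.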
B. Development Platforms, Toolkits, and Pipelines (RQ1)
In discussing the design of IVR experiences, the authors provided details about their developed experiences, including features and the utilized developer resources. For development toolkits, authors primarily relied on existing game engines that facilitated the implementation of graphical interfaces specifically for VR. Authors used the Unity game engine the most, with 76.67%, while the Unreal game engine was utilized for one of 30 papers (3.33%). The remaining papers (20%) did not provide details about the development tools used for the IVR platforms. Regarding implementation details, a few authors discussed how the tool was developed and described the pipeline they followed for the developed IVR experience.
Checa et al. [45] developed a multiplatform (VR and desktop) serious game experience to teach computer hardware assembly concepts to undergraduates. The authors described their pipeline as follows: (1) creation of 3D models using Blender and import of 3D models obtained under Creative Commons (CC) licenses from different sources; (2) integration of these models in the Unreal game engine, chosen for its high capacity to create photorealistic environments and its visual scripting system; (3) development of the 3D virtual environments; (4) creation of the VR learning experience; and (5) adaptations for the VR and desktop applications. Try et al. [64] described their development pipeline in three main stages: (1) draw, referring to the collection of the 3D object models used in designing the VR application, such as the laboratory building (made with SketchUp and Autodesk AutoCAD) and two pieces of nondestructive testing equipment (modeled with Blender); (2) build, covering the coding phase in the Unity engine using the C# programming language, as well as the export to an executable format (.exe); and (3) test, involving validation tests with students.
The authors detailed the use of HMDs in their studies. The listed HMDs include the HTC Vive (50%), Meta Quest (16.7%), Oculus Rift (10%), Meta Quest 2 (10%), and Oculus Go (6.7%), while 6.7% of the papers did not specify the HMD used. The HTC Vive was the most used HMD, with half of the papers relying on the device’s capabilities. The authors of various papers justified their choice of HMD for their IVR experiences. For instance, Franzluebbers et al. [54] emphasized the Meta Quest’s affordability and features like its high-resolution display and tracking capabilities. Qian et al. [55] detailed the HTC Vive’s hardware specifications and tracking systems. Some studies integrated additional hardware, such as Tobii’s eye-tracking technology [20], [60]. Qian et al. [55] developed an IVR experience specifically for Meta Quest devices, whereas other authors noted the Meta Quest 2’s limitations in rendering highly realistic graphics due to its computing power constraints [53]. In a specific case discussed by Arntz et al. [41], the developed VR application serves as a substitute for accessing real photovoltaic (PV) arrays, with the simulation output resembling that of the actual machinery. Because this application requires real-time data in the virtual representation, the authors integrated a back-end solution that stores Modbus data in a MariaDB database. Similarly, Qian et al. [55] detailed the integration of VR hardware to facilitate user interactions, utilizing the tracking system to calculate the coordinate transformation of the HTC HMD and controllers through the lighthouse base station and OptiTrack cameras. They provided the equations necessary to retrieve the devices’ positions.
Another aspect to consider when developing such applications is the expected duration of the IVR experience. The authors reported this information, which we grouped into the following ranges: less than 10 minutes (3.33%), between 10–20 minutes (23.33%), between 20–30 minutes (13.33%), between 30–40 minutes (13.33%), and 40 or more minutes (13.33%). Notably, 47% of the studies did not report the duration.
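The equations given in [55] are not reproduced in this review, so the following is only a generic sketch of the kind of coordinate transformation involved when a device pose reported in one tracking frame must be expressed in another. It uses 4x4 homogeneous transforms; the frame names, the restriction to a rotation about the vertical axis, and all numeric values are illustrative assumptions rather than the authors’ calibration.

```python
import math


def make_pose(yaw_deg, tx, ty, tz):
    """4x4 homogeneous transform: rotation about the vertical (y) axis
    by yaw_deg, followed by a translation (tx, ty, tz)."""
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    return [
        [c,   0.0, s,   tx],
        [0.0, 1.0, 0.0, ty],
        [-s,  0.0, c,   tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def compose(a, b):
    """Matrix product a @ b: apply b first, then a."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform_point(m, p):
    """Apply transform m to the 3D point p and return a 3-tuple."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))


# Hypothetical frames: the tracking system's pose in the world frame, and
# the controller's pose as reported by the tracking system.
world_from_tracker = make_pose(90.0, 1.0, 0.0, 2.0)
tracker_from_controller = make_pose(0.0, 0.0, 1.2, 0.5)

# Chaining the two transforms gives the controller's pose in the world frame.
world_from_controller = compose(world_from_tracker, tracker_from_controller)
controller_origin_in_world = transform_point(world_from_controller, (0.0, 0.0, 0.0))
```

With these example values, the controller origin lands at approximately (1.5, 1.2, 2.0) in the world frame. A full implementation would use complete 3D rotations (for example, quaternions) and a calibration step to estimate the transform between the two tracking systems.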
C. Degree of Embodiment and Immersion (RQ2)
In Table 6, we classified the developed IVR experiences according to the proposed conceptual framework. The majority of the papers focused on providing a third-degree embodiment (46.43%), followed by the fourth-degree (42.86%) and the second-degree (10.71%). No first-degree embodiment was discussed due to the nature and filtering established for this review, which is aimed at discussing immersive learning experiences. Among the fourth-degree examples is MaroonVR [62], an interactive IVR physics laboratory where students engage with simulations related to electromagnetism and electrostatics. This IVR experience is adaptable to various platforms, including desktop and mobile. Regarding embodiment properties, MaroonVR features a laboratory setting with gesture congruency for interaction and room-scale movement within the 3D environment. Similarly, Tang et al. [40] designed an IVR experience for visualizing molecular interactions, highlighting five key multimodal affordances: viewing, scaling, sequencing, modeling, and manipulating. Their design enhances these affordances through interactive features, such as editing stages of the visualization and engaging in shared immersive environments.
For third-degree embodiment experiences, Checa et al. [45] created a serious VR game for teaching computer hardware assembly. Their experience includes a step-by-step tutorial, continuous feedback from an assistant robot, and a menu with component information. Gesture congruency is evident in hands-on activities where students assemble computer parts. However, limited movement and multimodal effects restrict this IVR experience to a third-degree embodiment. In the second degree, Wang et al. [50] investigated the impact of emotion on engagement and learning in a VR STEM activity but provided minimal details on tools or interactions. Because the activity was based on assembly tasks, we assumed a grabbing interaction, which corresponds to a low sensorimotor degree. Additionally, 37% of the papers named their implemented tools.
Regarding immersion level, the designed IVR experiences were primarily considered to have high sensory immersion (89.29%), followed by medium sensory immersion (10.71%), with no instances of low sensory immersion due to the use of high-end devices. For actional immersion, high levels were most common (85.71%), followed by medium (10.71%) and low (3.57%). In terms of narrative immersion, most approaches were medium (60.71%), followed by high (28.57%) and low (10.71%). For social immersion, most papers did not focus on this aspect, with low integration being the most common (60.71%), followed by medium (21.43%) and high (17.86%).
D. STEM Topics and Types of Learning Assessed (RQ2 and RQ3)
The papers conducted user studies in higher education scenarios, with samples of students related to the subject or target population of the research design. The samples covered different higher education levels and other stakeholders: undergraduate students (56.66%), graduate or postgraduate students (13.33%), both groups (13.33%), and mixed samples including students, professors, experts, or workers (16.66%).
The subjects selected for instruction through IVR lessons are relevant points to discuss in this review. In Table 5, we listed the learning topics of the designed IVR experiences and classified them based on their STEM focus. From the selected topics, we found papers oriented toward Science (43.33%), exploring topics such as geography [52], [59], biology [20], [26], [44], chemistry [46], [55], and physics [62]; Technology (20%), in terms of robotics [47], [49], [61], hardware assembly [45], [50], and solar panel experimentation [41]; Engineering (30%), covering industry safety operation [63], simulated field work [54], machinery assembly [43], [56], [90], and construction [11], [51]; and Mathematics (6.67%), for geometry [39] and 4D spaces [57]. The distribution of the papers’ topics by STEM field is summarized in Figure 5.
The authors highlighted several reasons for designing an IVR experience through high-end HMDs for their targeted STEM topic. They mentioned the trend of affordable devices extending the usage of immersive experiences in the classroom [26], [40], [54], [60]. Papers also point out that VR enhances visualization and understanding by allowing the visualization of three-dimensional objects and multi-dimensional information [49], [58] with a greater field of vision, such as the visualization of CFD data in an IVR environment [53]. Additionally, the authors argued that VR enables students to interact with and control their learning environments, providing hands-on and interactive experiences, such as virtual laboratories, that are more engaging than traditional methods [55], [62]. VR also provides a controlled and safe environment for exploring and practicing different STEM concepts, including training in complex and dangerous situations, such as robot operations [47], which are unfeasible or expensive to simulate in real life [54], [90]. Several studies discuss how VR enhances embodied learning and cognition, allowing students to physically interact with learning material and improving understanding and retention. Research shows that VR engages and motivates students due to its interactive nature, leading to better learning outcomes [11], [59], [61]. Additionally, IVR allows students to learn at their own pace, make decisions about their learning path and methods, and engage with the material in a visually attractive way [43], [44], [45], [50], [56]. In response to public health concerns in recent years, VR provided an alternative for remote learning during the COVID-19 pandemic [48], [64].
In terms of the evaluation metrics used and how the authors assessed learning, we provide a categorization of metrics in Table 7. Half of the reviewed papers (50%) did not specify their assessment method or evaluate learning directly. Conversely, the remaining authors indicated how these assessments were constructed and what knowledge participants could acquire through their designed IVR experiences. Additionally, some papers mentioned learning in their metrics even though the questionnaires and questions related more to the perceived learning experience than to the assessment of learning outcomes [56], [57], [62]. Most authors assessed declarative learning (nine papers), followed by procedural (four papers) and strategic (four papers) learning. They employed self-designed questionnaires with multiple-choice and open-ended questions covering biology, robotics, computer assembly, and electronics. Furthermore, some authors focused on learning transfer by reflecting on how the IVR activity translated into real-world settings [55]. Fewer papers focused on schematic learning (two papers), either using pre-existing surveys or evaluating all course content as a metric [46], [60].
Considering the effectiveness of the designed IVR experiences, the reported findings (as presented in Table 7) highlight several positive outcomes. The authors noted that participants found their IVR tools usable and accepted the instruction methods for STEM content learning [39], [40], [42], [43], [52], [53], [58]. The studies showed positive ratings and results for self-reported measurements such as self-efficacy, motivation, and engagement, which are closely related to learning effectiveness [11], [44], [55], [56], [62], [64], [90]. Comparisons between different groups and mediums revealed that female participants experienced a higher workload and lower presence in a land surveying IVR experience [54], whereas other studies found no significant gender differences in usability measures [43]. Novice participants reported higher difficulty levels with complex interactions and visualizations compared to experts [53], and experts with prior theoretical knowledge benefitted more in terms of assessment scores [47], [51], [57].
In comparisons between IVR-designed tools and other instruction mediums, authors found IVR could reduce anxiety and increase confidence compared to desktop counterparts [50], and it is associated with higher post-play content knowledge due to its interactivity and agentic nature [26]. IVR experiences resulted in significantly higher conceptual understanding at the end of courses compared to traditional lectures [20], successful learning transfer in training scenarios [56], and significant advantages in visual recognition for theoretical knowledge when compared to IVR serious games with desktop and webcam [45]. VR animations followed by prompts helped participants build connections between concepts and process content semantically [61]. Some authors reported a trend towards improved course grades and final exam scores, particularly among first-generation college students, when integrating IVR activities throughout the semester [46]. Additionally, including a pedagogical agent, especially a realistic one, led to lower factual knowledge acquisition than narration alone, but it aided the learning of conceptual information [48]. Overall, customized IVR experiences provided students with a practical and engaging way to interact with learning materials, significantly impacting learning outcomes and performance.
E. Reported Advantages/Disadvantages in Customized IVR Experiences (RQ4)
1) Learning Outcomes
The authors posed different hypotheses and research questions that outline the outcomes expected from their designed and developed IVR lessons. Considering the metrics used and the proposed implementations, we compiled the advantages and disadvantages of the IVR interventions for each paper and then coded them into a classification. Regarding learning, the advantages that the authors reported through different metrics and measurements are presented in Table 8, and the reported disadvantages are listed in Table 9.
2) User Experience
As delimited in our criteria, the included papers had to discuss the development of their own IVR lesson, in addition to the expected activity and the learning enhancement. Because a software tool was designed in each case, the authors could validate the prototype’s usability. As described in the discussed metrics (see Table 4), the authors validated aspects of human-computer interaction and the user perception of the developed IVR experience. As with learning, the advantages and disadvantages are also classified from the user experience perspective and summarized in Table 10 and Table 11, respectively.
Discussion
Designing and developing customized IVR can be considered a complex task, primarily due to the use of less common input systems such as HMDs and integrated motion controllers. Various tools and resources, such as 3D modeling software or popular game engines like Unity or Unreal, are available to facilitate the creation of those VR scenarios. Computer-based experiences can be integrated as part of the learning module to provide different perspectives and views of the learning content, in these cases specifically through immersive learning. Our systematic review arranges information on customized IVR experiences, relationships, and possible connections for future immersive lessons. Considering the importance of agency and sense of presence for immersive learning in IVR experiences, the authors used high-end HMDs to enhance immersion and embodiment by incorporating features like voice commands, sensory feedback, hand gestures, VR animations, 360-degree visualization, and audio effects. Consequently, 89.29% of studies demonstrated high embodiment implementation (third or fourth degree), often categorized as fourth-degree due to multimodal feedback. This aligns with the statement in [22] about using immersive experiences for learning content that requires it, indicating congruence between the chosen STEM topics and the included affordances in the discussed papers. Moreover, when exploring learning content, designers should create content that exploits IVR’s main features [72]. Common features include interaction with virtual objects using hand-tracking or controllers, ambient environments (e.g., laboratories, factories, or museums) with free movement, and 3D visualization of complex models and data. Additional features include in-world menus and instructions, real-time feedback, audio narration, embedded assessments, collaborative environments, and control-based input interactions such as laser pointing, manipulation, and navigation.
Immersion, a predominant construct, was achieved through diverse 3D environments like museums, factories, and virtual classrooms, focusing more on actional and sensory immersion. However, social elements and networked environments were limited due to development complexities, reflecting a gap in “pedagogical features” promoting social interactions, as noted by [19]. Instructors and designers should consider how IVR can enhance the exploration and understanding of STEM content. Insights from various papers show that IVR effectively visualizes abstract science concepts [62], facilitates hands-on activities, and replicates complex or risky procedures [48]. Science topics are the most selected STEM topics for IVR, leveraging 3D perspectives to explore different abstract phenomena [14], [94]. IVR allows interaction with virtual artifacts and feedback, supporting constructivist learning principles [95]. However, as reported in previous reviews, only 50% of studies focused on learning outcomes, with the others prioritizing only usability aspects of their designed tool [16]. Despite some limitations, IVR holds promise for specific STEM subjects, warranting further research. Evaluating examples and aligning IVR with learning theories can help determine its suitability for various educational purposes [26]. Other aspects not explored in the reviewed papers relate to the challenges and technical difficulties associated with implementing and using these high-end HMD devices in educational settings; reports on these would help practitioners contextualize all the factors involved when using VR.
Furthermore, the authors explored how customized IVR experiences enhance various learning outcomes such as collaboration, critical thinking, and mental model development. Studies like Hácha et al. [39] and Franzluebbers et al. [54] demonstrated the collaborative potential of IVR by enabling students to interact with 3D models and virtual equipment alongside peers and instructors. However, limitations like the lack of voice chat affected engagement. Pirker et al. [62] highlighted IVR’s ability to foster critical thinking through immersive simulations of complex physics concepts, emphasizing the importance of interaction for enhancing learning transfer. Regarding mental models, Slezaka et al. [90] and Bagher et al. [59] showed how interactive tasks in IVR can deepen understanding, with students achieving higher cognitive engagement and constructing more effective mental models in immersive environments.
As highlighted in Table 8, multiple studies reported higher learning outcomes with IVR tools compared to non-immersive mediums, demonstrating the effectiveness of their designs. For instance, Lui et al. [60] developed an IVR application for understanding a gene regulation system that included interactive assembly tasks, teleportation, and dynamic animations. Vogt et al. [61] designed an IVR experience featuring a robot assistant with audio narration, scene teleportation, and passive viewing. Similarly, Checa et al. [45] created an IVR tool for computer hardware learning that incorporated an assembly task, guided instruction, and a virtual instructor. Miller et al. [46] explored organic chemistry learning, integrating assembly tasks, guided instructions, and immersive ambient scenes. Pirker et al. [62] developed a laboratory-based electromagnetism simulation, which included parameter control, teleportation, and a networked environment. Wang et al. [50] focused on computer hardware learning, featuring assembly tasks and interactive components. Across these customized IVR experiences, the inclusion of embodied interactions, such as assembly tasks requiring hand movements and 3D spatial recognition, proved critical. Engaging with hardware equipment, molecular structures, and other elements in the virtual environment fostered a stronger sense of agency and presence, resulting in significantly higher learning outcomes than non-immersive solutions like desktop applications and slideshow presentations.
This review highlights the main advantages and disadvantages of IVR experiences in user studies. Key advantages include an enhanced understanding of complex procedures, the development of mental models, and increased engagement and motivation, often leading to higher learning outcomes in immersive conditions. Disadvantages include the need for guidance for novice users, cognitive overload, and a possible need for multimodal feedback, such as haptic or audio elements [80], [96]. Usability benefits include intuitive control-based input interactions and realistic simulations, while challenges involve problems when reading text on VR, issues with notetaking, lack of enough guidance, and motion sickness [53]. A comprehensive analysis from learning and usability perspectives provides guidelines for future IVR designs, emphasizing adopting mixed methods for constructive feedback and enhanced virtual learning environments [45], [97].
A. Practical Implications
Based on methodologies, development pipelines, design features, and user studies of customized IVR experiences for STEM learning, we propose the following steps to consider when designing and developing these experiences using high-end HMD capabilities.
Step #1—Assessing the need for IVR: Before creating IVR experiences, analyzing why the lesson requires an immersive experience using HMD is essential. While customized IVR experiences offer unique advantages (see Section V-E), alternative solutions such as desktops or third-party tools should also be considered. The decision to use IVR should be justified by the student’s needs and potential benefits, such as simulating complex procedures, promoting mental models, or enhancing engagement, motivation, and sense of presence.
Step #2—Delimiting developer tools and expertise: Once the decision to use VR is made, appropriate resources must be selected. The choice of HMD should align with the lesson’s objectives, teaching methods, and classroom sizes, with devices like Meta Quest or HTC Vive being preferable for delivering high immersion levels. Other considerations, such as game engines (e.g., Unity or Unreal) and resources like 3D models, textures, and UI designs, are crucial for creating immersive environments. Depending on the lesson’s complexity, programming skills may be required, particularly for interactive or simulated STEM concepts. Collaborative work across disciplines, especially with software engineering, is recommended to design and develop adaptable IVR experiences. An example pipeline is presented by [53], who used the Unity game engine with several built-in and external packages, included pre-computed CFD data for the visualization, and targeted deployment on the Meta Quest 2 HMD.
Step #3—Instructional design and expected embodiment degree: Planning students’ activities in the IVR environment is crucial, especially for self-instructed tasks. We suggest that our proposed conceptual framework (see Figure 1) can help to outline the different considerations for the customized IVR experience. To enhance embodied cognition, the embodiment framework calls for high sensorimotor engagement, adequate immersion, and gestural congruency. Most reviewed examples show positive effects on user experiences and learning effectiveness, suggesting a focus on the third and fourth degrees of embodiment. However, including complex interactions, multimodal feedback, or social immersion increases development complexity, so features should align with instructional goals and development scope. Providing clear instructions, tutorial scenes, and explanations of expected interactions is essential, especially for novice users.
Step #4—Implementing and using the IVR experience: The protocol for delivering IVR lessons is essential to this instruction. Designers and instructors should understand equipment configuration, student support during the intervention, space requirements (e.g., seated vs. standing experiences [20]), and enough guidance for device actions like controllers’ buttons, teleportation, or complex gestures.
Step #5—Assessments and experience validations: Customized IVR experiences can always be improved. Including assessments to validate lesson effectiveness, usability metrics, and user experience perceptions can help refine the tool. Assessments should be considered in any target learning type, such as declarative or procedural learning, to evaluate if embodied interactions aid students. Mixed methods can provide deeper insights into students’ experiences and performance. Application logs, such as points of attention, element interactions, and task completion times, are recommended to monitor the lesson’s length and feature relevance.
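As an illustration of the application logs recommended in Step #5, the sketch below aggregates per-participant sessions to check the lesson’s length and how widely each feature was used. The session structure (`duration_s`, `interactions`) and the function name `summarize_sessions` are hypothetical and do not come from any reviewed paper.

```python
from collections import Counter
from statistics import median


def summarize_sessions(sessions):
    """Summarize lesson length and feature usage across participants.

    sessions: list of dicts with
      'duration_s'   -- total time in the lesson, in seconds
      'interactions' -- names of the elements the participant interacted with
    """
    if not sessions:
        raise ValueError("no sessions to summarize")
    durations = [s["duration_s"] for s in sessions]
    used_by = Counter()
    for s in sessions:
        # Count each element once per participant, not once per interaction.
        used_by.update(set(s["interactions"]))
    n = len(sessions)
    return {
        "median_duration_min": median(durations) / 60.0,
        "share_using": {name: count / n for name, count in used_by.items()},
    }
```

A feature that only a small share of participants ever touches may be a candidate for redesign or clearer guidance, and a median duration far from the planned lesson length suggests adjusting the content or the tutorial.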
B. Research Gaps and Future Directions
The reviewed literature highlights several limitations and offers insights for improving customized IVR tools and their application in education. Authors have emphasized aspects such as immersion, sense of presence, engagement, cognitive load, and learning outcomes to validate and demonstrate the effectiveness of their designs. However, several areas remain open for future exploration:
Addressing the Reported Disadvantages: IVR has limitations, as with any educational tool. Future studies should address the drawbacks highlighted in Tables 9 and 11, such as challenges with note-taking in large VR-based lectures or difficulties in reading tasks and text formatting in immersive environments. Researchers should focus on redesigning these elements to mitigate their impact or conduct further studies to determine whether these issues are common across various IVR learning scenarios.
Increasing Sample Sizes: Many authors have cited small sample sizes as a limitation, which poses challenges for achieving statistically significant results. Future research should involve larger samples to validate findings and ensure broader applicability. For example, unlike the small sample size in [39], which only involved four students, more extensive studies in immersive classrooms could provide more robust evidence of IVR’s educational benefits.
Testing Customized Tools on a Variety of Learning Outcomes: Although many IVR tools have been designed to target specific learning outcomes, their features and visualizations may be adaptable to broader educational objectives. Future research could explore how these tools perform across different learning domains, whether for declarative knowledge, procedural skills, or conceptual understanding. For instance, Pirker et al. [62] applied their MaroonVR tool in multiple contexts, showing its potential to address varied learning needs.
Focusing on Learning as the Primary Objective: While many studies emphasize usability and user experience, future research should prioritize learning outcomes as the central goal of IVR design for STEM education. Evaluating knowledge acquisition, skill development, and the transfer of learning should take precedence, ensuring that IVR contributes to meaningful educational results. Many reviewed papers (50%) focus on usability without fully exploring how these tools impact learning. Addressing this gap is crucial for practitioners integrating IVR into STEM curricula.
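To give a rough sense of what larger samples means for the sample-size direction above, the following sketch estimates the number of participants per group needed to compare two conditions (for example, IVR versus desktop) with a two-sided test. It uses the standard normal-approximation formula n = 2(z_{1-α/2} + z_{power})² / d²; the effect sizes passed in are illustrative assumptions, not values reported by the reviewed studies, and an exact t-test calculation yields slightly larger numbers.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate participants per group for a two-group comparison.

    effect_size_d: assumed standardized mean difference (Cohen's d).
    Uses the normal approximation, so it slightly underestimates the
    sample size required by an exact t-test.
    """
    if effect_size_d <= 0:
        raise ValueError("effect_size_d must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size_d ** 2)


# Illustrative effect sizes only (small, medium, large by convention).
for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))
```

Under these assumptions, a medium effect (d = 0.5) already requires about 63 participants per group, far more than the smallest samples noted in the reviewed studies.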
C. Limitations
The review has some limitations. Although we explored trends from 2016 to 2023, the strict criteria we defined may have resulted in the exclusion of papers published before 2018. However, recommendations suggest running individual per-year searches on databases to include more papers for each specified year. Another limitation is that we exclusively selected publications that discuss the development and design of their own IVR experiences. Expecting authors to develop IVR applications was ambitious and led to the elimination of publications that evaluated IVR in learning using third-party solutions. However, these solutions may be a justified option for designers and authors without programming experience.
Another criterion was the use of high-end HMDs, which excluded publications attempting an immersive environment with other HMD types like mobile alternatives (e.g., Google Cardboard or Samsung Gear VR), which are considered more accessible for educational settings. Regardless, in this review, we aimed to analyze VR experiences exploiting the potential of current VR technology (advanced devices). Notably, well-known VR conferences like IEEE VR and IEEE ISMAR were underrepresented, possibly due to keywords or conference topics not aligning with education. Indeed, the IEEE VR 2024 call for papers (https://ieeevr.org/2024/contribute/papers/) lists no topics on teaching or learning. In this sense, the VR community might focus on advances in hardware and software innovations and perceptual studies rather than on understanding how VR can be used for educational purposes. That scope differs from that of the most frequent journals among the included articles (BJET and JCAL), which aim to examine the use of technologies to support learning, teaching, instructional design, and development and to demonstrate whether and how applications lead to improvements in formal and non-formal education at all levels.
As we are aware, the development of IVR experiences is a challenging task. However, the discussions around the development paths used allowed an understanding of the complexity of developing this type of application. The findings can serve as a starting point for new designers and developers. Finally, the results and conclusions presented should be interpreted carefully because multiple papers reported among their limitations that their findings come from a small sample. We combined these findings with those of other proposed designs that had larger sample sizes. Nevertheless, we observed similar patterns in both groups of articles, which suggests that the results regarding the adoption of IVR lessons for STEM learning can be generalized.
Conclusion
The rise of consumer VR devices (e.g., HTC Vive, Meta Quest) offers alternatives for embedding IVR technology in education, where it is increasingly adopted as a classroom learning tool. In recent years, as reported by the reviewed papers from 2019-2023, IVR-designed environments have become a trend in leveraging immersive experiences for STEM learning concepts, particularly in science. High-end HMDs offer key affordances, enabling the teaching of topics traditionally conveyed through less immersive media like videos or slides. Papers assessing learning outcomes predominantly focused on declarative and procedural learning approaches. The review highlighted advantages in learning and usability: enhanced learning outcomes, motivation, engagement, and mental model building, as well as usable designs with intuitive interactions and congruent gestures. Experiences with higher embodiment levels (third and fourth degree) have shown several advantages in enhancing student learning performance, indicating that offering immersive interactions and HMD isolation can be an effective learning method. Regarding development, the authors primarily utilized the Unity game engine due to its capabilities and frameworks for developing IVR experiences, including graphical capabilities and hardware integration support. The authors discussed their development procedures and the software toolkits for implementing IVR tools. Although IVR development is not a straightforward process, we encourage researchers to explore the possibility of developing customized IVR experiences tailored to learning topics, considering learners’ needs and the possibilities VR offers. For future work, we suggest using the proposed conceptual framework to guide the creation of customized IVR experiences. However, designs need not be limited to the dimensions considered; other aspects, such as interactivity, cognitive load, and specific learning objectives, should be considered for more refined designs.
ACKNOWLEDGMENT
Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect those of the National Science Foundation. The publication of this article was funded in part by the Purdue University Libraries Open Access Publishing Fund.