Introduction
Log files from learning management systems (LMSs) are a frequently used data source in the field of learning analytics (LA) [1]. Several measurements of student activities are derived from these data, such as the number of clicks, logins, and time spent using the LMS (e.g., [2]–[4]). These data points form the features for further statistical analysis or for predictions about, for example, a student's success in a course and the potential need for support to pass it.
How we derive and define these measurements from the raw log data is therefore highly important, as further decision models are built on these data points. One of the most important and most frequently used dimensions is the time students spend online in the LMS [3]–[6]. LMS log files usually do not directly define or track the duration of the time students spend in the system [4]. For this reason, it is common to use heuristics that summarize a series of clicks into a session and calculate the usage time, for example, based on the login and logout clicks [7].
Web log files are usually based on clicks (HTTP requests) and are therefore stateless. It is not possible to say clearly whether a user is really still using the LMS or has left the browser or tab to do something else, only to continue using the LMS later on. This means that if we calculate the usage duration of a session based on the log files, we cannot be sure that the LMS has been used for the entire time we calculated from a series of user clicks. The construct is difficult to define because the data do not directly tell us what we want to know: we cannot track off-task behavior from the LMS logs. As a result, the calculated session duration does not reflect the actual online time [7]. In data preprocessing for LA, there are several definitions in the literature for calculating the session duration [3], [4], [7]–[9]. However, it is essential to define and calculate this measure as accurately as possible, since session duration is a frequently used feature in learning outcome prediction models. Kovanovic et al. [4] showed that different strategies for estimating the time spent in the LMS can have a great effect on the resulting statistical model and its predictions.
We address this challenge by combining two data sources to conduct an empirical investigation of the duration of LMS sessions. We collect log files from the Moodle LMS and, in parallel, record the screen of the students' tablets in the background [10]. We then use existing session duration definitions from the literature to calculate the session duration of our students' Moodle logs and compare the results with the actual duration that the LMS was present in the students' screen recordings. We collected data for four months, resulting in a dataset of more than 19 000 Moodle log entries and over 10 000 min of screen recordings. Our approach builds on the computer vision and machine learning methods of Krieter and Breiter [11], [12] to automatically generate log files from our screen recordings and extract the Moodle usage time from the large amount of video data. In addition, we show that depending on how we calculate the LMS session duration, the number of sessions also changes considerably.
Our main contribution aims to investigate and enhance the preprocessing of LMS log data for LA. In this article, we present a methodological approach to estimating the online time based on two different data sources and discuss the implications and limitations of our methodology. Using counterexamples, we show that common definitions for online time estimation do not lead to precise results. Our goal is not to present a "perfect" estimation definition, but to sensitize the community to these assumptions.
By visualizing and comparing the results from the Moodle log files and from the screen recordings, we can show that the duration calculated, based on the Moodle log file sessions, differs significantly from the time users actually viewed the LMS on their screens.
Based on this, we try to find a session duration definition that provides more exact results for our dataset. We use different variations of common session definitions and test them against the video material to find, for each of our students, an individual online time estimation definition that best reflects their time spent in the LMS. From this perspective, we also examine how the number of sessions changes when we take a user's screen recordings into account while calculating the number of sessions from our Moodle log files.
Background and Related Work
In the first part of this section, we give an overview of research in LA that uses data sources going beyond LMS log files or combines multiple sources of student data.
Previous research in the LA community has picked up on the struggle to define online duration or time-on-task estimation in a way that works reliably and reflects students' actual online time. Besides work that specifically focuses on this important step in the data preparation process, we give some examples of common definitions for calculating students' online time. Table I gives an overview of several session definitions from the literature that target different LMS log files. Most of these definitions refer to Moodle log files, which is also the LMS we use.
A. Data Sources in LA
Utilizing LMS log files makes it easy to follow students' activities in the LMS unobtrusively and with low effort. However, relying on LMS data as the only data source also limits the scope of analysis. Several research projects focus on utilizing additional data sources in digital learning environments [1], though the most important data source for LA remains log files from LMSs [13], [14]. There is no general consensus within the LA community on which student interactions in a digital learning environment are decisive for effective learning [15]. Example research projects have utilized and tested data sources outside the LMS to gain additional insights into students' activities, ranging from data from programming IDEs [16] and screen recordings [12] to questionnaires, interviews, web tracking software, open datasets, and virtual machines [14].
There are several examples of using data sources outside an LMS for LA in the context of learning programming. Blikstein [16] used a dataset from a three-week student programming assignment carried out in a programming environment that logs many user interactions, such as keystrokes, clicks, variable changes, and changes in the source code. He showed how these data can be used to find certain events in the process and suggested identifying situations in which students might need help.
Fernandes-Medina et al. [17] used compile messages as a data source and analyzed students' work to report on individual and comparative learning progress. They used the results to inform students about their learning process. Öztürk et al. [18] pursued a related approach by developing a web-based programming environment for novice students to collect data. After identifying metrics for student performance, they used these data to predict students at risk of dropping out at an early stage of a course. Using screen recordings or screenshots as a data source for LA has been the subject of previous research [12]; the authors presented a tool for LA that can generate log files from mobile screen recordings, using computer vision and machine learning methods for optical character recognition (OCR) to find events based on the visual screen output.
B. Online Time Estimation in LA
Kovanovic et al. [4] presented a study focusing on the "black box of time-on-task estimation." They stressed the problem that the time students spend on a task or in the LMS is a commonly used measure, but at the same time is often neither described in detail nor accurate. To address this problem, they studied the effects of different time-on-task definitions on the results of a common prediction model. They showed that the results of the model change significantly depending on the time estimation method used. They encouraged further research and discussion on this problem. A study by Munk and Drlik [20] pointed in the same direction. It focused on the preprocessing of log files in education and the difficulties of specifying a time window to define sessions in user logs and of calculating the session duration.
There are several examples of different session duration estimation definitions in previous research. Zacharis [8] investigated how students at risk in blended learning courses can be identified early by analyzing Moodle log data. For his model, he explored the predictive significance of 29 LMS usage variables. He defined the duration variable over a session of all clicks from a student's login until logout. If the user did not actively log out to close the session, he ended the session after 40 min of inactivity.
A similar method was used by Conijn et al. [3] to define the estimated time students spend online in the Moodle LMS. They used the same 40-min inactivity threshold to end a session. A session had to consist of at least two clicks, and its duration was calculated as the time between the first and the last click. They stated that raw log data do not provide concrete measurements and that more insight is needed into how LMS data can be represented, as well as into adding other data sources to give context to the log files.
Sael et al. [7] conducted a study on data preprocessing and applied web usage mining methods to a dataset of Moodle log files. In their analysis, a session consisted of all clicks between a user's login and logout. They indicated that domain-specific steps in the preprocessing of data for LA are still not sufficiently explored. They reflected on their method of estimating how long students use the LMS and recognized inconsistencies between the time spent online and the number of sessions. From this, they concluded that students were not following the LMS contents continuously, but switched to other activities while using the system.
We follow the suggestion of Kovanovic et al. to further investigate the methods used to estimate students' online time when processing log data. To gain a deeper understanding of students' LMS sessions, we add another data source (similar to Conijn et al.'s suggestion) to augment our LMS log files and put them into a different context. From the various session and online time definitions, we see that the decisive factor is the time-out variable that closes an open session. Following this, we evaluate different time-out values for the inactivity after a user's last click (described in detail in the next section).
Methods
Fig. 1 gives an overview of our research design and data sources: We gather data from an LMS and from screen recordings. To make both data sources quantitatively comparable, we generate log files from the screen recordings and use the log records from both sources to estimate the online time for our users on an individual basis.
Fig. 1. Summary of the research design, which involves connecting two inherently very different data sources (screen recordings and LMS log files) in order to estimate students' online time.
A. Data Collection
We collected our data in the context of two blended learning music classes at German Adult Learning Centers (ALC). The data collection lasted four months. The topic of these classes was learning to produce music on tablet devices. We used Moodle as a platform to support learning, along with several applications for music creation. The classes were set up as a mixture of formal and informal learning: the teacher and students met weekly, combining instruction with students presenting their work and asking questions. Beyond that, the students used their devices outside of class to learn and solve tasks on their own.
1) Participants
Our course offering attracted nine participants at the ALCs, four female and five male, with ages ranging from 17 to 74 years. The maximum number of participants per course was eight. Recruitment was supported by the learning centers and by ads in local newspapers. All participants received a tablet (Samsung Galaxy Tab S3), a case with an integrated keyboard, a mouse, and headphones for the duration of the course. As our intention is to show counterexamples for online time estimation, this sample size is sufficient.
2) Privacy and Legal Aspects
In order to apply our research design and methods of data collection, we informed interested potential participants precisely about the data collection and analysis process in our research project. This was necessary to ensure that participants were able to understand all aspects of our research design, enabling them to make a reasonable, voluntary, and informed decision about their consent to be part of our study. Besides the requirements of the law (based on the GDPR), such as the informed consent of all participants, our focus was to present our methods and research goals as accessibly as possible. We did not expect potential participants to be experts in log file analysis or even familiar with technology at all. Additionally, recording a user's screen at all times is a very invasive method of data collection, which can make it hard to find participants for collecting field data in this way [23], [24]. Tang et al. [23] stressed that, in this context, building trust with the research team and informing participants in detail are important for convincing potential participants. We therefore informed the potential participants comprehensively in a separate nonbinding event before the start of the course. Besides a presentation, we explained our process in a compact but detailed document, in addition to the usual legally required documents. None of the potential participants refused to participate because of privacy or legal considerations. Details on how participants perceived the way data were collected can be found in [25].
B. Dataset
1) Screen Recordings
We developed and installed an application that permanently recorded the participants' tablet screens in the background and transferred the resulting files to our server. From our screen recording application, we received a total of 1351 video files in MP4 format, with a total file size of 179 GB. The videos added up to 167 h of screen capture, averaging around 1 h and 8 min per day. Our recording app transferred the video material over the internet to our server whenever the tablet was connected to Wi-Fi and not in use, to avoid blocking the internet connection during phases of user activity. Our video quality setting used around 1 GB per hour. We reduced the video resolution to half (1024 × 768) of the display resolution (2048 × 1536) and recorded with a dynamic frame rate.
2) LMS Data
The LMS provided detailed log files containing entries about the users' activities within the system. Moodle exports log files in CSV format, with nine data fields per entry. The most important fields are the timestamp and the description of the log entry, such as "The user with id '16' viewed the course with id '3'." The log files of our nine participants showed an average of 63 log entries per day. We saw a repeated pattern of weekly peaks right before and after the day of the course meetings. When we combined these data with the daily screen recordings, we were able to follow a similar pattern of weekly activity peaks. In total, we collected 19 081 log entries; after filtering out all admin- and teacher-related entries, we ended up with 11 503 log entries from our participants during the data collection phase.
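As an illustration of this filtering step, a minimal sketch in Python could look as follows; the column names ("Time", "User id", "Event name") and the timestamp format are assumptions, since the headers of Moodle's CSV export vary between versions and locales:

```python
# Minimal sketch: load a Moodle CSV log export and keep only student entries.
# Field names and the timestamp format are assumptions; adapt them to the
# actual export of your Moodle version.
import csv
from datetime import datetime

def load_student_entries(path, student_ids):
    """Return time-sorted (timestamp, user_id, event) tuples for the given students."""
    entries = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            user_id = row.get("User id")
            if user_id not in student_ids:
                continue  # drops admin- and teacher-related entries
            ts = datetime.strptime(row["Time"], "%d/%m/%y, %H:%M")  # assumed format
            entries.append((ts, user_id, row.get("Event name", "")))
    entries.sort()
    return entries
```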
C. Experimental Setup and Preprocessing
We want to compare the duration of sessions from two different data sources, which means that we have to bring the results from both sources into a comparable format.
1) Online Time Estimation Using Screen Recordings
The amount of collected video material was too large to analyze the recordings manually for occurrences of the LMS and compare them to the log files afterward. Some recent research has focused on automated screen recording analysis [11], [24], [26]. Krieter and Breiter [11] presented an approach for automatically generating highly accurate log files from mobile screen recordings using computer vision and machine learning techniques. They showed how their approach can be used to generate data for LA independent of the applications used in the digital learning environment [12]. We use their open-source implementation [27] to detect all Moodle activity in the screen recordings and create log files containing a log entry for each video frame. The computer vision and machine learning methods we apply are Tesseract for optical character recognition [28], OpenCV [29] for template matching, and perceptual hashes [30] for finding image similarities. Each time the LMS showed up on the user's screen, we summarized these consecutive video frames and saved the start and end of the LMS activity. The aim was to have a data format that was easy to compare with the results from the Moodle log files.
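The following minimal sketch illustrates the idea of collapsing per-frame detections into visibility intervals; it assumes the detector yields one (timestamp, lms_visible) tuple per analyzed frame, which is a simplification of the actual implementation [27]:

```python
# Minimal sketch: merge consecutive frames in which the LMS was visible
# into (start, end) intervals. `frames` is assumed to be a time-sorted list
# of (timestamp_in_seconds, lms_visible) tuples, one per analyzed frame.
def frames_to_intervals(frames, max_gap=1.0):
    """Gaps between LMS sightings shorter than max_gap seconds are bridged."""
    intervals = []
    start = last = None
    for ts, visible in frames:
        if not visible:
            continue
        if start is None:
            start = ts                      # LMS (re)appeared on screen
        elif ts - last > max_gap:           # LMS left the screen in between
            intervals.append((start, last))
            start = ts
        last = ts
    if start is not None:
        intervals.append((start, last))     # close the final interval
    return intervals
```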
2) Online Time Estimation Using Moodle Log Files
The participants were able to deactivate the screen recording in case they felt uncomfortable being recorded in certain situations. This meant that some data were lost to our analysis, but participants had control over their data. For our study, we only considered Moodle sessions that were also captured on video. For this reason, we report results for only four of our participants, those who had a sufficient amount of data from both sources to exemplify our approach and methodological contribution. This results in a total of 140 sessions that we investigate. We processed the Moodle log files multiple times, using different thresholds to define our sessions and estimate their duration. We split the raw log files into per-user log files. The actual content of the log messages is not important in this case. Similar to the estimation definitions from the literature, we use different thresholds for closing a session. This time-out value for the inactivity of a user is the most common way to create sessions and calculate the online time. We generate our sessions with thresholds from 0 up to 40 min, resulting in 41 different versions of the session duration. A session must contain at least two entries. Similar to our video logs, a session consists of timestamps for its start and end.
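A minimal sketch of this session definition could look as follows; it assumes `timestamps` is a time-sorted list of datetime objects for one user, e.g., extracted from the hypothetical load_student_entries() sketch above via [ts for ts, _, _ in entries]:

```python
# Minimal sketch of the session definition described above: a session
# closes after `timeout_minutes` of inactivity and must contain at
# least two entries. Returns a list of (start, end) timestamp pairs.
from datetime import timedelta

def sessionize(timestamps, timeout_minutes):
    if not timestamps:
        return []
    limit = timedelta(minutes=timeout_minutes)
    sessions, current = [], [timestamps[0]]
    for ts in timestamps[1:]:
        if ts - current[-1] > limit:        # inactivity gap: close the session
            if len(current) >= 2:
                sessions.append((current[0], current[-1]))
            current = [ts]
        else:
            current.append(ts)
    if len(current) >= 2:
        sessions.append((current[0], current[-1]))
    return sessions

# 41 session variants, one per time-out value from 0 to 40 min
variants = {t: sessionize(timestamps, t) for t in range(41)}
```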
3) Comparing the Online Time Estimations
To explore and directly compare the sessions from both data sources, we created dynamic timeline visualizations (see Fig. 2) using the Google Charts library [31]. By doing this, we can show how much the online time results from the two data sources differ. We tested the 41 variants of the LMS sessions against the sessions from the video material. We estimated the online time for every variant and compared the duration to the one we got from the screen recordings. Based on this, we evaluated which session variant works best on a per-user basis. The idea was to find the best threshold value (time-out) for the last action that is part of a session. The best value, in this case, is the one whose resulting session duration is closest to the duration we got from that participant's screen recordings.
Fig. 2. Timeline visualization of an example from the dataset. The first line (blue) indicates the recording of the screen. The second line (red) marks the time slots in which we found Moodle activity in the screen recordings. (It can be seen that Moodle was active on the student's screen before the LMS created the first log entry, which is due to the user accessing the course site but not yet logging in.) The last line (orange) represents the corresponding session data from the Moodle LMS log files.
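The per-user selection step can be expressed compactly; the following sketch builds on the hypothetical sessionize() helper above and assumes `screen_minutes` holds the total time the LMS was visible in a user's recordings:

```python
# Minimal sketch: pick the per-user time-out whose total estimated online
# time is closest to the ground truth derived from the screen recordings.
def total_minutes(sessions):
    return sum((end - start).total_seconds() / 60 for start, end in sessions)

def best_threshold(variants, screen_minutes):
    return min(variants,
               key=lambda t: abs(total_minutes(variants[t]) - screen_minutes))
```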
Results
A. Investigating Sessions in Timeline Visualizations
Several previous LA studies that used a session definition for online time estimation were aware of the inaccuracy of the estimation [3], [4], [7]. Thanks to our permanent screen recording, we can directly compare our Moodle logs with what happened on our participants' screens at the same time. Fig. 2 shows a timeline visualization of our data sources. In general, we can see that the sessions we get from the LMS logs clearly differ from what we get from the screen recordings. The first line indicates the time spans of the recorded video material (see Fig. 2). The tablet, as a mobile device, switches off the screen after a period of inactivity, or the user actively turns off the screen. In this example, we take a closer look at a time span of roughly 25 min in the evening, starting around 6:35 p.m., during which the screen is active most of the time. The second line shows when the LMS was visible on the screen, and the last timeline shows the corresponding session from our Moodle log files. The Moodle log file session, in this case, is specified by the individual threshold value that reflects the online time most accurately for this user (user 1; see the second part of the results section). This means that the threshold for the last action before we close a session is 30 min (see Fig. 3, user 1: a time-out of at least 30 min leads to the most accurate online time estimation). This results in an LMS Moodle session of 15 min. When we compare the Moodle log file session to the Moodle occurrences in the screen recordings, we see many overlaps, but also some larger pauses in which Moodle was not on the screen. During the LMS Moodle session, Moodle was on the screen for only 4 min and 18 s.
Fig. 3. Results from testing time-out thresholds from 0 to 40 min for the estimation of session duration based on the last action performed by a user. The blue dots indicate the total online time of all sessions summed up (y-axis) based on a certain threshold (x-axis), while the orange line represents how long Moodle was actually visible on the screen of the user, based on the recordings. For example, for user 1, the LMS was present in the screen recordings for 168 min (orange line); if we use a threshold of 30 min or higher, the results are closest to this value.
For this particular session, we manually explored the video material in addition to the automated analysis to add some context to the example. A special advantage of combining these two data sources is that we get exact information about off-task behavior. We see that Moodle was active on the participant's screen before the LMS created the first log entry. This results from the user accessing the course website without logging in yet. Instead, the user creates a bookmark for the LMS and rearranges the preconfigured bookmarks of the tablet's browser. The data of this session are from the beginning of the course. As Moodle can only recognize the user and add a log entry to his or her history when the user is logged in, we have no record of this in the Moodle log files. The first log entries in Moodle are from several attempts of the user to log in, which fail because of wrong passwords. The next gap (in comparison to the LMS log session) in the screen recordings is caused by the user changing system preferences. The next few gaps result from switching to the application store several times and searching for specific music applications. The user seems to be unsure which one to install, scrolls through the Google Play Store suggestions for a while, and then switches back to the LMS home page. This is followed by rearranging the application icons on the Android start screen. The last action of the user is actively logging out of the LMS, which closes the session we get from the Moodle log files. The short occurrence of the LMS in the screen recordings after this results from returning to the Moodle website, but without logging in again.
B. Testing Different Online Time Estimations
Only 10% of our sessions end with an active logout click by the user. This means we need an accurate estimation based on the last action in a user session instead of relying on login and logout actions. As described in our methods section, we use a simplified time-on-session estimation (compared to, for example, Kovanovic et al. [4]) to calculate the total time spent using the LMS. We do not take specific single time-on-task estimations into account. The heuristic definition we use to measure the session duration in our Moodle log files estimates the last action of a session based on a time threshold from 0 up to 40 min. Depending on this session definition, we compare the results of up to 140 sessions.
Fig. 3 shows the results of the heuristic session duration estimation in relation to the duration the LMS was visible on the users' screens. The orange line indicates the total session duration (online time in the LMS) from our video recordings. The blue dots specify the total duration spent in the LMS (y-axis), estimated using a threshold t (x-axis). In general, we see some similarities between users 2, 3, and 4.
For user 1, the total duration of the LMS log sessions is closest to the value we get from the screen recordings (168 min) when the threshold for the last action is 30 min or higher. A general trend for this user is that an ascending threshold yields a longer total session duration, despite a drop around t values of 21 to 24. An example session of this user is visualized in Fig. 2. Although the threshold of 30 min seems to result in an imprecise representation of this single example session, across all sessions this threshold leads to the best results for this user. Due to limited space, we cannot show visualizations for all users and sessions.
If we take a look at user 2, we get a different picture. We have a total occurrence of the LMS of 56 min in the video data. In this case, a threshold of 2 min brings our estimate of the online time (58 min) closest to the value from the screen captures. The duration based on other thresholds varies between 52 min (t = 3) and 237 min (t = 35 or higher).
For the third user, we had to filter out more data, as not all LMS log sessions were covered by the screen recordings. This leaves 24 min of Moodle usage in the screen captures. The closest value in our threshold test results from a threshold of 1 min, which yields an online time estimation of 38 min based on the Moodle log files. High values for t result in a maximum of 215 min of online time (t = 26 and 27).
The diagram for user 4 shows an analogous picture, but here even the closest total session duration estimation is quite far from the value we get from the screen recordings. In this case, we have 28 min of Moodle online time in the screen recordings. The best threshold value in our test is 1 min; using it to calculate the online time results in 56 min. The other values also differ strongly from the duration of Moodle occurrence in the screen recordings.
C. Number of Sessions
The focus of this article is on the duration students spend online in an LMS. But as we test different session definitions, the number of sessions also changes, not just the duration. Because this number is an important measurement as well, we give a short summary of how it changes with the time-out threshold for inactivity.
Table II shows how many sessions there are per user, depending on different values for the session time-out in minutes. We chose the values based on common values used for Moodle log files in the literature (40, 30, and 15 min; see Table I). We also added the time-outs that produced the results closest to the screen recordings: 1 min for users 3 and 4, and 2 min for user 2. The table shows that the number of sessions varies considerably with the time-out threshold. In general, we get fewer sessions for all users if we increase the time-out. This could possibly reduce the influence on statistical models, as the values change in a similar ratio for all users.
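Reusing the hypothetical sessionize() sketch from the methods section, counting sessions per time-out for one user is a one-liner:

```python
# Session counts per time-out value (in minutes) for one user's timestamps,
# e.g., for the thresholds reported in Table II.
session_counts = {t: len(sessionize(timestamps, t)) for t in (1, 2, 15, 30, 40)}
```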
Discussion
By comparing the online sessions we get from our Moodle log files to the Moodle activity we get from the screen recordings, we can tell that the actual use of the LMS is very different from what one would expect from analyzing the LMS log files alone. To the best of our knowledge, no previous research data provide the same level of detail on how long and when students actually use the LMS while comparing this with parallel LMS log files. By visualizing the data from both sources in parallel timeline graphs, we can see that a lot of off-task activity happens while the user is in an LMS usage session. By adding screen recordings to the data collection and generating log files from the video material, we get a new perspective on the log files. In a way, we can look over the student's shoulder and get an answer to the question, "are you really still using the LMS?" However, just because something is on the user's screen, we still cannot infer for sure that the user is actively following or reading the content on the screen. Through the combination of "classic" log files and screen recordings, or log files generated from screen recordings, we can put our LMS log files into a different context [32]. As the description of our example visualization shows, we can add many insights that we cannot infer from Moodle log files. Further research in this direction could automatically analyze how long viewing content or performing an action in Moodle took. Besides that, we could track off-task behavior, or on-task behavior happening outside of the LMS. In our case, this could be a student reading a task to download a certain music application and using it for an assignment. Using our approach, we could connect these activities and get a different perspective on learning in environments that contain multiple applications.
In previous LA research, a threshold of 30 or 40 min was common for estimating the last action of a session and calculating the online time of a student (e.g., [3]). In our case, a value of 30 min leads to quite accurate results for one of our participants (user 1), but not for our three other example cases. If we compare the results of testing thresholds from 0 to 40 min, we see that the resulting estimated online time varies by more than 2 h. As Kovanovic et al. [4] showed, the measurement of how long students stay with a task or the LMS significantly influences the results of statistical models. Therefore, the choice of how to estimate online time is highly important. But from analyzing our Moodle log files alone, it is not possible to find an exact session duration estimation. For our dataset, there is no clear "winner" threshold that represents the actual LMS online time accurately. The combination with screen recordings as an additional data source provides context to the traces from the LMS logs and makes it possible to find more accurate thresholds.
The presented sample and analysis of 140 sessions from four participants suggest that the "right" formula for session time estimation is highly individual. In our data, we see very different thresholds that fit the actual time spent when compared with the screen recordings. Our contribution is to show that existing online time estimations are not accurate, i.e., they over- or underestimate the time. Given the existing definitions, we intend to falsify [33] common assumptions. For this approach, one negative example would be enough, but four are better. Furthermore, we present a new methodological approach (combining log and screen capture data) to investigate this further. This ought to support a differentiated discussion and deliver first ideas for a novel approach. We are not trying to establish and prove a new, generally "perfect" session definition, which would undoubtedly require many more participants and data (if it is possible at all). However, for our case, we think a small sample of four participants and 140 sessions is sufficient to prove our point. We show that using several different definitions from the literature, we get very different and inaccurate results, and that it seems rather random whether they match the actual online times we get from the screen recordings. To make this transparent, we make our data and the source code of our approach available to the community.
We strongly encourage further research on how students' online time can be tracked more accurately. We presented a methodological approach to determine online time accurately and individually. But the effort to collect screen recording data is significantly higher from a technical perspective, even if we automate the analysis and generate log files from the recordings. From the participants' perspective, recording the screen permanently in the background is privacy-invasive, and this can make it hard to find participants willing to join a research study [23], [24], [34]. We see a dilemma here from a research-ethics point of view. On the one hand, recording the screen in the background deeply affects the privacy of the students. On the other hand, it is problematic if predictions about the success or failure of students are made with inaccurately calculated estimations from LMS log files (e.g., [4]).
A. Limitations
We used tablet computers for our data collection; using other devices, such as desktop computers, might lead to different findings. From the presented study, we cannot infer an "ideal" definition of how to estimate students' online time in an LMS. In addition, screen recordings provide a very detailed view of what is happening on the screen, but we still cannot be sure whether the user is really present and "online." Furthermore, we focused on the threshold variable that determines the last action of a session. Although this is considered the most important factor in estimating online time (see the related work section), further research is needed to explore other factors as well (the influence of the number of sessions, for example). Another factor limiting the approach presented in this case study is scalability: screen recordings are not easy to analyze and handle (compared to LMS log files), and student privacy is severely compromised, which makes large-scale deployment problematic.
Conclusion
In this article, we presented an approach to estimate the time students spend in an LMS based on linking LMS logs with parallel log files generated from screen recordings. By this, we can make the actual on- and off-task behavior of students visible. We explored example sessions and visualized the data from both sources in timeline diagrams, which indicate great differences between estimations based on Moodle logs and those based on screen recordings. We used a common online time estimation strategy from the literature and tested different variations against the results from the screen recordings to find the definition that is most accurate for the individual student. We showed that the threshold of minutes of inactivity used to determine the end of a session is critical for calculating the online time and that there are large deviations. Our findings are in line with the results of previous research on online and on-task time estimations [3], [4], [7]. We showed that the usage sessions we infer from the Moodle log files do not reflect the actual usage time and characteristics that we can observe in the screen recordings. We suggest gathering data in ways that go beyond the LMS to overcome the state of having to accept blurry data on online time or time-on-task estimation.
A. Future Work
For future work, it would be helpful to make the collection of data on off- and on-task behavior more feasible, both technically for the research team and from the participants' privacy point of view. Advanced web tracking techniques, added through a Moodle plugin, for example, could help improve online time estimation in practice. Furthermore, by augmenting the LMS log files with the log files from the screen recordings, we can study student behavior in a new and very detailed way, beyond the data points an LMS can provide. An analysis based on the content on the screen is challenging but highly promising in terms of getting a bigger picture of what is happening in a digital learning environment.
ACKNOWLEDGMENT
The author would like to thank all student participants for their time, effort, and involvement in the study.