
Development and Validation of the Engineering Computational Thinking Diagnostic for Undergraduate Students




Abstract:

Computational thinking is one barrier to enculturating as a professional engineer. We created the Engineering Computational Thinking Diagnostic (ECTD) as an instructional tool that can identify at-risk first-year engineering students. The purpose of this study is to provide construct validity, internal consistency reliability, item characteristics, and criterion validity evidence for this diagnostic. From fall 2020 to fall 2021, 469 students from three institutions in the United States took the diagnostic. Data from 152 students at one institution were used to provide evidence of predictive validity. Exploratory and confirmatory factor analyses resulted in 20 items loading onto one factor within a good model fit range, with an internal consistency reliability coefficient, Cronbach's \alpha, of 0.86. From item analyses based on classical test theory, the diagnostic items on average tended to be slightly easy but had sufficient discrimination power. The correlation matrix for criterion validity evidence indicated that the diagnostic functions well to differentiate students’ computational thinking ability by prior computer science course experience as well as by first-generation status. Predictive validity evidence from regression analyses revealed the statistically significant effect of students’ diagnostic scores, assessed at the beginning of the first semester, on predicting their end-of-semester course grades. The ECTD can have a broad impact because it provides a tool to gauge the entry-level skills of students, enabling early curriculum interventions to help retention and persistence to graduation. We make the case that the ECTD could contribute to the development of a more diverse workforce in engineering.
Society Section: IEEE Education Society Section
Published in: IEEE Access ( Volume: 11)
Page(s): 133099 - 133114
Date of Publication: 23 November 2023
Electronic ISSN: 2169-3536



SECTION I.

Introduction

The development of engineering workstations and personal computers resulted in engineering practice becoming rooted in computational modeling, system simulation, and data analysis. Design problems that were once analytically challenging using paper-based techniques are now more easily solved, verified, and extended by using computer-assisted design. As Moaveni noted while describing the characteristics of successful engineers, “engineers are adept at using computers in many different ways to model and analyze various practical problems” [1, p. 21]. The use of computational thinking to solve engineering problems is a natural extension of other skills acquired by engineering students such as mathematical reasoning, analytical thinking, evaluative judgment, and design creativity. Not all undergraduate engineering students are required to take coursework in computational thinking. However, in many fields of engineering, computational thinking is integrated throughout the curriculum and critical to retention between the first and second year as well as persistence to graduation [2], [3]. In these fields, students who fail to develop computational thinking will struggle to self-actualize as belonging to the community of engineers.

Enculturation into the engineering profession is a complex process for every person because of life experience, socioeconomic background, pre-university course availability, and entry-level preparedness. The first-year experience is known to be a foundational point where students begin to self-actualize, or enculture, as a member of the profession. Heywood provides a summary of decades of studies showing that the “reaction of the student to the first semester or year is crucial to subsequent performance” [4, pp. 449–455].

The research documented in this paper begins with an initial study at a large institution in the United States that discovered many factors affecting student success and enculturation during the first year of engineering education [5], [6], [7]. One key result from this study that extended the literature was that, regardless of sex, the computational thinking embedded in the first course, and the ability to use modern tools and practices more broadly, challenged students, demotivated them, and made them question their choice of major. This institution is not unique in curriculum placement. Many students are introduced to computation, modeling, simulation, or computer programming in their coursework during the first year. Yet, the connection between professional enculturation and these computing courses is lacking in the literature.

Our research team is working to close the knowledge gap in how the placement of computing coursework impacts engineering student enculturation. We have developed and deployed an instrument for a quantitative research approach as well as semi-structured interviews as a qualitative research approach to help us build a model of computational thinking privilege and its impact on first-year retention [8], [9], [10], [11]. In this paper, we first situate the need for an instrument that measures computational thinking skills in first-year students with a review of the literature. In this literature review, we report the scarcity of computational thinking assessment instruments for engineering at the college level and justify the need for a valid and reliable instrument. We then describe our instrument and the process taken to provide validity and reliability evidence. The instrument can have a broad impact on the community of engineering educators because it provides a tool to gauge the entry-level skills of engineering students, enabling early curriculum interventions to help retention. In addition, application of the instrument in a pre-post manner provides growth data that institutions can use for course evaluation and continuous curriculum improvement. Our hope is that our work advances the way computational thinking is taught to engineers at the university level and helps students persist in engineering.

SECTION II.

Psychometric Terminology

Since the purpose of the paper is to provide evidence of trustworthiness of the Engineering Computational Thinking Diagnostic (ECTD) in measuring computational thinking skills, we will examine psychometric characteristics of the ECTD such as validity and reliability as recommended practices in the engineering education research community [12], [13]. To form a common frame of reference for discussion, we begin by defining terms that are common in psychometric analysis.

Validity is the “process of constructing and evaluating arguments for and against the intended interpretation of the test scores and their relevance to the proposed use”, in our case, diagnosing computational thinking skills [14, p. 11].

Reliability is the “consistency of scores across replications of a testing procedure” [14, p. 33]. To demonstrate both validity and reliability evidence, we will provide definitions of the most relevant psychometric terms.

In psychology, a construct is an “attribute of people, assumed to be reflected in test performance” [15, p. 283]. Our construct in this work is computational thinking. To measure computational thinking, we have designed a 20-question instrument identified earlier in this paper as the ECTD.

Construct validity refers to how well test performance represents the attribute of people. It is inferred statistically from exploratory and confirmatory factor analyses, which demonstrate how well the questions match their given purpose. If the ECTD is shown to have good construct validity evidence, we can claim that we have developed an instrument that captures the computational thinking construct in students.

An instrument has criterion validity when it accurately forecasts future performance [16]. It is measured using correlation, multiple regression, ANOVAs, and t-tests between different indicators. If the ECTD has good predictive criterion validity evidence, it will be able to predict which engineering students will do well in future computational thinking related tasks.

Internal consistency reliability is a measure of whether different items that propose to measure the same construct produce similar scores [17]. It is measured using Cronbach’s \alpha coefficient. If different questions on the ECTD produce acceptable Cronbach’s \alpha scores close to 1.0, internal consistency reliability has been demonstrated. A coefficient over 0.70 is acceptable for good internal consistency reliability evidence.
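For reference, Cronbach's \alpha for a k-item instrument is computed from the individual item score variances \sigma_i^{2} and the variance of the total score \sigma_X^{2}; this is the standard definition and is not specific to the ECTD:

\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_X^{2}}\right)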

Item difficulty is the percentage of respondents who answered an item correctly and ranges from 0.0 to 1.0. More difficult items score closer to zero. The discrimination index of an item is the ability to distinguish high and low scoring learners. The closer this value is to 1, the better the item distinguishes the learners who get high scores from those who get low scores [18].
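One common form of the discrimination index is the point-biserial correlation between an item and the total score, r_{\mathrm{pb}} = \frac{M_{1} - M_{0}}{\sigma_X}\sqrt{p(1-p)}, where M_{1} and M_{0} are the mean total scores of respondents who answered the item correctly and incorrectly, \sigma_X is the standard deviation of the total scores, and p is the item difficulty. Implementations differ slightly, for example in whether the item itself is removed from the total score before correlating.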

SECTION III.

Research Questions

The research described in this paper is the measurement and description of the psychometric characteristics of version gamma of the ECTD, such as evidence of construct validity, reliability, and criterion validity. The following research questions are addressed:

  1. To what extent does the construct validity of the ECTD hold for engineering students?

  2. What level of internal consistency reliability exists for the ECTD?

  3. What are the item characteristics, such as item difficulty and discrimination, of the ECTD?

  4. To what extent does the criterion validity of the ECTD hold, differentiating engineering students’ computational thinking ability by sex, race/ethnicity, and socioeconomic background (SEB)?

SECTION IV.

Literature Review

The maturation of the computer industry resulted in widespread adoption of computers for data collection, data analysis, data modeling, design diagramming, component modeling, system simulation, and technical communication. As one of the greatest engineering achievements of the 20th century, the computer has enabled a digital transformation that changed the modern workplace as well as our day-to-day lives [19]. Throughout this maturation, the engineering profession adopted new computer-aided design and validation approaches. In fact, engineering accreditation organizations require universities to provide modern computer facilities and computational training appropriate for the field [3], [20]. Of course, multiple years of computing study are mandatory for computer science students [21].

Multiple authors link computation to the formation of professional engineers [22], [23], [24], [25], [26], [27], [28], [29], [30]. And computer programming is recognized as such a critical skill that multiple non-profit organizations provide training in computing to primary and secondary students. Example organizations include Hour of Code, Code.org, Project Lead the Way, Girls Who Code, and Black Girls Code [31], [32], [33], [34], [35]. Computational thinking and the ability to program a computer is critical for technological advancement.

The literature also reveals a disparity in the representation of a variety of social identity groups in the fields of computer science and engineering. For example, the Taulbee Survey describes inequities at doctoral granting institutions [36]. National data show the disparity more broadly [37], [38]. In a deep study at one institution, Margolis and Fisher found multiple factors that caused undergraduate women to be underrepresented at Carnegie Mellon University [39]. Both computer science and engineering programs systemically marginalize people from many racial and ethnic minority groups, first generation college attending students, and those from low socioeconomic status [40], [41], [42], [43], [44].

Computational thinking development is a learning objective in many engineering curricula. The engineering education community uses the terms computing, coding, programming, algorithmic thinking, and computational thinking somewhat indiscriminately. Computing skills are recognized in the Taxonomy of Engineering Education Research where “Computing skills (syn: Computing knowledge),” and “Computational thinking” are included in the student outcome category [45], [46]. Relationships between engineering and computational thinking have been found in the context of visualizing data, problem-solving, systems thinking, modeling, simulation, and design [47], [48], [49], [50], [51], [52]. However, the way engineers understand computational thinking in these works differs from the more nuanced frameworks in computational thinking from computer science. For example, a recent study analyzed students’ computational thinking practices in a first engineering course where computational thinking was defined as a problem-solving process and where learning outcomes were associated with Data Practices (DP) and Computational problem-solving Practices (CP) [52]. Their definitions, however, lacked detailed characteristics of computational thinking that could be observable and measurable in the form of direct outcomes for a diagnostic.

Thus, the challenge of defining computational thinking appropriate to an engineering context exists since most computational thinking frameworks suggested in the literature do not specifically address engineering or define computational thinking in a manner that can yield a precise diagnostic. For example, the International Society for Technology in Education (ISTE) and the Computer Science Teachers Association (CSTA) both have well designed and vetted frameworks, but these frameworks were designed for K-12 teaching [53], [54]. The Brennan and Resnick framework used for the MIT Media Lab’s Scratch language focused more on programming than computational thinking [55]. The Collaborative Process to Align Computing Education with Engineering Workforce Needs (CPACE) defined computing needs based on engineering workforce needs [56]. While their work is at the college level and centered on engineering, it combined information technology and computational thinking and is not focused on a broad view of developing computational thinking. While information technology and programming are powerful tools for engineering, computational thinking is broader [57].

The College Board, a non-profit organization headquartered in the United States, offers the Computer Science Principles (CSP) examination to measure computational thinking in students lucky enough to attend high schools with computing courses available, which is only half of the high schools in the United States [58]. The CSP has a well-designed computational thinking framework as well as a diagnostic examination; however, it was designed to represent general education computer science at the college level and is not specifically responsive to the needs of engineering.

In 2022, Lu et al. published a scoping review of computational thinking assessments in higher education [59]. Thirty-three instruments were reported assessing a multitude of dimensions, including attitudes, confidence, and skills. They also reported assessment via a plethora of approaches, including observations, interviews, and questionnaires among others. Among these instruments, the programming language Scratch is widely used as a tool for assessment. And, when a skill test was applied as part of assessment, results were reported only in fields outside engineering.

Other publications report efforts to assess computational thinking in higher education [60], [61], [62], [63]. While engineering students may have been included in the participants, they were not the focus.

Many publications on computational thinking have focused on K-12, an audience with different background and objectives [64], [65], [66], [67], [68], [69]. One written by Kukul and Karatas presents a validated instrument that measures K-12 self-efficacy in computational thinking [70].

Our work differs from and extends prior work found in the literature through its focus on the needs and abilities of undergraduate engineering students.

SECTION V.

Theoretical Background

We developed a computational thinking framework that can measure computational thinking as an attribute of engineering students. This framework was built upon the earlier works and highlights characteristics recognized by expert professors in engineering education and computer science. These experts have taught computational thinking and programming to engineers for multiple years. Our framework for computational thinking is shown in Table 1. It incorporates five characteristics of computational thinking: (a) Abstraction, (b) Algorithmic Thinking and Programming, (c) Data Representation, Organization, and Analysis, (d) Decomposition, and (e) Impact of Computing. Key terms from the ECTD framework are defined below.

TABLE 1 Engineering Computational Thinking Diagnostic (ECTD) Framework Compared With the Literature (After Mendoza Diaz et al., 2020 [8])

A. Abstraction

A new representation of a thing, a system, or a problem that reframes a problem by hiding details irrelevant to the question at hand.

B. Algorithmic Thinking and Programming

Developing systematic methods to solve problems and expressing these methods in an appropriate language.

C. Data Representation, Organization, and Analysis

Transforming raw data into information and knowledge.

D. Pattern Matching

Finding similarities between data or algorithms.

E. Automation

Plugging pieces into an algorithm to help with a result, sometimes involving programming.

F. Decomposition

Breaking a problem or system apart into smaller components that can be more easily and completely analyzed.

G. Impact of Computing

Considering both the potential harm and benefits to multiple groups when making computing choices and decisions.

Table 1 also delineates how the most recent ABET student outcomes, shown below, are aligned with the frameworks.

H. ABET Student Outcomes

ABET describes student outcomes as “what students will know and be able to do at the time of graduation” [3]. The outcomes noted in Table 1 are:

  1. an ability to identify, formulate, and solve complex engineering problems by applying principles of engineering, science, and mathematics.

  2. an ability to apply engineering design to produce solutions that meet specified needs with consideration of public health, safety, and welfare, as well as global, cultural, social, environmental, and economic factors.

  3. an ability to develop and conduct appropriate experimentation, analyze and interpret data, and use engineering judgment to draw conclusions.

  4. an ability to acquire and apply new knowledge as needed, using appropriate learning strategies.

The ECTD operationalizes all these aspects of computational thinking in 20 questions, each with five multiple choice options. Example questions from the diagnostic can be seen at the project website [71].

SECTION VI.

Method

A. Diagnostic Development

The initial version of the ECTD was created in 2017 because an NSF-funded project identified engineering students’ computational thinking as a needed outcome to be assessed to understand engineering students’ enculturation to the engineering profession [5], [6]. Since then, there have been several iterations to improve the ECTD based on the psychometric evaluation of items. Table 2 documents each version of the ECTD, the total number of diagnostic questions for each version, and the number of engineering students testing the ECTD for psychometric evaluation [10].

TABLE 2 ECTD Development Process

The pilot version was given to engineering students in Fall 2017 at a large public university (Institution A) in the United States. Each multiple-choice question had four distractors and one correct answer. The original factor categories were pattern matching, decomposition and solution, abstraction, and automation. These factors were a first attempt to capture the entry level of engineering students’ computational thinking skills. With this attempt, it became clear that more formal research was necessary, which motivated the development of an NSF proposal using this preliminary data.

NSF funded the refinement and expansion of the ECTD in 2019 with two additional U.S. institutions joining the research team: a medium sized public institution (Institution B) and a small private institution (Institution C). The research goals of the multi-institutional project include answering multiple research questions regarding computational thinking and how it impacts the enculturation of students into the engineering profession. Outputs of the project that have been documented previously include a computational thinking framework for engineering education, the multiple-choice questions that form the ECTD, and the continuous improvement process used to refine the diagnostic questions [8], [9], [10], [72], [73]. This continuous improvement process is diagrammed in Figure 1.

FIGURE 1. Continuous improvement of the diagnostic.

Question refinement and assessment is a continuous process for diagnostics. As Douglas et al. state, “Validity is never quite over. It is a goal we strive for, but given the nature of educational variables, the process of reevaluating the appropriateness of an instrument’s use is ongoing…evidence is collected both to inform future improvements to the instrument and to provide evidence of appropriate use” [12]. Each iteration of the Engineering Computational Thinking Diagnostic starts with a panel of experts reviewing the statistical assessment results from participant submissions. This review helps identify questions that potentially need adjustment or replacement. The set of questions chosen for refinement are reviewed for English language clarity, technical correctness, distractor effectiveness, sociodemographic bias, and difficulty.

Through this process of continuous improvement, the research team created the second version of the ECTD, called ECTD alpha, in the summer of 2019. The questions in the alpha version of the diagnostic were categorized into the five computational thinking factors as noted in Table 1. Two ECTD versions (A and B), with 15 questions each, were created with the intention of being equivalent in terms of item difficulty and discrimination, so the diagnostic could be given as a pre-post assessment in a class without repeating the same items. Each version was designed to contain three questions of varying difficulty for each factor category: high, medium, and low difficulty. Five hundred and twenty-six (526) first-year engineering students at Institution A took ECTD alpha versions A and B in the fall of 2019 [8].

The analysis of the ECTD alpha did not produce the desired psychometric properties on item difficulty and discrimination. Several pairs of items were found to have negative correlation coefficients instead of the positive correlations that were expected. When the eigenvalues were analyzed using an inflection point in the scree plot, five eigenvalues were greater than 1.0 [74], [75]. This result could have indicated that the five factors used to design the diagnostic were present. Unfortunately, the experimentally determined factor loadings from both version A and B did not match the diagnostic design goal. When evaluating the five-factor model, version A had one factor supported by seven questions and the other four factors were supported by only one question. The five-factor model for version B similarly had one factor indicated by eight questions with only one or two questions supporting three other factors. These results fed the continuous improvement cycle of Figure 1 and caused the team to revise the ECTD again.

The research team created the third iteration of the diagnostic, called ECTD beta (versions A and B) by modifying questions from ECTD alpha that had undesired psychometric properties. This included the reconsideration of question content and phrasing, as well as the choice of distractors. As one might expect, data collection during 2020 was challenging due to the COVID-19 pandemic. Students were more reluctant to take the ECTD, possibly due to low motivation and pandemic-related fatigue. Through multiple recruitment attempts, the large public university was able to recruit 916 first-year engineering students to take ECTD beta. This ECTD beta had some better psychometric properties, including having positive correlation coefficients between all pairs of items. Unfortunately, only four eigenvalues were found to be greater than 1.0, indicating that the ECTD was measuring four factors instead of the five it was designed to measure. As with ECTD alpha, only one or two questions were loaded onto three factors, with a single factor having most of the questions loaded. These problems were found in both versions of ECTD beta. Since the psychometric properties of the diagnostic did not match the desired design goal, more modifications were necessary.

To improve the psychometric properties of the ECTD, a subset of the items in the A and B versions of ECTD beta was combined into ECTD gamma in the fall of 2020. This fourth iteration through the continuous improvement cycle of Figure 1 resulted in the version of the diagnostic documented in this paper. Four questions were selected for each factor based on measured psychometric properties. Hence, ECTD gamma has twenty questions. Questions were shortened and simplified to improve clarity and reduce wordiness. In addition, questions were re-examined to remove possible biases. Examples of corrected biases included the removal of gender pronouns, removal of a question that broached the sensitive topic of body-mass index, and removal of questions that used contextual information that some students, such as international students or students with limited socioeconomic advantages, might not have. Preliminary analyses using exploratory factor analysis (EFA) revealed a one-factor structure but indicated the need for further validity analyses [10].

B. Participants

From fall 2020 to fall 2021, study participants were recruited through email at the three institutions previously described. Online survey invitations were sent out at the beginning of the fall semester. Students took the ECTD online with no time limit and with no supervision. Participants in this study were mainly first-year students. However, some second-, third-, or fourth-year students participated because they were enrolled in introductory courses. They appear as Second Year and Above in the table.

The ECTD was estimated to take less than 20 minutes to complete. Recruitment response rates were lower than expected, likely due to COVID-19 fatigue. Table 3 shows the demographic characteristics of the participants who completed the ECTD at all three institutions. The average age (M) of these 469 students was 19.02 years with SD = 2.32.

TABLE 3 Demographic Characteristics of Fall 2020 and Spring 2021 Participants at All Three Institutions

The first-semester engineering foundation course grades of 152 first-year students from the Fall 2021 cohort at Institution A were collected to explore the predictive validity of the ECTD on student performance. Institution A was used alone because differences between the institutions’ introductory engineering sequences made direct comparison of grade impacts meaningless. We chose to use data from Institution A because it was the largest institution. Among the 152 students, 33 (21.7%) were female students, 118 (77.6%) were male students, and one (0.7%) student indicated “other/choose not to answer.” Regarding race/ethnicity, there were 31 (20.4%) Hispanic, 1 (0.7%) American Indian or Alaska Native, 37 (24.3%) Asian, 4 (2.6%) Black, 4 (2.6%) Multiracial, and 74 (48.7%) White students.

The course letter grades at Institution A were converted to A = 4, B = 3, C = 2, and D, F, W = 1 points for numerical analyses. Here, grades of A, B, and C are passing grades, and D, F, W (DFW - poor, fail, withdraw and/or drop) are considered failing grades. In addition, the numbers of advanced mathematics courses and computer science courses taken pre-college by participants at Institution A, which ranged from 0 to 5, were also collected. The average number of prior math courses was 1.49 and the average number of prior computer science courses was 0.93.

C. Data Analysis

Student responses to the questions on the ECTD were coded in binary, 0 for incorrect and 1 for correct. This binary coding is naturally categorical and the distribution of responses for each item was skewed and did not follow a normal distribution. Therefore, robust weighted least squares (WLSMV) employed in Mplus 8.5 was utilized to obtain parameter estimates for factor analyses with categorical data [76], [77]. Fourteen students (2.9%) had one to three missing responses on the ECTD, and the missing responses seemed to be random, so they were handled using pairwise deletion.

First, using the data randomly split in half (n = 241), an EFA was conducted to identify the underlying factor structure and any irrelevant items that did not fit into the factors present on the scale. Eigenvalues and factor loadings after oblique GEOMIN rotation, which is the default rotation in Mplus, were calculated to judge the number of factors and the items for each factor. As we identified a factor structure and items for the ECTD from the EFA, we calculated the reliability coefficient of internal consistency; we used Cronbach’s \alpha, calculated in SPSS Statistics Version 27, to investigate how items are interrelated within each factor, subfactor, and the overall instrument [78].
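As a rough illustration of these computations outside the Mplus and SPSS workflow used in the study, the following Python sketch operates on a hypothetical binary response matrix; a Pearson inter-item correlation matrix stands in for the tetrachoric correlations that Mplus estimates, and the random data are for demonstration only.

import numpy as np

# Hypothetical (students x 20 items) matrix of 0/1 scores; replace with real data.
rng = np.random.default_rng(0)
responses = rng.integers(0, 2, size=(241, 20))

# Eigenvalues of the inter-item correlation matrix (Pearson here; the study
# used tetrachoric correlations appropriate for binary items).
corr = np.corrcoef(responses, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]
print("Eigenvalues:", np.round(eigenvalues, 2))  # inspect the scree for its inflection point

# Cronbach's alpha for the 20-item scale.
k = responses.shape[1]
item_variances = responses.var(axis=0, ddof=1)
total_variance = responses.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print("Cronbach's alpha:", round(alpha, 3))  # near zero here because the data are random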

Second, after identifying the factor structure and irrelevant items for the scale, we conducted a confirmatory factor analysis (CFA) using the other half of the data (n = 235) to confirm and refine the factor structure identified through the EFA. Based on the fit indices that Mplus provides, the Chi-square, root mean square error of approximation (RMSEA), comparative fit index (CFI), Tucker-Lewis index (TLI), and standardized root mean square residual (SRMR) were used to judge CFA model fits [77]. We attempted various confirmatory factor structure models based on the results of the EFA, refining the model fits of the CFAs using modification indices until all goodness-of-fit indexes resided in the good-fit range. Note that modification indices specify localized areas of model misfit, that is, items with a discrepancy between the data and the proposed model. We considered the model fit indexes to be in the good-fit range when RMSEA is close to 0.06 or below, CFI and TLI values are close to 0.95 or greater, and SRMR is close to 0.08 or below [77]. As we confirmed a factor structure and items for the ECTD, we calculated the reliability coefficient of internal consistency, Cronbach’s \alpha, as before to investigate how items are interrelated within each factor, sub-factor, and the overall instrument [79].
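For reference, two of these fit indexes have simple closed forms; these are standard textbook definitions (software packages differ slightly, for example in using N versus N - 1), not Mplus-specific derivations:

RMSEA = \sqrt{\frac{\max(\chi^{2}_{M} - df_{M},\ 0)}{df_{M}\,(N - 1)}}, \qquad CFI = 1 - \frac{\max(\chi^{2}_{M} - df_{M},\ 0)}{\max(\chi^{2}_{B} - df_{B},\ \chi^{2}_{M} - df_{M},\ 0)}

where the subscript M denotes the fitted model, B the baseline (independence) model, and N the sample size.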

Third, to address the third question on item characteristics, we applied classical test theory (CTT), a framework frequently used in measurement research. This simple measurement framework has been used in a variety of testing situations because of the simplicity of its theoretical model, weak theoretical assumptions, and the small sample size required for applying the framework in practice [80], [81], [82]. In the CTT-based framework, item difficulty is defined as the proportion of respondents who successfully answered a particular item; a higher item difficulty value therefore indicates an easier item. Item discrimination is referred to as the item-test correlation and is typically defined as the point-biserial correlation (r_{\mathrm {pb}}) between dichotomously scored items (i.e., correct or incorrect response coded as a dummy variable) and the raw total scores. Even though the correlation is dependent on item difficulty, high item correlation is desired because it indicates that high-ability respondents tend to get the item correct while low-ability respondents tend to get the item incorrect [83]. One of the largest drawbacks of applying the CTT framework is that these item statistics depend on the characteristics of the sample used for the analysis and the participants’ observed scores (e.g., the ECTD raw total scores), and are also determined by the selection of the items used for testing [84].
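A minimal sketch of these two CTT statistics, reusing a hypothetical 0/1 response matrix as in the earlier sketch (whether the item is removed from the total score before correlating is an implementation choice not specified here):

import numpy as np
from scipy.stats import pointbiserialr

rng = np.random.default_rng(1)
responses = rng.integers(0, 2, size=(469, 20))  # hypothetical 0/1 item scores
total_scores = responses.sum(axis=1)

# Item difficulty: proportion of respondents answering each item correctly (higher = easier).
difficulty = responses.mean(axis=0)

# Item discrimination: point-biserial correlation between each item and the total score.
discrimination = np.array([
    pointbiserialr(responses[:, j], total_scores)[0]
    for j in range(responses.shape[1])
])

print("Difficulty:", np.round(difficulty, 2))
print("Discrimination:", np.round(discrimination, 2))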

Fourth, for criterion validity evidence, we calculated a correlation matrix between the mean scores from the items loaded on the identified factors and variables of interest, such as sex (female vs. male), race/ethnicity, SEB in terms of first-generation status, and residency (i.e., domestic vs. international) [14]. Sex was categorized as female, male, and others, and race/ethnicity was categorized, due to small sample sizes of minority groups, as members of groups that have been historically excluded from engineering (in this case, Hispanic, American Indian or Alaska Native, African American/Black, Hawaiian and Pacific Islander, and Multiracial) and members of groups that have greater than proportional historical participation in engineering (in this case, Asian and White).

Finally, for predictive validity evidence, we ran a multiple regression model to explore the predictability of ECTD scores on student grades in introductory engineering courses using SPSS Statistics 27 [78]. Due to small sample sizes for Hispanic, American Indian or Alaska Native, and Black students, they were grouped together as underrepresented minority (URM) groups for the racial/ethnic category in the regression analysis. For the regression analysis, the assumptions for multiple regressions (e.g., linearity, independence of errors, and multicollinearity, in terms of tolerance and variance inflation factor) were checked before the analyses.
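A hedged sketch of such a regression model in Python with statsmodels rather than SPSS; the column names and the tiny illustrative data frame are hypothetical stand-ins for the study’s variables:

import pandas as pd
import statsmodels.api as sm

# Hypothetical student-level data: course grade (1-4), ECTD total score,
# and binary demographic indicators used as controls.
df = pd.DataFrame({
    "course_grade": [4, 3, 2, 4, 3, 1, 4, 2],
    "ectd_score":   [18, 14, 9, 20, 12, 7, 16, 11],
    "female":       [1, 0, 0, 1, 0, 1, 0, 0],
    "urm":          [0, 1, 0, 0, 1, 0, 0, 1],
    "first_gen":    [0, 1, 1, 0, 0, 1, 0, 1],
})

X = sm.add_constant(df[["ectd_score", "female", "urm", "first_gen"]])
model = sm.OLS(df["course_grade"], X).fit()
print(model.summary())  # inspect the ectd_score coefficient and its p-value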

SECTION VII.

Results

A. Exploratory Factor Analysis

Tetrachoric correlation coefficients among the 20 items, which are binary categorical variables, were predominantly positive and ranged from -0.067 to 0.799. Multicollinearity, indicated by a strong correlation over 0.85, was not observed between items, implying that most of the items do not measure the same aspect of engineering computational thinking ability. Five eigenvalues (9.41, 1.89, 1.28, 1.13, and 1.12) were over 1.0, but we extracted the number of factors underlying the data based on the point of inflection of the curve in the scree plot [75]. This yielded one factor considered for inclusion in a putative factor structure for the ECTD.

According to Stevens’ guideline about the relationship between sample size and cutoff factor loading, we considered items with a factor loading greater than 0.40 significant for the designated factor [85]. This cutoff usually functions to suppress any irrelevant items that do not fit well into the designated factor. All 20 items had statistically significant factor loadings onto the one factor, general computational thinking ability incorporating the five categories of (a) Abstraction, (b) Algorithmic Thinking and Programming, (c) Decomposition, (d) Data Representation, Organization, and Analysis, and (e) Impact of Computing, indicating each item’s unique contribution to the factor. Table 4 presents the factor loading and residual variance of the 20 items from the exploratory factor analysis model.

TABLE 4 Exploratory Factor Analysis Results of the ECTD (N = 234)

The internal consistency reliability coefficient of the ECTD with 20 items from 226 participants was Cronbach’s \alpha = 0.862. All ECTD items were worthy of inclusion because removal of any items would not increase the score reliability for the ECTD as a whole [79].

B. Confirmatory Factor Analysis

A confirmatory factor analysis (CFA) was conducted to confirm and refine the factor structure for the 20-item ECTD using the other half of the data (n = 235). We evaluated the CFA model through three steps: (a) checking the consistency of multiple goodness-of-fit indexes and judging the fit of the model to the data; (b) examining localized areas of poor fit; and (c) inspecting parameter estimates, such as factor loadings, factor variances, and residual variances, to ensure each item relates reliably to the latent factor [77]. All items had loadings that met the minimum criterion of 0.40, and the 20 items from the ECTD yielded a good fit as shown in Table 5. While a CFA imposes more constraints on the relationships between items and factors than a model identified through an EFA, we found no modification indices above the minimum value of 4.0, along with fit indexes in a good-fit range: \chi^{2}(170) = 251.9, p < 0.001, RMSEA = 0.045 (90% confidence interval: 0.033-0.056), CFI = 0.965, TLI = 0.961, and SRMR = 0.096.

TABLE 5 Parameter Estimates of the CFA Model (N = 235)

The internal consistency reliability coefficient of the ECTD with 20 items from 230 participants excluding missing responses was Cronbach’s \alpha = 0.866. All ECTD items were worthy of inclusion because removal of any items would not increase the score reliability for ECTD as a whole [79].

C. Item Analysis Based on Classical Test Theory

Table 6 shows the item difficulty and item discrimination indexes of ECTD items based on classical test theory (CTT). Figure 2 shows two different trends in item discrimination and item difficulty in order of item number.

TABLE 6 Item Characteristics Based on Classical Test Theory (N = 469). Items in Blue are the Maximum and Minimum Items in Difficulty
FIGURE 2. Item difficulty and item discrimination indexes based on classical test theory.

1) Item Difficulty

Among the 20 ECTD items, Item 1 was the easiest question, as 90% of the participants got the question correct, and Item 16 was the hardest, as only 36% of the participants got the question correct. The mean of the item difficulty was 0.69 (SD = 0.15), indicating that on average, items tended to be slightly easier than the recommended mean of 0.50. We intended to design four items for each category ranging from the easiest to the hardest, and Figure 2 shows that the items in the Abstraction and Algorithmic Thinking categories were ordered as we intended. However, the items for the other categories were not ordered by item difficulty level. In particular, Item 12 was not the hardest item in the Decomposition category, and all four items in the Impact of Computing category seemed to be relatively easy, showing no overt contrasts in item difficulty level compared to other categories.

2) Item Discrimination

Among the 20 ECTD items, Item 14 was the item with the highest discrimination power (r_{\mathrm {pb}} = 0.67), while Item 17 was the item with the lowest discrimination power (r_{\mathrm {pb}} = 0.39). The mean of the item discrimination was 0.53 (SD = 0.09). Field [79] viewed items with an r_{\mathrm {pb}} of 0.30 or higher as having enough discrimination power to estimate respondents’ ability level. Therefore, all ECTD items seemed to have sufficient discrimination power.

D. Criterion Validity Evidence

To assess criterion validity, we used a subset of the data (n = 152) for which we obtained students’ first-semester engineering foundation course grades in Spring 2022 at Institution A. Table 7 presents the correlation coefficients among students’ demographic backgrounds, the numbers of advanced mathematics courses and computer science courses taken pre-college, ECTD scores obtained at the beginning of the semester, and the course grades as an indicator of students’ engineering performance. Note that the point-biserial correlation, used between one of the binary demographic variables and a continuous variable (such as the number of advanced mathematics or computer science courses taken pre-college, or the ECTD score), is equivalent to an independent samples t-test comparing the two group means of the continuous variable defined by the binary demographic variable.

TABLE 7 Correlation Matrix Between Engineering Students’ Demographic Background and the ECTD Scores (N = 152)

For example, the non-significant correlation coefficients indicated that there were no sex-based differences in the numbers of advanced mathematics and computer science courses taken pre-college, in ECTD scores, or in the first-semester engineering foundation course grades. Even though its effect size is small, the negative correlation coefficient of -0.167 indicates that first-generation students scored lower on the ECTD than their counterparts and that the difference was statistically significant. ECTD scores were statistically significantly correlated with the engineering foundation course grades, with a correlation coefficient of 0.387, which is a moderate effect size [86].

As expected, the number of computer science courses taken pre-college was significantly correlated with students’ ECTD scores assessed at the beginning of their first year and with the first-semester engineering foundation course grades achieved at the end of their first year. The statistically significant correlation coefficient of 0.313 between the number of computer science courses and the ECTD scores indicates that the more computer science courses students took, the higher they scored on the ECTD. Similarly, the statistically significant correlation coefficient of 0.222 between the number of computer science courses and the course grades indicates that the more computer science courses students took before college, the higher the grades they achieved in the engineering foundation course at the end of their first semester.

1) Variations in Student ECTD Scores by Institutions

As a whole, the mean score (M) of the ECTD for the 469 students was 13.78 with SD = 4.64. There were 23 perfect scorers (4.9%). To explore subgroup differences by institution, Table 8 shows the results from statistical tests of mean differences in ECTD scores by institution, differentiated by student level. The table reflects ECTD data collected in one academic year. Student level differs because some introductory courses are taken by students after their first year. Due to the small sample sizes from Institution C for students not in their first year, we only compared mean scores between Institutions A and B. One-way analysis of variance (ANOVA) indicated that there were no statistically significant differences in the ECTD scores of first-year engineering students among the three institutions. However, an independent samples t-test indicated a significant difference in the ECTD scores of second year and above engineering students between Institutions A and B. Institution A students had higher ECTD scores than Institution B students with a moderate effect size of Cohen’s d = 0.432.

TABLE 8 Differences in ECTD Mean Scores for Students at Three Institutions in One Academic Year

When all student data were used regardless of student level, one-way ANOVA revealed that there was a significant difference in the ECTD scores. Gabriel’s post-hoc analysis, which is appropriate for unequal sample sizes, revealed a significant difference in the ECTD mean scores between Institutions A and B. Institution A students had a higher ECTD mean score than Institution B students.

E. Predictive Validity Evidence

Table 9 presents results from multiple regression modeling to explore the predictability of ECTD scores on 152 engineering students’ performance at Institution A. When engineering students’ demographic backgrounds were controlled, the ECTD score assessed at the beginning of the first year was a statistically significant predictor of students’ engineering foundation course grades at the end of the first year.

TABLE 9 Prediction of First-Year Engineering Students’ Engineering Foundation Course Performance

SECTION VIII.

Discussion and Conclusion

In this study, we provided validity and reliability evidence of the ECTD with 20 items. We discussed sources of the validity and reliability evidence of the ECTD based on the guidelines in Standards for Educational and Psychological Testing [14].

A. Validity Evidence on the ECTD Content

The validity evidence of the alignment between the content domain and the test content is of paramount importance in test development. In other words, tests are expected to appropriately sample “the domain set forward in curriculum standards” [14, p. 15]. Content validity evidence can also come from the judgment of experts in the field. Content validity also derives from the appropriateness of the intended use, or the inferences made from the test scores, for example, for a licensure test versus a placement test.

We consulted various sources of standards before proposing the items in the test. As our literature review portrays, our largest pre-college curriculum standards come from the College Board and ISTE organizations. These organizations reflect computing and computational thinking at multiple levels of education. We also relied on the expertise of investigators in the research team. Each member of this team is a seasoned computing instructor with decades of teaching experience. Members of our team serve on accreditation boards that ensure the quality of engineering and computing degree programs.

The last aspect of content validity, the one reflecting the appropriateness of the intended use, is manifested by our focus on engineering and computing courses. Participants in this study had matriculated into a College of Engineering.

B. Validity Evidence on the Internal Structure of the ECTD

The results from factor analyses for construct validity showed that all diagnostic questions had statistically significant factor loadings onto one general computational thinking factor that incorporates the five categories of (a) Abstraction, (b) Algorithmic Thinking and Programming, (c) Decomposition, (d) Data Representation, Organization, and Analysis, and (e) Impact of Computing. The 20 items clustered on one factor, establishing that this factor is the desired computational thinking construct. While we anticipated five factors based on the five categories from the engineering computational thinking framework, we recognized the difficulty of discriminating the five categories within the overall construct since these categories are closely intertwined.

The item analyses based on classical test theory showed that ECTD items on average tended to be slightly easy but seemed to have sufficient discrimination power. We ordered items from the easiest to the hardest for each category based on prior item analyses [10]. Among the five categories, the items for three categories were not well ordered by item difficulty (see Table 6). In particular, items for the Impact of Computing category seemed to be consistently easy, as more than 70% of students got the items correct. We developed these test items with the idea of measuring students’ ability to think critically about the impact of computing on society. These items were not planned to assess technical training. They were focused on ethical issues, and students identified correct answers without encountering the level of technical difficulty inherent in the other test items. This type of social skill has been developing in students throughout their academic life. Therefore, there is room for improvement by revising items to have greater difficulty variability. Additional items with different item difficulty levels will enable the ECTD to better assess students with a wider variety of ability levels.

C. Validity Evidence of the ECTD on Relations to Other Variables

As shown in the correlation matrix in Table 7, when looking at only the 152 first-year engineering students at Institution A, there was no significant difference between racial/ethnic groups. Similarly, there were no significant differences found by sex. This result corroborates previously published results from earlier use of the diagnostic [8]. We interpret the data as an indication that more privileged students have chosen to participate in this study.

The results shown in Table 7 illustrate that first-generation students scored lower on the ECTD than their counterparts. This result is consistent with the many studies demonstrating the challenges of first-generation students [87], [88], [89]. The moderate correlation between ECTD scores and course grades is consistent with another study under review that found a strong correlation between course grades and normalized learning gains. The other positive correlations between the number of computer science courses and (a) ECTD scores and (b) final course grades highlight the privilege of being exposed to computational thinking before college.

Table 8 reveals a similarity in computational thinking skills by student levels. This might be an indication that the instrument may capture an ability that does not change simply because of exposure to college classes, even in engineering. Interestingly, first-generation students tended to have lower ECTD scores than their counterparts. However, considering the small sample size along with a potential sampling bias, further research at other institutions with large sample size is necessary.

D. Predictive Nature of the ECTD

The positive correlation of the ECTD scores with final grades provides the desired predictive criterion validity of the ECTD. This predictive relationship is what qualifies the instrument as a diagnostic tool. The ECTD therefore can be used to alter the university environment in a manner that counteracts privilege (e.g., separating entry and advanced level sections). In other words, the ECTD can be used to guide interventions that help balance the effects of unequal access to pre-college computer science classes, which disadvantages groups such as first-generation students.

E. Use of the ECTD

The availability of the ECTD provides opportunities for engineering programs to evaluate how the integration of computational thinking is impacting their programs. First, the ECTD can serve to diagnose entry level skills. This analysis could be used to provide additional resources to students with minimal background. It can also be used to appropriately place students into sections or groups based on prior learning. Second, the ECTD can be used as an evaluation tool to assess the effects of computational interventions. For example, the ECTD can be used by an instructor or program as a pre-test/post-test to measure growth in computational thinking. Third, after diagnosing the status of engineering students’ computational thinking, programs can use this knowledge to apply additional interventions later in the curriculum where computational skills are used. Finally, the ECTD can be used to investigate the relationship between students’ computational thinking development and their engineering enculturation.

In addition, consider how the integration of computational thinking might impact program diversity and inclusion efforts. While the claim that computational skills are necessary for future engineers is not controversial, there are risks to including computational thinking early in engineering programs [8], [10], [11]. Computing is an area of great inequality, where students with greater privileges, who are disproportionately from dominant social groups, have substantially more opportunities to gain pre-university experience with computers and computational thinking [39], [41]. Prior experience with computing compounds other privileges retained by this group. It is therefore possible that integrating computational thinking into introductory engineering classes may exacerbate the substantial challenges that engineering programs have in attracting and retaining more diverse students.

Therefore, this instrument should not be used in situations where there is a risk of confounding ability with opportunity. For example, this instrument should not be used for college admission or admission to a major as it would advantage students who have benefitted from prior experience. Another example is improper use as a high-stakes test that brings critical consequences to test-takers based on scores.

The use of the ECTD in diversified populations may inform future research questions like:

  1. In what ways do students who are taught computational thinking in lower division classes learn compared to students who are taught in upper division classes? For example, do students who have already taken calculus learn computational thinking more effectively than those with lower levels of demonstrated mathematical achievement?

  2. In what ways can the ECTD provide insight into the learning of computational thinking in areas related to engineering (such as science, technology, and mathematics) and in different engineering majors? For example, in what ways do environmental engineers learn computational thinking compared to electrical engineers?

  3. In what ways can the results of the ECTD, when used as a pretest, inform instruction in computational thinking? Is it possible to use the ECTD to identify areas where groups of students may struggle in advance and make teaching decisions that lead to better student outcomes?

  4. In what ways could the ECTD provide information about how effective individual instructors are at developing computational thinking skills in students? In this way, the ECTD could even be used as one component in the evaluation of teaching. This analysis could lead to the identification of patterns of teaching that lead to success in computational thinking for students.

F. Significance of the Study

The ECTD will help institutions identify students with strong entry-level skills in computational thinking as well as students who require academic support. The ECTD will inform curriculum design by demonstrating which factors are more accessible to engineering students and which factors need more time and focus in the classroom. The long-term impact of the ECTD could be introductory engineering courses that better serve engineering students coming from diverse backgrounds. This can increase student self-efficacy, improve student retention, and ultimately improve student enculturation into the engineering profession.

Through this work, we have provided face/content, construct, criterion, and predictive validity evidence for a 20-question instrument that measures student skills in computational thinking. As addressed in its intended uses, the instrument can inform students, faculty, and programs in ways that can improve student enculturation into the engineering profession. Over five years and with the participation of 3,584 students, the questions have been refined in a manner that accurately identifies the factor of computational thinking.

G. Study Limitations and Future Research

The limitations of the study include a high chance of sampling bias, considering the low response rates across all three institutions, possibly due to student fatigue and low motivation from the pandemic. Second, due to the small sample size, we only applied item analyses based on classical test theory. Considering the five multiple-choice options on each item, the application of item response theory using 3-parameter logistic (3PL) modeling might more accurately identify the levels of item difficulty, item discrimination, and the probability of guessing responses. Third, further investigation of the cognitive processing required for each item would be beneficial to understand how specific categories of computational thinking contribute to solving the item and to the determination of item difficulty and response style (e.g., random guessing), given an individual’s ability level.

Fourth, in the analysis for the criterion validity evidence of the ECTD using 152 first-year engineering students at Institution A, there is a potential learning effect from the first-year engineering foundation courses in the first year. The learning effect might be varied across students’ different backgrounds, but we were not able to identify any differences by social identities except for their first-generation status. This could be a result of low power due to the small sample size (e.g., racial/ethnic groups as URM students were grouped together in the analysis). To address this issue, much larger scale experiments will need to be run, especially at large institutions and institutions with greater diversity and at institutions outside the United States. These experiments will enable the evaluation of the ways in which computational thinking skill development is influenced by a student’s sex, gender, race/ethnicity, socioeconomic status, disability status, and other social identities. In addition, intersectional analyses considering multiple social identities simultaneously in the multiple regression analyses might reveal a hidden structure of disadvantages for certain groups regarding prior exposures to computational thinking experiences. Finally, while this study provided face/content validity, construct validity, criterion, and predictive validity evidence, there is a need to investigate other sources of validity evidence, such as concurrent, convergent, and discriminant validity evidence to support the proposed use of the ECTD and the interpretation of the ECTD scores based on the relationship with other variables for future research [11].

In summary, we have provided psychometric evidence that the ECTD can be used to measure computational thinking skills among engineering students.

ACKNOWLEDGMENT

Any opinions, findings, and conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
