1 Introduction
Security concerns (also known as security issues) are risks whose occurrence would compromise the security of a software system in terms of confidentiality, integrity, and/or availability. Security concerns include software vulnerabilities, which are weaknesses in a software system (e.g., design flaws or implementation bugs) that attackers could exploit to harm the system's stakeholders, such as the owner or the users. The economic impact of security concerns can be dramatic. For example, the Heartbleed vulnerability, disclosed in the popular OpenSSL cryptography library, has cost at least 500 million USD to companies, according to eWEEK [4].
While the introduction of some security concerns into source code is unavoidable (e.g., because a certain vulnerability is still unknown), others could be resolved before the delivery of software systems, or developers could even avoid introducing them in the first place. Supporting tools are available to help developers detect (and possibly resolve) security concerns before the delivery of software systems. Among these supporting tools, we can distinguish Dynamic and Static Analysis Tools (DATs and SATs, respectively). DATs analyze software systems at run-time in order to detect insecure behaviors due to the presence of security concerns, while SATs identify security concerns by scanning the source/object code without executing it. As for SATs, there are three scenarios in which developers can leverage them to identify security concerns [25]: (i) while developers code, by highlighting the presence of security concerns directly in IDEs (Integrated Development Environments); (ii) within CI/CD (Continuous-Integration/Continuous-Delivery) pipelines, which could make a build fail if the code is not compliant with given security rules (e.g., it must not contain critical vulnerabilities); and (iii) during code reviews.
A recent report has shed light on an attack vector that is often overlooked: the insecurity of web apps [24]. In other words, web apps are often developed without paying attention to secure coding practices. Even worse, they are often not tested for security before being deployed (less than 50% are tested) [13]. Web apps are particularly subject to attacks that can exploit, for instance, flaws in the source code or badly configured web servers [21]. Among web apps, e-commerce ones ask for bank and personal information. As a consequence, if an attack on an e-commerce web app succeeds, the effect on the reputation and credibility of that web app could be dramatic, as could the economic impact. Considering the surge in the number of attacks, as well as the negative impact that successful attacks can have, it is high time to prioritize security while developing web apps. One possible way to do so is to adequately train the next generation of developers to secure the web apps they develop.
We studied (through a prospective empirical investigation) whether bachelor students of a Software Technologies for the Web (STW) course were equipped to develop secure e-commerce web apps. Our study consisted of the following three steps:
(1) Studying the behavior of bachelor students, enrolled in the STW course in the a.y. (academic year) 2021–22, towards software security when developing e-commerce web apps.
(2) Defining a training plan for bachelor students involved in the next STW course (a.y. 2022–23) to let them improve their behavior towards software security.
(3) Enacting that plan and measuring the differences, if any, in bachelor students' behavior towards software security between the a.y. 2021–22 and 2022–23.
Steps (1) and (2) were part of our ICSE SEET idea paper published in 2023 [11]. One of the most important outcomes presented in that paper was that security concerns are widespread in the source code of web apps developed by students enrolled in the a.y. 2021–22. This is why we planned and then carried out an intervention in the training of students enrolled in the a.y. 2022–23 to let them adequately deal with security concerns. One of the implemented actions consisted of asking these students (who were different from those enrolled in the a.y. 2021–22) to use a SAT, SonarCloud, to detect security concerns in their source code, but we did not force them to remove the security concerns the SAT detected. We planned to introduce SonarCloud in the development pipeline of students enrolled in the STW course because we believe that the use of a SAT like SonarCloud could improve students' engagement in producing secure software. That is, deploying a web app was meant to resemble a game where the goal was to remove security concerns until none of them remained in the source code of that app. The results from the enactment of the planned teaching intervention are presented in this paper for the first time.
Statistical inference on the gathered data revealed that the number of security concerns in the web apps developed in the a.y. 2022–23 was significantly lower than in those developed in the a.y. 2021–22. The effect size of this difference was large. We can conclude that we must train the next generation of developers to develop secure web apps and let them experience, in university courses, the use of tools to support the development of secure software since software security is nowadays of primary relevance, especially for web apps [24].
Paper Structure. In Section 2, we describe SonarCloud and then outline related work. We present the design of our study in Section 3, while we show the obtained results in Section 4. We discuss the main lessons learned, along with the study limitations, in Section 5. Final remarks conclude the paper.
2 Background and Related Work
In this section, we first describe the SAT used in our work (i.e., SonarCloud) and then present research related to ours.
2.1 SonarCloud
SonarCloud is a cloud-based SAT designed to detect issues in several languages, including those used in the STW course (i.e., Java, JavaScript, HTML, CSS, and XML) [17]. It allows developers to verify the compliance of their source code against a pre-defined set of rules. SonarCloud also provides detailed descriptions of the identified issues, as well as tips on how to resolve them.
When the analyzed source code violates a rule, SonarCloud generates an issue and classifies it according to three quality characteristics: reliability, maintainability, and security. In our paper, we are interested in the security quality characteristic only. SonarCloud uses two categories of security concerns, i.e., hotspots (security-sensitive code) and vulnerabilities (security concerns requiring immediate action), to rate the security quality characteristic. A severity level is also assigned to any violated rule (i.e., to any kind of issue), including those associated with security concerns. From the least to the most severe, the severity level of a security concern can be: minor, major, critical, or blocker. SonarCloud assigns one of these severity levels to a given security concern based on its impact (i.e., to what extent a kind of security concern causes harm to the stakeholders of the software system if exploited by an attacker) and likelihood (i.e., the probability that an attacker exploits that kind of security concern). It is worth mentioning that the security rules of SonarCloud are based on well-established security standards like OWASP Top 10 [12] and CWE [7].
Developers can leverage SATs in three scenarios: (i) while coding in IDEs, (ii) within CI/CD pipelines, and (iii) during code reviews. SonarCloud supports developers in the second scenario. To enable SonarCloud in CI/CD pipelines, a component called scanner must be installed and configured. SonarCloud integrates with all leading CI/CD systems (e.g., GitHub Actions). Once SonarCloud is configured in a CI/CD pipeline, the scanner analyzes the application's source code upon the occurrence of specified events (e.g., push or merge request) and the CI/CD system automatically sends the results to SonarCloud, where they are processed and made available in a dashboard. These results can also be used to control subsequent build actions such as automatic deployment.
SonarCloud is developed by SonarSource [16]. The same organization also develops SonarQube and SonarLint. SonarCloud and SonarQube share the same engine to detect issues; while the former is a software-as-a-service solution, the latter needs to be self-hosted and self-managed. In contrast, SonarLint is easily pluggable into IDEs but does not detect all kinds of security concerns within the IDE on its own. We decided to use SonarCloud in our training intervention because it is the easiest choice for student teams that need to detect and resolve all kinds of security concerns in their source code. Moreover, due to its popularity [3], [9], SonarCloud is likely to be used by students in Computer Science (CS) in the future (i.e., once they graduate).
2.2 Related Literature
There is a gap between the need for personnel skilled in software security and the availability of such personnel [8]. To respond to this shortage, an advocated solution is to incorporate security topics into university programming courses not specifically devoted to software security [26]. This is probably because many security concerns can be easily detected and do not require security experts to be fixed [6]. Unfortunately, CS students still graduate with scarce or no secure programming knowledge [1]. The above-mentioned solution should be applied to web programming courses [21]. Indeed, due to the shortage of personnel skilled in software security, it becomes necessary to provide adequate security training to the next generation of web developers: web apps, especially e-commerce ones, are critical because they are particularly exposed to attacks that can exploit flaws in the source code or badly configured web servers [21].
Some work [21], [22] experimented with the integration of security practices into web development pipelines. In particular, Zhu et al. [29] experimented with the use of an IDE, named ASIDE, in a web programming course. This tool provides instant security warnings together with descriptions of security issues. The results of a preliminary study with 20 students revealed that this kind of support could potentially help secure programming in the context of programming assignments. A similar approach has been proposed by Tabassum et al. [22], who adopted an Eclipse plug-in for Java, named ESIDE, to provide instant security warnings while coding. Two studies involving eight and 28 bachelor and master students, respectively, were conducted to compare the ESIDE approach with person-to-person feedback on security concerns (referred to as a security clinic). The participants performed programming tasks, lasting between 15 and 20 minutes, and then filled in a survey. The results revealed that both approaches needed incentives to motivate students to adopt secure programming techniques. Unlike these studies [21], [22], to fill the shortage of software security skills in web-app development, we planned a training intervention based on a popular SAT, SonarCloud, which students are likely to use once they graduate [3], [9].
Taeb and Chi [23] analyzed the gap in software security education and then proposed a framework based on three categories of hands-on labs. The first category introduced students to security concerns, the second category emphasized the importance of log files in security-concern detection, and the third category focused on secure coding practices. The proposed framework was then evaluated through a user study, based on questionnaires, involving 39 and 7 bachelor and master students, respectively. The results suggested an improvement in students' knowledge about software security. Although we share with Taeb and Chi [23] the goal of filling the gap in software security education, the intervention we devised is different.
Finally, Yilmaz and Ulusoy [28] studied security concerns in students' source code on two programming tasks over six semesters of a course not devoted to software security, the Database Management Systems course. The study consisted of: (i) a survey to collect the security perception the students had of their own source code; (ii) an analysis of the security concerns the students introduced into their source code, for which the authors exploited SonarQube; and (iii) the use of lexical analysis to identify patterns in students' source code in order to understand the root causes behind the introduction of security concerns. The authors did not teach any security content to their students. From a didactic point of view, the results revealed that students do not consider security aspects for time reasons. The authors thus suggested providing a bonus when security issues are well resolved and accounting for the time needed to conduct this activity when planning the workload of the course. Unlike Yilmaz and Ulusoy [28], we carried out a training intervention based on a SAT and then evaluated its impact, if any, on the software security of e-commerce web apps developed by students.
3 Study Design
In this section, we describe the study design following the guidelines for experimentation in Software Engineering (SE) [5], [27]. The design is the same as we reported in our previous paper [11] with the following small change: we asked the students enrolled in a.y. 2022–23 to use SonarCloud, rather than SonarLint, because the latter does not detect all kinds of security concerns on its own (in particular, hotspots are not detected). SonarLint requires configuring a connection with SonarCloud to show the security concerns, including hotspots, that SonarCloud has detected previously. This is to say that SonarLint is more difficult to configure to show all kinds of security concerns in the IDE and this increased difficulty would have reduced its adoption among the students.
3.1 Goal
We formalized the goal of our study, according to the GQM (Goal-Question-Metric) template [2], as follows:
Analyze the use of a SAT for the purpose of evaluating its effect with respect to software security from the point of view of educators, researchers, and practitioners in the context of bachelor students in CS who develop (e-commerce) web apps.
On the basis of the above-mentioned goal, we formulated and then studied two Research Questions (RQs): RQ1 and RQ2.
RQ1. Are students equipped to manage the challenges associated with software security when developing web apps?
With this RQ, we aimed to study whether and to what extent bachelor students in CS take care of software security when developing (e-commerce) web apps. To that end, we detected security concerns in the web apps students developed. This is because, if these apps contain many security concerns, then students can be considered unprepared to manage software security. In [11], we negatively answered this RQ (i.e., students are not equipped to manage the challenges associated with software security) since we observed that security concerns are widespread in the source code of web apps developed by students enrolled in the STW course for the a.y. 2021–22. The answer to RQ1 represents the motivation behind our training plan based on a SAT, which we aim to assess with RQ2 (whose definition follows). The results from the enactment of our planned training intervention are presented in this paper for the first time.
RQ2. To what extent does using a SAT affect software security when developing web apps?
This RQ aims to study whether and to what extent using a SAT like SonarCloud (possibly) improves the security of the web apps students developed in the STW course. If the number of security concerns is lower when using a SAT (as compared to not using it), we can speculate that its use improves the security of student-developed web apps. The use of a SAT thus represents the treatment to which students, enrolled in the STW course in the a.y. 2022–23, were subjected. We postulate that using a SAT like SonarCloud improves students' engagement in producing secure software because the use of such a tool could be perceived as a game where the goal is to remove security concerns from source code until all of them are no longer detected by the SAT.
3.2 Context
The context of our study is represented by bachelor students in CS taking the STW course in the a.y. 2021–22 and 2022–23. STW was scheduled in the second semester of the second year of the CS program. The students in the a.y. 2021–22 and 2022–23 had taken courses on structured (C) and object-oriented programming (Java), databases, computer architecture and operating systems; and they were attending courses on computer networks and algorithm design while taking the STW course. They had not yet taken a course on SE, and thus they could be unfamiliar with concepts like software testing, quality, and security. The students taking the STW course in the a.y. 2021–22 were different from those of the a.y. 2022–23.
The STW course alternated theoretical and practical lessons and covered the following topics: Git, HTTP/S; HTML and CSS; Java-based technologies for web development like Servlets, JavaBeans, and JSPs; model-view-controller pattern for web apps; authentication and access control; JavaScript and DHTML; XML and JSON; and AJAX. The reference IDE of the course was Eclipse. The exam of the STW course included a written test followed by the delivery and discussion of a project. To carry out the project, the students worked in teams, composed of one to four members (i.e., one-person teams were allowed). Furthermore, no time constraint was imposed, i.e., the students freely chose when to deliver and discuss their project among the available exam dates. The project consisted of the development of an e-commerce web app. Each team was free to choose the e-commerce web app to be developed; however, each web app had to implement at least the following functional requirements:
Allowing customers and administrators to log in and log out.
Allowing customers to sign up, navigate the catalogs of items (i.e., products or services), add items to the cart, buy items, and check their own orders.
Allowing administrators to manage the catalogs of items (e.g., modifying items) and customers' orders.
Further functional requirements were welcome, especially for larger teams. Also, each web app had to use the MVC pattern, interact with a database, and use the technologies presented in the course (e.g., Servlets, DHTML). Before implementing the web app, each team was asked to design that app and write a document including: (i) the web app description; (ii) market study of its competitors; (iii) functional requirements; (iv) database schema; (v) navigation schema; and (vi) page template, theme, and color palette.
We answered RQ1 by considering 45 projects delivered by 120 students who passed the STW exam in the a.y. 2021–22. As for RQ2, besides the projects delivered in the a.y. 2021–22, we took into account those delivered by the students who passed the exam in the a.y. 2022–23. As for the a.y. 2022–23, SonarCloud was used in 65 projects, for a total of 151 students involved. It is worth mentioning that the use of SonarCloud was not mandatory and therefore some students decided not to use it in their projects. This resulted in 11 projects (out of 76) where SonarCloud was not used; we discuss the reason behind the decision not to use SonarCloud in Section 4.2.
3.3 Intervention and Measurements
In the training intervention, the students (of the a.y. 2022–23) took part in five hours of lessons. This training consisted of:
Basic concepts of software security followed by an overview of the OWASP Top 10 [12];
Practical exercises on configuring and then using SonarCloud (within a GitHub Actions workflow) to detect and resolve security concerns (i.e., vulnerabilities and hotspots); we focused on the security concerns that were most common in the projects of the a.y. 2021–22.
The STW students enrolled in the a.y. 2022–23 were asked (but not forced) to use SonarCloud when developing the mandatory course project (i.e., an e-commerce web app). Regardless of whether they used a SAT in their project, the students could obtain the maximum final mark in STW. Since the students of the a.y. 2021–22 did not use any SAT while carrying out their project, we consider the projects of the a.y. 2021–22 as the baseline for comparison with those of the a.y. 2022–23. To estimate the software security of web apps developed in the a.y. 2021–22 and 2022–23, we counted the number of security concerns that SonarQube (ver. 9.4.0.54424) detected.
To study RQ2, we considered the following variables:
Independent Variable. We manipulated only one factor, namely SAT, which indicates whether, or not, the students received the training intervention and thus used SonarCloud to detect and resolve security concerns while developing their web apps. The SAT variable assumes two values: No and Yes. The No group comprised the STW students enrolled in the a.y. 2021–22, while the Yes group included those enrolled in the a.y. 2022–23.
Dependent Variable. To measure the software security of the developed web apps, we considered the following dependent variable: number of security concerns. For each project, this variable indicates the number of security concerns SonarQube detected in the last commit of the software repository of a given project. The higher the value of this variable, the lower the security of the web app.
3.4 Design and Experimental Procedure
We designed a prospective empirical study in which we gathered the number of security concerns in the web apps the students developed at the times T0 (before introducing our training intervention, a.y. 2021–22) and T1 (when the training intervention was introduced, a.y. 2022-23):
T0. The data gathered in the a.y. 2021–22 allowed us to conclude that students enrolled in the STW course do not address security concerns in their web apps, thus suggesting that they are not equipped to manage the challenges associated with software security when developing web apps. This is what we presented in [11]. We would like to point out that the shortage of security skills is not a phenomenon restricted to the STW course in our university, but a phenomenon highlighted in other universities as well [1], [6].
T1. The shortage identified at T0 can be filled by introducing an adequate training intervention (see Section 3.3). Lecturers from universities different from ours could be encouraged to leverage the proposed training intervention if empirical evidence shows its positive effect on software security. We applied our training intervention at time T1. To do that and as mentioned before, we encouraged the students (a.y. 2022–23) to use SonarCloud without forcing them. To use this SAT, the students were prompted to configure SonarCloud in the GitHub Actions workflow. Our training intervention has a pedagogical goal, which can be summarized as follows: let the students familiarize themselves with both the presented technology and the challenges associated with software security. We emphasized to the students that the use of SonarCloud was not meant to be about eliminating all detected security concerns but rather about approaching software security more consciously.
Regardless of the a.y., we informed the students that the gathered data would be treated confidentially and shared anonymously for research purposes only.
4 Results
In this section, we present the results regarding RQ2 (those concerning RQ1 have been previously published in our idea paper [11]) and then the results of a post-questionnaire that the students (a.y. 2022–23) filled in at the end of the course.
4.1 RQ2
In Table 1, we report some descriptive statistics, i.e., mean, Standard Deviation (SD), minimum (min), median, and maximum (max), of the web apps the students developed in the a.y. 2021–22 (45 projects by 120 students without using any SAT) and 2022–23 (65 projects by 151 students using SonarCloud as a SAT). The values of these descriptive statistics give an idea of the size of the developed web apps in terms of KLOC and number of (Java, JSP, JavaScript, and CSS) files.
By looking at the descriptive statistic values of the KLOC metric, we can notice that the projects developed in the a.y. 2021–22 have, in general, a higher number of KLOC; indeed, the mean and median values of KLOC are 13.48 and 9.01, respectively, whereas they are 7.48 and 6.58 for the projects developed in the a.y. 2022–23. We postulate that this is a (welcome) side effect due to the use of a SAT since the minimum functional requirements to be implemented were the same for the a.y. 2021–22 and 2022–23. In particular, we believe that, since SonarCloud also informs developers about opportunities to clean source code (i.e., by providing information on the presence of code smells), the students performed refactorings on their code. To some extent, this provides more credit to our postulation that the use of a SAT improves the engagement of students to produce better source code, specifically more secure and cleaner code. On this latter point, there is preliminary empirical evidence suggesting that using a SAT helps students produce cleaner code [15].
The projects of the a.y. 2021–22 and 2022–23 contain a total of 1,882 and 1,336 security concerns, respectively. The projects of the a.y. 2022–23 are devoid of vulnerabilities, while those of the a.y. 2021–22 have two. In Table 2, we report some descriptive statistics on the number of vulnerabilities and hotspots detected by SonarQube in the web apps developed in the a.y. 2021–22 and 2022–23. Regarding hotspots, their number is greater in the projects of the a.y. 2021–22: indeed, the mean and median numbers of hotspots are 41.78 and 38, respectively, for the a.y. 2021–22 whereas they are 20.55 and 11 for the a.y. 2022–23. The SD is similar (about 25) for both a.y. 2021–22 and 2022–23. The minimum number of hotspots is six in the a.y. 2021–22, while it is zero in the a.y. 2022–23: this implies that there are projects (i.e., seven) devoid of hotspots (and thus security concerns) among those that used SonarCloud. We can also observe a higher maximum number of hotspots for the a.y. 2022–23 (128) as compared to the a.y. 2021–22 (103).
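As a quick sanity check, the reported totals can be related to the per-project means of hotspots quoted above. The sketch below is only illustrative: it assumes that the totals (1,882 and 1,336) count hotspots and vulnerabilities together and that every concern that is not a vulnerability is a hotspot; the variable and function names are ours.

```python
# Totals, vulnerability counts, and project counts as reported in the text.
total_concerns = {"2021-22": 1882, "2022-23": 1336}
vulnerabilities = {"2021-22": 2, "2022-23": 0}
projects = {"2021-22": 45, "2022-23": 65}


def mean_hotspots(year):
    """Mean hotspots per project: (total concerns - vulnerabilities) / projects."""
    hotspots = total_concerns[year] - vulnerabilities[year]
    return hotspots / projects[year]


for year in projects:
    # Rounded to two decimals for comparison with the means quoted in the text.
    print(year, round(mean_hotspots(year), 2))
```

Under these assumptions, the rounded values agree with the means of 41.78 and 20.55 given in the text.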
In Table 3, we show some descriptive statistics on the number of security concerns distinguished by category (vulnerability and hotspot) and severity (from blocker to minor). For every category and severity, the mean number of security concerns, as well as the median number, is lower for the a.y. 2022–23. In addition, by considering the single means, the most widespread security concerns in the a.y. 2021–22 are the same as those in the a.y. 2022–23; from the most to the least widespread, we find minor, major, critical, and blocker hotspots, followed by critical vulnerabilities.
The boxplots in Figure 1 show the distributions of the security concerns (vulnerabilities and hotspots together). Overall, they indicate that the projects using SonarCloud have fewer security concerns (as also highlighted by the descriptive statistic values in Table 2 and Table 3). To test whether the observed reduction in the number of security concerns between the a.y. 2021–22 and 2022–23 is statistically significant, we tested the following null hypothesis: there is no statistically significant difference between the number of security concerns in the web apps the students of the STW course developed in the a.y. 2021–22 and 2022–23. Since the number of security concerns in both a.y. 2021–22 and 2022–23 is not normally distributed, we ran a non-parametric test, namely the two-sided Mann-Whitney U test (with a confidence level of 0.95, i.e., α = 0.05), and obtained a p-value equal to 6.832e-07. Since this p-value is below α, we can reject the defined null hypothesis and accept the alternative one that indicates the presence of a significant difference between the two groups being compared. As previously mentioned, this difference is in favor of the a.y. 2022–23. To estimate the magnitude of this difference, we relied on an effect size measure, namely Cliff's δ, and found that it is large (|δ| = 0.56).
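To make the effect size measure concrete, the following minimal sketch computes Cliff's δ for two groups of per-project counts. The data are hypothetical and are not the study's data; the function names are ours, and the magnitude labels rely on thresholds commonly used in the literature (0.147, 0.33, and 0.474), which we assume here rather than take from the study. The p-value of a Mann-Whitney U test is not computed; a library routine such as `scipy.stats.mannwhitneyu` could be used for that purpose.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: (#pairs with x > y - #pairs with x < y) / (len(xs) * len(ys))."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


def magnitude(delta):
    """Label |delta| using commonly used thresholds (an assumption of this sketch)."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"


# Hypothetical numbers of security concerns per project, NOT the study's data.
no_sat = [38, 45, 12, 60, 41, 29]
with_sat = [11, 0, 25, 40, 7, 15]

delta = cliffs_delta(no_sat, with_sat)
print(delta, magnitude(delta))
```

A positive δ here means that projects in the first group tend to have more security concerns than those in the second; with these thresholds, the |δ| = 0.56 reported in the text falls in the "large" band.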
To have a better understanding of the differences in terms of security between the web apps developed in the a.y. 2021–22 and a.y. 2022–23, we analyzed the violated security rules (i.e., kinds of security concerns) along with the number of security concerns introduced into the source code. In Table 4, we report the complete list of violated security rules along with the number of security concerns (note that we have more projects for the a.y. 2022–23 than for the a.y. 2021–22: 65 vs. 45). The violated rules are grouped by category and severity. For a detailed description of these rules, see SonarSource's documentation [20].
As shown in Table 4, a total of 14 security rules were violated, two related to vulnerabilities and 12 to hotspots. The most detected kind of security concern is “Delivering code in production with debug features activated” for both a.y. 2021–22 (973 violations) and 2022–23 (641 violations). It accounts for 52% and 48% of all violations in the web apps developed in the a.y. 2021–22 and 2022–23, respectively. This hotspot has a minor severity level and indicates attacks related to sensitive data exposure due to security misconfiguration. Indeed, attackers could exploit this security concern to acquire information on the running system, app, and users. For example, Throwable.printStackTrace() prints a Throwable and its stack trace to System.err; for web apps, this can lead to displaying a client-side web page with all stack trace information. Next, we find “formatting SQL queries” and “disabling resource integrity features”, which concern, respectively, the formatting of SQL strings that can lead to SQL injection and the absence of integrity checks on the external resources used (e.g., content delivery networks). The next ones are: “using slow regular expressions”, “hard-coded credentials”, and “authorizing an opened window to access back to the originating window”. These security concerns warn about, respectively, the use of regular expressions with nonlinear complexity that can be exploited to cause a denial-of-service attack, the extraction of sensitive information from the source or binary code, and the opening of untrusted external URLs that could allow phishing attacks.
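The two percentages quoted above follow from the violation counts and the yearly totals of security concerns reported earlier. A minimal sketch of the arithmetic, with a helper name of our own choosing:

```python
def share(violations, total):
    """Percentage of all security concerns due to one rule, rounded to an integer."""
    return round(100 * violations / total)


# Counts for "Delivering code in production with debug features activated"
# and yearly totals, as reported in the text.
share_2021_22 = share(973, 1882)
share_2022_23 = share(641, 1336)
print(share_2021_22, share_2022_23)
```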
To better understand the security concern resolution cycle in the projects of the a.y. 2022–23, we leveraged SonarCloud. In particular, we queried SonarCloud through its API to gather the security concerns, along with their resolution status, that arose across the commit histories of the projects of the a.y. 2022–23. For each violated security rule, Table 5 reports the number of security concerns based on their resolution status. Below, for the different resolution statuses reported in that table, we briefly describe their meaning according to SonarSource's documentation [18], [19]:
Unresolved
This status is automatically set by SonarCloud when a security concern is detected for the first time.
Fixed
This status is automatically set by SonarCloud when a subsequent analysis indicates that an existing security concern has been fixed or its file is no longer available (e.g., removed file).
Won't Fix
It is manually assigned to a vulnerability that, despite being recognized as a valid security concern, is consciously left in the source code.
False-Positive
It is manually assigned to a vulnerability that is not considered an actual security concern.
Safe
It is manually assigned to a hotspot that is recognized as posing no security threat.
We could not perform the aforementioned analysis with SonarQube because some resolution statuses (e.g., won't fix) had to be manually assigned by the students in SonarCloud while developing their web apps. Also, we would like to point out that there could be a few differences between the security concerns detected by SonarCloud and SonarQube. This is because, although they share the same engine to detect security concerns, the former is automatically updated, being a cloud-based solution, while the latter is not. This is to say that the analysis with SonarCloud (whose results are shown in Table 5) could show violated rules different from those returned by the analysis with SonarQube (see Table 4).
As shown in Table 5, the total number of security concerns detected by SonarCloud across the commit histories of the web apps developed in the a.y. 2022–23 is 1,725. Among these security concerns, 337 were fixed, 501 (20 + 5 + 476) were resolved without an actual fix in the source code, and 887 were left unresolved in the last commit. We can also notice that SonarCloud detected new violated rules (as mentioned before, SonarCloud is updated automatically, as opposed to SonarQube). In particular, there are new rules, among vulnerabilities, that focus on identification and authentication failures (i.e., “a secure password should be used when connecting to a database” and “credentials should not be hard-coded”), injections (i.e., “database queries should not be vulnerable to injection attacks” and “endpoints should not be vulnerable to reflected cross-site scripting (XSS) attacks”), and sensitive data exposure (i.e., “exceptions should not be thrown from servlet methods”). Also, there is a new blocker hotspot concerning specifically “hard-coded passwords”, while “disabling resource integrity features” was renamed to “using remote artifacts without integrity checks”.
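The breakdown reported for Table 5 can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text; the grouping labels and the function name are illustrative choices, and the percentage shares are derived here rather than reported in the source.

```python
# Counts stated in the text for the a.y. 2022-23 commit histories.
FIXED = 337
RESOLVED_WITHOUT_FIX = [20, 5, 476]  # the three manually assigned statuses, as summed in the text
UNRESOLVED = 887


def summarize(fixed, resolved_without_fix, unresolved):
    """Return the grand total, the per-group counts, and each group's share in percent."""
    groups = {
        "fixed": fixed,
        "resolved without fix": sum(resolved_without_fix),
        "unresolved": unresolved,
    }
    total = sum(groups.values())
    shares = {name: round(100 * count / total, 1) for name, count in groups.items()}
    return total, groups, shares


if __name__ == "__main__":
    total, groups, shares = summarize(FIXED, RESOLVED_WITHOUT_FIX, UNRESOLVED)
    print(total, groups, shares)
```

The three groups add up to the reported total of 1,725, which corresponds to roughly 19.5% fixed, 29.0% resolved without a fix, and 51.4% left unresolved.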
To understand the motivations behind the resolution of security concerns without an actual fix in the source code (i.e., won't fix, false-positive, and safe), we looked at the comments that (possibly) the students left in SonarCloud and/or at their source code. The security concerns related to the management of credentials (i.e., “a secure password should be used when connecting to a database”, “credentials should not be hard-coded”, “hard-coded credentials”, and “hard-coded passwords”) are due to the use of Database Management Systems (DBMSs) to store data. Due to the difficulty of managing secrets and the didactic nature of the project, most students left the credentials to establish a connection with the DBMS within the application source code; however, by looking at the source code of some web apps, we noticed that some students avoided these concerns by not pushing configuration files or by configuring environment variables at the web app startup. As for “using pseudorandom number generators (PRNGs)”, the generation of pseudorandom numbers occurred mainly in secure contexts. Students used regular expressions to verify that the input provided by users complied with a format suitable for the data being processed; in most cases, the use of regular expressions coincided with “using slow regular expressions”, and students found it impossible to further simplify the regular expression used. Many security concerns related to “formatting SQL queries” and “using remote artifacts without integrity checks” were deemed safe because, respectively, they involved the concatenation of strings in prepared statements for complex queries and references to trustworthy websites (e.g., Google).
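As an illustration of what “using slow regular expressions” refers to, the sketch below pairs a pattern with nested quantifiers with an equivalent pattern that avoids them. Both patterns are hypothetical and are not taken from the students' projects, whose actual expressions are not shown here; not every slow pattern can be rewritten this easily.

```python
import re

# Nested quantifiers: a run of word characters can be split among the
# repetitions of the group in many ways, so a failing match on a long
# run can trigger exponential backtracking.
SLOW = re.compile(r"(\w+\s?)+")

# Same accepted strings, but every word run is delimited by exactly one
# whitespace character, so there is only one way to split the input.
FAST = re.compile(r"\w+(\s\w+)*\s?")


def matches_slow(text):
    """Validate text with the backtracking-prone pattern."""
    return SLOW.fullmatch(text) is not None


def matches_fast(text):
    """Validate text with the rewritten pattern."""
    return FAST.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["ab cd", "ab cd ", "ab  cd", " ab", "ab!"]:
        print(repr(sample), matches_slow(sample), matches_fast(sample))
```

Both functions accept and reject the same short inputs; the difference only becomes visible on long non-matching inputs (for example, many word characters followed by a punctuation mark), where the first pattern can take a very long time to fail.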
Based on these results, we can answer RQ2 as follows:
Although we do not have evidence that the use of a SAT like SonarCloud changes the students' attitudes towards developing secure e-commerce web apps, we can state that the use of that tool has a significant positive effect on security and that the practical effect is large. This might not be surprising, but it provides preliminary evidence to justify further study on the engagement of bachelor students in CS to produce secure web apps while using a SAT in their development pipeline.
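For readers unfamiliar with how a "large" practical effect can be quantified without assuming normality, the following sketch computes Cliff's delta, one common non-parametric effect-size measure. This section does not name the measure used in the study, so the choice of Cliff's delta, the magnitude thresholds (commonly cited cut-offs of 0.147, 0.33, and 0.474), and the sample values are all assumptions made for illustration; the numbers are hypothetical and are not the study's data.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs (x, y)."""
    greater = sum(1 for x in xs for y in ys if x > y)
    smaller = sum(1 for x in xs for y in ys if x < y)
    return (greater - smaller) / (len(xs) * len(ys))


def magnitude(delta):
    """Map |delta| to a qualitative label using commonly cited thresholds."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"


# Hypothetical per-project violation counts (NOT the study's data).
WITHOUT_SAT = [40, 35, 52, 47, 38]
WITH_SAT = [12, 20, 36, 9, 15]

if __name__ == "__main__":
    delta = cliffs_delta(WITH_SAT, WITHOUT_SAT)
    print(delta, magnitude(delta))
```

A negative delta indicates that values in the first group tend to be lower than those in the second; with these toy values the measure falls in the "large" band.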
4.2 Post-Questionnaire
We asked the students of the STW course enrolled in the a.y. 2022–23 to answer a post-questionnaire. The goal of this questionnaire was to know students' perspectives on the use of SonarCloud in the development pipeline of web apps. To that end, we invited the students who had passed the exam of the STW course in the a.y. 2022–23 to answer that post-questionnaire. The invitation was sent to both students who used SonarCloud and those who did not use it (see Section 3.2). We invited 169 students and received responses from 150 of them, resulting in a response rate of 89%.
The questionnaire had three parts. The questions in the first part aimed to understand how much the students used SonarCloud's output to resolve vulnerabilities and hotspots, while we asked students' opinions on using SonarCloud in the second part. The questions in the first and second parts expected closed answers according to a four-point Likert scale. The third part of the questionnaire included an open question where the students could provide feedback on their learning experience.
As shown in Figure 2, 15% (23) and 14% (21) of the students never resolved the vulnerabilities and hotspots, respectively, that SonarCloud detected. On the other hand, 44% (66) of the students stated that they always resolved vulnerabilities, 21% (31) often, and 20% (30) sometimes. As for hotspots, 37% (55) of the students stated that they always resolved them, 23% (34) often, and 27% (40) sometimes.
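The percentages reported for Figure 2, as well as the questionnaire response rate, follow directly from the raw counts. The sketch below recomputes them from the numbers given in the text; the function and variable names are illustrative.

```python
def to_percentages(counts):
    """Convert answer counts into whole-number percentages of all respondents."""
    total = sum(counts.values())
    return {answer: round(100 * n / total) for answer, n in counts.items()}


# Answer counts stated in the text (150 respondents each).
VULNERABILITIES = {"always": 66, "often": 31, "sometimes": 30, "never": 23}
HOTSPOTS = {"always": 55, "often": 34, "sometimes": 40, "never": 21}

# 150 responses out of 169 invited students.
RESPONSE_RATE = round(100 * 150 / 169)

if __name__ == "__main__":
    print(to_percentages(VULNERABILITIES))
    print(to_percentages(HOTSPOTS))
    print(RESPONSE_RATE)
```

Both sets of counts sum to the 150 respondents, and the rounded shares match the percentages quoted in the text.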
Students' reasons not to use SonarCloud follow:
Three teams (out of 11) stated that they had problems configuring SonarCloud, although we provided full support to configure the development infrastructure during the STW course by making a tutor available twice a week;
Four teams reported that they did not have enough time to use SonarCloud (even if they configured the tool);
Four teams stated that they preferred not to focus their study on SonarCloud; in this respect, we would like to recall that the use of SonarCloud was not mandatory.
In Figure 3, we summarize the answers to the questions of the second part of the questionnaire. For example, 85% (57 + 71) and 82% (54 + 69) of respondents agreed or strongly agreed that SonarCloud helped them resolve vulnerabilities and hotspots, respectively. The percentages of respondents who found it easy to use SonarCloud are slightly lower, namely 81% (52 + 69) for vulnerabilities and 79% (55 + 63) for hotspots. Lastly, according to 83% of the respondents, SonarCloud improved their awareness with respect to software security (66 + 59) and should be used in the STW course (73 + 52).
The third part of the questionnaire concerned the following open question: “Describe your experience (both positive and negative) related to using SonarCloud”. We performed open coding to identify themes, which we report in italics later on, in the answers given to that question. Most students (105) perceived SonarCloud as useful. Among them, 23 students explicitly mentioned the usefulness of SonarCloud in resolving security concerns. On the other hand, ten students considered SonarCloud useless, and six of them explicitly mentioned the uselessness of SonarCloud in resolving security concerns. Six students pointed out how using SonarCloud can show too many issues at a time (information overload), while two students recognized difficulties in understanding how to resolve some security concerns (difficulties in security-concern resolution). In addition, one student reported being troubled about false-positive security concerns, while another one complained about the long duration of analyses. Some students found SonarCloud difficult to use and difficult to configure (13 and nine, respectively), while others had opposite views: 11 students deemed SonarCloud easy to use, while one student deemed it easy to configure. As for SonarCloud documentation, ten students felt it was more than enough to address the discovered security concerns (adequate documentation), whereas only one deemed it inadequate (inadequate documentation). One student stated that SonarCloud was a stimulus for autonomous learning on topics related to the STW course. Some students also left suggestions. One stated he/she would have preferred real-time detection of security concerns (instead of pushing the changes to GitHub and waiting for the analysis to finish). Others focused on aspects related to the STW course. In particular, one student felt that the effort to be devoted to SonarCloud within the STW course should be decreased. In contrast, another would have preferred more training on SonarCloud.
Overall, the feedback we received from the students of the a.y. 2022–23 (through the post-questionnaire) indicates that our training intervention addresses to a large extent the request of the students of the a.y. 2021–22 to improve their knowledge of software security, i.e., students were not accustomed to developing secure web apps and manifested their will to fill this gap.
Discussion
In this section, we discuss the main lessons learned and the threats that might affect the validity of our study.
5.1 Overall Discussion and Implications
Below, we highlight the main lessons learned from our training intervention by using frames and then we discuss them.
Lesson Learned #1: The use of a SAT like SonarCloud improves students' engagement in producing secure software, which in turn helps students acquire security skills.
In [11], we found that security concerns are widespread in the source code of (e-commerce) web apps developed by students enrolled in the STW course for the a.y. 2021–22, and concluded that these students are not equipped to manage the challenges associated with software security. We postulated that using a SAT like SonarCloud improves students' engagement in producing secure software because the use of such a tool could be perceived as a game where the goal is to remove security concerns from source code until all of them are no longer detected by the SAT. Therefore, we introduced SonarCloud in the development pipeline of students enrolled in the STW course for the a.y. 2022–23. The obtained results suggest that the number of security concerns is significantly lower, with a large effect size, in the (e-commerce) web apps developed by students when SonarCloud is integrated into the development pipeline (a.y. 2022–23). This confirms our postulation that using a SAT like SonarCloud improves students' engagement in producing secure software. Such improved engagement, in turn, allows students to acquire security skills. Educators could be interested in introducing the use of a SAT like SonarCloud in university courses being conscious that its use makes a difference in training students to develop secure software. From the point of view of practitioners, software companies could be particularly interested in hiring these students since they have acquired security skills and are already familiar with a SAT, SonarCloud, used in the software industry [3], [9]. Researchers could be interested in studying if our results hold in different programming contexts (e.g., applications for smart devices) and with different kinds of developers (e.g., master students).
Lesson Learned #2: The use of a SAT like SonarCloud in a web programming course requires at least five hours of dedicated lessons, besides related tutoring activities.
Our training intervention consisted of five hours of lessons on software security, including practical exercises on configuring and then using SonarCloud to detect and resolve security concerns. In addition to this, we made a tutor available twice a week to solve problems related to the configuration and use of SonarCloud, and security-concern resolution. Nevertheless, the answers to the post-questionnaire revealed that the use of SonarCloud was challenging for some students (e.g., they found it difficult to use). We postulate that, with a larger training intervention (i.e., more hours of lessons), we could reach even better results. Researchers could be interested in confirming this postulate by means of experimentation. On the other hand, the obtained results suggest to educators that the allocation of only five hours of lessons in their web programming courses should be enough to let their students develop more secure code. From the point of view of practitioners, software companies could be interested in planning five-hour training interventions, similar to ours, to help developers without security skills produce more secure code. However, we acknowledge that further research is needed on this matter (i.e., studies with developers) and our findings seem to justify it.
Lesson Learned #3: SATs integrated into CI/CD pipelines require extra effort.
In our study, we asked students enrolled in the a.y. 2022–23 to integrate SonarCloud into a CI/CD pipeline, namely GitHub Actions. Despite having lessons and tutoring activities on the configuration and use of SonarCloud, some students still experienced difficulties with that tool, according to the answers to the post-questionnaire. These difficulties might have been influenced by using Git, GitHub, and GitHub Actions for the first time. Indeed, students enrolled in the a.y. 2022–23 were asked to learn these technologies in addition to those related to web programming. We postulate that students might perceive the use of SATs plugged into IDEs (i.e., linters) as simpler. This postulation is also supported by the qualitative results (i.e., a student suggested using real-time detection of security concerns, as happens with linters). Therefore, we encourage researchers to experiment with the use of linters to detect and then resolve security concerns in the university context. However, as of today, some linters have limitations as compared to SATs integrated into CI/CD pipelines (e.g., SonarLint does not detect hotspots on its own).
5.2 Threats To Validity
To deal with the threats that might affect the validity of our study, we followed Wohlin et al.'s guidelines [27]. We discuss these threats with respect to internal, construct, conclusion, and external validity.
Threats to Internal Validity. They concern factors internal to a study that might affect its results. At both T0 and T1, we could not monitor the students while carrying out their projects since the development of these web apps took days of work and thus could not occur in class (threat of diffusion or imitation of treatments). However, we checked the delivered web apps against plagiarism, and no issue emerged; also, each team had to discuss its project during the exam with the course lecturers. We acknowledge that there might be a selection threat due to the natural variation of the involved students at T0 and T1. However, the students of the a.y. 2021–22 and 2022–23 had similar backgrounds.
Threats to Construct Validity. They concern the representation of the construct to be investigated. A threat of restricted generalizability across constructs might affect our results because the positive effect of using a SAT to improve software security might lead to side effects on unconsidered constructs. Although we did not disclose our research goal to the students (including the measurements based on SonarQube), they might try to guess it and adapt their behavior based on their guess (threat of hypotheses guessing).
Threats to Conclusion Validity. To mitigate a threat of random heterogeneity of participants, we compared two groups of students having similar backgrounds and taking the same course. The students at T1 received a training intervention to make them as homogeneous as possible in terms of SonarCloud usage and behavior toward software security; however, a threat of reliability of treatment implementation might occur (e.g., some participants might follow tips to deal with security concerns more strictly than others). Finally, we mitigated a threat of violated assumptions of statistical tests by using proper tests (e.g., we employed non-parametric tests since the normality assumption did not hold).
Threats to External Validity. There could be a threat of interaction of selection and treatment when generalizing our findings to a population different from that being studied. In other words, our findings might not hold when considering other kinds of students or developers. The kind of project (i.e., e-commerce web app) might represent another threat to external validity: interaction of setting and treatment. However, e-commerce web apps are really widespread nowadays and their security is a crucial aspect [21]. Finally, we acknowledge that our results might not be generalized to a SAT different from SonarCloud. Our empirical evidence justifies future work on different kinds of web apps and SATs.
Conclusion
We published an idea paper at ICSE SEET 2023 to understand if Computer Science (CS) bachelor students, enrolled in a Software Technologies for the Web (STW) course, were equipped to manage security concerns in the (e-commerce) web apps they developed [11]. The gathered empirical evidence highlighted that students enrolled in the STW course in the a.y. (academic year) 2021–22 were not equipped to develop secure web apps although they regarded the security of web apps as a relevant aspect. In [11], we also delineated a training plan to fill this gap, while the results of the enactment of this plan and the gained teaching experience are presented in the present paper for the first time. Our training plan involved CS bachelor students enrolled in the STW course in the a.y. 2022–23 and is based on the use of SonarCloud, a Static Analysis Tool (SAT) that detects security concerns, in the development pipeline of web apps. Statistical inference on the gathered data revealed that the number of security concerns in the web apps developed in the a.y. 2022–23 was significantly lower than in those developed in the a.y. 2021–22 and that this difference was large. We can conclude that lecturers must train students (i.e., the next generation of developers) to develop secure web apps and let them experience, in university courses, the use of SATs to support the development of secure software. Our results could also be relevant to software companies, which could be particularly interested in hiring students who have acquired some security skills and are familiar with a SAT, SonarCloud, used in the software industry.
DATA AVAILABILITY
The replication package, containing the raw data, is available on the web [10].