Introduction
The COVID-19 pandemic has forced many organizations to change the way they work with employees, and working from home has become the norm. According to a recent estimate, close to 40% of people working in the EU started working remotely full-time due to the pandemic [1]. Furthermore, many companies have decided to switch to long-term remote work. In May 2020, Twitter’s CEO informed staff that they could work from home forever. Coinbase has become a “remote-first” company, allowing most staff to work remotely indefinitely. Other prominent examples are Spotify, which will let all employees work from home permanently, and HubSpot, which allows certain employees to work from home two days a week.
Recent studies in the context of COVID-19 agree that significant changes to the workplace or way of working will occur in post-pandemic times [2], [3], [4]. Since software development is a collaborative effort, it is essential to investigate the impact of remote work settings on the different activities of the software development process. The impact of COVID-19 on individual factors, such as productivity, well-being, and work-life balance, has been explored in the software engineering (SE) literature [5], [6], [7], [8], [9], [10]. Overall, these studies had some conflicting results: while some revealed that this change has negatively impacted developers’ productivity and well-being [11], [12], others show no impact [13], [14] or even an increase in developers’ productivity [15]. Furthermore, the impact of COVID-19 on different software development activities, such as requirements engineering, coding, and testing, might be diverse due to their different needs for physical communication. Hence, the first objective of this paper is to explore how professionals perceive the impact of COVID-19 on different software engineering activities. The activities were selected based on the survey proposed by [16] to study software engineering activities in software startups. As part of this goal, we aim to highlight, from the professionals’ perspective, which activities have been most impacted, both positively and negatively.
Moreover, development teams from software startups might face particular challenges in a remote work setting. These companies lack a history of established practices and face several challenges, such as time and resource constraints, while aiming to develop innovative products or services [17]. Although, as mentioned earlier, several papers have focused on the impact of the COVID-19 pandemic on specific software engineering activities, to the best of our knowledge, no study has focused on software startups. Given the particular characteristics of these kinds of companies, such as their relationship with the customers (which might not be well-established or well-defined), small team size, lack of resources, and immature adoption of practices [17], [18], [19], changes induced by COVID-19 could have unique implications. Therefore, the second goal of this paper is to compare the impact of the COVID-19 pandemic on software startups with its impact on established companies, regarding different software development activities.
To investigate the impact of COVID-19 on different software engineering activities in more depth, we established two additional goals for our research study. The third goal concerns the correlations among the effects on different activities, searching for pairs of activities in which a positive or negative impact on one is accompanied by a similar effect on the other. Finally, the fourth goal focuses on the relation between the impact and the time spent on each type of activity. That is, we aim to tell whether the impact is positive or negative when the time spent in that area increases or decreases. This analysis can reveal whether changes driven by the COVID-19 pandemic altered the effort employed for each software engineering activity.
The research gap that this study aims to fill is related to the following factors: (a) a limited number of studies conducted after one year of the COVID-19 restrictions; (b) a lack of studies comparing startups and established companies in this context; (c) a limited number of studies that assess the impact of COVID-19 restrictions across software engineering activities in a broader sense. Accordingly, to guide this study, we proposed the following four research questions:
RQ1: How did the COVID-19 pandemic impact different software engineering activities?
RQ2: How did the COVID-19 pandemic impact software engineering activities considering the context of software startups and established companies?
RQ3: How do software engineering activities impacted by the COVID-19 pandemic relate to each other?
RQ4: How does the impact of the COVID-19 pandemic on software engineering activities relate to changes in the amount of time dedicated to them?
To answer these questions, we conducted a global survey on the impact of COVID-19 on different software engineering activities, which received 170 valid answers from 29 countries. The answers were analyzed using a quantitative approach. As a result, we found that most respondents did not perceive a significant impact on the activities. Analyzing the reported impacts, we found that, in general, software startups observed a negative impact on requirements gathering, and established companies perceived a positive impact mainly on activities related to software architecture and quality assurance. Additionally, the time spent on each kind of activity showed an increasing trend; however, we did not find any relation between the change in the amount of time spent and a positive or negative impact.
This work also adds to the existing literature by providing a perspective after one year of the pandemic restrictions. By understanding these impacts, companies and startups might make a more informed decision when choosing to keep their development teams remote, giving special attention to the activities impacted negatively. Several studies were conducted when the restrictions started and the companies were still adapting to this new reality. In the study reported in the present paper, since we received answers from April 2021 to August 2021, we present a view after the software development teams had time to adjust to the changes.
The remainder of this paper is organized as follows. Section II presents the literature review. We explain the research method in Section III. Section IV reports the results. In Section V, we discuss the results. Section VI concludes the paper.
Literature Review
In this section, we discuss the related literature. First, we discuss Work From Home (WFH) and related concepts in general, prior to the surge in WFH resulting from the pandemic. Afterward, we discuss works reporting the impacts of the pandemic and its impacts on SE in particular. To this end, we dedicate most of this section to studies about the impacts of COVID-19 on software teams’ productivity and well-being, software engineering activities, and software startups. The related work comprises investigations conducted with software practitioners. Finally, we briefly discuss software startup as a concept at the end of this section.
A. Work From Home Before the Pandemic
Ultimately, the situation created by COVID-19 is, or was, a widespread Work From Home (WFH) situation. WFH is a long-standing area of research, with various concepts such as telework, e.g., [20], or telecommuting, e.g., [21], also being used to discuss the same phenomenon of working remotely. Various studies preceding the COVID-19 pandemic have examined WFH in different contexts and from different perspectives. The pandemic, however, has resulted in a situation where even those organizations and employees with no prior interest in WFH were largely forced to practice it, presenting new challenges for the individuals and their organizations.
In reports from 2007, WFH remained relatively rare [22] despite its lengthy history - and also despite its argued benefits. The reasons behind this situation have been debated to stem both from managerial resistance and various personal factors of the employees looking for work from home, such as stress, family situation, lack of suitable workspace at home, or personal characteristics such as lack of discipline while working at home [20], [22]. Indeed, much of the grey and black literature found online before the pandemic would argue against WFH by stating that it, among other things, reduced productivity due to lazy employees [23].
On the one hand, existing studies have continued to associate WFH with increased productivity [23] and cost savings [20] before the pandemic. On the other hand, these benefits are not necessarily straightforward. For example, Bloom [23] remarks that an increase in productivity is only seen when it comes to those workers who want to work from home and have the discipline to do so. Not everyone is suited for WFH or even wishes for WFH. Baker et al. [22] also argue that many of the previous studies highlighting the benefits of WFH are narrow in scope and only focused on an individual variable when studying the effects of WFH. In the case of some individuals, WFH may indeed reduce productivity. In this regard, the pandemic provides interesting new insights into WFH, not least by providing an abundance of data and potential case studies. It is worth highlighting that these studies are from some years ago, and, by that time, the tools to support remote collaboration were not the same as the ones being used today.
With the onset of the pandemic, WFH became widespread out of necessity in areas where WFH is possible, including SE. Since then, the number of studies looking at the effects of WFH has surged. Rather than having some of the employees of an organization working remotely, the situation has resulted in many cases where nearly entire organizations, or in some cases indeed entire organizations, have begun to work remotely. Moreover, WFH-only workers, those who work whole weeks entirely from home as opposed to occasionally working at the office as well, have become increasingly common as a result of the pandemic [24]. Whether WFH will remain more commonplace in the post-pandemic world remains to be seen.
The rest of this background section is dedicated to discussing these studies published after the start of the pandemic, particularly ones discussing its effects on SE rather than WFH studies in general. Many of the past studies on WFH have not been SE studies. On the other hand, various SE studies on WFH have been published since the start of the pandemic.
B. Impact of COVID-19 on Productivity and Wellbeing
The productivity of software professionals has been investigated through different lenses during the pandemic. A multiple case study was conducted with organizations in Germany that investigated the effect of the pandemic on the agile teams’ work [13]. The study collected data from 24 team members for three months, including data engineers, business analysts, managers, data scientists, software engineers, technical architects, and team leads. The data was gathered in two rounds (March 2020 and September 2020) and analyzed independently. The findings showed that the efficiency and performance of agile teams have not decreased during the pandemic. Moreover, in the participants’ opinion, the agile approach became more transparent and the communication and coordination more objective and efficient with the support of online tools, such as Microsoft Outlook, Microsoft Teams, Zoom, and Jira. Besides, they saw an intensification in the involvement of customers and stakeholders. On the other hand, an ethnographic study conducted in a service-based IT organization in India showed that the uncertainties arising from the pandemic caused a decrease in team productivity in the first weeks [11].
Other studies have observed software developers’ well-being and productivity focusing on work from home (WFH). A global longitudinal study investigated a typical WFH day and its effects on an individual’s well-being and productivity [14]; the data was collected from a survey involving nearly 200 participants in two rounds (April and May 2020). The findings showed evidence that software developers seem to be more focused when working remotely, as WFH does not affect the time spent on their tasks. Furthermore, the significant time reduction of meetings suggests that online meetings are more time-efficient than physical ones. In another work, a survey with 2,225 software developers from 53 countries showed that the professionals perceive that changes in their well-being and productivity are directly related [12]. The results revealed that professionals identify a difference in their productivity linked to home office ergonomics and disaster preparedness. The findings also indicated that the pandemic might disproportionately affect women, parents, and people with disabilities. Canna et al. [25] investigated the modifications implemented by a global software development organization to improve its employees’ wellness during the transition to WFH. In March and July 2020, the organization collected information about the staff welfare and needs and the changes imposed by the WFH regime. The findings revealed participants’ concerns about the balance between work and childcare, worries about family, unsuitable home environments for remote working, retaining privacy and security of code and data, and connection costs. From these results, the company implemented a set of interventions and evaluated their impact with ten employees in January 2021. The evaluation showed that the participants maintained the same productivity in the pre-pandemic phase and the transition to WFH; however, the employees considered that their productivity had increased after acclimatizing to the early move to WFH.
Surveys conducted with software professionals concluded that productivity and well-being also affect communication and collaboration in software development work. A Brazilian survey collected 233 responses on distributed collaboration and the participants’ well-being in April 2020 [26]. The study focused on understanding how the fully remote work arrangement occasioned by COVID-19 affects women and men differently. The results indicated that collaboration readiness is a significant predictor of women’s well-being. On the other hand, male respondents described communication challenges due to the lack of informal conversations, i.e., the need to schedule meetings to talk to their colleagues. Another Brazilian survey collected 58 responses from software professionals to investigate the influence of human and organizational factors on professionals’ productivity during the COVID-19 pandemic [15]. The results mainly showed that the participants felt motivated to conduct their activities and had good productivity and accessible communication with their coworkers. However, they pointed out that external interruptions, environmental adaptation, and emotional issues were the main factors influencing their productivity. Miller et al. [27] carried out two extensive surveys with full-time software engineers working from home at a large software company in the USA. The first survey was conducted in April 2020 and the second in July 2020, collecting 2,265 and 608 developer responses, respectively. The first survey investigated the changes in milestone achievement, team culture, and team support for its members. The second gathered data about collaboration, communication, and social interaction. As a result, the participants mentioned that, even with an increased number of working meetings, the feeling of being socially connected to the team and the ability to brainstorm with colleagues have decreased.
C. Software Engineering Activities and COVID-19
The literature has discussed the impact of COVID-19 on software development activities. A longitudinal study showed that the overall use of pair programming has decreased in the forced WFH regime [28]. The study collected data from professionals from three organizations in Norway, Sweden, and the USA during 2020 and 2021. The results showed that the sudden transition to WFH led engineers to focus on individual tasks, temporarily reversing the social trend in software engineering. The study reported the lack of pair programming practices for remote work and the lack of tools that effectively support remote pair programming as essential challenges for agile practitioners. Bernasconi [29] discussed the adoption of an extreme requirement elicitation process when the conventional requirement elicitation methods are not applicable, as in pandemic times. The recommendations include dealing with diversity; investing in short, just-in-time pre-interview meetings; being use-case driven and curious; and being aware that a second opportunity for interaction is worthwhile.
Some studies on software engineering activities have also discussed productivity and well-being concerns. A study explored the impact of COVID-19 on 100 open-source Java repositories on GitHub [30]. It also surveyed 279 software professionals from 32 countries (April - May 2020). The study reveals that COVID-19 did not impact the software project metrics (i.e., commits, pull requests). However, the findings indicated that the pandemic harmed the developers’ well-being, causing a high level of stress and, in some cases, sleep disorders. A survey investigated the influence of the pandemic on agile productivity with 250+ professionals from 4 software industries in Pakistan [31]. The results revealed that the teams did not discuss user stories and the project’s complexity, making them feel less satisfied with their work. The primary reasons that affected their productivity were spending time with family, no official environment to work in, health and mental stress caused by the pandemic, no work pressure, and less opportunity to have conversations with others.
The use of online tools to support communication and collaboration in remote work has been investigated during the pandemic. One investigation used observations and interviews in June 2020 to examine the changes caused in the software activities and project management by the pandemic [32]. The results revealed that the distancing between the stakeholders and the team caused significant changes, in particular less availability of the stakeholders to participate in requirements elicitation. From the perspective of project management, the participants reported that the modifications in the development plan to start WFH did not compromise the team’s expectations, performance, and individual productivity. The findings showed that communication made significant progress through the use of platforms such as Figma (for the prototyping of screens), Slack (for sharing activities), and Discord (for meetings). In 2020, a survey with 120 agile practitioners from Asia, North America, Australia, South America, and Africa explored the impact of the COVID-19 pandemic on agile teams’ work [33]. The results revealed that the teams had an excellent adaptation to remote work because non-distributed non-local teams were already using tools like JIRA and Confluence for collaborative work and Microsoft Teams or Slack for communication before the pandemic. The findings showed that the teams did not have modifications to their Product Backlog and Product Vision issues. The release frequency and the ‘Definition of Done’ in most cases remained unchanged. The respondents also reported that the stakeholders’ involvement remained the same after COVID-19 started in over half the cases. A survey across different industries worldwide collected 95 responses relating to the significant impacts of COVID-19 on technology companies [34]. The responses revealed much more collaboration online than in face-to-face meetings. The authors discussed how communication, collaboration, and competencies could be translated into benefits for the new standard of work, such as flexibility, innovation, and efficiency, thus addressing the underlying root causes of the economic slowdown.
D. Software Startups
Software startups have been defined in different ways [18], [19]. Berg et al. [19] summarize a common definition of software startups as companies with an innovation focus and a lack of resources, working under uncertainty and time pressure, highly reactive, and rapidly evolving. In addition, Blank [35] describes a startup as a temporary organization that aims to create high-tech innovative products without having a prior working history. The author further highlights that in a startup context, the business and its product should be developed in parallel. Ries [36] defines a startup as a human institution that is designed to create a unique product or service under extreme uncertainty. Unlike an established company, a startup should be considered a temporary organizational state that seeks a validated and scalable business model [17]. A company with a dozen employees can still be in a startup state to validate a business model or a market.
Startups are found to differ from established companies in the strong presence of entrepreneurial personalities, behaviors, decision-making, and leadership [37], [38]. Software engineering literature has also shown evidence of unique characteristics of product development in startup contexts. Giardino et al. [39] revealed reasons for project failure in startups. Among those, many are not relevant to established companies. Nguyen-Duc et al. [40] conceptualized the co-development of products and businesses in startups as hunting and gathering activities. In addition, Tripathi et al. [41] found that entrepreneurs’ background influences how startups’ products are developed. Also, Melegati et al. [42] showed evidence that startup founders have a special kind of influence on requirements engineering activities. Finally, Nguyen-Duc et al. [38] characterize the sense-making processes in software startups, which are unique to this organizational state.
Research has shown that software startups focus on various practices. Kemell et al. [43] identified 76 practices employed in these companies and discussed their implications. They agreed with Paternoster et al. [18] in observing the use of many agile practices in an ad-hoc manner. In particular, Klotins et al. [16] analyzed 84 startups and proposed a life-cycle model for software startups consisting of four stages: inception, stabilization, growth, and maturity. In the first stage, inception, the startup aims to build the first version of the product. Then, in the stabilization stage, the product is further developed based on customer feedback. At this point, the team prepares the product for scaling. In these early stages, the focus is on “finding a relevant problem” and “devising a feasible solution.” In the third stage, growth, the goal is to acquire new customers to reach an expected market share. Finally, in the last stage, maturity, the startup transitions into a mature company. In these later stages, the focus is on marketing and efficiency improvement. Based on these life-cycle stages, the authors identified several goals, challenges, and practices for these companies in the areas of team, requirements engineering, value focus, quality goals and testing, architecture and design, and project management. For instance, while in the early stages the goal of requirements engineering is to balance customer value with time-to-market, in the later stages the goal is to support business needs.
This balance between collecting customers’ feedback and acting accordingly, and the need to shorten the time-to-market, is a key challenge of software startups [44]. Some consequences include accumulated technical debt and, consequently, hindered performance and low product quality [44]. In the extreme, startups could pivot, i.e., perform a “strategic change of a business concept, product or the different elements of a business model” [45]. One way that startups could cope with this uncertain context is to employ experimentation, i.e., frame assumptions about the product as hypotheses and test them using a systematic approach, such as problem or solution interviews or A/B tests [46]. However, research has shown that these companies still do not adopt experimentation, often focusing instead on building the product [39], [47]. There are many reasons that explain this lack of adoption, including pressure from investors and the complexity of multiple-sided business; however, the influence of the founder and how the team perceives the idea seem to be key aspects [46]. This result is in line with previous research stressing the importance of the founders in defining the process followed by the startup [48], [49].
In summary, software engineering in the context of startups is defined by unique aspects that are different from established companies, such as the balance between understanding the needs of the customer and building a viable product to fulfill these needs in a timely manner and under resource constraints. Besides that, the influence of the founders in defining how the process will be is probably stronger than in other companies.
Research Method
The primary goal of this study is to explore how COVID-19 has impacted the ways of carrying out software engineering activities after one year of the COVID-19 pandemic, comparing results for software startups and established companies. This study has a different context from the ones performed right after the start of the pandemic, in which the companies were still adapting to a new reality and way of working. After working for at least a year under the pandemic restrictions, we consider that the software development teams had time to adapt and stabilize in this new setting. To meet the research goal, we utilized a large-scale survey to get the perspectives of industry practitioners. In software engineering, surveys are intensively used for empirical investigations [50]. The survey, as one of the popular research methods in software engineering, facilitates gathering opinions from a large population and eventually helps in generalizing the findings [51]. Therefore, we considered this approach an appropriate research method to address the research questions. We designed a cross-sectional survey, which was distributed as an online questionnaire, to collect opinions. We followed the guidelines provided by Molleri et al. [51] while designing the survey. Figure 1 presents an overview of the research process, including the period in which each step was performed. In the following sections, we describe each step in detail.
A. Study Design
We started designing the survey in October 2020 and continued until March 2021. In doing so, we utilized several brainstorming sessions with various researchers from across the globe. In particular, researchers from Brazil, Norway, Italy, Finland, Sweden, the UK, Portugal, Germany, Australia, Canada, China, and Vietnam facilitated the survey’s design. The design process started with a concrete definition of the population and the intended audience. Similarly, we consulted the academic literature on general software engineering, software startups, and a few available studies on COVID-19. Interestingly, we did not find comprehensive studies on COVID-19 when we started designing the survey. However, we continued consulting the growing literature on the topic gradually.
The target population for the study is software development companies around the world. We mainly targeted companies that switched their working style from onsite to working from home. Initially, we intended to reach only software startup companies. However, we later adjusted the scope to include established software development companies in order to be able to perform a comparative analysis. Similarly, to keep the intended focus, we defined strict inclusion criteria, such as removing from our target population companies whose working style was already remote. As a result, the unit of analysis in our study is a software development company, i.e., an established software development company or a software startup. Therefore, the survey was open to a wide range of software development practitioners from such companies, including business analysts, software architects and designers, software developers, testers, scrum masters, and a few other particular managerial roles, e.g., Chief Executive Officers (CEOs) and Chief Technological Officers (CTOs).
B. Instruments
The survey instrument consists of 45 questions. We classified these questions into five high-level groups: (1) the current working conditions of the participants; (2) the contextual background of the participants, including the company and individual characteristics; (3) the perception of the participants about the impact of COVID-19 on software engineering activities in their team and company; (4) the perception of the participants about the impact of COVID-19 on their companies’ innovation and resilience; and (5) questions about perceived performance. The analysis reported in this paper was mainly based on the group (3) questions. The questions from groups (4) and (5) were not included in the scope of this study.
We designed a preliminary version of the survey questionnaire using the literature on software engineering and early studies on COVID-19. For the groups of questions (1) and (2), i.e., working conditions and participants’ background, respectively, we asked questions about the company characteristics (sector, number of employees, location), tools used during the pandemic, and working mode during COVID-19 (hybrid, remote, at the office). In the scope of this paper, we used questions from groups (1) and (2) to characterize the company as a startup or an established company and to apply the inclusion/exclusion criteria. Group (3) questions were elaborated based on the questionnaires presented in [52] and [16]. In the instrument proposed by [16], the authors propose questions to gather details about the activities performed by the companies in different software engineering areas (e.g., requirements engineering, software architecture). In our survey, we used the same approach of separating the questions by software engineering areas, which were presented as survey sections; however, we defined questions concerning the participants’ perception of the impact of COVID-19 on software engineering activities for the different software engineering areas. The participants answered the questions based on their team and company work. In addition, our questionnaire presented a smaller number of questions in comparison with the instrument proposed by [16].
In our survey questionnaire, we asked three types of questions, i.e., polar questions, Multiple Choice Questions (MCQs), and a few open-ended questions. Primarily, our MCQs used a five-point Likert scale, i.e. (1) Strongly Disagree, (2) Disagree, (3) Neither Disagree nor Agree, (4) Agree, and (5) Strongly Agree. We added a sixth option in a few questions, i.e., (6) Not Applicable. Likewise, open-ended questions were added to capture non-stock answers. In such questions, we mainly provided a free text option, e.g., “tool names” or “other,” to express their knowledge. We also allowed participants to share more details on a particular issue/question.
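The five-point scale described above can be illustrated with a minimal Python sketch of how such answers might be encoded for quantitative analysis. This is an illustration only, not the authors' analysis code: the names `LIKERT_AGREEMENT` and `encode_likert` are hypothetical, and mapping “Not Applicable” to a missing value (`None`) rather than to the code 6 is an assumption made here so that it does not distort ordinal statistics.

```python
from statistics import median
from typing import List, Optional

# Labels and codes (1)-(5) follow the scale listed in the text.
LIKERT_AGREEMENT = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neither Disagree nor Agree": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}
NOT_APPLICABLE = "Not Applicable"


def encode_likert(answer: str) -> Optional[int]:
    """Return the ordinal code of a Likert label.

    Assumption: 'Not Applicable' is treated as missing (None),
    not as a sixth point on the ordinal scale.
    """
    label = answer.strip()
    if label == NOT_APPLICABLE:
        return None
    return LIKERT_AGREEMENT[label]


def median_response(answers: List[str]) -> Optional[float]:
    """Median of the applicable answers, or None if there are none."""
    codes = [c for c in map(encode_likert, answers) if c is not None]
    return median(codes) if codes else None
```

Keeping “Not Applicable” out of the ordinal codes means that summary statistics such as the median are computed only over respondents for whom the question applied.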
The complete survey instrumentation activities along with the timeline are shown in Table 1. The preliminary version was created using Google Forms. After that, the survey was validated and then refined based on the comments and suggestions from the team of researchers. Several online meeting sessions were conducted to carry out this process. Likewise, a few changes were introduced later, e.g. rephrasing Likert scale labels, removing a few questions for better focus, adding additional open-ended questions, and finally adding a sixth alternative to the Likert scale, “Not applicable”, in a couple of questions.
The survey was evaluated through a pilot study. The pilot study was carried out between February and March 2021. We collected responses from 25 participant companies, validating our constructs, scales, and questions in this validation process. We also utilized the opportunity to consult expert opinions, from senior researchers in software engineering, with expertise in executing survey research [7], [53]. Consequently, the final version of the survey questionnaire was designed by considering several feedback loops. We prepared and released the final version at the end of April 2021, using Lime Survey,4 an online survey tool for research institutes.
According to Linaker et al. [54], web-based tools are more efficient and help to gather a large number of responses. In addition, our selected tool allowed us to automate validation checks on several preliminary questions, i.e., demographic information and participants’ experience of working under COVID-19. That eventually allowed us to get data from participants working under COVID-19 conditions. Alongside the survey questionnaire, we described our research objective up front. The questions that followed were primarily focused on collecting demographic information and assessing whether the participants’ company is a startup or an established software development company. Furthermore, the survey was initially designed in English and then translated into seven other languages, i.e., Italian, Spanish, Portuguese, Norwegian, Arabic, Indonesian, and Vietnamese. The translation was performed by seven members of the author team, each a native speaker of the respective language. The published version of the questionnaire is available online.5 The answers were anonymous, and before starting the survey, the participants needed to provide consent to use their responses for research purposes.
In this paper, we present the results obtained by analyzing the group (3) questions (i.e., those focused on the impact of COVID-19 on software engineering activities). Table 2 presents the questions divided into six groups. The first five groups provide questions regarding software engineering activities (i.e., requirements engineering, software architecture, user experience design, software implementation, and software quality assurance). For each activity, the respondent had to select an answer from a Likert scale that reflected the impact of COVID-19 on that activity, with the options: (a) negative, (b) little negative, (c) neutral, (d) little positive, (e) positive, or (f) not applicable. If any of the answers was not neutral, the respondent was asked to explain why in an open-ended question. An additional set of questions asked about the impact on the time spent on each kind of activity for each of the five categories (see the last row in Table 2). For these questions, the options available were: (a) decreased, (b) slightly decreased, (c) did not change, (d) slightly increased, (e) increased, or (f) not applicable. Each question had a label, such as RE1_SQ1 and SA_SQ3, that is used further in charts to refer to that question. The first part of the label refers to the survey section, and the second part refers to the question’s position inside the section. When the question appears with the prefix “St_” or “Es_” in a chart, it refers to the answers to that question given by participants from startups or established companies, respectively.
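The labeling scheme described above can be illustrated with a small parser. This is only a sketch: the regular expression is inferred from the example labels in the text (RE1_SQ1, SA_SQ3, and the “St_”/“Es_” prefixes), and the names `parse_label` and `QuestionLabel` are hypothetical, not part of the survey instrument.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: optional company prefix, section code, then "_SQ" + position.
LABEL_RE = re.compile(r"^(?:(St|Es)_)?([A-Z]+\d*)_SQ(\d+)$", re.IGNORECASE)


class QuestionLabel(NamedTuple):
    company_type: Optional[str]  # "startup", "established", or None if no prefix
    section: str                 # survey section, e.g. "RE1"
    position: int                # position of the question inside the section


def parse_label(label: str) -> QuestionLabel:
    """Split a chart label such as 'St_SD1_SQ2' into its parts."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"Unrecognized question label: {label!r}")
    prefix, section, position = match.groups()
    company = None
    if prefix is not None:
        company = {"st": "startup", "es": "established"}[prefix.lower()]
    return QuestionLabel(company, section.upper(), int(position))


print(parse_label("RE1_SQ1"))
print(parse_label("St_SD1_SQ2"))
```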
C. Dissemination
As the population is large, we followed convenience and purposive sampling [55], two non-probability sampling techniques, for our survey research. Accordingly, we selected the sample based on easy accessibility and particular geographical regions. We utilized our established personal contacts within 13+ countries. Initially, we expected to receive 10–20 responses per country through these personal channels. While executing this strategy, we invited population members through our professional and social networks. We published the same pre-designed post about the survey on popular social media platforms, such as LinkedIn, Twitter, Reddit, Quora, and Facebook. Similarly, we published the call for participation in several academic communities, e.g., the SEWORLD mailing list,6 a mailing list of the International Conference on Software Business7 and the Software Startup Research Network.8 In the same vein, we utilized our professional connections by asking co-authors to identify people who could participate in our study and to send them invitation emails. Hence, we capitalized on each co-author’s local knowledge to reach more people in their geographical zone. That was an effort to reach out to information-rich cases. Besides that, where co-authors applied purposive sampling, the standard questionnaire had to be translated into the local language. Therefore, each localization involved slight variations in wording and phrasing. The translation was manual and completed by a native speaker. In a few cases, co-authors printed out the survey and disseminated the paper-based version. Lastly, we recruited participants from professional channels, such as Prolific.9
The use of online platforms for research participant recruitment and subject pool management has become increasingly prevalent in empirical research. One notable example is Amazon Mechanical Turk (MTurk), for which the number of published papers reporting social science experiments conducted with participants sourced via the platform grew from 61 in 2011 to over 1,200 in 2015 [56]. MTurk has been criticized for not being explicitly designed for the scientific community, and alternative platforms (e.g., Prolific) have emerged as viable options for researchers. Prolific, specifically, combines robust recruitment standards with cost-efficient measures, while also clearly informing participants that they are being recruited for participation in research studies. This platform has been utilized across a variety of research disciplines, including economics [57], psychology [58], and even food science [59]. As such, Prolific can be considered a reasonable means for recruiting participants in surveys with multidisciplinary topics, such as those pertaining to engineering, management, innovation, and professional work practices. Additionally, Prolific implements a Quality Control process, in which studies are screened by its team to ensure adherence to ethical standards and compliance with the platform’s guidelines.
A typical recruitment process in Prolific is as follows: (a) Prolific sends an email to eligible participants in its pool of registered users, inviting them to participate in the study; (b) Participants can review the study details and accept or decline the invitation; (c) Once a participant has accepted the invitation, they are directed to the survey or task specified by the researcher; (d) After completing the survey or task, participants are usually compensated with a monetary reward; (e) Researchers can access the data collected from participants and use it for their research.
D. Data Collection
Data collection with the final version of the survey started at the end of April 2021, following the pilot, and continued until August 2021. We collected 413 responses in total through all our sampling strategies. We also tried to restrict the number of respondents representing the same company through our survey tool, i.e., Lime Survey.10 In addition, the survey tool was configured to let participants change the language of the survey. Figure 2 shows an overview of the data collected from several countries across the globe. As Figure 2 depicts, to our surprise, responses stemmed from 15 countries and are dominated by respondents from Brazil, the UK, Vietnam, the USA, and Poland. We added a screening question about whether participants had observed an impact of COVID-19 on their working environment on the first page, before the actual survey questions. If the participant did not observe any change, the survey ended. We applied a strict data cleaning process, described in Section III-E, resulting in 170 valid responses.
E. Exclusion Criteria
Before analyzing the survey data, we carried out data cleaning steps to remove responses that were not completed thoughtfully or that fell outside the study’s target. We adopted the following exclusion criteria to ensure the quality of the data for analysis:
Responses with all blank fields were removed. In this case, some respondents had answered the filter question correctly but left the follow-up questions unanswered.
“Straight-line” responses, whereby the respondents had answered the survey questionnaire with default answers, were removed. This behavior suggests that the questions were not properly read and thought about.
“Outlier” respondents with unrealistic completion times were removed. During the pilot study, we assessed that the questionnaire needs at least 3 minutes and at most 155 minutes to be completed. Therefore, all responses that were out of this estimated range were removed.
Responses from one-person companies, identified through a dedicated question, were removed.
Responses that did not meet our prerequisite question, i.e., those stating that COVID-19 did not cause any change in the respondent’s working environment, were also removed.
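The exclusion criteria above can be expressed as a single filtering step. The following is a minimal sketch using pandas, not the authors’ cleaning script: the column names (`duration_min`, `company_size`, `saw_change`) and the function name are hypothetical, and the straight-line check assumes that such a response gives one identical value to every question.

```python
import pandas as pd


def apply_exclusion_criteria(df, question_cols, min_minutes=3, max_minutes=155):
    """Return only the rows that pass all five exclusion criteria."""
    answers = df[question_cols]
    # 1) Every follow-up question left blank.
    all_blank = answers.isna().all(axis=1)
    # 2) Straight-lining (assumption: all questions answered with one value).
    straight_line = answers.notna().all(axis=1) & (answers.nunique(axis=1) == 1)
    # 3) Completion time outside the range estimated in the pilot.
    bad_time = ~df["duration_min"].between(min_minutes, max_minutes)
    # 4) One-person companies.
    one_person = df["company_size"] <= 1
    # 5) Screening question: no change observed in the working environment.
    no_change = ~df["saw_change"].astype(bool)
    keep = ~(all_blank | straight_line | bad_time | one_person | no_change)
    return df.loc[keep].copy()


# Made-up rows, one per situation, purely for illustration.
nan = float("nan")
responses = pd.DataFrame({
    "q1": [1, nan, 0, 1, 1, 1, 1],
    "q2": [-1, nan, 0, 0, 0, 0, 0],
    "q3": [0, nan, 0, 2, 2, 2, 2],
    "duration_min": [12, 5, 10, 1, 15, 20, 200],
    "company_size": [10, 10, 10, 10, 1, 10, 10],
    "saw_change": [True, True, True, True, True, False, True],
})
cleaned = apply_exclusion_criteria(responses, ["q1", "q2", "q3"])
print(cleaned.index.tolist())
```

Only the first row survives in this toy data set; each of the other rows violates one criterion.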
F. Data Analysis
After removing the answers based on the exclusion criteria described in the previous section, we moved all open-ended responses to a separate file, where we performed an analysis to extract quotes that could complement the quantitative results. In addition, we used a few fields to categorize the data. In particular, we used the company type to classify the answer as being from a startup or an established company.
We began the data analysis process with the intent to assess the impact of COVID-19 on software engineering practices. Unsurprisingly, the impact differs for each company, i.e., while some teams might have struggled to communicate while working from home, others might have, for instance, reduced overhead and improved work efficiency. Hence, we confined ourselves to assessing the median impact of COVID-19 on the overall population of software development companies. The median is a useful metric for identifying the middle observation, which further informs us about the data distribution [60].
We also considered that some activities might not apply to all software projects. For instance, UX might not apply to software without a user interface, or a given team might not conduct code inspections. Taking this perspective, we considered in the analysis only the answers in which the participant selected an option other than “not applicable”. Even if a participant answered “not applicable” for one of the questions, their answers to the other questions are still included in the analysis.
The Shapiro-Wilk test was applied to each activity’s responses, with the null hypothesis that the distribution is normal. The maximum p-value obtained from the test was 0.0005, which showed that none of the distributions are normal. Therefore, we adopted a non-parametric test, the one-sample Wilcoxon signed-rank test [61], [62]. The null hypothesis of this test is that the distribution is symmetric and has a median equal to a known value, which in our study is zero. To apply this test to a Likert scale, we treated the scale as numeric [63]. This decision has been found reasonable and adopted in other software engineering studies, particularly those applying structural equation modelling [64]. To do so, we converted the categorical values to a numeric scale using the following pattern: −2 for negative, −1 for little negative, 0 for no impact, 1 for little positive, and 2 for positive.
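As an illustration of the conversion and the normality check, the sketch below maps the impact options to the numeric scale, drops “not applicable” answers, and runs the Shapiro-Wilk test with SciPy. The use of SciPy, the helper name `to_numeric`, and the sample answers are assumptions made for this example; the answers are made up and are not survey data.

```python
import pandas as pd
from scipy import stats

# Numeric coding described in the text; "neutral" corresponds to "no impact".
IMPACT_SCALE = {
    "negative": -2,
    "little negative": -1,
    "neutral": 0,
    "little positive": 1,
    "positive": 2,
}


def to_numeric(responses: pd.Series) -> pd.Series:
    """Map Likert labels to integers; unmapped labels (e.g. N/A) are dropped."""
    return responses.str.lower().map(IMPACT_SCALE).dropna().astype(int)


# Made-up answers to one question.
raw = pd.Series(
    ["neutral"] * 10
    + ["little negative"] * 4
    + ["negative"] * 2
    + ["little positive"] * 2
    + ["positive"]
    + ["not applicable"] * 3
)
scores = to_numeric(raw)

# H0: the scores come from a normal distribution.
w_stat, p_value = stats.shapiro(scores)
print(f"n={len(scores)}, median={scores.median()}, W={w_stat:.3f}, p={p_value:.4f}")
```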
Moreover, for Wilcoxon signed-rank tests [61], there are two ways to handle zeros: the conventional way, where zeros are removed before ranking, and splitting the zeros between positives and negatives. Since we were interested in comparing the answers that reported some impact, to evaluate the presence of a trend toward the positive or negative side, we decided to follow Wilcoxon’s first proposal [65] in our data analysis.
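The two ways of treating zeros can be compared directly in SciPy, whose `scipy.stats.wilcoxon` function takes a `zero_method` argument: `"wilcox"` discards zeros and `"zsplit"` splits the zero ranks between the positive and negative sides. The sketch below assumes SciPy as the tool and uses made-up scores; it is not taken from the supplemental notebook.

```python
from scipy.stats import wilcoxon

# Made-up numeric Likert scores for one question (0 = no impact).
scores = [0] * 10 + [1] * 8 + [2] * 3 + [-1] * 2 + [-2] * 1

# Conventional handling: zeros are discarded before ranking, so the test
# only compares answers that reported some impact.
conventional = wilcoxon(scores, zero_method="wilcox")

# Alternative handling: zeros are ranked and their ranks are split
# between the positive and the negative side.
split_zeros = wilcoxon(scores, zero_method="zsplit")

print(f"wilcox: W={conventional.statistic}, p={conventional.pvalue:.4f}")
print(f"zsplit: W={split_zeros.statistic}, p={split_zeros.pvalue:.4f}")
```

Because many answers are tied, SciPy may fall back to a normal approximation for the p-value here.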
To compare the result in startups and established companies, we did not compare the distribution of the answers directly. We divided the samples based on the type of company and applied the tests to evaluate the impact. The comparison of the outcomes from tests allowed us to see whether the impact was different for startups and established companies or not.
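A sketch of this per-group procedure is shown below: the sample is split by company type and the one-sample test is run on each subset separately. The column names, the function name `wilcoxon_by_company_type`, and the data are hypothetical and serve only to illustrate the idea.

```python
import pandas as pd
from scipy.stats import wilcoxon


def wilcoxon_by_company_type(df, question, group_col="company_type"):
    """Run the one-sample signed-rank test separately for each company type."""
    p_values = {}
    for company_type, group in df.groupby(group_col):
        scores = group[question].dropna()  # "not applicable" stored as NaN
        p_values[company_type] = wilcoxon(scores, zero_method="wilcox").pvalue
    return p_values


# Made-up scores for a single question.
startup_scores = [-1] * 8 + [-2] * 3 + [0] * 6 + [1] * 2
established_scores = [1] * 7 + [2] * 2 + [0] * 10 + [-1] * 6
data = pd.DataFrame({
    "company_type": ["startup"] * len(startup_scores)
    + ["established"] * len(established_scores),
    "RE1_SQ1": startup_scores + established_scores,
})

results = wilcoxon_by_company_type(data, "RE1_SQ1")
for company_type, p in results.items():
    print(f"{company_type}: p={p:.4f}")
```

The outcomes for the two groups are then compared qualitatively, i.e., by checking whether each group shows a trend, rather than by testing one distribution against the other.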
We used Spearman’s rank correlation coefficient to evaluate the correlation among the questions. This technique was chosen because it is suitable for assessing the correlation between ordinal data with the non-parametric distribution. This coefficient aims to measure the degree of monotonicity between two groups of data [66], and it ranges from −1 to 1. The sign of Spearman’s coefficient indicates the direction of correlation, and its value shows its strength. The closer the value of the coefficient is to zero, the weaker the correlation [67]. We used Pandas,11 a Python library, to compute Spearman’s correlation between the questions of all types of activities.
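For reference, computing such a correlation matrix in Pandas takes one call to `DataFrame.corr` with `method="spearman"`. The sketch below uses a few made-up answers; the question labels follow the naming scheme of the survey, but the values are not survey data.

```python
import pandas as pd

# Made-up numeric Likert answers; None marks a "not applicable" answer.
answers = pd.DataFrame({
    "RE1_SQ1": [-2, -1, 0, 0, 1, 2, 0, -1],
    "RE1_SQ2": [-1, -1, 0, 0, 1, 1, 0, -2],
    "SA1_SQ1": [0, 0, 1, 0, 0, 1, None, 0],
})

# Pairwise Spearman rank correlation; rows with a missing value are
# ignored only for the pairs of questions they affect.
corr = answers.corr(method="spearman")
print(corr.round(2))
```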
Finally, we examined the answers to the open questions, looking for illustrative quotes that could explain our quantitative results. Two authors explored the qualitative answers and extracted excerpts that helped us better discuss our quantitative results.
We have made available a supplemental package12 containing the questionnaires, the raw data, and a Jupyter notebook containing the performed statistical analysis.
Results
The results section is organized into three subsections. Section IV-A describes the stakeholders’ perception of the impact of COVID-19 on software engineering activities, i.e., requirements engineering, software architecture, user experience design, software implementation, and quality assurance. Section IV-B presents our analysis of the correlation between the stakeholders’ perception of different software engineering activities. Finally, in Section IV-C, we describe an analysis of the correlation between the perception of the impact on software engineering activities and the time spent on these activities.
A. Impact on Software Engineering Activities in Software Startups and Established Companies
1) Requirements Engineering
Regarding requirements engineering activities, we collected 121 complete answers from established companies and 49 from software startups. Figure 4 describes the distribution of the answers according to the type of company, i.e., software startups vs. established companies. Overall, a large share of respondents (between 35% and 55%) found no impact of COVID-19 on RE activities. Regarding requirements gathering activities (RE1_SQ1), like interviews or observations, 35% of respondents in established companies and 47% in software startups found that COVID-19 had a negative or little negative impact on the activities, while only 25% and 18%, respectively, found a positive or little positive impact. The perceived impact of COVID-19 on customer involvement in RE activities (RE1_SQ2) is similar: 31% of the respondents in established companies and 37% in software startups reported negative impacts, and only 15% found a positive or little positive impact. There is no difference in the frequency of negative and positive answers regarding requirement prioritization and management.
Table 3 summarizes the one-sample Wilcoxon signed-rank test results from answers related to requirements engineering. These tests have shown that some of the activities are affected based on the company type. For established companies, the respondents reported a positive impact on the fourth question, which is about requirements management. On the other hand, respondents from software startups reported a negative impact on the first question, which is related to requirements-gathering approaches.
Below, we present some excerpts from the answers to the open question regarding requirements engineering activities. With respect to requirements gathering approaches, one respondent from a startup company said: “Maybe the greatest impact occurs in the low performance for the creation of new ideas. It has taken a lot for people to have new developments without being together and without generating synergies.” Another respondent from a startup company summarized the negative impact on requirements gathering approaches as follows: “Workshops cannot be held due to lockdown restrictions, questionnaires cannot be handed out to the public due to safety reasons. Getting customers’ feedback is slower due to lack of availability of some customers, internet connection dropping, not answering calls, not replying to messages in time, etc.”
According to our data, established companies coped better with the restrictions imposed by the pandemic. A respondent from an established company wrote: “The different way of working due to COVID-19 has not had any impact on our practices in a negative way. We have increased customer and end-user interactions via online meetings to improve access to their comments/issues on a more frequent basis than previously which has been a positive step.” Interestingly, the results have shown that software startups had no perceived impact regarding other aspects of requirements engineering, such as prioritization and management.
Finding 1: Most of the respondents reported no impact on requirements engineering activities. The statistical tests revealed a trend of observing a negative impact on requirement gathering in software startups and a positive impact on requirement management in established companies.
2) Software Architecture
Regarding software architecture, we collected 115 completed answers from established companies and 45 from startups. Figure 5 presents the responses about the perceived impact of COVID-19 on software architecture activities. Overall, a large share of respondents (between 61% and 80%) observed no impact on these activities.
Responses on how COVID-19 impacted software architecture activities (labels below 3% omitted).
Table 4 summarizes the one-sample Wilcoxon signed-rank test results for the impact of the COVID-19 pandemic on software architecture activities. The results show an interesting difference between startups and established companies. Startups perceive no impact on any of the software architecture activities. However, established companies reported a positive impact on all the questions related to software architecture activities, which are related to the architecture decisions, the architecture patterns, and the architecture conformance and quality.
According to some of the quotes, startups explain the lack of impact on software architecture activities in two ways. First, some startups used to work remotely prior to the COVID-19 pandemic. Two respondents pointed out that “As a software startup, we were already used to working in different offices across the country and having remote team meetings”, and that “We were already working from home at the beginning of the startup, so nothing changed”. Second, software startups usually do not have specialized team roles (e.g., architect and quality expert). Hence, the impact of COVID-19 on software architecture might not be well observed in these companies.
Regarding architectural design questions, more respondents reported a positive impact of the COVID-19 pandemic than a negative one. The following are some excerpts from answers to the open question that illustrate how the COVID-19 pandemic positively impacted software architecture in established companies. With respect to the architecture decisions, one respondent reported: “COVID-19 has an impact on the architectural decisions because my company change some rules and make some new decisions.” Regarding the architecture patterns and conformance, another respondent reported: “With most staff working entirely remotely, there is an added incentive for enterprise software to be kept operating at the highest possible standards. Service interruptions are identified and addressed much quicker.”
Finding 2: Most of the respondents reported no impact on software architecture activities. The statistical tests revealed a trend for the positive side of all investigated activities for established companies.
3) User Experience Design
For user experience design activities, we collected 111 complete answers from established companies and 47 from software startups. Figure 6 presents the responses about the perceived impact of COVID-19 on user experience design activities. Regarding UX design questions, the number of positive and negative answers is similar. Even though we can see a more positive trend for St_SD1_SQ2 in the chart, the one-sample Wilcoxon signed-rank test did not point to a positive or negative trend.
Table 5 summarizes the results of our statistical tests, which indicate that COVID-19 has no impact on the user experience design activities for both startups and established companies.
Based on the open questions related to user experience design activities, one respondent explained a possible reason why these activities were not impacted: “User experience is done over the internet therefore there has been no impact.” This suggests that most of the approaches used to design and validate the user experience are performed online. Thus, COVID-19 has not changed the way this kind of activity is done. The open-ended feedback also reveals some reasons for negative experiences, for instance, 1) user design activities that involve customers/end-users, or 2) activities related to brainstorming new UI/UX ideas. Below are some excerpts that refer to negative impacts on user experience design activities. With respect to user design activities that involve customers, one respondent wrote: “More difficult to have customers test on the devices we would like to use as they are only available in the office.” Another respondent said: “We are far from having something like that. The validation tests with the end user is essential. Doing the A/B tests with the clients avoids rework of the teams with the front-end projects.” With respect to activities related to brainstorming new UI/UX ideas, one respondent wrote: “Brainstorming UI ideas was difficult online since we didn’t have the right tools. UX design always started in a room with a whiteboard of requirements.” Another respondent explained: “I find it more difficult to visualize UX ideas when at home. In the office it is easier to discuss concerns and provide input.”
Finding 3: Most of the respondents reported no impact on UX activities. The statistical tests did not reveal any positive or negative trend for established companies and software startups.
4) Software Implementation
For software implementation activities, we collected 116 complete answers from established companies and 45 from software startups. Figure 7 shows the responses according to the company type. Overall, a large share of respondents (between 55% and 69%) believed that COVID-19 had no impact on software implementation activities. Table 6 displays the one-sample Wilcoxon signed-rank test results for each question for established companies and software startups; the test did not detect an impact for any of the questions.
Responses on how COVID-19 impacted software implementation activities (labels below 3% omitted).
The answers given to the open questions for this section also support the observed dispersion of facts and opinions. On the downside, most of the concerns pertained to loss of code quality and increased technical risk tolerance, primarily due to communication difficulties between developers. Some illustrative answers are as follows:
“Quality suffers as direct access to coders are not possible anymore”, “The quality of the delivered product has been hampered by the lack of interactivity by interested parties.” (translated from Portuguese)
“Given the worse communication between teams, I see that the amount of technical debt accumulated and the level of technical risk the team can tolerate has increased.” (translated from Portuguese)
On the other hand, other comments are contrary to this, giving a positive view of the changes caused by the new way of working. Some comments report greater cohesion in the development team and greater availability of time for code review, which has been seen as beneficial for software implementation activities. Some illustrative answers are as follows:
“More free time to look into the code and check it for bugs”, “[…] since we have started working from home, the team has been more eager to review code.”
“I felt an increase in developers’ commitment to delivery - the result, in my opinion, of their concern to demonstrate the efficiency of remote work.” (translated from Portuguese)
Finding 4: Most of the respondents reported no impact on software implementation activities. The statistical tests did not reveal any positive or negative trend for either established companies or software startups.
5) Software Quality Assurance
For the software quality assurance set of questions, we collected 119 completed answers from established companies and 45 from software startups. Figure 8 displays the distributions of responses about the perceived impact of COVID-19 on these activities. Overall, there is a large number of respondents (60%-67%) who found no impact of COVID-19 on quality assurance activities. Besides that, the one-sample Wilcoxon signed-rank test detected a trend toward a positive impact in established companies for ST1_SQ1 and ST1_SQ2.
Responses on how COVID-19 impacted software quality assurance activities (labels below 3% omitted).
Table 7 presents the one-sample Wilcoxon signed-rank test results for software quality assurance tests according to the type of company.
The positive aspects are also present in participants’ reports:
“…there seems to be more concern in the team to produce better more maintainable code.”
“I think code inspection works better these days as new technologies have made it easier to review each other’s code and people seem to be more willing to critique others code in constructive ways.”
Most companies refer to the difficulty of performing remote acceptance testing due to new ways of communication between the development team and the users. In this regard, some illustrative comments are as follows:
“Users had greater difficulty in validation and acceptance testing remotely when they were not fully aware of the product.” (translated from Portuguese)
“Not being able to test with end-users in-person has had a negative impact on QA.”
Finding 5: Most of the respondents reported no impact on software quality activities. The statistical tests revealed a positive trend on testing activities and code inspection for established companies.
B. Correlation Between the Impact on Different SE Activities
To evaluate the relationship between the impact on different software engineering activities, we calculated the Spearman correlation between all questions; the result is presented in Fig. 9. It uses color to show the magnitude of the correlation, where a darker tone represents a stronger correlation. Note that the matrix of values is symmetric. We performed the analysis at two levels: within each SE group (i.e., RE, SA, SD, etc.) and across groups. The values can be interpreted using the following scale: 0.00 to 0.19 as a very weak correlation, 0.20 to 0.39 as a weak correlation, 0.40 to 0.69 as a moderate correlation, 0.70 to 0.89 as a strong correlation, and 0.90 to 1.00 as a very strong correlation.
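The interpretation scale can be written as a small helper function. This is a sketch under two assumptions not stated in the text: the scale is applied to the absolute value of the coefficient, and values between the listed bands (e.g., 0.195) fall into the lower band. The function name `correlation_strength` is hypothetical.

```python
def correlation_strength(rho: float) -> str:
    """Label a Spearman coefficient using the scale given in the text."""
    magnitude = abs(rho)  # assumption: strength is judged on |rho|
    if magnitude > 1.0:
        raise ValueError("A correlation coefficient must lie in [-1, 1].")
    if magnitude < 0.20:
        return "very weak"
    if magnitude < 0.40:
        return "weak"
    if magnitude < 0.70:
        return "moderate"
    if magnitude < 0.90:
        return "strong"
    return "very strong"


for value in (0.43, 0.64, 0.77, -0.15):
    print(value, correlation_strength(value))
```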
Spearman correlations among all the questions related to software engineering activities.
As expected, Fig. 9 shows that the correlation among answers to the questions of the same activity (between 0.56 and 0.77), except for software implementation, is higher than the correlation among the answers to the questions of different activities. The correlation among Requirements Engineering questions (RE1_SQ1 to RE1_SQ4) is moderate to strong (ranging from 0.56 to 0.72). The correlations among Software Architecture questions (SA1_SQ1 to SA1_SQ4) are also moderate to strong, ranging from 0.59 to 0.77. The correlation between the two SD questions has a value of 0.72. The correlations between Software Testing questions (ST1_SQ1 to ST1_SQ3) have values between 0.69 and 0.73. The correlation among Software Implementation questions (SI1_SQ1 to SI1_SQ5) is a bit weaker than those of other activities, ranging from 0.43 to 0.57. This is evidence that in areas like software requirements, software architecture, user experience, and quality assurance, the impact on one activity is frequently accompanied by a similar impact on others.
Regarding cross-group analysis, the correlations among questions range from moderate to weak and very weak. The strongest cross-group correlation (0.64) was found between SI1_SQ1 (MVP/product release) and ST1_SQ3 (effectiveness of user acceptance tests), between SA1_SQ3 (architecture conformance) and SI1_SQ1 (product release), and between UX design validation and requirement prioritization.
Finding 6: Participants’ responses to questions within the activities requirement engineering, UX, software architecture, and software quality are strongly correlated. The responses to software implementation questions are moderately correlated.
C. Changes of Time Spent With SE Activities and Correlation With Reported Impact
Figure 10 presents the distribution of the answers regarding the perception of the time spent on different software engineering activities according to the company type. Looking at the distribution of the answers in Fig. 10, one can notice that the majority of the answers do not report a change. This is consistent with the answers about the impact. Among the answers that reported an increase in time, smaller changes are more frequent, since the number of “slightly increased” answers is usually higher than the number of “increased” answers.
Table 8 presents the results of Wilcoxon tests for questions regarding the impact on the time spent on the different software engineering activities. Considering the changes in the amount of time needed for each activity, the results obtained for all activities, both for software startups and established companies, revealed an increase.
Finding 7: The majority of the respondents reported that there is no change in the amount of time spent on SE activities in general. Relatively, there are more respondents reporting an increase in working time than respondents who observed a decrease in working time.
To investigate the correlation between time and impacts on SE activities, we calculated Spearman’s rank correlation coefficient considering the question about the change in the amount of time for an activity and the respective questions about the impacts on it. In this analysis, we did not find any strong correlation that could suggest any relation between them. The values were in general close to zero as shown in the following:
Requirements Engineering: RE1_SQ1 (-0.11); RE1_SQ2 (-0.091); RE1_SQ3 (-0.094); RE1_SQ4 (-0.13)
Software Architecture: SA1_SQ1 (0.053); SA1_SQ2 (0.033); SA1_SQ3 (0.058); SA1_SQ4 (0.045)
User Experience Design: SD1_SQ1 (-0.017); SD1_SQ2 (-0.033)
Software Implementation: SI1_SQ1 (-0.081); SI1_SQ2 (-0.090); SI1_SQ3 (-0.040); SI1_SQ4 (-0.15); SI1_SQ5 (-0.11)
Software Quality Assurance: ST1_SQ1 (-0.010); ST1_SQ2 (-0.0031); ST1_SQ3 (-0.081)
Finding 8: It was not possible to find a correlation between a change in the amount of time spent and a reported positive or negative impact for any of the software engineering activities.
Discussion
In this section, we summarize our results according to how they help us to answer our research questions. We also compare our results with previous studies in the literature and discuss the implications for research and practice.
The first point to be considered in the discussion section is the moment in which this survey was conducted. Several studies that investigated the impact of the pandemic conducted interviews and surveys a few months after the restrictions were imposed, which might provide a view from an adaptation period. However, since our survey received responses from April 2021 to August 2021, we considered that it provides a view from at least one year after the beginning of the pandemic.
A. RQ1: How Did the COVID-19 Pandemic Impact Different Software Engineering Activities?
The key point to notice in the results is that, regardless of whether the company is a startup or an established company, most of the answers report no impact in all the questions investigated. This indicates that, in general, our survey did not detect an impact of COVID-19 on software engineering activities. Comparing the questions for different activities, the one with the highest share of “no impact” answers was software architecture, with percentages from 61.7% to 68.7% for established companies and from 68.9% to 80.0% for software startups. On the other hand, the activity with the fewest “no impact” answers was requirements engineering, with percentages varying from 40.5% to 54.5% in established companies and from 34.7% to 55.1% in software startups.
These results indicate that activities related to software architecture, such as decisions, usage of patterns, conformance, and quality, did not change much due to the COVID-19 pandemic. According to the literature, although architectural decisions and the choice of patterns are a group activity [68], most of the teams did not feel any impact. This result is also confirmed by a recent study that investigated the impact of remote work in a startup [69]. A possible reason is that architectural decisions usually have a more long-term impact on the projects, and with the other changes needed in this period, a decision could be made to keep the existing structure. One participant from an established company explained: “Again I can’t really say that I’ve seen any meaningful differences with regards to software architecture, things were a bit disrupted at the start but we quickly adjusted.”
On the other hand, in requirements engineering, we received the lowest number of “no impact” answers, especially for the activities related to requirements gathering and the contribution of customers. Considering that these activities require the participation of individuals external to the team, the difficulty of communicating with them due to the restrictions could be one explanation for this result. Our findings are similar to the discussion presented by de Mendonca et al. [32], who claimed that working from home (WFH) has an impact on access to stakeholders for requirements elicitation activities.
B. RQ2: How Did the COVID-19 Pandemic Impact Software Engineering Activities Considering the Context of Software Startups and Established Companies?
Even though most of the answers state a lack of impact on the software engineering activities, a significant number of participants declared that the COVID-19 pandemic somehow impacted their activities. We evaluated the distribution of the answers separately for software startups and established companies, using the Wilcoxon test to investigate whether the answers reporting any impact reveal a tendency towards the positive or negative side. For several questions, no trend could be identified in either context. In all questions for software implementation and UX-related activities, for instance, the number of answers reporting a positive impact was balanced by the number reporting a negative one, revealing no clear tendency for either opinion. This result was also present in the answers to open questions. As an example of the contradicting impacts on UI/UX-related activities, one respondent said that “no new designing was introduced” whereas another one said: “My company uses a new design having a positive impact in the company.”
Considering only the answers from startups, the only impact identified was a negative one on the effectiveness or feasibility of the requirements-gathering approaches. This question had the lowest share of no-impact answers (34.7%) among all questions. One possible reason is the distance created by the restrictions between the startup team and its customers. Since several startups are still searching for their business model and do not understand precisely who their customers are [38], they may have felt a higher impact than companies with an established relationship with customers. The issue of requirements gathering during the pandemic has been discussed in the literature. Bernasconi [29] points out the need to adopt a use-case-driven approach to collecting data in extreme situations such as pandemic times.
Software engineers think that communication goes well once they are familiar with the new environment, except when they have to go beyond their team boundary and learn to work with external stakeholders in the new context. This issue was visible in requirements elicitation, particularly for startup companies. These findings also support the results from other studies [26], [32] suggesting that the lack of informal communication and the lower availability of stakeholders to participate in requirements elicitation are the main negative impacts of the COVID-19 pandemic. This result supports existing work [27], which highlights that the main issue is the quality rather than the quantity of communication. Our results indicate that startups in particular perceive the negative impacts on requirements-gathering activities. This finding may be a consequence of the fact that software startups are developing novel products. To support this endeavor, they rely on customers’ feedback more intensively. This issue especially affects requirements gathering and contact with customers, as shown in our results.
Considering only established companies, we observed a positive impact on seven questions: in the requirements management, in all questions from software architecture, and in the software quality activities related to testing and code inspection. Despite the impact in all software architecture questions, those were also the ones with a higher number of neutral answers. The analysis of the open-question answers helped us to justify this positive impact with the fact that the migration to remote work generated a higher demand for quality in architectural tasks. As explained by one participant, “With most staff working entirely remotely, there is added incentive for enterprise software to be kept operating at the highest possible standards.” The positive impact in testing and code inspection was also mentioned in some answers as the result of actions to have higher control over a code produced remotely. Adopting automated tools was also mentioned as a factor for that positive impact.
It is important to highlight that due to the lower number of answers from startups compared to established companies, it is hard to detect a statistically significant impact. This limitation can be considered one of the reasons why we found more impacts for established companies.
C. RQ3: How Do Software Engineering Activities Impacted by the COVID-19 Pandemic Relate to Each Other?
Software implementation was the only activity in which the correlation among its questions was low (between 0.43 and 0.57). For instance, in these questions, the lowest correlation was between the impact on the frequency of product releases and the amount of technical debt accumulated (0.43). That provides evidence that, compared to the other activities, the impacts investigated related to software implementation are more independent from each other. Consequently, as a recommendation, for changes made due to the adaptations for the COVID-19 pandemic, improvements in these fields should be considered individual efforts, not necessarily impacting the others.
The Spearman correlations between the impacts on different activities vary from 0.24 to 0.64. The highest correlation (0.64) is between the frequency of product releases and the effectiveness of acceptance testing; the correlation of release frequency with the impact on other testing activities is also high (0.62). This observation is in line with the literature that claims that a mature testing process is an essential part of a continuous deployment process [70]. The same question about releases is also correlated with architectural conformance (0.64), suggesting that following the architecture’s restrictions also contributes to a smooth deployment process. Moreover, the automation of such processes reduces the impact in these fields.
An unexpectedly high correlation (0.62) was found between the approach for UX design validation and requirements prioritization. Some studies in the literature present evidence that the relation between these topics might be associated with the participation of the end-user in the development process. According to Zaina et al. [71], the involvement of end-users brings a clear view of the product to the development team, which can improve the team’s capacity for prioritizing the requirements. The same claim is supported by Abelein and Paech [72], who highlight the importance of having the customer onsite. Further studies could investigate how the migration to remote work affected teams that used to have onsite customers and what other strategies can be used to keep the customer close to the team.
Looking at the lower correlations, those with a Spearman correlation below 0.30, some interesting results can also be identified. The lowest correlations are between the amount of accumulated technical debt and both architectural decisions (0.24) and architectural styles or patterns (0.25). This lack of correlation is unexpected, since negative impacts on the architectural design, causing its degradation, are identified by several studies as a cause of technical debt [73]. A possible explanation is that a positive impact on architectural decisions and usage of architectural patterns might influence only a particular kind of technical debt, the one related to the architecture [74], without having a significant impact on the others related to code.
Another low correlation can be observed between the approaches for designing UX and two other questions: the contribution of customers/end-users to the requirements process (0.28) and the level of technical risk that the team can tolerate (0.26). On the one hand, we did not find any study that correlated the UX design approaches with the team’s capacity to handle technical risks. Indeed, the two aspects deal with independent factors, which makes the weak correlation plausible. On the other hand, several UX design approaches depend directly on customer and end-user involvement, so the weak correlation between these two questions was unexpected.
D. RQ4: How Does the Impact of the COVID-19 Pandemic on Software Engineering Activities Relate to Changes in the Amount of Time Dedicated to Them?
The first result related to this question is that, although many answers report no change, the remaining answers show a trend of increased time spent on all activities. This increase in the time dedicated to a given activity can be, at first, associated with a negative impact in the sense that the activity requires more effort to be performed. For example, some participants described the increase in time negatively. One participant said, “Meetings with most of our clients were delayed and some were canceled even virtually because of non-availability of most of the people.” However, if this increase reflects that the team is able to dedicate more time to perform the activity in an improved way, it can also be associated with a positive impact. One participant emphasized that “People working from home have more freedom to take more breaks from work, and so they can take more time for themselves, and to relax”.
The second result, referring to the relation between time and impact, was that the correlation analysis revealed a very weak correlation between them. To understand the distribution of the answers, we generated heat maps with the number of answers combining what was reported for the impact and its respective question about time increase. Fig. 11 presents an example of this heat map considering requirements management and the change in the time effort for requirements. The first information that draws our attention in this chart is the concentration of answers in the pair “no impact” and “do not change” for the time, reflecting the fact that most of the respondents declared no impact and that the time dedicated to each activity did not change (we refer to these as “neutral answers”).
Fig. 11. Heat map with the number of answers considering the combined response to RE1_SQ4 and SE1_SQ1.
Looking at the distribution of answers, we cannot see any clear pattern that an increase or decrease in the amount of time dedicated to an activity is related to a positive or negative impact. For instance, if we consider the column that represents a time increase, we can see answers that report both a positive and a negative impact (8 for the negative side and 9 for the positive side). This small number of answers in each region without a clear tendency can also be observed in other regions of the chart. Based on that, we cannot state that our data allows any conclusion connecting the impact to the change in the dedicated time.
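The counts behind such a heat map can be obtained by cross-tabulating the two answers given by each respondent. The following sketch assumes three-level labels for simplicity (the survey scales are finer grained); the labels, the function name `cross_tabulate`, and the sample pairs are hypothetical and do not reproduce the data in Fig. 11.

```python
from collections import Counter

# Hypothetical, simplified answer labels (rows: impact, columns: time change).
IMPACT_LEVELS = ["negative", "no impact", "positive"]
TIME_LEVELS = ["decreased", "did not change", "increased"]


def cross_tabulate(pairs, row_levels, col_levels):
    """Count (row, column) answer pairs into a row-major table of integers."""
    counts = Counter(pairs)
    unknown = [p for p in counts if p[0] not in row_levels or p[1] not in col_levels]
    if unknown:
        raise ValueError(f"unexpected answer labels: {unknown}")
    return [[counts[(r, c)] for c in col_levels] for r in row_levels]


# Hypothetical (impact, time change) pairs, one per respondent.
answers = (
    [("no impact", "did not change")] * 3
    + [("negative", "increased")] * 2
    + [("positive", "increased")]
)
table = cross_tabulate(answers, IMPACT_LEVELS, TIME_LEVELS)

# Print the table as text; each cell would be one heat-map tile.
print(" " * 10, TIME_LEVELS)
for label, row in zip(IMPACT_LEVELS, table):
    print(f"{label:>10}", row)
```

Reading a single column of the resulting table, for example the one for increased time, corresponds to the comparison of negative and positive answers made in the paragraph above.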
Further analyzing the chart, we can see some regions, besides the neutral answers, with a higher concentration of answers. We analyzed all the heat maps generated by combining each question related to an activity impact with its respective question about the time. Table 9 presents the number of neutral answers and the other regions that received the most answers (numbers in parentheses represent the number of answers received). As can be seen, for all questions except those on requirements, the combination of no impact with a slight time increase has the highest number of answers after the neutral ones. As explained for Table 2, labels in Column “Questions” of Table 9 refer to specific questions in the survey. While the first part refers to the survey Section (RE - Requirements engineering, SA - Software Architecture, SD - User experience design, SI - Software implementation, ST - Software quality assurance), the second part refers to the question position inside the section.
A possible explanation for this result is that remote work can save some time from other activities while leaving more time to dedicate to activities directly related to software development. For example, development teams saved more time for activities such as code implementation and code review. As illustrated by one participant: “More free time to look into the code and check it for bugs.” However, activities that involve stakeholders outside the development team (e.g., communication with customers) need more time to be performed. We found that the main factor contributing to this impact is ineffective communication. The restrictions of COVID-19 have limited the opportunity to reach customers and receive explicit feedback. As explained by one participant: “Getting customers’ feedback is slower due to lack of availability of some customers, internet connection dropping, not answering calls, not replying to messages in time, etc.” In this sense, an additional short amount of time could be dedicated to these activities without having a significant impact.
E. Customer Perspective
While customers of startups and established companies were likely affected by the pandemic as well, the focus of this study was on developer-perceived impacts on companies’ software development teams. Even though we have not explicitly collected the opinion of customers and users, we have obtained some indicators of their behavior with the open-ended questions at the end of each section of the survey.
The interaction of customers and users with development teams in startups and established companies occurs most intensely in the processes of requirements engineering, user experience design, and testing. Thus, for example, regarding the requirements engineering activities, some answers to the corresponding open-ended question were (see Section III-A): “Getting customers’ feedback is slower due to lack of availability of some customers, internet connection dropping” and, on the contrary, “We have increased customer and end-user interactions via online meetings to improve access to their comments/issues on a more frequent basis than previously”.
From these and other similar responses, it can be inferred that some clients were affected regarding speed or availability to return feedback to the development team. In contrast, others felt the opposite effect of improving their relationship with the development team in the requirements engineering process. In a similar way, regarding testing and quality assurance processes, answers to the corresponding open-ended question also allow us to grasp the impact on customers and users (Section III-A): “Users had greater difficulty in validation and acceptance testing remotely when they were not fully aware of the product.”
F. Threats to Validity
We will discuss validity according to the four perspectives presented by Wohlin et al. [75], complemented by survey-specific validity aspects [51], [76].
Construct validity is concerned with the relationship between a theory behind an investigation and its observation [75]. The goal of the survey is to gain insights into the impact of COVID-19 on software engineering activities, and we do not aim at fully developing or validating hypotheses. To enhance construct validity, we used validated scales and questions for software engineering activities. We assured the confidentiality and anonymity of the respondents, hence reducing bias as much as possible. We are also aware of the absence of objective measures in survey data, where we can only base findings on the perception of respondents. In some questions, for example, customer validation, the perception of the respondents on the matter is no less meaningful than its objective measure.
Different countries had COVID-19 outbreaks and lockdowns at different times, facing various restrictions at distinct moments. This variation could change the experience of participants from different places. To mitigate that, we conducted the survey one year after the beginning of the COVID-19 pandemic, considering that at this point the companies had already faced restrictions and had time to adapt. Moreover, we also asked at the beginning of the survey if there were changes in the participants’ work environment, allowing the elimination of the answers of companies that already had remote teams before or from places that did not face restrictions.
Internal validity deals with the relationship between a treatment and its results [75]. We included a filtering question so that only respondents who experienced an impact on their work and their companies could answer the questions. Even so, we cannot guarantee that the participants are indeed developers. To mitigate that threat, we applied a strict data-cleaning process, which extracted 170 valid answers from the 413 collected. Another answer that depended on the participants’ judgment is the classification of the company as a startup. Thus, misclassification is a threat that might compromise the results, which should be interpreted as reflecting the participants’ view of their company. An inherent threat to survey research is that it can only reflect respondents’ perceptions rather than other objective measurements. To make questions understood in the same way by all respondents, we reviewed and revised them several times. The survey versions were reviewed by people from representative countries to reduce the possibility of misunderstandings due to language or cultural differences.
External validity concerns the generalization of conclusions [75]. We cannot reach generalizable conclusions from our study. Proper sampling is very difficult given the lack of credible sampling frames (population lists) for the units of analysis in software engineering research [7], [77]. However, size and breadth are considered reasonable arguments for the representativeness of a sample. This study collected 170 valid answers from a wide range of geographical locations and business domains. Thus, although we cannot claim that the results apply to all software development teams, the study presents results that could be valuable to researchers and practitioners alike. Online participant recruitment might face risks of self-selection bias towards a group of people who have strong opinions. However, our data showed a significant number of neutral answers. Our unit of analysis is the software company, but we are not able to estimate the representativeness of our population due to the unavailability of empirical data from each industry sector and each country. A different result might be observed with a different sample. However, the survey can be repeatedly conducted and new results can be synthesized with what is reported in this study. We note that it is seen as uncommon to have a survey on a narrow topic in software engineering with more than 100 valid responses [78]. Another limitation is that we have not included customers’ perceptions in our study. This may be applicable to other studies, and from that point of view, their conclusions are also limited to the production side.
Conclusion validity is concerned with obstacles to drawing correct conclusions from a study [75]. Although we did not conduct random sampling, we tried our best to diversify the respondents regarding their geographical locations (from 29 countries), industrial sectors (more than 18 sectors), company types (software startups and established companies), and team size. However, one-quarter of the answers obtained came from Brazil, which might have influenced the result since cultural differences are relevant. We suggest that future studies assess the impact of such differences.
Conclusion
Given that at least some of the shifts in working patterns and locations seem likely to stay in the medium to long term, it is helpful to take stock of how the pandemic and its impact have affected software engineering both in established companies and software startups. We have gathered evidence to show that many software engineering activities have been relatively unaffected since most survey participants reported no impact and no change. Requirements gathering in startups was the only activity in which a trend for a negative impact was identified, perhaps because they rely more on direct and fast feedback. For established companies, a tendency for a positive impact was detected in seven activities, mainly related to software architecture and software quality. That could be significant evidence to be considered when deciding to keep remote work in these companies with the end of restrictions for working. Regarding changes in the time effort, even if there is a trend for all activities that reveal an increase in the amount of time spent, we could not find any correlation that associates this increase with a positive or negative impact.
This paper adds to the body of work regarding software development during the COVID-19 pandemic, presenting a perspective after one year with the restrictions when the companies had enough time to adjust to the new reality. This result may be important in future crises, due to health emergencies or otherwise. Further, as governments relax restrictions, the options should be assessed when deciding to keep some of the new working arrangements, return to previous patterns, or establish hybrid models of remote and office working.
The findings of this work have important implications for the software engineering field. The mapping of positive and negative impacts of remote work can be used to understand how each activity changed and to search for ways to improve it. Even in activities where remote work has brought a positive impact, lessons could be learned about how the practices can be used to improve the work of co-located teams. Additionally, the differences found between startups and established companies highlight the importance of considering these contexts when studying software development practices. For instance, our study points to requirements engineering in remote startup teams as an activity that requires more attention and whose practices should be further developed.
For future work, we suggest studies to get a deeper understanding of the reasons for the positive and negative impacts on each software engineering activity. Even in the ones where we generally did not find a trend toward a positive or negative side, some participants reported a perceived impact. Understanding these factors that can influence changes in each software engineering activity will make it possible to document recommendations and bad practices, helping teams implement a development process suitable for this new scenario. As an additional suggestion, it would be interesting to study software startups that grew up and became established companies in the considered period, understanding the contextual role of the COVID-19 pandemic in this process.
Aiming at providing better support to remote software startups, a recent work-in-progress is the Startup Digi-dojo platform [79]. The goal of this work is to build a digital space that can support startup remote work as well as research on remote startups. To design a digital space for startups, the insights taken from this study would also be considered.
NOTE
Open Access provided by 'Libera Università di Bolzano' within the CRUI CARE Agreement