Introduction
The survey is probably the most used research method in software engineering [1]. Beyond academia, it is a popular method in industry and public-sector research [2], [3], and it is found in disciplines as diverse as software engineering, medicine, and sociology [3]. Despite its popularity, there is no unique definition for the term survey, with descriptions ranging from bounded characterizations emphasising quantitative aspects [2], to broader descriptions covering methods “for collecting information from or about people to describe, compare or explain their knowledge, attitudes and behavior […] directly, by asking people to answer questions, or indirectly, by reviewing written, oral, and visual records” [4]. The concept of a survey thus evades narrow definition, with a clear consensus proving elusive.
A broader definition of survey allows the acknowledgment of related methods, originating in other disciplines, that use surveys with a more qualitative orientation [5], [6], [7]. Qualitative surveys have recently been employed in software engineering (SE) studies, e.g., [8], [9], and [10], but are still very rare in the field as a whole. These studies relied on a proposal from Jansen [7], who describes a qualitative survey, or “diversity survey”, as a method searching “for the empirical diversity in the properties of members, even if these properties are expressed in numbers” rather than counting frequencies of categories as done in quantitative surveys.
However, this definition has not been adopted across the broad spectrum of SE research. The term has been used to differentiate between data generated by humans and data generated by technical artifacts [11], or simply to indicate the presence of open-ended questions in a questionnaire [12]. This issue is probably due, at least in part, to the lack of guidance on how to use the method [13], [14], [15], [16]. Even though the method has been repeatedly used in most domains, we could identify only one entry in the ACM SIGSOFT Empirical Standards [17], describing the method as based on interviews with open-ended questions in which researchers employ some kind of qualitative analysis technique. The relevance of interviews is such that the term “interview studies” is given as an alternative name for the method. The Standards also provide specific entries for questionnaire surveys and case surveys. This definition is inconsistent with Jansen's and with how other SE studies, such as those mentioned above, employ the term.
Hence, the over-arching aim of this paper is to develop a better understanding of the use of qualitative surveys in SE and to identify a set of principles to guide the design of these surveys in the specific context of SE research. To achieve this goal, we performed a systematic mapping study (SMS) of SE literature, systematically analyzing 66 primary studies, to answer the following three questions:
RQ1 -
What qualitative surveys have been conducted?
RQ2 -
How have qualitative surveys been implemented?
RQ3 -
How are the terms perceived, and how does this perception influence their use?
Our first step, represented by RQ1, is to understand the extent of qualitative survey use in SE as well as the diverse contexts in which the method has been applied. To answer RQ2, we describe how researchers have operationalized the method regarding the common steps of a study, i.e., sampling, data collection, and analysis. Finally, to address RQ3, we investigate how researchers have understood the terms “qualitative” and “survey”, and whether differences in this understanding have influenced how the method has been implemented.
While we show that the popularity of the method is increasing, there is a lack of consensus as to what constitutes a qualitative survey. This dissonance hinders the analysis of studies by readers and reviewers, as well as the combination of results from different inquiries to build a better understanding of SE phenomena. Despite this dissonance, we observed an increase in the use of the qualitative survey as a method for investigating the diversity of characteristics in a population. In this paper, we support this stance by providing our definition for qualitative survey in SE and arguing why it could be beneficial for our research domain (see Section III).
The papers that aimed at some form of identification of characteristic diversity came from a wide variety of areas within the SE discipline and pursued very different purposes, ranging from agile methods [18] to micro-frontends [19], and from cognitive biases [20] to smart contract development [21]. These studies focus on, for example, detecting challenges or benefits for the adoption and proper use of practices and techniques [19], [21], [22], finding success factors for their adoption and use [18], or identifying the existing phenomena of a given type [20]. These descriptive studies [23], contrasting with the solution papers common in SE research [24], are usually concerned with the state of the practice, i.e., how things are done in practice. Describing how practitioners perform SE activities has been a subject of investigation for many years (e.g., [25]) and continues to this day, covering the latest trends and developments emerging in the field, e.g., AI ethics [26]. To do so, researchers have employed a wide range of methods, including surveys [26], interviews [8], [27], case studies [28], mixed-methods studies [29], and systematic [30] or multi-vocal literature reviews [31]. This lack of a common rationale hinders the comparison and evaluation of the studies by both readers and reviewers. A formalization of a method to identify these diversities would facilitate not only the evaluation but also the design, execution, and reporting of these studies, especially by novice researchers. Hence, we provide a succinct but clear definition of qualitative surveys for the SE domain and, based on that, we present a set of guidelines for qualitative surveys in SE research.
The remainder of this paper is organized as follows: Section II reviews the SE literature on methodological aspects of surveys. Section III discusses a definition for qualitative surveys, comparing them to their quantitative counterpart. Section IV presents the systematic mapping study on the use of qualitative surveys in SE. Section V describes the study results, which are discussed in Section VI. Section VII presents guidelines for qualitative surveys, with examples from the identified studies. Finally, Section VIII concludes the paper.
Background and Related Work
To provide some background context for the reader, we give an overview of some key research concepts used in this paper, mostly drawn from Creswell and Creswell's seminal book on research frameworks [32].
Creswell and Creswell lay out a framework for contemplating the philosophical presuppositions that inform the selection of a research approach, including aspects such as ontology, epistemology, axiology, rhetoric, and methodology. They underscore the critical role of harmonization between these philosophical presuppositions, the research inquiries, and the research design and procedures. Thus, the organization and implementation of a study necessitate a dynamic interplay among research approaches, research designs, and research methods.
The broader philosophical perspective that informs a researcher's understanding and assumptions regarding the nature of reality and knowledge, and how to attain this knowledge, is what we term a research approach, of which research methodology is a synonym in Creswell and Creswell's framework. Readers from the software engineering discipline may be more familiar with the related concepts of worldviews [33] and research paradigms [34], [35]. Approaches are characterized by their ontology, i.e., what is considered the nature of reality; epistemology, i.e., what is the relationship between the researcher and the subject of study; and methodology, i.e., what is the process of research. These approaches provide the framework, or strategies, for conducting a research study. Although SE researchers generally do not explicitly state the paradigm they are following, even when human and social aspects are involved, it is possible to observe a predominance of pragmatism and a strong influence of postpositivism [36]. Postpositivism is, from an ontological perspective, based on critical realism, i.e., the belief that there is a single reality which, however, may not be discernible by limited human cognition. Based on this perspective, true objectivity is not attainable but should remain the goal nonetheless. Consequently, a commonly used method is experimentation, which relies on adequately controlling relevant factors. As the counterpoint, constructivism is a paradigm based on the belief that there is not a single reality, but multiple realities built on the experience and interpretation of the actors involved in the phenomenon. This position is called relativism. According to this stance, knowledge should be built through the interaction of researchers and subjects, generally by employing qualitative methods. Finally, pragmatism is focused on action, arguing that action should be the focus of research and the main source of knowledge creation (epistemology).
Ontologically, pragmatists believe in a single reality but accept that some elements are subjectively constructed. Researchers following this paradigm are not restricted in their choice of methods and use the techniques most suitable for the problem at hand.
Once a research question or questions are formed, the next step is to design the research. A research design serves as a strategic plan that oversees the empirical examination of a research inquiry. The research design contains a comprehensive plan detailing the specific procedures for data gathering, analysis, and interpretation, guided by the research questions, thus ensuring methodological precision and logical consistency within and between all parts of the study. Decisions at the research design level concern the method, aim, boundary, setting, timing, outcome, and ambition [37]. Examples of research designs are exploratory, descriptive, correlational, and experimental.
Within a particular research design lie research methods. A research method relates to the specific techniques, procedures, and instruments employed for data gathering, analysis, and interpretation. The choice of method hinges on its appropriateness for the research inquiry and design, as well as its compatibility with the researcher's ontological and epistemological assumptions. Research methods focus on how questions are used to delineate precise steps for data collection, analysis, interpretation, and validation. Data collection and data analysis methods belong to this layer. Examples of research methods include controlled experiments, surveys, correlation studies, interviews, and focus groups.
The interpretation and use of these concepts often overlap, and distinct boundaries that delineate them are hard to find. Research methods are not exclusive to a particular research design or research approach.
A. Surveys in Software Engineering Research
Within this framework, surveys are regarded as a type of research method that does not involve the control or manipulation of independent variables [37]. Surveys can be utilized in various research approaches and designs. As research methods, surveys encompass data collection and analysis procedures, which contributes to the potential confusion. The data collection phase of a survey presents two primary options: interviews and questionnaires [32], [37], [38]. A second source of confusion arises from the interchangeable use of “survey” and “questionnaire” in the English language [39], where survey is used as a synonym for questionnaire.
The ubiquity of surveys has led to a plethora of guidelines about the method, including articles [40], book chapters [3], and even whole books [2], [4], [41]. However, a quick examination of these texts reveals slight but decisive differences across various definitions. On one hand, Groves et al. [2] define a survey as “a systematic method for gathering information from (a sample of) entities for the purposes of constructing quantitative descriptors of the attributes of the larger population of which the entities are members.” They justify the parentheses on “a sample of” because, sometimes, “it is about the whole population.” They define quantitative descriptors, or statistics, as “quantitative summaries of observations on a set of elements,” and classify them as “descriptive,” when describing the size and distributions of attributes of the populations, and “analytic,” when “measuring how two or more variables are related.” On the other hand, Fink [4, pp. 1] defines it as “a system for collecting information from or about people to describe, compare or explain their knowledge, attitudes and behavior,” adding that “surveyors can collect information directly, by asking people to answer questions, or indirectly, by reviewing written, oral, and visual records of people's thoughts and actions.” This diversity can be partially explained by the semantic issue of multiple meanings for the word “survey” in English [2], but it was also fostered by the emergence of mixed methods [32], supporting the combination of quantitative and qualitative techniques.
In the SE literature, the survey research method has often been scrutinised in methodological papers, starting with a notable series of articles by Kitchenham and Pfleeger [40], [42], [43], [44], [45]. Since then, several other authors have focused on guidelines for surveys in SE research. In a review of these guidelines, Molleri et al. [46] analyzed 15 articles, including technical reports by Kasunic [47] and Linåker et al. [48]. In a follow-up paper [49], Molleri et al. consolidated 12 methodological papers on surveys and proposed a checklist of 38 items, improved based on the feedback of survey designers. The checklist is divided according to ten steps of the research process: research objectives, study plan, identify population, sampling plan, instrument design, instrument validation, participant recruitment, response management, data analysis, and reporting. More recent methodological guidelines have pointed towards more inclusive definitions of surveys, e.g., [50], [51]. This decision is probably influenced by the increasing interest in human and social aspects of SE [52], [53]. The ACM SIGSOFT Empirical Standards [17] have a dedicated entry for questionnaire surveys, describing it as “a study in which a sample of respondents answers a series of (mostly structured) questions, typically through a computerized or paper form.” The entry lists as essential attributes of the method the identification of the target population, the definition of a sampling strategy, the creation of the instrument, and the analysis of response rates. The text acknowledges three possible variations: descriptive surveys, to describe the properties of a phenomenon or a population; exploratory surveys, to generate insights or hypotheses; and confirmatory surveys, to test causal propositions.
It is notable that the entry has the following footnote: “There is currently no standard for predominantly open-ended questionnaire surveys”, pointing to the fact that the Standards are still subject to change, in particular regarding specific types of surveys.
Aside from questionnaire surveys, other types of surveys have been used in SE research: case surveys [54], [55] and qualitative surveys [8], [9], [10]. However, methodological or meta-research papers focused on these other types of surveys are limited and, sometimes, inconsistent or even contradictory. As far as we are aware, a chapter by Petersen [56] is the only methodological paper in SE that explicitly focuses on case surveys. The author presents guidelines as proposed by Larsson [6], who defines the case survey as “a method of identifying and statistically testing patterns across studies” by selecting existing case studies relevant to the research questions, systematically converting the qualitative description of the cases into quantified variables, and statistically analyzing the coded data. Petersen proposes a modification to the original research method to also include primary data. That is, instead of reviewing only previously published papers, researchers could extend the cases analyzed by collecting data from other cases. As an example, the author refers to a previous paper [55] where he and others analyzed 22 case studies regarding the component origin of software systems. On the other hand, Melegati and Wang [15] identified 12 studies published in the SE literature that claimed to use the case survey method. The authors compared the research executed with the methodological guidelines stated by Larsson [6], and concluded that these studies, while strong in many ways, do not adhere to many aspects of the original research method. Regarding qualitative surveys, even though the method has been employed in SE research [8], [10], methodological guidelines are scarce. An early reference to a method with this name comes from a 1996 paper by Kitchenham about evaluation methods [57].
She describes it as “a feature-based evaluation done by people who have experience of using or have studied the methods/tools of interest […] using the standard Feature Analysis approach but organizing the evaluation as a survey.” The next methodological reference comes from the ACM SIGSOFT Empirical Standards [17], more than twenty years later. The text defines the method as “research comprising semi-structured or open-ended interviews” and gives “interview studies” as an alternative name for the method. It lists as essential attributes the description of the interviewees’ selection, the description of the interviewer and interviewee, and “a clear chain of evidence from interviewee quotations to findings.” The entry also stresses that generalizing the results is not necessarily a goal of this type of study. It is interesting to note how divergent the definitions are. While Kitchenham's characterization refers to a questionnaire applied in a context of tool evaluation, the Standards stress differences in data collection methods and make no reference to evaluation.
From a conceptual perspective, Stol and Fitzgerald [58] considered surveys as an example of sample studies according to their ABC framework. In this taxonomy of research strategies, adapted from the social sciences, the authors use two dimensions to classify research strategies: obtrusiveness and generalizability. Obtrusiveness concerns the extent to which the researcher alters the research setting. It contrasts unobtrusive approaches, such as simple observation, with more obtrusive ones, such as the manipulation of variables in, for instance, controlled experiments. Generalizability regards the extent to which the results apply to other contexts beyond the specific context of the study. According to the authors, it is not possible for a research strategy to achieve maximum potential on all three aspects: generalizability over actors (A), precision of measurement of actors’ behavior (B), and realism of context (C). In the ABC framework, sample studies, one of the research strategies described, aim “to achieve generalizability over a certain population of actors, whether these are software professionals, software systems, or artifacts of the development process” at the expense of precise measurement, since “the researcher does not manipulate any variables during data collection,” and of a realistic context. The authors also mention software repository mining as another typical method in this strategy. From this perspective, interview studies could, in some cases, not fit in the sample studies category as quantitative surveys do, since the employment of interviews would probably increase the precision of measurement and the realism of the context at the expense of generalizability. In other words, there could be interview studies that do not aim to increase or achieve generalizability.
Indeed, according to the authors, interviews and case studies, other commonly used qualitative methods, are classified as field studies, i.e., “any research conducted in a specific, real-world setting to study a specific software engineering phenomenon” in which “a researcher does not actively control or change any parameters or variables.” In this category, studies obtain the maximum potential for realism of context (C) at the expense of generalizability and precision of behavior measurement.
The scarcity of methodological guidelines regarding the design of the qualitative survey is not restricted to SE research. To the best of our knowledge, the only text exclusively dedicated to the method was published by Jansen [7]. The author suggests using the term qualitative, or diversity, survey for one that “[…] does not count the frequencies of categories(/values), but searches for the empirical diversity in the properties of members, even if these properties are expressed in numbers.” The author makes this suggestion based on the fact that, although some studies have used the term, it rarely appears in textbooks on general social research methodology or on qualitative research methods. An exception is Fink [4], who describes qualitative surveys as those that “collect information on the meanings that people attach to their experiences and on the ways they express themselves” [4, p. 61]. Although Jansen [7] argues that Fink “does not specify the logic of qualitative design as a design,” the latter gives as a reason for these surveys, among others, to “add depth, meaning, and detail to statistical findings” as a way “to supplement traditional surveys or guide their development.”
To date, several studies have focused on identifying the diversity of particular aspects of SE, albeit using a wide range of research methods. For example, Chow and Cao [18] performed a survey, collecting data from 109 projects, and identified 12 possible critical success factors in agile software projects. Peltonen et al. [19] performed a multi-vocal literature review to identify motivations, benefits, and issues in adopting micro-frontends. Mohanani et al. [20] conducted a systematic mapping study to determine cognitive biases in SE processes. Zou et al. [21] performed a study combining interviews and a survey to identify challenges and opportunities in smart contract development. Fabijan et al. [28] conducted a multiple-case study to identify different types of software features and to recommend how to prioritize development activities. Since maximising diversity is generally a problem that prioritizes generalization, based on the ABC framework, sample studies such as surveys are the most suitable. However, there is no discussion regarding how to guarantee, or at least maximise, the inclusion of all possible values. In this regard, a dedicated method such as the qualitative survey would provide a reference for researchers to conduct these types of studies, and for reviewers and editors to assess them.
The goal of identifying empirical diversity of properties is related to developing taxonomies for element classification. Taxonomies are schemes of classification allowing the description of terms and their relationships in the context of a knowledge area [59]. Some methodological papers in SE literature [59], [60] have discussed how to properly build taxonomies. Ralph [60] describes five ways to achieve this task, namely secondary studies, interpretive case studies, grounded theories, single-source primary studies, and personal experience. According to Ralph, the best options are secondary studies, interpretive case studies or grounded theories, to avoid possible biases from software developers, experts, and the researchers involved.
B. Qualitative Research in Software Engineering
The increased interest in human aspects of SE has led to a significant increase in the use of qualitative research methods in the field [53]. In an early paper on these methods for SE research, Seaman [61] defines qualitative data as “data represented as words and pictures, not numbers” and stresses that qualitative research methods were designed “to study the complexities of human behavior”. Even though the publication of qualitative studies in software engineering has increased [53], these research methods have been subject to criticism [13], [58], [62], regarding aspects such as lack of generalizability, lack of control, and research bias.
In this regard, methodological papers and standards could help researchers to design their studies and also give reviewers and readers a reference for analyzing and assessing them. By following defined guidelines, researchers reduce biases in the study and could act to increase control and generalizability, when this aspect is relevant for the study. In particular for qualitative surveys, there is a need to describe approaches that improve the coverage of the possible characteristics and, consequently, increase the generalizability of results.
A Comparison Between Qualitative and Quantitative Surveys
Even without consensus on a definition, methodological papers agree on commonalities across different survey methods. First, they share the goal of deriving knowledge about a population based on a sample. Second, they share somewhat similar steps, although the naming conventions may vary to some degree: defining knowledge aims, sampling, data collection, and analysis [7]. Based on this comparison, we can take the term “survey” as a family of research methods with the goal of deriving knowledge about particular characteristics of a population, or the correlation between them, based on a sample. Then, following Jansen [7], there are two types of surveys: quantitative, or statistical, and qualitative, or diversity. What defines the survey type is its goal. Quantitative surveys focus on quantitative descriptors, or statistics [2], either descriptive, describing the frequency of various attributes in the population, or analytic, regarding how two or more variables are related in that population. On the other hand, qualitative (or diversity) surveys focus on identifying the possible values of defined characteristics in a population. Although these methods employ a similar set of steps, i.e., sampling, data collection, and analysis, how these steps are performed varies between the two methods. To illustrate this idea, we employed the Unified Modeling Language (UML) [63] to describe a metaphor taking survey as an abstract class with two concrete child classes, themselves composed of abstract classes, such as data collection and analysis methods, as described in Fig. 1. Below, we describe how the steps may vary according to the type of survey.
The family of survey methods. The lists of data collection and analysis methods are not exhaustive.
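The class metaphor of Fig. 1 can also be sketched in code. The following is a minimal, hypothetical illustration (the class names, example data, and analysis outputs are ours, not taken from the figure): an abstract Survey parent fixes the shared steps, while the two concrete subclasses differ only in the goal of their analysis step.

```python
from abc import ABC, abstractmethod

class Survey(ABC):
    """Abstract parent: every survey samples, collects, and analyzes."""

    def run(self):
        sample = self.draw_sample()
        data = self.collect(sample)
        return self.analyze(data)

    @abstractmethod
    def draw_sample(self): ...

    @abstractmethod
    def collect(self, sample): ...

    @abstractmethod
    def analyze(self, data): ...

class QuantitativeSurvey(Survey):
    """Goal: statistics describing the distribution of an attribute."""

    def draw_sample(self):
        return ["r1", "r2", "r3"]  # sample chosen to be representative

    def collect(self, sample):
        return {"uses_tdd": [True, False, True]}  # close-ended answers

    def analyze(self, data):
        answers = data["uses_tdd"]
        return {"frequency": sum(answers) / len(answers)}  # a statistic

class QualitativeSurvey(Survey):
    """Goal: the diversity of values, not their frequencies."""

    def draw_sample(self):
        return ["r1", "r2", "r3"]  # sample chosen to maximize variation

    def collect(self, sample):
        return {"challenges": ["tooling", "culture", "tooling"]}  # coded answers

    def analyze(self, data):
        return {"diversity": sorted(set(data["challenges"]))}  # distinct values only
```

Both subclasses execute the same three steps; only the analysis goal differs, which is precisely the distinction between statistical and diversity surveys.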
Sampling. Researchers employing quantitative surveys should argue why the results obtained from the sample generalize to the targeted population. Qualitative surveys, on the other hand, do not aim to identify the distribution of characteristic values but rather to find most, if not all, of the possible values of a characteristic in the population. Consequently, the sampling should not follow specific distributions but rather encompass all the possible values for the characteristics in question, and researchers must argue how the selected sample fulfills this requirement.
Data collection. Any survey must have at least one data collection method: a systematic, pre-defined process to extract data from the sample. There are several possibilities available: questionnaires, with close-ended questions, open-ended questions, or a mix of both; interviews; artifact collection; and secondary data (obtained from published articles). Although the most common data collection approach for a quantitative survey is a questionnaire, it is also possible to perform interviews or to collect artifacts, such as documentation or tool-use logs [64]. Quantitative surveys could even employ qualitative data if a coding process to convert the qualitative data into quantitative data is in place. Although the opposite is rarer, i.e., using primarily quantitative data in a qualitative survey, this option is still possible [7]. A qualitative survey could investigate the diversity of a numerical property in the population. Generally, qualitative surveys employ interviews, but also questionnaires with open-ended questions or even artifact collection, where the artifacts may be software artifacts or even the literature itself.
Data analysis. After collecting data, researchers must analyze it. Quantitative surveys usually employ statistical analysis, either descriptive statistics, such as histograms, or inferential statistics. When qualitative data is collected, a coding step is needed to derive quantitative variables to be statistically analyzed; e.g., case surveys employ a coding scheme. This concept could be applied in any research design for a quantitative survey employing qualitative data. In qualitative surveys, the researchers should obtain the possible values for the identified characteristics; therefore, the analysis procedure will depend on the data type. In the case of quantitative data, a simple clustering or ordering technique will probably suffice. If the data is qualitative, researchers must employ an analysis technique to identify patterns that can be clustered. An example is thematic analysis [65], [66], “a method for identifying, analyzing, and reporting patterns (themes) within data” [66].
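To make the contrast concrete, consider a hypothetical qualitative survey of a numerical property, say, release-cycle length in days (the data and band labels below are invented for illustration). A quantitative analysis would compute a distribution, whereas the diversity-oriented analysis only needs the distinct values, optionally ordered or clustered into bands:

```python
from collections import Counter

responses = [14, 30, 14, 7, 90, 30, 14]  # release-cycle lengths in days

# Quantitative analysis: the frequency of each value in the sample.
frequencies = Counter(responses)

# Qualitative (diversity) analysis: the distinct values that occur,
# ordered, and optionally clustered into coarse bands.
distinct = sorted(set(responses))
bands = {
    d: "weekly-ish" if d <= 14 else "monthly-ish" if d <= 45 else "quarterly+"
    for d in distinct
}
```

The same responses yield two different findings: a distribution for the quantitative survey, and a catalogue of the cycle lengths that exist in the population for the qualitative one.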
Based on this comparison, we now present the definition of qualitative surveys as proposed by Jansen [7] that we suggest to be used in the SE domain:
Qualitative surveys are research methods aiming to identify the diversity of characteristic values, rather than measuring their distribution, in a targeted population.
The Current Usage of Qualitative Surveys in Software Engineering Research
In our literature review, we did not find any methodological papers on qualitative surveys in SE. This is interesting when we consider the dominance of research and guidance on quantitative survey design in the SE literature. To fill this gap and answer our research question regarding how the term qualitative survey has been used in SE research, we performed a systematic mapping study (SMS) [67]. This method has been used to identify and critically review how research methods have been employed in SE studies, including SMSs [67] and case surveys [15].
Petersen [67] describes four steps for an SMS: identification of the need for the mapping study, selection of studies to be analyzed, data extraction, and study validation. Earlier in this section we demonstrated the need for the mapping study: to understand how SE researchers have been using the term “qualitative survey”.
A. Selection of Papers
The next step is to select the studies to be analyzed. First, to define a query string to search for these studies, we employed PICO (Population, Intervention, Comparison, and Outcomes). As in previous studies on methodological aspects [15], [67], the query string is restricted to the population, which, for this study, consists of SE studies using the term qualitative survey. To avoid missing any relevant studies, we decided to use the term “software” instead of “software engineering.” For the term “qualitative survey,” we also used the hyphenated form. Hence, we employed the query string: (“qualitative survey” OR “qualitative-survey”) AND “software”.
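As a small illustration of how the PICO-derived terms compose into the final string, the query can be assembled programmatically (a sketch; individual databases may require adapted syntax):

```python
# Population: SE studies ("software"); intervention: use of the term
# "qualitative survey", including the hyphenated form.
population = '"software"'
intervention_terms = ['"qualitative survey"', '"qualitative-survey"']

query = f'({" OR ".join(intervention_terms)}) AND {population}'
print(query)  # ("qualitative survey" OR "qualitative-survey") AND "software"
```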
We performed the search in September 2021 on the following databases: ACM Digital Library, IEEE Xplore, Scopus, and Web of Science, following the advice of considering IEEE and ACM in addition to two indexing databases [67], [68]. Table I presents the number of results returned by the query on each of the databases and the total excluding duplicates. After excluding duplicates, the search resulted in 1105 papers. We collected all the results and saved them in a spreadsheet. Then, we considered the following inclusion and exclusion criteria: i) the paper should be written in English; ii) the document should be a research paper, excluding, for example, messages from the chairs or tables of contents; iii) the study should investigate some aspect of SE; and iv) the text should use the term “qualitative survey” or similar, such as “qualitative-based survey.”
To apply the inclusion and exclusion criteria, we analyzed each paper in increasing levels of detail, aiming at each level to determine whether the criteria were met. Papers that clearly did not meet the criteria were excluded and not considered in the next step. In case of doubt, we followed a conservative approach and carried the paper over to the next step. First, we analyzed the titles of the papers, selecting 351 documents for further analysis. The next step consisted of analyzing the abstracts; after it, we selected 136 papers for which we downloaded the full text. With a preliminary analysis of the papers' full text, we selected 76 papers.
B. Data Extraction and Analysis
In the extraction phase, we first collected the title, the publication year, and the venue where each paper was published, in order to identify tendencies in the method's usage over time and across different research communities. Regarding this last item, we also identified the topic of each paper. Then, we extracted the research design details of the studies, including the overall method, sampling, data collection approach, and qualitative data analysis practices. Finally, to assess how the term “qualitative survey” was perceived and used, we recorded how the terms “qualitative” and “survey” had been used and extracted the methodological references cited in the texts. To perform this phase, we used a spreadsheet but also NVivo 12 to keep track of excerpts supporting our classification. In this step, we removed a further ten papers for which a deeper analysis revealed that the paper should not have been included or that there was not enough detail to allow data extraction. Hence, our final set consisted of 66 papers. Fig. 2 summarizes this process.
C. Supplemental Package
We provide a supplemental package [69] containing the lists of identified and analyzed papers, including how each was classified.
D. Threats to Validity
Following Petersen et al. [67], to analyze potential threats to the validity of the SMS, we followed a scheme consisting of descriptive validity, theoretical validity, generalizability, interpretive validity, and repeatability.
Descriptive validity concerns the extent to which observations are described accurately and objectively. A common threat to SMSs in this regard is a poorly designed data extraction form [67]. To mitigate this threat, we carefully designed an extraction form representing a systematic list of elements to be analyzed and extracted from the studies. The form was also used as a means to discuss and, if needed, change the elements to be extracted. The data extraction process was performed iteratively to allow researchers to better capture the nuances of the different studies and to facilitate the grouping of similar studies.
Theoretical validity concerns the researchers' ability to capture what was intended. Common issues in this regard are the quality of the sample obtained, publication bias, and researcher bias. To mitigate the first two threats, we followed a defined selection approach. Besides that, we adopted a conservative exclusion approach: whenever there was doubt about a publication, the document proceeded to the next filtering step. This approach led to more work but reduced the probability of excluding a relevant paper. In addition, we considered the databases generally employed in SMSs in SE and did not enforce any filter on publication venues. To mitigate researcher bias, as mentioned before, we followed a standard approach for data collection and analysis.
Generalizability concerns the extent to which the results are valid outside the studied sample. Our SMS aimed to collect all published studies in SE that employed the term “qualitative survey.” We did not capture studies that, although following a similar method, did not employ the term, since such studies do not follow specific methods, making their selection an extremely labor-intensive task. Despite this limitation, the sample we employed allowed us to identify issues with the use of the term.
Interpretive validity concerns the conclusions drawn by the study based on the data. Here too, researcher bias is a threat. To mitigate this issue, all the authors discussed the conclusions drawn to ensure a consensus and to identify and resolve any inconsistent interpretations.
Repeatability concerns the possibility of other researchers repeating the study. To ensure repeatability, we described the process followed in detail. We also produced a supplemental package containing all the papers analyzed.
Results
A. RQ1 - What Qualitative Surveys Have Been Conducted?
First of all, we describe the identified papers. Regarding the publication year, we observe that the earliest papers are from 1998, but there was an increase in the term's usage in 2013, and then another in 2018, with a peak of 13 papers in 2020.
Then, we inspected the venues in which the studies appeared to understand whether the term is concentrated in specific communities and topics. We identified 41 venues, but the majority of them (27) have only one qualitative survey paper. The venues with the most relevant papers are Euromicro Software Engineering and Advanced Applications (SEAA), with five papers, and Foundations of Software Engineering (ESEC/FSE), with four. With three papers each, there are the journals IEEE Transactions on Software Engineering and Information and Software Technology, and the conferences International Conference on Software Engineering (ICSE), International Symposium on Empirical Software Engineering and Measurement (ESEM), International Conference on Product-Focused Software Process Improvement, and International Conference on Engineering, Technology, and Innovation (ICE). By extracting the topics covered by the papers, we again found diversity, identifying 15 different topics. The most common were software maintenance (12 papers) and software testing (8). Software engineering education and training, and requirements engineering, were discussed in six papers each. Open source software was the focus of five papers. Other areas appeared in four papers each, namely: agile development, experimentation, human aspects, and development processes.
B. RQ2 - How Have Qualitative Surveys Been Implemented?
Regarding this research question, the first step was to identify whether the qualitative survey was used as the main research method of the study or as a small part of a larger study involving another method. In 29 papers, the qualitative survey was the overall research method; this use had a peak in 2020, as shown in Fig. 3. In the remaining studies, the qualitative survey was part of a larger study. The most common situation, in 12 studies, was a qualitative survey as part of a mixed-methods study. For example, Graziotin et al. [70] performed a study on the unhappiness of software developers. First, they investigated the distribution of unhappiness among software developers by employing a quantitative approach. Then, to explore possible reasons for it, they employed a qualitative survey. In two other papers, researchers performed artifact analysis in addition to the qualitative survey. For instance, Hata et al. [12] studied bug bounty programs through a survey with contributors and also analyzed the histories of 82 bug bounty programs. Other combinations, present in one study each, were: artifact analysis and qualitative survey; artifact analysis, survey, and interviews; and literature review and qualitative survey. Another common use of qualitative surveys is as part of an empirical evaluation of a proposed tool or technique; this approach was identified in 13 papers. Qualitative surveys were also employed in studies with other overall research methods, such as case studies (six papers), experiments (three papers), and action research (one paper).
Fig. 3. Number of papers per publication year employing qualitative survey, grouped by the claimed overall research method. 2021* only includes data to the end of September.
We analyzed the implementation of qualitative surveys, in terms of sampling, data collection, and analysis. The sampling methods employed were mainly non-probabilistic in nature. In 27 studies, researchers employed purposive sampling, i.e., selecting items they regarded most useful for the study [71]. Two other studies employed theoretical sampling, i.e., selecting subjects based on an emerging theory [71]. In 23 papers, researchers used convenience sampling, i.e., using available and accessible items. A smaller fraction of studies employed probability sampling. In four studies, researchers employed random sampling [71]. In three studies, researchers selected the whole sampling frame. Finally, in eight papers, the information provided did not allow the sampling method to be identified.
Regarding data collection, the most common method was the self-administered questionnaire (37 papers). In 25 studies, data collection consisted of interviews. In two studies, artifacts related to software development were collected. Melegati et al. [10] used postmortems, i.e., “texts available online reporting the failure of a startup,” to identify why these companies did not employ experimentation more often. Wong and Hong [72] analyzed popular mashups (websites created based on data from other sites) from two directories to identify combination patterns and their suitability. Finally, two other papers employed focus groups. For example, Runeson and Olsson [73] employed focus groups to identify challenges and opportunities in open data collaboration. With the exception of the studies performing artifact collection, all studies had questions prepared in questionnaires or in guides for interviews or focus groups. Instruments with only open-ended questions and those mixing open-ended and closed questions were equally common (24 studies each). In another three studies, there was only a single open-ended question. In three studies, there were only multiple-choice questions. For nine studies, we were not able to determine the nature of the questions.
When it comes to qualitative data analysis, most studies (26) did not describe the process employed in detail. Ten studies used thematic analysis, e.g., to identify causes and impacts of increased work-rates of developers [74], to analyze episodic volunteering in free/libre open source software communities [9], or to explore causes and mitigation strategies for researcher bias [75]. In a study on the challenges of managing requirements interdependencies in agile development, Nurdiani et al. [76] mention “open coding,” referring to the first step in grounded theory, in which “data are broken down analytically” [77]. Besides that, a paper on the happiness of software developers [70] and another on the combination of agile development and software product lines [78] mentioned axial and selective coding in addition to open coding, i.e., the analysis techniques of Straussian Grounded Theory [58]. Axial coding refers to the process in which “categories are related to their subcategories, and their relationships tested against data” [77]. Finally, in selective coding, the identified categories are unified around a “core” category [77]. Overall, these methods are similar in that they generally guide authors to read the data looking for emerging concepts and to continuously analyze the excerpts to evaluate these concepts, i.e., constant comparison. In seven studies, researchers used the term “content analysis” to describe the qualitative analysis method: “a systematic way of categorizing and coding studies under broad thematic headings by using extraction tools designed to aid reproducibility” [79]. Similar to thematic analysis, the method was used “to identify patterns in answers,” as mentioned in a study on ERP customization [80]. In another six studies, authors claimed to have used “qualitative coding,” again associated with the identification of patterns, such as challenges and opportunities in open data collaboration [73].
Five studies used the term “qualitative analysis.” Two studies employed card sorting, a knowledge-elicitation technique that asks participants to sort items into meaningful groups, whose results range from counting how often items were grouped together to cluster analysis [81]. Other techniques used were clustering and frequency analysis. Finally, three studies did not have qualitative data, despite using the term qualitative survey.
C. RQ3 - How Are the Terms Perceived, and How Does This Influence Their Use?
Finally, we tried to capture how authors perceived the terms “qualitative” and “survey.” Regarding “survey,” although most of the studies (34) use it to describe a comprehensive research method, an almost equal share (32 studies) use the term as a synonym for “questionnaire.” Regarding “qualitative,” the most common interpretation (33 studies) tied the term to the nature of the questions, i.e., when all or some items in a questionnaire are open-ended, authors claim to have, respectively, a qualitative, or a quantitative and qualitative, survey. In 20 papers, the authors used the term as suggested by Jansen [7], to identify the diversity of characteristics of the population. In ten studies, the authors used the term to differentiate their results from data obtained from technical aspects, like software execution or maintenance activities. For example, in a study on the influence of automatically generated unit tests on software maintenance, Shamshiri et al. [11] performed a controlled experiment with 75 participants. Besides collecting data from the activities performed, the authors also collected “qualitative survey responses.” Finally, in three papers, the term “qualitative” was used to refer to an exploratory study.
Then, we compared how sampling (Table II), data collection (Table III), and analysis (Table IV) were performed depending on how the term “qualitative” was perceived. We also analyzed how these different connotations relate to the overall research method employed in the study (Table V). When comparing the two most common connotations, qualitative as the nature of the questions or as an identification of diversity, there was no meaningful difference in sampling practices. For data collection, when qualitative referred to the nature of the questions, researchers relied more often on self-administered questionnaires (21 studies, 64%, against 30% of diversity surveys); diversity surveys, in turn, relied more often on interviews (13 studies, 65%, against 10 studies, 30%). Regarding data analysis, researchers performing diversity surveys were clearer when describing the approach, with only 5 studies (25%) not giving sufficient details; among studies using qualitative as the nature of the questions, 15 (45%) did not disclose their procedures. We also compared the overall research methods: the qualitative survey focusing on diversity was the overall method in 15 (75%) of the studies using this connotation, against only 10 (30%) of the studies that used qualitative to refer to the questions.
Regarding methodological references, the most commonly cited is Jansen [7] (11 studies). It is important to note that all these papers that cited Jansen's work adhered to the proposed definition of qualitative survey. Then, six papers cite Fink [4], a book on the survey method. Three other papers cite Corbin and Strauss [82], a book proposing the Straussian flavor of Grounded Theory. Other methodological sources cited include Creswell and Creswell [32].
Discussion
The findings indicate that qualitative surveys have been used in SE research since the 1990s. The frequency of their use remained consistent for over two decades, averaging one study per year, but we observed an increase starting in 2013, suggesting a trend of further adoption in the coming years. However, as observed for other research methods [58], [83], their use has not been homogeneous. Of course, part of this increase could simply be a coincidence of adding the term “qualitative” to “survey” without awareness of the method. We still observed the use of the term “survey” as a synonym for “questionnaire,” i.e., taking it not as a comprehensive research method but as a data collection technique, even though this issue has already been discussed in the SE literature [1].
Despite the lack of consensus regarding the term, we observed an increasing adoption of Jansen's proposal of using the qualitative survey to identify the diversity of characteristics in a population. This effect can be observed in the increase of “qualitative survey” as the overall research method in published papers and in citations to Jansen's paper [7]. By comparing how the different steps of the research method were performed depending on the way researchers use the term “qualitative,” we also observe that, when performing a diversity survey, researchers provide a better description of the data analysis, generally using an approach to group answers that are more often obtained through interviews. Meanwhile, researchers who used the term to describe the nature of the questions were often looser and did not describe how the analysis was performed. In this case, the qualitative data is usually used to illustrate or better explain results obtained by another research method.
Based on our results, we have some suggestions, only some of which have already been discussed in the SE literature. First of all, a survey is a comprehensive research method and not simply a synonym for a questionnaire, which, in turn, is a data collection method that could serve other research methods, such as case studies and experiments. By the same token, surveys are not restricted to questionnaires: they can rely on interviews or even artifact collection and analysis.
A. Practical Recommendations
Based on the consolidated set of papers, we recommend that researchers reflect deeply on what they consider to be a qualitative survey and how they intend to use the term. We tentatively suggest that our definition from Section III provides a basis for use, in that any qualitative survey should aim to identify the different values of specific characteristics in a population. This suggestion has several advantages. First, its use will be consistent with the research method as used in other disciplines, especially the social sciences, which would facilitate the adaptation of guidelines and discussions from other fields to SE. In this regard, we observed that researchers in SE are starting to coalesce on its meaning. Second, it designates a research method that could bring interesting and valuable results to the field. Qualitative surveys can be employed in descriptive and exploratory studies, and a standard name and procedure facilitates their use, allowing the community to build a cumulative tradition around core concepts, to analyze qualitative surveys across SE, and to discern the strengths, weaknesses, and lessons learned from each. The results of this study could be employed not only by researchers but also by practitioners to act on the software development process. For example, qualitative surveys on developers' reactions to a particular tool or method could uncover the range of feelings regarding these artifacts, which could be useful for tackling adoption or implementation issues. Without a clear intent to identify all possible values, specific situations may be missed. Practitioners could also focus a study on technical aspects, such as software design or architecture solutions employed in a given class of systems. These results could reveal potentially problematic solutions or good patterns that could be reused in similar situations.
Finally, it distinguishes the research method from quantitative surveys, avoiding confusion regarding what is expected from these two methods.
In this regard, we should discuss the other reasons the analyzed papers gave for naming their research methods “qualitative.” The most common was the use of open-ended questions in surveys to enrich the results obtained from closed questions. First, such studies do not fit the definition put forward by our paper because the applied research method is generally not specific enough to detect all the possible values; instead, it is used to illustrate and explain quantitative results. Second, as stressed by the ABC framework [58], these studies do not present enough depth to allow a proper realism of context, as in case studies.
Nevertheless, questionnaires consisting only of open-ended questions could also be the data collection method of a quantitative survey, given that a codebook is used to convert qualitative descriptions into quantified variables. This process is the basis of case surveys, in which published cases are converted into quantitative data using a coding scheme [6]. “Qualitative” should not be a synonym for “exploratory” either. A common way of distinguishing the purposes of research is to divide them into exploratory, descriptive, explanatory, and improving [84]. This argument has already been discussed by Yin in his seminal book on the case study method [85]: the author stresses that case studies are based on analytical generalization rather than statistical generalization. Another example from the SE field is the possibility of generalizability for grounded theory studies [16]. A further misconception we found in our analysis was to consider any data from human subjects as qualitative, as a counterpart to quantitative data from software artifacts. Quantitative and qualitative describe the nature of the data, not its source. Humans can provide quantitative data, for example, through Likert scales in questionnaires or when reporting the occurrence of phenomena or quantities of entities; software artifacts can be analyzed as qualitative data, for example, by investigating coding patterns.
Guidelines for Qualitative Surveys
Having defined a qualitative survey as a method to identify the diversity of a characteristic, it is essential to present guidelines for this method, aimed at researchers who want to employ it and at reviewers evaluating its use. In doing so, it is also important to consider the nuances and idiosyncrasies of SE research. In this section, we tackle this issue, presenting guidelines for qualitative surveys together with examples from the identified studies. We ground these guidelines mainly on Jansen's work [7], while also considering the peculiarities of SE research. We also consider guidelines proposed for the development of taxonomies in SE [59], [60], since qualitative surveys aim to capture the diversity of characteristics of a population, and one of the first goals of a taxonomy is to identify all the terms in a particular context.
Below, we outline the following steps: deciding on the research paradigm, defining study goals, planning, sampling, data collection, and analysis, summarized in Table VI. A planning step is not explicitly stated by Jansen; however, we wanted to stress the importance of proper planning for a survey. Although we present the steps in a linear way, researchers could employ an iterative process. For instance, a qualitative survey could be planned around theoretical saturation, including several cycles of data collection and analysis until no new information is obtained. Before describing the steps, we present a discussion of the different research paradigms usually applied in SE and how qualitative surveys could fit the different alternatives. This discussion is essential given the influence of the paradigm on the execution of any method [34], [36], [86].
A. Research Paradigm
Since qualitative surveys aim to identify the possible values of one or more characteristics of a population, they implicitly endorse a single reality. Thus, it is natural to relate the method to critical realism, i.e., postpositivism, or pragmatism. However, as pointed out by Jansen [7], it is possible to employ the method in the context of constructivism or other paradigms. For example, a qualitative survey could analyze the diversity of constructions regarding social aspects of development teams. Finally, the results of qualitative surveys could be considered “useful” in that practitioners could use them to guide their decisions, e.g., to choose tools, which indicates the method's suitability for SE research given the predominance of pragmatism in the field [36].
B. Definition of Study Goals
While defining objectives, researchers should decide on the research questions (RQs) to be investigated and/or the hypotheses to be tested. Given that qualitative surveys identify the different values of one or more characteristics, these studies will generally be based on one or more RQs. These RQs will be of a generalization or characterization type, according to the classification proposed by Shaw [23], or of a description and classification type, as suggested by Easterbrook et al. [50]. Examples are “what do we mean by X?,” “what are the characteristics/properties of X?,” and “what are the varieties/types of X?” How-questions could also be answered using qualitative surveys, e.g., “how is X performed in practice?” or “how can we perform X?” Based on these goals, researchers should evaluate whether a survey is a suitable approach, i.e., whether the goal is to describe characteristics of a population based on a sample. It is important to stress that the population does not necessarily comprise all existing individuals or phenomenon instances. Once researchers conclude that a survey is adequate for answering the RQs, they should determine which type of survey to perform. Quantitative surveys are suitable for RQs about descriptive aspects, i.e., frequencies of characteristics, or analytical aspects, such as causal inferences; in this case, we suggest the papers describing guidelines for this type of survey (see Section II). Researchers should employ qualitative surveys, instead, to investigate RQs regarding the possible values of determined properties.
C. Planning
In this step, researchers define how they will perform the sampling, data collection, and analysis. They should also consider potential threats to validity and design ways to mitigate them. These steps will be influenced by the data source employed in the study. Similar to what has been proposed for taxonomy creation [60], researchers can analyze studies already published in the literature. Researchers could also conduct primary studies through interviews or questionnaires. Although we agree that it is harder to reduce biases in single-source studies, as pointed out by Ralph [60], we believe that this can be achieved through proper data collection and analysis. Another possibility, not contemplated in Ralph's paper, is the use of practitioner-produced literature [89]. This type of document has recently been called “gray literature” [90]; however, acknowledging Kitchenham et al.'s criticism [89], we use the term “practitioner-produced literature” to refer to these documents. Qualitative surveys can also combine multiple types of data sources as a way to improve the coverage of the characteristic's possible values.
A practical way to perform this and subsequent steps is to follow guidelines proposed for research methods based on the same types of data as those to be employed in the qualitative survey. There are several methodological papers on systematic mapping studies for SE, e.g., [67], that could be used as guidelines for identifying peer-reviewed literature. For collecting practitioner-produced literature, a suggestion is to follow the guidelines proposed by Garousi et al. [90]. For questionnaire-based surveys, as mentioned in Section II, there are several published guidelines and methodological papers. The chapter from Lethbridge et al. [64] could be a starting point for interview and observations.
D. Sampling
In this step, researchers select a group of items to study (the sample) from a group of items of interest (the population) [71]. The list of items is called the “sampling frame.” A key concept in sampling is representativeness, i.e., the extent to which the properties of the sample resemble those of the population [71]. Sampling is generally divided into two types depending on the use of randomness: non-probability and probability [4]. Probability sampling means that each item has the same probability of being chosen [4]; an example is simple random sampling, in which subjects are selected by chance. Non-probability strategies include those in which different subjects have different probabilities of being selected [71], for instance, convenience sampling, when subjects are selected based on availability, or purposive sampling, in which items are selected not randomly but according to some logic or strategy, such as “as diverse as possible” [71].
In qualitative surveys, samples should contain most or all of the possible values of the characteristics under study. Thus, probability sampling (selecting items by chance) is not suitable, and non-probability sampling, mainly purposive, should be employed, as is usually the case for qualitative research [91, p. 27]. Researchers should design the sampling mechanism so that the sample fulfills this requirement. An option is an iterative design that stops when theoretical saturation is reached, i.e., when new data does not bring new results. Barcomb et al. [9] identified criteria “related to the research objective and cover a wide range of differences” by considering community size (single/multiple project) and community orientation (vendor/volunteer), yielding four combinations; they sampled 13 communities across the four combinations.
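The purposive strategy above can be sketched as sampling from each combination of criteria. This is a hypothetical illustration following the 2x2 design described for Barcomb et al.; the community names and attributes are invented:

```python
# Hypothetical sketch of purposive sampling over a 2x2 design:
# community size (single/multiple project) x orientation (vendor/volunteer).
# Community names are invented for illustration.
import itertools

communities = [
    {"name": "A", "size": "single",   "orientation": "vendor"},
    {"name": "B", "size": "single",   "orientation": "volunteer"},
    {"name": "C", "size": "multiple", "orientation": "vendor"},
    {"name": "D", "size": "multiple", "orientation": "volunteer"},
    {"name": "E", "size": "single",   "orientation": "volunteer"},
]

def purposive_sample(items, per_cell=1):
    """Pick up to `per_cell` items from each combination of criteria,
    so that every combination is represented in the sample."""
    sample = []
    for size, orient in itertools.product(["single", "multiple"],
                                          ["vendor", "volunteer"]):
        cell = [c for c in items
                if c["size"] == size and c["orientation"] == orient]
        sample.extend(cell[:per_cell])
    return sample

print([c["name"] for c in purposive_sample(communities)])  # ['A', 'B', 'C', 'D']
```

The point of the design is coverage: unlike random selection, every cell of the criteria matrix contributes to the sample, so no combination of characteristics is left unobserved.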
In SE studies, Baltes and Ralph [71] concluded that the most common sampling strategies are purposive and convenience sampling, suggesting, as the authors state, “a generalizability crisis” in the field. The authors propose that this is due to three factors: the difficulty of probability sampling for some methodologies, e.g., experiments with human participants; the adoption of interpretivism or other philosophical stances; and the lack of good sampling frames for most SE phenomena. We expect the first and third factors to influence qualitative surveys, but to a lesser degree than other methods, like quantitative surveys, since there is no need for representativeness in a probabilistic sense, only an argument that all, or at least most, values of the characteristics under study are present in the sample.
E. Data Collection
During this step, researchers collect data from the items selected in the previous step, but the procedure will depend highly on the data source type. If the data consists of peer-reviewed articles or other public artifacts, such as blog or social media posts, collection simply consists of manually or automatically storing this data for further analysis. It is important, however, to keep copies of the data, especially for ephemeral content such as social media posts. For example, Melegati et al. [10] stored the blog posts they analyzed. If the data, instead, must be collected from the sampled subjects, survey instruments, whether questionnaires, interview guides, or artifact collection procedures, should be created and tested. Nurdiani et al. [76] employed a web-based questionnaire to ask practitioners about practices and challenges of managing requirements interdependencies. In some cases, there may be several different classes of subjects, and then diverse instruments might need to be created. For example, Barcomb et al. [9] had different styles of interviews for community managers and episodic volunteers; they also collected supplemental data: public interviews, web pages, and mailing list threads found through a web search. It is essential to check whether the instrument obtains the desired information. For interviews or self-administered questionnaires with open questions, researchers should perform pilot studies, similar to what is prescribed for quantitative questionnaires. Interviews are interactive and facilitate the clarification of the answers given by respondents, with the disadvantage of being time- and cost-inefficient [64], demanding the transcription of conversations for further analysis. Researchers should balance the effort required and the availability of respondents against the complexity and variability of the topic under study. A simpler topic demanding only straightforward answers, or one for which high variability is expected, would be more suited to questionnaires.
Conversely, a topic for which more clarification is needed or less variability is foreseen would be better suited to interviews.
Besides questionnaires and interviews, Jansen suggested other options for data collection, i.e., “observing interactions or artifacts in any kind of situation” [7], even though he did not discuss them further. These alternatives represent opportunities for SE studies, even though we only observed the collection of blog posts in the analyzed studies. Besides interviews and questionnaires, data collection methods for SE include, for example, static and dynamic analysis of systems, log analysis, instrumenting systems, and team observation [64]. For example, a study that mines software repositories could analyze a set of projects to identify the different types of architectures used. Another study could analyze logs produced by similar systems to identify communication patterns within them. Both studies could be framed as qualitative surveys, as they focus on identifying the diversity of these elements. This variety of data types, e.g., interview answers, code, software design, or log messages, has implications for data analysis that we discuss in the next section.
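To make the log-analysis example concrete, the sketch below (plain Python; the masking regexes and log lines are invented for illustration) normalizes log messages into templates and collects the set of distinct templates, i.e., the diversity of message types rather than their frequencies:

```python
import re

def template_of(line: str) -> str:
    """Mask variable parts of a log line to recover its message template."""
    line = re.sub(r"\b[0-9a-f]{8,}\b", "<ID>", line)   # long hex identifiers
    line = re.sub(r"\b\d+(\.\d+)?\b", "<NUM>", line)   # integers and floats
    return line

def message_diversity(lines):
    """Return the distinct message templates: a qualitative survey cares
    about which templates exist, not how often each occurs."""
    return {template_of(line) for line in lines}

logs = [
    "connection 42 opened by node 7",
    "connection 99 opened by node 3",
    "connection 42 closed",
    "cache miss for key deadbeefcafe",
]
print(sorted(message_diversity(logs)))
```

Here the four raw lines collapse into three templates; a qualitative survey would then reason about the variety of templates found, regardless of how many lines produced each one.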
In qualitative surveys, more data may be needed after a first round of analysis, e.g., when the analysis has led to a better understanding of the phenomenon and researchers deem additional instances necessary to check for other possible values of the properties under study. This could happen when a theoretical saturation approach is used.
F. Data Analysis
In this step, researchers seek answers to the RQs based on the collected data. In qualitative surveys, the analysis should allow researchers to compare the instances in the data to extract and identify classes. Given the plethora of data types useful for SE studies, several different analysis techniques could be used. Most of the analyzed papers relied on interviews or questionnaires collecting experiences or opinions from practitioners. In this regard, thematic analysis, “a method for identifying, analyzing, and reporting patterns (themes) within data” [65], is a suitable option. When thematic analysis is employed to synthesize data or results from different studies, it is usually referred to as thematic synthesis [66]. In either case, coding is performed in one of three ways [66]: deductive, when an initial list of codes is used; inductive, when codes emerge through analysis of the text; or an integrated approach, when a combination of the two is employed. Codes can then be grouped to form higher-level categories, and this process can go further to form a layered model explaining the phenomenon. For example, Melegati et al. [10] grouped the identified inhibitors of experimentation in software startups into themes, and then into categories organized in a progressive model. Barcomb et al. [9], on the other hand, employed an analytical framework based on the literature to analyze their data.
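As a toy illustration of the deductive approach (the codebook, themes, keywords, and interview segments below are all invented for the example; real coding is an interpretive human activity, not keyword matching), an initial codebook can be applied to interview segments, and the resulting codes grouped into higher-level themes:

```python
# Hypothetical codebook: each code lists indicator keywords.
CODEBOOK = {
    "lack_of_time": ["time", "deadline", "schedule"],
    "lack_of_skills": ["training", "experience", "skills"],
    "tooling": ["tool", "framework", "infrastructure"],
}
# Codes grouped into higher-level themes (categories).
THEMES = {
    "people": ["lack_of_time", "lack_of_skills"],
    "technology": ["tooling"],
}

def code_segment(segment: str):
    """Tag a segment with every code whose keywords appear in it."""
    text = segment.lower()
    return [c for c, kws in CODEBOOK.items() if any(k in text for k in kws)]

def theme_of(code: str) -> str:
    """Look up the higher-level theme a code belongs to."""
    return next(t for t, codes in THEMES.items() if code in codes)

segments = [
    "We never had time to experiment before the deadline.",
    "The team lacked experience with the new framework.",
]
for s in segments:
    print([(c, theme_of(c)) for c in code_segment(s)])
```

The point of the sketch is the data structure, codes attached to segments and then rolled up into themes, which mirrors the layered models described above.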
Although the analyzed studies have used several labels, the data analysis practices used in most cases aimed to identify patterns or themes in data. However, the term “open coding” is prevalent and demands special attention. It refers to one of the analytical tools used in Grounded Theory and “refers to the coding applied in the early stages of the research study where the researchers remains open to any and all codes arising, ensuring a comprehensive coverage” (italics as in the original) [16]. Although the description is similar to inductive thematic analysis, we recommend avoiding the term in the context of qualitative surveys. Grounded Theory (GT) is a complete research method [16], and its use has suffered from method slurring in SE research [58]. A key aspect of GT is its iterative and interleaved data collection and analysis steps, guided by an emerging theory [16]. Although qualitative surveys could employ an iterative procedure, it is not essential to them, nor is creating a theory necessarily a goal.
For other types of data (e.g., code), researchers could employ other techniques to group the instances under study. For example, software metrics could be used to support the grouping of systems’ architectures, or a clustering algorithm could be used to group code excerpts.
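As a minimal sketch of the latter idea (a hypothetical, stdlib-only grouping; an actual study would likely use a more robust similarity measure and clustering algorithm), code excerpts can be grouped greedily by the overlap between their token sets:

```python
import re

def tokens(code: str) -> set:
    """Extract identifier-like tokens from a code excerpt."""
    return set(re.findall(r"[A-Za-z_]\w*", code))

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def group_excerpts(excerpts, threshold=0.3):
    """Greedy single-pass grouping: an excerpt joins the first group whose
    representative token set is similar enough, else it starts a new group."""
    groups = []  # list of (representative_tokens, member_excerpts)
    for excerpt in excerpts:
        toks = tokens(excerpt)
        for rep, members in groups:
            if jaccard(rep, toks) >= threshold:
                members.append(excerpt)
                break
        else:
            groups.append((toks, [excerpt]))
    return [members for _, members in groups]

excerpts = [
    "for i in range(n): total += x[i]",
    "for j in range(m): total += y[j]",
    "with open(path) as f: data = f.read()",
]
print(group_excerpts(excerpts))
```

The two summation loops land in one group and the file-reading excerpt in another; in a qualitative survey, each resulting group would be examined and named as one class of the phenomenon under study.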
G. Threats to Validity
A study's validity is usually assessed in terms of construct validity, internal validity, external validity, and reliability [85]. Construct validity examines whether a proposed construct, a quantifiable but not directly measurable concept, is real and whether it is properly represented by a proposed indicator [71]. For example, there is no direct measure for the happiness of software developers, but one could argue that it is quantifiable. This concept is usually discussed in the context of quantitative studies, but a qualitative survey could focus on a construct and identify possible values for it. Construct validity has also been discussed in qualitative studies regarding potential divergence in the understanding of concepts, for example, whether constructs discussed in interviews are understood the same way by interviewer and interviewee [13]. Such issues are easy to picture in qualitative surveys: for instance, whether diverse values for a characteristic mean the same to different interviewees, to researchers, and to readers. Internal validity is a concern when causal relations are examined and refers to the possible existence of a factor, not considered in the study, that is actually responsible for an observed effect attributed to another factor [13]. This issue is critical for quantitative surveys but of less concern for qualitative surveys. External validity concerns whether results are generalizable. This aspect is related to the sampling procedures applied and to what extent the sample is representative of the whole population. In qualitative surveys, it is essential to argue that the sample presents most, if not all, possible values for the characteristics under study. Finally, reliability examines whether the results would have been the same if other researchers had performed the same procedures. This aspect is critical when qualitative surveys are based on data coded using, for instance, thematic analysis.
In constructivist studies, researchers generally discuss whether the results are trustworthy rather than valid, analyzing the aspects of dependability, credibility, transferability, and confirmability [92].
Conclusion
Software engineering is a diverse research field, aggregating various topics. This variety has led to the use of several research methods, many of them originally from the social sciences. Given the distance between these fields and computer science, from which SE researchers generally come, methodological issues are common. This article explores the case of qualitative surveys, a term recently suggested to label studies focused on identifying the diversity of characteristics in a population. To do so, we performed a systematic mapping study to identify how the term has been used in SE research. We observed a lack of consistency but a trend toward using the term with this connotation. Thus, we argue that adhering to this definition, or at least using it as a basis to critique and extend the interpretation and use of qualitative surveys, could benefit the field. We present a set of guidelines to support researchers and reviewers of such studies.
This proposal has the potential to bring together currently unrelated studies, facilitating comparison and gradual knowledge building. It is possible to foresee its application in most, if not all, areas of SE. Besides the obvious application to human aspects, researchers could apply qualitative surveys to identify, for example, coding patterns or architectural styles. For instance, a qualitative survey could identify challenges of using code generated by LLMs in existing software projects. As another example, researchers could determine different architectural styles in systems based on microservices.
From a methodological perspective, future work could explore practices to guarantee the inclusion of all possible values, or at least a great majority of them. In terms of limitations, the qualitative survey method is very useful for some purposes and less so for others. We do not propose that the qualitative survey should now become dominant over other methods; rather, it should be considered as part of a broader portfolio of research methods in SE. Our intention is simply to increase awareness among SE researchers of its potential use and benefits.
ACKNOWLEDGMENT
The authors would like to thank Dr. Klaas-Jan Stol for his feedback on an early version of this paper.
NOTE
Open Access funding provided by ‘Libera Università di Bolzano’ within the CRUI CARE Agreement