Principle-Driven Continuous Integration: Simplifying Failure Discovery and Raising Anti-Pattern Awareness

Public PhD Thesis Defense of Carmine Vassallo

Advisor: Prof. Dr. Harald C. Gall
2nd Advisor: Prof. Dr. Laurie Williams
3rd Advisor: Prof. Dr. Sebastian Proksch

Chair: Prof. Dr. Davide Scaramuzza

Date and time: Friday, September 18, 2020, 16:00 h
Location: remotely via “Zoom” (link expired)

Extended Abstract: Continuous Integration (CI) is a software development practice that enables developers to build reliable software faster. Given its proven benefits, such as increased developer productivity and higher release frequency, most organizations have started adopting CI. This practice advocates full automation of all build steps (i.e., compilation, testing, and code quality assessment) to create a new version of the software. However, the mere introduction of an automated build infrastructure is not sufficient to practice CI well and to achieve its goals. Organizations also have to foster the application of several principles, such as committing often, that reduce conflicts in the team and ensure that the build is continuously executable. Living up to these principles is not easy, especially when developers face tough deadlines. As a consequence, developers tend to deviate from these principles, generating anti-patterns, which are ineffective solutions to recurrent problems. Anti-patterns appear to be beneficial but, in the end, they let CI decay and lower its effectiveness. In this dissertation, we characterize the problem of anti-patterns in order to implement solutions that help developers remove the root causes of anti-patterns and, therefore, follow the principles. Based on the results of a preliminary study performed on open-source projects, which revealed the existence of deviations from core principles, we empirically derive a catalog of 79 anti-patterns encountered by developers in practice by conducting semi-structured interviews with 13 practitioners and manually analyzing 2,300 posts from a well-known forum (i.e., Stack Overflow) where users discuss issues related to the adoption of CI. By interpreting the resulting catalog of anti-patterns, we identify four main causes for their presence: (i) poor knowledge of the prerequisites for adopting CI, (ii) the difficulty of inspecting build failure logs, (iii) the presence of bad configurations, and (iv) the wrong usage of a CI process. While only better coaching in CI can efficiently remove the first cause, we implement several approaches to address the other causes. To improve the understandability of build failure logs, we first build a taxonomy of build failures through the manual analysis of errors contained in 34,182 build logs from open-source and closed-source projects, and then we develop a tool called BART that produces summaries for the most common build failure types. We evaluate the performance of our tool in a controlled experiment with 17 developers. To identify violations of CI principles in the form of configuration smells that developers should remove, we propose CD-Linter, a semantic linter for CI/CD configuration files, which we evaluate by opening 145 issues in open-source projects and monitoring the acceptance of our bug reports and the removal of the reported smells over a period of six months. Finally, we implement CI-Odor, an automated reporting tool that leverages information from the repository and build history to monitor the presence of bad practices that slowly creep into a project over time. We evaluate its usefulness by sending developers the reports produced for 36 open-source projects. The results of our evaluations show that the proposed approaches are effective at identifying and removing the aforementioned causes of anti-patterns and, consequently, at enforcing a principle-driven continuous integration. BART improves the understandability of the most common build failure types, and developers solve build failures faster with it.
In the presence of build summaries, the resolution time is reduced by 23% when solving testing failures, 20% when repairing compilation errors, 43% when fixing missing dependencies, and 62% when dealing with code analysis failures. CD-Linter identifies smells that are relevant for developers. During the 6-month observation period, 53% of the project maintainers reacted positively to the issues detected by CD-Linter, with 9% confirming the validity of the reported problem and 44% fixing it. Finally, the reports generated by CI-Odor are useful for monitoring anti-patterns. Many developers (67%) expect a positive effect of using our generated reports on their CI discipline, and the majority (55%) are already willing to integrate CI-Odor into their CI processes.
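To give a flavor of what a configuration smell check can look like, here is a minimal Python sketch; it is not CD-Linter's actual implementation, and both smell rules are hypothetical examples. It parses a GitLab-CI-style YAML file and flags jobs whose failures are silently ignored as well as Docker images that are not pinned to a fixed version.

```python
# Illustrative sketch (not CD-Linter's implementation): flag two hypothetical
# configuration smells in a GitLab-CI-style YAML file.
import sys
import yaml  # pip install pyyaml

# Top-level keys that do not describe jobs.
RESERVED_KEYS = {"stages", "variables", "default", "include", "workflow", "image"}

def lint(path):
    with open(path) as f:
        config = yaml.safe_load(f) or {}
    smells = []
    for name, job in config.items():
        if name in RESERVED_KEYS or not isinstance(job, dict):
            continue
        # Smell 1 (hypothetical): failures of this job are silently ignored.
        if job.get("allow_failure") is True:
            smells.append(f"{name}: failures are ignored (allow_failure: true)")
        # Smell 2 (hypothetical): the Docker image is not pinned to a version.
        image = job.get("image", config.get("image"))
        if isinstance(image, str) and (":" not in image or image.endswith(":latest")):
            smells.append(f"{name}: image '{image}' is not pinned to a fixed version")
    return smells

if __name__ == "__main__":
    for smell in lint(sys.argv[1] if len(sys.argv) > 1 else ".gitlab-ci.yml"):
        print("smell:", smell)
```

Run against a .gitlab-ci.yml, the script prints one line per detected smell; a real linter would of course cover many more rules and validate them against the CI schema.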


Book: Rethinking Productivity in Software Engineering

We are proud to announce that the book in which we authored three chapters has just been released. It is the result of a thought-provoking and discussion-intensive Dagstuhl Seminar in 2017. The book was edited by Caitlin Sadowski and Thomas Zimmermann, and it is available for free (Open Access). In the book, software engineering researchers review and discuss productivity, covering definitions and core concepts related to productivity, guidelines for measuring productivity in specific contexts, best practices and pitfalls, and theories and open questions on productivity. You’ll benefit from the many short chapters, each offering a focused discussion on one aspect of productivity in software engineering.

Developers’ Diverging Perceptions of Productivity

To overcome the ever-growing demand for software, software development organizations strive to enhance the productivity of their developers. But what does productivity mean in the context of software development? A substantial amount of work on developer productivity has been undertaken over the past four decades. The majority of this work considered productivity from a top-down perspective (the manager view), in terms of the artifacts and code created per unit of time. Common examples of such productivity measures are the lines of source code modified per hour, the resolution time for modification requests, or the function points created per month. These productivity measures focus on a single, output-oriented factor for quantifying productivity and do not take into account developers’ individual work roles, practices, and other factors that might affect their productivity, such as work fragmentation, the tools used, or the work/office environment. In our research, we investigated how productivity could be quantified from the bottom up, following a mixed-methods approach that involved more than 800 software developers. By investigating developers’ individual productivity, it is possible to better understand individual work habits and patterns, how they relate to productivity perceptions, and which factors are most relevant for a developer’s productivity.

Fitbit for Developers: Self-Monitoring at Work

Recently, we have seen an explosion in the number of devices and apps that we can use to track various aspects of our lives, such as the steps we walk, the quality of our sleep, or the calories we consume. People use devices such as the Fitbit activity tracker to increase and maintain their physical activity level by tracking their behavior, setting goals (e.g., 10’000 steps a day), and competing with friends. Many of these approaches have been shown to successfully encourage users to change their behavior, often through persuasive technologies such as goal setting, social encouragement, and sharing mechanisms. We explored how we can transfer the tremendous success of these smart devices to the workplace, with the aim of increasing software developers’ self-awareness about productivity through self-monitoring. Yet, little is known about the expectations of, the experience with, and the impact of self-monitoring in the workplace. Using a mixed-methods approach, we inferred design elements for building workplace self-monitoring tools, which we then implemented as a technology probe called WorkAnalytics. We field-tested these design elements in a three-week study with software development professionals. In the field study, we found that self-monitoring paired with experience sampling increases developers’ awareness of their work and motivates many to improve their behavior, and that a wide variety of different metrics is needed to fulfill developers’ expectations. Our work can serve as a starting point for researchers and practitioners to build self-monitoring tools for the workplace.

Reducing Interruptions at Work with FlowLight

Interruptions at the workplace can consume a lot of time and cause frustration, especially if they happen at moments of high focus. To reduce costly interruptions, we developed the FlowLight, a small LED lamp mounted at a worker’s desk that computes the worker’s availability for interruptions based on computer interaction and indicates it to coworkers with colors, similar to a traffic light. In a large study with 449 participants, we found that the FlowLight reduced interruptions by 46%. We also observed an increased awareness of the potential harm of interruptions and an increased feeling of productivity. In this chapter, we present our insights from developing and evaluating the FlowLight and reflect on the key factors that contributed to its success.
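As an illustration of the underlying idea only (not the FlowLight's actual algorithm, which uses more signals and personalized thresholds), here is a minimal Python sketch that maps the rate of keyboard and mouse events in a rolling window to a traffic-light state. The event capture, the thresholds, and the LED driver are hypothetical placeholders.

```python
# Minimal sketch: derive a traffic-light availability state from the number of
# keyboard/mouse events observed in the last minute. Thresholds are invented.
from collections import deque
import time

WINDOW_SECONDS = 60          # rolling window over the last minute of activity
BUSY_EVENTS_PER_MIN = 120    # hypothetical threshold for "do not disturb"
FREE_EVENTS_PER_MIN = 20     # hypothetical threshold for "available"

events = deque()             # timestamps of observed keyboard/mouse events

def record_event(timestamp=None):
    """Call this whenever a keyboard or mouse event is captured."""
    events.append(time.time() if timestamp is None else timestamp)

def current_state(now=None):
    """Return 'red', 'yellow', or 'green' based on recent activity."""
    now = time.time() if now is None else now
    # Drop events that fell out of the rolling window.
    while events and events[0] < now - WINDOW_SECONDS:
        events.popleft()
    rate = len(events)
    if rate >= BUSY_EVENTS_PER_MIN:
        return "red"      # high activity: likely focused, do not interrupt
    if rate <= FREE_EVENTS_PER_MIN:
        return "green"    # low activity: available for interruptions
    return "yellow"       # in between: interrupt only if necessary
```

In a real deployment, the color would then be pushed to the physical lamp and to the worker's status in the team's messaging tool.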

Design Recommendations for Self-Monitoring in the Workplace: Studies in Software Development

I am excited to announce our first paper at the CSCW conference!

Abstract: One way to improve the productivity of knowledge workers is to increase their self-awareness about productivity at work through self-monitoring. Yet, little is known about expectations of, the experience with, and the impact of self-monitoring in the workplace. To address this gap, we studied software developers, as one community of knowledge workers. We used an iterative, user-feedback-driven development approach (N=20) and a survey (N=413) to infer design elements for workplace self-monitoring, which we then implemented as a technology probe called WorkAnalytics. We field-tested these design elements during a three-week study with software development professionals (N=43). Based on the results of the field study, we present design recommendations for self-monitoring in the workplace, such as using experience sampling to increase the awareness about work and to create richer insights, the need for a large variety of different metrics to retrospect about work, and that actionable insights, enriched with benchmarking data from co-workers, are likely needed to foster productive behavior change and improve collaboration at work. Our work can serve as a starting point for researchers and practitioners to build self-monitoring tools for the workplace.

Co-Authors: André N. Meyer (University of Zurich), Gail C. Murphy (University of British Columbia), Tom Zimmermann (Microsoft Research), Thomas Fritz (University of Zurich)

You can download the pre-print here.

PersonalAnalytics, our self-monitoring tool, is available on GitHub here.

Today was a Good Day: The Daily Life of Software Developers

Co-Authors: André N. Meyer (University of Zurich), Earl T. Barr (University College London), Chris Bird (Microsoft Research), Tom Zimmermann (Microsoft Research)

Abstract: What is a good workday for a software developer? What is a typical workday? We seek to answer these two questions to learn how to make good days typical. Concretely, answering these questions will help to optimize development processes and select tools that increase job satisfaction and productivity. Our work adds to a large body of research on how software developers spend their time. We report the results from 5,971 responses of professional developers at Microsoft, who reflected on what made their workdays good and typical and self-reported how they spent their time on various activities at work. We developed conceptual frameworks to help define and characterize developer workdays from two new perspectives: good and typical. Our analysis confirms some findings of previous work, including the fact that developers actually spend little time on development, and developers’ aversion to meetings and interruptions. It also uncovered new findings, such as that only 1.7% of survey responses mentioned emails as a reason for a bad workday, and that meetings and interruptions are only unproductive during development phases; during phases of planning, specification, and release, they are common and constructive. One key finding is the importance of agency: developers’ control over their workday and whether it goes as planned or is disrupted by external factors. We present actionable recommendations for researchers and managers to prioritize process and tool improvements that make good workdays typical. For instance, in light of our finding on the importance of agency, we recommend that, where possible, managers empower developers to choose their tools and tasks.

You may download the pre-print here.

Conceptual Framework characterizing typical developer workdays
Conceptual Framework characterizing good developer workdays

Survey on how Developers React to CI Build Failures

Among the benefits provided by Continuous Integration (CI), increased team productivity and integration frequency are perceived as the main advantages. However, changes that contain defects or that suffer from poor quality can lead to build failures that stop a team from delivering. The recent Report on the State of DevOps states: “When failures occur, it can be difficult to understand what caused the problem”, and previous work found that developers spend, on average, one hour fixing build breaks!

In our group at the University of Zurich (Switzerland), we are developing new strategies to provide developers with the right assistance to solve build failures faster and more efficiently. To achieve this, we first need to understand the state of practice from real developers and we would like to learn about your personal experience with build failures in this survey.

We would really appreciate it if you could find the time to fill out the following survey to help us with our research.

Filling out the survey will take about 7 minutes. Please note that participation in the questionnaire is completely anonymous, but we will publish the anonymized answers as part of a scientific publication.

If you have any questions about the questionnaire or our research, please do not hesitate to contact us.

Image from “Arduino Jenkins CI build monitor using car lights”, Gordons Garage, YouTube, 2016.

Screencast: Fostering Software Developers’ Productivity at Work

Screencast of the talk I recently gave at Tasktop. I talked about how we aim to improve developers’ productivity by increasing their awareness of their work, interruptions, habits, and goals.

Click here to access the full blogpost by Patrick Anderson from Tasktop.

Find out more about this work:

Sensing Interruptibility in the Office: A Field Study on the Use of Biometric and Computer Interaction Sensors

Knowledge workers experience many interruptions during their workday. Especially when they happen at inopportune moments, interruptions can incur high costs and cause time loss and frustration. Knowing a person’s interruptibility allows optimizing the timing of interruptions and minimizing disruption. Recent advances in technology provide the opportunity to collect a wide variety of data on knowledge workers to predict interruptibility. While prior work predominantly examined interruptibility based on a single data type and in short lab studies, we conducted a two-week field study with 13 professional software developers to investigate a variety of computer interaction, heart-, sleep-, and physical activity-related data. Our analysis shows that computer interaction data is more accurate at predicting interruptibility at the computer than biometric data (74.8% vs. 68.3% accuracy), and that combining both yields the best results (75.7% accuracy). We discuss our findings and their practical applicability, also in light of the collected qualitative data.
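For illustration, here is a small sketch, not the study's actual pipeline, of how one might compare classifiers trained on computer-interaction features, on biometric features, and on their combination. The feature names and the randomly generated data are hypothetical placeholders; with real labeled data, the three printed accuracies would correspond to the comparison described above.

```python
# Sketch: compare interruptibility classifiers on three feature sets using
# cross-validation. The data below is random placeholder data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500
interaction = rng.normal(size=(n, 3))   # e.g. keystrokes/min, mouse clicks, window switches
biometric = rng.normal(size=(n, 3))     # e.g. heart rate, sleep duration, step count
interruptible = rng.integers(0, 2, size=n)  # self-reported interruptibility label (0/1)

def accuracy(features):
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(clf, features, interruptible, cv=5).mean()

print("interaction only:", accuracy(interaction))
print("biometric only:  ", accuracy(biometric))
print("combined:        ", accuracy(np.hstack([interaction, biometric])))
```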

You may access the pre-print here.

Characterizing Software Developers by Perceptions of Productivity

This work was conducted by André Meyer (UZH), Thomas Zimmermann (Microsoft Research), and Thomas Fritz (UBC). The research has been published in the industrial papers track at ESEM’17 in Toronto. Thomas Zimmermann will present it on Thursday, November 9th, 2017 at 1pm in Session 4B: Qualitative Research. Download Pre-Print

Studying Developers’ Perceptions of Productivity instead of Measuring it

To overcome the ever-growing demand for software, we need new ways of optimizing the productivity of software developers. Existing work has predominantly focused on top-down approaches for defining or measuring productivity, such as lines of code, function points, or completed tasks over time. While these measurements are valuable for comparing certain aspects of productivity, we argue that they miss the many other factors that influence the success and productivity of a software developer, such as the fragmentation of their work, their experience, and so on. A developer who spends the workday writing a high-quality test case or helping a co-worker would receive a poor productivity score under such measurements. Hence, in our previous work we looked at productivity from the bottom up, examining developers’ individual perceptions of productivity rather than top-down measures. We found that while perceptions of productivity are indeed very individual, they follow certain habitual patterns each day (e.g., Morning People, Low-at-Lunch People, and Afternoon People), and that there are activities that most developers consider productive or unproductive.

Similar Perceptions of Productivity

This previous work, however, left us questioning whether there are more people with similar perceptions of productivity who can be clustered together. To investigate this, we ran an online survey with 413 professional software developers who currently work at Microsoft (average experience: 9.6 years) and asked them four questions: to describe productive (Q1) and unproductive (Q2) workdays, to rate their agreement with statements on factors that might affect productivity (Q3), and to rate how interesting they find various productivity measures at work (Q4).
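To illustrate the kind of analysis involved, the following Python sketch groups hypothetical Likert-scale survey answers with k-means. This is not the study's actual clustering procedure, which involved more careful preprocessing, cluster validation, and interpretation of the open-ended answers.

```python
# Sketch: cluster Likert-scale survey responses into six groups of developers
# with similar productivity perceptions (random placeholder data).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# 413 respondents x 20 Likert items (1 = strongly disagree ... 5 = strongly agree)
responses = rng.integers(1, 6, size=(413, 20))

scaled = StandardScaler().fit_transform(responses)
kmeans = KMeans(n_clusters=6, n_init=10, random_state=0).fit(scaled)

# How many respondents fall into each of the six clusters.
print(np.bincount(kmeans.labels_))
```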

We found that developers can roughly be clustered into six groups with similar perceptions: the social, lone, focused, balanced, leading, and goal-oriented developer. This allows us to abstract and simplify the variety of individual perceptions into groups and to optimize productivity for these groups instead of for individuals. In the following, I will describe the specific characteristics of these groups:

Some just love creative tasks with no clear goal, while others prefer measurable tasks.

  1. The social developers feel productive when helping coworkers, collaborating, and doing code reviews. To get things done, they come to work early or work late and try to focus on a single task.
  2. The lone developers avoid disruptions such as noise, email, meetings, and code reviews. They feel most productive when they have little to no social interaction and when they can work on solving problems, fixing bugs, or coding features in quiet and without interruptions. To reflect on their work, they are mostly interested in knowing the frequency and duration of the interruptions they encountered. Note that this group is almost the opposite of the first group (the social developers) in how productive they feel when encountering social interactions.
  3. The focused developers feel most productive when they are working efficiently and are concentrated on a single task at a time. They feel unproductive when they are wasting time or spending too much time on a task because they are stuck or working slowly. They are interested in knowing the number of interruptions and their focused time.
  4. The balanced developers are less affected by disruptions. They are less likely to come to work early or work late. They feel unproductive when tasks are unclear or irrelevant, when they are unfamiliar with a task, or when tasks cause overhead.
  5. The leading developers are more comfortable with meetings and emails and feel less productive with coding activities than other developers. They feel more productive in the afternoon and when they can write and design things. They do not like broken builds and blocking tasks that prevent them (or the team) from doing productive work.
  6. The goal-oriented developers feel productive when they complete or make progress on tasks. They feel less productive when they multi-task, lack clear goals, or are stuck. They are more open to meetings and emails than the other clusters, in case these help them achieve their goals. In contrast to group 3 (the focused developers), goal-oriented developers care more about actually getting things done (i.e., crossing items off the to-do list), while the focused developers care more about working efficiently.

Optimizing Productivity for Different Groups of Developers

The six clusters and their characteristics provide relevant insights into groups of developers with similar productivity perceptions that can be used to optimize the work and flow on the team and the individual level. The differences between software developers’ preferred collaboration and work styles show that not all developers are alike, and that the cluster an individual or team belongs to could be a basis for tailoring actions for improving their work and productivity.

For example, on the team level, we could provide quiet, less interruption-prone offices to the lone and focused developers (clusters 2 and 3), and seat the social developers (cluster 1), who feel more comfortable with a discussion every now and then, together. Another example is task assignment: an explorative task for a new product, one that is very open and without a clear goal, might be less suitable for the goal-oriented developer (cluster 6) than for the social and leading developers (clusters 1 and 5), who prefer explorative tasks that require intensive collaboration.

Not everyone feels productive when spending time in meetings.

On the individual level, developers might benefit from tailored user experiences for their (development) tools. Maybe someday we can build virtual assistants, e.g., a Cortana/Alexa for Developers, that recommend (or automatically take) actions depending on the developer’s cluster. For example, they could block notifications from email, Slack, and Skype during coding sessions for the lone developer (cluster 2) but allow them for the social developer (cluster 1). Or they could recommend that the focused developer (cluster 3) come to work early to have uninterrupted work time, or suggest that the balanced developer (cluster 4) take a break to avoid boredom and tiredness. Or they could help with scheduling meetings, depending on the user’s preferences.


In the paper (find a pre-print here) you may find more detailed explanations of the study method and a much more detailed discussion of the clusters.


Survey on how You Plan your Most Productive Days!

We are currently running a survey to learn more about knowledge workers’ workdays: How do they plan them? Are they using any tools? How could tools help them plan more efficiently?

We invite you to participate in this short, 10-12 minute survey. The future goal is to develop improvements for common task management software.

Access the survey here.

We appreciate your help a lot! Please contact us in case you have any questions.

André Meyer – ameyer@ifi.uzh.ch
Jürgen Cito – cito@ifi.uzh.ch

seal @ ICSME 2017

We are very happy to announce that our research group got two papers accepted at ICSME 2017 in Shanghai, China.

The first paper is entitled “A Tale of CI Build Failures: an Open Source and a Financial Organization Perspective” and was written in collaboration with ING Nederland, University of Sannio and TU Delft. The authors of the paper are: Carmine Vassallo, Gerald Schermann, Fiorella Zampetti, Daniele Romano, Philipp Leitner, Andy Zaidman, Massimiliano Di Penta and Sebastiano Panichella.

Abstract: Continuous Integration (CI) and Continuous Delivery (CD) are widespread in both industrial and open-source software (OSS) projects. Recent research characterized build failures in CI and identified factors potentially correlated to them. However, most observations and findings of previous work are exclusively based on OSS projects or data from a single industrial organization. This paper provides a first attempt to compare the CI processes and occurrences of build failures in 349 Java OSS projects and 418 projects from a large financial organization, ING Nederland.


Through the analysis of 34,182 failing builds (26% of the total number of observed builds), we derived a taxonomy of failures that affect the observed CI processes. Using cluster analysis, we observed that in some cases OSS and ING projects share similar build failure patterns (e.g., few compilation failures as compared to frequent testing failures), while in other cases completely different patterns emerge. In short, we explain how OSS and ING CI processes exhibit commonalities, yet are substantially different in their design and in the failures they report.
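To illustrate the general idea of assigning a build log to a failure category, here is a small Python sketch with hypothetical keyword patterns. The taxonomy in the paper was derived through manual analysis and refined with cluster analysis, not with simple regular expressions, so this is only a toy approximation of the categorization step.

```python
# Sketch: map a (Maven-style) build log to a coarse failure category using
# hypothetical keyword patterns.
import re

CATEGORIES = [
    ("compilation", re.compile(r"cannot find symbol|COMPILATION ERROR", re.I)),
    ("testing", re.compile(r"Tests run: .*Failures: [1-9]|There are test failures", re.I)),
    ("dependency", re.compile(r"Could not resolve dependencies|artifact .* not found", re.I)),
    ("code quality", re.compile(r"checkstyle|pmd|findbugs", re.I)),
]

def classify(build_log: str) -> str:
    """Return the first category whose pattern matches the log, or 'other'."""
    for name, pattern in CATEGORIES:
        if pattern.search(build_log):
            return name
    return "other"

print(classify("[ERROR] Could not resolve dependencies for project foo"))  # -> dependency
```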


The second accepted paper is entitled “Towards Activity-Aware Tool Support for Change Tasks” and was written by Katja Kevic and Thomas Fritz.

Abstract: To complete a change task, software developers perform a number of activities, such as locating and editing the relevant code. While there is a variety of approaches to support developers for change tasks, these approaches mainly focus on a single activity each. Given the wide variety of activities during a change task, a developer has to keep track of and switch a lot between the different approaches.


By knowing more about a developer’s activities, and in particular by knowing when she is working on which activity, we would be able to provide better and more tailored tool support, thereby reducing developer effort. In our research, we investigate the characteristics of these activities, whether they can be identified, and whether we can use this additional information to improve developer support for change tasks. We conducted two exploratory studies with a total of 21 software developers, collecting data on activities in the lab and in the field. An empirical analysis of the data shows, amongst other results, that activities comprise a consistently small number of code elements across all developers and tasks (approx. 8.7 elements). Further analysis shows that we can automatically detect the boundaries and types of activities, and that the information on activity types can be used to improve the identification of relevant code elements.
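As a toy illustration of what activity-type detection could look like, the following sketch labels a window of IDE interaction events with its dominant activity type. The event names and the mapping are hypothetical, and the study relied on richer features and analysis than this simple majority vote.

```python
# Sketch: label a window of IDE events with its dominant activity type
# (hypothetical event names and mapping).
from collections import Counter

EVENT_TO_ACTIVITY = {
    "file_open": "navigation",
    "search": "code search",
    "edit": "editing",
    "debug_step": "debugging",
    "test_run": "testing",
}

def detect_activity(events):
    """Return the most frequent activity type observed in a window of events."""
    counts = Counter(EVENT_TO_ACTIVITY.get(e, "other") for e in events)
    return counts.most_common(1)[0][0]

print(detect_activity(["edit", "edit", "search", "edit", "test_run"]))  # -> editing
```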