Characterizing Software Developers by Perceptions of Productivity

This work was conducted by André Meyer (UZH), Thomas Zimmermann (Microsoft Research) and Thomas Fritz (UBC). It has been published in the industrial papers track at ESEM’17 in Toronto. Thomas Zimmermann will present it on Thursday, November 9th, 2017 at 1pm in Session 4B: Qualitative Research. Download Pre-Print

Studying Developers’ Perceptions of Productivity instead of Measuring it

To meet the ever-growing demand for software, we need new ways of optimizing the productivity of software developers. Existing work has predominantly focused on top-down approaches for defining or measuring productivity, such as lines of code, function points, or completed tasks over time. While these measurements are valuable for comparing certain aspects of productivity, we argue that they miss many other factors that influence the success and productivity of a software developer, such as the fragmentation of their work or their experience. A developer who spends the workday writing a high-quality test case or helping a co-worker would get a bad productivity score with these measurements. Hence, in our previous work we looked at productivity from the bottom up, studying developers’ individual perceptions of productivity. We found that while perceptions of productivity are indeed very individual, they follow certain habitual patterns each day (e.g. Morning-People, Low-At-Lunch People, and Afternoon-People), and there are activities that most developers consider unproductive or productive.

Similar Perceptions of Productivity

This previous work, however, left us wondering whether there are groups of developers with similar perceptions of productivity that can be clustered together. To investigate this, we ran an online survey with 413 professional software developers who currently work at Microsoft (average experience 9.6 years). We asked them to describe productive (Q1) and unproductive (Q2) workdays, to rate their agreement with statements on factors that might affect productivity (Q3), and to rate how interesting they find various productivity measures at work (Q4).

We found that developers can roughly be clustered into six groups with similar perceptions: the social, lone, focused, balanced, leading, and goal-oriented developer. This allows us to abstract and simplify the variety of individual perceptions into groups and optimize productivity for these groups instead of individuals. In the following, I will describe the specific characteristics of these groups:

Some just love creative tasks with no clear goal, while others prefer measurable tasks.
  1. The social developers feel productive when helping coworkers, collaborating and doing code reviews. To get things done, they come early to work or work late and try to focus on a single task.
  2. The lone developers avoid disruptions such as noise, email, meetings, and code reviews. They feel most productive when they have little to no social interaction and when they can work on solving problems, fixing bugs or coding features in quiet and without interruptions. To reflect on their work, they are mostly interested in knowing the frequency and duration of the interruptions they encountered. Note that this group of developers is almost the opposite of the first group (the social developer) in how productive they feel when encountering social interactions.
  3. The focused developers feel most productive when they work efficiently and concentrate on a single task at a time. They feel unproductive when they waste time or spend too much time on a task because they are stuck or working slowly. They are interested in knowing the number of interruptions and the amount of focused time.
  4. The balanced developers are less affected by disruptions. They are less likely to come early to work or work late. They feel unproductive when tasks are unclear or irrelevant, when they are unfamiliar with a task, or when tasks cause overhead.
  5. The leading developers are more comfortable with meetings and emails and feel less productive with coding activities than other developers. They feel more productive in the afternoon and when they can write and design things. They do not like broken builds and blocking tasks, which prevent them (or the team) from doing productive work.
  6. The goal-oriented developers feel productive when they complete or make progress on tasks. They feel less productive when they multi-task, are goal-less or are stuck. They are more open to meetings and emails compared to the other clusters, in case they help them achieve their goals. In contrast to group 3 (the focused developer), goal-oriented developers care more about actually getting stuff done (i.e. crossing items off the todo-list), while the focused developer cares more about working efficiently.

Optimizing Productivity for Different Groups of Developers

The six clusters and their characteristics provide relevant insights into groups of developers with similar productivity perceptions that can be used to optimize the work and flow on the team and the individual level. The differences between software developers’ preferred collaboration and work styles show that not all developers are alike, and that the cluster an individual or team belongs to could be a basis for tailoring actions for improving their work and productivity.

For example, on the team level, we could provide a quiet, less interruption-prone office to the lone and focused developers (clusters 2 and 3), and seat them apart from the social developers (cluster 1), who feel more comfortable with discussions every now and then. Another example is task assignment: an explorative task for a new product that is very open and has no clear goal might be less suitable for the goal-oriented developer (cluster 6) than for the social and leading developers (clusters 1 and 5), who prefer explorative tasks that require intensive collaboration.

Not everyone feels productive when spending time in meetings.

On the individual level, developers might benefit from tailored user experiences for their (development) tools. Maybe someday we can build virtual assistants, e.g. Cortana/Alexa for Developers, that recommend (or automatically take) actions depending on the developer’s cluster. For example, they could block notifications from email, Slack, and Skype during coding sessions for the lone developer (cluster 2) but allow them for the social developer (cluster 1). Or they could recommend that the focused developer (cluster 3) come to work early to have uninterrupted work time, or suggest that the balanced developer (cluster 4) take a break to avoid boredom and tiredness. Or they could help with scheduling meetings, depending on the user’s preferences.
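The cluster-dependent behaviour described above could be expressed as a small policy table. The following is only a sketch: the `Cluster` enum, the function names, and the default of allowing notifications for clusters the text does not mention are assumptions made for illustration, not part of the study.

```python
from enum import Enum
from typing import Optional


class Cluster(Enum):
    """The six clusters from the survey, numbered as in the list above."""
    SOCIAL = 1
    LONE = 2
    FOCUSED = 3
    BALANCED = 4
    LEADING = 5
    GOAL_ORIENTED = 6


# Assumption: only the lone developer gets notifications blocked while coding.
# The text names no policy for clusters 3 to 6, so they default to "allow".
BLOCK_WHILE_CODING = {Cluster.LONE}

# Recommendations mentioned in the text; other clusters get none in this sketch.
RECOMMENDATIONS = {
    Cluster.FOCUSED: "Come to work early to get uninterrupted work time.",
    Cluster.BALANCED: "Take a break to avoid boredom and tiredness.",
}


def should_block_notifications(cluster: Cluster, is_coding: bool) -> bool:
    """Return True if a hypothetical assistant should mute notifications now."""
    return is_coding and cluster in BLOCK_WHILE_CODING


def recommendation_for(cluster: Cluster) -> Optional[str]:
    """Return the cluster-specific recommendation, or None if there is none."""
    return RECOMMENDATIONS.get(cluster)
```

Keeping the policy in plain data structures would make it easy to adjust once a developer disagrees with the defaults of their cluster.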

In the paper (find a pre-print here) you will find a more detailed explanation of the study method and a much more detailed discussion of the clusters.

FlowLight: How a Traffic Light Reduces Interruptions at Work (CHI’17)

We are extremely happy to announce our newest project, FlowLight, a traffic-light-like light for knowledge workers that reduces their interruptions at work and makes them more productive! The research project, published with the title “Reducing Interruptions at Work: A Large-Scale Field Study of FlowLight”, was conducted in close collaboration with researchers at ABB. It was also awarded an Honorable Mention.

Authors: Manuela Züger, Christopher Corley, André N. Meyer, Boyang Li, Thomas Fritz, David Shepherd, Vinay Augustine, Patrick Francis, Nicholas Kraft, Will Snipes

In the media: Our work was also featured in The Telegraph, Wall Street Journal, GeekWire, NBC News, New Atlas, DigitalTrends, Business Standard, The New Yorker, New Scientist, TechXplore, MailOnline/DailyMail, ScienceDaily, The Times (UK), News For Everyone, Evening Express, Yahoo News, India Today, PPP Focus, The Statesman, Radio Canada, LiveAtPC, Cantech Letter, Engineering 360, BT, Telengana Today, Le Matin (French), 20min.ch (German), Radio Energy (German), Die Presse (German), PresseText (German), Tages-Anzeiger (German), CnBeta (Chinese), PopMech (Russian), PcNews (Russian), Teknikan Maailma (Finnish), Utusan (Malaysian), Irish Examiner, Knowridge, CKNW Radio, Thrive Global, Tech.Rizlys, Appsforpcdaily.com, EurekAlert, Lancashire Post, MetroNews, user-experience-blog (DE), Corriere della Sierra (Spanish), Breaking News, UBC News, UBC Science, and many other blogs.

Reducing interruptions at the workplace

Previous work has repeatedly emphasized how harmful constant interruptions and the fragmentation of work are for knowledge workers’ productivity, the quality of their work, and their motivation at work. When we observed knowledge workers at work in a previous study, we realized that signals such as wearing headphones or closing the office door were often used to indicate that they did not want to be interrupted at that moment. However, this manual approach was often considered quite cumbersome, and not everybody was aware of these signals. Also, the long-term impact on teams and their work was unclear. This is why we developed the FlowLight, a physical traffic-light-like LED combined with an automatic interruptibility measure based on computer interaction data.
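To make the idea of an interruptibility measure based on computer interaction data more concrete, here is a deliberately simplified sketch. It is not FlowLight’s actual algorithm: the two-colour scheme, the function name, the averaging window and the threshold value are all assumptions chosen for illustration.

```python
from typing import Sequence


def interruptibility_state(events_per_minute: Sequence[int],
                           busy_threshold: float = 100.0) -> str:
    """Map recent keyboard/mouse activity to a light colour.

    events_per_minute: hypothetical counts of input events for the most
    recent minutes. busy_threshold: assumed cut-off, not a value from the study.
    """
    if not events_per_minute:
        # No interaction data at all: treat the person as interruptible.
        return "green"
    average = sum(events_per_minute) / len(events_per_minute)
    return "red" if average >= busy_threshold else "green"
```

Averaging over a window rather than reacting to a single minute keeps the light from flickering between states during short pauses.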

The Research

In a large-scale and long-term field study with 449 participants from 12 different countries, we found, amongst other results, that the FlowLight reduced participants’ interruptions by 46%, that it increased their awareness of the potential disruptiveness of interruptions, and that most participants are still using it today!

These and many other insights are described in detail in our publication at the CHI’17 conference (pre-print). Below, you will find a video showcasing FlowLight:

This is a first step towards making knowledge workers more aware of, and reducing, interruptions at work. In the future, we plan to add extended computer interaction context and biometric sensing to improve FlowLight’s algorithm, to make it even more accurate.

Presentation & Demo at CHI’17

In case you are planning to attend the CHI’17 Conference in Denver next week, make sure to come to our presentation and learn much more about the FlowLight! The talk will take place on Monday, 9th 2017, from 11:30am to 12:50pm.

You can find out more about (or soon order) FlowLight on this website.

“The Work Life of Developers: Activities, Switches and Perceived Productivity” accepted at TSE’17

We are happy to announce that our paper “The Work Life of Developers: Activities, Switches and Perceived Productivity” was accepted for the IEEE Transactions on Software Engineering (TSE) journal. You can access a pre-print here.

This work was conducted by André Meyer (UZH), Laura Barton (UBC), Gail Murphy (UBC), Thomas Zimmermann (Microsoft) and Thomas Fritz (UZH).

Make Developers Productive

Many software development companies strive to enhance the productivity of their engineers. All too often, efforts aimed at improving developer productivity are undertaken without knowledge about how developers spend their time at work and how it influences their own perception of productivity and well-being. For example, a software developer’s work day might be influenced by the tasks that are performed, by the infrastructure and tools used, or by the office environment. Many of these factors result in activity and context switches that can cause fragmented work and, thus, often have a negative impact on the developer’s perceived productivity, quality of output and progress on tasks.

To fill this gap, we ran an in-situ study with professional software developers from different companies, investigating developers’ work practices and their relationship to the developers’ perceptions of productivity more holistically, while also examining individual differences. One of the big questions we set out to answer is whether there are observable trends in how developers perceive their productivity and how these trends could potentially be used to quantify productivity.

In-Situ Study to Investigate Productive Work Days

We deployed a monitoring application that logs developers’ interaction with the computer (e.g. programs used, user input) and asked 20 professional software developers to run it for 2-3 work weeks. We further asked participants to self-report their perceived productivity, and the tasks and activities they had performed, every 90 minutes.

Corroborating earlier findings, we found that developers spend their time on a wide variety of activities and switch regularly between them, resulting in highly fragmented work. The findings further emphasize how individual developers’ work days are. For example, while some participants tend to spread their work days out over as many as 21.4 hours (max), most developers keep more compact work hours, on average 8.4 (SD=1.2) hours per day. Of that time, they spend on average 4.3 (SD=0.5) hours on their computer, and surprisingly little of it on development-related activities (e.g. coding, testing, debugging): only about 30% of that time. The rest of the work day is split up into emails (15%), meetings (10%), web browsing (work related: 11%, unrelated: 6%) and other activities.
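To get a feeling for these numbers, the percentages can be converted into hours. The snippet below is a back-of-the-envelope sketch that assumes all percentages refer to the average 4.3 hours spent on the computer; the text leaves the exact base open, so treat the resulting hours as illustrative only.

```python
# Average time on the computer per day, as reported above.
COMPUTER_HOURS = 4.3

# Shares of time per activity in percent, as reported above.
SHARES = {
    "development": 30,
    "emails": 15,
    "meetings": 10,
    "web browsing (work related)": 11,
    "web browsing (unrelated)": 6,
}

# Whatever is not listed explicitly falls under "other activities".
other_share = 100 - sum(SHARES.values())


def hours_for(share_percent: float) -> float:
    """Convert a percentage share into hours (assumed base: COMPUTER_HOURS)."""
    return COMPUTER_HOURS * share_percent / 100


development_hours = hours_for(SHARES["development"])  # roughly 1.3 hours
```

Under this assumption, development-related activities account for little more than an hour and a quarter per day, and 28% of the time remains for other activities.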

A next step was to investigate the fragmentation of work in more detail: apart from meetings, developers remain only between 0.3 and 2.0 minutes in an activity before switching to another one. These very short times per activity and the variety of activities a developer pursues each day illustrate the high fragmentation of a developer’s work. From participants’ self-reported perceived productivity, we found that although there was a lot of variation between individuals, the plots can be categorized into three broad groups: morning people, afternoon people, and those whose perceived productivity dipped at lunch. Morning people often come to work a little earlier and get the most important things done before the crowd arrives. Afternoon people usually arrive later, spend most of their mornings on meetings and emails, and get things done in the afternoon, thus feeling more productive then. These results suggest that while information workers in general have diverse perceived productivity patterns, individuals do appear to follow their own habitual patterns each day.

Can we somehow quantify productivity?

We built explanatory models (stepwise linear regressions) to describe which factors (of the collected data) contribute to the productivity ratings reported by the study participants. We observed that productivity is a personal matter that varies greatly among individuals. There are some tendencies, however: more user input is most often associated with a positive perception of productivity, while emails, planned meetings and work-unrelated websites are most often associated with a negative one.
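The idea of fitting one explanatory model per participant can be illustrated with a much simpler stand-in: an ordinary least-squares line with a single factor. This is a sketch, not the stepwise multi-factor regression used in the study, and the participant IDs, input counts and ratings below are made-up values.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data per participant:
# (user input events per session, self-reported productivity rating).
sessions_by_participant = {
    "P1": ([100, 200, 300, 400], [2, 4, 6, 8]),
    "P2": ([100, 200, 300, 400], [5, 5, 4, 4]),
}

# One model per participant; the sign of the slope shows whether more
# user input goes along with higher or lower ratings for that person.
slopes = {
    pid: fit_line(xs, ys)[0]
    for pid, (xs, ys) in sessions_by_participant.items()
}
```

In this toy data, the slope is positive for P1 and negative for P2, which mirrors the observation that the same factor can relate to perceived productivity differently for different individuals.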

Previous work predominantly focused on a single outcome measure or a small set of them, e.g. the lines of code or function points written. While these measures can be used across developers, e.g. for comparisons, they fail to capture the individual differences in the factors that impact the way developers work. This suggests that measures or models that attempt to quantify productivity should take individual differences, and what is perceived as productive or not, into account, and should capture the developer’s work more holistically rather than by a single outcome measure. Such individual models could then be used to provide better and more tailored support to developers, for instance to foster focus and flow at work. For example, we could help developers avoid interruptions at inopportune moments (see our FlowLight), increase their awareness of work and productivity using a retrospective view, or help them schedule a more productive work day that avoids unproductive patterns as much as possible.

Finally, we examined if we can predict high and low productivity sessions based on the collected data for individual participants, using logistic regression. The results are promising and suggest that even with a relatively small number of reported productivity self-reports, it is possible to build personalized, predictive productivity models.
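A personalized predictive model of this kind could look roughly like the following sketch: a tiny logistic regression trained with stochastic gradient descent on one participant’s sessions. The two features, their scaling to the range 0 to 1, the training data and the hyperparameters are assumptions for illustration and are not taken from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, learning_rate=0.5, epochs=500):
    """Fit weights and bias with plain stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(rows, labels):
            z = bias + sum(w * x for w, x in zip(weights, features))
            error = sigmoid(z) - label  # gradient of the log loss w.r.t. z
            bias -= learning_rate * error
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
    return weights, bias


def predict_high_productivity(weights, bias, features):
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z) >= 0.5


# Hypothetical sessions of one participant:
# [normalized user input, share of time spent on email], label 1 = high productivity.
rows = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.1], [0.2, 0.8], [0.1, 0.6], [0.3, 0.9]]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic(rows, labels)
```

A call such as `predict_high_productivity(weights, bias, [0.85, 0.15])` then classifies a new session for that participant; a separate model would be trained for each individual.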

Contact André Meyer in case you have any questions or suggestions.

“Using (Bio)Metrics to Predict Code Quality” is currently one of the most downloaded articles in software engineering

We are happy to announce that our paper “Using (Bio)Metrics to Predict Code Quality Online”, written by Sebastian Müller and Thomas Fritz, was one of the most downloaded software engineering articles in June and July 2016. With 1709 downloads in 6 weeks, it took second place among all ACM software engineering articles. According to ACM, this is the first time that any paper was downloaded more than 1000 times.

Image source: ACM SIGSOFT Software Engineering Notes, Volume 41, Number 4.

The paper investigates the use of biometrics, such as heart rate variability (HRV) or electro-dermal activity (EDA) to determine the difficulty that developers experience while working on real world change tasks and automatically identify code quality concerns while a developer is making a change to the code. It can be accessed here.

Journal of Systems and Software: Eye Gaze and Interaction Contexts for Change Tasks – Observations and Potential

The more we know about software developers’ detailed navigation behavior for change tasks, the better we are able to provide effective tool support. In this article, we extend our work on the fine-granular navigation behavior of developers (see blogpost) and explore the potential of this more detailed data by examining the use of the captured change task context to predict perceived task difficulty and to provide better and more fine-grained navigation recommendations.

Check out our Journal article!

seal @ ICSE 2016

We are very happy to announce that our research group got two papers and a technical briefing accepted at ICSE 2016 in Austin, Texas.

The first accepted paper, entitled “The Impact of Test Case Summaries on Bug Fixing Performance: An Empirical Investigation”, was written in collaboration with the University of Delft. The authors of the paper are: Sebastiano Panichella, Annibale Panichella, Moritz Beller, Andy Zaidman and Harald Gall.

Abstract: “Automated test generation tools have been widely investigated with the goal of reducing the cost of testing activities. However, generated tests have been shown not to help developers in detecting and finding more bugs even though they reach higher structural coverage compared to manual testing. The main reason is that generated tests are difficult to understand and maintain.

Our paper proposes an approach which automatically generates test case summaries of the portion of code exercised by each individual test, thereby improving understandability. We argue that this approach can complement the current techniques around automated unit test generation or search-based techniques designed to generate a possibly minimal set of test cases. In evaluating our approach we found that (1) developers find twice as many bugs, and (2) test case summaries significantly improve the comprehensibility of test cases, which is considered particularly useful by developers.”

A preprint of the paper can be found online.

The second paper is entitled “Using (Bio)Metrics to Predict Code Quality Online” and was written by Sebastian Müller and Thomas Fritz. The paper investigates the use of biometrics, such as heart rate variability (HRV) or electro-dermal activity (EDA) to determine the difficulty that developers experience while working on real world change tasks and automatically identify code quality concerns while a developer is making a change to the code.

A preprint of the paper will be available soon.

Additionally, we had a technical briefing on “Using Docker Containers to Improve Reproducibility in Software Engineering Research”, by Jürgen Cito and Harald Gall, accepted, where we will present opportunities to aid reproducibility to the SE community.

Preprint: “Interruptibility of Software Developers and its Prediction Using Psycho-Physiological Sensors”

We are excited that our paper “Interruptibility of Software Developers and its Prediction Using Psycho-Physiological Sensors” by Manuela Züger and Thomas Fritz was accepted for CHI 2015, and we would like to share a preprint with you.

Interruptions of knowledge workers are common and can cause a high cost if they happen at inopportune moments. Our paper presents a lab and a field study with a total of 20 software developers, where we examined the use of psycho-physiological sensors to measure interruptibility of a knowledge worker in a real-world context.

The results show that a Naïve Bayes classifier can be used to automatically assess states of a knowledge worker’s interruptibility with high accuracy in the lab as well as in the field. This demonstrates the potential of psycho-physiological sensors to avoid expensive interruptions. For instance, such a classifier could be used to automatically turn off notifications while a knowledge worker’s interruptibility is low.
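For readers unfamiliar with Naïve Bayes, the following sketch shows how such a classifier could separate low from high interruptibility. It uses a Gaussian variant with two made-up features (heart rate and an EDA level); the concrete features, values and variant are assumptions for illustration and do not reproduce the setup of the paper.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(samples, labels):
    """Estimate a prior plus a per-feature mean and variance for each class."""
    rows_by_label = defaultdict(list)
    for features, label in zip(samples, labels):
        rows_by_label[label].append(features)
    model = {}
    for label, rows in rows_by_label.items():
        prior = len(rows) / len(samples)
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            # Small constant avoids a zero variance for constant features.
            variance = sum((v - mean) ** 2 for v in column) / len(column) + 1e-6
            stats.append((mean, variance))
        model[label] = (prior, stats)
    return model


def predict(model, features):
    """Return the class with the highest log posterior."""
    def log_posterior(prior, stats):
        total = math.log(prior)
        for value, (mean, variance) in zip(features, stats):
            total += (-0.5 * math.log(2 * math.pi * variance)
                      - (value - mean) ** 2 / (2 * variance))
        return total
    return max(model, key=lambda label: log_posterior(*model[label]))


# Hypothetical training data: [heart rate in bpm, EDA level].
samples = [[85, 0.9], [88, 1.0], [90, 1.1], [65, 0.3], [68, 0.4], [70, 0.35]]
labels = ["low", "low", "low", "high", "high", "high"]  # interruptibility

model = fit_gaussian_nb(samples, labels)
```

A notification manager could then mute notifications whenever `predict(model, current_features)` returns `"low"`.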

The preprint of the paper can be downloaded here.

Paper accepted for ICPC ’15

We are excited to announce that our paper “Discovering Loners and Phantoms in Commit and Issue Data” has been accepted for the 23rd IEEE International Conference on Program Comprehension (ICPC 2015) in Florence, Italy.

The interlinking of commit and issue data has become a de-facto standard in software development. Modern issue tracking systems, such as JIRA, automatically interlink commits and issues by the extraction of identifiers (e.g., issue key) from commit messages. However, the conventions for the use of interlinking methodologies vary between software projects. For example, some projects enforce the use of identifiers for every commit while others have less restrictive conventions. In this work, we introduce a model called PaLiMod (Partial Linking Model) to enable the analysis of interlinking characteristics in commit and issue data. We surveyed 15 Apache projects to investigate differences and commonalities between linked and non-linked commits and issues (RQ1). Based on the gathered information, we created a set of heuristics to interlink the residual of non-linked commits and issues (RQ2).
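The identifier-based interlinking mentioned above can be sketched in a few lines. The regular expression below assumes JIRA-style issue keys of the form `PROJ-123`; the function names and the sample messages are hypothetical and not part of PaLiMod.

```python
import re

# Assumed key format: upper-case project key, a hyphen, and a number.
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def extract_issue_keys(message):
    """Return all issue keys mentioned in a commit message."""
    return ISSUE_KEY.findall(message)


def split_by_link(messages):
    """Separate commit messages into linked and non-linked ones."""
    linked = [m for m in messages if extract_issue_keys(m)]
    non_linked = [m for m in messages if not extract_issue_keys(m)]
    return linked, non_linked


# Hypothetical commit messages.
messages = [
    "PROJ-123: fix null check in parser",
    "fix typo in README",
    "PROJ-7 PROJ-8 refactor logging",
]
linked, non_linked = split_by_link(messages)
```

A pattern this simple also matches strings such as `UTF-8`, so a real implementation would likely restrict matches to the project keys that actually exist in the issue tracker.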

We observed that in the majority of the analyzed projects, the number of commits linked to issues is higher than the number of commits without a link. On average, 74% of commits are linked to issues and 50% of the issues have associated commits. Based on the survey data, we identified two interlinking characteristics which we call Loners (one commit, one issue) and Phantoms (multiple commits, one issue). For these two characteristics, we proposed heuristics to automatically interlink non-linked commit and issue data. The evaluation results showed that our approach can achieve an overall precision of 96% with a recall of 92% in the case of the Loner heuristic and an overall precision of 73% with a recall of 53% in the case of the Phantom heuristic.

The results of our evaluation indicate that the proposed PaLiMod model and heuristics enable an automatic interlinking and can indeed reduce the residual of non-linked commits and issues in software projects.

Preprint: “Stuck and Frustrated or In Flow and Happy: Sensing Developers’ Emotions and Progress”

Our paper “Stuck and Frustrated or In Flow and Happy: Sensing Developers’ Emotions and Progress” by Sebastian Müller and Thomas Fritz was accepted for ICSE 2015 and a preprint of the paper is now available.

The paper presents a study that investigates developers’ emotions and progress while working on a change task and how biometric measurements, such as heart rate or pupil sizes, can be used to assess them. In the study with 17 participants working on two change tasks each, the participants were wearing three biometric sensors and had to periodically assess their emotions and progress.

The results show that the wide range of emotions experienced by developers is correlated with their perceived progress on the change tasks. To investigate whether we can use biometric sensors to distinguish between positive and negative emotions as well as episodes of low and high progress that developers experience during change tasks, we applied a machine learning approach to the collected data.

Over the course of the participants’ work on both change tasks we collected biometric data for a total of 213 intervals. The following figure illustrates a set of four such intervals together with the collected EDA and the heart rate signal as well as the participant’s emotion and progress ratings. Especially for the EDA signal, the example shows a visible difference between the first episode with medium progress and higher valence compared to the last episode with the developer being stuck and a lower valence.

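[Figure: four example intervals with the EDA and heart rate signals and the participant’s emotion and progress ratings]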

Our analysis shows that we can build a classifier to distinguish between positive and negative emotions in 71.36% and between low and high progress in 67.70% of all cases. These results open up opportunities for improving a developer’s productivity. For instance, one could use such a classifier for providing recommendations at opportune moments when a developer is stuck and making no progress.

The preprint of the paper can be downloaded here.