FlowLight: How a Traffic Light Reduces Interruptions at Work (CHI’17)

We are extremely happy to announce our newest project, FlowLight, a traffic-light-like lamp for knowledge workers that reduces their interruptions at work and makes them more productive! The research project, published under the title “Reducing Interruptions at Work: A Large-Scale Field Study of FlowLight”, was conducted in close collaboration with researchers at ABB. It also received an Honorable Mention Award.

Authors: Manuela Züger, Christopher Corley, André N. Meyer, Boyang Li, Thomas Fritz, David Shepherd, Vinay Augustine, Patrick Francis, Nicholas Kraft, Will Snipes

In the media: Our work was also featured in The Telegraph, Wall Street Journal, GeekWire, NBC News, New Atlas, DigitalTrends, Business Standard, The New Yorker, New Scientist, TechXplore, MailOnline/DailyMail, ScienceDaily, The Times (UK), News For Everyone, Evening Express, Yahoo News, India Today, PPP Focus, The Statesman, Radio Canada, LiveAtPC, Cantech Letter, Engineering 360, BT, Telangana Today, Le Matin (French), 20min.ch (German), Radio Energy (German), Die Presse (German), PresseText (German), Tages-Anzeiger (German), CnBeta (Chinese), PopMech (Russian), PcNews (Russian), Tekniikan Maailma (Finnish), Utusan (Malaysian), Irish Examiner, Knowridge, CKNW Radio, Thrive Global, Tech.Rizlys, Appsforpcdaily.com, EurekAlert, Lancashire Post, MetroNews, user-experience-blog (DE), Corriere della Sera (Italian), Breaking News, UBC News, UBC Science, and many other blogs.

Reducing interruptions at the workplace

A large body of previous work has emphasized how harmful constant interruptions and the fragmentation of work are for knowledge workers’ productivity, the quality of their work, and their motivation. When we observed knowledge workers in a previous study, we realized that they often used signals, such as wearing headphones or closing their office door, to indicate that they did not want to be interrupted at that moment. However, this manual approach was often considered quite cumbersome, not everybody was aware of these signs, and the long-term impact on teams and their work was unclear. This is why we developed the FlowLight, a physical traffic-light-like LED lamp combined with an automatic interruptibility measure based on computer interaction data.
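
To illustrate the general idea, here is a minimal sketch that maps a user’s recent keyboard and mouse event rate to a traffic-light state. This is not FlowLight’s actual algorithm (the paper describes that in detail); the thresholds and the event counter are assumptions for illustration only:

```python
from enum import Enum

class Status(Enum):
    AVAILABLE = "green"
    BUSY = "yellow"
    DO_NOT_DISTURB = "red"

def interruptibility(events_per_minute: float,
                     busy_threshold: float = 40.0,
                     dnd_threshold: float = 80.0) -> Status:
    """Map the keyboard/mouse event rate of the last time window
    to a traffic-light state. The thresholds here are made up;
    FlowLight calibrates its measure per user, e.g. relative to
    that user's own activity history."""
    if events_per_minute >= dnd_threshold:
        return Status.DO_NOT_DISTURB
    if events_per_minute >= busy_threshold:
        return Status.BUSY
    return Status.AVAILABLE

# Example: 55 keyboard/mouse events in the last minute -> yellow
print(interruptibility(55.0))  # Status.BUSY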

The Research

In a large-scale and long-term field study with 449 participants from 12 different countries, we found, amongst other results, that the FlowLight reduced the interruptions of participants by 46%, increased their awareness of the potential disruptiveness of interruptions, and that most participants are still using it today!

These and many other insights can be found in detail in our publication at the CHI’17 conference (pre-print). Below you can find a video showcasing FlowLight:

This is a first step towards making knowledge workers more aware of interruptions at work and reducing them. In the future, we plan to add extended computer interaction context and biometric sensing to FlowLight’s algorithm to make it even more accurate.

Presentation & Demo at CHI’17

In case you are planning to attend the CHI’17 conference in Denver next week, make sure to come to our presentation and learn much more about the FlowLight! The talk will take place on Monday, the 9th, 2017, from 11:30am to 12:50pm.

You can find out more about (or soon order) FlowLight on this website.


A few more impressions:


“The Work Life of Developers: Activities, Switches and Perceived Productivity” accepted at TSE’17

We are happy to announce that our paper “The Work Life of Developers: Activities, Switches and Perceived Productivity” was accepted for publication in the IEEE Transactions on Software Engineering (TSE) journal. You can access a pre-print here.

This work was conducted by André Meyer (UZH), Laura Barton (UBC), Gail Murphy (UBC), Thomas Zimmermann (Microsoft) and Thomas Fritz (UZH).

Make Developers Productive

Many software development companies strive to enhance the productivity of their engineers. All too often, efforts aimed at improving developer productivity are undertaken without knowledge of how developers actually spend their time at work and how it influences their own perception of productivity and well-being. For example, a software developer’s work day might be influenced by the tasks that are performed, by the infrastructure and tools used, or by the office environment. Many of these factors result in activity and context switches that can cause fragmented work and thus often have a negative impact on developers’ perceived productivity, the quality of their output, and their progress on tasks.

To fill this gap, we ran an in-situ study with professional software developers from different companies, investigating developers’ work practices and their relationship to the developers’ perceptions of productivity more holistically, while also examining individual differences. One of the big questions we set out to answer is whether there are observable trends in how developers perceive their productivity and whether these trends can potentially be used to quantify productivity.

In-Situ Study to Investigate Productive Work Days

We deployed a monitoring application that logs developers’ interaction with the computer (e.g. programs used, user input) and asked 20 professional software developers to run it for 2-3 work weeks. We further asked participants to regularly self-report their perceived productivity, as well as the tasks and activities they had performed, every 90 minutes.

Corroborating earlier findings, we found that developers spend their time on a wide variety of activities and switch regularly between them, resulting in highly fragmented work. The findings further emphasize how individual developers’ work days are: while some participants spread their work day out over as many as 21.4 hours (maximum), most developers keep more compact work hours, on average 8.4 (SD=1.2) hours per day. Of that time, they spend on average 4.3 (SD=0.5) hours on their computer, and surprisingly little of it on development-related activities (e.g. coding, testing, debugging): only about 30%. The rest of the work day is split up into emails (15%), meetings (10%), web browsing (work-related: 11%, unrelated: 6%) and other activities.

A next step was to investigate the fragmentation of work in more detail: apart from meetings, developers remain in an activity for only between 0.3 and 2.0 minutes before switching to another one. These very short times per activity and the variety of activities a developer pursues each day illustrate the high fragmentation of a developer’s work. From participants’ self-reported perceived productivity we found that, although there was a lot of variation between individuals, the plots can be categorized into three broad groups: morning people, afternoon people, and those whose perceived productivity dipped at lunch. Morning people often come to work a little earlier and get the most important things done before the crowd arrives. Afternoon people usually arrive later, spend most of their mornings with meetings and emails, and get things done in the afternoon, thus feeling more productive then. These results suggest that while information workers in general have diverse perceived productivity patterns, individuals do appear to follow their own habitual patterns each day.
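
As a toy illustration of how such fragmentation figures can be derived from an interaction log, the following sketch computes the average time spent in an activity before a switch. The log format below is hypothetical and stands in for our monitoring tool’s actual output:

```python
from datetime import datetime

# Hypothetical log: (timestamp, activity) pairs, ordered by time.
log = [
    (datetime(2017, 3, 1, 9, 0, 0), "coding"),
    (datetime(2017, 3, 1, 9, 1, 30), "email"),
    (datetime(2017, 3, 1, 9, 2, 10), "coding"),
    (datetime(2017, 3, 1, 9, 4, 0), "browsing"),
]

def avg_minutes_per_activity(log):
    """Average duration of an activity segment before a switch."""
    durations = [
        (log[i + 1][0] - log[i][0]).total_seconds() / 60.0
        for i in range(len(log) - 1)
    ]
    return sum(durations) / len(durations)

print(f"{avg_minutes_per_activity(log):.2f} minutes per activity")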

Can we somehow quantify productivity?

We built explanatory models (stepwise linear regressions) to describe which factors of the collected data contribute to the productivity ratings reported by the study participants. We observed that productivity is a personal matter that varies greatly among individuals. There are some tendencies, however: for example, more user input is most often associated with a positive perception of productivity, while emails, planned meetings and work-unrelated websites are associated with a negative one.
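
The sketch below illustrates this kind of explanatory modeling using forward stepwise feature selection with scikit-learn. The features and data are synthetic stand-ins for the factors we collected, not our actual analysis code:

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Illustrative features per 90-minute session (made up): user input
# events, emails handled, meeting minutes, unrelated browsing time.
X = rng.random((60, 4))
# Synthetic productivity ratings: input helps, the rest hurts.
y = 3 + 2 * X[:, 0] - X[:, 1] - X[:, 2] - 0.5 * X[:, 3] \
    + rng.normal(0, 0.1, 60)

# Forward stepwise selection: greedily add the factors that best
# explain the reported ratings.
selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=2, direction="forward")
selector.fit(X, y)
print(selector.get_support())  # which factors were selected

model = LinearRegression().fit(X[:, selector.get_support()], y)
print(model.coef_)  # sign indicates positive/negative association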

Previous work predominantly focused on a single outcome measure or a small set of them, e.g. the number of lines of code or function points written. While these measures can be used across developers, e.g. for comparisons, they fail to capture the individual differences in the factors that impact the way developers work. This suggests that measures or models that attempt to quantify productivity should take individual differences into account, including what each developer perceives as productive, and should capture the developer’s work more holistically rather than through a single outcome measure. Such individual models could then be used to provide better and more tailored support to developers, for instance to foster focus and flow at work. For example, we could help developers avoid interruptions at inopportune moments (see our FlowLight), increase their awareness about work and productivity using a retrospective view, or help them schedule a more productive work day that avoids unproductive patterns as much as possible.

Finally, we examined whether we can predict high- and low-productivity sessions for individual participants based on the collected data, using logistic regression. The results are promising and suggest that even with a relatively small number of productivity self-reports, it is possible to build personalized, predictive productivity models.
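
A minimal sketch of such a per-participant model, again with synthetic data in place of our real features and self-reports:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

# One participant's sessions: interaction features (illustrative).
X = rng.random((40, 4))
# 1 = session self-reported as productive, 0 = unproductive.
y = (X[:, 0] - X[:, 1] + rng.normal(0, 0.2, 40) > 0).astype(int)

# Even a handful of labeled sessions can yield a usable
# personalized classifier.
clf = LogisticRegression()
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean accuracy: {scores.mean():.2f}")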

Contact André Meyer in case you have any questions or suggestions.

seal @ ICSE 2017

We are very happy to announce that our research group got two papers accepted at ICSE 2017 in Buenos Aires, Argentina.

The first accepted paper is entitled “Analyzing APIs Documentation and Code to Detect Directive Defects” and was written by Yu Zhou, Ruihang Gu, Taolue Chen, Zhiqiu Huang, Sebastiano Panichella and Harald Gall.

Abstract: “Application Programming Interface (API) documents represent one of the most important references for API users. However, it is frequently reported that the documentation is inconsistent with the source code and deviates from the API itself. Such inconsistencies in the documents inevitably confuse the API users, considerably hampering their API comprehension and the quality of software built from such APIs.


In this paper, we propose an automated approach to detect defects of API documents by leveraging techniques from program comprehension and natural language processing. Particularly, we focus on the directives of the API documents which are related to parameter constraints and exception throwing declarations. A first-order logic based constraint solver is employed to detect such defects based on the obtained analysis results. We evaluate our approach on parts of well documented JDK 1.8 APIs. Experiment results show that, out of around 2000 API usage constraints, our approach can detect 1146 defective document directives, with a precision rate of 83.1%, and a recall rate of 81.2%, which demonstrates its practical feasibility.”
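
As a toy example of the kind of first-order-logic check involved: given a parameter constraint extracted from the documentation and the exception condition found in the code, a solver can be asked whether the code accepts a value the documentation forbids. The directive and code condition below are made up, and the Z3 Python bindings merely stand in for whichever constraint solver the approach employs:

```python
from z3 import And, Int, Not, Solver, sat

index = Int("index")

# Directive from the Javadoc (hypothetical): "index must be
# non-negative and less than size()"; assume size() == 10 here.
doc_says_valid = And(index >= 0, index < 10)

# The code only throws for index > 10, so it accepts index <= 10.
code_accepts = index <= 10

s = Solver()
# A directive defect exists if some value is accepted by the code
# but declared invalid by the documentation.
s.add(code_accepts, Not(doc_says_valid))
if s.check() == sat:
    print("inconsistency found, e.g. index =", s.model()[index])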

A preprint of the paper will be available soon.

The second paper is entitled “Recommending and Localizing Code Changes for Mobile Apps based on User Reviews” and was written in collaboration with the University of Salerno. The authors of the paper are: Fabio Palomba, Pasquale Salza, Adelina Ciurumelea, Sebastiano Panichella, Harald Gall, Filomena Ferrucci and Andrea De Lucia.

Abstract: “Researchers have proposed several approaches to extract information from user reviews useful for maintaining and evolving mobile apps. However, most of them just perform automatic classification of user reviews according to specific keywords (e.g., bugs, features). Moreover, they do not provide any support for linking user feedback to the source code components to be changed, thus requiring a manual, time-consuming, and error-prone task.

In this paper, we introduce ChangeAdvisor, a novel approach that analyzes the structure, semantics, and sentiments of sentences contained in user reviews to extract useful (user) feedback from maintenance perspectives and recommend to developers changes to software artifacts. It relies on natural language processing and clustering algorithms to group user reviews around similar user needs and suggestions for change. Then, it involves textual-based heuristics to determine the code artifacts that need to be maintained according to the recommended software changes. The quantitative and qualitative studies carried out on 44,683 user reviews of 10 open source mobile apps and their original developers showed a high accuracy of ChangeAdvisor in (i) clustering similar user change requests and (ii) identifying the code components impacted by the suggested changes.

Moreover, the obtained results show that ChangeAdvisor is more accurate than a baseline approach for linking user feedback clusters to the source code in terms of both precision (+47%) and recall (+38%).”
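
A rough sketch of the two core steps, clustering reviews by textual similarity and linking each cluster to code artifacts via a textual heuristic, might look as follows. The data, file names and the concrete algorithms (TF-IDF, k-means, cosine similarity) are illustrative stand-ins for ChangeAdvisor’s actual pipeline:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy review sentences and per-file "textual profiles" (e.g.
# identifiers and comments); both are made up for illustration.
reviews = [
    "app crashes when I open the camera",
    "camera freezes the whole phone",
    "please add a dark theme option",
    "would love a night mode setting",
]
artifacts = {
    "CameraActivity.java": "camera open capture preview crash",
    "SettingsFragment.java": "settings theme preference night mode",
}

vec = TfidfVectorizer()
R = vec.fit_transform(reviews)                    # review vectors
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(R)

A = vec.transform(artifacts.values())             # artifact vectors
for c in sorted(set(labels)):
    centroid = np.asarray(R[labels == c].mean(axis=0))
    best = list(artifacts)[cosine_similarity(centroid, A)[0].argmax()]
    members = [reviews[i] for i in np.where(labels == c)[0]]
    print(f"cluster {c}: {members} -> {best}")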

A preprint of this paper will also be available soon.

“Reducing Redundancies in Multi-Revision Code Analysis” @ SANER’17

We’re happy to announce that the paper

“Reducing Redundancies in Multi-Revision Code Analysis”

written by Carol V. Alexandru, Sebastiano Panichella and Harald C. Gall, has been accepted into the technical research track of SANER 2017.

Abstract:

Software engineering research often requires analyzing multiple revisions of several software projects, be it to make and test predictions or to observe and identify patterns in how software evolves. However, code analysis tools are almost exclusively designed for the analysis of one specific version of the code, and the time and resource requirements grow linearly with each additional revision to be analyzed. Thus, code studies often observe a relatively small number of revisions and projects. Furthermore, each programming ecosystem provides dedicated tools, hence researchers typically only analyze code of one language, even when researching topics that should generalize to other ecosystems. To alleviate these issues, frameworks and models have been developed to combine analysis tools or automate the analysis of multiple revisions, but little research has gone into actually removing redundancies in multi-revision, multi-language code analysis. We present a novel end-to-end approach that systematically avoids redundancies every step of the way: when reading sources from version control, during parsing, in the internal code representation, and during the actual analysis. We evaluate our open-source implementation, LISA, on the full history of 300 projects, written in 3 different programming languages, computing basic code metrics for over 1.1 million program revisions. When analyzing many revisions, LISA requires less than a second on average to compute basic code metrics for all files in a single revision, even for projects consisting of millions of lines of code.
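
To give a flavor of the core idea of avoiding redundant work (LISA’s actual internal representation is far more elaborate and also shares structures across revisions), the following sketch caches analysis results by content hash so that a file that is unchanged between revisions is processed only once:

```python
import hashlib

parse_cache = {}  # content hash -> analysis result (here: metrics)

def metrics_for(source: str) -> dict:
    """Compute basic code metrics, analyzing each distinct file
    content only once across all revisions."""
    key = hashlib.sha1(source.encode()).hexdigest()
    if key not in parse_cache:
        lines = source.splitlines()
        parse_cache[key] = {            # stand-in for real parsing
            "loc": len(lines),
            "sloc": sum(1 for l in lines if l.strip()),
        }
    return parse_cache[key]

# Revisions 1 and 2 share an unchanged file, so only three
# distinct file contents are ever analyzed, not four.
rev1 = {"A.java": "class A {}\n", "B.java": "class B {}\n"}
rev2 = {"A.java": "class A {}\n", "B.java": "class B { int x; }\n"}
for rev in (rev1, rev2):
    print({name: metrics_for(src)["sloc"] for name, src in rev.items()})
print("distinct parses:", len(parse_cache))  # -> 3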

Use and extend LISA: https://bitbucket.org/sealuzh/lisa

Or try out LISA using a simple template: https://bitbucket.org/sealuzh/lisa-quickstart


“Analyzing Reviews and Code of Mobile Apps for better Release Planning” @ SANER 2017

We’re happy to announce that the paper “Analyzing Reviews and Code of Mobile Apps for better Release Planning” has been accepted into SANER 2017 as a full paper. The authors of the paper are: Adelina Ciurumelea, Andreas Schaufelbühl, Sebastiano Panichella and Harald Gall.

Abstract:

The mobile applications industry experiences unprecedentedly high growth, and developers working in this context face fierce competition in acquiring and retaining users. They have to quickly implement new features and fix bugs, or risk losing their users to the competition. To achieve this goal they must closely monitor and analyze the user feedback they receive in the form of reviews. However, successful apps can receive up to several thousands of reviews per day, and manually analysing each of them is a time-consuming task.


To help developers deal with the large amount of available data, we manually analyzed the text of 1566 user reviews and defined a high- and low-level taxonomy containing mobile-specific categories (e.g. performance, resources, battery, memory, etc.) that are highly relevant for developers when planning maintenance and evolution activities. Then we built the User Request Referencer (URR) prototype, using Machine Learning and Information Retrieval techniques, to automatically classify reviews according to our taxonomy and to recommend, for a particular review, which source code artifacts need to be modified to handle the issue described in the user review. We evaluated our approach through an empirical study involving the reviews and code of 39 mobile applications. Our results show a high precision and recall of URR in organising reviews according to the defined taxonomy. Furthermore, we discovered during the evaluation that using information about the specific structure of mobile software projects (e.g. how to find the source code implementing the UI) improves the source code localization results.
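
As a minimal illustration of the classification step, a supervised text classifier can assign reviews to taxonomy categories. The categories, training reviews and classifier below are toy stand-ins; URR’s actual models and taxonomy are described in the paper:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy labeled reviews; our taxonomy contains many more categories
# (e.g. performance, resources, battery, memory).
train = [
    ("the app drains my battery overnight", "battery"),
    ("phone gets hot and battery dies fast", "battery"),
    ("scrolling is laggy and slow", "performance"),
    ("takes forever to load the feed", "performance"),
]
texts, labels = zip(*train)

clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(texts, labels)

print(clf.predict(["app is very slow when opening photos"]))
# -> ['performance'] (illustrative)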

“Using (Bio)Metrics to Predict Code Quality” is currently one of the most downloaded articles in software engineering

We are happy to announce that our paper “Using (Bio)Metrics to Predict Code Quality Online”, written by Sebastian Müller and Thomas Fritz, was one of the most downloaded software engineering articles in June and July 2016. With 1709 downloads in 6 weeks, it ranked second among all ACM software engineering articles. According to ACM, this is the first time that any paper was downloaded more than 1000 times.


Image source: ACM SIGSOFT Software Engineering Notes. Volume 41 Number 4.

The paper investigates the use of biometrics, such as heart rate variability (HRV) or electro-dermal activity (EDA), to determine the difficulty that developers experience while working on real-world change tasks and to automatically identify code quality concerns while a developer is making a change to the code. It can be accessed here.
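
For a flavor of the biometric features involved: heart rate variability is commonly derived from inter-beat (RR) intervals, for example via the RMSSD measure sketched below. This is a standard HRV formula used here for illustration; the concrete feature set used in the paper may differ:

```python
import numpy as np

def rmssd(rr_intervals_ms: np.ndarray) -> float:
    """RMSSD, a common HRV measure: the root mean square of
    successive differences between inter-beat (RR) intervals."""
    diffs = np.diff(rr_intervals_ms)
    return float(np.sqrt(np.mean(diffs ** 2)))

# Illustrative RR intervals in milliseconds; lower RMSSD values
# are typically associated with higher stress or mental load.
print(rmssd(np.array([812.0, 798.0, 840.0, 825.0, 810.0])))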

ARdoc: App Reviews Development Oriented Classifier @ FSE 2016

We are happy to announce that the paper “ARdoc: App Reviews Development Oriented Classifier” got accepted at the FSE 2016 Demonstrations Track! The authors of the paper are: Sebastiano Panichella, Andrea Di Sorbo, Emitza Guzman, Corrado Aaron Visaggio, Gerardo Canfora and Harald Gall.

The paper presents ARdoc (App Reviews Development Oriented Classifier), a Java tool that automatically recognizes natural language fragments in user reviews that are relevant for developers to evolve their applications. Specifically, natural language fragments are extracted according to a taxonomy of app review categories that are relevant to software maintenance and evolution. The categories were defined in our previous paper entitled “How Can I Improve My App? Classifying User Reviews for Software Maintenance and Evolution” and are: (i) Information Giving, (ii) Information Seeking, (iii) Feature Request and (iv) Problem Discovery. ARdoc implements an approach that merges three techniques: (1) Natural Language Processing, (2) Text Analysis and (3) Sentiment Analysis (SA) to automatically classify useful feedback contained in app reviews that is important for performing software maintenance and evolution tasks.
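
To hint at how linguistic patterns and sentiment cues can be combined for such a classification, here is a drastically simplified sketch. ARdoc itself is a Java tool with far more sophisticated NLP; the patterns and word lists below are made up:

```python
import re

# Hypothetical, simplified linguistic patterns per category.
PATTERNS = {
    "Feature Request": r"\b(please add|would love|wish|should have)\b",
    "Problem Discovery": r"\b(crash(es)?|bug|freezes?|does not work)\b",
    "Information Seeking": r"\?\s*$|\bhow (do|can) i\b",
}

NEGATIVE_WORDS = {"terrible", "awful", "hate", "broken"}

def classify(sentence: str) -> str:
    s = sentence.lower()
    for category, pattern in PATTERNS.items():
        if re.search(pattern, s):
            return category
    # Crude sentiment cue: negative wording in a review sentence
    # often accompanies problem reports.
    if any(w in s for w in NEGATIVE_WORDS):
        return "Problem Discovery"
    return "Information Giving"

print(classify("Please add a dark mode"))        # Feature Request
print(classify("The app crashes on startup"))    # Problem Discovery
print(classify("How do I change my password?"))  # Information Seeking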

Our quantitative and qualitative analyses (involving professional mobile developers) demonstrate that ARdoc correctly classifies feedback useful from a maintenance perspective in user reviews with high precision, recall, and F-measure (each ranging between 84% and 89%). While evaluating our tool we also found that ARdoc substantially helps to extract important maintenance tasks for real-world applications.

This video provides a short demonstration of ARdoc:

ARdoc is available for download at http://www.ifi.uzh.ch/en/seal/people/panichella/tools/ARdoc.html

What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes @ FSE 2016

We’re happy to announce that the paper “What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes” has been accepted into FSE 2016 as a full paper. The authors of the paper are: Andrea Di Sorbo, Sebastiano Panichella, Carol Alexandru, Junji Shimagaki, Corrado Visaggio, Gerardo Canfora and Harald Gall.

Abstract:
Mobile app developers constantly monitor feedback in user reviews with the goal of improving their mobile apps and better meeting user expectations. Thus, automated approaches have been proposed in the literature with the aim of reducing the effort required for analyzing feedback contained in user reviews via automatic classification (or prioritization) according to specific topics (e.g., bugs, features, etc.).


In this paper, we introduce SURF (Summarizer of User Reviews Feedback), a novel approach to condense the enormous amount of information that developers of popular apps have to manage due to user feedback received on a daily basis. SURF relies on a conceptual model for capturing user needs that is useful for developers performing maintenance and evolution tasks. It then uses sophisticated summarisation techniques to summarize thousands of reviews and generate an interactive, structured and condensed agenda of recommended software changes. We performed an end-to-end evaluation of SURF on user reviews of 17 mobile apps (5 of them developed by Sony Mobile), involving 23 developers and researchers in total. The results demonstrate high accuracy of SURF in summarizing reviews and the usefulness of the recommended changes. In evaluating our approach we found that SURF helps developers better understand user needs, substantially reducing the time required compared to manually analyzing user (change) requests and planning future software changes.
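
As a toy illustration of the final step, turning classified review sentences into a condensed, structured agenda: grouping by topic and ranking topics by frequency might look like the sketch below. SURF’s conceptual model and summarisation techniques are considerably richer; the data and topic names here are illustrative:

```python
from collections import Counter, defaultdict

# (topic, intention, sentence) triples as a classifier might
# produce them; all values are made up for illustration.
classified = [
    ("UI", "Feature Request", "add a dark theme"),
    ("UI", "Feature Request", "night mode please"),
    ("UI", "Problem Discovery", "buttons overlap on tablets"),
    ("Pricing", "Problem Discovery", "was charged twice"),
]

agenda = defaultdict(list)
for topic, intention, sentence in classified:
    agenda[topic].append((intention, sentence))

# Emit topics ordered by how many review sentences mention them.
for topic, _ in Counter(t for t, _, _ in classified).most_common():
    print(f"== {topic} ({len(agenda[topic])} sentences)")
    for intention, sentence in agenda[topic]:
        print(f"  [{intention}] {sentence}")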

Journal of Systems and Software: Eye Gaze and Interaction Contexts for Change Tasks – Observations and Potential

The more we know about software developers’ detailed navigation behavior for change tasks, the better we are able to provide effective tool support. In this article, we extend our work on the fine-granular navigation behavior of developers (see blogpost) and explore the potential of the more detailed and fine-granular data by examining the use of the captured change task context to predict perceived task difficulty and to provide better and more fine-grained navigation recommendations.


Check out our Journal article!