Faiz Ali Shah: “Extracting Information from App Reviews to Facilitate Software Development Activities”

On the 21st of February, Faiz Ali Shah defended his PhD thesis, “Extracting information from app reviews to facilitate software development activities”. The thesis evaluates existing text mining techniques for extracting developer-relevant information from app reviews at different levels of granularity and then combines these techniques in a tool, REVSUM, for competitive analysis.


For software companies, it is extremely important to continuously evaluate the needs and expectations of their users in order to improve app quality. A convenient and useful source of information for reassessing evolving user needs is app reviews, in which users express their opinions on various aspects of an app. Prior studies have shown that users mention useful information in app reviews, such as bug reports, feature requests, and feature evaluations. Such information can be fed back into various activities performed during the mobile app release cycle. As app marketplaces receive a large volume of user reviews every day, automatic methods are needed to find the relevant information in them.

Machine learning-based text classification models can be used to automatically categorize review text into developer-relevant information types such as feature requests and bug reports. Compared to review classification models that are trained on rich linguistic features (i.e., part-of-speech tags, constituency parse trees, and semantic dependency graphs), classification models that use the words of the review text as features, also called Bag-of-Words (BoW) features, are fast to train and easy to adapt to other review languages. Shah therefore ran experiments comparing simple models using BoW features with models using rich linguistic features and with models built on deep learning architectures, i.e., Convolutional Neural Networks (CNNs). The results show that simple models can achieve almost the same performance as complex models for classifying app review information.
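To illustrate what a simple BoW-based review classifier can look like, here is a minimal sketch in plain Python. It is not the model used in the thesis: the choice of multinomial Naive Bayes with add-one smoothing, the label names, and the four training reviews are all assumptions made up for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a review and split it into word tokens (the BoW features)."""
    return re.findall(r"[a-z']+", text.lower())


def train(reviews):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in reviews:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest Naive Bayes log-probability."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data; real experiments use thousands of labeled reviews.
TRAIN = [
    ("The app crashes every time I open the camera", "bug_report"),
    ("Login fails with an error after the update", "bug_report"),
    ("Please add a dark mode option", "feature_request"),
    ("Would love to have offline maps support", "feature_request"),
]

model = train(TRAIN)
prediction = predict(model, "app crashes after update")
print(prediction)
```

Because the only language-specific step is `tokenize`, adapting such a model to reviews in another language mostly means swapping the tokenizer and the training data, which is the practical advantage of BoW features noted above.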

One can also perform a fine-grained analysis of app reviews at the level of app features. Such an analysis can help developers understand users’ perceptions of delivered app features, and it can also reveal newly requested features and buggy features. The most crucial component for generating feature-level summaries is the automatic extraction of app features from user reviews. Several methods have been used to extract app features automatically from app reviews, including rule-based, unsupervised, and supervised machine learning methods.

In this thesis, Shah investigated various factors influencing the performance of automatic app feature extraction methods, i.e., rule-based and supervised machine learning methods. He first established a baseline in a single experimental setting and then compared performance across different experimental settings (i.e., varying annotated datasets and evaluation methods). Since the performance of supervised feature extraction methods is more sensitive than that of rule-based methods to (1) the guidelines used to annotate app features in user reviews and (2) the size of the annotated data, Shah studied the impact of these factors on supervised feature extraction models by simulating changes in the existing annotation guidelines (AGs). He then suggested new AGs that have the potential to improve the performance and quality of app feature extraction.
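The text does not say which evaluation methods were varied. One distinction that is common when scoring extracted features against annotations is exact versus partial matching, and the sketch below assumes that distinction purely for illustration. The function name, the matching rule (sharing at least one word), and the sample terms are all hypothetical.

```python
def f1_scores(extracted, annotated, partial=False):
    """Precision, recall, and F1 of extracted feature terms against annotations.

    With partial=False a term counts only if it equals an annotated term.
    With partial=True, sharing at least one word is enough to count.
    """
    def matches(a, b):
        if partial:
            return bool(set(a.split()) & set(b.split()))
        return a == b

    # Extracted terms that match some annotation (for precision).
    hit_pred = sum(any(matches(e, a) for a in annotated) for e in extracted)
    # Annotated terms that were found by some extraction (for recall).
    hit_gold = sum(any(matches(e, a) for e in extracted) for a in annotated)

    precision = hit_pred / len(extracted) if extracted else 0.0
    recall = hit_gold / len(annotated) if annotated else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Made-up output of an extractor and made-up human annotations.
extracted = ["upload photo", "dark mode", "share"]
annotated = ["upload photos", "dark mode", "video call"]

exact = f1_scores(extracted, annotated)
relaxed = f1_scores(extracted, annotated, partial=True)
print(exact, relaxed)
```

On this toy input, the same extractor looks considerably better under the relaxed rule than under the exact one, which shows why results from different experimental settings are hard to compare without a common baseline.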

To make the research results of the thesis accessible to non-experts as well, Shah developed a proof-of-concept tool called REVSUM for comparing competing apps. The tool combines review classification and app feature extraction methods and supports three typical use cases: viewing users’ sentiments toward app features in competing apps (UC 1), viewing features that were mentioned in reviews classified as bug reports in competing apps (UC 2), and viewing features that were requested by users in competing apps (UC 3). The tool has been evaluated by developers from industry, who perceived it as useful for extracting information relevant to software development activities.
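The three use cases can be pictured as simple aggregations over reviews that have already been classified and had their features extracted. The sketch below is only an illustration of that idea and does not reflect how REVSUM is actually implemented: the record fields, app names, sentiment scale, and function names are all assumptions.

```python
from collections import defaultdict

# Hypothetical pre-processed reviews. Each record holds the app name, the
# predicted review type, the extracted app features, and a sentiment score
# in [-1, 1]. None of these field names come from REVSUM itself.
REVIEWS = [
    {"app": "AppA", "type": "bug_report", "features": ["photo upload"], "sentiment": -0.8},
    {"app": "AppA", "type": "feature_request", "features": ["dark mode"], "sentiment": 0.1},
    {"app": "AppA", "type": "other", "features": ["photo upload"], "sentiment": 0.4},
    {"app": "AppB", "type": "bug_report", "features": ["login"], "sentiment": -0.6},
    {"app": "AppB", "type": "other", "features": ["photo upload"], "sentiment": 0.9},
]


def feature_sentiment(reviews):
    """UC 1: mean sentiment per (app, feature) pair."""
    scores = defaultdict(list)
    for review in reviews:
        for feature in review["features"]:
            scores[(review["app"], feature)].append(review["sentiment"])
    return {key: sum(vals) / len(vals) for key, vals in scores.items()}


def features_by_type(reviews, review_type):
    """UC 2 and UC 3: features per app mentioned in reviews of one type."""
    result = defaultdict(set)
    for review in reviews:
        if review["type"] == review_type:
            result[review["app"]].update(review["features"])
    return dict(result)


sentiment = feature_sentiment(REVIEWS)
buggy = features_by_type(REVIEWS, "bug_report")           # UC 2
requested = features_by_type(REVIEWS, "feature_request")  # UC 3
print(sentiment, buggy, requested)
```

Keying the sentiment summary by app and feature together is what makes a side-by-side comparison of competing apps possible, for example "photo upload" in AppA versus AppB.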
