Humboldt-Universität zu Berlin - Statistics


Paper "Predicting question difficulty in web surveys: A machine-learning approach based on mouse movement features" by Fernandez-Fontelo, Kieslich, Henninger, Kreuter and Greven will appear in Social Science Computer Review



The paper "Predicting question difficulty in web surveys: A machine-learning approach based on mouse movement features" by A. Fernandez-Fontelo, P.J. Kieslich, F. Henninger, F. Kreuter and S. Greven will appear in Social Science Computer Review.


Abstract

Survey research aims to collect robust and reliable data from respondents. However, despite researchers’ efforts in designing questionnaires, survey instruments may be imperfect, and question structure may not be as clear as it could be, thus creating a burden for respondents. If it were possible to detect such problems, this knowledge could be used to predict problems in a questionnaire during pre-testing, to inform real-time interventions through responsive questionnaire design, or to indicate and correct measurement error after the fact. Previous research has used paradata, specifically response times, to detect difficulties and help improve user experience and data quality. Today, richer data sources are available, for example the movements respondents make with their mouse, which provide an additional, detailed indicator of the respondent-survey interaction. This paper uses machine learning techniques to explore the predictive value of mouse-tracking data with regard to a question’s difficulty. We use data from a survey on respondents’ employment history and demographic information, in which we experimentally manipulate the difficulty of several questions. Using measures derived from mouse movements, we predict whether respondents have answered the easy or difficult version of a question, using and comparing several state-of-the-art supervised learning methods. We also develop a personalization method that adjusts for respondents’ baseline mouse behavior and evaluate its performance. For all three manipulated survey questions, we find that including the full set of mouse movement measures and accounting for individual differences in these measures improve prediction performance over response-time-only models.
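The abstract does not spell out how the personalization works. Purely as an illustration of what "adjusting for respondents' baseline mouse behavior" can mean, the sketch below assumes a simple per-respondent z-score: each mouse-movement measure on the question of interest is expressed relative to the mean and standard deviation of the same respondent's values on baseline questions. This is our assumption, not necessarily the authors' method, and the measure names (`response_time`, `total_distance`) and all numbers are hypothetical.

```python
from statistics import mean, stdev


def personalize(target, baseline):
    """Hypothetical baseline adjustment (per-respondent z-score).

    target:   dict mapping measure name -> value on the question of interest
    baseline: dict mapping measure name -> list of the same respondent's
              values on baseline questions (at least two values each)
    Returns a dict of adjusted values, one per measure in `target`.
    """
    adjusted = {}
    for name, value in target.items():
        ref = baseline[name]
        mu = mean(ref)
        sd = stdev(ref)  # sample standard deviation; needs >= 2 values
        # If the baseline shows no spread, fall back to centering only.
        adjusted[name] = (value - mu) / sd if sd > 0 else value - mu
    return adjusted


# Hypothetical respondent A: fast and direct on baseline questions.
a = personalize(
    {"response_time": 8.0, "total_distance": 1300.0},
    {"response_time": [4.0, 5.0, 6.0], "total_distance": [800.0, 900.0, 1000.0]},
)
# Hypothetical respondent B: slower on every question.
b = personalize({"response_time": 12.0}, {"response_time": [10.0, 12.0, 14.0]})

print(a)  # {'response_time': 3.0, 'total_distance': 4.0}
print(b)  # {'response_time': 0.0}
```

In raw terms respondent B looks slower (12 s vs. 8 s), but relative to their own baselines, A's behavior on this question is unusual while B's is typical. Adjusted features like these could then be passed to any supervised classifier alongside, or instead of, the raw measures.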


DOI: 10.1177/08944393211032950 | ID: SSCR-20-0177.R2