Humboldt-Universität zu Berlin - High Dimensional Nonstationary Time Series


Topic Modeling for Analyzing Open-Ended Survey Responses

Andra-Selina Pietsch
Stefan Lessmann

Open-ended responses are widely used in market research studies. Processing of such
responses requires labor-intensive human coding. This paper focuses on unsupervised topic
models and tests their ability to automate the analysis of open-ended responses. Since state-of-the-art
topic models struggle with the shortness of open-ended responses, the paper considers
three novel short text topic models: Latent Feature Latent Dirichlet Allocation, Biterm Topic
Model and Word Network Topic Model. The models are fitted and evaluated on a set of real-world
open-ended responses provided by a market research company. Several aspects, such as
topic coherence and document classification, are evaluated both quantitatively and
qualitatively to appraise whether topic models can replace human coding. The results suggest that
topic models are a viable alternative for open-ended response coding. However, their
usefulness is limited when a correct one-to-one mapping of responses and topics or the exact
topic distribution is needed.

Keywords: Market research, open-ended responses, text analytics, short text topic models

JEL Classification: