Humboldt-Universität zu Berlin - High Dimensional Nonstationary Time Series

IRTG1792DP2019 022

A Machine Learning Approach Towards Startup Success Prediction

Cemre Ünal
Ioana Ceasu

Abstract:
The importance of startups for a dynamic, innovative and competitive economy has
already been acknowledged in the scientific and business literature. The highly
uncertain and volatile nature of the startup ecosystem makes evaluating
startup success through the analysis and interpretation of information very
time-consuming and computationally intensive. This prediction problem calls
for a quantitative model that enables an objective, fact-based approach to
startup success prediction. This paper presents a series of
reproducible models for startup success prediction, using machine learning
methods. The data were obtained from the online investor platform
crunchbase.com and pre-processed to address sampling bias and class
imbalance using the ADASYN oversampling approach. A total of six different
models are implemented to predict startup success. Based on the goodness-of-fit
measures applicable to each model, the best-performing models are the
ensemble methods random forest and extreme gradient boosting, with test-set
prediction accuracies of 94.1% and 94.5% and AUCs of 92.22% and 92.91%,
respectively. The top variables in these models are last funding to date, first
funding lag and company age. The models presented in this study can be used to
predict the success of future firms and ventures in a repeatable way.
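
The ADASYN oversampling step mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified pure-Python version of the ADASYN idea run on made-up two-dimensional points, and the names `adasyn`, `k_nearest`, `minority` and `majority` are hypothetical. Minority points whose neighbourhoods contain more majority points receive more synthetic samples, and each synthetic sample is an interpolation between a minority point and one of its minority neighbours. Raising an error when no minority point has a majority neighbour is a choice made for this sketch.

```python
import math
import random


def k_nearest(point, candidates, k):
    """Indices of the k candidates closest to `point` (Euclidean distance)."""
    order = sorted(range(len(candidates)),
                   key=lambda j: math.dist(point, candidates[j]))
    return order[:k]


def adasyn(minority, majority, k=3, beta=1.0, seed=0):
    """Return synthetic minority points (simplified ADASYN sketch).

    Inputs are lists of equal-length tuples; at least two minority
    points are assumed.
    """
    rng = random.Random(seed)
    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]
    # Number of synthetic points needed to reach the requested balance level.
    target = round(beta * (len(majority) - len(minority)))

    # Step 1: for each minority point, the share of majority points among its
    # k nearest neighbours in the full data set (the point itself excluded).
    ratios = []
    for i, p in enumerate(minority):
        others = [e for j, e in enumerate(labelled) if j != i]
        nearest = k_nearest(p, [q for q, _ in others], k)
        ratios.append(sum(1 for j in nearest if others[j][1] == 0) / k)

    total = sum(ratios)
    if total == 0:
        # Sketch-specific choice: nothing to weight by, so stop here.
        raise ValueError("no minority point has a majority neighbour")
    weights = [r / total for r in ratios]

    # Step 2: generate samples in proportion to the weights by interpolating
    # towards a random minority neighbour. Because of rounding, the total
    # may differ slightly from `target`.
    synthetic = []
    for i, p in enumerate(minority):
        peers = [q for j, q in enumerate(minority) if j != i]
        neighbours = [peers[j] for j in k_nearest(p, peers, k)]
        for _ in range(round(weights[i] * target)):
            q = rng.choice(neighbours)
            lam = rng.random()
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(p, q)))
    return synthetic


# Toy data (made up for illustration): 4 minority and 10 majority points.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
majority = [(0.5, 0.1), (5.0, 5.0), (5.5, 5.0), (5.0, 5.5), (6.0, 6.0),
            (6.5, 6.0), (6.0, 6.5), (7.0, 7.0), (7.5, 7.0), (7.0, 7.5)]
synthetic = adasyn(minority, majority)
```

In practice one would more likely rely on an existing library implementation, such as the `ADASYN` class in imbalanced-learn, and oversample only the training split so that the test-set metrics are computed on unmodified data.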

Keywords:
Machine learning

JEL Classification:
C00