Humboldt-Universität zu Berlin - High Dimensional Nonstationary Time Series

IRTG 1792 Discussion Paper 2018-063

Causal Inference using Machine Learning. An Evaluation of recent Methods through Simulations

Daniel Jacob
Stefan Lessmann
Wolfgang Karl Härdle

Abstract
Estimating a causal parameter in a high-dimensional setting where the nuisance
functions are potentially complex is a challenging task. Parametric and linear
models are not sufficient in such settings to yield unbiased and consistent
estimators. Modern approaches therefore use machine learning (ML) algorithms to
learn these nuisance functions. However, this introduces new problems, such as
regularization bias and overfitting, that are common when using ML models.
This paper considers several novel methods that overcome, or at least address,
these problems. The methods differ in their target parameter: the average
treatment effect in the population, group-level heterogeneity, or the conditional
average treatment effect for each individual. Each method is first investigated
and tested separately; the methods are then compared with one another. To do this
in a disciplined manner, simulations with synthetic data are used, which ensures
that the distributions of all generated treatment effect parameters are known. The
findings are that each method has its limits in terms of unbiased estimation, the
detection of heterogeneity, and the determination of which covariates are
responsible for different causal effects.

Keywords:
causal inference, machine learning, simulation study, sample-splitting,
double machine learning, sorted group ATE (GATES), causal tree

JEL Classification:
C01, C14, C31, C63