## B3 - Dynamic panel data modeling

Increasingly, researchers in economics and other social sciences have access to large panel data sets, which typically involve a large number of observational units (e.g. workers, firms, students, regions) observed over time. The econometric analysis of such data has fuelled a large methodological literature on microeconometric methods; see, e.g., the graduate textbook by Wooldridge (2010). The economic variables studied are typically the outcome of dynamic decisions by heterogeneous agents. Thus, panel data involve high dimensional and often nonstationary time series. The research projects proposed in B3 suggest methodological improvements of standard microeconometric practice in dynamic panel data modelling and explore the usefulness of these suggestions based on specific research questions in economics.

Standard microeconometric modelling approaches consider very simple time series processes, and researchers often do not account for the heterogeneity of these processes across observational units or for their variation over time. For instance, the methods described by Wooldridge (2010) mostly build on parametric linear and nonlinear modelling approaches and involve quite simple time series models. It is common practice to account for differences across individuals by time-invariant individual-specific effects. Two approaches are to model these effects as random effects or to estimate them as unknown parameters (fixed effects), e.g., in the interactive fixed effects models of Bai (2009); the latter approach leads to the well-known incidental parameter problem in nonlinear estimation problems. In addition, as part of a permanent-transitory decomposition, applied econometricians often assume a stationary time series process (often i.i.d.) whose unknown parameters do not differ across individuals. Heterogeneity in the cross-sectional dependence of the innovations of the processes is rarely modelled.
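As a minimal sketch of the common practice described above, the following Python snippet simulates a linear panel in which a time-invariant individual effect is correlated with the regressor, and compares pooled OLS with the within (fixed-effects) estimator. The data-generating process, the function names `within_estimator` and `pooled_ols`, and all parameter values are illustrative assumptions and not part of the B3 projects.

```python
import numpy as np

def within_estimator(y, x):
    """Fixed-effects (within) slope estimate for a balanced panel.

    y, x: arrays of shape (N, T) with one outcome and one regressor.
    Demeaning within each unit removes time-invariant individual effects.
    """
    y_dm = y - y.mean(axis=1, keepdims=True)
    x_dm = x - x.mean(axis=1, keepdims=True)
    return float((x_dm * y_dm).sum() / (x_dm ** 2).sum())

def pooled_ols(y, x):
    """Pooled OLS slope with a common intercept, ignoring individual effects."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / (xc ** 2).sum())

# Simulated example; all values below are illustrative assumptions.
rng = np.random.default_rng(0)
N, T, beta = 500, 6, 1.0
alpha = rng.normal(size=(N, 1))                  # individual effects
x = 0.8 * alpha + rng.normal(size=(N, T))        # regressor correlated with alpha
y = beta * x + alpha + rng.normal(size=(N, T))   # i.i.d. transitory errors

beta_fe = within_estimator(y, x)
beta_pooled = pooled_ols(y, x)
print(f"within: {beta_fe:.3f}, pooled OLS: {beta_pooled:.3f}")
```

In this design pooled OLS is biased upwards because the individual effect enters both the regressor and the outcome, whereas the within transformation removes it. The sketch keeps the slope and the error process homogeneous across units, which is exactly the restriction that the projects in B3 aim to relax.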

One project aims to implement high dimensional microeconometric models that account for rich heterogeneity across individuals and allow for endogenous covariates. In the literature, Lasso estimators have been proposed in this setting. A key challenge when using such sparse estimators, however, is that they do not have a tractable limiting distribution. It is as yet unclear how to perform statistical tests and quantify uncertainty in high dimensional models that have endogenous covariates or a panel structure. One particular approach is to model panel data in a semiparametric factor model framework, where the high dimensional panel responses are decomposed into low dimensional factors with loadings. Selecting the number of factors poses a further challenge. Moreover, we would like to quantify the effects of discretized shocks or structural changes on the underlying spatio-temporal models. Lastly, it would also be of considerable interest to investigate how the above-mentioned methodology can be extended to multiple response models.
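To make the factor decomposition and the problem of choosing the number of factors concrete, here is a small sketch in Python. It uses a plain principal-components factor model and a simple eigenvalue-ratio rule, both chosen here for illustration; the semiparametric framework envisaged in the project is not implemented. The function names `pca_factors` and `eigenvalue_ratio`, the cut-off `k_max`, and the simulated design are assumptions.

```python
import numpy as np

def pca_factors(Y, r):
    """Principal-components fit of an r-factor model to an N x T panel.

    Returns loadings (N x r) and factors (T x r) such that
    Y is approximated by loadings @ factors.T.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    loadings = U[:, :r] * s[:r]
    factors = Vt[:r].T
    return loadings, factors

def eigenvalue_ratio(Y, k_max=8):
    """Choose the number of factors as the largest ratio of adjacent eigenvalues."""
    eig = np.linalg.svd(Y, compute_uv=False) ** 2
    ratios = eig[:k_max] / eig[1:k_max + 1]
    return int(np.argmax(ratios)) + 1

# Simulated panel with two strong factors; all values are illustrative.
rng = np.random.default_rng(1)
N, T, r_true = 100, 50, 2
true_loadings = rng.normal(size=(N, r_true))
true_factors = rng.normal(size=(T, r_true))
Y = true_loadings @ true_factors.T + 0.5 * rng.normal(size=(N, T))

r_hat = eigenvalue_ratio(Y)
loadings, factors = pca_factors(Y, r_hat)
rel_err = np.sum((Y - loadings @ factors.T) ** 2) / np.sum(Y ** 2)
print(f"selected factors: {r_hat}, relative approximation error: {rel_err:.3f}")
```

With strong factors and moderate noise, as simulated here, the gap between the last factor eigenvalue and the first noise eigenvalue is large and the rule is stable. The selection problem becomes hard when factors are weak or the idiosyncratic errors are strongly dependent.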

**Coordination**

**Bernd Fitzenberger**: His research interests are microeconometrics (panel data regression, quantile regression) and empirical applications in labour economics and the economics of education. Part of his research focuses on the econometric analysis of wage inequality.

**Christoph Breunig**: His research interests are in developing econometric methodology that accounts for unobserved individual heterogeneity and endogenous selection.

**Weining Wang**: Her research interests are in financial econometrics and statistics. In particular, her research covers non- and semiparametric statistics, network models, high dimensional time series analysis, and spatio-temporal copula models.

**Exemplary PhD-Theses**

- A projection based panel data model with interactive fixed effects

This thesis considers a new estimator of the regression parameters in a panel data model with interactive fixed effects. The new estimator adopts the well-known form of partial least squares estimation. Instead of projecting onto the orthogonal space of the factors, we project the (smoothed) data matrix onto the orthogonal linear subspace spanned by the covariates. This yields a direct estimator of the interactive fixed effect parameters without the need to estimate the factors. In addition, the estimator is shown to be √(NT)-consistent.
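For orientation, a panel data model with interactive fixed effects is commonly written as follows. The notation is an assumption made here for illustration and is not taken from the thesis itself:

$$
Y_{it} = X_{it}^{\top}\beta + \lambda_i^{\top} F_t + \varepsilon_{it}, \qquad i = 1,\dots,N,\quad t = 1,\dots,T,
$$

where $X_{it}$ is a vector of covariates, $\beta$ the regression parameter of interest, $F_t$ a low dimensional vector of unobserved factors, $\lambda_i$ the corresponding unit-specific loadings, and $\varepsilon_{it}$ an idiosyncratic error. In this notation, the consistency statement above reads

$$
\sqrt{NT}\,\bigl(\hat{\beta} - \beta\bigr) = O_p(1) \quad \text{as } N, T \to \infty .
$$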

- Modelling Wage Processes

Distinguishing empirically between the permanent and the transitory components of wages over time plays a key role in understanding the evolution of wage inequality. When modelling wage processes, researchers have to take into account that observed changes over time also involve an age component and changes in other labour market characteristics. Panel attrition and the effect of selection into employment are further issues to consider. Recent research relies on parsimonious parametric specifications of the time series process describing unexplained wage changes in a regression model; see Meghir and Pistaferri (2011) and Cappellari and Leonardi (2015). This thesis will develop and apply semiparametric extensions to modelling wage processes. In particular, the parameters of the processes will be allowed to depend upon the characteristics of the worker and the job. Furthermore, the thesis will take account of selection issues in modelling wage processes. The application will be based on social security data for Germany.
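The kind of parsimonious parametric specification referred to above can be illustrated with a textbook permanent-transitory model, in which the unexplained wage component is the sum of a random-walk permanent part and an i.i.d. transitory part. The sketch below simulates such a process and recovers the two shock variances from covariances of first differences. The specification, the function names and the parameter values are assumptions chosen for illustration; they are not the models to be developed in the thesis, which would let these parameters vary with worker and job characteristics.

```python
import numpy as np

def simulate_residual_wages(N, T, var_perm, var_trans, rng):
    """Simulate u_it = p_it + v_it with a random-walk permanent component."""
    zeta = rng.normal(scale=np.sqrt(var_perm), size=(N, T))
    p = np.cumsum(zeta, axis=1)                             # p_it = p_i,t-1 + zeta_it
    v = rng.normal(scale=np.sqrt(var_trans), size=(N, T))   # i.i.d. transitory part
    return p + v

def variance_decomposition(u):
    """Moment-based estimates of permanent and transitory shock variances.

    With du_t = zeta_t + v_t - v_{t-1}:
      Cov(du_t, du_{t+1})                   = -var_trans
      Cov(du_t, du_{t-1} + du_t + du_{t+1}) =  var_perm
    Uncentred moments are used because the simulated differences have mean zero.
    """
    du = np.diff(u, axis=1)                                 # N x (T-1) first differences
    prev, mid, nxt = du[:, :-2], du[:, 1:-1], du[:, 2:]
    var_trans = -float(np.mean(mid * nxt))
    var_perm = float(np.mean(mid * (prev + mid + nxt)))
    return var_perm, var_trans

# Illustrative parameter values (assumptions, not estimates from real data).
rng = np.random.default_rng(2)
u = simulate_residual_wages(N=20000, T=8, var_perm=0.02, var_trans=0.05, rng=rng)
var_perm_hat, var_trans_hat = variance_decomposition(u)
print(f"permanent: {var_perm_hat:.4f}, transitory: {var_trans_hat:.4f}")
```

The sketch imposes the same variances on every worker and ignores attrition and selection into employment, which are the restrictions the thesis sets out to address.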

- Inference in High Dimensional Instrumental Variable Models

High dimensional instrumental variable models have recently drawn a large amount of attention in the econometric literature. One of the main reasons is that economic theory is not explicit enough about which variables belong to the true model. Moreover, high dimensional models account for rich heterogeneity that one wants to control for. A key challenge for the implementation of sparse estimators, such as the Lasso, is that they do not have a tractable limiting distribution. Very little work has been done on constructing confidence intervals, statistical testing and quantifying uncertainty in high dimensional sparse models, in particular in the context of instrumental variable estimation. So far, the so-called Post-Lasso procedure has been proposed by Belloni et al. (2012), while an alternative approach relies on desparsification of the Lasso as in van de Geer et al. (2014). The aim of this thesis is to compare both approaches, first in an analytical framework and second in a finite sample simulation study.
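A rough sketch of the Post-Lasso idea in an instrumental variable setting is given below: the Lasso selects instruments in the first stage, the first stage is refitted by OLS on the selected instruments, and the fitted values are then used as the instrument for the endogenous regressor. This is a simplified illustration only. It uses a fixed penalty `lam` instead of the data-driven penalty of Belloni et al. (2012), a hand-written coordinate-descent Lasso, and a simulated design. All names and values are assumptions. The desparsified Lasso is not shown.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * b[j]                 # partial residual without coordinate j
            rho = X[:, j] @ resid / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * b[j]                 # restore full residual
    return b

def post_lasso_iv(y, d, Z, lam):
    """Select instruments by Lasso, refit the first stage by OLS, then run IV."""
    selected = np.flatnonzero(lasso_cd(Z, d, lam))
    if selected.size == 0:
        raise ValueError("no instruments selected; try a smaller penalty")
    Zs = Z[:, selected]
    pi_hat, *_ = np.linalg.lstsq(Zs, d, rcond=None)
    d_hat = Zs @ pi_hat                             # post-Lasso first-stage fit
    alpha_hat = float((d_hat @ y) / (d_hat @ d))    # IV with d_hat as instrument
    return alpha_hat, selected

# Simulated design: 100 candidate instruments, only the first 3 are relevant.
rng = np.random.default_rng(3)
n, p, alpha = 500, 100, 1.0
Z = rng.normal(size=(n, p))
v = rng.normal(size=n)
d = Z[:, :3].sum(axis=1) + v                        # endogenous regressor
u_err = 0.8 * v + rng.normal(size=n)                # error correlated with v
y = alpha * d + u_err

alpha_hat, selected = post_lasso_iv(y, d, Z, lam=0.3)
alpha_ols = float((d @ y) / (d @ d))                # biased benchmark
print(f"post-Lasso IV: {alpha_hat:.3f}, OLS: {alpha_ols:.3f}, selected: {selected}")
```

The point estimate is only half of the problem described above. Valid confidence intervals after the selection step require additional arguments, which is where the comparison between Post-Lasso and the desparsified Lasso comes in.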

**References**

- Bai J (2009) Panel data models with interactive fixed effects. Econometrica, 77(4), 1229-1279
- Belloni A, Chen D, Chernozhukov V, Hansen C (2012) Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica, 80(6), 2369-2429, DOI: 10.3982/ECTA9626
- Cappellari L, Leonardi M (2015) Earnings instability and tenure. The Scandinavian Journal of Economics, 118(2), 202-234, DOI: 10.1111/sjoe.12142
- Meghir C, Pistaferri L (2011) Earnings, consumption and life cycle choices. Handbook of Labor Economics, 4, 773-854, DOI: 10.1016/S0169-7218(11)02407-5
- Van de Geer S, Bühlmann P, Ritov Y, Dezeure R (2014) On asymptotically optimal confidence regions and tests for high-dimensional models. Annals of Statistics, 42(3), 1166-1202, DOI: 10.1214/14-AOS1221
- Wooldridge J M (2010) Econometric Analysis of Cross Section and Panel Data. 2nd edition, Cambridge, MA: MIT Press.