## A1 - Structuring high dimensional time series

Econometric research and market analytics often deal with high dimensional data of differing nature. Stylized features of such data include unknown dependence between components, time inhomogeneity in mean and in volatility, and structural changes. Standard parametric methods such as vector autoregression (VAR) face the “curse of dimensionality”, so structural assumptions are needed to reduce the complexity of the considered models. Independent Component Analysis (ICA) and Non-Gaussian Component Analysis (NGCA) provide useful examples of such a structural approach. In addition, tail dependence modelled through copulae is moving into focus for defining the risk structures of high dimensional nonstationary time series.

Organizing time series data into groups (clusters) is one of the most fundamental ways of understanding the underlying structure of the data and gaining insight into it. The goal of clustering is to identify structure in an unlabelled data set by objectively organizing its elements into homogeneous groups. Interest in time series clustering as a data mining task appears to be increasing. Although a large number of clustering techniques have been developed in statistics and data mining, we aim to develop a new clustering technique that is fully adaptive to the unknown number of clusters, the structure of the clusters, etc., and that applies equally well to convex and arbitrarily shaped clusters of different type and density.

Stylized features of economic data include unknown dependence between components, time inhomogeneity in mean and in volatility, and structural changes. In real-life applications, a meaningful part of the data is often structured, and structural assumptions can be used to significantly reduce the complexity of the considered models. A number of useful approaches to this task have been developed in the literature. Independent Component Analysis (ICA) is an unsupervised learning method that separates the data of interest from the independent, uninformative noise contaminating it. Non-Gaussian Component Analysis (NGCA) is another useful unsupervised method; it rests on the premise that the Gaussian components usually carry no information, whereas the non-Gaussian components are the meaningful ones (Diederichs et al., 2013). This opens a promising perspective for modelling dependency through copulae and brings into focus the definition of risk structures of high dimensional nonstationary time series.

**Coordination**

**Vladimir Spokoiny**: His research interests are mathematical statistics and econometrics (time series, dimension reduction, error-in-variable and instrumental regression) with various applications.

**Ying Fang**: His main interests are econometrics, applied econometrics, and the economy of China. His research includes work on nonparametric and semiparametric methods, panel data analysis, and instrumental variable selection.

**Exemplary PhD-Theses**

1. Inference for spectral projectors of sample covariance matrix

Let X_1,…,X_n be an i.i.d. sample in R^p with zero mean and covariance matrix Σ. The problem of recovering the projector onto an eigenspace of Σ from these observations arises naturally in many applications. A recent technique of Koltchinskii and Lounici (2015) helps to study the asymptotic distribution of the Frobenius-norm distance ‖P_r − Q_r‖_2 between the true projector P_r onto the eigenspace of the r-th eigenvalue and its empirical counterpart Q_r in terms of the effective trace of Σ. A challenging task is to design a bootstrap procedure for building sharp confidence sets for the true projector P_r from the given data. This procedure should not rely on the asymptotic distribution of ‖P_r − Q_r‖_2 or on its moments, so that the method remains justified for small or moderate sample size n and large dimension p. The research tasks include a theoretical study of the validity of the proposed procedure for finite samples, with an explicit bound on the error of the bootstrap approximation, and applications to multivariate econometric data.
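To make the objects in this task concrete, the following is a minimal sketch in Python with NumPy. It computes the empirical projector Q_r and resamples the Frobenius distance with a plain nonparametric (Efron) bootstrap. This is only an illustration under stated assumptions: the function names `spectral_projector` and `bootstrap_projector_distances` are hypothetical, the r-th eigenvalue is assumed to be simple, and the naive resampling scheme is a stand-in, not the procedure the thesis is meant to develop or justify.

```python
import numpy as np

def spectral_projector(cov, r):
    """Projector onto the eigenvector of the r-th largest eigenvalue (r = 1, 2, ...).

    Assumption: that eigenvalue is simple, so the eigenspace is one-dimensional.
    """
    _, eigvecs = np.linalg.eigh(cov)   # eigenvalues come back in ascending order
    u = eigvecs[:, -r]                 # eigenvector of the r-th largest eigenvalue
    return np.outer(u, u)

def bootstrap_projector_distances(X, r, n_boot=200, seed=0):
    """Naive bootstrap draws of the Frobenius distance between projectors.

    X is an (n, p) array of zero-mean observations. Returns the empirical
    projector Q_r and n_boot values of ||Q_r^* - Q_r||_F, where Q_r^* is
    computed from rows of X resampled with replacement.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Q_r = spectral_projector(X.T @ X / n, r)   # zero mean assumed, no centering
    dists = np.empty(n_boot)
    for b in range(n_boot):
        Xb = X[rng.integers(0, n, size=n)]     # resample rows with replacement
        Q_b = spectral_projector(Xb.T @ Xb / n, r)
        dists[b] = np.linalg.norm(Q_b - Q_r, "fro")
    return Q_r, dists
```

A candidate confidence radius would then be an upper quantile of the resampled distances, for example `np.quantile(dists, 0.95)`. Whether such a radius is valid for small n and large p is exactly the open question described above.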

2. Structural modelling in high dimensional time series

A typical problem in econometric and financial applications is to understand the structural dependence of the observed response Y_t on the explanatory process X_t. When X_t is high dimensional, the problem faces the curse of dimensionality, and structural assumptions have to be used to reduce the complexity of the model. Typical examples are single-index, multi-index, and dynamic factor models. If the processes X and Y are nonstationary, the problem becomes even more involved. The thesis focuses on inference on the projector onto the effective dimension reduction (e.d.r.) space; see Li (1991). The task is to design a data-driven resampling procedure to identify the dimensionality of the e.d.r. space and the corresponding projector.
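As an illustration of what estimating the e.d.r. space involves, here is a minimal sketch of sliced inverse regression (Li, 1991) in Python with NumPy. It assumes i.i.d. observations with a nonsingular sample covariance, which is simpler than the nonstationary setting of the thesis. The function name `sir_directions` and the default of 10 slices are choices made for this sketch, and the resampling step for selecting the dimension is not shown.

```python
import numpy as np

def sir_directions(X, y, n_slices=10):
    """Sliced inverse regression estimate of e.d.r. directions.

    X is an (n, p) array of regressors, y an (n,) response.
    Returns the eigenvalues (descending) of the slice-mean covariance and
    a (p, p) array whose columns are the corresponding direction estimates.
    """
    n, p = X.shape
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)

    # Whiten the regressors: Z = (X - mean) cov^{-1/2}
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mean) @ inv_sqrt

    # Slice the sample by the order of y and average Z within each slice
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)   # weighted covariance of slice means

    # Leading eigenvectors of M span the estimated e.d.r. space (whitened scale)
    lam, eta = np.linalg.eigh(M)
    lam, eta = lam[::-1], eta[:, ::-1]
    directions = inv_sqrt @ eta                # map back to the original scale
    return lam, directions
```

In a single-index model, the first column of `directions` should be roughly proportional to the index vector, and the gap between the leading eigenvalues in `lam` and the remaining ones gives an informal indication of the dimension. A data-driven, resampling-based rule for that choice is the subject of the task above.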


**References**

• Diederichs E, Juditsky A, Nemirovski A, Spokoiny V (2013) Sparse non Gaussian component analysis by semidefinite programming. Machine Learning, 91 (2), 211–238, DOI: 10.1007/s10994-013-5331-1.

• Koltchinskii V, Lounici K (2015) Normal approximation and concentration of spectral projectors of sample covariance. arXiv preprint arXiv:1504.07333.

• Li KC (1991) Sliced inverse regression for dimension reduction. J. Amer. Statist. Assoc., 86, 316–327, DOI: 10.2307/2290563.