Humboldt-Universität zu Berlin - High Dimensional Nonstationary Time Series

Abstracts

IRTG 1792 DP Abstracts

IRTG1792DP2019 016

What makes cryptocurrencies special? Investor sentiment and return
predictability during the bubble

Cathy Yi-Hsuan Chen
Roméo Després
Li Guo
Thomas Renault

Abstract:
The 2017 bubble in the cryptocurrency market is reminiscent of the dot-com
bubble, during which hard-to-measure fundamentals and investors' fascination
with new technologies led to overvalued prices. Benefiting from the massive
increase in the volume of messages published on social media and message boards,
we examine the impact of investor sentiment, conditional on bubble regimes, on
aggregate cryptocurrency return prediction. Constructing a crypto-specific
lexicon and using a local-momentum autoregression model, we find that the
sentiment effect is prolonged and sustained during the bubble, whereas it turns
into a reversal effect once the bubble collapses. Out-of-sample and portfolio
analyses are also conducted in this study. When measuring investor sentiment
for a new type of asset such as cryptocurrencies, we highlight that the impact
of investor sentiment on cryptocurrency returns is conditional on bubble
regimes.

Keywords:
Cryptocurrency; Sentiment; Bubble; Return Predictability

JEL Classification:
G02; G10; G12

IRTG1792DP2019 017

Portmanteau Test and Simultaneous Inference for Serial Covariances

Han Xiao
Wei Biao Wu

Abstract:
The paper presents a systematic theory for asymptotic inferences based on
autocovariances of stationary processes. We consider nonparametric tests for
serial correlations using the maximum and the quadratic deviations of sample
autocovariances. For these cases, with proper centering and rescaling, the
asymptotic distributions of the deviations are Gumbel and Gaussian,
respectively. To establish such an asymptotic theory, as byproducts, we develop
a normal comparison principle and propose a sufficient condition for the
summability of joint cumulants of stationary processes. We adapt the
blocks-of-blocks bootstrapping procedure proposed by Kuensch (1989) and Liu and
Singh (1992) to the maximum-deviation-based tests to improve the finite-sample
performance.
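
As a rough illustration of the two deviation statistics, here is a simplified
Box-Pierce-type construction in Python; the paper's exact centering and
rescaling are not reproduced, so this is a sketch of the idea only:

```python
import numpy as np

def sample_autocovariances(x, max_lag):
    """Biased sample autocovariances gamma_hat(k) for k = 1..max_lag."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    return np.array([np.dot(xc[:n - k], xc[k:]) / n
                     for k in range(1, max_lag + 1)])

def deviation_statistics(x, max_lag):
    """Maximum and quadratic deviations of scaled sample autocorrelations.

    The quadratic form is the Box-Pierce-type statistic
    n * sum_k rho_hat(k)^2; the maximum form is sqrt(n) * max_k |rho_hat(k)|.
    """
    n = len(x)
    rho = sample_autocovariances(x, max_lag) / np.var(x)
    max_dev = np.sqrt(n) * np.max(np.abs(rho))
    quad_dev = n * np.sum(rho ** 2)
    return max_dev, quad_dev
```

For i.i.d. data the quadratic form is approximately chi-squared with max_lag
degrees of freedom, the classical Box-Pierce calibration; the maximum form
leads to the extreme-value (Gumbel) regime discussed in the paper.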

Keywords:
Autocovariance, blocks of blocks bootstrapping, Box-Pierce test, extreme value
distribution, moderate deviation, normal comparison, physical dependence
measure, short range dependence, stationary process, summability of cumulants

JEL Classification:
C00

IRTG1792DP2019 018

Phenotypic convergence of cryptocurrencies

Daniel Traian Pele
Niels Wesselhöfft
Wolfgang K. Härdle
Michalis Kolossiatis
Yannis Yatracos

Abstract:
The aim of this paper is to demonstrate the phenotypic convergence of
cryptocurrencies, in the sense that individual cryptocurrencies respond to
similar selection pressures by developing similar characteristics. In order to
retrieve the cryptocurrencies' phenotype, we treat cryptocurrencies as financial
instruments (genus proximum) and find their specific difference (differentia
specifica) using daily time series of log-returns. In this sense, a daily time
series of asset returns (either cryptocurrencies or classical assets) can be
characterized by a multidimensional vector with statistical components such as
volatility, skewness, kurtosis, tail probability, quantiles, conditional tail
expectation or fractal dimension. Using dimension-reduction techniques (Factor
Analysis) and classification models (Binary Logistic Regression, Discriminant
Analysis, Support Vector Machines, K-means clustering, Variance Components
Split methods) on a representative sample of cryptocurrencies, stocks, exchange
rates and commodities, we are able to classify cryptocurrencies as a new asset
class with unique features in the tails of the log-returns distribution. The
main result of our paper is the complete separation of cryptocurrencies from
the other types of assets by the Maximum Variance Components Split method.
Moreover, we observe a divergent evolution of the cryptocurrency species,
compared to classical assets, mainly due to the tail behaviour of the
log-returns distribution. The codes used here are available via
www.quantlet.de.
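
The statistical "phenotype" vector described above can be sketched as follows;
this is a minimal numpy version, and the choice of components and tail levels
is illustrative rather than the paper's exact specification:

```python
import numpy as np

def phenotype(log_returns):
    """Statistical 'phenotype' of an asset from daily log-returns:
    volatility, skewness, excess kurtosis, 5% quantile, a 2-sigma tail
    probability and the 5% conditional tail expectation of losses."""
    r = np.asarray(log_returns, dtype=float)
    mu, sigma = r.mean(), r.std()
    z = (r - mu) / sigma
    q05 = np.quantile(r, 0.05)
    return {
        "volatility": sigma,
        "skewness": np.mean(z ** 3),
        "excess_kurtosis": np.mean(z ** 4) - 3.0,
        "quantile_5pct": q05,
        "tail_prob_2sigma": np.mean(r < mu - 2.0 * sigma),
        "cte_5pct": r[r <= q05].mean(),  # conditional tail expectation
    }
```

Feeding such vectors, one per asset, into factor analysis and the classifiers
listed above is what separates cryptocurrencies from classical assets in the
paper.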

Keywords:
cryptocurrency, genus proximum, differentia specifica, classification,
multivariate analysis, factor models, phenotypic convergence, divergent
evolution

JEL Classification:
C14, C22, C46, C53, G32

IRTG1792DP2019 019

Modelling Systemic Risk Using Neural Network Quantile Regression

Georg Keilbar
Weining Wang

Abstract:
We propose an approach to calibrate the conditional value-at-risk (CoVaR) of
financial institutions based on neural network quantile regression. Building on
the estimation results we model systemic risk spillover effects across banks by
considering the marginal effects of the quantile regression procedure. We adopt a
dropout regularization procedure to remedy the well-known issue of overfitting
for neural networks, and we provide empirical evidence for the favorable out-of-
sample performance of a regularized neural network. We then propose three
measures for systemic risk from our fitted results. We find that systemic risk
increases sharply during the height of the financial crisis in 2008 and again
after a short period of easing in 2011 and 2015. Our approach also allows us
to identify systemically relevant firms during the financial crisis.
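
Neural network quantile regression trains a network under the pinball (check)
loss. A minimal sketch of that loss follows, with a one-parameter fit that
recovers the empirical quantile; this is illustrative only, since the paper
fits a full network rather than a constant:

```python
import numpy as np

def pinball_loss(y, y_hat, tau):
    """Check (pinball) loss for quantile level tau in (0,1):
    an asymmetrically weighted absolute error."""
    u = y - y_hat
    return np.mean(np.maximum(tau * u, (tau - 1.0) * u))

def fit_constant_quantile(y, tau, lr=0.01, steps=5000):
    """Minimise the pinball loss over a single scalar by subgradient
    descent; the minimiser is the empirical tau-quantile."""
    q = float(np.mean(y))
    for _ in range(steps):
        grad = np.mean(np.where(y < q, 1.0 - tau, -tau))
        q -= lr * grad
    return q
```

Replacing the scalar by the output of a (dropout-regularized) network and the
subgradient step by backpropagation gives the calibration idea behind
CoVaR-type quantile estimates.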

Keywords:
Systemic risk, CoVaR, Quantile regression, Neural networks

JEL Classification:
C00

IRTG1792DP2019 020

Rise of the Machines? Intraday High-Frequency Trading Patterns of
Cryptocurrencies

Alla A. Petukhina
Raphael C. G. Reule
Wolfgang Karl Härdle

Abstract:
This research analyses high-frequency data of the cryptocurrency market with
regard to intraday trading patterns. We study trading quantities such as
returns, traded volumes and volatility periodicity, and provide summary
statistics of return correlations to CRIX (CRyptocurrency IndeX), as well as
overall high-frequency-based market statistics. Our results provide essential
insight into a market where, judging by media reports, the large-scale
employment of automated trading algorithms and the extremely rapid execution
of trades might seem to be the standard. Our findings on the intraday momentum
of trading patterns lead to a new view on approaching the predictability of
economic value in this new digital market.

Keywords:
Cryptocurrency, High-Frequency Trading, Algorithmic Trading, Liquidity,
Volatility, Price Impact, CRIX

JEL Classification:
G02, G11, G12, G14, G15, G23

IRTG1792DP2019 021

FRM Financial Risk Meter

Andrija Mihoci
Michael Althof
Cathy Yi-Hsuan Chen
Wolfgang Karl Härdle

Abstract:
A daily systemic risk measure is proposed accounting for links and mutual
dependencies between financial institutions utilising tail event information. FRM
(Financial Risk Meter) is based on Lasso quantile regression designed to capture
tail event co-movements. The FRM focus lies on understanding active set data
characteristics and the presentation of interdependencies in a network topology.
Two FRM indices are presented, namely, FRM@Americas and FRM@Europe. The FRM
indices detect systemic risk in selected areas and identify risk factors. In
practice, FRM is applied to the return time series of selected financial
institutions and macroeconomic risk factors. Using FRM on a daily basis, we
identify companies exhibiting extreme "co-stress", as well as "activators" of
stress. With the SRM@EuroArea, we extend to the government bond asset class. FRM
is a good predictor for recession probabilities, constituting the FRM-implied
recession probabilities. Thereby, FRM indicates tail event behaviour in a
network of financial risk factors.

Keywords:
Systemic Risk, Quantile Regression, Financial Markets, Risk Management, Network
Dynamics, Recession

JEL Classification:
C00

IRTG1792DP2019 022

A Machine Learning Approach Towards Startup Success Prediction

Cemre Ünal
Ioana Ceasu

Abstract:
The importance of startups for a dynamic, innovative and competitive economy has
already been acknowledged in the scientific and business literature. The highly
uncertain and volatile nature of the startup ecosystem makes the evaluation of
startup success through the analysis and interpretation of information very
time-consuming and computationally intensive. This prediction problem brings
forward the need for a quantitative model, which should enable an objective and
fact-based approach to startup success prediction. This paper presents a series
of reproducible models for startup success prediction, using machine learning
methods. The data used for this purpose were obtained from the online investor
platform crunchbase.com and pre-processed to address sampling bias and class
imbalance using the ADASYN oversampling approach. A total of six different
models are implemented to predict startup success. Using goodness-of-fit
measures applicable to each model case, the best models selected are the
ensemble methods, random forest and extreme gradient boosting, with test-set
prediction accuracies of 94.1% and 94.5% and AUCs of 92.22% and 92.91%,
respectively. The top variables in these models are last funding to date,
first funding lag and company age. The models presented in this study can be
used to predict the success rate of future new firms/ventures in a repeatable
way.
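
An ADASYN-style oversampling step can be sketched as below. This is a
simplified numpy rendition of the idea (harder minority points, those with
more majority neighbours, receive more synthetic copies, each interpolated
toward a minority neighbour); in practice one would use the ADASYN
implementation in imbalanced-learn:

```python
import numpy as np

def adasyn_like_oversample(X_min, X_maj, n_new, k=5, seed=0):
    """ADASYN-style oversampling sketch for a minority class X_min
    against a majority class X_maj."""
    rng = np.random.default_rng(seed)
    X_all = np.vstack([X_min, X_maj])
    n_min = len(X_min)

    # difficulty weight r_i = share of majority points among k neighbours
    d_all = np.linalg.norm(X_min[:, None, :] - X_all[None, :, :], axis=2)
    r = np.empty(n_min)
    for i in range(n_min):
        nn = np.argsort(d_all[i])[1:k + 1]   # skip the point itself
        r[i] = np.mean(nn >= n_min)          # neighbours from the majority
    w = r / r.sum() if r.sum() > 0 else np.full(n_min, 1.0 / n_min)

    # synthetic samples per minority point, interpolated toward neighbours
    counts = rng.multinomial(n_new, w)
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    out = []
    for i, c in enumerate(counts):
        nn = np.argsort(d_min[i])[1:k + 1]   # minority-only neighbours
        for _ in range(c):
            j = rng.choice(nn)
            out.append(X_min[i] + rng.random() * (X_min[j] - X_min[i]))
    return np.array(out)
```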

Keywords:
Machine learning

JEL Classification:
C00

IRTG1792DP2019 023

Can Deep Learning Predict Risky Retail Investors? A Case Study in Financial Risk
Behavior Forecasting

A. Kolesnikova
Y. Yang
S. Lessmann
T. Ma
M.-C. Sung
J.E.V. Johnson

Abstract:
The paper examines the potential of deep learning to produce decision support
models from structured, tabular data. Considering the context of financial risk
management, we develop a deep learning model for predicting whether individual
spread traders are likely to secure profits from future trades. This embodies
typical modeling challenges faced in risk and behavior forecasting. Conventional
machine learning requires data that is representative of the feature-target
relationship and relies on the often costly development, maintenance, and
revision of handcrafted features. Consequently, modeling highly variable,
heterogeneous patterns such as the behavior of traders is challenging. Deep
learning promises a remedy. Learning hierarchical distributed representations of
the raw data in an automatic manner (e.g. risk taking behavior), it uncovers
generative features that determine the target (e.g., trader’s profitability),
avoids manual feature engineering, and is more robust toward change (e.g.
dynamic market conditions). The results of employing a deep network for
operational risk forecasting confirm the feature learning capability of deep
learning, provide guidance on designing a suitable network architecture and
demonstrate the superiority of deep learning over conventional machine
learning and rule-based benchmarks.

Keywords:
risk management, retail finance, forecasting, deep learning

JEL Classification:
C00

IRTG1792DP2019 024

Risk of Bitcoin Market: Volatility, Jumps, and Forecasts

Junjie Hu
Weiyu Kuo
Wolfgang Karl Härdle

Abstract:
Among all the emerging markets, the cryptocurrency market is considered the most
controversial and simultaneously the most interesting one. The visibly
significant market capitalization of cryptos motivates modern financial
instruments such as futures and options. Those will depend on the dynamics,
volatility, or even the jumps of cryptos. In this paper, the risk
characteristics for Bitcoin are analyzed from a realized volatility dynamics
view. The realized variance RV is estimated with (threshold-)jump components
(T)J, semivariance RSV+(−), and signed jumps (T)J+(−). Our empirical results
show that the Bitcoin market is far riskier than any developed financial
market. Up to 68% of the sample days are identified as containing jumps.
However, the discontinuities do not contribute significantly to the variance.
Employing a 90-day rolling-window method, the in-sample evidence suggests that
the impacts of the predictors change systematically over time under HAR-type
models. The out-of-sample forecasting results reveal that the forecasting
horizon plays an important role in choosing forecasting models. For
long-horizon risk forecasts, a finer model calibrated with jumps gives up to
20 basis points of extra utility annually, while an approach based on the
roughest estimators suits short-horizon risk forecasts best. Last but not
least, a simple equal-weighted portfolio not only significantly reduces the
size and number of jumps but also gives investors higher utility in the
short-horizon case.
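
The realized measures named above can be sketched as follows; this uses a
plain bipower-variation jump split and does not reproduce the paper's
thresholded variants (T)J:

```python
import numpy as np

def realized_measures(intraday_returns):
    """Realized variance, bipower variation, a nonnegative jump
    component and signed semivariances from one day of intraday
    log-returns."""
    r = np.asarray(intraday_returns, dtype=float)
    rv = np.sum(r ** 2)                          # realized variance
    # bipower variation is robust to jumps (uses mu_1 = sqrt(2/pi))
    bpv = (np.pi / 2.0) * np.sum(np.abs(r[1:]) * np.abs(r[:-1]))
    jump = max(rv - bpv, 0.0)                    # jump contribution
    rsv_pos = np.sum(r[r > 0] ** 2)              # upside semivariance
    rsv_neg = np.sum(r[r < 0] ** 2)              # downside semivariance
    return {"RV": rv, "BPV": bpv, "J": jump,
            "RSV+": rsv_pos, "RSV-": rsv_neg, "SJ": rsv_pos - rsv_neg}
```

Feeding daily RV, J and the signed semivariances into HAR-type regressions
gives the kind of forecasting models compared in the paper.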

Keywords:
Cryptocurrency, Bitcoin, Realized Variance, Thresholded Jump, Signed Jumps,
Realized Utility

JEL Classification:
C53, E47, G11, G17

IRTG1792DP2019 025

SONIC: SOcial Network with Influencers and Communities

Cathy Yi-Hsuan Chen
Wolfgang Karl Härdle
Yegor Klochkov

Abstract:
The integration of social media characteristics into an econometric framework
requires modeling a high dimensional dynamic network with dimensions of
parameter Θ typically much larger than the number of observations. To cope with
this problem, we introduce a new structural model, SONIC, which assumes that
(1) a few influencers drive the network dynamics; (2) the community structure of
the network is characterized as the homogeneity of response to the specific
influencer, implying their underlying similarity. An estimation procedure is
proposed based on a greedy algorithm and LASSO regularization. Through
theoretical study and simulations, we show that the matrix parameter can be
estimated even when the observed time interval is smaller than the size of the
network. Using a novel dataset retrieved from StockTwits, a leading social
media platform, and quantifying user opinions via natural language processing,
we model the opinion network dynamics among a select group of users and
further detect the latent communities. With sparsity regularization, we can
identify important nodes in the network.
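
The LASSO step can be illustrated with plain coordinate descent; this is a
generic sketch of l1-regularized estimation, not the SONIC estimation
procedure itself:

```python
import numpy as np

def soft_threshold(z, lam):
    """Proximal operator of the l1 penalty, the core of Lasso updates."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Plain coordinate-descent Lasso:
    minimise (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = np.sum(X ** 2, axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ b + X[:, j] * b[j]     # partial residual
            b[j] = soft_threshold(X[:, j] @ r_j / n, lam) / col_sq[j]
    return b
```

The exact zeros produced by the soft-thresholding step are what make sparse
network estimates, and hence the identification of important nodes, possible.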

Keywords:
social media, network, community, opinion mining, natural language processing

JEL Classification:
C1, C22, C51, G41

IRTG1792DP2019 026

Affordable Uplift: Supervised Randomization in Controlled Experiments

Johannes Haupt
Daniel Jacob
Robin M. Gubela
Stefan Lessmann

Abstract:
Customer scoring models are the core of scalable direct marketing. Uplift models
provide an estimate of the incremental benefit from a treatment that is used for
operational decision-making. Training and monitoring of uplift models require
experimental data. However, the collection of data under randomized treatment
assignment is costly, since random targeting deviates from an established
targeting policy. To increase the cost-efficiency of experimentation and
facilitate frequent data collection and model training, we introduce supervised
randomization. It is a novel approach that integrates existing scoring models
into randomized trials to target relevant customers, while ensuring consistent
estimates of treatment effects through correction for active sample selection.
An empirical Monte Carlo study shows that data collection under supervised
randomization is cost-efficient, while downstream uplift models perform
competitively.
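
Correcting for active, score-driven sample selection amounts to weighting each
observation by the inverse of its known assignment probability. A toy sketch
follows; the variable names and the toy data-generating process are
illustrative assumptions, not the paper's simulation design:

```python
import numpy as np

def ipw_ate(y, t, e):
    """Inverse-probability-weighted ATE estimate: corrects for the
    non-uniform treatment probabilities e_i that supervised
    randomization induces."""
    y, t, e = map(np.asarray, (y, t, e))
    return np.mean(t * y / e - (1 - t) * y / (1 - e))

# toy experiment: a 'scoring model' makes high-score customers more
# likely to be treated, yet IPW still recovers the true effect of 1.0
rng = np.random.default_rng(7)
n = 200_000
score = rng.random(n)
e = 0.2 + 0.6 * score                        # known assignment probabilities
t = rng.random(n) < e
y = 2.0 * score + rng.standard_normal(n) + 1.0 * t
ate_hat = ipw_ate(y, t, e)
```

A naive treated-minus-control mean is biased upward here, because treated
customers have systematically higher scores; the IPW estimate is not.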

Keywords:
Uplift Modeling, Causal Inference, Experimental Design, Selection Bias

JEL Classification:
C00

IRTG1792DP2019 027

VCRIX - a volatility index for crypto-currencies

Alisa Kim
Simon Trimborn
Wolfgang Karl Härdle

Abstract:
Public interest, explosive returns, and diversification opportunities have
spurred the adoption of traditional financial tools for crypto-currencies.
While the CRIX index offered the first scientifically-backed proxy to the
crypto-market (analogous to the S&P 500), the introduction of Bitcoin futures
by Cboe became the milestone in the creation of the derivatives market for
crypto-currencies. Following the intuition of the "fear index" VIX for the
American stock market, the VCRIX volatility index was created to capture
investor expectations about the crypto-currency ecosystem. VCRIX is built on
CRIX and offers a forecast for the mean annualized volatility of the next 30
days, re-estimated daily. The model was back-tested for its forecasting power,
resulting in low MSE performance, and further examined by a simulation of VIX
(resulting in a correlation of 78% between the actual VIX and the VIX
estimated with the VCRIX model). VCRIX provides forecasting functionality and
serves as a proxy for investors' expectations in the absence of a developed
derivatives market. These features provide enhanced decision-making capacities
for market monitoring, trading strategies, and potentially option pricing.
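
A crude stand-in for the target quantity, a 30-day mean annualized volatility
re-estimated daily, can be sketched as follows; VCRIX itself is model-based,
so this is only the naive rolling benchmark such an index would be compared
against:

```python
import numpy as np

def rolling_annualized_vol(log_returns, window=30, trading_days=365):
    """Rolling annualized volatility over `window` days, re-estimated
    daily (crypto markets trade every day, hence 365)."""
    r = np.asarray(log_returns, dtype=float)
    out = np.full(len(r), np.nan)
    for i in range(window - 1, len(r)):
        out[i] = r[i - window + 1:i + 1].std() * np.sqrt(trading_days)
    return out
```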

Keywords:
index construction, volatility, crypto-currency, VCRIX

JEL Classification:
C51, C52, C53, G10

IRTG1792DP2019 028

Group Average Treatment Effects for Observational Studies

Daniel Jacob
Stefan Lessmann
Wolfgang Karl Härdle

Abstract:
The paper proposes an estimator to make inference on key features of
heterogeneous treatment effects sorted by impact groups (GATES) for non-
randomised experiments. Observational studies are standard in policy evaluation
from labour markets, educational surveys and other empirical studies. To control
for a potential selection-bias we implement a doubly-robust estimator in the
first stage. Keeping the flexibility to use any machine learning method to learn
the conditional mean functions as well as the propensity score we also use
machine learning methods to learn a function for the conditional average
treatment effect. The group average treatment effect is then estimated via a
parametric linear model to provide p-values and confidence intervals. The result
is a best linear predictor for effect heterogeneity based on impact groups.
Cross-splitting and averaging for each observation is a further extension to
avoid biases introduced through sample splitting. The advantage of the proposed
method is a robust estimation of heterogeneous group treatment effects under
mild assumptions, which is comparable with other models and thus keeps its
flexibility in the choice of machine learning methods. At the same time, its
ability to deliver interpretable results is ensured.
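
The doubly-robust first stage can be illustrated with AIPW pseudo-outcomes.
This is a generic sketch of double robustness, not the paper's full GATES
pipeline; the toy data-generating process below is an assumption for
demonstration:

```python
import numpy as np

def aipw_scores(y, t, e_hat, mu0_hat, mu1_hat):
    """Doubly-robust (AIPW) pseudo-outcomes: their mean estimates the
    ATE consistently if either the outcome models mu0/mu1 or the
    propensity model e_hat is correctly specified."""
    y, t = np.asarray(y, float), np.asarray(t, float)
    return (mu1_hat - mu0_hat
            + t * (y - mu1_hat) / e_hat
            - (1 - t) * (y - mu0_hat) / (1 - e_hat))

# toy check of double robustness: the true ATE is 2.0
rng = np.random.default_rng(3)
n = 200_000
x = rng.random(n)
e = 0.3 + 0.4 * x                            # true propensity
t = (rng.random(n) < e).astype(float)
y = x + 2.0 * t + 0.5 * rng.standard_normal(n)

# (a) correct propensity, useless (zero) outcome models
ate_a = aipw_scores(y, t, e, np.zeros(n), np.zeros(n)).mean()
# (b) correct outcome models, wrong constant propensity
ate_b = aipw_scores(y, t, np.full(n, 0.5), x, x + 2.0).mean()
```

In the GATES setting, such pseudo-outcomes feed the parametric linear model
that delivers the group-level p-values and confidence intervals.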

Keywords:
causal inference, machine learning, simulation study, confidence intervals,
multiple splitting, sorted group ATE (GATES), doubly-robust estimator

JEL Classification:
C01, C14, C31, C63

IRTG1792DP2019 028-1

Group Average Treatment Effects for Observational Studies

Daniel Jacob
Wolfgang Karl Härdle
Stefan Lessmann

 

Abstract:
The paper proposes an estimator to make inference on heterogeneous treatment
effects sorted by impact groups (GATES) for non-randomised experiments.
Observational studies are standard in policy evaluation from labour markets,
educational surveys and other empirical studies. To control for a potential
selection bias we implement a doubly-robust estimator in the first stage.
Keeping the flexibility, we can use any machine learning method to learn the
conditional mean functions as well as the propensity score. We also use
machine learning methods to learn a function for the conditional average
treatment effect. The group average treatment effect is then estimated via a
parametric linear model to provide p-values and confidence intervals. To
control for confounding in the linear model we use Neyman-orthogonal moments
to partial out the effect that covariates have on both the treatment
assignment and the outcome. The result is a best linear predictor for effect
heterogeneity based on impact groups. We introduce inclusion-probability
weighting as a form of cross-splitting and averaging for each observation to
avoid biases through sample splitting. The advantage of the proposed method is
a robust linear estimation of heterogeneous group treatment effects in
observational studies.

Keywords:
causal inference, machine learning, simulation study, confidence intervals,
multiple splitting, sorted group ATE (GATES), doubly-robust estimator

JEL Classification:
C01, C14, C31, C63

IRTG1792DP2019 029

Antisocial Online Behavior Detection Using Deep Learning

Elizaveta Zinovyeva
Wolfgang Karl Härdle
Stefan Lessmann

Abstract:
The shift of human communication to online platforms brings many benefits to
society due to the ease of publication of opinions, sharing experience, getting
immediate feedback and the opportunity to discuss the hottest topics. Besides
that, it builds up a space for antisocial behavior such as harassment, insult
and hate speech. This research is dedicated to the detection of antisocial
online behavior (AOB) - an umbrella term for cyberbullying, hate speech,
cyberaggression and the use of any hateful textual content. First, we provide a
benchmark of deep learning models found in the literature on AOB detection. Deep
learning has already proved to be efficient in different types of decision
support: decision support from financial disclosures, predicting process
behavior, text-based emoticon recognition. We compare methods of traditional
machine learning with deep learning, while applying important advancements of
natural language processing: we examine bidirectional encoding, compare
attention mechanisms with simpler reduction techniques, and investigate whether
the hierarchical representation of the data and application of attention on
different layers might improve the predictive performance. As a partial
contribution of the final hierarchical part, we introduce pseudo-sentence
hierarchical attention network, an extension of hierarchical attention network –
a recent advancement in document classification.

Keywords:
Deep Learning, Cyberbullying, Antisocial Online Behavior, Attention Mechanism,
Text Classification

JEL Classification:
C00

IRTG1792DP2019 030

Combining Penalization and Adaption in High Dimension with Application in Bond
Risk Premia Forecasting

Xinjue Li
Lenka Zboňáková
Weining Wang
Wolfgang Karl Härdle

Abstract:
The predictability of a high-dimensional time series model in forecasting with
large information sets depends not only on the stability of the parameters but
also, heavily, on the active covariates in the model. Since the true empirical
environment can change over time, variables that function well at present may
become useless in the future. Combined with unstable parameters, finding the
most active covariates when parameters are time-varying becomes difficult. In
this paper, we propose a new method, the Penalized Adaptive Method (PAM),
which can adaptively detect parameter-homogeneous intervals and simultaneously
select the active variables in sparse models. The newly developed method is
able to identify parameter stability on the one hand and, on the other hand,
to select the active forecasting covariates at every time point. Compared with
classical models, the method can be applied to high-dimensional cases with
different sources of parameter change, while it steadily reduces the forecast
error in high-dimensional data. In out-of-sample bond risk premia forecasting,
the Penalized Adaptive Method reduces the forecasting errors (RMSPE and MAPE)
by around 24% to 50% compared with the other forecasting methods.
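
The SCAD penalty underlying PAM (Fan and Li, 2001) can be written down
directly; a = 3.7 is the conventional default:

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty of Fan & Li (2001): acts like the Lasso near zero
    but levels off, so large coefficients are not over-shrunk."""
    t = np.abs(np.asarray(theta, dtype=float))
    return np.where(
        t <= lam,
        lam * t,                                        # Lasso-like part
        np.where(
            t <= a * lam,                               # quadratic transition
            (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1)),
            lam ** 2 * (a + 1) / 2,                     # constant for large t
        ),
    )
```

The flat region for |theta| > a*lam is what distinguishes SCAD from the plain
l1 penalty and reduces the bias on large, genuinely active coefficients.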

Keywords:
SCAD penalty, propagation-separation, adaptive window choice, multiplier
bootstrap, bond risk premia

JEL Classification:
C4, C5, E4, G1

IRTG1792DP2020 001

Estimation and Determinants of Chinese Banks’ Total Factor Efficiency: A New
Vision Based on Unbalanced Development of Chinese Banks and Their Overall Risk

Shiyi Chen
Wolfgang K. Härdle
Li Wang

Abstract:
The paper estimates banks’ total factor efficiency (TFE), as well as the TFE
of each production factor, by incorporating banks’ overall risk endogenously
into the banks’ production process as an undesirable by-product in a
Global-SMB model. Our results show that, compared with a model incorporating
banks’ overall risk, a model considering only on-balance-sheet risk may
over-estimate the integrated TFE (TFIE) and under-estimate TFE volatility.
Significant heterogeneities in bank TFIE and in the TFE of each production
factor exist among banks of different types and regions, as a result of the
still-prominent unbalanced development of Chinese commercial banks. Based on
the estimated TFIE, the paper further investigates
commercial banks. Based on the estimated TFIE, the paper further investigates
the determinants of bank efficiency, and finds that shadow banking, bank size,
NPL ratio, loan to deposit ratio, fiscal surplus to GDP ratio and banking sector
concentration are significant determinants of bank efficiency. Besides, a model
with risk-weighted assets as undesirable outputs can better capture the impact
of shadow banking involvement.

Keywords:
Nonparametric Methods, Commercial Banks, Shadow Bank, Financial Risk

JEL Classification:
C00

IRTG1792DP2020 002

Service Data Analytics and Business Intelligence

Desheng Dang Wu
Wolfgang Karl Härdle

Abstract:
With growing economic globalization, the modern service sector is in great need
of business intelligence for data analytics and computational statistics. The
joint application of big data analytics, computational statistics and business
intelligence has great potential to make the engineering of advanced service
systems more efficient. The purpose of this COST issue is to publish high-
quality research papers (including reviews) that address the challenges of
service data analytics with business intelligence in the face of uncertainty and
risk. High quality contributions that are not yet published or that are not
under review by other journals or peer-reviewed conferences have been collected.
The resulting topic oriented special issue includes research on business
intelligence and computational statistics, data-driven financial engineering,
service data analytics and algorithms for optimizing the business engineering.
It also covers implementation issues of managing the service process,
computational statistics for risk analysis and novel theoretical and
computational models, data mining algorithms for risk management related
business applications.

Keywords:
Data Analytics, Business Intelligence Systems

JEL Classification:
C00

IRTG1792DP2020 003

Structured climate financing: valuation of CDOs on inhomogeneous asset pools

Natalie Packham

Abstract:
Recently, a number of structured funds have emerged as public-private
partnerships with the intent of promoting investment in renewable energy in
emerging markets. These funds seek to attract institutional investors by
tranching the asset pool and issuing senior notes with a high credit quality.
Financing of renewable energy (RE) projects is achieved via two channels: small
RE projects are financed indirectly through local banks that draw loans from the
fund’s assets, whereas large RE projects are directly financed from the fund. In
a bottom-up Gaussian copula framework, we examine the diversification properties
and RE exposure of the senior tranche. To this end, we introduce the LH++ model,
which combines a homogeneous infinitely granular loan portfolio with a finite
number of large loans. Using expected tranche percentage notional (which takes a
similar role as the default probability of a loan), tranche prices and tranche
sensitivities in RE loans, we analyse the risk profile of the senior tranche. We
show how the mix of indirect and direct RE investments in the asset pool affects
the sensitivity of the senior tranche to RE investments and how to balance a
desired sensitivity with a target credit quality and target tranche size.
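
The LH++ model builds on the homogeneous, infinitely granular (LHP) limit of
the one-factor Gaussian copula. A sketch of the expected tranche loss in that
limit follows; the finite number of large loans that LH++ adds on top is
omitted here:

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def norm_ppf(p, lo=-10.0, hi=10.0):
    """Inverse normal CDF by bisection (adequate for a sketch)."""
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def expected_tranche_loss(p, rho, attach, detach, n_grid=2001):
    """Expected percentage loss of the [attach, detach] tranche on an
    infinitely granular pool in a one-factor Gaussian copula (LHP):
    integrate the conditional pool loss over the factor M ~ N(0,1)."""
    c = norm_ppf(p)
    m = np.linspace(-8.0, 8.0, n_grid)
    dm = m[1] - m[0]
    phi = np.exp(-0.5 * m ** 2) / np.sqrt(2.0 * np.pi)
    cond = np.array([norm_cdf((c - sqrt(rho) * mi) / sqrt(1.0 - rho))
                     for mi in m])               # conditional pool loss
    tranche = np.clip(cond - attach, 0.0, detach - attach) / (detach - attach)
    return float(np.sum(tranche * phi) * dm)
```

One minus the expected tranche loss plays the role of the expected tranche
percentage notional used in the paper, and the senior tranche's low expected
loss relative to the equity tranche is the credit-enhancement effect that
attracts institutional investors.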

Keywords:
Renewable energy financing, structured finance, CDO pricing, LH++ model

JEL Classification:
C61, G13, G32