Humboldt-Universität zu Berlin - High Dimensional Nonstationary Time Series

IRTG1792DP2020 007

Deep Learning application for fraud detection in financial statements

Patricia Craja
Alisa Kim
Stefan Lessmann

Abstract:
Financial statement fraud is a matter of serious concern for potential
investors, audit firms, and state regulators. Intelligent systems facilitate the
detection of financial statement fraud and support the decision-making of
relevant stakeholders. Previous research has identified instances in which
financial statements were fraudulently misrepresented in managerial comments.
This paper investigates whether an enhanced system for detecting financial fraud
can be developed by combining information from financial ratios and managerial
comments within corporate annual reports. We employ a hierarchical attention
network (HAN) with a long short-term memory (LSTM) encoder to extract text
features from the Management Discussion and Analysis (MD&A) section of annual
reports. The model has two distinctive features. First, it reflects the
hierarchical structure of documents, in which words form sentences and sentences
form documents; previous models were unable to capture this structure. Second,
it employs two attention mechanisms, one at the word level and one at the
sentence level, which weight content by its importance when constructing the
document representation. Owing to this architecture, the model captures both the
content and the context of managerial comments, which serve as predictors
supplementing financial ratios in the detection of fraudulent reporting. In
addition, the model provides interpretable indicators, termed “red-flag”
sentences, which help stakeholders decide whether a specific annual report
warrants further investigation. Empirical results demonstrate that textual
features extracted from MD&A sections by the HAN yield promising classification
results and substantially reinforce the predictive value of financial ratios.
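The two-level attention described in the abstract can be illustrated with a minimal, dependency-free sketch. It assumes the standard additive attention formulation commonly used in hierarchical attention networks, omits the LSTM encoder (the toy 2-d vectors stand in for its hidden states), and uses hand-picked rather than trained parameters. The function names (`attention_pool`, `document_vector`) are hypothetical. Ranking sentences by their sentence-level weight is only one illustrative way to surface “red-flag” candidates, not necessarily the procedure used in the paper.

```python
# Hypothetical sketch: standard HAN-style additive attention, LSTM encoder omitted,
# parameters hand-picked for illustration (not trained, not the authors' code).
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Sequence[float]) -> Vector:
    """Multiply matrix W (given as a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    hidden: Sequence[Vector], W: Matrix, b: Vector, context: Vector
) -> Tuple[Vector, Vector]:
    """Additive attention: u_t = tanh(W h_t + b), alpha = softmax(u_t . context).

    Returns the attention-weighted sum of the hidden vectors and the weights.
    """
    scores = []
    for h in hidden:
        u = [math.tanh(z + bi) for z, bi in zip(matvec(W, h), b)]
        scores.append(sum(ui * ci for ui, ci in zip(u, context)))
    weights = softmax(scores)
    dim = len(hidden[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, hidden)) for d in range(dim)]
    return pooled, weights


def document_vector(
    doc: Sequence[Sequence[Vector]],
    word_params: Tuple[Matrix, Vector, Vector],
    sent_params: Tuple[Matrix, Vector, Vector],
) -> Tuple[Vector, Vector]:
    """Word-level attention builds one vector per sentence; sentence-level
    attention pools those into a document vector.

    Returns (document vector, sentence-level attention weights).
    """
    sentence_vectors = [attention_pool(words, *word_params)[0] for words in doc]
    return attention_pool(sentence_vectors, *sent_params)


# Toy input: 3 sentences of 2-d vectors standing in for LSTM hidden states.
doc = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.5]],
    [[-1.0, 0.0], [0.0, -1.0], [0.5, 0.5]],
]
identity = [[1.0, 0.0], [0.0, 1.0]]
zero_bias = [0.0, 0.0]
doc_vec, sent_weights = document_vector(
    doc,
    word_params=(identity, zero_bias, [1.0, 0.0]),
    sent_params=(identity, zero_bias, [1.0, 1.0]),
)
# Rank sentence indices by attention weight as candidate "red-flag" sentences.
red_flags = sorted(range(len(doc)), key=lambda i: sent_weights[i], reverse=True)
print(red_flags, [round(w, 3) for w in sent_weights])
```

Because the sentence-level weights form a probability distribution over sentences, they can be read as relative importance scores. In a trained HAN, `W`, `b`, and the two context vectors would typically be learned jointly with the classification objective rather than fixed by hand.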

Keywords:
fraud detection, financial statements, deep learning, text analytics

JEL Classification:
C00