Humboldt-Universität zu Berlin - Statistics

Prof. Greven and Prof. Rügamer received DFG funding for “Statistical inference for Bayesian semi-structured regression models”



Prof. Greven (HU Berlin) and Prof. Rügamer (LMU) have received DFG funding for the project “Statistical inference for Bayesian semi-structured regression models”.

Project abstract

This project aims to establish principled statistical inference for Bayesian semi-structured regression (SSR) models. SSR models extend classical regression by combining structured components, such as linear, smooth, or random effects, with flexible deep neural networks that capture complex interactions or the effects of non-tabular data sources such as text or images. While this combination enables powerful modeling, it poses major challenges for statistical inference: the posterior distribution is high-dimensional and often multimodal, and inference for the structured parameters may be biased by the deep neural network part.

Building on recent advances in sampling-based inference for Bayesian neural networks (BNNs), we will develop novel procedures for quantifying the statistical relevance of deep network predictors in the model. In addition, we will devise new approaches for assessing the significance of structured components in the SSR model in the presence of the neural network components, while accounting for potential bias in the sampling procedure. To make the methods applicable to large-scale datasets, we will further develop an unbiased sampling procedure that remains valid even in settings that require splitting the dataset into smaller batches. These advances will allow us to establish a framework that provides interpretable and scalable uncertainty quantification for SSR models, and to validate its efficacy in a functional regression setting based on a large-scale biomechanical dataset. We will provide an open-source software implementation of all developments with comprehensive documentation, enabling dissemination of our methods to the wider research community and fostering their application in both statistics and machine learning.