
Humboldt-Universität zu Berlin


Some thoughts on writing a master's thesis with us…

Research at the Chair of Information Systems (CIS) focuses on big data analytics (BDA). Approaches that support managerial decision-making and quantitative, data-driven methods are of particular interest. This scope offers a variety of research questions to be examined in a master's thesis, examples of which you can find here. Hence, students with skills in, e.g., scientific computing, databases, programming, econometrics and statistics, or machine learning are welcome to write their master's thesis at the CIS.

Potential business fields to be studied include, but are not limited to, marketing, consumer finance and risk management, logistics and supply chain management, as well as speculative and betting markets. As a general guideline, a master's thesis written at the CIS should contain an empirical part related to some real-world planning problem in one of the above areas. For example, evaluating the effectiveness of a novel BDA method in a real-world setting through empirical experimentation is a common task for a master's thesis. Such an endeavor may also include the development of an entirely new methodology or the extension of a known technique to better fulfill the requirements of the application setting in question.


There are several possibilities to acquire the data for such a study. For example:

  • The master’s thesis is written in cooperation with a partner who provides data
  • The data is collected during the work on the master's thesis (e.g., by accessing a data provider's API, such as Twitter's, or by web scraping)
  • Data from an academic data mining/forecasting competition (KDD Cup, Data Mining Cup, NN3 or NN5 Competition, etc.) or a Kaggle competition (www.kaggle.com) can be used
  • Data from ongoing research projects of the CIS can be used. At the moment, projects/data sets from the following areas are available:
      • Direct marketing / churn modeling
      • Automotive industry / sales of pre-owned vehicles
      • Real-time targeting in e-commerce
      • Credit scoring or PD modeling
      • Betting markets (esp. sports betting)


If required, data from the following fields may be obtainable (not guaranteed):

  • Data about energy consumption for electrical load forecasting
  • Data from peer-to-peer lending


In addition to the data, a crucial issue is to clarify the research question(s) of the thesis and to define its research goal. There are multiple ways to achieve this. In general, the topic is somewhat flexible and can be adapted to students' interests. Thinking about the type of data one would like to work with can help students identify a suitable thesis topic. Standard analytical methods work with structured data (i.e., tables, as used in a regression model). In addition, there is the special case of time series data (e.g., sales forecasts, stock predictions, etc.). Furthermore, one can analyze unstructured data such as textual data (e.g., sentiment analysis) and networked data (e.g., social media analytics).


In a similar manner, thinking about the type of methodology one would like to work with helps with finding a thesis topic. Empirical prediction methods are at the core of the research at the CIS. For example, the following techniques are still sparsely used in business settings and thus interesting:

  • Ensemble selection (e.g. for marketing or credit scoring)
  • Kalman filter (e.g. for time series data or real-time targeting of advertising in e-commerce)
  • Multi-armed bandit models (e.g. for real-time targeting of advertising in e-commerce)
  • Deep learning (basically applicable everywhere)
  • Survival models (e.g. for price optimization in the automotive industry)
  • Choice models / hierarchical Bayesian models (e.g. for real-time targeting of advertising in e-commerce or betting markets)
  • (Recurrent) neural networks (e.g. for turnover or financial market predictions, or in interaction with metaheuristics for model training or model selection)
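
To give a feel for one of the techniques listed above, the following is a minimal sketch of a multi-armed bandit for ad selection, using the simple epsilon-greedy strategy. It is an illustration only, not a method prescribed by the CIS: the function name, the click rates, and all parameter values are hypothetical, and the clicks are simulated rather than taken from real data.

```python
import random


def epsilon_greedy(click_rates, rounds=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy ad selection.

    click_rates: hypothetical true click probabilities per ad
                 (assumed for the simulation, unknown to the learner).
    Returns (number of times each ad was shown, estimated click rate per ad).
    """
    rng = random.Random(seed)
    n_ads = len(click_rates)
    pulls = [0] * n_ads
    estimates = [0.0] * n_ads

    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: show a randomly chosen ad.
            ad = rng.randrange(n_ads)
        else:
            # Exploit: show the ad with the highest estimated click rate.
            ad = max(range(n_ads), key=lambda i: estimates[i])

        # Simulated user feedback: 1.0 for a click, 0.0 otherwise.
        reward = 1.0 if rng.random() < click_rates[ad] else 0.0

        # Update the running mean of observed rewards for the shown ad.
        pulls[ad] += 1
        estimates[ad] += (reward - estimates[ad]) / pulls[ad]

    return pulls, estimates


if __name__ == "__main__":
    pulls, estimates = epsilon_greedy([0.02, 0.05, 0.20])
    print("times shown:", pulls)
    print("estimated click rates:", [round(e, 3) for e in estimates])
```

A thesis would typically replace the simulated clicks with logged or live feedback and compare such a baseline against more refined strategies.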


Furthermore, there is a large body of literature on novel learning paradigms, which differ substantially from conventional explanatory (e.g. regression or classification) or descriptive (e.g. clustering) methods. Evaluating such modelling strategies in business settings is also an interesting task for a master's thesis. Some examples include:

  • Active learning
  • Learning with privileged information (e.g. for financial forecasting)
  • Semi-supervised / transductive learning (e.g. for churn prediction)
  • Imbalanced learning (e.g. for marketing or credit scoring)
  • Multi-task learning (e.g. for financial forecasting)
  • Reject inference (credit scoring)
  • Online learning


Hopefully, the previous exposition has conveyed a general notion of master's theses at the CIS. In addition to the above topics and suggestions, it is always a good idea to look at our ongoing research projects to identify further opportunities for thesis topics.

As soon as you have clarified your preferences (industry, business application, data, method), feel free to contact a member of the CIS to proceed with preparing for your master's thesis.

Note that it is in principle also possible to write a master's thesis outside of our core research area in BDA. If you are interested in such a setting, please prepare an exposé that describes your topic and send it to Prof. Lessmann. More specifically, your exposé should clarify:

  • What research question(s) you would like to analyze
  • Why the topic is academically and empirically important
  • How your thesis contributes to existing literature

To elaborate on the above points, it is useful to reference relevant literature in your exposé.


Please find the FAQ on Bachelor's and Master's theses at the Chair of Information Systems here.