SfoEpi Seminar Series in Biostatistics: Professor Stijn Vansteelandt
Title: Causal machine learning: challenges, solutions and improvements
Speaker: Professor Stijn Vansteelandt, Department of Applied Mathematics, Computer Science and Statistics, Ghent University, and Department of Medical Statistics, London School of Hygiene and Tropical Medicine.
Abstract: The evaluation of treatment effects from observational studies typically requires adjustment for high-dimensional confounding. This need arises because treated and untreated subjects may differ in possibly many (pre-treatment) factors that are also related to the outcome. While such adjustment is routinely achieved via parametric modelling, this is not entirely satisfactory: model misspecification is likely, and even relatively minor misspecifications over the observed data range may induce large bias in the treatment effect estimate. Over the past two decades, there has therefore been growing interest in using machine learning methods to assist with this task. This is not surprising given the enormous contributions the machine learning literature has made to predicting outcomes from possibly high-dimensional predictors or features. In this talk, I will therefore focus on the use of machine learning for the evaluation of (causal) treatment effects. This turns out to be a challenging task: while the prediction performance of a given machine learning algorithm can be measured by contrasting observed and predicted outcomes, such evaluation becomes impossible when machine learning is used for treatment effect estimation, since the ‘true’ treatment effect is always unknown. I will demonstrate that naive use of existing machine learning algorithms is problematic for treatment evaluation and explain why. I will then give a gentle introduction to pioneering work on Targeted Learning and on Double Machine Learning, and discuss recent improvements to these techniques. Throughout the talk, ‘machine learning’ will be understood in the broad sense of any algorithm that uses data to ‘learn’ a proper model for the data, thus including (though not limited to) routine variable selection procedures.
The talk will be accessible to attendees without a detailed understanding of machine learning algorithms.
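The following sketch illustrates two points from the abstract: a misspecified parametric adjustment can badly bias a treatment effect estimate, and a cross-fitted "partialling-out" estimator, one standard form of Double Machine Learning for the partially linear model Y = θA + g(L) + ε, can avoid that bias. It is not the speaker's method and does not cover Targeted Learning. Everything in it is a hypothetical assumption: the simulated design (a single confounder L, a continuous exposure A for simplicity, true θ = 1.0), the use of simple binned means as a stand-in "machine learning" learner, and the helper names `simulate`, `binned_mean_fit` and `dml_partialling_out`.

```python
import random
import statistics


def simulate(n, theta, rng):
    """Hypothetical design: A = m(L) + v, Y = theta*A + g(L) + eps.

    Both m and g contain an L**2 term, so adjusting for L linearly is misspecified.
    """
    data = []
    for _ in range(n):
        l = rng.uniform(-2.0, 2.0)
        a = 0.5 * l + l ** 2 + rng.gauss(0.0, 1.0)
        y = theta * a + l + 2.0 * l ** 2 + rng.gauss(0.0, 1.0)
        data.append((l, a, y))
    return data


def slope(x, y):
    """OLS slope of y on x, with an intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def linear_residuals(x, z):
    """Residuals of x after a straight-line regression on z."""
    b = slope(z, x)
    a = statistics.fmean(x) - b * statistics.fmean(z)
    return [xi - (a + b * zi) for xi, zi in zip(x, z)]


def binned_mean_fit(l_train, v_train, n_bins=40, lo=-2.0, hi=2.0):
    """Stand-in nonparametric learner: mean of v within equal-width bins of L."""
    width = (hi - lo) / n_bins

    def bin_of(l):
        return min(max(int((l - lo) / width), 0), n_bins - 1)

    sums, counts = [0.0] * n_bins, [0] * n_bins
    for l, v in zip(l_train, v_train):
        sums[bin_of(l)] += v
        counts[bin_of(l)] += 1
    overall = statistics.fmean(v_train)
    # An empty bin falls back to the overall training mean.
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda l: means[bin_of(l)]


def dml_partialling_out(data, n_folds=2, seed=0):
    """Cross-fitted residual-on-residual estimate of theta."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    res_a, res_y = [], []
    for k, test in enumerate(folds):
        # Nuisance functions are learned on the other folds only (cross-fitting).
        train = [i for j, f in enumerate(folds) if j != k for i in f]
        l_tr = [data[i][0] for i in train]
        m_hat = binned_mean_fit(l_tr, [data[i][1] for i in train])    # estimates E[A | L]
        ell_hat = binned_mean_fit(l_tr, [data[i][2] for i in train])  # estimates E[Y | L]
        for i in test:
            l, a, y = data[i]
            res_a.append(a - m_hat(l))
            res_y.append(y - ell_hat(l))
    # Regress the outcome residuals on the exposure residuals (through the origin).
    return sum(ra * ry for ra, ry in zip(res_a, res_y)) / sum(ra * ra for ra in res_a)


rng = random.Random(2024)
data = simulate(2000, theta=1.0, rng=rng)
l, a, y = (list(col) for col in zip(*data))
naive = slope(a, y)  # no adjustment for L at all
linear_adj = slope(linear_residuals(a, l), linear_residuals(y, l))  # Y ~ A + L via Frisch-Waugh
dml = dml_partialling_out(data)
print(f"unadjusted: {naive:.2f}  linear-in-L: {linear_adj:.2f}  cross-fitted DML: {dml:.2f}")
```

In this hypothetical design, the unadjusted and linearly adjusted estimates have large-sample values of roughly 2.27 and 2.17, far from the true θ = 1.0, because neither captures the L² confounding. The cross-fitted estimate should land close to 1.0. It relies only on the learners predicting E[A | L] and E[Y | L] reasonably well; the treatment effect itself is never a prediction target.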