SFOepi Seminar Series in Biostatistics: Georg Heinze, Medical University of Vienna

12-03-2026 10:00 am - 11:00 am
Campus Solna Ragnar Granit, Biomedicum (this event is not online)

Title: Towards evidence-based guidance on variable selection methods for multivariable regression models
Speaker: Georg Heinze, Center for Medical Data Science, Institute of Clinical Biometrics, Medical University of Vienna

Please note that this event will be neither filmed nor streamed; participation is in person only.

Abstract

Georg Heinze, Theresa Ullmann, Daniela Dunkler, on behalf of TG2 of the STRATOS initiative. Medical University of Vienna, Center for Medical Data Science, Institute of Clinical Biometrics

Multivariable regression modelling has a central role in empirical research, where it is used to answer descriptive, predictive or explanatory research questions. Data-driven variable selection methods are often used to separate relevant from irrelevant variables, but they may also lead to false omission of relevant covariates, inclusion of irrelevant variables, biased coefficient estimates, poorly calibrated predictions, and unstable models. Alternatively, outcome-ignorant screening of variables based on results of Initial Data Analysis can often reduce the number of predictors without compromising model stability (Heinze et al, 2024). Although some methodological recommendations exist, only limited evidence is available about the relative and absolute performance of these methods (Sauerbrei et al, 2020). The aim of the STRATOS initiative is to give evidence-based guidance on the design and analysis of observational studies (https://www.stratos-initiative.org/).

I will present some recent activities of STRATOS' topic group on selection of variables and functional forms for multivariable models (see also https://stratostg2.github.io). First, I will clarify the role of data-driven variable selection in different types of research questions. Second, I will discuss a principled approach to data screening as an invaluable preliminary step in model building. Third, I will report on a systematic methodological review of the practice of variable and functional form selection in COVID-19 prognosis models, which revealed a huge gap between the state of the art and analysis practice. Fourth, I will report on our simulation studies to evaluate competing methods neutrally, and provide interactive access to their results (Ullmann et al, 2024). Lastly, I will provide an outlook on our activities on evaluating methods that allow simultaneous variable and functional form selection.

Depending on the purpose of data-driven variable selection, these results allow us to conclude under which conditions specific methods may be applicable and where they are best avoided. In summary, data-driven selection can complement, but not replace, selection driven by substantive knowledge.

References

Sauerbrei, W., Perperoglou, A., Schmid, M., Abrahamowicz, M., Becher, H., Binder, H., Dunkler, D., Harrell, F.E., Royston, P., Heinze, G., for TG2 of the STRATOS initiative, 2020. State of the art in selection of variables and functional forms in multivariable analysis—outstanding issues. Diagn Progn Res 4, 3. https://doi.org/10.1186/s41512-020-00074-3

Heinze, G., Baillie, M., Lusa, L., Sauerbrei, W., Schmidt, C.O., Harrell, F.E., Huebner, M., on behalf of TG2 and TG3 of the STRATOS initiative, 2024. Regression without regrets – initial data analysis is a prerequisite for multivariable regression. BMC Med Res Methodol 24, 178. https://doi.org/10.1186/s12874-024-02294-3

Ullmann, T., Heinze, G., Hafermann, L., Schilhart-Wallisch, C., Dunkler, D., for TG2 of the STRATOS initiative, 2024. Evaluating variable selection methods for multivariable regression models: A simulation study protocol. PLoS ONE 19, e0308543. https://doi.org/10.1371/journal.pone.0308543

Registration: No registration is needed

If you have any questions, please contact Marie Jansson at marie.jansson@ki.se