Speaker: Rebecca Hubbard, Professor at the University of Pennsylvania
Title: Fair and Valid Learning with Real-World Data


Enthusiasm for using “real-world data,” data generated as a by-product of electronic documentation of healthcare encounters, including electronic health records (EHR) and medical claims data, has exploded over the past decade. Research using real-world data has the potential for improved efficiency and generalizability relative to studies that rely on primary data collection. However, using data sources that were not collected for research purposes comes at a cost, and naïve use of such data without considering the complex data-generating mechanisms from which they arise can lead to erroneous inference and perpetuation of societal biases. In this talk, I will provide an overview of my research portfolio on statistical methods for the analysis of real-world data, focusing on three areas: error and missingness arising from EHR data provenance, challenges to algorithmic fairness resulting from differential data quality, and integration of EHR and clinical trial data. The overarching objective of this work is to improve the validity of research results derived from real-world data by identifying and addressing data quality issues and the resulting biases. I will discuss applications of this methodological work in pharmacoepidemiology and cancer epidemiology and identify a few areas for future research collaboration.


