What is causal inference, and why should data scientists know? by Ludvig Hult

The speaker, Ludvig Hult, is a PhD student at Uppsala University studying machine learning and causal inference, with a focus on combining robust conclusions with modern machine learning techniques.
Adversarial attacks on AI models, such as the one in which Google's image classifier Inception v3 labeled a cat as guacamole, demonstrate that such models do not understand images but rely on patterns learned from their training sets.
Causal inference goes beyond purely statistical patterns: it aims to uncover the causal structure that explains why the data are correlated.
The talk defines causal inference by example rather than formally; in essence, it means learning about cause and effect from data.
Establishing a causal relationship takes more than observing a correlation, but "correlation does not imply causation" is not the whole story. Hans Reichenbach's common cause principle states that a statistical dependence between two variables has a causal explanation: either one variable causes the other, or a common cause influences both.
Data science tasks can be categorized into description, prediction, and causal inference; the last of these asks what happens when someone intervenes on the system.
Structural causal models, written as a set of equations with a corresponding graph, make the direct causes of each variable explicit and allow interventions: setting a variable to a specific value and observing the outcome.
Understanding the causal structure can lead to correct analyses and robust conclusions, even with imperfect data, using methods such as backdoor adjustment.
The speaker provided a practical Python example using inverse probability weighting to calculate the average causal effect (ACE) in a kidney stone surgery dataset.
Causal inference is necessary when experiments can't be perfect, when understanding the exact causal structure is crucial (e.g., detecting discrimination), or when working with non-randomized data.
Emerging research includes causal discovery from data, transporting conclusions across domains, and integrating machine learning with causal inference for more robust models.
While there are tools for causal inference, including a Microsoft-backed Python library called DoWhy, the field requires domain knowledge and careful assumption-making.
The session ended with a Q&A covering the relationship between causal models and probabilistic models, the challenges of constructing causal graphs, especially with many or missing variables, and research on unmeasured confounding.
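
The point about structural causal models and interventions can be made concrete with a small simulation. This is a minimal sketch, not the speaker's code: the three-variable graph (Z causes X and Y, X causes Y), the coefficients, and the function name simulate are all assumptions chosen for illustration. It contrasts the slope seen in observational data with the effect obtained by intervening on X.

```python
import numpy as np


def simulate(n, rng, do_x=None):
    """Sample from a toy structural causal model (illustrative assumption).

    Graph: Z -> X, Z -> Y, X -> Y
    Equations:
        Z := N_Z
        X := 2.0 * Z + N_X
        Y := 1.5 * X + 3.0 * Z + N_Y
    Passing do_x replaces the equation for X with the constant do_x,
    which is what an intervention do(X = do_x) means in this framework.
    """
    z = rng.normal(size=n)
    if do_x is None:
        x = 2.0 * z + rng.normal(size=n)
    else:
        x = np.full(n, float(do_x))
    y = 1.5 * x + 3.0 * z + rng.normal(size=n)
    return z, x, y


rng = np.random.default_rng(0)
n = 200_000

# Observational data: the regression slope of Y on X mixes the causal
# effect of X with the influence of the common cause Z.
_, x_obs, y_obs = simulate(n, rng)
obs_slope = np.cov(x_obs, y_obs)[0, 1] / np.var(x_obs, ddof=1)

# Interventional data: set X by hand and compare the mean outcomes.
_, _, y_do1 = simulate(n, rng, do_x=1.0)
_, _, y_do0 = simulate(n, rng, do_x=0.0)
causal_effect = y_do1.mean() - y_do0.mean()

print(f"observational slope: {obs_slope:.2f}")
print(f"effect of do(X=1) vs do(X=0): {causal_effect:.2f}")
```

In this toy model the observational slope is noticeably larger than the coefficient 1.5 in the equation for Y, because Z pushes X and Y in the same direction; the interventional contrast recovers the coefficient up to sampling noise.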
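
The backdoor adjustment and inverse probability weighting points can be sketched on kidney stone data as follows. This is not the speaker's code, and the talk's data file is not reproduced here: the counts are assumed to be the widely cited figures from Charig et al. (1986), with treatment A as open surgery and treatment B as percutaneous nephrolithotomy. The sketch also assumes that stone size is the only confounder, affecting both the choice of treatment and the chance of success.

```python
import pandas as pd

# Aggregated counts, assumed to be the commonly cited Charig et al. (1986)
# kidney stone figures; column names are illustrative.
counts = pd.DataFrame(
    [
        ("A", "small", 81, 87),
        ("A", "large", 192, 263),
        ("B", "small", 234, 270),
        ("B", "large", 55, 80),
    ],
    columns=["treatment", "stone_size", "successes", "patients"],
)
n_total = counts["patients"].sum()

# Naive comparison: success rate per treatment, ignoring stone size.
totals = counts.groupby("treatment")[["successes", "patients"]].sum()
naive_rate = totals["successes"] / totals["patients"]

# Backdoor adjustment: sum over z of P(z) * P(success | treatment, z),
# where z is the stone size.
p_size = counts.groupby("stone_size")["patients"].sum() / n_total
success_rate = counts["successes"] / counts["patients"]
weighted_rate = success_rate * counts["stone_size"].map(p_size)
backdoor = weighted_rate.groupby(counts["treatment"]).sum()

# Inverse probability weighting: each patient is weighted by
# 1 / P(treatment received | stone size), estimated from the counts.
size_totals = counts.groupby("stone_size")["patients"].transform("sum")
propensity = counts["patients"] / size_totals
weighted_successes = counts["successes"] / propensity
ipw = weighted_successes.groupby(counts["treatment"]).sum() / n_total

# Average causal effect (ACE) of treatment A relative to treatment B.
ace_backdoor = backdoor["A"] - backdoor["B"]
ace_ipw = ipw["A"] - ipw["B"]

print(f"naive difference (A - B): {naive_rate['A'] - naive_rate['B']:.3f}")
print(f"ACE via backdoor adjustment: {ace_backdoor:.3f}")
print(f"ACE via inverse probability weighting: {ace_ipw:.3f}")
```

Under these assumptions the naive comparison favors treatment B, while both adjusted estimates favor treatment A by about five percentage points. The two adjusted estimates coincide here because the propensities are estimated from the same stratified counts that the backdoor formula uses.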
