BIRDiE: A Discussion
2025-01-24
The Big Picture
Objective:
- Estimate disparity in outcome by race
- If the outcome is denoted \(Y\) and race is denoted \(R\), we want to know: \[
\mathbb{E}(Y \mid R),\ \operatorname{var}(Y \mid R), \ \text{or if we are greedy} \ \ \mathbb{P}(Y \mid R)
\]
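If race were observed, each target would be an ordinary within-group summary. Here is a minimal Python sketch on hypothetical toy data. The function `disparity_summaries` and the numbers are illustrative and do not come from the talk:

```python
from collections import Counter, defaultdict
from statistics import mean, pvariance

def disparity_summaries(y, r):
    """Group outcomes y by observed race r; return E(Y|R), var(Y|R), P(Y|R)."""
    groups = defaultdict(list)
    for yi, ri in zip(y, r):
        groups[ri].append(yi)
    summaries = {}
    for race, ys in groups.items():
        counts = Counter(ys)
        summaries[race] = {
            "mean": mean(ys),                 # E(Y | R = race)
            "var": pvariance(ys),             # within-group (population) variance
            "dist": {val: c / len(ys) for val, c in counts.items()},  # P(Y | R = race)
        }
    return summaries

# Hypothetical toy data: binary outcome, two race categories "a" and "b"
y = [1, 0, 1, 1, 0, 0, 1, 0]
r = ["a", "a", "a", "a", "b", "b", "b", "b"]
out = disparity_summaries(y, r)
print(out["a"]["mean"], out["b"]["mean"])  # 0.75 0.25
```

The rest of the talk is about approximating these summaries when the `r` column is missing.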
Problem:
- Race is not measured
- Predictors of race may also be related to the outcome
BIRDiE does not replace BISG
BISG is an input to BIRDiE
- Stage 1: Get predictions (\(\hat{R}\)) from BISG
- Stage 2: Use BISG predictions to estimate disparities (\(\mathbb{E}(Y \mid \hat{R})\))
We can think of the two stages as two different mappings:
- BISG: Race (\(R\)) \(\rightarrow\) predicted race (\(\hat{R}\))
- BIRDiE: \(\hat{R} \rightarrow \mathbb{E}(Y \mid \hat{R}) \rightsquigarrow \mathbb{E}(Y \mid R)\)
Note:
- Race \(R\) must be “observed” in the data used to estimate the BISG model
- Race \(R\) is not needed to obtain predictions from a “learned” BISG model
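Stage 1 can be illustrated with textbook BISG. This form combines a surname table \(\mathbb{P}(S \mid R)\) and a geography table \(\mathbb{P}(R \mid G)\) through Bayes' rule, \(\mathbb{P}(R = r \mid S, G) \propto \mathbb{P}(S \mid R = r)\,\mathbb{P}(R = r \mid G)\), assuming surname and location are independent given race. The NumPy sketch below makes several assumptions. It uses made-up two-race tables, whereas in practice these come from Census surname and geography data. Here \(\hat{R}\) is taken to be the vector of posterior probabilities. `bisg_posterior` is a hypothetical helper, not a function from any package:

```python
import numpy as np

def bisg_posterior(p_s_given_r, p_r_given_g, s_idx, g_idx):
    """Hypothetical helper: P(R | S, G) for each person via Bayes' rule.

    p_s_given_r[s, r] = P(S = s | R = r)   (surname table)
    p_r_given_g[g, r] = P(R = r | G = g)   (geography table)
    Assumes surname and location are independent given race.
    """
    unnorm = p_s_given_r[s_idx, :] * p_r_given_g[g_idx, :]   # (n_people, n_races)
    return unnorm / unnorm.sum(axis=1, keepdims=True)        # normalize each row

# Made-up tables: 2 surnames x 2 races, 2 locations x 2 races
p_s_given_r = np.array([[0.8, 0.1],
                        [0.2, 0.9]])   # each column sums to 1 over surnames
p_r_given_g = np.array([[0.5, 0.5],
                        [0.9, 0.1]])   # each row sums to 1 over races

s_idx = np.array([0, 1])   # surname index for each of two people
g_idx = np.array([0, 1])   # location index for each of two people
post = bisg_posterior(p_s_given_r, p_r_given_g, s_idx, g_idx)
print(post)  # rows approx [0.889, 0.111] and [0.667, 0.333]
```

No individual's race enters this computation, only the lookup tables. This is consistent with the note that \(R\) is not needed to obtain predictions from a learned BISG model.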
The Analysis Pipeline
1. Identification (the difficult part)
Connect what we want to compute (\(\mathbb{P}(Y|R)\)) with what we can observe (\(\mathbb{P}(Y, X)\))
- Conditional independence assumptions: CI-YS/CI-YR
- Technical assumptions: accuracy, overlap, etc.
2. Estimation (the easy part)
Estimate what we can observe (expectations) using what we can measure (estimates)
- thresholding estimator
- weighting estimator
- BIRDiE
- Direct marginal likelihood inference/EM algorithm
- Pooled/Saturated/Mixed Models
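The two simplest estimators on this list can be written in a few lines of NumPy. Each takes BISG probabilities `probs` (one row per person, one column per race) and outcomes `y`. The function names and the 0.5 cutoff are illustrative choices, not definitions from the talk or from McCartan et al. (2024):

```python
import numpy as np

def thresholding_estimator(y, probs, cutoff=0.5):
    """Mean outcome among people whose BISG probability for race r exceeds `cutoff`.
    Races with nobody above the cutoff get NaN."""
    est = np.full(probs.shape[1], np.nan)
    for r in range(probs.shape[1]):
        assigned = probs[:, r] > cutoff          # hard assignment to race r
        if assigned.any():
            est[r] = y[assigned].mean()
    return est

def weighting_estimator(y, probs):
    """Probability-weighted mean for each race: sum_i p_ir * y_i / sum_i p_ir."""
    return (probs * y[:, None]).sum(axis=0) / probs.sum(axis=0)

# Toy data: four people, two races (columns of `probs` are BISG probabilities)
y = np.array([1.0, 0.0, 1.0, 0.0])
probs = np.array([[0.9, 0.1],
                  [0.8, 0.2],
                  [0.3, 0.7],
                  [0.1, 0.9]])
print(thresholding_estimator(y, probs))  # [0.5 0.5]
print(weighting_estimator(y, probs))     # approx [0.571 0.421]
```

Thresholding throws away the probabilistic information in \(\hat{R}\), while weighting keeps it. Whether either one recovers \(\mathbb{E}(Y \mid R)\) is an identification question, not an estimation one.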
The Analysis Pipeline
1. Identification (the easy part)
Connect what we want to compute (\(\mathbb{P}(Y|R)\)) with what we can observe (\(\mathbb{P}(Y, X)\))
- Conditional independence assumptions: CI-YS/CI-YR
- Technical assumptions: accuracy, overlap, etc.
2. Estimation (the difficult part)
Estimate what we can observe (expectations) using what we can measure (estimates)
- thresholding estimator
- weighting estimator
- BIRDiE
- Direct marginal likelihood inference/EM algorithm
- Pooled/Saturated/Mixed Models
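The "EM algorithm" item can be illustrated with a small loop. This is a simplified maximum-likelihood sketch of what we take to be the simplest ("pooled") variant. In this variant, \(\mathbb{P}(Y \mid R)\) is assumed not to vary with location or other covariates, and the BISG probabilities are treated as \(\mathbb{P}(R \mid S, G, X)\). The actual BIRDiE method of McCartan et al. (2024) adds priors and richer outcome models. `birdie_pooled_em` is a hypothetical name, and the simulated data are made up so that \(Y\) depends on race only (so CI-YS holds):

```python
import numpy as np

def birdie_pooled_em(y, prior, n_levels, n_iter=1000, tol=1e-10):
    """EM sketch for a pooled outcome model theta[r, k] = P(Y = k | R = r).

    y:      integer outcomes in {0, ..., n_levels - 1}, shape (n,)
    prior:  BISG probabilities P(R = r | S, G, X), shape (n, n_races)
    Assumes CI-YS: Y is independent of surname given race (and G, X).
    """
    n_races = prior.shape[1]
    y_onehot = np.eye(n_levels)[y]                  # (n, K) outcome indicators
    theta = np.full((n_races, n_levels), 1.0 / n_levels)
    for _ in range(n_iter):
        # E-step: P(R = r | Y_i, S_i, G_i) is proportional to
        # P(Y_i | R = r) * P(R = r | S_i, G_i)
        post = prior * (y_onehot @ theta.T)         # (n, R); entry uses theta[r, y_i]
        post /= post.sum(axis=1, keepdims=True)
        # M-step: race-specific outcome frequencies weighted by the posteriors
        new_theta = post.T @ y_onehot               # (R, K) weighted counts
        new_theta /= new_theta.sum(axis=1, keepdims=True)
        converged = np.max(np.abs(new_theta - theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta

# Simulated example (hypothetical data): Y depends on race only
rng = np.random.default_rng(0)
n = 50_000
p_race0 = rng.uniform(size=n)                        # stand-in BISG probabilities
prior = np.column_stack([p_race0, 1 - p_race0])
race = (rng.uniform(size=n) >= p_race0).astype(int)  # race 0 with probability p_race0
true_rate = np.array([0.2, 0.7])                     # P(Y = 1 | R = r)
y = (rng.uniform(size=n) < true_rate[race]).astype(int)

theta = birdie_pooled_em(y, prior, n_levels=2)
weighted = (prior * y[:, None]).sum(axis=0) / prior.sum(axis=0)
print(theta[:, 1])   # EM estimates of P(Y = 1 | R)
print(weighted)      # weighting estimator on the same data
```

Starting from a uniform \(\theta\), the first E-step returns the BISG probabilities unchanged, so the first M-step reproduces the weighting estimator. Later iterations use \(Y\) itself to sharpen the race posteriors. In this design CI-YR fails, because race still predicts \(Y\) given surname. In large samples the weighting estimates approach about 0.37 and 0.53 rather than 0.2 and 0.7, while the EM estimates should land close to the true rates.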
Identification
- Identification relies on (frequently untestable) assumptions
Figure source: McCartan et al. (2024)
Identification
a.k.a. which assumption can you justify (or get away with 🤫)?
Identification Path 1 (CI-YR)
- Assuming outcome (\(Y\)) and race (\(R\)) are independent conditional on surname, location, and other observed characteristics, it is okay to use the weighting estimator
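A short derivation shows why CI-YR supports weighting. It uses our own notation, not the slides': \(S\) for surname, \(G\) for location, \(X\) for other observed characteristics, and \(\pi_r(S,G,X) = \mathbb{P}(R = r \mid S, G, X)\) for the BISG probability, which is assumed accurate:

\[
\begin{aligned}
\mathbb{E}\big[Y\,\mathbf{1}\{R = r\}\big]
&= \mathbb{E}\big[\mathbb{E}(Y \mid S,G,X)\,\pi_r(S,G,X)\big] \quad \text{(CI-YR)}\\
&= \mathbb{E}\big[\pi_r(S,G,X)\,Y\big] \quad \text{(iterated expectations)}.
\end{aligned}
\]

Dividing by \(\mathbb{P}(R = r) = \mathbb{E}[\pi_r(S,G,X)]\) gives

\[
\mathbb{E}(Y \mid R = r) = \frac{\mathbb{E}[\pi_r(S,G,X)\,Y]}{\mathbb{E}[\pi_r(S,G,X)]},
\]

and its sample analogue \(\sum_i \hat{\pi}_{ir} Y_i / \sum_i \hat{\pi}_{ir}\) is the weighting estimator.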
Identification Path 2 (CI-YS)
- Assuming outcome (\(Y\)) and surname (\(S\)) are independent conditional on race, location and other observed characteristics, it is okay to use BIRDiE
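Here is our sketch of the CI-YS argument. Write \(G\) for location, \(X\) for other observed characteristics, and \(\pi_r(S,G,X) = \mathbb{P}(R = r \mid S, G, X)\) for the BISG probability. This notation is ours, not the slides'. By the law of total probability and then CI-YS,

\[
\mathbb{P}(Y = y \mid S, G, X)
= \sum_{r} \mathbb{P}(Y = y \mid R = r, S, G, X)\,\pi_r(S,G,X)
= \sum_{r} \mathbb{P}(Y = y \mid R = r, G, X)\,\pi_r(S,G,X).
\]

Fix \((G, X)\). For each surname, the left side is observable and the \(\pi_r\) are known from BISG. The unknowns \(\mathbb{P}(Y = y \mid R = r, G, X)\) therefore solve a linear system with one equation per surname. If the BISG probability vectors vary enough across surnames (full column rank), that solution is unique. Surname thus acts like an instrument, and as we read it, this is the quantity BIRDiE's likelihood-based estimation targets.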
They are not mutually exclusive
- When both assumptions hold, either BIRDiE or the weighting estimator is valid; choose carefully, though
Thank you!
\[
\Huge{\text{Questions?}}
\]