BIRDiE: A Discussion

2025-01-24

The Big Picture

Objective:

  • Estimate disparity in outcome by race
  • If the outcome is denoted \(Y\) and race is denoted \(R\), we want to know: \[ \mathbb{E}(Y \mid R),\ \operatorname{var}(Y \mid R), \ \text{or if we are greedy} \ \ \mathbb{P}(Y \mid R) \]
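If race were measured, these targets would just be group summaries. A minimal Python sketch (standard library only; the function name `disparity_targets` and the toy data are hypothetical) of what we would compute with \(R\) in hand:

```python
from collections import Counter, defaultdict
from statistics import mean, pvariance

def disparity_targets(y, r):
    """Return E(Y | R), var(Y | R) and P(Y | R) when race R is observed."""
    by_race = defaultdict(list)
    for yi, ri in zip(y, r):
        by_race[ri].append(yi)
    cond_mean = {g: mean(v) for g, v in by_race.items()}
    cond_var = {g: pvariance(v) for g, v in by_race.items()}  # population variance
    cond_dist = {
        g: {val: c / len(v) for val, c in Counter(v).items()}
        for g, v in by_race.items()
    }
    return cond_mean, cond_var, cond_dist

# Hypothetical toy data: binary outcome Y, two race groups A and B
y = [1, 0, 1, 1, 0, 0, 1, 0]
r = ["A", "A", "A", "A", "B", "B", "B", "B"]
m, v, d = disparity_targets(y, r)
# m == {"A": 0.75, "B": 0.25}
```

The whole problem is that the `r` column is missing in practice.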

Problem:

  • Race is not measured
  • Predictors of race (e.g., surname and location) may also be related to the outcome

BIRDiE does not replace BISG

BISG is an input to BIRDiE
  • Stage 1: Get predictions (\(\hat{R}\)) from BISG
  • Stage 2: Use BISG predictions for estimating disparities (\(\mathbb{E}(Y \mid \hat{R})\))
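Stage 1 is Bayes' rule. A minimal sketch of the standard BISG update, \(\mathbb{P}(R \mid S, G) \propto \mathbb{P}(S \mid R)\,\mathbb{P}(R \mid G)\), for one person with surname \(S\) and location \(G\). It assumes surname and location are independent given race; the function name and all probabilities are made up for illustration (not real Census tables):

```python
def bisg_posterior(p_s_given_r, p_r_given_g):
    """BISG update for one person: P(R=r | S, G) proportional to P(S | R=r) * P(R=r | G).

    Both arguments are dicts keyed by race group, for this person's surname
    and location. Relies on surname and location being independent given race.
    """
    unnorm = {r: p_s_given_r[r] * p_r_given_g[r] for r in p_r_given_g}
    total = sum(unnorm.values())
    return {r: w / total for r, w in unnorm.items()}

# Hypothetical inputs for one person
p_s_given_r = {"A": 0.002, "B": 0.0005}  # P(surname | race)
p_r_given_g = {"A": 0.3, "B": 0.7}       # P(race | location)
r_hat = bisg_posterior(p_s_given_r, p_r_given_g)
# r_hat["A"] is about 0.63: the surname outweighs the location prior
```

Stage 2 takes these probabilities, not a single hard label, as its input.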

We can think of them as two different mappings:

  • BISG: Race (\(R\)) \(\rightarrow\) predicted race (\(\hat{R}\))
  • BIRDiE: \(\hat{R} \rightarrow \mathbb{E}(Y \mid \hat{R}) \rightsquigarrow \mathbb{E}(Y \mid R)\)

Note:

  • Race \(R\) must be “observed” in the data used to estimate the BISG model
  • Race \(R\) is not needed to obtain predictions from an already “learned” BISG model

The Analysis Pipeline

1. Identification (the difficult part)

Connect what we want to compute (\(\mathbb{P}(Y|R)\)) with what we can observe (\(\mathbb{P}(Y, X)\))

  • Conditional independence assumptions: CI-YS/CI-YR
  • Technical assumptions: accuracy, overlap, etc.

2. Estimation (the easy part)

Estimate what we can observe (expectations) using what we can measure (estimates)

  • thresholding estimator
  • weighting estimator
  • BIRDiE
    • Direct marginal likelihood inference/EM algorithm
    • Pooled/Saturated/Mixed Models
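The two simpler estimators can be sketched as follows, assuming each person comes with a dict of BISG probabilities \(\hat{p}_{ir}\) for race \(r\). Function names and data are hypothetical:

```python
def thresholding_estimator(y, probs):
    """Assign each person their most probable race, then average Y within groups."""
    groups = {}
    for yi, p in zip(y, probs):
        r = max(p, key=p.get)  # hard label: argmax of the BISG probabilities
        groups.setdefault(r, []).append(yi)
    return {r: sum(v) / len(v) for r, v in groups.items()}

def weighting_estimator(y, probs):
    """Weight each outcome by its race probability: sum_i p_ir * y_i / sum_i p_ir."""
    races = probs[0].keys()
    return {
        r: sum(p[r] * yi for yi, p in zip(y, probs)) / sum(p[r] for p in probs)
        for r in races
    }

# Hypothetical BISG probabilities for four people
y = [1, 0, 1, 0]
probs = [
    {"A": 0.9, "B": 0.1},
    {"A": 0.6, "B": 0.4},
    {"A": 0.2, "B": 0.8},
    {"A": 0.3, "B": 0.7},
]
print(thresholding_estimator(y, probs))  # A: 0.5, B: 0.5
print(weighting_estimator(y, probs))     # A ≈ 0.55, B ≈ 0.45
```

Thresholding throws away the uncertainty in \(\hat{R}\) by rounding it to one category; weighting keeps it, which is why the two disagree even on this toy data.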

The Analysis Pipeline

1. Identification (the easy part)

Connect what we want to compute (\(\mathbb{P}(Y|R)\)) with what we can observe (\(\mathbb{P}(Y, X)\))

  • Conditional independence assumptions: CI-YS/CI-YR
  • Technical assumptions: accuracy, overlap, etc.

2. Estimation (the difficult part)

Estimate what we can observe (expectations) using what we can measure (estimates)

  • thresholding estimator
  • weighting estimator
  • BIRDiE
    • Direct marginal likelihood inference/EM algorithm
    • Pooled/Saturated/Mixed Models
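A minimal EM sketch for the simplest case: a categorical outcome and a fully pooled model, assuming \(\mathbb{P}(Y \mid R)\) does not vary with location or other covariates. It has no priors or location-specific terms, so it is an illustration of the idea, not the implementation in any BIRDiE software; the function name is hypothetical, and every race is assumed to have positive probability for at least one person:

```python
def birdie_pooled_em(y, probs, n_iter=200, tol=1e-10):
    """EM for a pooled model P(Y = k | R = r) = theta[r][k] with categorical Y.

    y:     observed outcome category for each person
    probs: one dict per person of BISG probabilities P(R = r | surname, location, ...)
    Returns theta (estimated P(Y | R)) and q (posteriors from the last E-step).
    """
    races = list(probs[0])
    levels = sorted(set(y))
    n = len(y)
    # Start every race at the overall outcome distribution
    theta = {r: {k: y.count(k) / n for k in levels} for r in races}
    q = [dict(p) for p in probs]
    for _ in range(n_iter):
        # E-step (uses CI-YS): q_ir proportional to P(R=r | S, G, X) * P(Y=y_i | R=r)
        q = []
        for yi, p in zip(y, probs):
            w = {r: p[r] * theta[r][yi] for r in races}
            total = sum(w.values())
            q.append({r: w[r] / total for r in races})
        # M-step: share of outcome k among people softly attributed to race r
        new_theta = {}
        for r in races:
            denom = sum(qi[r] for qi in q)
            new_theta[r] = {
                k: sum(qi[r] for qi, yi in zip(q, y) if yi == k) / denom
                for k in levels
            }
        change = max(abs(new_theta[r][k] - theta[r][k]) for r in races for k in levels)
        theta = new_theta
        if change < tol:
            break
    return theta, q

# Hypothetical toy data: outcome labels and BISG probabilities for five people
y = ["yes", "no", "yes", "no", "yes"]
probs = [
    {"A": 0.9, "B": 0.1},
    {"A": 0.7, "B": 0.3},
    {"A": 0.2, "B": 0.8},
    {"A": 0.4, "B": 0.6},
    {"A": 0.5, "B": 0.5},
]
theta, q = birdie_pooled_em(y, probs)
# theta["A"]["yes"] estimates P(Y = "yes" | R = "A")
```

Because every race starts from the same outcome distribution, the first E-step returns the BISG probabilities unchanged and the first M-step reproduces the weighting estimator; later iterations use the outcome itself as information about race, which is where CI-YS does the work.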

Identification

  • Identification relies on (frequently untestable) assumptions

Source: McCartan et al. (2024)

Identification

aka: which assumption can you justify (or get away with 🤫)?

Identification Path 1 (CI-YR)

  • Assuming outcome (\(Y\)) and race (\(R\)) are independent once we know the surname, location, and other observed characteristics, it is okay to use the weighting estimator
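Why weighting works under CI-YR, as a short derivation. Here \(S\) is surname, \(G\) is location, and \(X\) is other observed characteristics (our labels for this sketch), and \(\hat{p}_{ir}\) is person \(i\)'s BISG probability of race \(r\), assumed accurate:

\[ \mathbb{E}\big(Y\,\mathbf{1}\{R=r\}\big) = \mathbb{E}\big[\mathbb{E}(Y \mid R=r, S, G, X)\,\mathbb{P}(R=r \mid S, G, X)\big] \overset{\text{CI-YR}}{=} \mathbb{E}\big[\mathbb{E}(Y \mid S, G, X)\,\mathbb{P}(R=r \mid S, G, X)\big] = \mathbb{E}\big[Y\,\mathbb{P}(R=r \mid S, G, X)\big] \]

Dividing by \(\mathbb{P}(R=r) = \mathbb{E}\big[\mathbb{P}(R=r \mid S, G, X)\big]\):

\[ \mathbb{E}(Y \mid R=r) = \frac{\mathbb{E}\big[Y\,\mathbb{P}(R=r \mid S, G, X)\big]}{\mathbb{E}\big[\mathbb{P}(R=r \mid S, G, X)\big]} \approx \frac{\sum_i \hat{p}_{ir}\, Y_i}{\sum_i \hat{p}_{ir}} \]

The last equality in the first line is iterated expectations; without CI-YR the middle step fails and the weighted average is biased.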

Identification Path 2 (CI-YS)

  • Assuming outcome (\(Y\)) and surname (\(S\)) are independent conditional on race, location and other observed characteristics, it is okay to use BIRDiE
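The role of CI-YS, sketched with \(G\) for location and \(X\) for other observed characteristics (our labels). Bayes' rule plus CI-YS lets the surname drop out of the outcome model:

\[ \mathbb{P}(R=r \mid Y, S, G, X) = \frac{\mathbb{P}(Y \mid R=r, G, X)\,\mathbb{P}(R=r \mid S, G, X)}{\sum_{r'} \mathbb{P}(Y \mid R=r', G, X)\,\mathbb{P}(R=r' \mid S, G, X)} \]

so the observed data follow a mixture whose weights are the BISG probabilities:

\[ \mathbb{P}(Y \mid S, G, X) = \sum_{r} \mathbb{P}(Y \mid R=r, G, X)\,\mathbb{P}(R=r \mid S, G, X) \]

Maximizing this marginal likelihood over \(\mathbb{P}(Y \mid R, G, X)\), directly or via EM, is what recovers the race-specific outcome distribution, subject to the technical assumptions (accuracy, overlap, etc.).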

They are not mutually exclusive

  • When both assumptions apply, either BIRDiE or the weighting estimator is fine; choose carefully, though

Thank you!

Questions?