Algorithmic bias research · Open source
AI systems decide your freedom, your job, your future. This project proves the bias is measurable — and shows exactly how to remove it.
Fair Code is a research and engineering project that exposes bias in real-world AI systems and demonstrates concrete mitigation strategies. Every project follows the same structure.
Train a biased model. Measure the fairness gap. Engineer a fair model. Measure again. No theory. Just data, code, and results.
The bias in these systems is documented, measurable, and fixable. Removing a protected attribute isn't enough. Proxy variables carry the same signal through. Both must go.
yakew7/Fair-Code
Every project in this repo follows the same bias detection and mitigation workflow. Reproducible. Transparent. Measurable.
Real-world data with demographic signals: ProPublica COMPAS, Kaggle recruitment datasets.
Include protected attributes (race, gender, age) alongside predictive features.
Calculate positive prediction rate differences across demographic groups.
Drop protected attributes and correlated proxy variables that smuggle bias back in.
Fair model trained on merit features only. Gap measured again. Results compared.
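The measurement step in this workflow can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the function names, group labels, and toy predictions are hypothetical, and the gap is assumed to be the largest difference in positive prediction rate between any two groups.

```python
from collections import defaultdict

def positive_rates(predictions, groups):
    """Share of positive (1) predictions within each demographic group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}

def parity_gap(predictions, groups):
    """Largest difference in positive prediction rate between any two groups."""
    rates = positive_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())

# Hypothetical toy data, not taken from any project dataset.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
biased_preds = [1, 1, 1, 0, 1, 0, 0, 0]  # model trained with protected attributes
fair_preds = [1, 1, 0, 0, 1, 0, 1, 0]    # model retrained on merit features only

gap_before = parity_gap(biased_preds, groups)
gap_after = parity_gap(fair_preds, groups)
print(f"gap before: {gap_before:.2%}, gap after: {gap_after:.2%}")
```

Running the same function on both models' predictions gives the before and after numbers that each project reports.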
A real algorithm used in US courtrooms. ProPublica's public dataset, 70,000+ records. The bias is not a glitch. It's baked in.
Removing race alone isn't enough. Custody Status is a proxy variable. It carries the racial signal through the model even when the race column is dropped. Both features had to go.
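One rough way to spot a proxy like this is to ask how well a single feature predicts the protected attribute on its own. The sketch below is an assumption about how such a check could look, not the project's method: the function names, the toy values "p" and "q", and the group labels "x" and "y" are all hypothetical.

```python
from collections import Counter, defaultdict

def majority_accuracy(labels):
    """Accuracy of always guessing the most common label (the baseline)."""
    return Counter(labels).most_common(1)[0][1] / len(labels)

def proxy_accuracy(feature, protected):
    """Accuracy of guessing the protected attribute from the feature value alone."""
    by_value = defaultdict(list)
    for value, label in zip(feature, protected):
        by_value[value].append(label)
    # For each feature value, guess the most common protected label seen with it.
    correct = sum(Counter(labels).most_common(1)[0][1] for labels in by_value.values())
    return correct / len(protected)

# Hypothetical toy rows: one protected attribute, two candidate features.
protected = ["x", "x", "x", "x", "y", "y", "y", "y"]
suspect_feature = ["p", "p", "p", "q", "q", "q", "q", "p"]
neutral_feature = ["p", "q", "p", "q", "p", "q", "p", "q"]

baseline = majority_accuracy(protected)
suspect_lift = proxy_accuracy(suspect_feature, protected) - baseline
neutral_lift = proxy_accuracy(neutral_feature, protected) - baseline
print(f"suspect lift: {suspect_lift:.2f}, neutral lift: {neutral_lift:.2f}")
```

A feature whose lift over the baseline is well above zero is leaking the protected attribute and is a candidate for removal alongside it.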
Women hired 20.9% less than equally qualified men. The algorithm wasn't told to discriminate. It learned to.
Dropping gender and age, retaining only Experience Years and Technical Test Score, collapsed the fairness gap from 4.51% to 0.12%. Merit features alone produce near-perfect demographic parity.
A lending model rates young applicants as bad credit risks at a rate 6+ percentage points higher than older applicants with identical financial profiles. It learned age from job tenure.
Employment tenure looks like a legitimate financial signal — and it is. But it's also a near-perfect proxy for age. A 24-year-old cannot have 10 years of employment history. The model penalising short tenure was partially penalising youth. Dropping both age and employment forced it to evaluate what a borrower has — savings, credit history, loan purpose — rather than how long they've been alive.
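For a numeric feature like tenure, a plain correlation against age is a quick first check. The following is a minimal sketch under stated assumptions: the ages, tenures, and the 0.8 cut-off are hypothetical illustration values, not figures from the project's lending dataset.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical applicants: age in years and employment tenure in years.
ages = [22, 25, 31, 40, 52, 60]
tenure_years = [1, 2, 6, 12, 20, 30]

PROXY_THRESHOLD = 0.8  # assumed cut-off for flagging a feature for review

r = pearson(ages, tenure_years)
is_proxy = abs(r) >= PROXY_THRESHOLD
print(f"correlation with age: {r:.3f}, flag as proxy: {is_proxy}")
```

A high correlation does not make a feature illegitimate by itself, but it does mean that dropping age while keeping the feature leaves most of the age signal in the model.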
An insurance AI flags older patients for high-cost claims at a rate 7.93 percentage points higher than younger patients. BMI, smoking status, and diabetic status encode race without naming it.
BMI, smoker status, and diabetic diagnosis rates all differ significantly by race and class — so a model trained on them learns to discriminate by race without the word ever appearing. These are the CustodyStatus of health insurance: clinical-sounding features that carry protected-class signal because of structural inequalities baked into American healthcare. Dropping them alongside age and gender reduced the age gap by 60% and the gender gap by 72%.
An automated means-test flags male applicants as ineligible at a rate 18 percentage points higher than female applicants, not because of what they earn but because of who they're married to.
Automated benefits systems don't need to name sex or race to discriminate by them. The features relationship (Husband/Wife), marital.status, hours.per.week, and occupation are the CustodyStatus of welfare AI: they sound purely economic but carry protected-class signal because of how work, caregiving, and labour markets are structurally organised. Dropping all four alongside the direct protected attributes reduced the sex gap by 53%, the race gap by 46%, and the national-origin gap by 88%.
A hospital readmission model flags patients for high clinical risk using payer code and discharge destination — variables that measure insurance access, not medical severity.
Healthcare readmission models don't need race or gender to discriminate by them. payer_code, discharge_disposition_id, medical_specialty, and number_inpatient are the CustodyStatus of clinical AI: features that look like neutral operational data but encode structural inequalities in insurance, geography, and access to preventive care. Removing them cut the age gap by 68% and the race gap by 25%. The gender gap increased slightly, from 0.02% to 0.04%: proxy removal shifted the model in a way that widened it by 0.02pp. The causal direction matters: lower access to skilled nursing facilities (SNFs) creates readmission risk. The patient does not bring the risk to the gap. The gap creates the risk.
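Gap changes on this page are reported two ways: in percentage points (pp) and as a relative change. The arithmetic below is a small sketch with a hypothetical helper name, applied to two pairs of figures quoted on this page: the hiring gap (4.51% to 0.12%) and the readmission gender gap (0.02% to 0.04%).

```python
def gap_change(before_pct, after_pct):
    """Return (change in percentage points, relative change as a fraction)."""
    change_pp = after_pct - before_pct
    relative = change_pp / before_pct
    return change_pp, relative

# Figures quoted on this page, expressed in percent.
hiring_pp, hiring_rel = gap_change(4.51, 0.12)
gender_pp, gender_rel = gap_change(0.02, 0.04)

print(f"hiring gap: {hiring_pp:+.2f}pp ({hiring_rel:+.1%})")
print(f"readmission gender gap: {gender_pp:+.2f}pp ({gender_rel:+.1%})")
```

Reporting both numbers keeps small gaps in proportion: a gap that doubles in relative terms can still move by only a few hundredths of a percentage point.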
Each card opens a focused explainer page. The homepage stays light, the long-form write-ups stay on site, and the source markdown lives in the repo.
of companies use AI to screen job applicants before a human sees a resume (Forbes, 2024)
US states have used algorithmic risk tools in criminal sentencing
federal laws currently require hiring AIs to be audited for gender or racial bias
Algorithms like COMPAS are deployed in courtrooms right now. Hiring AIs filter your resume before a human ever reads it. The bias in these systems is documented, measurable, and fixable.
More datasets, more domains, more bias exposed and fixed. Follow the project on Instagram for updates.