In layman’s terms, Simpson’s Paradox is the reversal of a relationship that holds within subgroups of data once those subgroups are combined.
For example, if a university has two departments and a woman has a higher probability of being accepted than a man in each of them, intuition says that the overall acceptance probability for women should also be higher after combining the data, but this may not be true.
Given a1/b1 < c1/d1 and a2/b2 < c2/d2, does it follow that (a1+a2)/(b1+b2) < (c1+c2)/(d1+d2)?
Simpson’s Paradox says that it may not.
For example, 7/8 < 2/2 and 1/2 < 5/8, yet (7+1)/(8+2) > (2+5)/(2+8), that is, 8/10 > 7/10.
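The arithmetic can be checked with a short Python sketch. It assumes each fraction is a (count, total) pair and that combining subgroups means adding counts and totals separately; the names `left`, `right` and `pooled` are illustrative only and do not come from the article.

```python
from fractions import Fraction

# Each pair is (count, total) for one subgroup.
left = [(7, 8), (1, 2)]    # a1/b1, a2/b2
right = [(2, 2), (5, 8)]   # c1/d1, c2/d2


def pooled(groups):
    """Combine subgroups by adding counts and totals separately."""
    return Fraction(sum(n for n, _ in groups), sum(d for _, d in groups))


# Within every subgroup, the left ratio is smaller than the right ratio.
subgroup_holds = all(
    Fraction(a, b) < Fraction(c, d) for (a, b), (c, d) in zip(left, right)
)

# After pooling, the direction flips: 8/10 versus 7/10.
reversed_after_pooling = pooled(left) > pooled(right)

print(subgroup_holds, reversed_after_pooling)
print(pooled(left), pooled(right))
```

Exact fractions are used instead of floats so that the comparisons are not affected by rounding.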
A similar case was seen in the lawsuit against UC Berkeley, where the admissions data showed that applications from men had a higher probability of being accepted than applications from women. But when the individual departments were examined, the picture reversed: most of the departments favored women over men.
Why did this happen?
This behavior was seen because more women applied to competitive departments with low admission rates, whereas more men applied to less competitive departments with high acceptance rates.
We can see from the table that 825 men applied to department A, which has a high acceptance rate, compared with only 108 women, whereas more women applied to departments with low acceptance rates, such as E and F. This is what finally led to men being accepted by the university at a higher overall rate than women.
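The mechanism can be sketched in Python with two departments. The counts below are hypothetical numbers chosen for illustration and are NOT the Berkeley admissions data; the department names `easy` and `hard` and the helper functions are likewise assumptions.

```python
# Hypothetical counts, NOT the Berkeley data: (applicants, admitted).
applications = {
    "easy": {"men": (800, 480), "women": (200, 140)},  # high acceptance rate
    "hard": {"men": (200, 40), "women": (800, 200)},   # low acceptance rate
}


def rate(applicants, admitted):
    """Acceptance rate as a fraction between 0 and 1."""
    return admitted / applicants


def overall_rate(group):
    """Acceptance rate for one group after pooling all departments."""
    applicants = sum(dept[group][0] for dept in applications.values())
    admitted = sum(dept[group][1] for dept in applications.values())
    return rate(applicants, admitted)


# Per department, women have the higher acceptance rate.
for name, dept in applications.items():
    print(name, rate(*dept["men"]), rate(*dept["women"]))

# Pooled, men have the higher rate, because most men applied to "easy"
# while most women applied to "hard".
print("overall", overall_rate("men"), overall_rate("women"))
```

Each overall rate is an average of the department rates weighted by where that group applied, which is why the pooled comparison can point the opposite way from every department-level comparison.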
Suppose we have the configuration shown in the figure below, with beans of two colors, green and blue.
Probability of picking a green bean from each jar:
7/8 (Jar 1) < 2/2 (Jar 2)
1/2 (Jar 3) < 5/8 (Jar 4)
Probability of picking a green bean after combining the jars:
8/10 (Jar 1 + Jar 3) > 7/10 (Jar 2 + Jar 4)
Here also we can see that initially Jar 1 and Jar 3 had a lower probability of picking a green bean than Jar 2 and Jar 4 respectively, but after mixing the contents of the jars the relationship got reversed. After mixing, the combined contents of Jar 1 and Jar 3 had a higher probability of picking a green bean than the combined contents of Jar 2 and Jar 4. This is a very simple example of Simpson’s Paradox.
Improved By : ManasChhabra2