Two-Proportions Z-Test in R Programming

A two-proportion z-test compares the proportions observed in two independent groups to decide whether the underlying population proportions are the same.

Alongside the test, a confidence interval gives the range of values that is likely to include the difference between the population proportions.

The z-test is based on the standard normal distribution. For a two-tailed test at the 5% significance level, the critical value is 1.96.
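The critical value can be recovered from the standard normal quantile function. This is a small sketch using base R's qnorm(); the variable names are only illustrative.

```r
# Critical values of the standard normal distribution at alpha = 0.05
alpha <- 0.05

z_two_tailed <- qnorm(1 - alpha / 2)  # about 1.96
z_one_tailed <- qnorm(1 - alpha)      # about 1.645

print(round(c(two_tailed = z_two_tailed, one_tailed = z_one_tailed), 3))
```

A two-tailed test splits alpha between both tails, which is why it uses the 97.5% quantile rather than the 95% quantile.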

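In R, the two-proportions z-test is performed with prop.test(). Its output reports a chi-squared statistic (X-squared), which for two groups is the square of the z statistic when no continuity correction is applied.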

Syntax:

prop.test(x, n, p = NULL, alternative = c("two.sided", "less", "greater"),
          correct = TRUE)

Parameters:

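x = a vector of counts of successes, or a two-column matrix giving the counts of successes and failures.

n = a vector of counts of trials, i.e. the size of each group. It is ignored if x is a matrix.

p = a vector of probabilities of success under the null hypothesis. Each value must be in the range of 0 to 1.

alternative = a character string specifying the alternative hypothesis: "two.sided" (the default), "less" or "greater".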

correct = a logical indicating whether Yates’ continuity correction should be applied where possible.
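The two ways of supplying x can be compared directly. The sketch below uses made-up counts (45 of 100 and 30 of 100, not taken from this article) and assumes the documented behaviour that a two-column matrix of successes and failures replaces the x and n pair.

```r
# Illustrative counts, invented for this sketch
successes <- c(45, 30)
trials    <- c(100, 100)

# Form 1: success counts plus the number of trials in each group
res_vectors <- prop.test(x = successes, n = trials)

# Form 2: two-column matrix of successes and failures; n is omitted
counts <- cbind(success = successes, failure = trials - successes)
res_matrix <- prop.test(counts)

# Both calls describe the same data, so the p-values should agree
print(all.equal(res_vectors$p.value, res_matrix$p.value))
```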

There are two types of hypotheses:

The null hypothesis H0 for the test is that the two proportions are the same:

H_{0}: p_{A}=p_{B}

The alternative hypothesis Ha is that the proportions are not the same. Depending on the question being asked, it takes one of the following forms:

H_{a}: p_{A} \neq p_{B} \text { (different) }

H_{a}: p_{A}>p_{B} \text { (greater) }

H_{a}: p_{A}<p_{B} \text { (less) }

The two-proportions z-test is used to compare two observed proportions. For example, let there be two groups of individuals:

Group A with lung cancer: n = 500
Group B, healthy individuals: n = 500

The number of smokers in each group is as follows:

Group A with lung cancer: n = 500, 490 smokers, p_A = 490/500 = 0.98 (98%)
Group B, healthy individuals: n = 500, 400 smokers, p_B = 400/500 = 0.80 (80%)

In this setting:

The overall proportion of smokers is p = (490 + 400) / (500 + 500) = 0.89 (89%)
The overall proportion of non-smokers is q = 1 - p = 0.11 (11%)
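The same arithmetic can be reproduced in R. This is a minimal sketch; the variable names (smokers, group_size, p_pooled, q_pooled) are chosen here for illustration.

```r
# Counts from the lung cancer example
smokers    <- c(A = 490, B = 400)
group_size <- c(A = 500, B = 500)

# Observed proportion of smokers in each group
p_hat <- smokers / group_size

# Overall (pooled) proportions of smokers and non-smokers
p_pooled <- sum(smokers) / sum(group_size)
q_pooled <- 1 - p_pooled

print(p_hat)
print(c(p = p_pooled, q = q_pooled))
```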

We want to know whether the proportion of smokers is the same in the two groups of individuals.

The Formula for Two-Proportion Z-Test

The test statistic (also known as the z-statistic) can be calculated as follows:

z=\frac{p_{A}-p_{B}}{\sqrt{p q\left(\frac{1}{n_{A}}+\frac{1}{n_{B}}\right)}}

where:

p_A: the proportion observed in group A with size n_A
p_B: the proportion observed in group B with size n_B
p and q: the overall proportions
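As a rough check of the arithmetic, the statistic can be computed by hand for the lung cancer example. The sketch assumes the usual pooled form, the difference p_A - p_B divided by sqrt(p q (1/n_A + 1/n_B)). The helper two_prop_z is a hypothetical name, not a base R function.

```r
# Hypothetical helper: pooled two-proportion z statistic
two_prop_z <- function(x_a, n_a, x_b, n_b) {
  p_a <- x_a / n_a                  # proportion observed in group A
  p_b <- x_b / n_b                  # proportion observed in group B
  p   <- (x_a + x_b) / (n_a + n_b)  # overall proportion of successes
  q   <- 1 - p                      # overall proportion of failures
  (p_a - p_b) / sqrt(p * q * (1 / n_a + 1 / n_b))
}

# Lung cancer example: 490 of 500 versus 400 of 500 smokers
z <- two_prop_z(490, 500, 400, 500)
p_value <- 2 * pnorm(-abs(z))       # two-sided p-value

# Without continuity correction, prop.test() reports X-squared = z^2
chisq <- prop.test(x = c(490, 400), n = c(500, 500), correct = FALSE)$statistic

print(z)
print(all.equal(z^2, unname(chisq)))
```

For these counts z comes out at roughly 9.1, far beyond the 1.96 critical value.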

Example 1

Let’s say we have two groups of students, A and B. Group A is an early morning class of 400 students, 342 of whom are female. Group B is a late class of 400 students, 290 of whom are female. Using a 5% alpha level, we want to know whether the proportion of females is the same in the two groups. We use prop.test() for this.

# Two-sided test: are the proportions of females in groups A and B equal?
prop.test(x = c(342, 290),
          n = c(400, 400))

Output:

       2-sample test for equality of proportions with continuity correction
data:  c(342, 290) out of c(400, 400)
X-squared = 19.598, df = 1, p-value = 9.559e-06
alternative hypothesis: two.sided
95 percent confidence interval:
0.07177443 0.18822557
sample estimates:
prop 1 prop 2  
0.855   0.725

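The output includes:

the p-value of the test
the alternative hypothesis
a 95% confidence interval for the difference between the two proportions
the sample estimates, i.e. the observed proportion of successes in each group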
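These pieces can also be read programmatically rather than from the printed output. This is a minimal sketch, assuming the standard components of the result object that prop.test() returns (p.value, alternative, conf.int, estimate).

```r
# Store the test result instead of only printing it
res <- prop.test(x = c(342, 290), n = c(400, 400))

res$p.value      # p-value of the test
res$alternative  # "two.sided"
res$conf.int     # 95% confidence interval for p_A - p_B
res$estimate     # sample proportions in each group

# Decision at the 5% level
alpha <- 0.05
reject_null <- res$p.value < alpha
print(reject_null)
```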

The p-value of the test, 9.558674e-06, is less than the significance level alpha = 0.05, so we reject the null hypothesis: the proportions of females in the two groups differ significantly.

To test whether the proportion of females in group A (p_A) is less than the proportion of females in group B (p_B), the command is:

# prop Test in R
prop.test(x = c(342, 290),
          n = c(400, 400),
          alternative = "less")

Output:

2-sample test for equality of proportions with continuity correction
data:  c(342, 290) out of c(400, 400)
X-squared = 19.598, df = 1, p-value = 1
alternative hypothesis: less
95 percent confidence interval:
 -1.0000000  0.1792664
sample estimates:
prop 1 prop 2 
 0.855  0.725

The p-value is 1, so there is no evidence that the proportion of females in group A is lower than in group B. To test whether the proportion of females in group A (p_A) is greater than the proportion of females in group B (p_B), the command is:

# prop Test in R
prop.test(x = c(342, 290),
          n = c(400, 400),
          alternative = "greater")

Output:

2-sample test for equality of proportions with continuity correction
data:  c(342, 290) out of c(400, 400)
X-squared = 19.598, df = 1, p-value = 4.779e-06
alternative hypothesis: greater
95 percent confidence interval:
 0.08073363 1.00000000
sample estimates:
prop 1 prop 2 
 0.855  0.725
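The three p-values in Example 1 are related: the one-sided p-value in the direction of the observed difference is half the two-sided one, and the two one-sided p-values add up to 1. This sketch checks that relationship numerically; the variable names are illustrative.

```r
x <- c(342, 290)
n <- c(400, 400)

p_two     <- prop.test(x, n)$p.value
p_greater <- prop.test(x, n, alternative = "greater")$p.value
p_less    <- prop.test(x, n, alternative = "less")$p.value

# The observed difference (0.855 - 0.725) is positive, so "greater"
# is the direction that receives half of the two-sided p-value
print(all.equal(p_greater, p_two / 2))
print(all.equal(p_greater + p_less, 1))
```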

Example 2

ABC company manufactures tablets. For quality control, two sets of tablets were tested. In the first group, 32 out of 700 were found to contain some sort of defect. In the second group, 30 out of 400 were found to contain some sort of defect. Is the difference between the two groups significant at a 5% alpha level? We again use prop.test().

# prop Test in R
prop.test(x = c(32, 30),
          n = c(700, 400))

Output:

       2-sample test for equality of proportions with continuity correction
data:  c(32, 30) out of c(700, 400)
X-squared = 3.5725, df = 1, p-value = 0.05874
alternative hypothesis: two.sided
95 percent confidence interval:
-0.061344109 0.002772681
sample estimates:
 prop 1      prop 2  
0.04571429  0.07500000
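Because this p-value sits close to 0.05, the correct argument matters here. The sketch below reruns the test with and without Yates' continuity correction. For these counts the uncorrected p-value is smaller (roughly 0.043), so the conclusion at the 5% level depends on that choice. It is only an illustration of the correct parameter, not a recommendation for either setting.

```r
x <- c(32, 30)
n <- c(700, 400)

# Default: Yates' continuity correction applied
p_yates <- prop.test(x, n, correct = TRUE)$p.value

# Same data without the continuity correction
p_plain <- prop.test(x, n, correct = FALSE)$p.value

print(c(with_correction = p_yates, without_correction = p_plain))
```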

As before, the output reports the p-value, the alternative hypothesis, a 95% confidence interval and the sample proportions.

The p-value of the test, 0.0587449, is greater than the significance level alpha = 0.05, so we fail to reject the null hypothesis: there is no significant difference between the two proportions.

To test whether the proportion of defects in group one is less than the proportion of defects in group two, the command is:

# One-sided test: is the defect proportion lower in group one?
prop.test(x = c(32, 30),
          n = c(700, 400),
          alternative = "less")

This call uses the same data as the two-sided test, so the test statistic (X-squared = 3.5725, df = 1) and the sample estimates are unchanged. The observed difference, 0.0457 - 0.075, is negative, which is the direction of this alternative, so the one-sided p-value is half the two-sided value of 0.05874, about 0.029. That is below 0.05, so there is evidence that the defect proportion in group one is lower than in group two.

To test whether the proportion of defects in group one is greater than the proportion of defects in group two, the command is:

# One-sided test: is the defect proportion higher in group one?
prop.test(x = c(32, 30),
          n = c(700, 400),
          alternative = "greater")

Here the observed difference points away from the alternative, so the one-sided p-value is 1 minus half the two-sided p-value of 0.05874, about 0.97. There is no evidence that the defect proportion in group one is higher than in group two.
