
Z-test

Statistics uses a range of tests to compare samples or groups and to draw conclusions about populations. These statistical tests examine how probable the observed data would be under particular assumptions or hypotheses. They provide a framework for weighing the evidence for or against a given hypothesis.

A statistical test starts with the formulation of a null hypothesis (H0) and an alternative hypothesis (Ha). The null hypothesis reflects the default assumption and usually states that there is no effect or no difference, whereas the alternative hypothesis proposes a particular effect or relationship.



The p-value is the probability of observing the data, or results more extreme, assuming the null hypothesis is true. Researchers compare the calculated p-value with a predetermined significance level, usually denoted α, to decide about the null hypothesis. If the p-value is smaller than α, the result is considered statistically significant, and the null hypothesis is rejected in favor of the alternative hypothesis.
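To make the definition concrete, the sketch below compares an exact one-tailed p-value with a Monte Carlo estimate obtained by simulating many Z-scores under the null hypothesis, where (as the Z-test described below assumes) the statistic follows a standard normal distribution. The observed value `z_observed = 1.8` is a hypothetical number chosen only for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical observed test statistic, used only for illustration
z_observed = 1.8
alpha = 0.05

# Exact one-tailed p-value: P(Z >= z_observed) when Z ~ N(0, 1) under H0
p_exact = stats.norm.sf(z_observed)

# Monte Carlo estimate: simulate many Z-scores under H0 and count how
# often they are at least as extreme as the observed one
rng = np.random.default_rng(seed=0)
simulated_z = rng.standard_normal(1_000_000)
p_simulated = np.mean(simulated_z >= z_observed)

print(f"exact p-value:     {p_exact:.4f}")
print(f"simulated p-value: {p_simulated:.4f}")
print("Reject H0" if p_exact < alpha else "Fail to reject H0")
```

The simulated fraction approaches the exact tail probability as the number of draws grows, which is exactly what "the probability of results at least this extreme under H0" means.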

A p-value can be calculated with a variety of statistical tests, including the Z-test, t-test, chi-squared test, ANOVA, and F-test, among others. In this article, we will focus on the Z-test.



What is Z-Test?

A Z-test is a statistical test used to determine whether the mean of a sample differs significantly from a known population mean when the population standard deviation is known. It is particularly useful when the sample size is large (n ≥ 30).

More generally, a Z-test is any hypothesis test whose test statistic can be approximated by a normal distribution under the null hypothesis. It can also be used to determine whether two sample means are approximately the same or different when their variances are known and the sample sizes are large (n ≥ 30).

The Z-test measures the difference between the sample mean and the population mean in units of the standard deviation of the sampling distribution (the standard error). The resulting Z-score is the number of standard errors by which the sample mean deviates from the population mean. This Z-score, also known as the Z-statistic, is given by:

$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

where,

- $\bar{x}$ is the sample mean,
- $\mu$ is the population mean,
- $\sigma$ is the population standard deviation,
- $n$ is the sample size.

The Z-test assumes that, under the null hypothesis, the test statistic (Z-score) follows a standard normal distribution.
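The one-sample formula $Z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$ translates directly into a small helper. The function name `one_sample_z` and the input values in the usage line are hypothetical; the code only evaluates that formula.

```python
import math

def one_sample_z(sample_mean, population_mean, population_std, n):
    """Z-statistic for a one-sample Z-test with a known population std."""
    # Standard error: standard deviation of the sampling distribution of the mean
    standard_error = population_std / math.sqrt(n)
    return (sample_mean - population_mean) / standard_error

# Hypothetical data: sample mean 103, population mean 100, sigma 12, n = 64
z = one_sample_z(103, 100, 12, 64)
print(z)  # 2.0
```

Here the standard error is 12 / 8 = 1.5, so a difference of 3 corresponds to 2 standard errors.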

Example:

The average annual family income in India is 200k, with a standard deviation of 5k, and the average annual family income in Delhi is 300k.

Then the Z-score for Delhi is:

$$Z = \frac{300 - 200}{5} = 20$$

This indicates that the average annual family income in Delhi is 20 standard deviations above the population (India) mean.
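The arithmetic of the Delhi example can be checked in a few lines. Note that this example divides by σ itself rather than by σ/√n, so it measures how far a single value lies from the population mean. All figures are in thousands (k), as in the text.

```python
# Figures from the example, in thousands (k)
population_mean = 200   # average annual family income in India
population_std = 5      # standard deviation of family income in India
delhi_mean = 300        # average annual family income in Delhi

# Z-score of a single value: distance from the mean in standard deviations
z = (delhi_mean - population_mean) / population_std
print(z)  # 20.0
```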

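When to Use a Z-test:

- The population standard deviation is known.
- The sample size is large (n ≥ 30); for smaller samples, a t-test is usually more appropriate.
- The samples are drawn at random and are independent of each other.
- The data are normally distributed, or the sample is large enough for the sampling distribution of the mean to be approximately normal (central limit theorem).
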
Hypothesis Testing

A hypothesis is an educated guess or claim about a particular property of a population. Hypothesis testing is a way to use sample data to decide whether that claim is supported.

Level of significance: The level of significance, denoted by alpha (α), is the threshold used to decide whether to reject the null hypothesis. Because no experiment can accept or reject a hypothesis with 100% certainty, we fix α in advance; it is the probability of rejecting the null hypothesis when it is actually true (a Type I error). Common choices are 0.05 and 0.01.
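Once α is fixed, the critical Z-values follow from the inverse CDF of the standard normal distribution (`scipy.stats.norm.ppf`, the same function used in the code examples below). This sketch tabulates them for a few common choices of α; the specific α values are only examples.

```python
from scipy import stats

critical_values = {}
for alpha in (0.10, 0.05, 0.01):
    # Right-tailed test: reject H0 if Z > z_(1 - alpha)
    one_tailed = stats.norm.ppf(1 - alpha)
    # Two-tailed test: reject H0 if |Z| > z_(1 - alpha/2)
    two_tailed = stats.norm.ppf(1 - alpha / 2)
    critical_values[alpha] = (one_tailed, two_tailed)
    print(f"alpha = {alpha:.2f}: one-tailed {one_tailed:.3f}, two-tailed ±{two_tailed:.3f}")
```

For α = 0.05 this gives about 1.645 (one-tailed) and 1.960 (two-tailed). For a left-tailed test, the critical value is the negative of the one-tailed value.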

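Steps to perform a Z-test:

1. State the null hypothesis (H0) and the alternative hypothesis (Ha).
2. Choose the level of significance (α).
3. Calculate the Z-score (test statistic).
4. Find the critical Z-value for α in the Z-table (or compute the p-value).
5. Compare the Z-score with the critical value (or the p-value with α) and decide whether to reject the null hypothesis.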
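The steps of a Z-test (state the hypotheses, choose α, compute Z, find the critical value, compare) can be sketched as a single function for a two-tailed one-sample test using the critical-value approach. The function name `z_test_steps`, its return format, and the sample numbers are hypothetical, chosen only to show how the steps fit together.

```python
import math
from scipy import stats

def z_test_steps(sample_mean, population_mean, population_std, n, alpha=0.05):
    """Two-tailed one-sample Z-test using the critical-value approach."""
    # Step 1: H0: mu = population_mean, Ha: mu != population_mean
    # Step 2: the significance level is the `alpha` argument
    # Step 3: Z-score of the sample mean
    z = (sample_mean - population_mean) / (population_std / math.sqrt(n))
    # Step 4: two-tailed critical value from the standard normal distribution
    z_critical = stats.norm.ppf(1 - alpha / 2)
    # Step 5: compare |Z| with the critical value
    reject = bool(abs(z) > z_critical)
    return z, z_critical, reject

# Hypothetical data: sample mean 98, population mean 100, sigma 8, n = 16
z, z_critical, reject = z_test_steps(98, 100, 8, 16)
print(f"Z = {z:.3f}, critical = ±{z_critical:.3f}, reject H0: {reject}")
```

With these numbers the standard error is 8 / 4 = 2, so Z = -1, which lies inside ±1.96, and the null hypothesis is not rejected.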

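Types of Z-test:

- One-sample Z-test: compares the mean of a single sample with a known population mean.
- Two-sample Z-test: compares the means of two independent samples.

Either test can be one-tailed (left-tailed or right-tailed) or two-tailed, depending on whether the alternative hypothesis specifies the direction of the difference.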
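The choice between a left-tailed, right-tailed, or two-tailed test changes only how the p-value is computed from the Z-score. The helper below is a sketch; its name and the `alternative` labels ("greater", "less", "two-sided") are our own choice for illustration.

```python
from scipy import stats

def z_test_p_value(z, alternative="two-sided"):
    """p-value of a Z-score under a standard normal null distribution."""
    if alternative == "greater":          # right-tailed: Ha says the mean is larger
        return stats.norm.sf(z)           # P(Z >= z)
    if alternative == "less":             # left-tailed: Ha says the mean is smaller
        return stats.norm.cdf(z)          # P(Z <= z)
    if alternative == "two-sided":        # two-tailed: Ha says the mean differs
        return 2 * stats.norm.sf(abs(z))  # P(|Z| >= |z|)
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

for alt in ("greater", "less", "two-sided"):
    print(alt, round(z_test_p_value(1.96, alt), 4))
```

For Z = 1.96 this gives about 0.025 (right-tailed), 0.975 (left-tailed), and 0.05 (two-tailed).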

Below is an example of performing a one-sample Z-test:

Example: One-Tailed Test

A school claims that its students are more intelligent than average. The average IQ score of a sample of 50 students is 110. The population mean IQ is 100 and the population standard deviation is 15. At a 5% significance level, determine whether the school's claim is supported. Here H0: μ = 100 and Ha: μ > 100, so this is a right-tailed test.

Code Implementation

# Import the necessary libraries
import numpy as np
import scipy.stats as stats

# Given information
sample_mean = 110
population_mean = 100
population_std = 15
sample_size = 50
alpha = 0.05

# Compute the z-score: (sample mean - population mean) / standard error
z_score = (sample_mean - population_mean) / (population_std / np.sqrt(sample_size))
print('Z-Score :', z_score)

# Approach 1: Using the critical Z-score

# Critical Z-score for a right-tailed test at significance level alpha
z_critical = stats.norm.ppf(1 - alpha)
print('Critical Z-Score :', z_critical)

# Decision
if z_score > z_critical:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")

# Approach 2: Using the p-value

# P-value: probability of getting a Z-score at least this large under H0
p_value = 1 - stats.norm.cdf(z_score)
print('p-value :', p_value)

# Decision
if p_value < alpha:
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")

Output:

Z-Score : 4.714045207910317
Critical Z-Score : 1.6448536269514722
Reject Null Hypothesis
p-value : 1.2142337364462463e-06
Reject Null Hypothesis

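Two-sample Z-test:

In this test, we have two independent, normally distributed populations and draw a random sample from each. Let $\mu_1$ and $\mu_2$ be the population means, and $\bar{x}_1$ and $\bar{x}_2$ the observed sample means. The null hypothesis can be stated as:

$$H_0: \mu_1 - \mu_2 = 0$$

and the alternative hypothesis as:

$$H_a: \mu_1 - \mu_2 \neq 0$$

The Z-score is calculated as:

$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

where $\sigma_1$ and $\sigma_2$ are the standard deviations and $n_1$ and $n_2$ are the sample sizes of the populations with means $\mu_1$ and $\mu_2$, respectively. When the samples are large, the sample standard deviations can be used in place of $\sigma_1$ and $\sigma_2$, as in the example below.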
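The two-sample formula can be sketched as a reusable function. The name `two_sample_z`, the default hypothesized difference `D = 0`, and the numbers in the usage line are illustrative assumptions; `sd1` and `sd2` may be population standard deviations or, for large samples, sample standard deviations.

```python
import math

def two_sample_z(x1, x2, sd1, sd2, n1, n2, D=0):
    """Z-score for the difference of two independent sample means."""
    # Standard error of (x1 - x2): sqrt(sd1^2/n1 + sd2^2/n2)
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    # Observed difference minus the difference hypothesized under H0
    return ((x1 - x2) - D) / standard_error

# Hypothetical data chosen so the standard error is exactly 1
z = two_sample_z(x1=52, x2=50, sd1=6, sd2=8, n1=72, n2=128)
print(z)  # 2.0
```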

Example:

Two groups of students, Group A and Group B, are preparing for a competition. Group A attended offline classes, while Group B attended online classes. After the examination, each student's score is recorded. We want to determine whether there is a difference in performance between the online and offline classes.

Group A: Sample size = 50, Sample mean = 75, Sample standard deviation = 10
Group B: Sample size = 60, Sample mean = 80, Sample standard deviation = 12

Assuming a 5% significance level, perform a two-sample z-test to determine if there is a significant difference between the online and offline classes.

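Solution:

Step 1: Null & Alternative Hypotheses

$$H_0: \mu_1 - \mu_2 = 0, \qquad H_a: \mu_1 - \mu_2 \neq 0$$

Step 2: Significance Level

$$\alpha = 0.05$$

Step 3: Z-Score

$$Z = \frac{(75 - 80) - 0}{\sqrt{\dfrac{10^2}{50} + \dfrac{12^2}{60}}} = \frac{-5}{\sqrt{2 + 2.4}} = \frac{-5}{\sqrt{4.4}} \approx -2.384$$

Step 4: Find the critical Z-score in the Z-table for α/2 = 0.025

$$z_{\alpha/2} = z_{0.025} \approx 1.96$$

Step 5: Compare with the absolute Z-score

Since |Z| ≈ 2.384 > 1.96, we reject the null hypothesis: there is a significant difference between the online and offline classes.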

Code Implementation

import numpy as np
import scipy.stats as stats

# Group A (Offline Classes)
n1 = 50
x1 = 75
s1 = 10

# Group B (Online Classes)
n2 = 60
x2 = 80
s2 = 12

# Null hypothesis: mu_1 - mu_2 = 0
# Hypothesized difference (under the null hypothesis)
D = 0

# Set the significance level
alpha = 0.05

# Calculate the test statistic (z-score); with large samples the sample
# standard deviations s1 and s2 stand in for the population values
z_score = ((x1 - x2) - D) / np.sqrt((s1**2 / n1) + (s2**2 / n2))
print('Z-Score:', np.abs(z_score))

# Approach 1: Using the critical value (two-tailed test)
z_critical = stats.norm.ppf(1 - alpha / 2)
print('Critical Z-Score:', z_critical)

# Compare the test statistic with the critical value
if np.abs(z_score) > z_critical:
    print("""Reject the null hypothesis.
There is a significant difference between the online and offline classes.""")
else:
    print("""Fail to reject the null hypothesis.
There is not enough evidence to suggest a significant difference between the online and offline classes.""")

# Approach 2: Using the p-value

# Two-tailed p-value: probability of a |Z| at least this large under H0
p_value = 2 * (1 - stats.norm.cdf(np.abs(z_score)))
print('P-Value :', p_value)

# Compare the p-value with the significance level
if p_value < alpha:
    print("""Reject the null hypothesis.
There is a significant difference between the online and offline classes.""")
else:
    print("""Fail to reject the null hypothesis.
There is not enough evidence to suggest a significant difference between the online and offline classes.""")

Output:

Z-Score: 2.3836564731139807
Critical Z-Score: 1.959963984540054
Reject the null hypothesis.
There is a significant difference between the online and offline classes.
P-Value : 0.01714159544079563
Reject the null hypothesis.
There is a significant difference between the online and offline classes.

Type I and Type II Errors:

| | Null Hypothesis is TRUE | Null Hypothesis is FALSE |
|---|---|---|
| Reject Null Hypothesis | Type I Error (False Positive) | Correct decision |
| Fail to Reject the Null Hypothesis | Correct decision | Type II Error (False Negative) |
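A quick simulation shows what the two error rates mean for the right-tailed IQ test (μ0 = 100, σ = 15, n = 50, α = 0.05). When H0 is true, the rejection rate should be close to α (the Type I error rate). The alternative true mean of 105 is a hypothetical value used only to estimate a Type II error rate, which is compared with its normal-theory value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)
mu0, sigma, n, alpha = 100, 15, 50, 0.05   # H0: mu = 100 (right-tailed test)
z_critical = stats.norm.ppf(1 - alpha)
n_trials = 20_000                          # simulated experiments per scenario

def rejection_rate(true_mean):
    """Fraction of simulated experiments in which H0 is rejected."""
    samples = rng.normal(true_mean, sigma, size=(n_trials, n))
    z = (samples.mean(axis=1) - mu0) / (sigma / np.sqrt(n))
    return np.mean(z > z_critical)

# H0 true (mu = 100): every rejection is a Type I error
type_1_rate = rejection_rate(true_mean=100)

# H0 false (hypothetical true mean 105): every failure to reject is a Type II error
true_mean_alt = 105
type_2_rate = 1 - rejection_rate(true_mean=true_mean_alt)

# Normal-theory Type II error rate for comparison: P(Z <= z_critical - shift)
shift = (true_mean_alt - mu0) / (sigma / np.sqrt(n))
type_2_theory = stats.norm.cdf(z_critical - shift)

print(f"Type I error rate:  {type_1_rate:.3f} (alpha = {alpha})")
print(f"Type II error rate: {type_2_rate:.3f} (theory: {type_2_theory:.3f})")
```

Lowering α reduces the Type I error rate but, for a fixed sample size, raises the Type II error rate.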

