Let us see how to predict the air quality index (AQI) using Python. The AQI is calculated from the concentrations of chemical pollutants in the air, so a machine learning model trained on pollutant measurements can predict it.
AQI: The air quality index is an index for reporting air quality on a daily basis. In other words, it is a measure of how air pollution affects one’s health within a short time period. The AQI is calculated based on the average concentration of a particular pollutant measured over a standard time interval. Generally, the time interval is 24 hours for most pollutants, and 8 hours for carbon monoxide and ozone.
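The averaging step described above can be sketched with pandas rolling windows. This is only an illustration of the 24-hour and 8-hour averages: the hourly readings below are made up, the names `pm25_hourly` and `co_hourly` are hypothetical, and the conversion from an average concentration to an AQI value is not shown.

```python
import pandas as pd

# Hypothetical hourly concentrations for 48 hours (made-up values)
pm25_hourly = pd.Series([30.0 + (h % 24) for h in range(48)])
co_hourly = pd.Series([1.0 + 0.1 * (h % 8) for h in range(48)])

# 24-hour average for PM2.5 and 8-hour average for carbon monoxide.
# The first window-1 entries are NaN because a full window is not yet available.
pm25_24h = pm25_hourly.rolling(window=24).mean()
co_8h = co_hourly.rolling(window=8).mean()

print(pm25_24h.iloc[-1])
print(co_8h.iloc[-1])
```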
The AQI tells us how polluted the air is:
AQI Level | AQI Range
Good | 0 – 50
Moderate | 51 – 100
Unhealthy | 101 – 150
Unhealthy for Strong People | 151 – 200
Hazardous | 201+
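The ranges in the table above can be turned into a small lookup. The function name `aqi_level` is hypothetical, and sending fractional values that fall between two ranges (such as 100.5) to the higher range is an assumption made here, since a regression model returns floating-point predictions.

```python
def aqi_level(aqi):
    """Return the level name for an AQI value, using the ranges in the table."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    if aqi <= 50:
        return "Good"
    if aqi <= 100:
        return "Moderate"
    if aqi <= 150:
        return "Unhealthy"
    if aqi <= 200:
        return "Unhealthy for Strong People"
    return "Hazardous"


for value in (42, 95, 123, 250):
    print(value, aqi_level(value))
```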
Let’s predict the AQI from chemical pollutant measurements using machine learning.
Data Set Description
The dataset contains 8 attributes: 7 are chemical pollutant quantities and one is the air quality index. The pollutant columns, including PM2.5-AVG, PM10-AVG, NO2-AVG, NH3-AVG, SO2-AG, and OZONE-AVG, are the independent attributes. air_quality_index is the dependent attribute, since it is calculated from the 7 pollutant attributes. You can download the dataset here.
The data is numeric and has no missing values, so no preprocessing is required. Our goal is to predict the AQI, so the task is either classification or regression. Because the target, air_quality_index, is continuous, a regression technique is required.
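The two checks mentioned above (all columns numeric, no missing values) can be verified rather than assumed. The sketch below uses a small made-up DataFrame in place of the real file, so the values and the reduced column set are illustrative only.

```python
import pandas as pd

# Small made-up frame standing in for the real dataset (illustrative values)
df = pd.DataFrame({
    "PM2.5-AVG": [60.0, 85.0, 120.0],
    "NO2-AVG": [20.0, 35.0, 50.0],
    "air_quality_index": [75.0, 110.0, 160.0],
})

# Count missing values per column; all zeros means nothing needs imputing
missing_per_column = df.isnull().sum()

# True only if every column has a numeric dtype
all_numeric = all(pd.api.types.is_numeric_dtype(t) for t in df.dtypes)

print(missing_per_column)
print(all_numeric)
```

Running the same two checks on the DataFrame loaded from the CSV would confirm whether preprocessing can really be skipped.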
Regression is a supervised learning technique that learns to predict a continuous value from input features. Examples of regression techniques available in Python:
- Random Forest Regressor
- Ada Boost Regressor
- Bagging Regressor
- Linear Regression etc.
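All four regressors listed above are available in scikit-learn and share the same fit/predict interface. The sketch below fits each one on synthetic data: the feature matrix, the weights, and the noise level are made up for illustration and have nothing to do with the AQI dataset.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, BaggingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data: 200 samples, 7 features, noisy linear target
rng = np.random.default_rng(0)
X = rng.uniform(0, 200, size=(200, 7))
weights = np.array([0.5, 0.3, 0.1, 0.05, 0.02, 0.02, 0.01])
y = X @ weights + rng.normal(0, 2, size=200)

models = {
    "RandomForestRegressor": RandomForestRegressor(random_state=0),
    "AdaBoostRegressor": AdaBoostRegressor(random_state=0),
    "BaggingRegressor": BaggingRegressor(random_state=0),
    "LinearRegression": LinearRegression(),
}

# Fit every model the same way and predict for the first sample
predictions = {}
for name, model in models.items():
    model.fit(X, y)
    predictions[name] = model.predict(X[:1])[0]
    print(name, predictions[name])
```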
Python3
import pandas as pd
# Load the dataset; AQI.csv must be in the working directory
train = pd.read_csv('AQI.csv')
# Preview the first five rows
train.head()
Python3
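from sklearn.ensemble import AdaBoostRegressor
from sklearn.ensemble import RandomForestRegressor

# Separate the pollutant features from the target column
train1 = train.drop(['air_quality_index'], axis=1)
target = train['air_quality_index']

# One new sample with the 7 pollutant values, in the same column order as train1
sample = pd.DataFrame([[123, 45, 67, 34, 5, 0, 23]], columns=train1.columns)

# Random forest: fit, then print the R^2 score on the training data as a percentage
m1 = RandomForestRegressor()
m1.fit(train1, target)
print(m1.score(train1, target) * 100)
print(m1.predict(sample))

# AdaBoost: same steps
m2 = AdaBoostRegressor()
m2.fit(train1, target)
print(m2.score(train1, target) * 100)
print(m2.predict(sample))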
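For the given test sample, the two models predicted AQI values of 123 and 95. A value of 123 falls in the 101 – 150 range, which is Unhealthy, while 95 falls in the 51 – 100 range, which is Moderate. Because both models involve randomness, the exact predictions can differ between runs.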
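The scores in the code above are computed on the same rows the models were trained on, which typically overstates how well a model will do on new data. A common alternative is to hold out part of the data for evaluation. The sketch below shows this with train_test_split on synthetic data; the data, the 25% test size, and the fixed random_state are assumptions for illustration, not part of the original workflow.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 300 samples, 7 features, noisy linear target
rng = np.random.default_rng(0)
X = rng.uniform(0, 200, size=(300, 7))
weights = np.array([0.5, 0.3, 0.1, 0.05, 0.02, 0.02, 0.01])
y = X @ weights + rng.normal(0, 5, size=300)

# Keep 25% of the rows aside; the model never sees them during fitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestRegressor(random_state=0)
model.fit(X_train, y_train)

# R^2 on the training rows versus R^2 on the held-out rows
train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)
print("train R^2:", train_r2)
print("test R^2:", test_r2)
```

With the real dataset, train1 and target would take the place of X and y.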
Last Updated: 28 Aug, 2023