How to Calculate the Average for Every Column in a CSV File
Last Updated: 21 Mar, 2024
Given a CSV file, the task is to find the average of each column using Python. This article walks through several ways to calculate the average for every column in a CSV file.
Example:
Input:
data.csv
Age,Salary
30,50000
25,60000
28,55000
Output: Average Age: 27.67, Average Salary: 55000.00
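Each average is the sum of the column's values divided by the number of rows. As a quick sanity check of the sample output, the arithmetic can be sketched directly in Python. The list names here are illustrative only, and the values are copied by hand from the sample data:

```python
# Values copied from the sample data.csv shown above
ages = [30, 25, 28]
salaries = [50000, 60000, 55000]

# Average = sum of the column / number of rows
average_age = sum(ages) / len(ages)
average_salary = sum(salaries) / len(salaries)

print(f"Average Age: {average_age:.2f}")        # Average Age: 27.67
print(f"Average Salary: {average_salary:.2f}")  # Average Salary: 55000.00
```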
Calculate the Average for Every Column of a CSV File in Python
Below are some of the ways to calculate the average for every column of a CSV file in Python:
- Using the csv Module and Manual Calculation
- Using Pandas Library
- Using NumPy Library
Using the csv Module and Manual Calculation
In this example, the Python program reads a CSV file ('data.csv'), keeps a running sum and count of the numeric values in each column, and then computes and prints the average for each column. Non-numeric values are skipped, and a column with no numeric values reports an average of 0. The results are displayed with two decimal places.
Python3
import csv

with open('data.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    headers = next(reader)  # the first row holds the column names

    # Running total and count of numeric values for each column
    sums = [0] * len(headers)
    counts = [0] * len(headers)

    for row in reader:
        for i, value in enumerate(row):
            try:
                num = float(value)
                sums[i] += num
                counts[i] += 1
            except ValueError:
                # Skip cells that are not numeric
                pass

# Print each average; a column with no numeric values reports 0
for i, header in enumerate(headers):
    average = sums[i] / counts[i] if counts[i] != 0 else 0
    print(f"Average {header}: {average:.2f}")
Output:
Average Age: 27.67
Average Salary: 55000.00
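The same manual approach can also be wrapped in a reusable function. The following is a minimal sketch, not part of the original example: the function name column_averages is hypothetical, csv.DictReader is swapped in for csv.reader so that values are keyed by column name, and an in-memory io.StringIO sample with one non-numeric cell stands in for a file on disk.

```python
import csv
import io


def column_averages(csvfile):
    """Return {column name: average} over the numeric cells of each column."""
    reader = csv.DictReader(csvfile)
    sums = {}
    counts = {}
    for row in reader:
        for name, value in row.items():
            try:
                num = float(value)
            except (TypeError, ValueError):
                continue  # skip blank, missing, or non-numeric cells
            sums[name] = sums.get(name, 0.0) + num
            counts[name] = counts.get(name, 0) + 1
    # Columns with no numeric cells are omitted rather than reported as 0
    return {name: sums[name] / counts[name] for name in sums}


# Sample data with one non-numeric cell, standing in for a file on disk
sample = io.StringIO("Age,Salary\n30,50000\n25,60000\n28,N/A\n")
averages = column_averages(sample)
for name, average in averages.items():
    print(f"Average {name}: {average:.2f}")
```

Unlike the script above, this sketch leaves out columns that contain no numeric values instead of printing 0 for them, which avoids presenting a made-up average. To use it with a real file, pass the object returned by open('data.csv', newline='').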
Using Pandas Library
In this example, the Python script uses the Pandas library to read a CSV file ('data.csv') into a DataFrame. It then calculates the average of each column with the mean() function and prints the results, which makes for a concise way to compute column averages in a CSV dataset.
Python3
import pandas as pd

# Load the CSV file into a DataFrame
df = pd.read_csv('data.csv')

# mean() returns the average of each column as a Series
column_averages = df.mean()

print("Average for each column:")
print(column_averages)
Output:
Average for each column:
Age          27.666667
Salary    55000.000000
dtype: float64
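Real CSV files often mix numeric and text columns. The sketch below shows one way to handle that with Pandas, under assumptions that go beyond the original example: the Name column and its values are invented for illustration, and the data is read from an in-memory io.StringIO object rather than from data.csv. Passing numeric_only=True to mean() restricts the calculation to numeric columns, and round(2) trims the result to two decimal places.

```python
import io

import pandas as pd

# In-memory stand-in for a CSV file with an extra text column (hypothetical data)
sample = io.StringIO(
    "Name,Age,Salary\n"
    "Asha,30,50000\n"
    "Ben,25,60000\n"
    "Cara,28,55000\n"
)
df = pd.read_csv(sample)

# numeric_only=True limits the mean to numeric columns, so Name is ignored
column_averages = df.mean(numeric_only=True).round(2)

print("Average for each column:")
print(column_averages)
```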
Using NumPy Library
In this example, the Python script reads a CSV file ('data.csv') with the csv module, skips the header row, and converts the remaining rows into a NumPy array of integers. It then calculates the average of specific columns (Age and Salary) using np.mean() and displays the results with two decimal places. This approach suits CSV datasets that contain only numerical data.
Python3
import csv

import numpy as np

with open('data.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    # Convert the remaining rows into a 2-D integer array
    data = np.array(list(reader), dtype=int)

# Column 0 is Age and column 1 is Salary
age_avg = np.mean(data[:, 0])
salary_avg = np.mean(data[:, 1])

print(f"Average Age: {age_avg:.2f}")
print(f"Average Salary: ${salary_avg:.2f}")
Output:
Average Age: 27.67
Average Salary: $55000.00
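NumPy can also parse the file itself and average all columns in one call. This is a minimal sketch of that alternative, not the method used above: it assumes every column is numeric, uses np.loadtxt with skiprows=1 to drop the header, and reads from an in-memory io.StringIO object that stands in for data.csv. The column names are typed in by hand because np.loadtxt discards the header row.

```python
import io

import numpy as np

# In-memory stand-in for data.csv
sample = io.StringIO("Age,Salary\n30,50000\n25,60000\n28,55000\n")

# skiprows=1 drops the header row; delimiter="," splits each line on commas
data = np.loadtxt(sample, delimiter=",", skiprows=1)

# axis=0 averages down the rows, giving one mean per column
averages = data.mean(axis=0)

for name, average in zip(["Age", "Salary"], averages):
    print(f"Average {name}: {average:.2f}")
```

Averaging along axis=0 avoids indexing each column separately, so the same code works unchanged if more numeric columns are added to the file.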