
How To Calculate Average For Every Column In A CSV File

Last Updated : 21 Mar, 2024

Given a CSV file, the task is to find the average of each column using Python. This article covers three approaches: the built-in csv module with a manual calculation, the Pandas library, and the NumPy library.

Example:

Input: 
data.csv
Age,Salary
30,50000
25,60000
28,55000

Output: Average Age: 27.67, Average Salary: 55000.00

Calculate the Average for Every Column of a CSV File in Python

Below are some of the ways to calculate the average for every column of a CSV file in Python:

  1. Using CSV module and Manual Calculation
  2. Using Pandas Library
  3. Using NumPy Library

data.csv

Age,Salary
30,50000
25,60000
28,55000
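
To follow along, the sample file has to exist on disk first. The snippet below is a minimal sketch that writes the rows shown above to data.csv in the current working directory; the variable name rows is only illustrative.

```python
import csv

# Sample rows copied from the data.csv listing above; the first row is the header
rows = [
    ["Age", "Salary"],
    [30, 50000],
    [25, 60000],
    [28, 55000],
]

# newline='' lets the csv module control line endings itself
with open("data.csv", "w", newline="") as csvfile:
    csv.writer(csvfile).writerows(rows)
```

Running it once creates the file that the examples in the following sections read.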

Using the csv Module and Manual Calculation

In this example, the Python program reads a CSV file (‘data.csv’), accumulates the sum and count of numeric values in each column, and then computes and prints each column's average to two decimal places. Values that cannot be converted to a float are skipped, and a column with no numeric values is reported with an average of 0.

Python3

# Python program for the above approach
import csv
 
# Open the CSV file for reading
with open('data.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    headers = next(reader)  # Read the header row
 
    # Initialize variables to store column sums and counts
    sums = [0] * len(headers)
    counts = [0] * len(headers)
 
    # Iterate through each row in the CSV file
    for row in reader:
        for i, value in enumerate(row):
            try:
                num = float(value)
                sums[i] += num
                counts[i] += 1
            except ValueError:
                pass  # Ignore non-numeric values
 
    # Calculate and print the average for each column
    for i, header in enumerate(headers):
        average = sums[i] / counts[i] if counts[i] != 0 else 0
        print(f"Average {header}: {average:.2f}")


Output:

Average Age: 27.67
Average Salary: 55000.00
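
The sample file contains only numbers, so the non-numeric handling never comes into play. As a rough sketch of how the same logic behaves on messier input, the version below wraps it in a hypothetical helper, column_averages, and feeds it made-up rows that include a text column and a blank cell. Unlike the program above, it returns None rather than 0 for a column with no numeric values, which is a design choice of this sketch and not part of the original approach.

```python
import csv
import io


def column_averages(csvfile):
    """Return {header: average or None} for an open CSV file object."""
    reader = csv.reader(csvfile)
    headers = next(reader)  # First row holds the column names
    sums = [0.0] * len(headers)
    counts = [0] * len(headers)

    for row in reader:
        # zip stops at the shorter sequence, so rows with extra cells
        # cannot index past the end of sums/counts
        for i, value in zip(range(len(headers)), row):
            try:
                sums[i] += float(value)
                counts[i] += 1
            except ValueError:
                pass  # Skip text and blank cells

    return {
        header: (sums[i] / counts[i] if counts[i] else None)
        for i, header in enumerate(headers)
    }


# Illustrative data only: a text column plus one missing Age value
sample = io.StringIO(
    "Name,Age,Salary\n"
    "Asha,30,50000\n"
    "Ravi,,60000\n"
    "Meera,28,55000\n"
)

averages = column_averages(sample)
for header, average in averages.items():
    text = "n/a" if average is None else f"{average:.2f}"
    print(f"Average {header}: {text}")
```

Here the Age average is taken over the two rows that have a value, and the Name column is reported as n/a instead of a misleading 0.00.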

Using Pandas Library

In this example, the Python script uses the Pandas library to read a CSV file (‘data.csv’) into a DataFrame and then calls mean() to compute the average of each column. The result is a Series indexed by column name, which is printed as-is.

Python3

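import pandas as pd
 
# Read the CSV file (replace 'data.csv' with your file path)
df = pd.read_csv('data.csv')
 
# Calculate column averages; numeric_only=True skips text columns,
# which would otherwise raise an error in recent pandas versions
column_averages = df.mean(numeric_only=True)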
 
# Display the results
print("Average for each column:")
print(column_averages)


Output:

Average for each column:
Age          27.666667
Salary    55000.000000
dtype: float64
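
Real files often mix text and numeric columns, and the default output shows six decimal places. The sketch below, which assumes pandas is installed and uses made-up rows read from an in-memory string, restricts mean() to numeric columns and rounds the result to two decimals.

```python
import io

import pandas as pd

# Illustrative data only: the Name column is text, the rest are numeric
csv_text = (
    "Name,Age,Salary\n"
    "Asha,30,50000\n"
    "Ravi,25,60000\n"
    "Meera,28,55000\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# numeric_only=True leaves the text column out of the calculation;
# round(2) trims the averages to two decimal places
averages = df.mean(numeric_only=True).round(2)
print(averages)
```

The resulting Series contains entries for Age and Salary only, so a single value can be looked up with averages["Age"].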

Using NumPy Library

In this example, the Python script reads the CSV file (‘data.csv’) with the csv module, skips the header row, and converts the remaining rows into a NumPy array of integers. It then calculates the average of the Age and Salary columns by index using np.mean() and displays the results with two decimal places. This approach assumes every cell holds a whole number; a decimal, blank, or text value makes the dtype=int conversion fail.

Python3

import numpy as np
import csv
 
# Read the CSV file (replace 'data.csv' with your file path)
with open('data.csv', 'r') as f:
    reader = csv.reader(f)
    next(reader)  # Skip the header row
    data = np.array(list(reader), dtype=int)
 
# Calculate column averages
age_avg = np.mean(data[:, 0])  # Column 0 (Age)
salary_avg = np.mean(data[:, 1])  # Column 1 (Salary)
 
# Display the results
print(f"Average Age: {age_avg:.2f}")
print(f"Average Salary: ${salary_avg:.2f}")


Output:

Average Age: 27.67
Average Salary: $55000.00
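
NumPy can also parse the file itself and average all columns in one call, which avoids hard-coding column indices. The following is a minimal sketch, assuming a purely numeric file with one header row; it reads the sample data from an in-memory string so that it runs on its own, and the same np.loadtxt call accepts a file path such as 'data.csv'.

```python
import io

import numpy as np

# Same content as data.csv, held in memory for a self-contained example
csv_text = "Age,Salary\n30,50000\n25,60000\n28,55000\n"

# skiprows=1 skips the header; values are parsed as floats by default
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

# axis=0 averages down the rows, giving one mean per column
column_means = data.mean(axis=0)

# Recover the header names to label the output
headers = csv_text.splitlines()[0].split(",")
for header, mean in zip(headers, column_means):
    print(f"Average {header}: {mean:.2f}")
```

Because the values are parsed as floats, this variant also accepts decimal entries, although it still fails on text or blank cells.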

