Open In App

Calculate Average for Each Row in a CSV file using Python

Last Updated : 19 Feb, 2024
Improve
Improve
Like Article
Like
Save
Share
Report

Working with CSV files is a common task in data analysis and manipulation. Finding the average of each row in a CSV file becomes crucial when dealing with datasets. In this guide, we’ll explore how to achieve this using Python, providing a step-by-step approach and practical examples.

dataset.csv

total_bill,tip,sex,smoker,day,time,size
16.99,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.5,Male,No,Sun,Dinner,3
23.68,3.31,Male,No,Sun,Dinner,2
24.59,3.61,Female,No,Sun,Dinner,4
25.29,4.71,Male,No,Sun,Dinner,4
8.77,2.0,Male,No,Sun,Dinner,2
26.88,3.12,Male,No,Sun,Dinner,4
15.04,1.96,Male,No,Sun,Dinner,2
14.78,3.23,Male,No,Sun,Dinner,2
10.27,1.71,Male,No,Sun,Dinner,2
35.26,5.0,Female,No,Sun,Dinner,4
15.42,1.57,Male,No,Sun,Dinner,2
18.43,3.0,Male,No,Sun,Dinner,4

Find the Average of Each Row in Python

Below are some of the ways by which we can find the average of each row in a CSV file in Python:

  1. Using the CSV module
  2. Using Pandas
  3. Using NumPy

Average of Each Row Using the CSV module

After converting the data into numerical values, we can find the average of each row. To calculate the average, we use the sum() function to add up all the numerical values in the row, and the len() function to determine the number of values. By dividing the sum by the length, we obtain the average for that particular row.

Python3




import csv
 
def convert_to_float(value):
    try:
        return float(value)
    except ValueError:
        return None
 
with open('dataset.csv', 'r') as file:
    # Create a CSV reader object
    csv_reader = csv.reader(file)
 
    # Skip the header if it exists
    next(csv_reader, None)
 
    for row in csv_reader:
        values = []
 
        # Iterate through each value in the row
        for value in row:
            converted_value = convert_to_float(value)
 
            # Check if the conversion was successful (not None)
            if converted_value is not None:
                values.append(converted_value)
 
        # Check if there are numeric values in the row
        if values:
            # Calculate the row average
            row_average = sum(values) / len(values)
 
            # Print the result
            print(f"Row: {row}, Average: {row_average}")


Output:

Row: ['16.99', '1.01', 'Female', 'No', 'Sun', 'Dinner', '2'], Average: 6.666666666666667
Row: ['10.34', '1.66', 'Male', 'No', 'Sun', 'Dinner', '3'], Average: 5.0
Row: ['21.01', '3.5', 'Male', 'No', 'Sun', 'Dinner', '3'], Average: 9.17
Row: ['23.68', '3.31', 'Male', 'No', 'Sun', 'Dinner', '2'], Average: 9.663333333333332
Row: ['24.59', '3.61', 'Female', 'No', 'Sun', 'Dinner', '4'], Average: 10.733333333333334
Row: ['25.29', '4.71', 'Male', 'No', 'Sun', 'Dinner', '4'], Average: 11.333333333333334
Row: ['8.77', '2.0', 'Male', 'No', 'Sun', 'Dinner', '2'], Average: 4.256666666666667
Row: ['26.88', '3.12', 'Male', 'No', 'Sun', 'Dinner', '4'], Average: 11.333333333333334
Row: ['15.04', '1.96', 'Male', 'No', 'Sun', 'Dinner', '2'], Average: 6.333333333333333
Row: ['14.78', '3.23', 'Male', 'No', 'Sun', 'Dinner', '2'], Average: 6.669999999999999
Row: ['10.27', '1.71', 'Male', 'No', 'Sun', 'Dinner', '2'], Average: 4.66
Row: ['35.26', '5.0', 'Female', 'No', 'Sun', 'Dinner', '4'], Average: 14.753333333333332
Row: ['15.42', '1.57', 'Male', 'No', 'Sun', 'Dinner', '2'], Average: 6.329999999999999
Row: ['18.43', '3.0', 'Male', 'No', 'Sun', 'Dinner', '4'], Average: 8.476666666666667

Average of Each Row Using Pandas

In this example, the Pandas library is used to read a CSV file (dataset.csv) and calculate the average of numeric values for each row in the specified columns (‘total_bill’, ‘tip’, ‘size’). The function average_of_each_row reads the CSV file, drops any NaN values from the specified columns, calculates the row-wise mean, and prints the average along with the corresponding row values.

Python3




import pandas as pd
 
def average_of_each_row(csv_file, numeric_columns=None):
    df = pd.read_csv(csv_file, usecols=numeric_columns)
    for index, row in df.iterrows():
        row = row.dropna()  # Drop any NaN values
        if not row.empty:
            average = row.mean()
            print("Average of row {} : {}".format(row.values, average))
 
csv_file = 'dataset.csv'
# Specify the columns that contain numeric data
numeric_columns = ['total_bill', 'tip', 'size']
average_of_each_row(csv_file, numeric_columns)


Output:

Average of row [16.99  1.01  2.  ] : 6.666666666666667
Average of row [10.34 1.66 3. ] : 5.0
Average of row [21.01 3.5 3. ] : 9.17
Average of row [23.68 3.31 2. ] : 9.663333333333332
Average of row [24.59 3.61 4. ] : 10.733333333333334
Average of row [25.29 4.71 4. ] : 11.333333333333334
Average of row [8.77 2. 2. ] : 4.256666666666667
Average of row [26.88 3.12 4. ] : 11.333333333333334
Average of row [15.04 1.96 2. ] : 6.333333333333333
Average of row [14.78 3.23 2. ] : 6.669999999999999
Average of row [10.27 1.71 2. ] : 4.66
Average of row [35.26 5. 4. ] : 14.753333333333332
Average of row [15.42 1.57 2. ] : 6.329999999999999
Average of row [18.43 3. 4. ] : 8.476666666666667

Average of Each Row Using NumPy

In this example, the NumPy library is used to read a CSV file (dataset.csv) and calculate the average of each row in the dataset. The function average_of_each_row employs np.genfromtxt to load the data, removes NaN values from each row, calculates the row-wise mean using np.mean, and prints the average along with the corresponding row index.

Python3




import numpy as np
 
def average_of_each_row(csv_file):
    data = np.genfromtxt(csv_file, delimiter=',')
    for i, row in enumerate(data):
        row_without_nan = row[~np.isnan(row)]  # Remove nan values from the row
        average = np.mean(row_without_nan)
        print("Average of row {}: {}".format(i + 1, average))
 
csv_file = 'dataset.csv'
average_of_each_row(csv_file)


Output:

Average of row 1: 6.666666666666667
Average of row 2: 5.0
Average of row 3: 9.17
Average of row 4: 9.663333333333332
Average of row 5: 10.733333333333334
Average of row 6: 11.333333333333334
Average of row 7: 4.256666666666667
Average of row 8: 11.333333333333334
Average of row 9: 6.333333333333333
Average of row 10: 6.669999999999999
Average of row 11: 4.66
Average of row 12: 14.753333333333332
Average of row 13: 6.329999999999999
Average of row 14: 8.476666666666667

Conclusion

To sum up, working with CSV files is important for analyzing data. This guide showed an easy way to find the average of each row in a CSV file using Python. We used a module called `csv` to handle the data efficiently. The key steps include turning text values into numbers, dealing with non-number data correctly, and going through the process step by step: opening the file, changing the data, and figuring out the averages using the different methods as discussed above.



Like Article
Suggest improvement
Share your thoughts in the comments

Similar Reads