Efficient Methods to Iterate Rows in Pandas DataFrame

Iterating over rows in a Pandas DataFrame means accessing each row one by one to perform operations or calculations. For example, if you have a DataFrame of employees' salaries and bonuses and want to calculate the total compensation for each employee, efficient row-wise operations are essential.

Let’s consider this DataFrame:

Python

import pandas as pd
import numpy as np

data = {'A': np.random.randint(1, 20, 10),
        'B': np.random.randint(10, 30, 10),
        'C': np.random.choice(['X', 'Y', 'Z'], 10)}
df = pd.DataFrame(data)
print(df)

Output

    A   B  C
0   2  21  X
1   7  21  X
2  14  27  X
3   2  29  X
4  16  21  Z
5  18  10  Y
6   7  28  Z
7  12  21  Z
8  15  11  X
9  13  24  Z

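The data is generated randomly without a fixed seed, so the values in each output below differ from the ones shown here. Now, let’s explore the most efficient methods one by one.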

Using itertuples()

itertuples() returns each row as a lightweight named tuple. Unlike iterrows(), it does not build a Series for every row, so column data types are preserved and per-row overhead is lower. It is a good choice for large datasets when you need structured row-wise access.

Example: In this example, we compute a new column Result based on column C. If C is 'X', we multiply A and B; otherwise, we add them.

Python

results = []
for row in df.itertuples(index=False):
    results.append(row.A * row.B if row.C == 'X' else row.A + row.B)

df['Result'] = results
print(df)

Output

    A   B  C  Result
0  11  12  X     132
1  10  24  Y      34
2  11  28  Y      39
3  17  22  Z      39
4   9  20  Z      29
5  13  15  Z      28
6   2  27  Y      29
7  10  18  Z      28
8   5  14  Y      19
9  17  25  X     425

Explanation:

df.itertuples(index=False) iterates over each row as a named tuple.
For rows where C is 'X', A * B is calculated; otherwise A + B.
Results are stored in a list and assigned to the new column Result.
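The named tuples can also be inspected directly, or replaced by plain tuples. The following is a minimal sketch, assuming a small hand-written DataFrame called demo (the name and values are illustrative, not part of the example above) so that the result is reproducible:

```python
import pandas as pd

# Small fixed DataFrame (illustrative values) so the result is reproducible.
demo = pd.DataFrame({'A': [2, 7, 14], 'B': [21, 21, 27], 'C': ['X', 'Y', 'X']})

# With the default index=True, the first field of each tuple is the index label.
first = next(demo.itertuples())
print(first._fields)

# name=None yields plain tuples, which can be unpacked directly in the loop.
results = [a * b if c == 'X' else a + b
           for a, b, c in demo.itertuples(index=False, name=None)]
demo['Result'] = results
print(demo)
```

Plain tuples skip the named-tuple construction, which can help a little on very large frames, at the cost of accessing values by position rather than by column name.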

Using apply()

apply() calls a custom function on each row (axis=1) or each column (axis=0). With axis=1, every row is passed to the function as a Series, which makes apply() flexible for complex logic that depends on multiple columns but typically slower than itertuples().

Example: In this example, a custom function calculates Result based on column C.

Python

def custom_func(row):
    return row['A'] * 2 if row['C'] == 'X' else row['B'] * 3

df['Result'] = df.apply(custom_func, axis=1)
print(df)

Output

    A   B  C  Result
0  16  23  X      32
1   2  26  X       4
2   5  24  Z      72
3  16  22  X      32
4  16  28  X      32
5   9  10  Z      30
6  17  16  Z      48
7  15  11  Y      33
8   3  27  Z      81
9   9  10  Y      30

Explanation:

custom_func is applied to each row of the DataFrame (axis=1).
If C is 'X', the function returns A * 2; otherwise, it returns B * 3.
The results are directly assigned to a new column Result.
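If the multipliers should not be hard-coded, apply() forwards extra keyword arguments to the function. The following is a minimal sketch, assuming a hypothetical helper named scaled and a small hand-written DataFrame called demo (both are illustrative names, not part of the example above):

```python
import pandas as pd

# Small fixed DataFrame (illustrative values) so the result is reproducible.
demo = pd.DataFrame({'A': [16, 2, 5], 'B': [23, 26, 24], 'C': ['X', 'X', 'Z']})

def scaled(row, x_factor, other_factor):
    # Hypothetical helper: same rule as custom_func, with configurable multipliers.
    return row['A'] * x_factor if row['C'] == 'X' else row['B'] * other_factor

# Keyword arguments that apply() does not recognise are passed on to scaled.
demo['Result'] = demo.apply(scaled, axis=1, x_factor=2, other_factor=3)
print(demo)
```

Passing the factors as arguments keeps the row function reusable without relying on global variables.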

Vectorization

Vectorized operations perform calculations on entire columns at once, without an explicit Python loop. They are typically the fastest option for large datasets and should be preferred whenever the logic can be expressed this way.

Example: In this example, Result is computed using np.where for conditional vectorized operations.

Python

df['Result'] = np.where(df['C'] == 'X', df['A'] * df['B'], df['A'] + df['B'])
print(df)

Output

    A   B  C  Result
0   5  28  X     140
1  18  29  X     522
2   6  17  X     102
3  11  19  Y      30
4  15  10  X     150
5   9  20  Y      29
6  14  17  X     238
7   8  16  X     128
8   2  22  Y      24
9  11  27  X     297

Explanation:

np.where applies a vectorized conditional operation across the entire DataFrame column.
For rows where C is 'X', A * B is calculated; otherwise, A + B.
The entire column is updated in a single operation without explicit iteration.
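To see how the three approaches compare on your own machine, they can be run on the same data, checked for identical results, and timed. The following is a rough sketch, assuming a fixed seed, a frame of 5,000 rows, and helper names (with_itertuples, with_apply, with_where) chosen only for illustration. Timings depend on hardware and library versions, so no figures are shown here.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed (an assumption) for reproducibility
n = 5_000
big = pd.DataFrame({'A': rng.integers(1, 20, n),
                    'B': rng.integers(10, 30, n),
                    'C': rng.choice(['X', 'Y', 'Z'], n)})

def with_itertuples(frame):
    # Explicit loop over named tuples.
    return [r.A * r.B if r.C == 'X' else r.A + r.B
            for r in frame.itertuples(index=False)]

def with_apply(frame):
    # Row-wise apply; each row arrives as a Series.
    return frame.apply(
        lambda r: r['A'] * r['B'] if r['C'] == 'X' else r['A'] + r['B'], axis=1)

def with_where(frame):
    # Vectorized conditional over whole columns.
    return np.where(frame['C'] == 'X',
                    frame['A'] * frame['B'], frame['A'] + frame['B'])

# All three approaches should produce the same values.
expected = list(with_where(big))
assert with_itertuples(big) == expected
assert with_apply(big).tolist() == expected

for fn in (with_itertuples, with_apply, with_where):
    seconds = timeit.timeit(lambda: fn(big), number=3)
    print(f"{fn.__name__}: {seconds:.4f} s for 3 runs")
```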
