Clean String Data in the given Pandas DataFrame

Last Updated : 30 Sep, 2025

The goal is to remove leading and trailing whitespace from each product name and to format it with the first letter uppercase and the rest lowercase (e.g., UMbreLla -> Umbrella).
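Python's built-in str.capitalize() produces this format. The quick check below uses plain strings: one sample value from the example, plus a hypothetical two-word name. It shows why stripping has to come before capitalizing, and how capitalize() differs from str.title().

```python
raw = ' UMbreLla'

# Capitalizing without stripping first: the leading space counts as the
# "first character", so no letter ends up uppercase
unstripped = raw.capitalize()
print(repr(unstripped))  # ' umbrella'

# Strip first, then capitalize: first letter upper, rest lower
cleaned = raw.strip().capitalize()
print(cleaned)  # Umbrella

# For a hypothetical multi-word name, capitalize() lowercases every later word,
# whereas title() uppercases the first letter of each word
multi = 'sun CREAM'
print(multi.capitalize())  # Sun cream
print(multi.title())       # Sun Cream
```

If product names can contain several words, title() may be closer to what is wanted. This article's single-word examples behave the same under either method once stripped.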

Consider a DataFrame whose Product names have stray spaces and inconsistent capitalization:

Python
import pandas as pd
df = pd.DataFrame({ 'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
                    'Product': [' UMbreLla', '  maTtress', 'BaDmintoN ', 'Shuttle'],
                    'Updated_Price': [1250, 1450, 1550, 400],
                    'Discount': [10, 8, 15, 10] })
print(df)

Output

        Date     Product  Updated_Price  Discount
0  10/2/2011    UMbreLla           1250        10
1  11/2/2011    maTtress           1450         8
2  12/2/2011  BaDmintoN            1550        15
3  13/2/2011     Shuttle            400        10
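In the printed DataFrame, the extra spaces blend into the column alignment. Listing the raw values makes them visible, because each value is shown in quotes. The sketch below rebuilds only the Product column from the example and counts how many names are padded.

```python
import pandas as pd

# Same Product values as in the DataFrame above
df = pd.DataFrame({'Product': [' UMbreLla', '  maTtress', 'BaDmintoN ', 'Shuttle']})

# tolist() returns plain Python strings; printing the list shows each value's
# repr in quotes, so leading/trailing spaces become visible
values = df['Product'].tolist()
print(values)  # [' UMbreLla', '  maTtress', 'BaDmintoN ', 'Shuttle']

# A value is padded if stripping it changes it
has_padding = df['Product'] != df['Product'].str.strip()
print(has_padding.sum())  # 3
```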

The following sections show three ways to clean the product names in the DataFrame.

Using vectorized string methods

Pandas provides vectorized string methods through the .str accessor. Each method applies a string operation to every element of a column in a single call, so a whole column can be cleaned with a short chain of method calls.

Python
df['Product'] = df['Product'].str.strip().str.capitalize()
print(df)

Output

        Date    Product  Updated_Price  Discount
0  10/2/2011   Umbrella           1250        10
1  11/2/2011   Mattress           1450         8
2  12/2/2011  Badminton           1550        15
3  13/2/2011    Shuttle            400        10

Explanation:

  • .str.strip(): removes leading and trailing whitespace from every value in the column.
  • .str.capitalize(): converts each string so that its first letter is uppercase and the rest are lowercase.
  • Assigning the result back to df['Product'] replaces the original column with the cleaned values.
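A practical advantage of the .str methods is that they pass missing values through instead of raising an error. Here is a minimal sketch with a hypothetical Series that contains one missing product name (None).

```python
import pandas as pd

# Hypothetical column with a missing entry in the middle
products = pd.Series([' UMbreLla', None, 'BaDmintoN '])

# .str methods skip missing values, so this chain does not raise
cleaned = products.str.strip().str.capitalize()
print(cleaned)
```

The missing entry stays missing in the result. Depending on the pandas version it may be displayed as None or NaN, and pd.isna() is True for either.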

Using apply() with a lambda function

The apply() method calls a function on each element of a column (a Series) one at a time. Passing it a lambda that chains strip() and capitalize() removes the unwanted spaces and fixes the capitalization in a single step.

Python
df['Product'] = df['Product'].apply(lambda x: x.strip().capitalize())
print(df)

Output

        Date    Product  Updated_Price  Discount
0  10/2/2011   Umbrella           1250        10
1  11/2/2011   Mattress           1450         8
2  12/2/2011  Badminton           1550        15
3  13/2/2011    Shuttle            400        10

Explanation: for each product string x, the lambda removes the surrounding whitespace with strip() and then capitalizes the result with capitalize(). apply() collects the returned values into a new Series, which is assigned back to the Product column.
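Unlike the .str chain, the lambda calls plain Python string methods. It therefore raises an AttributeError if the column contains a missing value: NaN is a float, and floats have no strip(). The sketch below is one way to guard against that. It uses a hypothetical helper, clean_name, that cleans strings and leaves any other value unchanged.

```python
import numpy as np
import pandas as pd

def clean_name(x):
    """Hypothetical helper: strip and capitalize strings, leave anything else unchanged."""
    if isinstance(x, str):
        return x.strip().capitalize()
    return x  # e.g. NaN stays NaN instead of raising AttributeError

# Hypothetical column with a missing entry
products = pd.Series([' UMbreLla', np.nan, '  maTtress'])
cleaned = products.apply(clean_name)
print(cleaned.tolist())
```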

Using an explicit in-place loop

An explicit loop accesses and modifies each row of the DataFrame one at a time. Iterating over the row positions and overwriting each product name with its strip()ped and capitalize()d value cleans the data step by step.

Python
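def fmt(d):
    # Integer position of the 'Product' column (1 in this DataFrame)
    col = d.columns.get_loc('Product')
    for i in range(d.shape[0]):
        # .iat reads and writes a single cell by integer row/column position
        d.iat[i, col] = d.iat[i, col].strip().capitalize()

fmt(df)  # modifies df in place; fmt returns None
print(df)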

Output

        Date    Product  Updated_Price  Discount
0  10/2/2011   Umbrella           1250        10
1  11/2/2011   Mattress           1450         8
2  12/2/2011  Badminton           1550        15
3  13/2/2011    Shuttle            400        10

Explanation:

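  • fmt(d) loops over every row and overwrites the Product cell (column position 1) with its stripped, capitalized value, using .iat for single-cell access by integer position.
  • fmt(df) modifies df in place and returns None, so print(df) afterwards shows the cleaned DataFrame.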
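Each method above overwrites df['Product'], so by the time the later methods run, the column is already clean. To confirm that all three really agree, the sketch below runs each one on a fresh copy of the original data and compares the cleaned names. The variable names are illustrative.

```python
import pandas as pd

original = pd.DataFrame({'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
                         'Product': [' UMbreLla', '  maTtress', 'BaDmintoN ', 'Shuttle'],
                         'Updated_Price': [1250, 1450, 1550, 400],
                         'Discount': [10, 8, 15, 10]})

# Method 1: vectorized .str accessor
vec = original.copy()
vec['Product'] = vec['Product'].str.strip().str.capitalize()

# Method 2: apply() with a lambda
app = original.copy()
app['Product'] = app['Product'].apply(lambda x: x.strip().capitalize())

# Method 3: explicit loop writing each cell with .iat
loop = original.copy()
col = loop.columns.get_loc('Product')
for i in range(loop.shape[0]):
    loop.iat[i, col] = loop.iat[i, col].strip().capitalize()

# Compare the cleaned values as plain lists (avoids dtype differences between methods)
same = vec['Product'].tolist() == app['Product'].tolist() == loop['Product'].tolist()
print(same)                     # True
print(vec['Product'].tolist())  # ['Umbrella', 'Mattress', 'Badminton', 'Shuttle']
```

For a clean column of strings, the .str version is usually the most concise choice. The loop is mainly useful when each row needs custom logic that is hard to express as a single column operation.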