The goal is to remove leading/trailing whitespace and format each product name with its first letter uppercase and the rest lowercase (e.g., UMbreLla -> Umbrella).
Let's consider a DataFrame with product names that are not formatted properly:
import pandas as pd
df = pd.DataFrame({
    'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
    'Product': [' UMbreLla', ' maTtress', 'BaDmintoN ', 'Shuttle'],
    'Updated_Price': [1250, 1450, 1550, 400],
    'Discount': [10, 8, 15, 10]
})
print(df)
Output
Date Product Updated_Price Discount
0 10/2/2011 UMbreLla 1250 10
1 11/2/2011 maTtress 1450 8
2 12/2/2011 BaDmintoN 1550 15
3 13/2/2011 Shuttle 400 10
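The extra spaces are hard to spot in a printed table. A small check, sketched below, lists the raw strings and flags every value that changes when stripped. It rebuilds the same DataFrame so it runs on its own; the name has_padding is only an illustrative variable.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
    'Product': [' UMbreLla', ' maTtress', 'BaDmintoN ', 'Shuttle'],
    'Updated_Price': [1250, 1450, 1550, 400],
    'Discount': [10, 8, 15, 10],
})

# tolist() shows the raw strings, including leading/trailing spaces
raw = df['Product'].tolist()
print(raw)  # [' UMbreLla', ' maTtress', 'BaDmintoN ', 'Shuttle']

# True wherever a value differs from its stripped form, i.e. has stray whitespace
has_padding = df['Product'] != df['Product'].str.strip()
print(has_padding.tolist())  # [True, True, True, False]
```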
Now let's see different methods to clean product names in a DataFrame.
Using vectorized string methods
Pandas provides a set of vectorized string methods through the .str accessor. Each method is applied to every value in a column in a single call, so you can clean text data without writing an explicit loop.
df['Product'] = df['Product'].str.strip().str.capitalize()
print(df)
Output
Date Product Updated_Price Discount
0 10/2/2011 Umbrella 1250 10
1 11/2/2011 Mattress 1450 8
2 12/2/2011 Badminton 1550 15
3 13/2/2011 Shuttle 400 10
Explanation:
- .str.strip(): removes leading and trailing whitespace from every value in the column.
- .str.capitalize(): converts each string so that its first character is uppercase and the rest are lowercase.
- Assigning the result back to df['Product'] replaces the original column with the cleaned values.
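Two details of the .str methods are worth checking before relying on them: they pass missing values through instead of raising, and capitalize() treats the whole string as one unit rather than capitalizing each word. The sketch below illustrates both on a made-up Series (the multi-word name and the missing value are assumptions for illustration); str.title() is shown only as an alternative for names where every word should start with a capital.

```python
import numpy as np
import pandas as pd

# Made-up sample: a multi-word name plus a missing value
s = pd.Series([' UMbreLla', ' BaDmintoN RACKET ', np.nan])

# Vectorized cleaning: the missing value stays missing, no error is raised
cap = s.str.strip().str.capitalize()
print(cap.tolist())

# capitalize() uppercases only the first character of the whole string;
# title() uppercases the first letter of every word instead
titled = s.str.strip().str.title()
print(titled.tolist())
```

Here cap holds 'Umbrella' and 'Badminton racket', while titled holds 'Badminton Racket' for the second value; the third entry stays missing in both.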
Using apply() with a lambda function
The apply() method calls a function on each element of a column (a Series) individually. Combined with a lambda function, it lets you remove unwanted spaces and adjust the capitalization of each string.
df['Product'] = df['Product'].apply(lambda x: x.strip().capitalize())
print(df)
Output
Date Product Updated_Price Discount
0 10/2/2011 Umbrella 1250 10
1 11/2/2011 Mattress 1450 8
2 12/2/2011 Badminton 1550 15
3 13/2/2011 Shuttle 400 10
Explanation: apply() passes each product string x to the lambda, which removes the outer whitespace with x.strip() and then capitalizes the result with .capitalize(); the returned Series is assigned back to the Product column.
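The lambda assumes every value is a string. If the column contains a missing value (NaN, which is a float), x.strip() raises AttributeError. A minimal sketch of a guarded version, using a hypothetical helper named clean_name and a made-up object-dtype Series, could look like this:

```python
import numpy as np
import pandas as pd

def clean_name(x):
    """Hypothetical helper: strip and capitalize strings, pass anything else through."""
    if isinstance(x, str):
        return x.strip().capitalize()
    return x  # e.g. NaN is returned unchanged instead of raising AttributeError

# Made-up sample with a missing value; object dtype so apply() sees NaN directly
s = pd.Series([' UMbreLla', np.nan, 'BaDmintoN '], dtype=object)
cleaned = s.apply(clean_name)
print(cleaned.tolist())

# The unguarded lambda fails as soon as it reaches the NaN
failed = False
try:
    s.apply(lambda x: x.strip().capitalize())
except AttributeError:
    failed = True
print('unguarded lambda raised AttributeError:', failed)
```

A named function is used here instead of a lambda only because the type check makes a one-liner harder to read; the behavior on valid strings is the same.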
Using an explicit in-place loop
An explicit loop lets you access and modify each row of the DataFrame manually. By iterating over the rows and updating the column values with string methods such as strip() and capitalize(), you clean the data one value at a time.
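def fmt(d):
    # Visit every row by its integer position
    for i in range(d.shape[0]):
        # Column position 1 is 'Product': strip spaces, capitalize, write back in place
        d.iat[i, 1] = d.iat[i, 1].strip().capitalize()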
fmt(df)
print(df)
Output
Date Product Updated_Price Discount
0 10/2/2011 Umbrella 1250 10
1 11/2/2011 Mattress 1450 8
2 12/2/2011 Badminton 1550 15
3 13/2/2011 Shuttle 400 10
Explanation:
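- fmt(d) loops over the row positions and cleans the value in column position 1 (Product) by stripping spaces and capitalizing the string, writing each result back with .iat.
- Calling fmt(df) modifies df in place (the function returns None), and print(df) displays the cleaned DataFrame.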
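Because d.iat[i, 1] hard-codes the column position, the loop would edit the wrong column if the column order changed. The sketch below looks up the position from the column label with Index.get_loc and then checks that the result matches the vectorized approach. The function name fmt_by_name and its column parameter are hypothetical, not part of the original example.

```python
import pandas as pd

def fmt_by_name(d, column='Product'):
    """Hypothetical variant of fmt(): derive the column position from its label."""
    col = d.columns.get_loc(column)  # integer position of the named column
    for i in range(d.shape[0]):
        d.iat[i, col] = d.iat[i, col].strip().capitalize()

df = pd.DataFrame({
    'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
    'Product': [' UMbreLla', ' maTtress', 'BaDmintoN ', 'Shuttle'],
    'Updated_Price': [1250, 1450, 1550, 400],
    'Discount': [10, 8, 15, 10],
})

# Vectorized result computed first; .str methods return a new Series
expected = df['Product'].str.strip().str.capitalize()

fmt_by_name(df)  # modifies df in place and returns None
print(df['Product'].tolist())  # ['Umbrella', 'Mattress', 'Badminton', 'Shuttle']
print(df['Product'].equals(expected))  # True
```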