Data analytics is now used by companies of all kinds. While working with data, we often run into problems that need an out-of-the-box approach. Much real-world data contains the names of entities or other nouns, and these names are not always in a proper format. In this post, we discuss approaches for cleaning such data.
Suppose we are dealing with data from an e-commerce website in which the product names are not properly formatted. The task is to format the data so that there is no leading or trailing whitespace and the first letter of every product name is capitalized.
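To make the problem concrete, here is a small sketch of what such data might look like. The column names and values are hypothetical and chosen only for illustration; they are not taken from a real dataset.

```python
import pandas as pd

# Hypothetical sample data: product names with stray whitespace
# and inconsistent capitalization.
df = pd.DataFrame({
    "Product": ["  umbrella", "MATTRESS  ", " baDminton ", "shuttle"],
    "Price": [1250, 1450, 1550, 400],
})

# repr() makes the leading and trailing spaces visible.
for name in df["Product"]:
    print(repr(name))
```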
Solution #1: We will often come across situations where we need to write our own customized function suited to the task at hand.
Here we write such a function to solve this problem.
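A minimal sketch of such a function is shown here. The function name format_names, the column name "Product", and the sample values are assumptions made for illustration. The function walks over the column, strips whitespace from each name, and capitalizes it.

```python
import pandas as pd

def format_names(frame, column):
    """Return a copy of frame with the given column stripped and capitalized."""
    result = frame.copy()
    cleaned = []
    for name in result[column]:
        # strip() removes leading/trailing whitespace;
        # capitalize() uppercases the first character and lowercases the rest.
        cleaned.append(name.strip().capitalize())
    result[column] = cleaned
    return result

# Hypothetical sample data.
df = pd.DataFrame({"Product": ["  umbrella", "MATTRESS  ", " baDminton ", "shuttle"]})

formatted = format_names(df, "Product")
print(formatted["Product"].tolist())
# ['Umbrella', 'Mattress', 'Badminton', 'Shuttle']
```

Note that str.capitalize() only uppercases the first character of the whole string. If product names contain several words and each word should start with a capital letter, str.title() may be a better fit.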
Solution #2: Now we will see a better and more concise approach using Pandas.
Let’s use the Pandas DataFrame.apply() function to put the product names in the right format. Inside DataFrame.apply(), we will use a lambda function.
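The apply-with-lambda idea could look roughly like the following sketch. The column name "Product" and the sample values are again hypothetical. Since selecting a single column yields a Series, the call used here is technically Series.apply().

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"Product": ["  umbrella", "MATTRESS  ", " baDminton ", "shuttle"]})

# Apply a lambda to every product name: strip whitespace, then capitalize.
df["Product"] = df["Product"].apply(lambda name: name.strip().capitalize())

print(df["Product"].tolist())
# ['Umbrella', 'Mattress', 'Badminton', 'Shuttle']
```

An alternative worth knowing is the vectorized string accessor, df["Product"].str.strip().str.capitalize(), which gives the same result for this data without a lambda.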