Replacing strings with numbers in Python for Data Analysis
Last Updated: 05 Feb, 2018
Many machine-learning algorithms work only with numeric input, so string values in a pandas DataFrame often have to be converted to numbers first. A simple way to do this is to assign each distinct string value in a column its own unique integer.
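If the categories are not known in advance, pandas can assign the integers automatically. The sketch below uses pd.factorize, which is a different technique from the two hand-written mappings in this article: it gives each distinct value a code starting at 0, in order of first appearance. Adding 1 reproduces the 1/2 numbering of the example only because Male happens to appear first in the sample data. The Series is an in-memory stand-in for the Gender column, not read from example.csv.

```python
import pandas as pd

# In-memory stand-in for the Gender column of the example table
gender = pd.Series(["Male", "Female", "Male", "Female", "Male"])

# factorize() returns one integer code per row (0, 1, ... in order of
# first appearance) and the distinct values those codes refer to
codes, uniques = pd.factorize(gender)

# Shift the codes to start at 1, so Male -> 1 and Female -> 2 here
numeric = codes + 1

print(uniques.tolist())  # ['Male', 'Female']
print(numeric.tolist())  # [1, 2, 1, 2, 1]
```

The order of uniques tells you which number each category received, so keep it if you need to map the numbers back to the original strings later.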
Note: Before running the code, create an example.csv file with a Name column and a Gender column containing some names and genders.
Say we have a table with a Name column and a Gender column. The Gender column has two categories, male and female, and we want to assign 1 to male and 2 to female.
Examples:
Input:
     Name  Gender
0     Ram    Male
1   Seeta  Female
2  Kartik    Male
3    Niti  Female
4  Naitik    Male
Output:
     Name  Gender
0     Ram       1
1   Seeta       2
2  Kartik       1
3    Niti       2
4  Naitik       1
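To follow the Note at the start of this article without typing the file by hand, here is a minimal sketch that writes example.csv from the Input rows above using Python's standard csv module. The header row Name,Gender is an assumption chosen to match the Gender column that the code reads.

```python
import csv

# Sample rows taken from the Input table
rows = [
    ("Ram", "Male"),
    ("Seeta", "Female"),
    ("Kartik", "Male"),
    ("Niti", "Female"),
    ("Naitik", "Male"),
]

# newline="" lets the csv module control line endings itself
with open("example.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "Gender"])  # assumed header row
    writer.writerows(rows)
```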
Method 1:
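Create a dictionary that maps each category to its number:
Key     Value
male    1
female  2
Then loop over the Gender column (here with a list comprehension), look up each value in the dictionary, and store the resulting numbers back in the column. Every value in the column must have a matching key; otherwise the lookup raises a KeyError.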
import pandas as pd
# read_csv accepts a file path directly, and "," is already the default separator
data = pd.read_csv("example.csv")
# Dictionary keys are lower-case, so each value is lower-cased before the lookup
gender = {"male": 1, "female": 2}
data["Gender"] = [gender[item.lower()] for item in data["Gender"]]
print(data)
Output:
     Name  Gender
0     Ram       1
1   Seeta       2
2  Kartik       1
3    Niti       2
4  Naitik       1
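Method 1 can also be written with Series.map, which accepts the same dictionary and performs the lookup for the whole column at once. One practical difference: a value with no key in the dictionary becomes NaN instead of stopping the program with a KeyError. In this sketch the DataFrame is built in memory instead of being read from example.csv, the extra row with Gender "Other" is a hypothetical value added to show that case, and GenderCode is a hypothetical column name.

```python
import pandas as pd

gender = {"male": 1, "female": 2}

# In-memory stand-in for example.csv; the last row is hypothetical and
# its Gender value has no key in the dictionary
data = pd.DataFrame({
    "Name": ["Ram", "Seeta", "Kartik", "Niti", "Naitik", "Sam"],
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Other"],
})

# Lower-case the values, then look each one up in the dictionary;
# values without a matching key become NaN
data["GenderCode"] = data["Gender"].str.lower().map(gender)

print(data)
```

Keeping the codes in a separate column leaves the original strings available for checking, and data["GenderCode"].isna().any() is a quick way to catch unexpected categories before training.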
Method 2:
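Method 2 needs no dictionary. Instead, it uses boolean indexing: for each category, it builds a condition such as "Gender equals male", and pandas assigns the new value to every row where the condition is True, without an explicit Python loop. It takes one assignment per category, so it suits columns with only a few distinct values.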
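import pandas as pd
data = pd.read_csv("example.csv")
# Lower-case once so "Male"/"Female" in the file match the conditions below
gender_lower = data["Gender"].str.lower()
# Use an object column so it can hold integers next to the not-yet-replaced strings
data["Gender"] = data["Gender"].astype(object)
# .loc with a boolean mask updates only the rows where the condition is True
data.loc[gender_lower == "male", "Gender"] = 1
data.loc[gender_lower == "female", "Gender"] = 2
print(data)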
Output:
     Name  Gender
0     Ram       1
1   Seeta       2
2  Kartik       1
3    Niti       2
4  Naitik       1
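One detail worth knowing about Method 2: when the column has object dtype, assigning numbers through .loc changes the values but not the column's dtype, so Gender is still an object column afterwards. The sketch below builds the sample data in memory instead of reading example.csv, shows the dtype after the masked assignments, and converts the column with astype once every value is a number. dtype=object is set explicitly so the column can hold strings and integers during the replacement; the variable names are illustrative.

```python
import pandas as pd

# In-memory stand-in for example.csv, stored as object columns
data = pd.DataFrame(
    {"Name": ["Ram", "Seeta", "Kartik", "Niti", "Naitik"],
     "Gender": ["Male", "Female", "Male", "Female", "Male"]},
    dtype=object,
)

# Method 2: build the boolean masks, then assign through .loc
is_male = data["Gender"] == "Male"
is_female = data["Gender"] == "Female"
data.loc[is_male, "Gender"] = 1
data.loc[is_female, "Gender"] = 2

# The values are now integers, but the column dtype is unchanged
dtype_after_masks = data["Gender"].dtype
print(dtype_after_masks)  # object

# Convert explicitly so numeric code receives a real integer column
data["Gender"] = data["Gender"].astype("int64")
print(data["Gender"].dtype)  # int64
```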
Applications
- This technique is common in data science. If a dataset stores gender as ‘male’ and ‘female’, we can map them to numbers such as 0 and 1 respectively, so that algorithms that expect numeric input can work with the data.
- The same technique can also replace particular values in a dataset with new values that are not necessarily numbers.
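For the 0/1 encoding in the first bullet, pd.Categorical with an explicit categories list is one way to control which value gets which number: the position in the list becomes the code, and a value missing from the list gets the code -1. The second bullet corresponds to Series.replace, which swaps the listed values and leaves all others unchanged. The small DataFrame and the column names GenderCode and GenderLabel are hypothetical.

```python
import pandas as pd

data = pd.DataFrame({"Gender": ["male", "female", "male", "female", "male"]})

# "male" is listed first, so it gets code 0 and "female" gets code 1
categories = pd.Categorical(data["Gender"], categories=["male", "female"])
data["GenderCode"] = categories.codes

# Replace particular values with new (non-numeric) values
data["GenderLabel"] = data["Gender"].replace({"male": "M", "female": "F"})

print(data)
```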