Replacing strings with numbers in Python for Data Analysis

Many algorithms expect numeric input, so string values in a pandas DataFrame often have to be converted to integers before analysis. Here we assign a unique numeric value to each distinct string value in a DataFrame column.

Note: Before running the examples, create an example.csv file with a Name column and a Gender column.

Say we have a table with Name and Gender columns. The Gender column has two categories, Male and Female, and we want to assign 1 to Male and 2 to Female.

Examples:

Input : 
---------------------
    |  Name  |  Gender
---------------------
 0    Ram        Male
 1    Seeta      Female
 2    Kartik     Male
 3    Niti       Female
 4    Naitik     Male 

Output :
    |  Name  |  Gender
---------------------
 0    Ram        1
 1    Seeta      2
 2    Kartik     1
 3    Niti       2
 4    Naitik     1 
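
The example.csv file that the methods below read can be produced with a short script. This is a minimal sketch that assumes the file should hold exactly the rows of the input table above and be written to the current working directory; the variable name rows is only illustrative.

```python
import pandas as pd

# Rows copied from the input table above.
rows = {
    "Name": ["Ram", "Seeta", "Kartik", "Niti", "Naitik"],
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
}

# index=False keeps the row numbers out of the file,
# so the CSV has exactly two columns: Name and Gender.
pd.DataFrame(rows).to_csv("example.csv", index=False)

# Read the file back to confirm its contents.
data = pd.read_csv("example.csv")
print(data)
```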

Method 1:

Create a dictionary with the following two key-value pairs:
Key       Value
Male      1
Female    2

Then iterate through the Gender column of the DataFrame with a for loop and replace each value with the number stored under its key. The keys must match the values in the column exactly, including case.

# import pandas library
import pandas as pd

# creating a Pandas DataFrame
# using read_csv function, which
# opens and closes the file itself
data = pd.read_csv("example.csv")

# creating a dictionary; the keys
# must match the values in the
# Gender column exactly
gender = {'Male': 1, 'Female': 2}

# traversing through dataframe
# Gender column and writing
# values where key matches
data['Gender'] = [gender[item] for item in data['Gender']]
print(data)

Output :

    |  Name  |  Gender
---------------------
 0    Ram        1
 1    Seeta      2
 2    Kartik     1
 3    Niti       2
 4    Naitik     1 
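
The dictionary lookup in Method 1 can also be handed to pandas instead of being written as a loop. The sketch below uses Series.map as an alternative to the list comprehension, and it builds the DataFrame in memory (rather than reading example.csv) only so that it runs on its own.

```python
import pandas as pd

# Same rows as the input table, built in memory for a self-contained example.
data = pd.DataFrame({
    "Name": ["Ram", "Seeta", "Kartik", "Niti", "Naitik"],
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
})

# Dictionary from Method 1: each string and the number that replaces it.
gender = {"Male": 1, "Female": 2}

# map looks up every value of the column in the dictionary.
# A value with no matching key becomes NaN instead of raising KeyError.
data["Gender"] = data["Gender"].map(gender)
print(data)
```

Unlike the list comprehension, map does not stop on an unexpected value, so it is worth checking the result for NaN when the column may contain strings that are missing from the dictionary.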

Method 2:
Method 2 is similar but needs no dictionary and takes fewer lines of code. Instead of writing an explicit loop, we select the rows of the Gender column that match a condition and assign the new value to all of them at once.

# import pandas library
import pandas as pd

# creating a Pandas DataFrame
# using read_csv function, which
# opens and closes the file itself
data = pd.read_csv("example.csv")

# storing the column as generic
# Python objects so that it can hold
# strings and integers while the
# values are being replaced
data['Gender'] = data['Gender'].astype(object)

# selecting the rows of the Gender
# column where the condition matches
# and writing the new value; .loc
# changes the DataFrame itself,
# not a temporary copy
data.loc[data['Gender'] == 'Male', 'Gender'] = 1
data.loc[data['Gender'] == 'Female', 'Gender'] = 2
print(data)

Output :

    |  Name  |  Gender
---------------------
 0    Ram        1
 1    Seeta      2
 2    Kartik     1
 3    Niti       2
 4    Naitik     1 
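
When the column has only two categories, the two conditional assignments of Method 2 can be collapsed into one call. The sketch below uses numpy.where, which is a swapped-in alternative rather than the method shown above, and it assumes every value that is not 'Male' should become 2. The DataFrame is built in memory so that the example runs on its own.

```python
import numpy as np
import pandas as pd

# Same rows as the input table, built in memory for a self-contained example.
data = pd.DataFrame({
    "Name": ["Ram", "Seeta", "Kartik", "Niti", "Naitik"],
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
})

# The comparison yields one True/False per row.
is_male = data["Gender"] == "Male"

# where() picks 1 for rows where the condition is True and 2 otherwise.
# Assumption: the column contains nothing except 'Male' and 'Female'.
data["Gender"] = np.where(is_male, 1, 2)
print(data)
```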

Applications

  1. This technique is useful in data science. If a dataset stores gender as ‘male’ and ‘female’, we can assign numbers such as 0 and 1 respectively so that our algorithms can work on the data.
  2. The same technique can be used to replace any particular values in a dataset with new values.
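
If the exact numbers do not matter and each distinct string only needs some unique integer, pandas can assign the codes itself. The following is a sketch using pandas.factorize, which numbers the categories from 0 in order of first appearance; the column name Gender_code is a hypothetical name chosen for this example.

```python
import pandas as pd

# Same rows as the input table, built in memory for a self-contained example.
data = pd.DataFrame({
    "Name": ["Ram", "Seeta", "Kartik", "Niti", "Naitik"],
    "Gender": ["Male", "Female", "Male", "Female", "Male"],
})

# factorize returns one integer code per row and the distinct values
# in the order they were first seen.
codes, uniques = pd.factorize(data["Gender"])

# Keep the original strings and store the codes in a new column.
data["Gender_code"] = codes
print(data)
print(list(uniques))
```

Because the codes follow the order of first appearance, 'Male' receives 0 and 'Female' receives 1 for this table; a dataset with a different row order would get a different numbering.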
