Python | Data Comparison and Selection in Pandas
Last Updated: 17 Sep, 2018
Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages, and makes importing and analyzing data much easier.
A common task in data analysis is comparing values and selecting data accordingly. In Pandas, the "==" operator works element-wise: comparing a column to a single value checks every row and returns a boolean Series. The following two examples show how to compare and select data from a Pandas DataFrame.
To download the CSV file used, Click Here.
Example #1: Comparing Data
In the following example, a DataFrame is created from a CSV file. The Gender column contains only three kinds of values ("Male", "Female" or NaN). Every row of the Gender column is compared to "Male", and a boolean Series is returned.
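import pandas as pd
# Load the employee data from the CSV file
data = pd.read_csv("employees.csv")
# Compare every value in the Gender column to "Male" (returns a boolean Series)
new = data["Gender"] == "Male"
# Store the result as a new column
data["New"] = new
data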
Output:
As shown in the output, for Gender = "Male" the value in the New column is True, and for "Female" and NaN values it is False.
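The same comparison can be tried without the CSV file on a small hand-made DataFrame. The sketch below is only an illustration: `sample` and the names and values in it are made up and are not taken from employees.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not from employees.csv)
sample = pd.DataFrame({
    "First Name": ["Asha", "Ben", "Chen"],
    "Gender": ["Male", "Female", np.nan],
})

# Element-wise comparison: one boolean per row
sample["New"] = sample["Gender"] == "Male"

print(sample["New"].tolist())  # [True, False, False]
```

The comparison never raises an error on the missing value; the NaN row simply gets False.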
Example #2: Selecting Data
In the following example, the boolean Series is passed to the DataFrame as a mask, and only the rows having Gender = "Male" are returned.
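import pandas as pd
data = pd.read_csv("employees.csv")
# Boolean Series: True where Gender is "Male", False for "Female" and NaN
new = data["Gender"] == "Male"
data["New"] = new
# Keep only the rows where the mask is True
data[new]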
Output:
As shown in the output, only the rows of the DataFrame having Gender = "Male" are returned.
Note: Comparing a NaN value with "==" always gives False, while "!=" gives True.
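The effect of NaN on a selection can be checked with a short sketch. It uses a made-up DataFrame (`sample` and its values are assumptions for illustration, not rows from employees.csv) and contrasts "==" with "!=", which treat rows with a missing Gender differently.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not from employees.csv)
sample = pd.DataFrame({
    "First Name": ["Asha", "Ben", "Chen", "Dana"],
    "Gender": ["Female", "Male", np.nan, "Male"],
})

# NaN == "Male" is False, so the NaN row is left out
males = sample[sample["Gender"] == "Male"]

# NaN != "Female" is True, so the NaN row is kept
not_female = sample[sample["Gender"] != "Female"]

# To leave out NaN rows as well, combine the mask with notna()
known_not_female = sample[(sample["Gender"] != "Female") & sample["Gender"].notna()]

print(males["First Name"].tolist())             # ['Ben', 'Dana']
print(not_female["First Name"].tolist())        # ['Ben', 'Chen', 'Dana']
print(known_not_female["First Name"].tolist())  # ['Ben', 'Dana']
```

Selecting with `== "Male"` and with `!= "Female"` therefore gives the same rows only when the column has no missing values.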