Boolean Indexing in Pandas

Boolean indexing selects subsets of a DataFrame based on the actual values of the data rather than on row/column labels or integer locations. A boolean vector, holding one True or False value per row, determines which rows are kept.

With boolean indexing, data can be filtered in four ways:

  • Accessing a DataFrame with a boolean index
  • Applying a boolean mask to a dataframe
  • Masking data based on column value
  • Masking data based on index value
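
Before going through each approach, the basic idea of filtering with a boolean vector can be sketched as follows. This is a minimal illustration rather than part of the original examples; the sample data and the threshold of 50 are assumptions chosen only for the demonstration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({"name": ["aparna", "pankaj", "sudhir", "Geeku"],
                   "score": [90, 40, 80, 98]})

# Boolean vector: one True/False per row, True where the score is above 50
mask = df["score"] > 50

# Keep only the rows where the mask is True
passed = df[mask]
print(passed)
```

The mask is an ordinary Series of booleans, so it can be stored in a variable, inspected, and reused.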

Accessing a DataFrame with a boolean index:
To access a DataFrame with a boolean index, first create a DataFrame whose index labels are the boolean values True and False. For example:

# importing pandas as pd
import pandas as pd
   
# dictionary of lists (named data to avoid shadowing the built-in dict)
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}

# creating a dataframe with a boolean index
df = pd.DataFrame(data, index=[True, False, True, False])
   
print(df)

Output:

Now that we have a DataFrame with a boolean index, we can access its rows through that index. Three indexers are covered here: .loc[], .iloc[], and .ix[]. Note that .ix[] was deprecated in pandas 0.20.0 and removed in pandas 1.0.0, so it is available only in older versions.

Accessing a Dataframe with a boolean index using .loc[]

To access a DataFrame with a boolean index using .loc[], pass a boolean label (True or False) to .loc[]. It returns every row whose index label equals that value.

# importing pandas as pd
import pandas as pd
   
# dictionary of lists (named data to avoid shadowing the built-in dict)
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}

# creating a dataframe with boolean index
df = pd.DataFrame(data, index=[True, False, True, False])

# accessing a dataframe using .loc[]: returns all rows labelled True
print(df.loc[True])

Output:

Accessing a Dataframe with a boolean index using .iloc[]

Because .iloc[] is purely position-based, passing a boolean scalar (True or False) to it raises a TypeError. To select a single row with .iloc[], pass the row's integer position instead.
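
One caveat worth noting: while a single boolean scalar is rejected, .iloc[] does accept a list of booleans with one entry per row and treats it as a positional mask. The sketch below, which reuses the same sample data as the other examples, is an illustration of that behaviour and not part of the original walkthrough; the variable names are chosen only for this demonstration.

```python
import pandas as pd

data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}
df = pd.DataFrame(data, index=[True, False, True, False])

# A list of booleans, one per row, acts as a positional mask
selected = df.iloc[[True, False, True, False]]
print(selected)

# A single boolean scalar is rejected with a TypeError
scalar_rejected = False
try:
    df.iloc[True]
except TypeError as err:
    scalar_rejected = True
    print("TypeError:", err)
```
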
Code #1:

# importing pandas as pd
import pandas as pd
   
# dictionary of lists (named data to avoid shadowing the built-in dict)
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}

# creating a dataframe with boolean index
df = pd.DataFrame(data, index=[True, False, True, False])

# passing a boolean scalar to .iloc[] raises a TypeError
print(df.iloc[True])

Output:

TypeError 

Code #2:

# importing pandas as pd
import pandas as pd
   
# dictionary of lists (named data to avoid shadowing the built-in dict)
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}

# creating a dataframe with boolean index
df = pd.DataFrame(data, index=[True, False, True, False])

# accessing the row at integer position 1 using .iloc[]
print(df.iloc[1])

Output:

Accessing a Dataframe with a boolean index using .ix[]

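.ix[] was a hybrid of .loc[] and .iloc[], so it accepted both a boolean label (True or False) and an integer position. It was deprecated in pandas 0.20.0 and removed in pandas 1.0.0, so the two examples in this section run only on older versions; on current versions they raise an AttributeError, and .loc[] or .iloc[] should be used instead.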
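
On current pandas versions, where .ix[] is no longer available, the same two lookups can be written with .loc[] for the label and .iloc[] for the position. The following is a minimal sketch using the same sample data; the variable names by_label and by_position are chosen only for this illustration.

```python
import pandas as pd

data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}
df = pd.DataFrame(data, index=[True, False, True, False])

# Label-based lookup: all rows whose index label is True
by_label = df.loc[True]
print(by_label)

# Position-based lookup: the row at integer position 1
by_position = df.iloc[1]
print(by_position)
```
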
Code #1:

# importing pandas as pd
import pandas as pd
   
# dictionary of lists (named data to avoid shadowing the built-in dict)
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}

# creating a dataframe with boolean index
df = pd.DataFrame(data, index=[True, False, True, False])

# accessing a dataframe using .ix[] (works only on pandas versions before 1.0)
print(df.ix[True])

Output:

Code #2:

# importing pandas as pd
import pandas as pd
   
# dictionary of lists (named data to avoid shadowing the built-in dict)
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}

# creating a dataframe with boolean index
df = pd.DataFrame(data, index=[True, False, True, False])

# accessing a dataframe using .ix[] (works only on pandas versions before 1.0)
print(df.ix[1])

Output:

 
Applying a boolean mask to a dataframe:
A boolean mask can be applied to a DataFrame with the [] accessor (__getitem__). The mask is a list of True and False values with the same length as the DataFrame, and only the rows whose mask value is True are returned. Code #2 reads the "nba1.1" CSV file.
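
The length requirement can be checked with a short sketch. It assumes the same sample data as the other examples and shows that a mask of the right length gives the same result through [] and .loc[], while a mask that is too short raises a ValueError. The variable names are chosen only for this illustration.

```python
import pandas as pd

data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}
df = pd.DataFrame(data, index=[0, 1, 2, 3])

# A mask with one entry per row: keeps rows 0 and 2
mask = [True, False, True, False]
print(df[mask])

# The same mask also works with .loc[]
same_result = df[mask].equals(df.loc[mask])
print(same_result)

# A mask of the wrong length is rejected
try:
    df[[True, False]]
    length_error = False
except ValueError as err:
    length_error = True
    print("ValueError:", err)
```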

Code #1:

# importing pandas as pd
import pandas as pd
   
# dictionary of lists (named data to avoid shadowing the built-in dict)
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}

df = pd.DataFrame(data, index=[0, 1, 2, 3])

# applying a boolean mask: keeps rows 0 and 2
print(df[[True, False, True, False]])

Output:

Code #2:

# importing pandas package
import pandas as pd
   
# making data frame from csv file
data = pd.read_csv("nba1.1.csv")

# keeping the rows labelled 0 to 12
df = pd.DataFrame(data, index=list(range(13)))

# applying a boolean mask: keeps every other row, starting with the first
mask = [True, False, True, False, True,
        False, True, False, True, False,
        True, False, True]
print(df[mask])

Output:

 
Masking data based on column value:
To filter a DataFrame by column value, apply a condition to a column with a comparison operator such as ==, >, <, <=, or >=. The comparison produces a Series of True and False values, one per row, which can then serve as a mask. Code #2 reads the "nba.csv" file.
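
The resulting Series of booleans becomes useful once it is passed back to the DataFrame. The sketch below assumes the same sample data as the other examples and filters the rows with two different conditions; the variable names bca and high_scorers are chosen only for this illustration.

```python
import pandas as pd

data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["BCA", "BCA", "M.Tech", "BCA"],
        'score': [90, 40, 80, 98]}
df = pd.DataFrame(data)

# Rows whose degree is exactly "BCA"
bca = df[df['degree'] == 'BCA']
print(bca)

# Rows whose score is at least 80
high_scorers = df[df['score'] >= 80]
print(high_scorers)
```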

Code #1:

# importing pandas as pd
import pandas as pd
   
# dictionary of lists (named data to avoid shadowing the built-in dict)
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["BCA", "BCA", "M.Tech", "BCA"],
        'score': [90, 40, 80, 98]}

# creating a dataframe
df = pd.DataFrame(data)

# using a comparison operator to build a boolean Series
print(df['degree'] == 'BCA')

Output:

Code #2:

# importing pandas package
import pandas as pd
   
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")
   
# using greater than operator for filtering of data
print(data['Age'] > 25)

Output:

 
Masking data based on index value:
A DataFrame can also be filtered by index value: create a mask by comparing the index against a value with an operator such as ==, >, or <, then apply that mask to the DataFrame. Code #2 reads the "nba1.1" CSV file.
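
Index comparisons return boolean arrays, so two of them can be combined with & to select a range of index values. The sketch below assumes the same sample data as the other examples; combining conditions this way goes slightly beyond the single comparisons used in this section, and the variable name middle is chosen only for this illustration.

```python
import pandas as pd

data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["BCA", "BCA", "M.Tech", "BCA"],
        'score': [90, 40, 80, 98]}
df = pd.DataFrame(data, index=[0, 1, 2, 3])

# True only for index values strictly between 0 and 3
mask = (df.index > 0) & (df.index < 3)

middle = df[mask]
print(middle)
```

Each comparison is wrapped in parentheses because & binds more tightly than > and < in Python.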

Code #1:

# importing pandas as pd
import pandas as pd
   
# dictionary of lists (named data to avoid shadowing the built-in dict)
data = {'name': ["aparna", "pankaj", "sudhir", "Geeku"],
        'degree': ["BCA", "BCA", "M.Tech", "BCA"],
        'score': [90, 40, 80, 98]}

df = pd.DataFrame(data, index=[0, 1, 2, 3])

# mask that is True only for the row with index label 0
mask = df.index == 0

print(df[mask])

Output:

Code #2:

# importing pandas package
import pandas as pd
   
# making data frame from csv file
data = pd.read_csv("nba1.1.csv")

# giving an index to a dataframe
df = pd.DataFrame(data, index=list(range(13)))

# filtering data on index value: keeps rows with index labels 8 to 12
mask = df.index > 7

print(df[mask])
  

Output:
