Let’s see how to get all rows in a Pandas DataFrame containing given substring with the help of different examples.
Code #1: Check the values PG in column Position
# importing pandas import pandas as pd
# Creating the dataframe with dict of lists df = pd.DataFrame({ 'Name' : [ 'Geeks' , 'Peter' , 'James' , 'Jack' , 'Lisa' ],
'Team' : [ 'Boston' , 'Boston' , 'Boston' , 'Chele' , 'Barse' ],
'Position' : [ 'PG' , 'PG' , 'UG' , 'PG' , 'UG' ],
'Number' : [ 3 , 4 , 7 , 11 , 5 ],
'Age' : [ 33 , 25 , 34 , 35 , 28 ],
'Height' : [ '6-2' , '6-4' , '5-9' , '6-1' , '5-8' ],
'Weight' : [ 89 , 79 , 113 , 78 , 84 ],
'College' : [ 'MIT' , 'MIT' , 'MIT' , 'Stanford' , 'Stanford' ],
'Salary' : [ 99999 , 99994 , 89999 , 78889 , 87779 ]},
index = [ 'ind1' , 'ind2' , 'ind3' , 'ind4' , 'ind5' ])
print (df, "\n" )
print ( "Check PG values in Position column:\n" )
df1 = df[ 'Position' ]. str .contains( "PG" )
print (df1)
|
Output:
But this result doesn’t seem very helpful, as it returns the bool values with the index. Let’s see if we can do something better.
Code #2: Getting the rows satisfying condition
# importing pandas as pd import pandas as pd
# Creating the dataframe with dict of lists df = pd.DataFrame({ 'Name' : [ 'Geeks' , 'Peter' , 'James' , 'Jack' , 'Lisa' ],
'Team' : [ 'Boston' , 'Boston' , 'Boston' , 'Chele' , 'Barse' ],
'Position' : [ 'PG' , 'PG' , 'UG' , 'PG' , 'UG' ],
'Number' : [ 3 , 4 , 7 , 11 , 5 ],
'Age' : [ 33 , 25 , 34 , 35 , 28 ],
'Height' : [ '6-2' , '6-4' , '5-9' , '6-1' , '5-8' ],
'Weight' : [ 89 , 79 , 113 , 78 , 84 ],
'College' : [ 'MIT' , 'MIT' , 'MIT' , 'Stanford' , 'Stanford' ],
'Salary' : [ 99999 , 99994 , 89999 , 78889 , 87779 ]},
index = [ 'ind1' , 'ind2' , 'ind3' , 'ind4' , 'ind5' ])
df1 = df[df[ 'Position' ]. str .contains( "PG" )]
print (df1)
|
Output:
Code #3: Filter all rows where either Team contains ‘Boston’ or College contains ‘MIT’.
# importing pandas import pandas as pd
# Creating the dataframe with dict of lists df = pd.DataFrame({ 'Name' : [ 'Geeks' , 'Peter' , 'James' , 'Jack' , 'Lisa' ],
'Team' : [ 'Boston' , 'Boston' , 'Boston' , 'Chele' , 'Barse' ],
'Position' : [ 'PG' , 'PG' , 'UG' , 'PG' , 'UG' ],
'Number' : [ 3 , 4 , 7 , 11 , 5 ],
'Age' : [ 33 , 25 , 34 , 35 , 28 ],
'Height' : [ '6-2' , '6-4' , '5-9' , '6-1' , '5-8' ],
'Weight' : [ 89 , 79 , 113 , 78 , 84 ],
'College' : [ 'MIT' , 'MIT' , 'MIT' , 'Stanford' , 'Stanford' ],
'Salary' : [ 99999 , 99994 , 89999 , 78889 , 87779 ]},
index = [ 'ind1' , 'ind2' , 'ind3' , 'ind4' , 'ind5' ])
df1 = df[df[ 'Team' ]. str .contains( "Boston" ) | df[ 'College' ]. str .contains( 'MIT' )]
print (df1)
|
Output:
Code #4: Filter rows checking Team name contains ‘Boston and Position must be PG.
# importing pandas module import pandas as pd
# making data frame df1 = df[df[ 'Team' ]. str .contains( 'Boston' ) & df[ 'Position' ]. str .contains( 'PG' )]
df1 |
Output:
Code #5: Filter rows checking Position contains PG and College must contains like UC.
# importing pandas module import pandas as pd
# making data frame df1 = df[df[ 'Position' ]. str .contains( "PG" ) & df[ 'College' ]. str .contains( 'UC' )]
df1 |
Output: