How to LEFT ANTI join under some matching condition in Pandas
LEFT ANTI Join is the opposite of semi-join. excluding the intersection, it returns the left table. It only returns the columns from the left table and not the right.
Method 1: Using isin()
On the created dataframes we perform left join and subset using isin() function to check if the part on which the datasets are merged is in the subset of the merged dataset.
Syntax:
DataFrame.isin(values)
Parameters:
- values: iterable, Series, DataFrame or dict
Returns:
DataFrame
Example:
In the below code, we used the indicator to find the rows which are ‘Left_only’ and subset the merged dataset, and assign it to df. finally, we retrieve the part which is only in our first data frame df1. the output is antijoin of the two data frames.
Python3
import pandas as pd
df1 = pd.DataFrame({
"city" : [ "new york" , "chicago" , "orlando" , 'mumbai' ],
"temperature" : [ 21 , 14 , 35 , 30 ],
"humidity" : [ 65 , 68 , 75 , 75 ],
})
df2 = pd.DataFrame({
"city" : [ "chicago" , "new york" , "orlando" ],
"humidity" : [ 67 , 60 , 70 ]
})
df3 = df1.merge(df2, on = 'city' , how = 'left' , indicator = True )
df = df3.loc[df3[ '_merge' ] = = 'left_only' , 'city' ]
d = df1[df1[ 'city' ].isin(df)]
print (d)
|
Output:
city temperature humidity
3 mumbai 30 75
Method 2: Using semi join
We can use the ‘~’ operator on the semi-join. It results in anti-join.
Semi-join: Similar to inner join, semi-join returns the intersection but it only returns the columns from the left table and not the right. it has no duplicate values.
Syntax:
[~df1[‘column_name’].isin(df2[‘column_name’])]
where,
- df1 is the first dataframe
- df2 is the second dataframe
- column_name is the matching column in both the dataframes
Example:
In this example, we merge df1 and df2 on ‘city’ by default it is ‘inner join’, after merging, We exclude the part of df1 which is in df3 and print out the resultant dataframe.
Python3
import pandas as pd
df1 = pd.DataFrame({
"city" : [ "new york" , "chicago" , "orlando" , 'mumbai' ],
"temperature" : [ 21 , 14 , 35 , 30 ],
"humidity" : [ 65 , 68 , 75 , 75 ],
})
df2 = pd.DataFrame({
"city" : [ "chicago" , "new york" , "orlando" ],
"humidity" : [ 67 , 60 , 70 ]
})
df3 = df1.merge(df2, on = 'city' )
df = df1[~df1[ 'city' ].isin(df3[ 'city' ])]
print (df)
|
Output:
city temperature humidity
3 mumbai 30 75
Last Updated :
19 Dec, 2021
Like Article
Save Article
Share your thoughts in the comments
Please Login to comment...