How to get a cartesian product of a huge Dataset using Pandas in Python?
Last Updated: 21 Apr, 2022
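In this article, we will discuss how to compute the cartesian product of a huge dataset using Pandas. The cartesian product pairs every row of one DataFrame with every row of another. We compute it with the merge function, which is the entry point for all standard database join operations between DataFrame objects. Both frames are joined on a temporary column that holds the same constant value in every row, so every row of the first frame matches every row of the second.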
Syntax:
data1 = pd.DataFrame({'dataset_name_1': [dataset_1]})
data2 = pd.DataFrame({'dataset_name_2': [dataset_2]})
data3 = pd.merge(data1.assign(key=1), data2.assign(key=1), on='key').drop('key', axis=1)
Parameters:
- dataset_name_1, dataset_name_2: The column names under which the two datasets are stored.
- dataset_1, dataset_2: The values of the two datasets whose cartesian product is to be computed.
- data1: The first DataFrame object.
- data2: The second DataFrame object.
- on: The name of the column to join on. Here it is the temporary key column, which assign(key=1) adds to both data frames and drop('key', axis=1) removes after the merge.
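As a point of comparison with the temporary key column used in the syntax above, pandas 1.2 and later also accept how='cross' in merge, which asks for the cartesian product directly. The following is a minimal sketch, assuming pandas 1.2 or newer is installed; the frame names left and right and the columns P and Q are placeholders chosen for illustration.

```python
import pandas as pd

# Small illustrative frames; the names and values are placeholders.
left = pd.DataFrame({"P": [1, 3]})
right = pd.DataFrame({"Q": [2, 4, 6]})

# Temporary-key approach: add a constant column, join on it, then drop it.
via_key = pd.merge(left.assign(key=1), right.assign(key=1), on="key").drop("key", axis=1)

# Built-in cross join (assumes pandas >= 1.2); no helper column is needed.
via_cross = pd.merge(left, right, how="cross")

# Both frames hold the same 2 x 3 = 6 row pairings.
print(via_key)
print(via_cross)
```

On older pandas versions the how='cross' call raises an error, so the temporary key remains the portable choice there.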
Stepwise Implementation:
Step 1: First of all, import the library Pandas.
import pandas as pd
Step 2: Then, obtain the datasets on which you want to perform a cartesian product.
data1 = pd.DataFrame({'column_name': [dataset_1]})
data2 = pd.DataFrame({'column_name': [dataset_2]})
Step 3: Next, use the merge function to compute the cartesian product of the two datasets.
data3 = pd.merge(data1.assign(key=1), data2.assign(key=1),
                 on='key').drop('key', axis=1)
Step 4: Finally, print the cartesian product obtained.
print(data3)
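The four steps above can be gathered into one reusable function. This is only a sketch: the function name cartesian_product and the temporary column name _cross_key are hypothetical choices, not part of Pandas. The extra check guards against a case the one-line version does not cover, namely an input that already has a column with the same name as the temporary key.

```python
import pandas as pd

def cartesian_product(left, right, key="_cross_key"):
    """Return every pairing of rows from left and right (hypothetical helper)."""
    # Refuse to overwrite a real column that shares the temporary key's name.
    if key in left.columns or key in right.columns:
        raise ValueError(f"temporary key {key!r} clashes with an existing column")
    # Add the constant key to both frames, join on it, then remove it again.
    merged = pd.merge(left.assign(**{key: 1}), right.assign(**{key: 1}), on=key)
    return merged.drop(columns=key)

data1 = pd.DataFrame({"P": [1, 3, 5]})
data2 = pd.DataFrame({"Q": [2, 4, 6]})
data3 = cartesian_product(data1, data2)

# The product has one row for each (left row, right row) pair.
print(len(data3) == len(data1) * len(data2))
```

Checking the row count against len(data1) * len(data2) is a quick way to confirm that no pairings were lost or duplicated.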
Example:
Python3
import pandas as pd
data1 = pd.DataFrame({'P': [1, 3, 5]})
data2 = pd.DataFrame({'Q': [2, 4, 6]})
data3 = pd.merge(data1.assign(key=1), data2.assign(key=1),
                 on='key').drop('key', axis=1)
print(data3)
Output:
   P  Q
0  1  2
1  1  4
2  1  6
3  3  2
4  3  4
5  3  6
6  5  2
7  5  4
8  5  6
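The example above builds the whole result in memory at once. Since the product of an n-row frame and an m-row frame has n × m rows, that can become impractical for genuinely large inputs. One possible way around this, sketched below, is to merge a slice of the first frame at a time and process each partial result before moving on. The generator name cross_in_chunks and its chunk_rows parameter are hypothetical, and a suitable chunk size depends on the data and the available memory.

```python
import pandas as pd

def cross_in_chunks(left, right, chunk_rows=1000):
    """Yield the cartesian product piece by piece (hypothetical helper)."""
    for start in range(0, len(left), chunk_rows):
        # Take the next slice of rows from the left frame.
        piece = left.iloc[start:start + chunk_rows]
        # Same temporary-key merge as before, applied to this slice only.
        yield pd.merge(piece.assign(key=1), right.assign(key=1),
                       on="key").drop("key", axis=1)

data1 = pd.DataFrame({"P": [1, 3, 5]})
data2 = pd.DataFrame({"Q": [2, 4, 6]})

# Each part holds at most chunk_rows * len(data2) rows.
for part in cross_in_chunks(data1, data2, chunk_rows=2):
    print(part.shape)
```

Each yielded part can be written to disk, aggregated, or filtered before the next one is produced, so only one slice of the product needs to fit in memory at a time.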