
How to get a Cartesian product of a huge dataset using Pandas in Python?

Last Updated : 21 Apr, 2022

In this article, we will discuss how to compute the Cartesian product of two datasets, that is, every row of the first dataset paired with every row of the second. The function used here is merge, the entry point for all standard database join operations between DataFrame objects. The result has len(data1) * len(data2) rows, so for huge datasets it grows very quickly.
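For comparison, pandas 1.2 and later also accept how='cross' in merge, which builds the Cartesian product directly without a helper key column. This is a minimal sketch, assuming pandas 1.2 or newer; the column names P and Q are only illustrative.

```python
import pandas as pd

# Two small illustrative datasets
data1 = pd.DataFrame({'P': [1, 3, 5]})
data2 = pd.DataFrame({'Q': [2, 4, 6]})

# how='cross' pairs every row of data1 with every row of data2
# (available in pandas 1.2 and later); no temporary key column is needed
data3 = pd.merge(data1, data2, how='cross')
print(data3)
```

On older pandas versions, the key-column approach described below is the usual workaround.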

Syntax:

data1 = pd.DataFrame({'dataset_name_1': [dataset_1]})

data2 = pd.DataFrame({'dataset_name_2': [dataset_2]})

data3 = pd.merge(data1.assign(key=1), data2.assign(key=1), on='key').drop('key', axis=1)

Parameters:

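  • dataset_name_1, dataset_name_2: The column names of the two datasets whose Cartesian product is computed. Use different names for the two datasets.
  • dataset_1, dataset_2: The values of each dataset, written inside the list brackets (for example, 1, 3, 5).
  • data1: The first DataFrame object.
  • data2: The second DataFrame object.
  • on: The column to join on. Here it is the temporary key column that assign(key=1) adds to both DataFrames; because every row has the same key value, each row of data1 matches every row of data2. drop('key', axis=1) then removes this helper column from the result.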
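To make the role of assign(key=1) and on='key' concrete, the sketch below keeps the intermediate merged frame before the helper column is dropped. The data values are illustrative.

```python
import pandas as pd

data1 = pd.DataFrame({'P': [1, 3, 5]})
data2 = pd.DataFrame({'Q': [2, 4, 6]})

# assign(key=1) returns a new DataFrame with a constant column 'key';
# data1 and data2 themselves are left unchanged
left = data1.assign(key=1)
right = data2.assign(key=1)

# Every row carries the same key, so each left row matches every right row
merged = pd.merge(left, right, on='key')
print(merged)           # columns: P, key, Q

# Dropping the helper column leaves only the original columns
result = merged.drop('key', axis=1)
print(result.shape)     # (9, 2)
```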

Stepwise Implementation:

Step 1: First, import the Pandas library.

import pandas as pd

Step 2: Then, create the datasets whose Cartesian product you want to compute.

data1 = pd.DataFrame({'column_name_1': [dataset_1]})
data2 = pd.DataFrame({'column_name_2': [dataset_2]})
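If both datasets use the same column name, merge keeps both columns and, by default, appends the suffixes _x and _y to tell them apart. A small sketch with illustrative values:

```python
import pandas as pd

# Both frames reuse the same column name
data1 = pd.DataFrame({'column_name': [1, 3]})
data2 = pd.DataFrame({'column_name': [2, 4]})

data3 = pd.merge(data1.assign(key=1), data2.assign(key=1),
                 on='key').drop('key', axis=1)

# Overlapping non-key columns receive the default suffixes '_x' and '_y'
print(list(data3.columns))  # ['column_name_x', 'column_name_y']
```

To choose your own labels instead, merge also accepts a suffixes argument, for example suffixes=('_1', '_2').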

Step 3: Next, use the merge function to compute the Cartesian product: add a constant key column to both DataFrames, join on it, and drop it from the result.

data3 = pd.merge(data1.assign(key=1), data2.assign(key=1),
                 on='key').drop('key', axis=1)
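One way to check the result of this step is to compare it with Python's itertools.product, which enumerates the same pairs. This is a sketch using illustrative data; it relies on the merged rows coming out in left-then-right order, as in the output shown later in this article.

```python
import itertools
import pandas as pd

data1 = pd.DataFrame({'P': [1, 3, 5]})
data2 = pd.DataFrame({'Q': [2, 4, 6]})

data3 = pd.merge(data1.assign(key=1), data2.assign(key=1),
                 on='key').drop('key', axis=1)

# itertools.product yields (P, Q) pairs with the first input varying slowest
expected = list(itertools.product(data1['P'], data2['Q']))
# itertuples(index=False, name=None) yields each row as a plain tuple
actual = list(data3.itertuples(index=False, name=None))

print(len(data3) == len(data1) * len(data2))  # True
print(actual == expected)                     # True
```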

Step 4: Finally, print the resulting Cartesian product.

print(data3)

Example:

# Python program to get the Cartesian
# product of two datasets

# Import the Pandas library
import pandas as pd

# Create dataset 1
data1 = pd.DataFrame({'P': [1, 3, 5]})

# Create dataset 2
data2 = pd.DataFrame({'Q': [2, 4, 6]})

# Compute the Cartesian product of datasets 1 and 2
# using a temporary constant 'key' column
data3 = pd.merge(data1.assign(key=1), data2.assign(key=1),
                 on='key').drop('key', axis=1)

# Print the Cartesian product of both datasets
print(data3)


Output:

   P  Q
0  1  2
1  1  4
2  1  6
3  3  2
4  3  4
5  3  6
6  5  2
7  5  4
8  5  6
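Because the product has len(data1) * len(data2) rows, a truly huge input may not fit in memory as a single DataFrame. One possible approach, sketched below, is to merge a slice of the first dataset at a time and process each piece before moving on. The helper cross_product_chunks and its chunk_size parameter are hypothetical names introduced for this illustration, not part of pandas.

```python
import pandas as pd


def cross_product_chunks(left, right, chunk_size):
    """Yield the Cartesian product of left and right in pieces.

    Hypothetical helper: each piece pairs up to chunk_size rows of left
    with every row of right, so one piece holds at most
    chunk_size * len(right) rows.
    """
    for start in range(0, len(left), chunk_size):
        chunk = left.iloc[start:start + chunk_size]
        yield pd.merge(chunk.assign(key=1), right.assign(key=1),
                       on='key').drop('key', axis=1)


data1 = pd.DataFrame({'P': [1, 3, 5]})
data2 = pd.DataFrame({'Q': [2, 4, 6]})

# Size of the full product; worth checking before building it
print(len(data1) * len(data2))  # 9

total_rows = 0
for piece in cross_product_chunks(data1, data2, chunk_size=2):
    # Process or write out each piece here instead of keeping them all
    total_rows += len(piece)
print(total_rows)  # 9
```

A smaller chunk_size lowers peak memory at the cost of more merge calls; each piece could, for example, be appended to a file with DataFrame.to_csv(..., mode='a') rather than kept in memory.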
