SQL Query to Avoid Cartesian Product
SQL Server is a versatile database. It adheres to the principles of Relational Database Management and hence it is a popular RDBMS. As data is available in multiple tables, if SQL query is not written in an efficient manner, in major scenarios, a Cartesian product occurs. We will see that here.
In general Mathematics, a Cartesian product of two sets namely M and N is denoted with M × N. It is the set of all ordered pairs where m is in M and n is in N.
While coming to SQL, if table M is having 10 rows of data and if table N is having 20 rows of data, the Cartesian product table will have 10 * 20 = 200 rows of data. i.e. all the possible ordered pairs of the original set of elements are added. We can think of each and every primary key of the first table is mapped with a primary key of the second table. Usually, Cartesian product leads to redundant data, and that too for large datasets, it provides very huge redundant data. Hence we need to write efficient SQL queries
The Cartesian product involves a large number of computational operations that are usually redundant. Hence we need to avoid creating a Cartesian product
Process to Avoid Cartesian Product:
Step 1: Let us take a database named ‘geeks for geeks’ and in that let us take 2 tables namely ‘Fruits’ and ‘Sizes’ in that and let us have the above set of data.
USE GEEKSFORGEEKS CREATE TABLE [dbo].[FRUITS] ( [fruitName] VARCHAR (20) NULL ); INSERT INTO FRUITS VALUES('Apples'); INSERT INTO FRUITS VALUES('Mangoes'); SELECT * FROM FRUITS;
CREATE TABLE [dbo].[SIZES] ( [sizeName] VARCHAR (20) NULL ); INSERT INTO SIZES VALUES('Small'); INSERT INTO SIZES VALUES('Medium'); INSERT INTO SIZES VALUES('Big'); SELECT * FROM SIZES;
Step 2: Now if we just combine two tables to get both the columns from it, we may be writing query as:
SELECT A.*, B.* FROM FRUITS A,SIZES B;
Step 3: Now if it is with a large table, the resultant output will be very huge and most of the data are redundant. Hence we need to avoid that. It can be done by doing the ‘JOIN’ condition. Actually, now both tables are not having a common connecting column, hence we need to introduce a common connecting column.
Let us introduce ‘fruidId’ to have a unique value in the ‘Fruits’ table and the same ‘fruitId’ as the Foreign key in the ‘Sizes’ table. After applying the values:
ALTER TABLE FRUITS ADD FRUITID INT NULL; -- SET THE VALUE 1 TO APPLES AND 2 TO MANGOES SELECT * from FRUITS;
ALTER TABLE SIZES ADD FRUITID INT NULL; --LET US APPLY VALUES OF FRUITID TO 1 FOR SMALL, 2 FOR MEDIUM,1 FOR BIG SELECT * from SIZES;
Step 4: As we have a common connecting column, we can join both the tables easily with that connecting column:
SELECT A.fruitName, B.sizeName FROM FRUITS A,SIZES B WHERE A.FRUITID = B.FRUITID; --Here we need to join fruitid in both the tables --In this way, we can avoid cartesian product --As in Sizes table, fruitid is having 1 and 2 only, -- we need to get apples and mangoes only --that too apples will be having small and big sizes and -- mangoes will be having medium sizes. --We can achieve that with this query
- We need to write the queries. carefully.
- Any two tables having column name with the same data types will be used for joining.
- Especially always it is good to keep Primary Key and Foreign key relationship among tables. That will help to synchronize the results easily. In these scenarios, most Primary Key will be available in the first table with one column name and the same column name is used in the second table as a foreign key.
- Meaningful joins are good and always join and avoid the Cartesian products. Either by keeping Primary and Foreign key relationship it can be achieved or by keeping common column name we can create joins.
- Joining conditions need to be given in the ‘WHERE’ clause and in the case of ‘Group By’ operations, it should be applied in the ‘Having’ clause.
Cartesian product data is highly redundant especially when we work for large data sets and they can be corrected by means of valid joins. That too inner join (which is covered in the above example) will give exact output, that is matching values in both or more tables only are retrieved. Additionally, we have Left outer join and Right outer join but they are all needed to be used when required, otherwise, inner join is the efficient one to avoid the Cartesian product.
Please Login to comment...