How to Find Duplicate Records that Meet Certain Conditions in SQL?
In this article, we will understand how to find Duplicate Records that meet certain conditions in SQL. Using the GROUP BY and HAVING clauses we can show the duplicates in table data. The GROUP BY statement in SQL is used to arrange identical data into groups with the help of some functions. i.e if a particular column has the same values in different rows then it will arrange these rows in a group.
For the purpose of demonstration, we will be creating a Participant table in a database called “GeeksForGeeksDatabase“.
Step 1: Creating the Database
Use the below SQL statement to create a database called GeeksForGeeksDatabase.
CREATE DATABASE GeeksForGeeksDatabase;
Step 2: Using the Database
Use the below SQL statement to switch the database context to GeeksForGeeksDatabase.
Step 3: Table Definition
CREATE TABLE Geeks( GeekID INTEGER PRIMARY KEY, GeekName VARCHAR(255) NOT NULL, GeekRank INTEGER NOT NULL, GeekSchool VARCHAR(255) NOT NULL );
INSERT INTO Geeks VALUES (101, 'Nix',2 ,'Code Valley School'); INSERT INTO Geeks VALUES (102, 'Rutz',4 ,'Blue Chip School'); INSERT INTO Geeks VALUES (103, 'Shrey',1 ,'GCOEA School'); INSERT INTO Geeks VALUES (104, 'Ankx',3 ,'Round Robin Play School'); INSERT INTO Geeks VALUES (105, 'Ridz',7 ,'Dream School'); INSERT INTO Geeks VALUES (106, 'Mayo',6 ,'Silver Shining School'); INSERT INTO Geeks VALUES (107, 'Bugs',5 ,'Twinkle Star Convent');
Step 4: To see the contents of the created table we use given below query:
SELECT * FROM Geeks;
The above the geeks table has normal records with no duplicate values.
Step 5: Let’s imagine a situation where a bug occurred in the geek’s table due to some reasons few duplicates are inserted. Ideally, each row should have a unique value for GeekRank but now in our table duplicate geeks with duplicate ranks are inserted.
INSERT INTO Geeks VALUES (108, 'Maria', 5 ,'Code Valley School');
In the above newly updated table, we could see there are 2 records with the same geek rank of 5 . GeekID 107 and GeekID 108 are having the same rank of 5. Now we need to find this duplication using SQL Query.
Using GROUP BY and HAVING clause :
One way to find duplicate records from the table is the GROUP BY statement. The GROUP BY statement in SQL is used to arrange identical data into groups with the help of some functions. i.e if a particular column has the same values in different rows then it will arrange these rows in a group.
Query to find the duplicates :
Query to find the duplicate records having the same GeekRank :
SELECT GeekRank, COUNT(GeekID) AS DuplicateRanks FROM Geeks GROUP BY GeekRank HAVING COUNT(GeekRank)>1;
Using the GROUP BY and HAVING clauses we can show the duplicates in table data. By ‘GROUP BY GeekRank‘ means, to place all the rows with the same value of only that particular column i.e GeekRank in one group. We use ‘HAVING COUNT(GeekRank)>1′ to select a column having more than 1 rank of geeks in output. The COUNT() function of SQL is used here to count the duplicate rows . Here we are naming our new column as ‘DuplicateRanks‘ which count duplicate ranks.
As we can see, the rows with duplicate GeekRank are grouped under the same GeekRank and their corresponding COUNT is the count of the GeekRank of duplicate rows. GeekID 107 and GeekID 108 are having the same rank 5. Thus in the above output, we could see GeekRank as 5 (because this 5th rank is found duplicated )and since two GeekIDs were having the same GeekRank 5 so DuplicateRank i.e count of duplicate record is 2. Once you found the duplicate rows, you may choose to remove those duplicate rows using the DELETE statement.
Please Login to comment...