How to Remove Duplicate Records Except a Single Record in MySQL?
Last Updated :
09 Jul, 2024
In MySQL, remove duplicate records while preserving one representative entry using a DELETE statement with a self-join. Match records based on criteria like identical values in a column and delete duplicates based on conditions, such as retaining the record with the lowest ID, ensuring data cleanliness, and reducing redundancy in the database.
Managing data efficiently is crucial in any database system, and dealing with duplicate records is a common challenge. In this article, we will explore a practical approach to remove duplicate records except a single record in MySQL. This process is essential for maintaining data integrity and ensuring that your database contains unique and relevant information.
Eliminating Duplicate Records Except One in MySQL
When working with databases, it's not uncommon to encounter situations where duplicate records need to be cleaned up. The process involves identifying redundant data based on specific criteria and keeping only one instance of each unique record. In MySQL, this can be achieved using a combination of the DELETE statement and self-joins.
Syntax:
The main concept revolves around using a self-join to compare records within the same table and then selectively delete duplicate entries. Here's the basic syntax:
DELETE t1 FROM table_name t1
JOIN table_name t2 ON t1.column_name = t2.column_name AND t1.primary_key > t2.primary_key;
- DELETE: Specifies the action to be taken, which is the removal of records.
- table_name: The name of the table containing duplicate records.
- t1 and t2: Aliases for the same table, enabling a self-join.
- column_name: The column used to identify duplicate records.
- primary_key: The primary key column of the table, ensuring we keep one instance.
Example: Eliminating Duplicate Records Keeping Lowest ID in MySQL
Duplicate records in a MySQL table can lead to data inconsistencies. This example demonstrates how to remove duplicate records from the 'employee' table, preserving only the records with the lowest ID for each unique name.
Consider a table named employee with columns id, name, and salary. We'll insert some sample data, including duplicate records.
-- Table creation and data insertion
CREATE TABLE employee (
id INT PRIMARY KEY,
name VARCHAR(50),
salary INT
);
INSERT INTO employee (id, name, salary) VALUES
(1, 'John', 50000),
(2, 'Alice', 60000),
(3, 'Bob', 55000),
(4, 'Alice', 60000),
(5, 'Charlie', 70000),
(6, 'John', 50000);
Now, let's use the DELETE statement to remove duplicate records, keeping only the one with the lowest id.
-- Remove duplicate records, keeping the one with the lowest id
DELETE e1 FROM employee e1
JOIN employee e2 ON e1.name = e2.name AND e1.id > e2.id;
Output:
Before deleting the duplicate record:
Before deleting the duplicate record.This shows the initial state of the employee table with duplicate records. Notably, there are entries for 'Alice' and 'John' with the same salary.
AFTER deleting the duplicate record:
After deleting the duplicate record.Explanation:
This depicts the result after executing the DELETE statement. Duplicate records, such as the second 'Alice' and the second 'John,' have been removed, leaving only one instance of each unique record based on the lowest id.
To put it simply, becoming adept at the task of removing duplicate records, leaving only one unique entry in MySQL, is a valuable skill in keeping databases organized and efficient. The combination of self-joins with the DELETE statement allows you to confidently get rid of redundancy, promoting data accuracy and ensuring optimal performance for your database. Success in this process hinges on being adaptable to different table structures and carefully considering the criteria for removal. Ultimately, a well-maintained database forms the bedrock for robust applications, making these techniques indispensable for anyone engaged in database management or development.
Conclusion
The DELETE statement in MySQL can be used with a self-joined query to get rid of duplicate records while keeping a single representative record. The query checks records for matches in a column and deletes duplicate records based on conditions such as keeping the record with the least ID. This helps to keep data clean, reduce redundancy, and improve database performance, especially when dealing with repetitive entries.
FAQs on How to Remove Duplicate Records Except a Single Record in MySQL
What is the purpose of removing duplicate records in MySQL?
Removing duplicate records helps maintain data integrity, reduce redundancy, and improve database performance by ensuring that each piece of information is stored only once.
Can I use other criteria besides the primary key to determine which duplicate to keep?
Yes, you can modify the self-join condition to use other criteria, such as keeping the record with the latest timestamp or the highest value in a specific column.
Is it possible to automate the process of removing duplicate records in MySQL?
Yes, you can create a stored procedure to automate the process of identifying and removing duplicate records based on your specific criteria.
Get IBM Certification and a 90% fee refund on completing 90% course in 90 days! Take the Three 90 Challenge today.
Master Data Analysis using Excel, SQL, Python & PowerBI with this complete program and also get a 90% refund. What more motivation do you need? Start the challenge right away!
Similar Reads
How to Remove Duplicate Records Except a Single Record in MySQL?
In MySQL, remove duplicate records while preserving one representative entry using a DELETE statement with a self-join. Match records based on criteria like identical values in a column and delete duplicates based on conditions, such as retaining the record with the lowest ID, ensuring data cleanlin
4 min read
How to Remove All Duplicate Rows Except One in MariaDB?
Duplicate rows in database tables can lead to complexity and data integrity issues and affect performance. Removing all duplicate rows while keeping one instance of each unique row is important for maintaining a clean database. In this article, we will explore various methods with the help of exampl
5 min read
How to Remove All Duplicate Rows Except One in SQLite?
SQLite is a lightweight and open-source relational database management system (RDBMS). SQLite does not require any server to process since it is a serverless architecture that can run operations and queries without any server. In this article, we will understand how to remove duplicate rows except o
5 min read
How to Remove All Duplicate Rows Except One in PostgreSQL?
When working with databases, we often have duplicate rows in the table. These duplicates can arise due to various reasons such as erroneous data entry or data migration processes. Removing duplicates while retaining one instance of each is a frequent requirement in database maintenance tasks. So to
5 min read
How to Find Duplicate Records in SQL?
To find duplicate records in SQL, we can use the GROUP BY and HAVING clauses. The GROUP BY clause allows us to group values in a column, and the COUNT function in the HAVING clause shows the count of the values in a group. Using the HAVING clause with a condition of COUNT(*) > 1, we can identify
3 min read
How to Find and Remove Duplicates in Excel
Removing duplicates in Excel is essential when cleaning up data to ensure accuracy and avoid redundancy. Whether you’re working with small datasets or large spreadsheets, Excel provides built-in tools and methods to help you identify and remove duplicates effectively. This guide will walk you throug
9 min read
How to Remove Duplicates in LibreOffice?
Removing duplicates in LibreOffice is a useful way to clean up your data and make sure you have accurate and unique information in your spreadsheets. Duplicates can cause confusion and errors in data analysis and reporting, so it's important to know how to remove them efficiently. LibreOffice Calc,
5 min read
How to Remove Duplicates From Array Using VBA in Excel?
Excel VBA code to remove duplicates from a given range of cells. In the below data set we have given a list of 15 numbers in “Column A” range A1:A15. Need to remove duplicates and place unique numbers in column B. Sample Data: Cells A1:A15 Final Output: VBA Code to remove duplicates and place into n
2 min read
How to Fetch Duplicate Rows in a Table?
Identifying duplicate rows in a database table is a common requirement, especially when dealing with large datasets. Duplicates can arise due to data entry errors, system migrations, or batch processing issues. In this article, we will explain efficient SQL techniques to identify and retrieve duplic
3 min read
How to Remove Duplicates in Google Sheets
Google Sheets as a part of Google Workspace, is one of the popular cloud-based spreadsheet applications widely used for data management and analysis. It allows users to create and edit data on spreadsheets and enables us to share spreadsheets online which can be accessible from any device with inter
5 min read
How to Delete Duplicate Rows in MySQL?
Duplicate rows are a common problem in MySQL databases. Duplicate rows can cause problems with data accuracy and integrity. They can also make it difficult to query and analyze data. They can occur for a variety of reasons, such as: Data entry errorsData import/export errorsDatabase synchronization
3 min read
How to Remove duplicate elements from array in JavaScript ?
In this article, we will discuss the methods to remove duplicate elements from a Javascript array. There are various methods to remove duplicates in the array. These are the following ways: Table of Content Using filter() MethodUsing set() MethodUsing forEach() MethodUsing reduce() MethodUsing index
4 min read
How to Find Duplicate Rows in PL/SQL
Finding duplicate rows is a widespread requirement when dealing with database analysis tasks. Duplicate rows often create problems in analyzing tasks. Detecting them is very important. PL/SQL is a procedural extension for SQL. We can write custom scripts with the help of PL/SQL and thus identifying
5 min read
How to Erase Duplicates and Sort a Vector in C++?
In this article, we will learn how to remove duplicates and sort a vector in C++. The simplest method to remove the duplicates and sort the vector is by using sort() and unique() functions. Let’s take a look at an example: [GFGTABS] C++ #include <bits/stdc++.h> using namespace std; int main()
3 min read
How to Remove Duplicate Elements from an Array using Lodash ?
Removing duplicate elements from an array is necessary for data integrity and efficient processing. The approaches implemented and explained below will use the Lodash to remove duplicate elements from an array. Table of Content Using uniq methodUsing groupBy and map methodsUsing xor functionUsing un
3 min read
How to Remove Duplicate Elements from Array in Ruby?
This article focuses on discussing how to remove duplicate elements from an array in Ruby. In Ruby, an array is a fundamental data structure that stores a collection of elements in a specific order. Various methods and approaches can be used to remove duplicate elements from an array. Removing Dupli
2 min read
How To Remove Duplicates From Vector In R
A vector is a basic data structure that is used to represent an ordered collection of elements of the same data type. It is one-dimensional and can contain numeric, character, or logical values. It is to be noted that the vector in C++ and the vector in R Programming Language are not the same. In C+
4 min read
How to Delete Duplicate Records in Oracle Database?
Duplicate records in a database can lead to inefficiencies and incorrect query results. Oracle SQL provides several efficient methods for identifying and removing these duplicates, ensuring data accuracy and integrity. This article explains step-by-step how to remove duplicates using the ROWID, a un
3 min read
How to Delete Duplicate Rows in PL/SQL?
Inconsistencies and inefficiencies in data management are frequently caused by duplicate rows in a database table. Eliminating duplicate rows is a typical PL/SQL activity to maintain data integrity and improve database performance. This article will guide you on how to remove duplicated rows in PL/S
4 min read