How does MySQL handle duplicate data? Statements for filtering and removing duplicate data

Updated on: January 2017, 11

Handling duplicate data in MySQL

Some MySQL tables may contain duplicate records. Sometimes duplicates are acceptable, but in other cases they need to be removed.

This chapter shows how to prevent duplicate data in a table and how to delete duplicates that already exist.

Prevent duplicate data in tables

To guarantee uniqueness, you can define a PRIMARY KEY or a UNIQUE index on the relevant columns of a MySQL table.

Let's try an example. The table below has no indexes and no primary key, so it allows any number of duplicate records.

CREATE TABLE person_tbl
(
    first_name CHAR(20),
    last_name CHAR(20),
    sex CHAR(10)
);

If the combination of first_name and last_name must not repeat, you can define a composite primary key on the two columns. Columns that belong to a primary key cannot contain NULL, so declare them NOT NULL, as follows:

CREATE TABLE person_tbl
(
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   sex CHAR(10),
   PRIMARY KEY (last_name, first_name)
);

Once a primary key or unique index is in place, an INSERT INTO statement that would create a duplicate fails with an error.
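As a minimal sketch of that failure, assume person_tbl was created with PRIMARY KEY (last_name, first_name) as shown above. The sample values 'Smith' and 'Ann' are made up for illustration.

```sql
-- Assumes person_tbl has PRIMARY KEY (last_name, first_name).
-- 'Smith' / 'Ann' are hypothetical sample values.
INSERT INTO person_tbl (last_name, first_name)
VALUES ('Smith', 'Ann');
-- Succeeds: this name pair is not in the table yet.

INSERT INTO person_tbl (last_name, first_name)
VALUES ('Smith', 'Ann');
-- Fails with error 1062 (duplicate entry for the primary key);
-- the table still contains only one ('Smith', 'Ann') row.
```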

The difference between INSERT IGNORE INTO and INSERT INTO is that INSERT IGNORE INTO skips rows that would duplicate an existing primary key or unique key value. If the row does not exist yet, it is inserted; if it does, it is skipped. This preserves the existing data while still inserting whatever is missing.

The following example uses INSERT IGNORE INTO. It runs without error, and the second statement does not insert a duplicate row:

mysql> INSERT IGNORE INTO person_tbl (last_name, first_name)
    -> VALUES( 'Jay', 'Thomas');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT IGNORE INTO person_tbl (last_name, first_name)
    -> VALUES( 'Jay', 'Thomas');
Query OK, 0 rows affected (0.00 sec)

When a uniqueness constraint is in place, INSERT IGNORE INTO does not return an error for duplicate data; it only raises a warning. REPLACE INTO, by contrast, first deletes any existing row with the same primary key or unique key value and then inserts the new row.
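The difference between the two statements can be seen in a short session. This is a sketch that assumes the ('Jay', 'Thomas') row from the previous example already exists with sex left as NULL; the value 'M' is a made-up sample.

```sql
-- Assumes ('Jay', 'Thomas') is already in person_tbl with sex = NULL.
INSERT IGNORE INTO person_tbl (last_name, first_name, sex)
VALUES ('Jay', 'Thomas', 'M');
SHOW WARNINGS;
-- Lists the duplicate-entry warning (code 1062) for the skipped row.
-- The existing row is untouched, so sex is still NULL.

REPLACE INTO person_tbl (last_name, first_name, sex)
VALUES ('Jay', 'Thomas', 'M');
-- Reports 2 rows affected: the old row is deleted, then the new one is inserted.

SELECT last_name, first_name, sex
FROM person_tbl
WHERE last_name = 'Jay' AND first_name = 'Thomas';
-- sex is now 'M'.
```

Because REPLACE INTO deletes the old row, any column not listed in the statement falls back to its default value rather than keeping its previous content.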

Another way to set the uniqueness of your data is to add a UNIQUE index like this:

CREATE TABLE person_tbl
(
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   sex CHAR(10),
   UNIQUE (last_name, first_name)
);
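If the table already exists without a key, the same constraint can be added afterwards. The sketch below assumes a person_tbl with no key on these columns yet; uniq_name is a hypothetical index name chosen for illustration.

```sql
-- uniq_name is a hypothetical index name.
ALTER TABLE person_tbl
    ADD UNIQUE INDEX uniq_name (last_name, first_name);
-- Fails with a duplicate-entry error if the table already
-- contains repeated (last_name, first_name) pairs.

-- Inspect the indexes defined on the table.
SHOW INDEX FROM person_tbl;
```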

Counting duplicate data

The following query counts how many times each first_name and last_name combination is repeated in the table:

mysql> SELECT COUNT(*) as repetitions, last_name, first_name
    -> FROM person_tbl
    -> GROUP BY last_name, first_name
    -> HAVING repetitions > 1;

The query above returns each last_name and first_name combination that appears more than once in person_tbl, along with the number of times it appears. In general, to query for duplicate values, do the following:

Determine which columns may contain duplicate values.
List those columns in the SELECT list, together with COUNT(*).
List the same columns in the GROUP BY clause.
Add a HAVING clause that keeps only the groups whose count is greater than 1.
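The steps above can be tried on a small data set. This sketch uses a hypothetical table person_demo with no uniqueness constraint and made-up sample rows.

```sql
-- person_demo is a hypothetical table with no key, so duplicates are allowed.
CREATE TABLE person_demo
(
    first_name CHAR(20),
    last_name CHAR(20),
    sex CHAR(10)
);

-- Made-up sample rows: ('Jay', 'Thomas') appears twice.
INSERT INTO person_demo (last_name, first_name, sex) VALUES
    ('Jay', 'Thomas', 'M'),
    ('Jay', 'Thomas', 'M'),
    ('Smith', 'Ann', 'F');

SELECT COUNT(*) AS repetitions, last_name, first_name
FROM person_demo
GROUP BY last_name, first_name
HAVING repetitions > 1;
-- Returns one row: repetitions = 2, last_name = 'Jay', first_name = 'Thomas'.
```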

Filtering duplicate data

If you need to read unique data, you can use the DISTINCT keyword in the SELECT statement to filter duplicate data.

mysql> SELECT DISTINCT last_name, first_name
    -> FROM person_tbl;

You can also use GROUP BY to read unique data from a table:

mysql> SELECT last_name, first_name
    -> FROM person_tbl
    -> GROUP BY last_name, first_name;
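To see how many rows the filter removes, the total row count can be compared with the number of distinct name pairs. This is a small sketch; total_rows and unique_names are illustrative aliases.

```sql
-- total_rows and unique_names are illustrative column aliases.
SELECT COUNT(*) AS total_rows,
       COUNT(DISTINCT last_name, first_name) AS unique_names
FROM person_tbl;
-- If total_rows is larger than unique_names, the table contains duplicates.
```

COUNT(DISTINCT ...) with several expressions is MySQL syntax, and it skips rows in which any of the listed columns is NULL.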

Removing duplicate data

If you want to delete duplicate data in the data table, you can use the following SQL statement:

mysql> CREATE TABLE tmp SELECT last_name, first_name, sex
    ->                  FROM person_tbl
    ->                  GROUP BY last_name, first_name, sex;
mysql> DROP TABLE person_tbl;
mysql> ALTER TABLE tmp RENAME TO person_tbl;
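CREATE TABLE ... SELECT copies the data but not the indexes of the original table, and dropping person_tbl right away leaves no fallback. A more cautious variant might look like the sketch below; person_new and person_old are hypothetical table names.

```sql
-- person_new and person_old are hypothetical table names.
-- LIKE copies the column definitions and indexes, but no rows.
CREATE TABLE person_new LIKE person_tbl;

-- Copy each distinct row exactly once.
INSERT INTO person_new (last_name, first_name, sex)
SELECT DISTINCT last_name, first_name, sex
FROM person_tbl;

-- Swap the tables in a single statement; the original is kept as person_old.
RENAME TABLE person_tbl TO person_old,
             person_new TO person_tbl;

-- After checking the result, the backup can be removed:
-- DROP TABLE person_old;
```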

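You can also remove duplicate records by adding an INDEX or a PRIMARY KEY to the table with ALTER IGNORE TABLE, which drops the rows that violate the new key. Note that the IGNORE clause of ALTER TABLE was removed in MySQL 5.7.4, so the statement below only works on older versions: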

mysql> ALTER IGNORE TABLE person_tbl
    -> ADD PRIMARY KEY (last_name, first_name);
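Where ALTER IGNORE TABLE is not available, a similar result can be sketched by copying the rows into a keyed table with INSERT IGNORE INTO. The table name person_keyed is hypothetical, and the sketch assumes that last_name and first_name contain no NULL values.

```sql
-- person_keyed is a hypothetical table name.
-- Assumes last_name and first_name contain no NULL values.
CREATE TABLE person_keyed
(
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   sex CHAR(10),
   PRIMARY KEY (last_name, first_name)
);

-- Rows whose (last_name, first_name) pair is already present are skipped.
INSERT IGNORE INTO person_keyed (last_name, first_name, sex)
SELECT last_name, first_name, sex
FROM person_tbl;

-- Compare the row counts before replacing the original table.
SELECT COUNT(*) FROM person_tbl;
SELECT COUNT(*) FROM person_keyed;
```

Which of several duplicate rows is kept depends on the order in which the SELECT returns them, so this is only suitable when the duplicates are interchangeable.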
