【Question Title】:Delete one of each pair of duplicates that appear up to 2 times in a MySQL table column
【Posted】:2017-05-29 22:47:49
【Question】:

I have a table of about 10,000 records with duplicates in the title column, some of them repeated more than 5 times.

Sample data

id| titleslug | views
--------------------
1 |the-box|  200
2 |the-box|  100
3 |the-box|   10
4 |the-man|   15
5 |the-man|   30
6 |the-cup|   10
7 |the-cup|   20

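'the-box' appears 3 times, so I want to leave it alone, but 'the-man' and 'the-cup' each appear twice, and I want to delete one row of each so that the final table becomes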

id| titleslug | views
--------------------
1 |the-box|  200
2 |the-box|  100
3 |the-box|   10
5 |the-man|   30
7 |the-cup|   20

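If possible, I would like to add the views of the deleted row to the retained row, the one with the highest views.

With the query below, I can see how many times each item is repeated.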

select titleslug, count(*) as c from articles
group by titleslug having c > 1
order by c desc
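
To list only the titles that occur exactly twice, the HAVING clause of the counting query can be narrowed. A minimal sketch against the same articles table; the aliases min_id, max_id and total_views are illustrative names, not part of the original schema:

```sql
-- Titles that occur exactly twice, with the ids involved and their combined views.
SELECT titleslug,
       MIN(id)    AS min_id,
       MAX(id)    AS max_id,
       SUM(views) AS total_views
FROM articles
GROUP BY titleslug
HAVING COUNT(*) = 2;
```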

I want to delete one of the records that are duplicated exactly twice and keep the rest. I am considering the following query

 DELETE a
    FROM articles as a, articles as b
    WHERE
     (a.titleslug = b.titleslug OR a.titleslug IS NULL AND b.titleslug IS NULL)
      AND a.views < b.views;

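But I need help restricting it so that a row is deleted only when a title has exactly two duplicates.

I used the query below. It reported affected rows, but when I queried the table afterwards the duplicates did not seem to have been deleted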

DELETE a
  FROM articles_copy a
  JOIN (SELECT MAX(t.Views) AS max_a1, t.TitleSlug
          FROM articles_copy t
      GROUP BY t.TitleSlug, t.Views
        HAVING COUNT(*)>1 AND COUNT(*)<=2) b ON b.TitleSlug = a.TitleSlug
                              AND b.max_a1 > a.View
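
One likely reason the query above misses the intended rows: grouping by both TitleSlug and Views makes each group a distinct (slug, views) combination, so COUNT(*) only counts rows that share both values. Grouping by TitleSlug alone counts rows per title. A minimal sketch, assuming the column is actually named Views (the a.View in the query above looks like a typo) and using max_views as an illustrative alias:

```sql
-- Delete the lower-viewed row of every title that occurs exactly twice.
-- If both rows of a pair have the same Views value, neither is deleted.
DELETE a
FROM articles_copy AS a
INNER JOIN (
    SELECT t.TitleSlug, MAX(t.Views) AS max_views
    FROM articles_copy AS t
    GROUP BY t.TitleSlug
    HAVING COUNT(*) = 2
) AS b ON b.TitleSlug = a.TitleSlug
      AND b.max_views > a.Views;
```

This only removes rows; it does not carry the deleted row's views over to the row that is kept.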

【Comments】:

  • DELETE FROM table WHERE col1 IN (SELECT id FROM table GROUP BY id HAVING (COUNT(col1) > 1))
  • Try grouping by titleslug. Let me know if that helps. Otherwise please provide some sample data and I will try to write a query based on it
  • @YashveerSingh Could you look at my query and suggest why it is not working?
  • When I try your query: DELETE FROM articles_copy WHERE TitleSlug IN ( SELECT TitleSlug FROM articles_copy GROUP BY TitleSlug HAVING ( COUNT(TitleSlug) > 1 AND COUNT(TitleSlug)

Tags: mysql duplicates


【Solution 1】:

One option could be the following (check it for performance issues):

mysql> DROP TABLE IF EXISTS `articles`;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE IF NOT EXISTS `articles` (
    ->   `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    ->   `title` VARCHAR(25) NOT NULL,
    ->   `views` INT UNSIGNED
    -> );
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO `articles`
    ->   (`title`, `views`)
    -> VALUES
    ->   ('the-box', 200),
    ->   ('the-box', 100),
    ->   ('the-box', 10),
    ->   ('the-man', 15),
    ->   ('the-man', 30),
    ->   ('the-cup', 10),
    ->   ('the-cup', 20);
Query OK, 7 rows affected (0.00 sec)
Records: 7  Duplicates: 0  Warnings: 0

mysql> SELECT
    ->   `id`,
    ->   `title`,
    ->   `views`
    -> FROM
    ->   `articles`;
+----+---------+-------+
| id | title   | views |
+----+---------+-------+
|  1 | the-box |   200 |
|  2 | the-box |   100 |
|  3 | the-box |    10 |
|  4 | the-man |    15 |
|  5 | the-man |    30 |
|  6 | the-cup |    10 |
|  7 | the-cup |    20 |
+----+---------+-------+
7 rows in set (0.00 sec)

mysql> START TRANSACTION;
Query OK, 0 rows affected (0.00 sec)

mysql> UPDATE `articles`
    ->   INNER JOIN (
    ->     SELECT MAX(`id`) `id`, SUM(`views`) `views`
    ->     FROM `articles`
    ->     GROUP BY `title`
    ->     HAVING COUNT(`title`) = 2
    ->   ) `der`
    -> SET `articles`.`views` = `der`.`views`
    -> WHERE `articles`.`id` = `der`.`id`;
Query OK, 2 rows affected (0.00 sec)
Rows matched: 2  Changed: 2  Warnings: 0

mysql> DELETE FROM `articles`
    -> WHERE `id` IN (SELECT MIN(`der`.`id`)
    ->                FROM (SELECT `id`, `title`
    ->                      FROM `articles`) `der`
    ->                GROUP BY `der`.`title`
    ->                HAVING COUNT(`der`.`title`) = 2);
Query OK, 2 rows affected (0.00 sec)

mysql> COMMIT;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT
    ->   `id`,
    ->   `title`,
    ->   `views`
    -> FROM
    ->   `articles`;
+----+---------+-------+
| id | title   | views |
+----+---------+-------+
|  1 | the-box |   200 |
|  2 | the-box |   100 |
|  3 | the-box |    10 |
|  5 | the-man |    45 |
|  7 | the-cup |    30 |
+----+---------+-------+
5 rows in set (0.00 sec)
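
The session above keeps the row with the highest id in each pair, which in the sample data also happens to be the row with the most views. If the row to keep should instead be chosen by view count, as the question asks, a variant along these lines may work. This is a minimal sketch, assuming the `articles` table from the session, non-NULL `views`, and ties broken in favour of the higher `id`; the aliases `kept`, `dropped` and `pair` are illustrative.

```sql
START TRANSACTION;

-- Step 1: for each title that occurs exactly twice, add the other row's
-- views to the row with more views (ties go to the higher id).
UPDATE `articles` AS `kept`
  INNER JOIN `articles` AS `dropped`
          ON `dropped`.`title` = `kept`.`title`
         AND `dropped`.`id` <> `kept`.`id`
  INNER JOIN (
    SELECT `title`
    FROM `articles`
    GROUP BY `title`
    HAVING COUNT(*) = 2
  ) AS `pair` ON `pair`.`title` = `kept`.`title`
SET `kept`.`views` = `kept`.`views` + `dropped`.`views`
WHERE `kept`.`views` > `dropped`.`views`
   OR (`kept`.`views` = `dropped`.`views` AND `kept`.`id` > `dropped`.`id`);

-- Step 2: delete the other row of each pair. After step 1 the kept row
-- still wins under the same comparison, so the same rule picks the loser.
DELETE `dropped`
FROM `articles` AS `dropped`
  INNER JOIN `articles` AS `kept`
          ON `kept`.`title` = `dropped`.`title`
         AND `kept`.`id` <> `dropped`.`id`
  INNER JOIN (
    SELECT `title`
    FROM `articles`
    GROUP BY `title`
    HAVING COUNT(*) = 2
  ) AS `pair` ON `pair`.`title` = `dropped`.`title`
WHERE `kept`.`views` > `dropped`.`views`
   OR (`kept`.`views` = `dropped`.`views` AND `kept`.`id` > `dropped`.`id`);

COMMIT;
```

On the sample data this should leave the same five rows as the session above (45 views for 'the-man', 30 for 'the-cup'). It has not been run against the original 10,000-row table.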

【Comments】:

  • This was a great help, you are a star
  • Your update query took longer, so I used DELETE a FROM articles_copy a JOIN (SELECT MAX(t.Views) AS max_a1, t.TitleSlug FROM articles_copy t GROUP BY t.TitleSlug HAVING COUNT()>1 AND COUNT() a.View which took 1.335 seconds on 12K rows
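
If the UPDATE is slow, an index on the grouping column may help, since both statements in the solution group and join on it. A minimal sketch, assuming the `articles` table and `title` column from the solution; the index name `idx_articles_title` is made up for illustration and the effect has not been measured here:

```sql
-- Hypothetical index name. An index on `title` may let GROUP BY `title`
-- avoid sorting the whole table.
CREATE INDEX `idx_articles_title` ON `articles` (`title`);

-- Inspect the plan of the grouping subquery to see whether the index is used.
EXPLAIN
SELECT MAX(`id`) AS `id`, SUM(`views`) AS `views`
FROM `articles`
GROUP BY `title`
HAVING COUNT(`title`) = 2;
```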