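【Posted】:2013-12-12 12:34:43
【Question】:
I have a customer database full of normalized addresses, and it contains duplicates.
Each user created their own record and entered their own address, so there is a one-to-one relationship between users and addresses: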
CREATE TABLE `users` (
`UserID` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`Name` VARCHAR(63),
`Email` VARCHAR(63),
`AddressID` INT UNSIGNED,
PRIMARY KEY (`UserID`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `addresses` (
`AddressID` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`Duplicate` VARCHAR(1),
`Address1` VARCHAR(63) DEFAULT NULL,
`Address2` VARCHAR(63) DEFAULT NULL,
`City` VARCHAR(63) DEFAULT NULL,
`State` VARCHAR(2) DEFAULT NULL,
`ZIP` VARCHAR(10) DEFAULT NULL,
PRIMARY KEY (`AddressID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
And the data:
INSERT INTO `users` VALUES
(1, 'Michael', 'michael@email.com', 1),
(2, 'Steve', 'steve@email.com', 2),
(3, 'Judy', 'judy@email.com', 3),
(4, 'Kathy', 'kathy@email.com', 4),
(5, 'Mark', 'mark@email.com', 5),
(6, 'Robert', 'robert@email.com', 6),
(7, 'Susan', 'susan@email.com', 7),
(8, 'Paul', 'paul@email.com', 8),
(9, 'Patrick', 'patrick@email.com', 9),
(10, 'Mary', 'mary@email.com', 10),
(11, 'James', 'james@email.com', 11),
(12, 'Barbara', 'barbara@email.com', 12),
(13, 'Peter', 'peter@email.com', 13);
INSERT INTO `addresses` VALUES
(1, '', '1234 Main Street', '', 'Springfield', 'KS', '54321'),
(2, 'Y', '1234 Main Street', '', 'Springfield', 'KS', '54321'),
(3, 'Y', '1234 Main Street', '', 'Springfield', 'KS', '54321'),
(4, '', '5678 Sycamore Lane', '', 'Upstate', 'NY', '50000'),
(5, '', '1000 State Street', 'Apt C', 'Sunnydale', 'OH', '54321'),
(6, 'Y', '1234 Main Street', '', 'Springfield', 'KS', '54321'),
(7, 'Y', '1000 State Street', 'Apt C', 'Sunnydale', 'OH', '54321'),
(8, 'Y', '1234 Main Street', '', 'Springfield', 'KS', '54321'),
(9, '', '1000 State Street', 'Apt A', 'Sunnydale', 'OH', '54321'),
(10, 'Y', '1234 Main Street', '', 'Springfield', 'KS', '54321'),
(11, 'Y', '5678 Sycamore Lane', '', 'Upstate', 'NY', '50000'),
(12, 'Y', '1000 Main Street', 'Apt A', 'Sunnydale', 'OH', '54321'),
(13, '', '9999 Valleyview', '', 'Springfield', 'KS', '54321');
Oh yes, let me add the foreign key relationship:
ALTER TABLE `users` ADD CONSTRAINT `AddressID`
FOREIGN KEY `AddressID` (`AddressID`)
REFERENCES `addresses` (`AddressID`);
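Our address list is cleaned by a third-party service, which normalizes the data and flags where we have duplicates. That is where the Duplicate column comes from: a 'Y' means the row is a copy of another address. As the sample data shows, the master address is not flagged as a duplicate.
I obviously want to delete all the duplicate records, but there are user records pointing at them. I need those users to point at the non-duplicate version of the address.
So how do I update AddressID in users to match the non-duplicate address?
The only approach I can think of is looping over all the data in a high-level language, but I'm fairly sure MySQL has all the tools needed to do this kind of thing in a better way.
Here is what I have tried: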
SELECT COUNT(*) as cnt, GROUP_CONCAT(AddressID ORDER BY AddressID) AS ids
FROM addresses
GROUP BY Address1, Address2, City, State, ZIP
HAVING cnt > 1;
+-----+--------------+
| cnt | ids          |
+-----+--------------+
|   2 | 5,7          |
|   6 | 1,2,3,6,8,10 |
|   2 | 4,11         |
+-----+--------------+
3 rows in set (0.00 sec)
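Instead of a comma-separated list per group, the same grouping can be expressed as one row per duplicate, paired with its master. The following is only a sketch: it assumes that every master row has an empty (or NULL) Duplicate value, that every copy has 'Y', and that a copy matches its master exactly on all five address columns. The aliases d and m and the output column names are made up for illustration.

```sql
-- Sketch: list each flagged duplicate next to the unflagged row it copies.
-- <=> is MySQL's NULL-safe equality, so NULL address parts still match.
SELECT d.AddressID AS DuplicateID,
       m.AddressID AS MasterID
FROM addresses AS d
JOIN addresses AS m
  ON  m.Address1 <=> d.Address1
  AND m.Address2 <=> d.Address2
  AND m.City     <=> d.City
  AND m.State    <=> d.State
  AND m.ZIP      <=> d.ZIP
WHERE d.Duplicate = 'Y'
  AND COALESCE(m.Duplicate, '') <> 'Y'
ORDER BY MasterID, DuplicateID;
```

With the sample data this should pair 2, 3, 6, 8 and 10 with 1, pair 11 with 4, and pair 7 with 5. Row 12 ('1000 Main Street', 'Apt A') is flagged 'Y' but has no unflagged row with identical columns, so it would not appear in the result at all.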
From there, I can loop over each result row and do something like this:
UPDATE `users` SET `AddressID` = 1 WHERE `AddressID` IN (2,3,6,8,10);
But there must be a better, MySQL-only way, mustn't there?
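One candidate is MySQL's multi-table UPDATE, which would replace the per-group loop with a single statement. This is a hedged sketch, not a tested solution: it assumes each duplicate has exactly one unflagged master with identical address columns (if several masters matched, which one wins is not defined), and the aliases u, d and m are illustrative.

```sql
-- Sketch: repoint every user whose address is flagged 'Y'
-- to the matching unflagged (master) address.
UPDATE users AS u
JOIN addresses AS d
  ON d.AddressID = u.AddressID          -- the address the user has now
JOIN addresses AS m                     -- the master copy of that address
  ON  m.Address1 <=> d.Address1
  AND m.Address2 <=> d.Address2
  AND m.City     <=> d.City
  AND m.State    <=> d.State
  AND m.ZIP      <=> d.ZIP
SET u.AddressID = m.AddressID
WHERE d.Duplicate = 'Y'
  AND COALESCE(m.Duplicate, '') <> 'Y';
```

Because the new AddressID always refers to an existing row in addresses, the foreign key should not object. Users whose flagged address has no exact unflagged match are left untouched by the inner join, so they would need to be handled separately.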
When all is said and done, the data should look like this:
SELECT * FROM `users`;
+--------+---------+-------------------+-----------+
| UserID | Name    | Email             | AddressID |
+--------+---------+-------------------+-----------+
|      1 | Michael | michael@email.com |         1 |
|      2 | Steve   | steve@email.com   |         1 |
|      3 | Judy    | judy@email.com    |         1 |
|      4 | Kathy   | kathy@email.com   |         4 |
|      5 | Mark    | mark@email.com    |         5 |
|      6 | Robert  | robert@email.com  |         1 |
|      7 | Susan   | susan@email.com   |         5 |
|      8 | Paul    | paul@email.com    |         1 |
|      9 | Patrick | patrick@email.com |         9 |
|     10 | Mary    | mary@email.com    |         1 |
|     11 | James   | james@email.com   |         4 |
|     12 | Barbara | barbara@email.com |         1 |
|     13 | Peter   | peter@email.com   |        13 |
+--------+---------+-------------------+-----------+
13 rows in set (0.00 sec)
SELECT * FROM `addresses`;
+-----------+-----------+--------------------+----------+-------------+-------+-------+
| AddressID | Duplicate | Address1           | Address2 | City        | State | ZIP   |
+-----------+-----------+--------------------+----------+-------------+-------+-------+
|         1 |           | 1234 Main Street   |          | Springfield | KS    | 54321 |
|         4 |           | 5678 Sycamore Lane |          | Upstate     | NY    | 50000 |
|         5 |           | 1000 State Street  | Apt C    | Sunnydale   | OH    | 54321 |
|         9 |           | 1000 State Street  | Apt A    | Sunnydale   | OH    | 54321 |
|        13 |           | 9999 Valleyview    |          | Springfield | KS    | 54321 |
+-----------+-----------+--------------------+----------+-------------+-------+-------+
5 rows in set (0.00 sec)
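Getting addresses into that state would mean deleting the flagged rows once no user references them any more. A cautious sketch, assuming the users have already been repointed by some earlier step, is a multi-table DELETE that only removes flagged rows with no remaining user, so the foreign key is never violated. The alias names are illustrative.

```sql
-- Optional check first: flagged addresses that some user still points at.
SELECT u.UserID, u.AddressID
FROM users AS u
JOIN addresses AS a ON a.AddressID = u.AddressID
WHERE a.Duplicate = 'Y';

-- Sketch: delete only flagged addresses that no user references.
DELETE a
FROM addresses AS a
LEFT JOIN users AS u ON u.AddressID = a.AddressID
WHERE a.Duplicate = 'Y'
  AND u.UserID IS NULL;
```

Any flagged row that still has a user attached is kept rather than deleted. Since both tables are InnoDB, running the update and the delete inside one transaction (START TRANSACTION ... COMMIT) would allow a rollback if the check query returns unexpected rows.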
Help?
【Discussion】:
Tags: php mysql sql normalization duplicate-removal