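【Posted】: 2010-10-06 11:00:08
【Question】:
I was testing something in Oracle and populated a table with some sample data, but in the process I accidentally loaded duplicate records, so now I can't create a primary key using some of the columns.
How can I delete all the duplicate rows and keep only one of them?
【Comments】:
Tags: sql oracle duplicates delete-row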
Use the rowid pseudocolumn.
DELETE FROM your_table
WHERE rowid not in
(SELECT MIN(rowid)
FROM your_table
GROUP BY column1, column2, column3);
Here column1, column2, and column3 make up the identifying key for each record. You may list all of your columns.
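Before running the delete, it can help to preview what it would remove. The following is a minimal sketch that reuses the placeholder names your_table and column1 to column3 from the statement above; the alias rid is only illustrative.

```sql
-- Preview: the rows the DELETE above would remove, i.e. every copy
-- except the one with the lowest rowid in each key group.
SELECT t.rowid AS rid, t.column1, t.column2, t.column3
FROM your_table t
WHERE t.rowid NOT IN
      (SELECT MIN(rowid)
       FROM your_table
       GROUP BY column1, column2, column3)
ORDER BY t.column1, t.column2, t.column3;
```

If the result looks right, the same predicate can be used in the DELETE unchanged.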
【Discussion】:
From Ask Tom:
delete from t
where rowid IN ( select rid
from (select rowid rid,
row_number() over (partition by
companyid, agentid, class , status, terminationdate
order by rowid) rn
from t)
where rn <> 1);
(fixed the missing parenthesis)
【Discussion】:
From DevX.com:
DELETE FROM our_table
WHERE rowid not in
(SELECT MIN(rowid)
FROM our_table
GROUP BY column1, column2, column3...) ;
Here column1, column2, etc. are the key you want to use.
【Discussion】:
DELETE FROM tablename a
WHERE a.ROWID > ANY (SELECT b.ROWID
FROM tablename b
WHERE a.fieldname = b.fieldname
AND a.fieldname2 = b.fieldname2)
【Discussion】:
delete from emp
where rowid not in
(select max(rowid) from emp group by empno);
delete from emp where rowid in
(
select rid from
(
select rowid rid,
row_number() over(partition by empno order by empno) rn
from emp
)
where rn > 1
);
delete from emp e1
where rowid not in
(select max(rowid) from emp e2
where e1.empno = e2.empno );
【Discussion】:
create table t2 as select distinct * from t1;
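【Discussion】:
distinct * only collapses rows that are identical in every column; rows that differ by even one character in any column are all kept. You only need distinct values for the columns you want to build the primary key on. Bill's answer is a good example of that approach.
You should write a small PL/SQL block with a cursor for loop and delete the rows you don't want to keep. For example: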
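declare
  prev_var my_table.var1%TYPE;  -- starts as NULL, so the first row is never deleted
begin
  for t in (select rowid rid, var1 from my_table order by var1) loop
    -- if the previous var1 equals the current one, delete the row; otherwise keep going
    -- (assumes var1 alone identifies a duplicate)
    if t.var1 = prev_var then
      delete from my_table where rowid = t.rid;
    end if;
    prev_var := t.var1;
  end loop;
end;
/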
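To make the point about keying on specific columns concrete, here is a set-based sketch of the same idea as a CREATE TABLE AS SELECT. It assumes var1 is the intended key; other_col and my_table_dedup are hypothetical names used only for illustration.

```sql
-- Keep exactly one row per var1 value; rn = 1 is the copy with the lowest rowid.
-- other_col stands in for the remaining columns of my_table (hypothetical).
create table my_table_dedup as
select var1, other_col
from (select t.var1,
             t.other_col,
             row_number() over (partition by t.var1 order by t.rowid) rn
      from my_table t)
where rn = 1;
```

Unlike distinct *, this keeps a single row per key even when the non-key columns differ between copies.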
【Discussion】:
To select the duplicates, the query can take this form:
SELECT GroupFunction(column1), GroupFunction(column2),...,
COUNT(column1), column1, column2...
FROM our_table
GROUP BY column1, column2, column3...
HAVING COUNT(column1) > 1
So, following the other suggestions, the correct query is:
DELETE FROM tablename a
WHERE a.ROWID > ANY (SELECT b.ROWID
FROM tablename b
WHERE a.fieldname = b.fieldname
AND a.fieldname2 = b.fieldname2
-- AND ... and so on, to identify the duplicate rows
);
For the criteria chosen in the WHERE clause, this query keeps the oldest record (the one with the lowest rowid) in the database.
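The duplicate-finding template above can be made concrete as follows. This is only a sketch that reuses the placeholder names tablename, fieldname, and fieldname2 from the delete statement; the alias copies is illustrative.

```sql
-- List each key that occurs more than once, with the number of copies.
SELECT fieldname, fieldname2, COUNT(*) AS copies
FROM tablename
GROUP BY fieldname, fieldname2
HAVING COUNT(*) > 1
ORDER BY copies DESC;
```

Running it before and after the delete is a quick way to confirm that no duplicated keys remain.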
Oracle Certified Associate (2008)
【Discussion】:
create table abcd(id number(10), name varchar2(20));
insert into abcd values(1,'abc');
insert into abcd values(2,'pqr');
insert into abcd values(3,'xyz');
insert into abcd values(1,'abc');
insert into abcd values(2,'pqr');
insert into abcd values(3,'xyz');
select * from abcd;
id Name
1 abc
2 pqr
3 xyz
1 abc
2 pqr
3 xyz
Delete the duplicate records but keep one distinct record in the table:
DELETE
FROM abcd a
WHERE ROWID > (SELECT MIN(ROWID) FROM abcd b
WHERE b.id=a.id
);
Running the above query deletes 3 rows.
select * from abcd
id Name
1 abc
2 pqr
3 xyz
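With the duplicates gone, the primary key from the original question can be created. A minimal sketch, assuming id is the intended key; abcd_pk is a hypothetical constraint name. Note that the delete above compares only id, so rows sharing an id but carrying different names would also have been removed.

```sql
-- Should return no rows once the duplicates are gone.
SELECT id, COUNT(*) AS copies
FROM abcd
GROUP BY id
HAVING COUNT(*) > 1;

-- abcd_pk is a hypothetical constraint name.
ALTER TABLE abcd ADD CONSTRAINT abcd_pk PRIMARY KEY (id);
```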
【Discussion】:
The fastest way for really big tables
Create an exceptions table with the following structure: exceptions_table
ROW_ID ROWID
OWNER VARCHAR2(30)
TABLE_NAME VARCHAR2(30)
CONSTRAINT VARCHAR2(30)
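Try to create a unique constraint or primary key that the duplicates will violate. You will get an error message because you have duplicates. The exceptions table will then contain the rowids of the duplicate rows.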
alter table add constraint
unique --or primary key
(dupfield1,dupfield2) exceptions into exceptions_table;
Join your table to exceptions_table by rowid and delete the dups:
delete original_dups where rowid in (select ROW_ID from exceptions_table);
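If the number of rows to delete is large, create a new table (with all grants and indexes) by anti-joining to exceptions_table on rowid, then rename the original table to original_dups and rename new_table_with_no_dups to the original table name: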
create table new_table_with_no_dups AS (
select field1, field2 ........
from original_dups t1
where not exists ( select null from exceptions_table T2 where t1.rowid = t2.row_id )
)
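The steps above can be sketched end to end as follows. The table and column names are the placeholders used in this answer; original_dups_uk is a hypothetical constraint name. One assumption worth checking on your own data: as far as I know, EXCEPTIONS INTO records every row of each duplicate group, not just the extra copies, so this sketch keeps the lowest rowid per key instead of deleting everything listed.

```sql
-- Exceptions table with the structure listed above.
create table exceptions_table (
  row_id     rowid,
  owner      varchar2(30),
  table_name varchar2(30),
  constraint varchar2(30)
);

-- Expected to fail because of the duplicates, while logging the offending
-- rowids. original_dups_uk is a hypothetical constraint name.
alter table original_dups
  add constraint original_dups_uk unique (dupfield1, dupfield2)
  exceptions into exceptions_table;

-- Among the logged rows, delete all but the lowest rowid per key.
delete from original_dups
where rowid in (
  select rid
  from (select t.rowid rid,
               row_number() over (partition by t.dupfield1, t.dupfield2
                                  order by t.rowid) rn
        from original_dups t
        where t.rowid in (select row_id from exceptions_table))
  where rn > 1
);
```

After the delete, rerunning the alter table statement should succeed.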
【Discussion】:
Using rowid:
delete from emp
where rowid not in
(select max(rowid) from emp group by empno);
Using a self-join:
delete from emp e1
where rowid not in
(select max(rowid) from emp e2
where e1.empno = e2.empno );
【Discussion】:
delete from emp where rowid in
(
select rid from
(
select rowid rid,
dense_rank() over(partition by empno order by rowid
) rn
from emp
)
where rn > 1
);
【Discussion】:
1. Solution
delete from emp
where rowid not in
(select max(rowid) from emp group by empno);
2. Solution
delete from emp where rowid in
(
select rid from
(
select rowid rid,
row_number() over(partition by empno order by empno) rn
from emp
)
where rn > 1
);
3. Solution
delete from emp e1
where rowid not in
(select max(rowid) from emp e2
where e1.empno = e2.empno );
4. Solution
delete from emp where rowid in
(
select rid from
(
select rowid rid,
dense_rank() over(partition by empno order by rowid
) rn
from emp
)
where rn > 1
);
【Discussion】:
5. Solution
delete from emp where rowid in
(
select rid from
(
select rowid rid,rank() over (partition by emp_id order by rowid)rn from emp
)
where rn > 1
);
【Discussion】:
DELETE from table_name where rowid not in (select min(rowid) FROM table_name group by column_name);
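You can also delete duplicate records another way:
DELETE from table_name a where rowid > (select min(rowid) FROM table_name b where a.column_name=b.column_name);
【Discussion】:
This blog post is really helpful for the general case:
If the rows are fully duplicated (all values in all columns can have copies), there are no columns to use! But to keep one, you still need a unique identifier for each row in each group. Fortunately, Oracle already has something you can use: the rowid. All rows in Oracle have a rowid. This is a physical locator. That is, it states where on disk Oracle stores the row. This is unique to each row, so you can use this value to identify and remove copies. To do this, replace min() with min(rowid) in the uncorrelated delete: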
delete films
where rowid not in (
select min(rowid)
from films
group by title, uk_release_date
)
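To see the rowid values the quoted passage describes, a query along these lines can be run first. It is only an illustrative sketch against the films table and the title and uk_release_date columns used above.

```sql
-- Show each copy next to its rowid. Within a (title, uk_release_date) group,
-- the first row listed is the one that MIN(rowid) keeps.
select rowid, title, uk_release_date
from films
order by title, uk_release_date, rowid;
```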
【Discussion】:
DELETE FROM tableName WHERE ROWID NOT IN (SELECT MIN(ROWID) FROM tableName GROUP BY columnname);
【Discussion】:
delete from dept
where rowid in (
select rowid
from dept
minus
select max(rowid)
from dept
group by DEPTNO, DNAME, LOC
);
【Discussion】:
For the best performance, here is what I wrote:
(see the execution plan)
DELETE FROM your_table
WHERE rowid IN
  (select t1.rowid from your_table t1
   LEFT OUTER JOIN (
     -- ROWID is a reserved word in Oracle, so the aggregate gets its own alias
     SELECT MIN(rowid) as keep_rid, column1, column2, column3
     FROM your_table
     GROUP BY column1, column2, column3
   ) co1 ON (t1.rowid = co1.keep_rid)
   WHERE co1.keep_rid IS NULL
  );
【Discussion】:
Check the script below -
1.
Create table test(id int,sal int);
2.
insert into test values(1,100);
insert into test values(1,100);
insert into test values(2,200);
insert into test values(2,200);
insert into test values(3,300);
insert into test values(3,300);
commit;
3.
select * from test;
You will see 6 records here.
4. Run the query below -
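delete from test
where rowid in
  (select rid from
    (select
       rowid rid,  -- alias the rowid so the outer query can select it
       row_number() over (partition by id order by sal) dup
     from test)
   where dup > 1);
select * from test;
You will see that the duplicate records have been deleted.
Hope this resolves your query.
Thanks :)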
【Discussion】:
I don't see any answers that use common table expressions and window functions. This is what I find easiest to work with.
DELETE FROM
YourTable
WHERE
ROWID IN
(WITH Duplicates
AS (SELECT
ROWID RID,
ROW_NUMBER()
OVER(
PARTITION BY First_Name, Last_Name, Birth_Date
ORDER BY ROWID)
AS RN,
SUM(1)
OVER(
PARTITION BY First_Name, Last_Name, Birth_Date
ORDER BY ROWID ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING)
AS CNT
FROM
YourTable
WHERE
Load_Date IS NULL)
SELECT
RID
FROM
duplicates
WHERE
RN > 1);
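Notes:
1) We only check the fields in the partition clause for duplicates.
2) If you have a reason to prefer one duplicate over the others, you can use an order by clause to make that row have row_number() = 1.
3) You can change how many duplicates are kept by changing the final where clause to "WHERE RN > N" with N >= 1 (I thought N = 0 would delete all rows that have duplicates, but it would just delete all rows).
4) The Sum partition field added to the CTE query tags each row with the number of rows in its group. So to select the rows that have duplicates, including the first item, use "WHERE cnt > 1".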
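Note 4 can be illustrated with a read-only variant of the query. This is a sketch under the same assumed names (YourTable, First_Name, Last_Name, Birth_Date, Load_Date); it uses COUNT(*) OVER in place of the SUM(1) window, which should tag each row with the same group size.

```sql
-- List every row that belongs to a duplicated group, including the first copy.
WITH Duplicates AS (
  SELECT ROWID RID,
         First_Name, Last_Name, Birth_Date,
         ROW_NUMBER() OVER (PARTITION BY First_Name, Last_Name, Birth_Date
                            ORDER BY ROWID) AS RN,
         COUNT(*) OVER (PARTITION BY First_Name, Last_Name, Birth_Date) AS CNT
  FROM YourTable
  WHERE Load_Date IS NULL
)
SELECT RID, First_Name, Last_Name, Birth_Date, RN, CNT
FROM Duplicates
WHERE CNT > 1
ORDER BY First_Name, Last_Name, Birth_Date, RN;
```

Changing the final filter to RN > 1 gives exactly the rows that the delete would remove.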
【Discussion】:
Solution:
delete from emp where rowid in
(
select rid from
(
select rowid rid,
row_number() over(partition by empno order by empno) rn
from emp
)
where rn > 1
);
【Discussion】:
create or replace procedure delete_duplicate_enq as
cursor c1 is
select *
from enquiry;
begin
for z in c1 loop
delete enquiry
where enquiry.enquiryno = z.enquiryno
and rowid > any
(select rowid
from enquiry
where enquiry.enquiryno = z.enquiryno);
end loop;
end delete_duplicate_enq;
【Discussion】:
This is similar to the top answer, but it gave me a better explain plan:
delete from your_table
where rowid in (
select max(rowid)
from your_table
group by column1, column2, column3
having count(*) > 1
);
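One thing to keep in mind with the statement above: each run removes only the highest rowid of every duplicated group, so a key with three or more copies still has duplicates after a single pass. A minimal sketch that repeats the delete until nothing is left to remove, using the same placeholder names:

```sql
begin
  loop
    delete from your_table
    where rowid in (
      select max(rowid)
      from your_table
      group by column1, column2, column3
      having count(*) > 1
    );
    -- stop once a pass deletes nothing, i.e. no key occurs more than once
    exit when sql%rowcount = 0;
  end loop;
end;
/
```

The number of passes equals the largest number of extra copies of any one key, so this suits tables where most duplicates come in pairs.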
【Discussion】: