Answer:
Delete
Duplicate Records using SQL
By duplicate record means that every field in one record is
identical to every field in a different record, i.e. a duplicate
is where there is no way of telling two or more records apart. If
you just need to remove records which are similar (i.e. one or
more fields are identical but there are one or more fields which
are different) then instead refer to how to delete similar
records.
To check that you have duplicate records in your table do the
following:
select
count(*) from MyTable
and
select
distinct * from MyTable
unfortunately SQL does not permit a select count(distinct). You
have duplicate records if the number of records returned by the
second query is less than the number of records returned by the
first.
Unfortunately there is no way in SQL to delete one of these
duplicates without deleting all of them. They are identical after
all, so there is no SQL query that you could put together which
could distinguish between them.
What you can do is to copy all the distinct records into a new
table:
select
distinct *
from
MyTable
This query will create a new table (NewTable in my example)
containing all the records in the original table but without any
records being duplicated. It will therefore preserve a single
copy of those records which were duplicated.
Because this query creates (and populates) a new table, it will
fail if the table already exists.
Delete Similar Records
To delete similar records, i.e. where the records are not the
same but one field is the same and only one copy needs to be
preserved, try the following SQL:
delete
T1 from MyTable
T1, MyTable
T2
where
T1.dupField
= T2.dupField
and
T1.uniqueField
> T2.uniqueField
This will delete all records from the table MyTable which have
the same value for the field dupField, leaving that record which
has the lowest value in uniqueField.
The above SQL works on SQL-Server and MySQL but not in MS-Access.
For MS-Access try:
delete
from MyTable
where uniqueField
not in
(select
min(uniqueField)
from MyTable
T2 where T2.dupField=MyTable.dupField)
This syntax also works in SQLServer (but not in MySQL) but may
not be as efficient for large tables.
Using
Index to eliminate duplicate row
It's easy to introduce duplicate
rows of data into Oracle tables by running a data load twice
without the primary key or unique indexes created or
enabled.
Here's how you remove the duplicate rows before
the primary key or unique indexes can be created:
DELETE
FROM our_table
WHERE
rowid not in (SELECT MIN(rowid)
FROM
our_table
GROUP
BY column1, column2, column3... ;
Here column1, column2, column3
constitute the identifying key for each record. Be sure to
replace our_table with the table name from which you want
to remove the duplicate rows. The GROUP BY is used on the columns
that make the primary key for the table. This script deletes each
row in the group after the first row.
|
No comments:
Post a Comment