Delete millions of rows on a badly designed table


I have identified millions of duplicated rows in a table I inherited, using this query:

SELECT COUNT(*) AS NumRecords, AccessID, LEFT(SQLTEXT, 5000) AS SQLTextPrefix
FROM [Table]  -- placeholder name; bracketed because TABLE is a reserved word
WHERE AccessID = 5012
GROUP BY AccessID, LEFT(SQLTEXT, 5000)
HAVING COUNT(*) > 1;


The only index I can use on the table is on the AccessRequestID field. The SQLText field is VARCHAR(MAX) and there are over 100 million records, so the table is huge and anything I do with it takes a very long time. How can I turn that SELECT statement into a DELETE that removes the duplicated records? I was trying to figure out how to write a CTE using ROW_NUMBER() with PARTITION BY, but I'm not confident in it. My idea is to run it in a loop that starts with AccessID 1 and increments by one until the end of the table (there are only 5012 unique AccessIDs). Since the WHERE clause would filter on the nonclustered index, it will hopefully be faster.
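For illustration, the loop described above might look roughly like the following. This is only a sketch under assumptions: it reuses the placeholder name [Table] from the query, assumes the AccessID values run contiguously from 1 to 5012, treats rows as duplicates when AccessID and the first 5000 characters of SQLTEXT match, and does not care which copy of each group survives.

```sql
-- Sketch only. Assumptions: [Table] is the placeholder table name from the
-- question, AccessID values run from 1 to 5012 without gaps, and any one
-- row of a duplicate group may be the survivor.
DECLARE @AccessID    int = 1;
DECLARE @MaxAccessID int = 5012;

WHILE @AccessID <= @MaxAccessID
BEGIN
    -- Number the rows inside each duplicate group for this AccessID only,
    -- so the filter can use the index on that column.
    ;WITH Ranked AS (
        SELECT ROW_NUMBER() OVER (
                   PARTITION BY AccessID, LEFT(SQLTEXT, 5000)
                   ORDER BY (SELECT NULL)   -- no preference for the survivor
               ) AS rn
        FROM [Table]
        WHERE AccessID = @AccessID
    )
    -- Deleting through the CTE removes the underlying rows; rn = 1 is kept.
    DELETE FROM Ranked
    WHERE rn > 1;

    SET @AccessID += 1;
END;
```

Each iteration runs as its own statement, so each delete stays comparatively small. It would be worth trying this on a copy of the data first.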

How to solve:


Method 1

You could try inserting the valid rows into a new table and then replacing the old table with the new one.
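A minimal sketch of that approach follows. The names Table_Dedup and Table_Old are hypothetical, the dbo schema is an assumption, and the column list shows only the two columns that appear in the question. Rows are treated as duplicates when AccessID and the first 5000 characters of SQLTEXT match, as in the original query.

```sql
-- Sketch only. Table_Dedup and Table_Old are hypothetical names, and the
-- dbo schema is assumed; [Table] is the placeholder from the question.

-- 1. Copy one row per duplicate group into a new table.
WITH Ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY AccessID, LEFT(SQLTEXT, 5000)
               ORDER BY (SELECT NULL)   -- no preference for the survivor
           ) AS rn
    FROM dbo.[Table]
)
SELECT AccessID, SQLTEXT      -- list every column that should be kept
INTO dbo.Table_Dedup
FROM Ranked
WHERE rn = 1;

-- 2. After checking the new table, swap the names.
EXEC sp_rename 'dbo.[Table]', 'Table_Old';
EXEC sp_rename 'dbo.Table_Dedup', 'Table';
```

SELECT ... INTO does not copy indexes or constraints, so those would need to be recreated on the new table before or after the swap. Keeping the old table under a different name until the result has been checked leaves a way back.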


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
