INSERT-ing into a table if the data does not already exist – how do indexes perform in this scenario?

When you are inserting data into a table only if the data does not already exist, what is the best way to go with indexes?

I’m specifically thinking of SQL Server here but I assume the situation would be the same on other platforms.

On the one hand you are inserting new data into the table that will need to be added to the index at some performance cost.
On the other hand you are reading the table to see if the data already exists, so the index will help performance.
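To make the read side of that trade-off concrete, here is a minimal sketch. It uses SQLite through Python's standard `sqlite3` module purely as a runnable stand-in, so it says nothing definitive about SQL Server's optimizer. The table `target`, its columns `a`, `b`, `payload`, and the index `ix_target_ab` are hypothetical names invented for the illustration.

```python
import sqlite3


def open_target(with_index: bool) -> sqlite3.Connection:
    """Create an in-memory table (hypothetical schema), optionally indexed on (a, b)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (a INTEGER, b INTEGER, payload TEXT)")
    if with_index:
        conn.execute("CREATE INDEX ix_target_ab ON target (a, b)")
    return conn


def existence_check_plan(with_index: bool) -> str:
    """Return SQLite's plan description for the 'does this row exist?' probe."""
    conn = open_target(with_index)
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT 1 FROM target WHERE a = ? AND b = ?", (1, 2)
    ).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable description.
    return " | ".join(str(row[-1]) for row in rows)


def insert_if_missing(conn: sqlite3.Connection, a: int, b: int, payload: str) -> None:
    """Insert the row only when no row with the same (a, b) is already present."""
    conn.execute(
        "INSERT INTO target (a, b, payload) "
        "SELECT ?, ?, ? "
        "WHERE NOT EXISTS (SELECT 1 FROM target WHERE a = ? AND b = ?)",
        (a, b, payload, a, b),
    )


if __name__ == "__main__":
    print("without index:", existence_check_plan(False))
    print("with index:   ", existence_check_plan(True))
```

Without the index the probe has to scan the table; with it, the plan names `ix_target_ab`. The exact plan wording differs between SQLite versions, and SQL Server's execution plans look different again, so treat this only as an illustration of the idea.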

My theory is that having an index in this situation is probably better than not having one: in my (limited) experience, un-indexed tables perform terribly, while insertions into indexed tables are only ‘a bit slower’ (obviously the type of index will affect this).

My simple testing also suggests that indexing the table (a clustered index on 2 of the 7 columns) does improve performance, but then the table I tested on already contained 30M rows to start with (40-50M more are added by the insert).

I was just wondering if anyone can offer any insight into this scenario to help my learning.
Thanks

Method 1

A clustered index IS the table, so if the rows you are adding are already in clustered-key order there is not much extra work to do. A non-clustered index is an extra structure that has to be maintained, so inserts, updates and deletes all carry an overhead, and the more indexes you have, the higher that overhead. Generally, every table has some SELECTs run against it (otherwise what is the point of storing the data?). The balance between improved SELECT performance and maintenance overhead can really only be determined for your particular workload, so measure twice and cut once!
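As a rough way to do that measuring, the sketch below times the same insert-if-not-exists load with and without an index. It is an assumption-laden illustration rather than a SQL Server benchmark: it runs against in-memory SQLite via Python's standard `sqlite3` module, and the table `target`, the columns `a`, `b`, `payload`, the index `ix_target_ab` and the generated data are all hypothetical.

```python
import sqlite3
import time

# Insert a row only when no row with the same (a, b) already exists.
INSERT_IF_MISSING = (
    "INSERT INTO target (a, b, payload) "
    "SELECT ?, ?, ? "
    "WHERE NOT EXISTS (SELECT 1 FROM target WHERE a = ? AND b = ?)"
)


def timed_load(rows, with_index):
    """Load rows with the insert-if-missing statement; return (seconds, final row count)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (a INTEGER, b INTEGER, payload TEXT)")
    if with_index:
        conn.execute("CREATE INDEX ix_target_ab ON target (a, b)")
    params = [(a, b, p, a, b) for a, b, p in rows]
    start = time.perf_counter()
    with conn:  # one transaction for the whole load
        conn.executemany(INSERT_IF_MISSING, params)
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
    conn.close()
    return elapsed, count


def sample_rows(n=2000):
    """Made-up data: n candidate rows, of which only 500 have distinct (a, b) keys."""
    return [(i % 500, i % 2, f"row{i}") for i in range(n)]


if __name__ == "__main__":
    rows = sample_rows()
    for with_index in (False, True):
        seconds, count = timed_load(rows, with_index)
        print(f"with_index={with_index}: {seconds:.4f}s, {count} rows kept")
```

Both runs should keep the same rows; only the elapsed time differs. The numbers mean little outside your own environment, so repeat the comparison on the real platform with realistic row counts and key distributions before drawing conclusions.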

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.