I have just joined a project in which I will have to develop a data-migration process and a web interface on top of an already existing SQL Server database. This database was developed by another person several years ago; it holds around 100 GB of data and grows every 10 minutes (it stores 10-minute readings from several units, i.e. 144 records per day per device). Several tables have around 10 million rows.
The point is that I think the main tables were not designed in the most efficient or appropriate way for the kind of queries that will usually be run against them. Now I need to prove that what I propose is better than what is already implemented.
The DB is extensive in number of tables, but the structure can be simplified by the following diagram:
The Date_Id field is generated automatically by a function from the DateTime field. There are two indexes on each table. The clustered index for each table contains the PK fields in the same order. The second index on the Unit table contains only the Unit_Id field, while the second index on UnitData contains the Unit_Id and DateTime fields, in that order.
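To make the description concrete, the two secondary indexes described above might look roughly like the following DDL. The index names are made up, and the table and column names are assumed from the (omitted) diagram:

```sql
-- Sketch only; the real table definitions come from the existing schema.
-- Second index on Unit, containing only Unit_Id:
CREATE NONCLUSTERED INDEX IX_Unit_UnitId
    ON Unit (Unit_Id);

-- Second index on UnitData: Unit_Id first, then DateTime:
CREATE NONCLUSTERED INDEX IX_UnitData_UnitId_DateTime
    ON UnitData (Unit_Id, [DateTime]);
```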
However, I think the design should be this one:
In this case only a clustered index on the PK fields would be needed. For this design the usual query would be something like:
SELECT ud.* FROM Unit u JOIN UnitData ud ON u.Unit_Id = ud.Unit_Id WHERE ud.DateTime >= 'dd-MM-yyyy' ORDER BY ud.Unit_Id, ud.DateTime
Now comes the part that I really don’t understand: I’ve been told that the only reason for having a Date_Id column is to use it as the partitioning column for this table. I asked about the real necessity of partitioning this table and the answer was “to run queries more efficiently when wanting daily or monthly data”. I didn’t know much about partitioning before this, so I checked these links:
Considering that the ideal query would be filtering by device and datetime, the questions are:
- What do you think would be the most efficient and ideal query for the first DB design (with partitioning)?
- Do you really think that the most efficient query against the first DB design is better than the one against the second design (the query I wrote above)?
- If so, do you really think the improvement is worth carrying two extra fields (Id and Date_Id) and an extra index?
Thank you very much!!
How to solve:
Using partitioning is only going to help your query performance if the partitioning scheme is built to serve your specific queries.
You’re going to have to review your query patterns and see how they are accessing the table in order to identify the best approach. The reason for this is you can only partition on a single column (the partitioning key) and this is what will be used for partition elimination.
There are two factors that affect whether partition elimination can occur and how well it performs:
- Partition key – A table can be partitioned on only a single column, and your query must include that column. For example, if your table is partitioned on date and your query filters on that date column, then partition elimination should occur. However, if the partitioning key does not appear in the query predicate, the engine cannot perform elimination.
- Granularity – If your partitions are too big, you won’t gain any benefit from elimination, because queries will still pull back more data than they need. However, make them too small and they become difficult to manage.
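As a sketch of what date-based partitioning on an integer key such as Date_Id could look like (the boundary values, object names, and filegroup placement here are illustrative, not taken from your schema):

```sql
-- Illustrative monthly partition function on an integer date key
-- (boundary values are made up for the example):
CREATE PARTITION FUNCTION pf_MonthlyDate (INT)
    AS RANGE RIGHT FOR VALUES (20230101, 20230201, 20230301);

CREATE PARTITION SCHEME ps_MonthlyDate
    AS PARTITION pf_MonthlyDate ALL TO ([PRIMARY]);

-- A query that includes the partitioning key in its predicate can be
-- limited to a single partition (partition elimination):
--   SELECT ... FROM UnitData
--   WHERE Date_Id >= 20230201 AND Date_Id < 20230301;

-- A query that filters only on [DateTime] and never mentions Date_Id
-- cannot be eliminated and must touch every partition.
```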
In many ways, partitioning is just like using any other index, with some added benefits. However, you don’t realize those benefits unless you’re dealing with very large tables. Personally, I don’t even consider partitioning until a table is over 250 GB in size. Most of the time, well-defined indexing will cover the use cases on tables smaller than that. Based on your description, you’re not seeing huge data growth, so a properly indexed table may well perform just fine for your workload.
I would strongly recommend that you review whether partitioning is actually necessary to solve your issues. One would usually partition a very large table for the purpose of:
- Distributing data between different types of disk so that more “active” data can be placed on faster, more expensive storage while less active data is placed on cheaper, slower storage. This is mostly a cost savings measure.
- Assisting in index maintenance for extremely large tables. Since you can rebuild partitions individually, this can assist in keeping indexes properly maintained with minimal impact.
- Leveraging partitioning for an improved archival process (see the sliding-window technique).
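The sliding-window archival pattern mentioned above relies on partition switching, which is a metadata-only operation. A rough sketch, assuming a partition function like the hypothetical pf_MonthlyDate and an already-created empty staging table with an identical structure:

```sql
-- Move the oldest partition of UnitData into an empty staging table;
-- SWITCH changes only metadata, so it completes almost instantly
-- regardless of how many rows the partition holds:
ALTER TABLE UnitData
    SWITCH PARTITION 1 TO UnitData_Archive_Staging;

-- The staging table can then be archived or dropped, and the now-empty
-- boundary merged away to keep the window sliding:
ALTER PARTITION FUNCTION pf_MonthlyDate() MERGE RANGE (20230101);
```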
Table partitioning may improve performance if you are able to work within the limits of how partitioning works. See the description at:
However, partitioning can also make your server run slower if your partitions are not set up “just so” and your queries cannot remain within a single partition. Gail Shaw has written an article on this:
A couple of quotes:
“Partitioning can enhance query performance, but there is no guarantee.” And,
“In summary, partitioning is mostly for improved maintenance, fast loads, fast deletes and the ability to spread a table across multiple filegroups; it is not primarily for query performance.”
Read this on partitioning: SQL Server – Table partitions on SSD. As for #2, your table will become fragmented if you design it this way; you should swap the order of the columns. Make DateTime the first column, so that new rows are appended at the end of the table instead of the engine finding room for every Unit_Id every day, which causes lots of page splits. Then you can create a nonclustered index to support your query.
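The advice above, sketched as DDL (index names are made up, and the table and column names are assumed from the question):

```sql
-- Clustered index with [DateTime] first, so new 10-minute readings are
-- always appended at the end of the table rather than causing page splits:
CREATE CLUSTERED INDEX CIX_UnitData_DateTime_UnitId
    ON UnitData ([DateTime], Unit_Id);

-- Nonclustered index to support "this unit, from this date onward" queries:
CREATE NONCLUSTERED INDEX IX_UnitData_UnitId_DateTime
    ON UnitData (Unit_Id, [DateTime]);
```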