Is it a good approach to design tables according to the queries I'll perform on them?


I'm watching this video, and I'm pretty new to DBMSs.
The speaker explains that in a row-oriented DB, rows are read in blocks.
So, my understanding is that if my rows have fewer fields, more rows can fit into a single block, and querying the table should take fewer I/O operations, resulting in better performance. Am I right?

Can I extract the rule that I shouldn't design tables according to the entity they represent but, instead, according to how frequently I'll read or update each field?

For example:
table employees:

  • ID
  • Name (frequently used)
  • Badge number (frequently used)
  • Birth date (rarely used)
  • Birthplace (rarely used)

    Should I split the table into two, as in the sketch below?

  • tbl1: ID | Name | Badge number
  • tbl2: ID | Birth date | Birthplace
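
In SQL, that split might look roughly like this (column types are assumptions, just to make the example concrete):

    CREATE TABLE tbl1 (
        id           INTEGER PRIMARY KEY,
        name         VARCHAR(100) NOT NULL,
        badge_number VARCHAR(30)  NOT NULL
    );

    CREATE TABLE tbl2 (
        -- same key as tbl1, so the two halves can be joined back together
        id         INTEGER PRIMARY KEY REFERENCES tbl1 (id),
        birth_date DATE,
        birthplace VARCHAR(100)
    );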

Method 1

In most database management systems, data is stored as pages, not blocks. Pages are normally 4 or 8 KB, depending on the database and how it has been configured.

All else being equal, a smaller row size means better reuse of cached pages and fewer page reads on queries that touch a large number of rows, so less I/O and faster read performance.
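
As a rough, purely illustrative calculation: with 8 KB pages and ~80 bytes per row, roughly 100 rows fit on a page, so scanning 1,000,000 rows touches on the order of 10,000 pages; at ~160 bytes per row that roughly doubles to about 20,000 pages, i.e. twice the I/O for the same number of rows.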

However

If you vertically partition the table (as in your example), there will be a slight increase in overall storage (roughly the primary key length times the number of rows, plus the second B-tree), and insert performance will be slightly slower because you'll need to maintain a PK-FK relationship between the two tables.

Furthermore, if most of your queries are for single-record lookups, you’re still going to be reading a single page. There’s a greater chance that page will be cached, but reading 4 or 8 KB off a modern disk is really not an expensive operation.

Splitting the table would require two page reads (and navigating two B-trees) whenever you need BirthDate/BirthPlace. Again, not really a big deal on modern hardware.
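
For illustration, using the tbl1/tbl2 sketch from the question, any query that needs the rarely used columns has to join the pieces back together:

    SELECT t1.name, t1.badge_number, t2.birth_date, t2.birthplace
    FROM tbl1 AS t1
    JOIN tbl2 AS t2 ON t2.id = t1.id
    WHERE t1.badge_number = '12345';   -- illustrative lookup value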

The only time I would vertically partition a table would be in certain data warehouse situations, or if BirthDate/BirthPlace were nullable and infrequently populated.

Other Considerations

If the badge number is relatively small (say, under 20-30 bytes), the best thing you can do to improve performance is to drop the unneeded ID column and make BadgeNumber your primary key (see the sketch after this list), since:

  1. You shouldn’t have duplicates in that column
  2. Most likely you will primarily look up rows by that column, so using BadgeNumber:
  • Saves you a column, making your table more compact
  • Removes the need for an index (and associated overhead) on BadgeNumber
  • Eliminates the need to join back to this table just to get the BadgeNumber when another table has a FK relationship to it.
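
A minimal sketch of that layout (column types and the second, referencing table are made up purely for illustration):

    CREATE TABLE employees (
        badge_number VARCHAR(30) PRIMARY KEY,   -- natural key, no separate ID column
        name         VARCHAR(100) NOT NULL,
        birth_date   DATE,
        birthplace   VARCHAR(100)
    );

    -- A hypothetical child table carries the badge number directly,
    -- so no join back to employees is needed just to display it.
    CREATE TABLE time_entries (
        entry_id     INTEGER PRIMARY KEY,
        badge_number VARCHAR(30) NOT NULL REFERENCES employees (badge_number),
        worked_on    DATE NOT NULL
    );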

There are other ways to reduce I/O and improve read performance. Most commercial DBMSs support some form of data compression. This can fit more rows on a single page without any changes to the structure of the table, at the expense of some CPU overhead to compress/decompress the data as it is written/read. CPU is usually cheaper than disk I/O, so compression is usually a net benefit.
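
For example, in SQL Server page-level compression can be enabled with a table rebuild (the option names and syntax differ per DBMS; treat this as a sketch):

    -- SQL Server: rebuild the table with page compression
    ALTER TABLE employees REBUILD WITH (DATA_COMPRESSION = PAGE);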

Method 2

What you are considering is "premature optimization". You can’t usually tell how much of an overhead those extra few bytes of "Birth date" and "Birth place" will bring, and whether it will have any material effect on performance. (I also hope that "Birth place" is actually a foreign key reference to a "Place" entity.)
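
As a rough sketch of what that normalization could look like (table and column names here are assumptions, not a prescription):

    CREATE TABLE places (
        place_id INTEGER PRIMARY KEY,
        name     VARCHAR(100) NOT NULL
    );

    CREATE TABLE employees (
        id            INTEGER PRIMARY KEY,
        name          VARCHAR(100) NOT NULL,
        badge_number  VARCHAR(30)  NOT NULL UNIQUE,
        birth_date    DATE,
        birthplace_id INTEGER REFERENCES places (place_id)
    );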

So, you start with designing and properly normalizing your entities and their attributes and relationships. You validate your design by confirming that it can support all the queries you need to run from the semantic point of view.

If and when you do encounter serious performance problems caused by your table being too wide, with unequivocal hard evidence of that being the case, only then will you start considering possible remedies (of which vertical partitioning would be towards the end of the list).

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, or CC BY-SA 4.0.
