hierarchical data best practice in relational database

All we need is an easy explanation of the problem, so here it is.

we have 1 table that also references itself in parent_id

table high level structure: (let’s assume that data is above 1M rows)

 category_table
 id int
 name varchar
 parent_id int

the goal is to get the categories with their parent category names.

i have 2 approach in mind

A) do a select query and join category_table to itself

 select a.*, b.name from category_table a left join category_table b on a.id=b.parent_id

B) add another column for the parent name = parent_name

  category_table
  id int
  name varchar
  parent_id int
  parent_name varchar

then do a single select statement to fetch the data

 select * from category_table

my worries are,
A is what i think the right approach is but when fetching data it may be slow.
B is faster than A however i will introduce a redundant column and will be a challenge when i update the real parent name

would like to know your thoughts on which one is the best way to approach this and if you have any other suggestions

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

As with all denormalization decisions, it boils down to two things:

  1. How important is it to optimize the query performance? If the query with the JOIN provides satisfactory performance, then use that, because the database will enforce its own data integrity to prevent update anomalies. However, if the difference is performance is great enough that you definitely cannot satisfy your performance objective if you use the JOIN solution, then you must use the denormalization.

  2. Do you have a robust plan for accounting for the data anomalies? You correctly identified that you could have a challenge if category names change, because you need to remember to update them in multiple rows. How confident are you that you can write the code perfectly to do that? Alternatively, do you have a plan for detecting and correcting anomalies that occur if updates fail to update the category name in all necessary places? Do the updates and the correction code account for concurrent updates?

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply