All we need is an easy explanation of the problem, so here it is.
we have 1 table that also references itself in
table high level structure: (let’s assume that data is above 1M rows)
category_table id int name varchar parent_id int
the goal is to get the categories with their parent category names.
i have 2 approach in mind
A) do a select query and join
category_table to itself
select a.*, b.name from category_table a left join category_table b on a.id=b.parent_id
B) add another column for the parent name =
category_table id int name varchar parent_id int parent_name varchar
then do a single select statement to fetch the data
select * from category_table
my worries are,
A is what i think the right approach is but when fetching data it may be slow.
B is faster than A however i will introduce a redundant column and will be a challenge when i update the real parent name
would like to know your thoughts on which one is the best way to approach this and if you have any other suggestions
How to solve :
I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.
As with all denormalization decisions, it boils down to two things:
How important is it to optimize the query performance? If the query with the JOIN provides satisfactory performance, then use that, because the database will enforce its own data integrity to prevent update anomalies. However, if the difference is performance is great enough that you definitely cannot satisfy your performance objective if you use the JOIN solution, then you must use the denormalization.
Do you have a robust plan for accounting for the data anomalies? You correctly identified that you could have a challenge if category names change, because you need to remember to update them in multiple rows. How confident are you that you can write the code perfectly to do that? Alternatively, do you have a plan for detecting and correcting anomalies that occur if updates fail to update the category name in all necessary places? Do the updates and the correction code account for concurrent updates?
Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂