What approach should I take when creating/editing many interrelated records in a transactional way?

I have what I think is a fairly complex system that is starting to take shape. It is too involved to write out every table, but here is a brief description.

Basically I am creating a badging system for posts, like Stack Overflow's. You have these sorts of tables (a rough schema sketch follows the list):

  • users
  • posts
  • events (saved to database so you know when each important event happened)
  • user_statistics (rollup of badge counts and such)
  • post_statistics (rollup of operation counts on the post, "it has been edited 20 times")
  • user_badges (the awarding of a badge to a user)
  • badge_types
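
To make this concrete, here is a stripped-down sketch of a few of these tables; the column names and constraints are illustrative assumptions only, not my real schema.

    -- Hypothetical, minimal schema (columns are assumed for illustration)
    CREATE TABLE users (
        id         bigserial PRIMARY KEY,
        reputation integer NOT NULL DEFAULT 0
    );

    CREATE TABLE posts (
        id        bigserial PRIMARY KEY,
        user_id   bigint NOT NULL REFERENCES users(id),
        body      text   NOT NULL,
        edited_at timestamptz
    );

    CREATE TABLE events (
        id         bigserial PRIMARY KEY,
        event_type text   NOT NULL,                      -- e.g. 'post_updated'
        post_id    bigint NOT NULL REFERENCES posts(id),
        user_id    bigint NOT NULL REFERENCES users(id),
        created_at timestamptz NOT NULL DEFAULT now(),
        UNIQUE (event_type, post_id, user_id)            -- lets the database enforce "create only if it doesn't exist"
    );

    CREATE TABLE user_statistics (
        user_id    bigint  NOT NULL REFERENCES users(id),
        category   text    NOT NULL,                     -- e.g. a language scope, or 'all'
        edit_count integer NOT NULL DEFAULT 0,
        PRIMARY KEY (user_id, category)
    );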

Then let’s say you "update a post". Here is what happens:

  • post record is updated
  • event record is created next, which says "update action". It only gets created if it doesn’t already exist; otherwise the same update event is reused (so as to prevent spamming the system). It is associated with the post id and user id (see the sketch after this list).
  • user statistics are updated to count the new event if it was created. There could be multiple statistics to update, as the stats may be scoped to certain categories (like all posts for a specific language)
  • if the statistics reach a threshold, then check if we need to create a badge or potentially multiple badges, then create the badges.
  • potentially create a notification record.
  • potentially a few other things, such as escalating privileges on the site now that the user has more reputation, etc.
  • all of this needs to succeed, so nothing is left undone (all counts are correct and badges are awarded properly).
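
For illustration, the first couple of steps could in principle be wrapped like this (reusing the illustrative schema above; :new_body, :post_id and :user_id stand in for application parameters). This is roughly what I mean by "a single transaction", with the event de-duplication leaning on the unique constraint and INSERT ... ON CONFLICT:

    BEGIN;

    -- 1. Update the post itself
    UPDATE posts
    SET    body = :new_body, edited_at = now()
    WHERE  id = :post_id;

    -- 2. Create the "update" event only if it doesn't already exist
    --    (relies on the UNIQUE (event_type, post_id, user_id) constraint),
    --    and bump the statistics only when a new event row was actually inserted.
    WITH new_event AS (
        INSERT INTO events (event_type, post_id, user_id)
        VALUES ('post_updated', :post_id, :user_id)
        ON CONFLICT (event_type, post_id, user_id) DO NOTHING
        RETURNING id
    )
    UPDATE user_statistics s
    SET    edit_count = s.edit_count + 1
    WHERE  s.user_id = :user_id
      AND  EXISTS (SELECT 1 FROM new_event);

    -- 3. Badge, notification and privilege checks would follow here,
    --    either in this same transaction or in separate ones.

    COMMIT;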

How do you appropriately accomplish this in PostgreSQL? In my specific case there appear to be about 10 tables that are queried, and at least 5 tables that are modified (records created or updated). All of this should in theory be atomic, in a single transaction, but it seems like a lot to pack into one, especially if you have these "events" coming in multiple times a second.

The only way I can initially think of to solve this is with a queue and background jobs. Each step above would be done sequentially, outside of a transaction, with potential time gaps between steps, so there would be an intermediate state where things are inconsistent. But eventually (at least in theory) the queue would run, retry until success, and reach the correct state. Is that how this should be done?
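
If I went the queue route, one pattern that keeps everything inside PostgreSQL is a jobs table that the first transaction writes to and that background workers claim with FOR UPDATE SKIP LOCKED. This is only a sketch of that idea, with an assumed table name, columns, and a :claimed_job_id placeholder:

    -- Hypothetical jobs table; the "update post" transaction inserts a row into it
    CREATE TABLE badge_jobs (
        id         bigserial PRIMARY KEY,
        event_id   bigint NOT NULL REFERENCES events(id),
        status     text   NOT NULL DEFAULT 'pending',   -- 'pending' or 'done'
        created_at timestamptz NOT NULL DEFAULT now()
    );

    -- Each worker claims one pending job; SKIP LOCKED keeps concurrent
    -- workers from blocking on (or double-processing) the same row.
    BEGIN;

    SELECT id, event_id
    FROM   badge_jobs
    WHERE  status = 'pending'
    ORDER  BY id
    LIMIT  1
    FOR UPDATE SKIP LOCKED;

    -- ... recompute statistics, award badges, create notifications for that event ...

    UPDATE badge_jobs SET status = 'done' WHERE id = :claimed_job_id;

    COMMIT;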

If not, is it okay to have a single transaction this complex on every event? I can’t tell. I didn’t think implementing badges and these counters would turn out to be so complex, but there is a lot to consider and do on each event. Any pointer in the general right direction is all I am looking for, based on your expertise building scalable database systems.

Assume that this system must be this complex, because I am really asking how, in theory, to handle complex transaction requirements. That is, if you know of an ideal way of modeling a badging system, that would be nice to know, but it wouldn’t really address the main part of the question. Thank you for the help!

For now, for my purposes, everything can be considered to fit on a single machine, not distributed across multiple databases.

How to solve:

Method 1

I agree with Charlieface in the comments, for the following reasons:

  1. "especially if you have these "events" coming in multiple times a second" – You can have those events happen 100 times per second and 0 blocking contention if the whole transaction only takes 10 milliseconds to run. Most queries when architected properly against tables that are indexed correctly shouldn’t take more than a few milliseconds to a few hundred milliseconds to execute.

  2. "is it okay to have this complex of a single transaction on every event?" – Yes. But you only seemed to mention solutions that involve all or nothing with Transactions, when there’s a third option – multiple Transactions. You should put only the objects that need to be immediately transactionally consistent in the same Transaction. So based on your described workflow, I’d say a Transaction could wrap the updating of the Post and the creation of the Event record. A second transaction can wrap just the updating of the different statistics tables and the dependent actions on those statistics such as creation of the Badge and creation of a Notification to the user. A third Transaction can handle ensuring all the dependent table updates for privilege changes are atomic.

  3. Also, to Charlieface’s point, rollups and statistics usually aren’t required to be 100% accurate 100% of the time. If you follow the Transaction pattern from my previous point, then in the rare cases where the Transaction updating the statistics tables fails, you should still have a nightly (or whatever cadence makes sense – hourly, weekly, etc.) job that re-calculates those statistics to fix any fallout (a sketch of such a repair job also follows below). That way you get the best of both worlds: most of the time the data is atomically accurate, and in the rare cases it isn’t, it will be eventually – while also improving workflow performance by breaking up one large Transaction into multiple smaller Transactions that lock those database objects for even shorter timeframes when the process runs.
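
To make the split in point 2 concrete, the workflow could look roughly like this (table and column names reuse the hypothetical sketch from the question; :new_body, :post_id and :user_id are placeholders, and each block is its own short Transaction):

    -- Transaction 1: the part that must be immediately consistent
    BEGIN;
    UPDATE posts SET body = :new_body, edited_at = now() WHERE id = :post_id;
    INSERT INTO events (event_type, post_id, user_id)
    VALUES ('post_updated', :post_id, :user_id)
    ON CONFLICT (event_type, post_id, user_id) DO NOTHING;
    COMMIT;

    -- Transaction 2: rollups and their dependents (badges, notifications)
    BEGIN;
    UPDATE user_statistics SET edit_count = edit_count + 1 WHERE user_id = :user_id;
    -- INSERT INTO user_badges ... / INSERT INTO notifications ... when a threshold is crossed
    COMMIT;

    -- Transaction 3: privilege changes driven by the new reputation
    BEGIN;
    -- UPDATE users SET ... and any other privilege tables
    COMMIT;

And for point 3, the periodic repair job can recompute the rollups from the events table (the source of truth) and overwrite anything that drifted. A minimal sketch against the same assumed schema:

    -- Recompute edit counts from events and overwrite any drifted rollup rows
    INSERT INTO user_statistics (user_id, category, edit_count)
    SELECT e.user_id, 'all', count(*)
    FROM   events e
    WHERE  e.event_type = 'post_updated'
    GROUP  BY e.user_id
    ON CONFLICT (user_id, category)
    DO UPDATE SET edit_count = EXCLUDED.edit_count;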
