I’m at a crossroads where I need to decide if I’m going to stick with
bigserial as my primary key, or change to
uuid (non-auto-generating—my API server will generate the ID using uuid v4 and insert it).
I’ve spent hours researching
uuid primary keys, and it seems no one can agree on what the disadvantages of
uuid would be (if any). My database isn’t that complicated: it’s a series of tables with pretty basic relationships, I’m typically only inserting one row at a time, I have a handful of
jsonb fields I’m using here and there. Write speed/frequency will only be significantly high on one table in particular.
The reason I started looking into UUIDs wasn’t because I think I’ll ever run out of
bigint keys (9 quintillion, if I recall), but more from a standpoint of obfuscation. Right now I’m having to hash the IDs on the front-end to avoid showing the user DB IDs in the URL (e.g.
/things/2732). Using Hashids, I can instead have URLs like
/things/To2jZP13dG. But I thought I could go a step further and just use UUIDs which don’t give any clues as to # of records. What I don’t like is the added effort of having to encode IDs before passing them to the back end and decoding them there, and then when querying batches of 50-100 items to return to the client, having to mass encode all those IDs before returning them to the client.
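To make the obfuscation point concrete, here is a minimal Python sketch (Python purely for illustration; the API server could be in any language) of generating a v4 UUID the way I'm describing:

```python
import uuid

# A version-4 UUID is 122 random bits, so consecutive inserts share no
# visible pattern and the ID reveals nothing about how many rows exist.
a = uuid.uuid4()
b = uuid.uuid4()

print(a)  # e.g. 1b4e28ba-2fa1-4d3b-9f6c-... (different on every call)
print(b)

# The canonical text form is always 36 characters (32 hex digits + 4 hyphens),
# so it can go straight into a URL with no Hashids-style encode/decode step.
print(len(str(a)))
```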
One argument for random UUIDs (uuid v4) was:
If your primary key is an incrementing ID, those are stored physically next to each other. That database page can have contention as many people are writing to it. Random IDs prevent contention by spreading the writes all over the DB
But then I found a contradictory statement here:
Regular random UUIDs are distributed uniformly over the whole range of possible values. This results in poor locality when inserting data into indexes – all index leaf pages are equally likely to be hit, forcing the whole index into memory. With small indexes that’s fine, but once the index size exceeds shared buffers (or RAM), the cache hit ratio quickly deteriorates.
I know the folks at Heroku are fans of using UUIDs as primary keys. For whatever it’s worth, I wasn’t planning on having Postgres auto-generate the IDs at all. Instead, my API server would generate a v4 UUID and pass that to the database (this would allow my API server & front end client to be more efficient and not have to always use
RETURNING id statements in my queries).
Does anyone have a canonical answer on the true cost of
INSERT statements when using
uuid as the primary key?
How to solve:
It is easy enough to benchmark this, but the
INSERT performance of UUIDs will be worse, because they are bigger and slower to generate.
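As a rough sketch of such a benchmark (an illustrative psql session, not taken from the answer; note that gen_random_uuid() is built in from PostgreSQL 13 on, and comes from the pgcrypto extension before that):

```sql
-- Two otherwise identical tables, one with a bigserial key, one with uuid.
CREATE TABLE t_serial (id bigserial PRIMARY KEY, payload int);
CREATE TABLE t_uuid   (id uuid PRIMARY KEY DEFAULT gen_random_uuid(), payload int);

\timing on
-- Insert a million rows into each and compare the timings psql reports.
INSERT INTO t_serial (payload) SELECT g FROM generate_series(1, 1000000) g;
INSERT INTO t_uuid   (payload) SELECT g FROM generate_series(1, 1000000) g;

-- The uuid index is also larger, which matters once it outgrows shared_buffers.
SELECT pg_size_pretty(pg_relation_size('t_serial_pkey')),
       pg_size_pretty(pg_relation_size('t_uuid_pkey'));
```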
But it doesn’t sound like you are building a high performance application anyway (then you probably wouldn’t be using JSON), so it probably won’t make much difference.
Finally, you want to use UUIDs for security reasons (which I won’t discuss here). Security always puts a penalty on performance and usability, so consider it as the price you are paying for security.
What Laurenz said is true, but I’ve actually found a more measurable difference in performance when you use those
uuid fields in predicates (e.g. in
HAVING clauses). Again, it depends on the amount of data in the tables your predicates touch, but comparing one 16-byte value to another is noticeably more expensive than comparing two 4-byte values, which is what you’d have if the fields were
INT data types. But as Laurenz mentioned, that is a valid price to pay for security.
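The width difference behind that comparison cost is easy to see (Python only as an illustration of the on-disk key sizes): a uuid key is 16 bytes, a bigint 8, an int 4.

```python
import struct
import uuid

# PostgreSQL stores a uuid as 16 raw bytes, a bigint as 8, an int as 4.
print(len(uuid.uuid4().bytes))       # 16 bytes per uuid key
print(len(struct.pack('<q', 2732)))  # 8 bytes per bigint key
print(len(struct.pack('<i', 2732)))  # 4 bytes per int key
```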