Canonical or best data type to store a Flake ID in PostgreSQL for two query patterns


I want to generate 128-bit, k-ordered Flake IDs on the clients connected to my PostgreSQL database and write them to a column. These IDs are essentially large, globally unique numbers that grow with time, similar to a monotonically increasing ID but without any coordination.
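To make "k-ordered" concrete, here is a minimal client-side sketch. The bit layout is an assumption, since the question does not specify one: a 64-bit millisecond timestamp in the high bits, then a 48-bit worker ID, then a 16-bit per-millisecond sequence. The function names are hypothetical. Because the timestamp occupies the most significant bits, any ID minted in a later millisecond compares greater than any ID from an earlier one, whichever worker produced it.

```python
import time

# Hypothetical layout (an assumption, not given in the question):
#   bits 127..64  milliseconds since the Unix epoch  (64 bits)
#   bits  63..16  worker/node identifier             (48 bits)
#   bits  15..0   per-millisecond sequence counter   (16 bits)
WORKER_BITS = 48
SEQ_BITS = 16


def make_flake_id(timestamp_ms: int, worker_id: int, sequence: int) -> int:
    """Pack the three fields into one unsigned 128-bit integer."""
    if not 0 <= timestamp_ms < 1 << 64:
        raise ValueError("timestamp_ms out of range")
    if not 0 <= worker_id < 1 << WORKER_BITS:
        raise ValueError("worker_id out of range")
    if not 0 <= sequence < 1 << SEQ_BITS:
        raise ValueError("sequence out of range")
    return (timestamp_ms << (WORKER_BITS + SEQ_BITS)) | (worker_id << SEQ_BITS) | sequence


def now_flake_id(worker_id: int, sequence: int) -> int:
    """Mint an ID for the current wall-clock millisecond (no coordination needed)."""
    return make_flake_id(time.time_ns() // 1_000_000, worker_id, sequence)
```

For example, `make_flake_id(1000, 5, 0) < make_flake_id(1001, 0, 0)` holds even though the second ID comes from a "smaller" worker. IDs minted in the same millisecond on different workers are only roughly ordered, which is what k-ordering permits.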

The two most common query patterns look like this:

  • Sort by the Flake ID: SELECT * ORDER BY flake_id
  • Group by another column, then select the maximum Flake ID in each group: SELECT max(flake_id) GROUP BY some_other_column

There seem to be a few possible ways to do this:

  • bigint, split across two columns: simple to order by, but it is unclear how to select the maximum across the two words after a GROUP BY.
  • bytea with 16 bytes.
  • bit(n) with n = 128.
  • uuid: it happens to be 128 bits, and testing shows that ORDER BY works, but there is no max function that can be applied to it.
  • A string encoding stored in text: I haven't tested it, but a max function seems nonsensical unless it uses lexicographic order. It also seems a bit dirty to use a string to encode something that is orderable numerically on its own.
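Whether the bytea, uuid and text options sort correctly depends on how the 128-bit value is encoded. The sketch below assumes, as I understand PostgreSQL's behaviour, that bytea and uuid compare byte by byte and that text is compared under the "C" collation (byte order). Under those assumptions, fixed-width big-endian encodings preserve numeric order, while variable-width or little-endian encodings do not. The helper names are hypothetical.

```python
import random
import uuid


def to_bytea(flake: int) -> bytes:
    # Fixed width (16 bytes), big-endian: most significant byte first.
    return flake.to_bytes(16, "big")


def to_hex_text(flake: int) -> str:
    # Fixed width (32 chars), zero-padded lowercase hex.
    return format(flake, "032x")


def to_uuid_text(flake: int) -> str:
    # Canonical 8-4-4-4-12 form; hyphens always sit at the same positions.
    return str(uuid.UUID(int=flake))


def order_preserved(values, encode) -> bool:
    """True if sorting by the encoded form matches sorting by the integer itself."""
    return sorted(values, key=encode) == sorted(values)


rng = random.Random(0)
sample = [rng.getrandbits(128) for _ in range(1000)] + [0, 15, 16, 2**128 - 1]

# Variable-width hex breaks lexicographic order: "10" (16) sorts before "f" (15).
unpadded_ok = order_preserved([15, 16], lambda v: format(v, "x"))
# Little-endian bytes break byte-wise order: 256 would sort before 1.
little_endian_ok = order_preserved([1, 256], lambda v: v.to_bytes(16, "little"))
```

Here `order_preserved(sample, to_bytea)`, `order_preserved(sample, to_hex_text)` and `order_preserved(sample, to_uuid_text)` are all true, while `unpadded_ok` and `little_endian_ok` are both false. So a string encoding can work for max() if it is fixed width, but the ordering then hinges on the collation.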

There's a bit of choice paralysis among these options, especially regarding how each interacts with indexes to serve the query patterns above most efficiently.

I am looking for insight into the ideal data type for the query patterns above, and how it would interact with the relevant indexes.


Method 1

If you want to go with two bigints, create a composite type for flake_id:

CREATE TYPE pair AS (a bigint, b bigint);
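A composite value compares field by field, so ORDER BY on pair sorts by a, then by b. That matches the numeric order of the 128-bit ID only if each 64-bit half keeps its order once it is stored in a signed bigint. The client-side sketch below is one way to guarantee that; it is my own suggestion, not part of the answer. Each unsigned half is shifted down by 2**63 into bigint's range. The function names are hypothetical. Simply reinterpreting the bits as two's complement, shown in `naive_split`, breaks ordering as soon as a half has its top bit set.

```python
SIGN_OFFSET = 1 << 63
MASK64 = (1 << 64) - 1


def split_flake(flake: int) -> tuple[int, int]:
    """Split an unsigned 128-bit ID into (a, b) values that fit signed bigints.

    Each unsigned 64-bit half is shifted down by 2**63, so it lands in
    [-2**63, 2**63) while keeping its relative order.
    """
    if not 0 <= flake < 1 << 128:
        raise ValueError("flake must be an unsigned 128-bit integer")
    return (flake >> 64) - SIGN_OFFSET, (flake & MASK64) - SIGN_OFFSET


def join_flake(a: int, b: int) -> int:
    """Inverse of split_flake."""
    return ((a + SIGN_OFFSET) << 64) | (b + SIGN_OFFSET)


def naive_split(flake: int) -> tuple[int, int]:
    """Reinterpret each half as two's complement (what a raw bit cast would do)."""
    def to_signed(x: int) -> int:
        return x - (1 << 64) if x >= SIGN_OFFSET else x
    return to_signed(flake >> 64), to_signed(flake & MASK64)
```

With this scheme the client would write, for instance, `ROW(a, b)::pair` using the two values from `split_flake`, and tuple (lexicographic) order on the results agrees with integer order on the original IDs.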

Your first query can then remain as it is, and your second query could be rewritten as:

SELECT DISTINCT ON (some_other_column)
       some_other_column, flake_id
FROM   tbl  -- tbl: placeholder for your table name
ORDER  BY some_other_column, flake_id DESC;
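DISTINCT ON keeps the first row of each group in the given sort order. Sorting each group by flake_id descending therefore yields the per-group maximum without needing a max() aggregate for the type. The sketch below emulates that semantics in plain Python and checks it against a straightforward per-group maximum. Rows are hypothetical (group, flake_id) pairs, and flake_id is an (a, b) tuple mirroring the composite type.

```python
from itertools import groupby
from operator import itemgetter


def distinct_on_latest(rows):
    """Emulate SELECT DISTINCT ON (grp) grp, flake_id ... ORDER BY grp, flake_id DESC."""
    # Two stable sorts: first flake_id descending, then group ascending,
    # which gives ORDER BY grp ASC, flake_id DESC.
    ordered = sorted(rows, key=itemgetter(1), reverse=True)
    ordered = sorted(ordered, key=itemgetter(0))
    # Keep only the first row of each group, as DISTINCT ON does.
    return {grp: next(group)[1] for grp, group in groupby(ordered, key=itemgetter(0))}


def group_max(rows):
    """Reference result: SELECT grp, max(flake_id) ... GROUP BY grp."""
    result = {}
    for grp, flake in rows:
        if grp not in result or flake > result[grp]:
            result[grp] = flake
    return result


rows = [("x", (1, 5)), ("y", (0, 9)), ("x", (2, -3)), ("y", (0, 10)), ("x", (2, -4))]
latest = distinct_on_latest(rows)
```

Here `latest` is `{"x": (2, -3), "y": (0, 10)}`, the same as `group_max(rows)`. In PostgreSQL, an index on (some_other_column, flake_id) is the natural candidate to support this query, though whether the planner uses it depends on your data.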

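But why not use the obvious data type, numeric? It can hold any 128-bit integer (at most 39 decimal digits), sorts numerically, supports max(), and can be indexed with an ordinary B-tree index.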
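If you go with numeric, the client only needs to convert the ID to and from a decimal integer. This sketch assumes the client holds each ID as 16 big-endian bytes and that the column is declared with enough precision, for example numeric(39, 0). The helper names are hypothetical.

```python
MAX_FLAKE = 2**128 - 1
NUMERIC_PRECISION = len(str(MAX_FLAKE))  # 39 digits are enough for any 128-bit value


def flake_to_numeric_literal(flake_bytes: bytes) -> str:
    """Render a 16-byte big-endian flake ID as a decimal string for a numeric column."""
    if len(flake_bytes) != 16:
        raise ValueError("expected exactly 16 bytes")
    return str(int.from_bytes(flake_bytes, "big"))


def numeric_literal_to_flake(text: str) -> bytes:
    """Inverse conversion when reading the numeric value back."""
    value = int(text)
    if not 0 <= value <= MAX_FLAKE:
        raise ValueError("value does not fit in 128 bits")
    return value.to_bytes(16, "big")
```

Because numeric compares by value rather than by characters, the variable-width decimal strings are not a problem here: ORDER BY and max() see the actual integers.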


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
