Indexing first and last item of a series only

All we need is an easy explanation of the problem, so here it is.

Does PostgreSQL support any kind of sparse index that would be useful for indexing only the first and last value in the table, based on a certain key?

Consider the following data:

CREATE TABLE samples (
   device_id int not null;
   ts timestamp not null;
   value real not null;
);

Now, let’s assume I have millions of devices and each one can have billions of samples. I want to access all of this data – if access is slow I don’t mind. However I am especially interested to access the first and the last value by ts timestamp: (device_id, value) very, fast. This to know what’s the range of samples for a specific device.

I can build a trigger/application logic that manages a separate table where the information is stored:

CREATE TABLE first_last_samples (
   device_id int not null;
   first_ts timestamp not null;
   first_value real not null;
   last_ts timestamp not null;
   last_value real not null;
);

However, managing the logic for caching first/last item yourself feels a bit clunky. I am asking would PostgreSQL manage any special index types, maybe by extensions, which would let me solve this problem on the table/view/index definition level?

Because there are billions of data points, putting all of them in the index does not seem to make sense if you are only interested in first and last.

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

If your undisclosed access patterns and/or other restrictions don’t allow for a MATERIALIZED VIEW or a trigger solution keeping a table with min/max per device_id up to date, then the closest standard tool might be a BRIN index, which is much smaller than a corresponding B-tree index, typically by several orders of magnitude. But its efficiency also depends on undisclosed data distribution in your table. See:

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply