PostgreSQL multi-field index on jsonb data does a full table scan when the value is not found


I have a table with millions of records. I need to query for the last-added record by timestamp, for a given field. Pretty simple stuff, trivial to do with SQL:

CREATE TABLE records
(
    id integer,
    "timestamp" integer,
    type text
);
CREATE INDEX idx_type_time_sql
ON records (type ASC, timestamp DESC);

Queries are fast, even searching for a type that does not exist in the table.

select * from records where type = 'KNOWN' order by timestamp desc limit 1;
select * from records where type = 'UNKNOWN' order by timestamp desc limit 1;

I can almost get the same thing working NoSQL-style (i.e. with a jsonb column that holds all of the object's properties):

CREATE TABLE records
(
    id integer,
    json jsonb NOT NULL
);
CREATE INDEX idx_timestamp
ON records (((json->'timestamp')::bigint));

This is fast (a few ms) when the type exists in the table. However, when the type does not exist in the table, the index does not help: the query does a full table scan that takes about 12 seconds.

-- fast:
select * from records where json->>'type' = 'KNOWN'
order by (json->'timestamp')::bigint desc limit 1;
-- slow:
select * from records where json->>'type' = 'UNKNOWN'
order by (json->'timestamp')::bigint desc limit 1;
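
One way to see what the planner is doing for each query is to prefix it with EXPLAIN (ANALYZE, BUFFERS). This is only a diagnostic sketch, assuming the jsonb records table and the idx_timestamp index defined above; the plan you actually get depends on your data and PostgreSQL version:

```sql
-- Show the chosen plan together with actual run time and buffer usage.
-- ANALYZE really executes the query, so the slow case takes its full time.
EXPLAIN (ANALYZE, BUFFERS)
select * from records where json->>'type' = 'UNKNOWN'
order by (json->'timestamp')::bigint desc limit 1;
```

If the plan shows a Filter on json->>'type' with a large "Rows Removed by Filter" count, the type condition is being checked row by row rather than served by an index.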

I have tried many different kinds of jsonb indexes and queries with no luck, e.g.:

CREATE INDEX IF NOT EXISTS idx_type_timestamp ON records ( (json -> 'type'), ((json -> 'timestamp')::bigint));

Is there any way to get PostgreSQL jsonb indexing to work as well as a good old-fashioned SQL index when querying for unknown values? Or is this just a shortcoming of jsonb?

How to solve:


Method 1

The fix was simply to make the indexed expression match the query expression exactly (thanks @jjanes): PostgreSQL can only use an expression index when the query contains that same expression. Once the two match, the approach is very similar to indexing ordinary SQL columns.
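
In this case the difference is between -> and ->>: the composite index attempted in the question was built on json -> 'type', while the queries filter on json->>'type'. A quick check of the result types, using a made-up literal purely for illustration, shows why the two are not interchangeable:

```sql
-- -> returns jsonb and ->> returns text, so to the planner
-- these are two different expressions with two different types.
SELECT pg_typeof('{"type": "KNOWN"}'::jsonb ->  'type') AS single_arrow,   -- jsonb
       pg_typeof('{"type": "KNOWN"}'::jsonb ->> 'type') AS double_arrow;   -- text
```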

Index:

CREATE INDEX IF NOT EXISTS idx_json_pair ON records ((json->>'type'),((json->'timestamp')::bigint));

Now, both hit and miss queries are fully indexed and fast:

select * from records where json->>'type' = 'KNOWN' order by (json->'timestamp')::bigint desc limit 1;
select * from records where json->>'type' = 'UNKNOWN' order by (json->'timestamp')::bigint desc limit 1;
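
The same matching rule can probably be applied in the other direction as well. As a sketch, assuming you kept the idx_type_timestamp index from the question (built on json -> 'type', which is a jsonb value), the query can compare jsonb to jsonb instead of text to text. This variant is not part of the original answer, so it is worth confirming with EXPLAIN on your own data:

```sql
-- Filter on the same expression the index was built on: (json -> 'type').
-- The right-hand side is a jsonb string, hence the inner double quotes.
select * from records
where json -> 'type' = '"UNKNOWN"'::jsonb
order by (json -> 'timestamp')::bigint desc
limit 1;
```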


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
