How to use ORDER BY and LIMIT inside JSON_AGG

All we need is an easy explanation of the problem, so here it is.

I have a query which returns a desired output.

SELECT 
    shop,
    JSON_AGG(item_history.* ORDER BY created_date DESC) as data
FROM item_history
GROUP BY
    shop;

Results:

[
  {
    "shop": "shop1",
    "data": [
      {
        "id": 226,
        "price": "0",
        "shop": "shop1.com",
        "country": "UK",
        "item": "item1",
        "created_date": "2021-06-07T08:48:42.338201",
      },
      {
        "id": 224,
        "price": "0",
        "shop": "shop1.com",
        "country": "UK",
        "item": "item 1",
        "created_date": "2021-06-07T07:53:25.030621",
      },
      ...
  },
  {
    "shop": "shop2",
    "data": [
      {
        "id": 225,
        "price": "0",
        "shop": "shop2.com",
        "country": "DE",
        "item": "Item 2",
        "created_date": "2021-06-07T08:48:36.443849",
      },
      ...
]

Which is exactly the output I want but the issue is that it fetches all items under data array and it would be nice to limit that array. I tried adding LIMIT

SELECT 
    shop,
    JSON_AGG(item_history.* ORDER BY created_date DESC LIMIT 5) as data
FROM item_history
GROUP BY
    shop;

But I get syntax error at or near "limit". Is it possible to limit the output from JSON_AGG?

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

The syntax of the ORDER BY inside JSON_AGG does not accept a LIMIT option. I am not aware of a way to limit the scope of any aggregate function in this way. As an alternative, you can try the following ROW_NUMBER + FILTER approach:

  • assign row numbers within shop partitions
  • use the row numbers in conditional aggregation.

In SQL it would look like this:

SELECT 
    shop,
    JSON_AGG(derived.* ORDER BY created_date DESC) FILTER (WHERE rn <= 5) AS data
FROM
    (
        SELECT
            *,
            ROW_NUMBER() OVER (PARTITION BY shop ORDER BY created_date DESC) AS rn
        FROM
            item_history
    ) AS derived
GROUP BY
    shop;

Or you could use a CTE instead of a derived table:

WITH
    cte AS
    (
        SELECT
            *,
            ROW_NUMBER() OVER (PARTITION BY shop ORDER BY created_date DESC) AS rn
        FROM
            item_history
    )
SELECT 
    shop,
    JSON_AGG(cte.* ORDER BY created_date DESC) FILTER (WHERE rn <= 5) AS data
FROM
    cte
GROUP BY
    shop;

The results would be identical. The difference, if any, would only be in efficiency, because CTEs in PostgreSQL may be materialised while derived tables are typically not. You should test for yourself to see which option works better for you.

Note: In both cases the aggregate function still has the same ORDER BY, but you can now change it however you like, or remove it altogether: the FILTER clause makes sure you only have the last five entries per shop anyway.

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply