How to transpose values from one column into columns with a value from a different column?

All we need is an easy explanation of the problem, so here it is.

I have a database with the following structure:

Date role type duration
2022-04-16 Nurse Food preparation 45
2022-04-17 Nurse Cleaning 30
2022-04-17 Volunteer Cleaning 20
2022-04-17 Nurse Food preparation 60

Note: I don’t know the values in the "type" column in advance, since they are defined by the user. Also, There can be multiple rows with overlapping date, role, and type.

I am using a charting library that would like for the data to be grouped as follows:

role Food preparation Cleaning
Nurse 105 30
Volunteer Null 20

So far, I am able to group the data using the following query

select 
    role,
    type, 
    sum(duration) as total_minutes
from work
group by role, type;
role type total_mintes
Nurse Cleaning 45
Nurse Food preparation 20
Volunteer Cleaning 15
Volunteer Food preparation 43

How can I "pivot"/"transpose" the data so that each row represents a role with one column containing the sum of minutes for each type of work?

In effect, I would like to transpose the data similar to the Pandas DataFrame.pivot_table function, but using only SQL.

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

First of all you will need to install the tablefunc extension using the create extension tablefunc; command, otherwise the pivot function crosstab will not work.

Even after reading this answer, it is recommended that you read the official documentation of PostgreSQL on crosstab here

As for how to do this:

select *
from crosstab(
    'select
    role,
    type,
    sum(duration) as total_minutes
from work
group by role, type
order by type',
    'select distinct type from work order by type'
) as ct(
    role text,
    "Cleaning" text,
    "Food preparation" text
);

Pay attention to the explicit order by clause in both queries, this is a must, otherwise it may map values incorrectly as SQL does not guarantee the order of data without it.

You will have to specify each possible output of the column type in the alias.


A more dynamic version of the above (although not perfect by any means):

create or replace function get_dynamic_transpose()
  returns text
  language plpgsql
as
$$
declare
    v_output_columns text;
begin
    select array_to_string(array_agg(distinct quote_ident(type) || ' ' || pg_typeof(type) || E' \n'),',','null')
    into v_output_columns
    from testing;

    return format(
'select *
from crosstab(
    ''select
    role,
    type,
    sum(duration) as total_minutes
from testing
group by role, type
order by type'',
    ''select distinct type from testing order by type''
) as ct(
    role text,
    %s
);', v_output_columns
    );
end;
$$;

This function would return the query you need to execute to get your desired result. It will dynamically build the list of possible columns you need for the output. This function can definitely be made to be more general purpose like it is done here but it’s not a small amount of work to do that as PostgreSQL cannot return a set it does not know its definition beforehand.

There is the other option of this function instead of returning a query string, it can instead return an array of json objects each representing a row, and you would split this json into normal rows and column on application side. If such a solution is acceptable then this works fine:

create or replace function get_dynamic_transpose_jsonb()
  returns jsonb
  language plpgsql
as
$$
declare
    v_output_columns text;
    v_query text;
    v_result jsonb;
begin
    select array_to_string(array_agg(distinct quote_ident(type) || ' ' || pg_typeof(type) || E' \n'),',','null')
    into v_output_columns
    from testing;

    v_query = format(
'select jsonb_agg(ct)
from crosstab(
    ''select
    role,
    type,
    sum(duration) as total_minutes
from testing
group by role, type
order by type'',
    ''select distinct type from testing order by type''
) as ct(
    role text,
    %s
);', v_output_columns
    );

    execute v_query into v_result;

    return v_result;
end;
$$;

The result of this function would be something similar to the following

[{"role": "Nurse", "Cleaning": "30", "Food preparation": null}, {"role": "Volunteer", "Cleaning": null, "Food preparation": "55"}]

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply