All we need is an easy explanation of the problem, so here it is.
Every once in a while we need to backfill a bunch of data based on some complex calculations. Because we have a lot of data, we need to make sure that we are not locking a table for a long time and do the updates in chunks. For that we have this type of script:
DO $$ DECLARE affected_count integer; start_time timestamp; end_time timestamp; elapsed_interval interval; BEGIN LOOP select now() into start_time; -- Add code here, make sure it limit it, below is an example -- Make sure that the query eventually updates 0 rows, -- otherwise this script will run forever update checklists set last_active_date = updated_date where id in ( select id from checklists where last_active_date is null limit 10000 ); -- End of code COMMIT; GET DIAGNOSTICS affected_count = ROW_COUNT; select now() into end_time; elapsed_interval := end_time - start_time; RAISE NOTICE 'Updated % row(s) in %.', affected_count, elapsed_interval; IF affected_count = 0 THEN EXIT; END IF; END LOOP; END; $$
COMMIT inside the script.
This approach works great! Usually.
Lately, we had a bit more complex logic that drives this kind of update. The later one includes the usage of crosstab. And looks something like this:
DO $$ DECLARE affected_count integer; start_time timestamp; end_time timestamp; elapsed_interval interval; BEGIN LOOP select now() into start_time; insert into permissions_stats( id, organization_id, reference_type, folder_id, template_id, all_members_groups_count, members_count, groups_count, guests_count, inherited_all_members_groups_count, inherited_members_count, inherited_groups_count, inherited_guests_count, template_shared) select uuid_generate_v4() as id, organization_id, 'Folder' as reference_type, folder_id, null as template_id, sum(all_members_groups_count) as all_members_groups_count, sum(members_count) as members_count, sum(groups_count) as groups_count, sum(guests_count) as guests_count, sum(inherited_all_members_groups_count) as inherited_all_members_groups_count, sum(inherited_members_count) as inherited_members_count, sum(inherited_groups_count) as inherited_groups_count, sum(inherited_guests_count) as inherited_guests_count, false as template_shared from ( select folder_id, organization_id, coalesce(all_members_groups_count, 0) as all_members_groups_count, coalesce(members_count, 0) as members_count, coalesce(groups_count, 0) as groups_count, coalesce(guests_count, 0) as guests_count, coalesce(inherited_all_members_groups_count, 0) as inherited_all_members_groups_count, coalesce(inherited_members_count, 0) as inherited_members_count, coalesce(inherited_groups_count, 0) as inherited_groups_count, coalesce(inherited_guests_count, 0) as inherited_guests_count from crosstab(' with recursive folders_path_to_root(start_id, id, parent_folder_id, organization_id) as ( select f.id as start_id, f.id, f.parent_folder_id, f.organization_id from folders f where deleted_by_id is null union all select pf.start_id, f.id, f.parent_folder_id, f.organization_id from folders_path_to_root pf inner join folders f on (f.id = pf.parent_folder_id and f.parent_folder_id is not null) ), folder_ids as ( select f.id as id from folders f where not exists (select * from permissions_stats where reference_type = ''Folder'' and folder_id = f.id) order by f.id asc limit 1000 ), counts_cte as ( select rp.start_id as folder_id, rp.organization_id as organization_id, case when u.user_type = ''Standard'' and om.guest then ''inherited_guests_count'' when u.user_type = ''Group'' and g.group_type = ''Standard'' then ''inherited_groups_count'' when u.user_type = ''Group'' and g.group_type = ''AllMembers'' then ''inherited_all_members_groups_count'' else ''inherited_members_count'' end as permit_type, count (u) as count from folder_permits fp inner join folders_path_to_root rp on (rp.id = fp.folder_id and rp.organization_id = fp.organization_id) inner join organization_memberships om on (fp.organization_membership_id = om.id) inner join users u on (u.id = om.user_id) left join groups g on (g.user_id = u.id and u.user_type = ''Group'') where rp.start_id in (select id from folder_ids) and rp.id not in (select id from folder_ids) group by permit_type, rp.start_id, rp.organization_id ) select folder_id as folder_id, organization_id as organization_id, permit_type, count from counts_cte union all select f.id as folder_id, f.organization_id as organization_id, case when u.user_type = ''Standard'' and om.guest then ''guests_count'' when u.user_type = ''Group'' and g.group_type = ''Standard'' then ''groups_count'' when u.user_type = ''Group'' and g.group_type = ''AllMembers'' then ''all_members_groups_count'' else ''members_count'' end as permit_type, count (u) as count from folders f left join folder_permits fp on (fp.folder_id = f.id) left join organization_memberships om on (fp.organization_membership_id = om.id) left join users u on (u.id = om.user_id and u.user_type in (''Standard'', ''Group'')) left join groups g on (g.user_id = u.id and u.user_type = ''Group'') where f.id in (select id from folder_ids) group by permit_type, f.id, f.organization_id ', ' values (''all_members_groups_count''), (''members_count''), (''guests_count''), (''groups_count''), (''inherited_all_members_groups_count''), (''inherited_members_count''), (''inherited_groups_count''), (''inherited_guests_count'') ' ) as grouped_counts ( "folder_id" uuid, "organization_id" uuid, "all_members_groups_count" int, "members_count" int, "guests_count" int, "groups_count" int, "inherited_all_members_groups_count" int, "inherited_members_count" int, "inherited_groups_count" int, "inherited_guests_count" int ) ) as pivot group by pivot.folder_id, pivot.organization_id on conflict (reference_type, folder_id) where reference_type = 'Folder' do update set all_members_groups_count = excluded.all_members_groups_count, members_count = excluded.members_count, groups_count = excluded.groups_count, guests_count = excluded.guests_count, inherited_all_members_groups_count = excluded.inherited_all_members_groups_count, inherited_members_count = excluded.inherited_members_count, inherited_groups_count = excluded.inherited_groups_count, inherited_guests_count = excluded.inherited_guests_count; -- End of code COMMIT; GET DIAGNOSTICS affected_count = ROW_COUNT; select now() into end_time; elapsed_interval := end_time - start_time; RAISE NOTICE 'Updated % row(s) in %.', affected_count, elapsed_interval; IF affected_count = 0 THEN EXIT; END IF; END LOOP; END; $$
The internal query works pefertly fine. But, when I execute the entire loop, I am getting the following error:
ERROR: cannot begin/end transactions in PL/pgSQL HINT: Use a BEGIN block with an EXCEPTION clause instead. CONTEXT: PL/pgSQL function inline_code_block line 168 at SQL statement
Now, digging a bit more into the docs and a bunch of other posts, I found that commit is not supported in plpgsql anonymous functions, as the function will execute with its own transaction.
It kind of makes sense. However, how come the first query works without any issues.
The main target of this process is to allow inserting/updating a lot of data in chunks with an option to stop in a middle and restart from the same point. Without the
commit it will always start from the very beginning every time.
What am I missing?
How to solve :
I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.
I found that
commitis not supported in plpgsql anonymous functions
Well. Your finding is historically correct. (But the terminology is not.) A
DO command executes …
an anonymous code block
Note the term "block", not "function". That’s key here because a
FUNCTION still cannot execute transaction control statements like
COMMIT in Postgres 13 – and probably never will. But a
PROCEDURE can, which was added with Postgres 11. And starting with the same version, anonymous blocks executed with a
DO command can do that, too.
Did you run both tests with different versions of Postgres?
Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂