Fill a column with a continuous index in Postgres

Imagine the following table:

tmp_migration.asset

╔══════════╦══════════════════════════╗
║ id       ║ ...many other columns... ║
╠══════════╬══════════════════════════╣
║       15 ║                      ... ║
║       16 ║                      ... ║
║       17 ║                      ... ║
║       18 ║                      ... ║
║    10020 ║                      ... ║
║    10021 ║                      ... ║
╚══════════╩══════════════════════════╝

As you can see, the id values don’t start at 1 and have gaps.

Problem

I want to add a new column tempId with a continuous index. The table has 80m rows. How can I do that? I googled a lot of things and ended up nowhere.

Background

The table is part of a data migration project. tmp_migration is a temporary schema created as the source of the data migration. In the current step I’m trying to copy data from tmp_migration.asset to public.asset while transforming it. I’m using a combined INSERT INTO ... SELECT ... query for that.

The problem is that it takes several hours (80m rows) and I don’t get any progress notification during the run. To solve that, I wanted to use "pagination". In the bash script that calls psql with the insert/select script, I created a loop that sets the bounds passed to the script.

I started with using limit / offset by adding

LIMIT :limit
OFFSET :offset;

to the script, but this slows down dramatically at higher "pages". So, it is advised to filter with WHERE on the PK instead of using limit/offset. However, for this I need a continuous PK, which I don’t have. Thus, I thought of adding a temporary continuous index.
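
For reference, the WHERE-on-PK approach is usually written as a range filter on the key. The following is a minimal sketch, not a tested migration script: :lower and :upper are hypothetical psql variables set by the calling loop, and only the id column is shown because the real column list and transformations are not given here.

-- Sketch only: :lower and :upper are assumed psql variables,
-- e.g. passed as psql -v lower=0 -v upper=100000 -f script.sql
INSERT INTO public.asset (id)   -- the remaining target columns would go here
SELECT id                       -- the transformed source columns would go here
FROM tmp_migration.asset
WHERE id >= :lower
  AND id <  :upper;

A range filter like this still returns correct rows when the key has gaps. The batches then differ in size, so the loop has to run up to the largest id rather than up to the row count.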

Maybe there are other solutions that I don’t see right now. I would be very happy about any assistance.

How to solve:

Method 1

DEMO:

CREATE TABLE test (id INT PRIMARY KEY, other_field INT);
INSERT INTO test VALUES (3,333),(55,555),(777,777);
SELECT * FROM test;

 id  | other_field
-----+-------------
   3 |         333
  55 |         555
 777 |         777

ALTER TABLE test ADD COLUMN continuous INT;
SELECT * FROM test;

 id  | other_field | continuous
-----+-------------+------------
   3 |         333 |       null
  55 |         555 |       null
 777 |         777 |       null

UPDATE test
SET continuous = calculate_rownumber.rownumber
FROM ( SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rownumber
       FROM test ) AS calculate_rownumber
WHERE test.id = calculate_rownumber.id;
SELECT * FROM test;

 id  | other_field | continuous
-----+-------------+------------
   3 |         333 |          1
  55 |         555 |          2
 777 |         777 |          3
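
Applied to the table from the question, the same pattern might look like the sketch below. It is untested against the real schema and rests on several assumptions: id is unique in tmp_migration.asset, the index name asset_tempid_idx is made up for illustration, :lower and :upper are hypothetical psql variables supplied by the bash loop, and only the id column is copied because the actual columns are not shown.

-- Sketch only: names other than tmp_migration.asset, public.asset,
-- id and tempId are assumptions.
ALTER TABLE tmp_migration.asset ADD COLUMN tempId BIGINT;

-- Number the rows 1..N in id order (ROW_NUMBER() returns bigint).
UPDATE tmp_migration.asset AS a
SET tempId = n.rownumber
FROM ( SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rownumber
       FROM tmp_migration.asset ) AS n
WHERE a.id = n.id;

-- Without an index, every batch would scan the whole table.
CREATE UNIQUE INDEX asset_tempid_idx ON tmp_migration.asset (tempId);

-- One batch; the loop advances :lower and :upper by the batch size.
INSERT INTO public.asset (id)   -- the remaining target columns would go here
SELECT id                       -- the transformed source columns would go here
FROM tmp_migration.asset
WHERE tempId >  :lower
  AND tempId <= :upper;

Note that the UPDATE writes a new version of every row, so on 80m rows it is itself a long-running statement and will noticeably grow the table on disk.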

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
