Redshift – Insert data into a temp table so that doc_id is unique and not null


I’m trying to create a temp table in Redshift and insert data into it.
My goal is to keep a single record per unique doc_id, ignoring rows where doc_id IS NULL.

Here’s my code:

-- Creating temp table to load only rows with unique and not null doc_id
DROP TABLE IF EXISTS TMP_table CASCADE;

CREATE TEMP TABLE IF NOT EXISTS TMP_table
(
    uuid varchar,
    id integer,
    doc_id integer,
    revenue double precision,
    doc_date varchar
);

-- insert into the temp table and add the distinct and not null filter on the doc_id
INSERT INTO TMP_table
(
    uuid,
    id,
    doc_id,
    revenue,
    doc_date
)
SELECT
    uuid,
    id,
    select DISTINCT (table_x.doc_id) from table_x where table_x.doc_id IS NOT NULL,
    revenue,
    doc_date
FROM schema.table_x;

When I run the code above, I get a syntax error near DISTINCT, and I can’t figure out what causes it.

Any guidance please?
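To see why the parser stops there, the sketch below replays the question's select list in SQLite through Python's built-in sqlite3 module. SQLite is only a stand-in for Redshift (an assumption: it rejects the same construct, but its error text differs), and the sample rows are made up. A bare SELECT is not a valid column expression, so the statement fails to parse. Wrapping it in parentheses turns it into a scalar subquery that does parse, but SQLite then hands the same single value to every row (PostgreSQL would instead raise an error because the subquery returns more than one row); either way, nothing gets de-duplicated.

```python
import sqlite3

# In-memory SQLite database standing in for Redshift (assumption: its parser
# rejects the same construct, although the error message wording differs).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_x (uuid TEXT, id INTEGER, doc_id INTEGER, revenue REAL, doc_date TEXT)"
)
# Hypothetical sample rows: doc_id 10 is duplicated and one doc_id is NULL.
conn.executemany(
    "INSERT INTO table_x VALUES (?, ?, ?, ?, ?)",
    [
        ("a", 1, 10, 5.0, "2021-01-01"),
        ("b", 2, 10, 7.5, "2021-01-02"),
        ("c", 3, 20, 1.0, "2021-01-03"),
        ("d", 4, None, 2.0, "2021-01-04"),
    ],
)

# The select list from the question: a bare SELECT used as if it were a column.
bad_query = """
SELECT uuid, id,
       select DISTINCT (table_x.doc_id) from table_x where table_x.doc_id IS NOT NULL,
       revenue, doc_date
FROM table_x
"""
try:
    conn.execute(bad_query)
    syntax_error = None
except sqlite3.OperationalError as exc:
    syntax_error = str(exc)  # parse failure, raised before any row is read
print(syntax_error)

# Parenthesized, it becomes a scalar subquery: it parses, but SQLite reuses the
# subquery's first row for every outer row, and the row whose doc_id is NULL
# is still returned.
scalar_query = """
SELECT uuid, id,
       (SELECT DISTINCT doc_id FROM table_x WHERE doc_id IS NOT NULL) AS doc_id,
       revenue, doc_date
FROM table_x
"""
rows = conn.execute(scalar_query).fetchall()
print(len(rows), {row[2] for row in rows})  # 4 rows, one shared doc_id value
```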

How to solve:


Method 1

You wrote: “If there are duplicate doc_ids, I just want to pick one and insert it (it should also be not null, hence my WHERE clause).”
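Spelled out procedurally, the requirement is: drop rows whose doc_id is NULL, then keep one row per remaining doc_id. The Python sketch below uses a hypothetical tuple layout and made-up rows; “first row seen” is an arbitrary stand-in for “pick one”, much as in SQL, where the winning row is unspecified unless an ordering is given.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical row layout mirroring table_x: (uuid, id, doc_id, revenue, doc_date).
Row = Tuple[str, int, Optional[int], float, str]


def one_row_per_doc_id(rows: List[Row]) -> List[Row]:
    """Drop rows with a NULL (None) doc_id and keep one row per remaining doc_id."""
    kept: Dict[int, Row] = {}
    for row in rows:
        doc_id = row[2]
        if doc_id is None:  # same effect as WHERE doc_id IS NOT NULL
            continue
        kept.setdefault(doc_id, row)  # first row seen wins; later duplicates are skipped
    return list(kept.values())


rows = [
    ("a", 1, 10, 5.0, "2021-01-01"),
    ("b", 2, 10, 7.5, "2021-01-02"),
    ("c", 3, 20, 1.0, "2021-01-03"),
    ("d", 4, None, 2.0, "2021-01-04"),
]
print(one_row_per_doc_id(rows))
# [('a', 1, 10, 5.0, '2021-01-01'), ('c', 3, 20, 1.0, '2021-01-03')]
```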

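The syntax error comes from placing a complete SELECT statement directly in the select list. A subquery in that position must be parenthesized and may return at most one row, so it cannot filter rows. DISTINCT also applies to the whole select list rather than to a single column, so on its own it cannot keep one row per doc_id. Instead, you can use a window/ranking function – e.g. ROW_NUMBER() – to simplify the query.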

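Note: Redshift supports ROW_NUMBER() as a window function. Because the OVER clause in the query has no ORDER BY, which row gets rn = 1 within each doc_id is arbitrary and may differ between runs. Add an ORDER BY inside OVER (...) if you need a specific row.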
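The effect of an ORDER BY inside OVER (...) can be checked locally. This sketch again uses SQLite through Python's sqlite3 module as a stand-in for Redshift (window functions need SQLite 3.25 or newer), with made-up rows. It adds a hypothetical tie-breaker, ORDER BY doc_date DESC, so the newest row of each doc_id gets rn = 1. Because doc_date is stored as text, this sorts chronologically only for ISO-style dates such as 2021-01-02.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_x (uuid TEXT, id INTEGER, doc_id INTEGER, revenue REAL, doc_date TEXT)"
)
# Hypothetical sample rows: doc_id 10 appears twice, one doc_id is NULL.
conn.executemany(
    "INSERT INTO table_x VALUES (?, ?, ?, ?, ?)",
    [
        ("a", 1, 10, 5.0, "2021-01-01"),
        ("b", 2, 10, 7.5, "2021-01-02"),
        ("c", 3, 20, 1.0, "2021-01-03"),
        ("d", 4, None, 2.0, "2021-01-04"),
    ],
)

# ROW_NUMBER() restarts at 1 for every doc_id; ORDER BY doc_date DESC (an
# assumed tie-breaker) makes the newest row of each group the one with rn = 1.
numbered = conn.execute("""
    SELECT id, doc_id,
           ROW_NUMBER() OVER (PARTITION BY doc_id ORDER BY doc_date DESC) AS rn
    FROM table_x
    WHERE doc_id IS NOT NULL
    ORDER BY doc_id, rn
""").fetchall()
print(numbered)  # [(2, 10, 1), (1, 10, 2), (3, 20, 1)]
```

The same ORDER BY can be added to the Redshift query unchanged; only the column you sort on is a choice you have to make.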

INSERT INTO TMP_table
(
    uuid,
    id,
    doc_id,
    revenue,
    doc_date
)
SELECT 
    uuid,
    id,
    doc_id,
    revenue,
    doc_date
FROM
  (
    SELECT
        uuid,
        id,
        doc_id,
        revenue,
        doc_date,
        ROW_NUMBER() OVER          -- assign row numbers
          ( PARTITION BY doc_id    -- per doc_id
                                   -- without any specific order
          ) AS rn
    FROM schema.table_x
    WHERE doc_id IS NOT NULL 
  ) AS x
WHERE rn = 1    -- pick the first one per doc_id if there are 2+ 
;
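The query can be exercised end to end on a small sample. In this sketch SQLite (through Python's sqlite3 module, version 3.25+ for window functions) stands in for Redshift, the schema prefix is dropped, REAL replaces DOUBLE PRECISION, and the rows are invented; the INSERT ... SELECT itself mirrors the answer. Since the OVER clause has no ORDER BY, only the invariants are checked: every non-NULL doc_id appears exactly once and no NULL doc_id is inserted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Source table; "schema.table_x" becomes plain table_x in SQLite.
conn.execute(
    "CREATE TABLE table_x (uuid TEXT, id INTEGER, doc_id INTEGER, revenue REAL, doc_date TEXT)"
)
# Hypothetical sample rows: doc_id 10 is duplicated and one doc_id is NULL.
conn.executemany(
    "INSERT INTO table_x VALUES (?, ?, ?, ?, ?)",
    [
        ("a", 1, 10, 5.0, "2021-01-01"),
        ("b", 2, 10, 7.5, "2021-01-02"),
        ("c", 3, 20, 1.0, "2021-01-03"),
        ("d", 4, None, 2.0, "2021-01-04"),
    ],
)

# Temp table with the question's columns (REAL stands in for DOUBLE PRECISION).
conn.execute(
    "CREATE TEMP TABLE TMP_table (uuid TEXT, id INTEGER, doc_id INTEGER, revenue REAL, doc_date TEXT)"
)

# The answer's INSERT ... SELECT: number rows per doc_id, keep only rn = 1.
conn.execute("""
    INSERT INTO TMP_table (uuid, id, doc_id, revenue, doc_date)
    SELECT uuid, id, doc_id, revenue, doc_date
    FROM (
        SELECT uuid, id, doc_id, revenue, doc_date,
               ROW_NUMBER() OVER (PARTITION BY doc_id) AS rn
        FROM table_x
        WHERE doc_id IS NOT NULL
    ) AS x
    WHERE rn = 1
""")

doc_ids = [row[0] for row in conn.execute("SELECT doc_id FROM TMP_table ORDER BY doc_id")]
print(doc_ids)  # [10, 20]
```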


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
