I’m trying to create a temp table and insert data into it in Redshift.
My goal is to insert a single record per unique doc_id, keeping only rows where doc_id IS NOT NULL.
Here’s my code:
    -- Creating temp table to load only rows with unique and not null doc_id
    DROP TABLE IF EXISTS TMP_table CASCADE;

    CREATE TEMP TABLE IF NOT EXISTS TMP_table (
        uuid varchar,
        id integer,
        doc_id integer,
        revenue double,
        doc_date varchar,
    );

    -- insert into the temp table and add the distinct and not null filter on the doc_id
    INSERT INTO TMP_table (uuid, id, doc_id, revenue, doc_date)
    SELECT
        uuid,
        id,
        select DISTINCT (table_x.doc_id) from table_x where table_x.doc_id IS NOT NULL,
        revenue,
        doc_date
    FROM schema.table_x;

When I run the above code I get a syntax error near distinct, and I can't figure out what is causing it.
Any guidance please?
How to solve:
If there are duplicate doc_ids, I just want to pick one and insert it (it should also be not null, hence my WHERE clause).
You can use a window (ranking) function, e.g. ROW_NUMBER(), to simplify the query.
Note: I haven’t checked if these are available in Redshift.
    INSERT INTO TMP_table (uuid, id, doc_id, revenue, doc_date)
    SELECT uuid, id, doc_id, revenue, doc_date
    FROM (
        SELECT
            uuid, id, doc_id, revenue, doc_date,
            ROW_NUMBER() OVER        -- assign row numbers
            (
                PARTITION BY doc_id  -- per doc_id
                                     -- without any specific order
            ) AS rn
        FROM schema.table_x
        WHERE doc_id IS NOT NULL
    ) AS x
    WHERE rn = 1                     -- pick the first one per doc_id if there are 2+
    ;
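To try the ROW_NUMBER() approach without a Redshift cluster, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in and is not Redshift, so treat this as an illustration of the PARTITION BY logic rather than a test of Redshift itself. The sample rows and the simplified column types are made up for the demo, and the ORDER BY id inside OVER is an addition of mine so that the surviving row per doc_id is predictable. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# In-memory SQLite database standing in for Redshift (assumption: only the
# ROW_NUMBER() / PARTITION BY behaviour is being illustrated here).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical sample data shaped like table_x from the question.
cur.execute(
    "CREATE TABLE table_x (uuid TEXT, id INTEGER, doc_id INTEGER, "
    "revenue REAL, doc_date TEXT)"
)
cur.executemany(
    "INSERT INTO table_x VALUES (?, ?, ?, ?, ?)",
    [
        ("a", 1, 10, 5.0, "2020-01-01"),
        ("b", 2, 10, 7.5, "2020-01-02"),    # duplicate doc_id 10
        ("c", 3, 20, 1.0, "2020-01-03"),
        ("d", 4, None, 9.0, "2020-01-04"),  # NULL doc_id, must be skipped
    ],
)

cur.execute(
    "CREATE TEMP TABLE TMP_table (uuid TEXT, id INTEGER, doc_id INTEGER, "
    "revenue REAL, doc_date TEXT)"
)

# Same shape as the query above, plus ORDER BY id inside OVER so that the
# row with the lowest id is the one kept for each doc_id.
cur.execute(
    """
    INSERT INTO TMP_table (uuid, id, doc_id, revenue, doc_date)
    SELECT uuid, id, doc_id, revenue, doc_date
    FROM (
        SELECT uuid, id, doc_id, revenue, doc_date,
               ROW_NUMBER() OVER (PARTITION BY doc_id ORDER BY id) AS rn
        FROM table_x
        WHERE doc_id IS NOT NULL
    ) AS x
    WHERE rn = 1
    """
)

rows = cur.execute(
    "SELECT uuid, id, doc_id FROM TMP_table ORDER BY doc_id"
).fetchall()
print(rows)  # [('a', 1, 10), ('c', 3, 20)]
```

Without an ORDER BY in the window, which of the duplicate rows gets rn = 1 is arbitrary. That is fine if any one row per doc_id will do, as stated in the question.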