Filtering Escape Characters in PostgreSQL

I am trying to select distinct serial numbers from a table and join two tables. Currently my code looks like this:

SELECT DISTINCT ON (t1.serial_nbr) t1.*, q1.serl_nbr
FROM ( select
                 serial_nbr,
                 CASE
                 when substring(serial_nbr FROM 2 for 2) = '18' THEN
                 regexp_replace(serial_nbr,'(...)(\w{{5}})','\\1 \\2')
                 when substring(serial_nbr FROM 2 for 2) = '19' AND length(serial_nbr) = 9 THEN
                 regexp_replace(serial_nbr,'(...)(\w{{6}})','\\1 \\2')
                 ELSE serial_nbr END as serial
                 FROM controller_returns) as t1
LEFT JOIN dblink('{param_db}', 
            'SELECT DISTINCT ctrlr_serl_nbr,insert_ts
            FROM ctrlr_basic_setng') 
as q1(ctrlr_serl character varying,insert_ts timestamp without time zone)
on q1.ctrlr_serl_nbr = t1.serial
WHERE t1.serial_nbr~'^[ABCD]([AC0-9]{{1,9}})'
ORDER by t1.serial_nbr, substring(t1.serial_nbr from 2 for 2), substring(t1.serial_nbr from 1 for 1)

I am using PostgreSQL and connecting through psycopg2 in Python.

This is evolving code, but I have been noticing that the serial numbers I am getting aren’t truly distinct. I sometimes get cases like ‘A1800038’ and ‘A1800038\n’. What I am trying to figure out is how to filter out the serials with \n. I tried doing it in the WHERE clause with '^[ABCD]([AC0-9][^\\n]{{1,9}})' or '^[ABCD]([AC0-9][^[:space:]]{{1,9}})', but neither works.

Any help would be much appreciated.

How to solve :

Method 1

…how to filter out the serial with \n.

Add this in the subquery t1:

WHERE serial_nbr !~ '\\n$'

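Doubling the \ removes its special meaning in the regular expression, so '\\n$' matches a literal backslash followed by the letter n at the end of the string. To match an actual newline character (E'\n', ASCII code 10) instead, use a single \.
$ anchors the match at the end of the string.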
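
Because the query is sent from Python, the pattern passes through one more escaping layer before PostgreSQL sees it. The sketch below assumes the query is written as a normal (non-raw) Python string and passed through str.format(), which the doubled braces and {param_db} in the question suggest but do not confirm. It uses Python's re module as a stand-in for PostgreSQL's regex engine, which treats these two patterns the same way; the variable and function names are made up for illustration.

```python
import re

# Layer 1: Python. In a normal (non-raw) string literal, "\\" is ONE backslash,
# and str.format() turns "{{" / "}}" into single braces.
# Assumption: the query is built this way before being handed to psycopg2.
python_literal = "serial_nbr !~ '\\n$' AND serial_nbr ~ '^[ABCD]([AC0-9]{{1,9}})'"
sql_text = python_literal.format()
# PostgreSQL receives: serial_nbr !~ '\n$' AND serial_nbr ~ '^[ABCD]([AC0-9]{1,9})'

# Layer 2: the regex engine. Python's re stands in for PostgreSQL's here;
# both read \n as a line feed and \\ as a literal backslash.
def ends_with(pattern, value):
    return re.search(pattern, value) is not None

real_newline = "A1800038\n"    # ends in a line feed (ASCII 10)
literal_text = "A1800038\\n"   # ends in a backslash followed by the letter n

single = r"\n$"    # one backslash: matches the line feed only
double = r"\\n$"   # two backslashes: matches the two-character text only

print(sql_text)
print(ends_with(single, real_newline), ends_with(single, literal_text))
print(ends_with(double, real_newline), ends_with(double, literal_text))
```

Printing the final SQL text before executing it is a cheap way to check which of the two patterns actually reaches the server.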

That said, you could use cheaper LIKE expressions in your query:

SELECT DISTINCT ON (t1.serial_nbr) t1.*, q1.ctrlr_serl_nbr
FROM  (
   SELECT serial_nbr
        , CASE WHEN serial_nbr LIKE '_18%'      THEN regexp_replace(serial_nbr, '(...)(\w{{5}})','\\1 \\2')
               WHEN serial_nbr LIKE '_19______' THEN regexp_replace(serial_nbr, '(...)(\w{{6}})','\\1 \\2')
                                                ELSE serial_nbr END AS serial
   FROM   controller_returns
   WHERE  serial_nbr !~ '\\n$'   -- added filter; no t1 prefix, the alias is not visible inside the subquery
   ) t1
LEFT   JOIN dblink('{param_db}', 'SELECT DISTINCT ctrlr_serl_nbr, insert_ts FROM ctrlr_basic_setng')
         AS q1(ctrlr_serl_nbr varchar, insert_ts timestamp)  -- column name must match the references to q1
         ON q1.ctrlr_serl_nbr = t1.serial
WHERE  t1.serial_nbr ~ '^[ABCD]([AC0-9]{{1,9}})'
ORDER  BY t1.serial_nbr, substring(t1.serial_nbr FROM 2 FOR 2), left(t1.serial_nbr, 1);
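
To see what the CASE expression produces, and why a trailing newline defeats DISTINCT ON, here is a rough Python mirror of it. This is only an illustration, not part of the query: normalize_serial is a hypothetical helper, Python's re.sub replaces PostgreSQL's regexp_replace, and the sample serials other than 'A1800038' are invented.

```python
import re

def normalize_serial(serial_nbr):
    """Illustrative mirror of the CASE expression in the subquery."""
    # LIKE '_18%': any single character, then '18', then anything.
    if serial_nbr[1:3] == "18":
        # Insert a space after the first three characters (first match only).
        return re.sub(r"(...)(\w{5})", r"\1 \2", serial_nbr, count=1)
    # LIKE '_19______': exactly nine characters with '19' in positions 2-3.
    if serial_nbr[1:3] == "19" and len(serial_nbr) == 9:
        return re.sub(r"(...)(\w{6})", r"\1 \2", serial_nbr, count=1)
    # ELSE branch: leave the value unchanged.
    return serial_nbr

for value in ["A1800038", "A1800038\n", "A19000123", "A1900012"]:
    print(repr(value), "->", repr(normalize_serial(value)))
```

The first two inputs differ only by the trailing line feed, yet they stay distinct after normalization, which is why such rows have to be filtered out (or cleaned) before DISTINCT ON is applied.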

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
