Postgresql Search for a 6 digit number in text

All we need is an easy explanation of the problem, so here it is.

I have a table with one field Address – varchar(250), is it possible to search this field for a 6 digit (integer) Pincode. Is it also possible to list out records without any 12 digit Pincode.

Sl No |  Address 
1 | "16/127, Off M G Road, Goregaon (west) Mumbai, Maharashtra, 400062 02228744298"
2 | "113, 113,ragiptblr-2, Ragi Pet Bangalore, Karnataka, 560002 2229218"
3 | "2, 2, Panchal Estate, Bapunagar Ahmedabad, Gujarat, 380024 079227401432nd Flr, B Wing, Jay Chamber, Service Rd, Near Telephone Exchange, Vile Parle (east) Mumbai, Maharashtra, 400057 02226104168"
4 | "New#27, Gopalakrishna Street, T Nagar Chennai, Tamil Nadu, 600017 04428150211"
5 | "9360 61/8, Multani Dhanda, Pahar Ganj Delhi, Delhi, 110055 01123550815"
6 | "113, 113,ragiptblr-2, Ragi Pet Bangalore, Karnataka,  2229218"
7 | "9360 61/8, Multani Dhanda, Pahar Ganj Delhi, Delhi,  01123550815"

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

To resolve your issue, I did the following (all of the code below is available on the fiddle here). The code makes use of the SUBSTRING() function (manual) and the tilde (~) regular expression (regex) matching operator (manual). This is an excellent site for regexes (quick start here).

CREATE TABLE test 
(
  s1_no INTEGER GENERATED ALWAYS AS identity,
  address TEXT NOT NULL
);

and then populate using your data (and some test data of my own). BTW, all of the PINs that were supposed to be 12 digits long were actually 11 digits in your data, so I’ve assumed that 11 is the correct number of digits – change as required:

INSERT INTO test (address) 
VALUES 

--
-- my own test cases
--

('"sdfg ghhjbghjj 123456 12345678901"'),
('"dsdcfg vbggff 12345 123456789"'),      --  <<---- should NOT be returned
                                          --  no 11 digit number
--
-- your data (with concatenated records corrected)
--

('"16/127, Off M G Road, Goregaon (west) Mumbai, Maharashtra, 400062 02228744298"'),
('"113, 113,ragiptblr-2, Ragi Pet Bangalore, Karnataka, 560002 2229218"'),
('"2, 2, Panchal Estate, Bapunagar Ahmedabad, Gujarat, 380024 07922740143"'),
('"2nd Flr, B Wing, Jay Chamber, Service Rd, Near Telephone Exchange, Vile Parle (east) Mumbai, Maharashtra, 400057 02226104168"'),
('"New#27, Gopalakrishna Street, T Nagar Chennai, Tamil Nadu, 600017 04428150211"'),
('"9360 61/8, Multani Dhanda, Pahar Ganj Delhi, Delhi, 110055 01123550815"'),
('"113, 113,ragiptblr-2, Ragi Pet Bangalore, Karnataka,  2229218"'),

--
-- This last record from your own data, returns NULL for sstr (see below)
-- because there is no 6 digit no. in the address.
--

('"9360 61/8, Multani Dhanda, Pahar Ganj Delhi, Delhi,  01123550815"');

Then use the query:

SELECT
  s1_no,
  TRIM(SUBSTRING(address FROM ' \d{6} ')) AS sstr
FROM test
WHERE address ~ ' \d{11}"'
ORDER BY s1_no;

Result:

s1_no   sstr
1   123456
3   400062
5   380024
6   400057
7   600017
8   110055
10  NULL    
    7 rows 

Records 2, 4 & 9 aren’t returned because there is no 11 digit PIN (couldn’t find out what this was for) and record 10 returns NULL as there is no 6 digit PIN. The TRIM() function (manual) removes the spaces returned by the regex.

Explanation of regexes:

The SUBSTRING() function extracts a string which matches the pattern.

' \d{6} '

  • Match a space,

  • Followed by 6 digits exactly. (\d) is a "shorthand character class" (see here) that represents a single digit.

  • The curly braces act as a repetition quantifier in regexes. The single 6 between them means match 6 occurrences exactly. One could put something like {3,6} which would match between 3 and 6 occurrences for example.

  • The final space means that these 6 digits must also be followed by a space in order for a match to occur.

The tilde (~) operator can be seen as a synonym for the English word "contains".

  • ~ ' \d{11}"' – does the expression contain a space, followed by exactly 11 digits, followed by a double quote (")?

You can eliminate the NULL by using (see fiddle):

SELECT
  s1_no,
  TRIM(SUBSTRING(address FROM ' \d{6} ')) AS sstr
FROM test
WHERE SUBSTRING(address FROM ' \d{6} ') IS NOT NULL 
AND address ~ ' \d{11}"';

Finally, your PIN will be returned as TEXT and not as an INTEGER. This is OK, since one does not add, subtract, multiply or divide using PINs.

However, should you wish to hash them as INTEGERs for partitioning purposes or somesuch. You can use the very useful [PG_TYPEOF()][10] function (manual) which returns the the type of any expression as follows (also in fiddle):

SELECT
  s1_no,
  TRIM(SUBSTRING(address FROM ' \d{6} ')) AS sstr,
  PG_TYPEOF(TRIM(SUBSTRING(address FROM ' \d{6} ')))
FROM test
WHERE address ~ ' \d{11}"'
ORDER BY s1_no;

Result (snipped for brevity):

s1_no   sstr    pg_typeof
1       123456       text
3       400062       text
...
...
...

For the purposes of hashing, you could use the non-standard PostgreSQL CAST() operator :: (manual) as follows (see fiddle):

TRIM(SUBSTRING(address FROM ' \d{6} '))::INT

This could also help if you wished to partition your PINs by range as opposed to list.

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply