How do I parse addresses in PostgreSQL?

Let’s say, for instance, that I want to parse these addresses for the Chicken Ranch:

Chicken Ranch
10511 Homestead Rd
Pahrump, NV 89061

Chicken Ranch
1600 Pennsylvania Avenue
NW Washington, D.C. 20500

In both of these cases, I’d like to get rid of the street-type designation (Rd and Avenue). In the first case I’d like to get “Homestead”, and in the second “Pennsylvania”. Not every address has a designation like this, though.

How to solve :

Method 1

This is a question of address canonicalization and parsing. What you’re describing is normally handled with a gazetteer (a geographical rule set). There are two ways to do this properly:

  1. address_standardizer from the PostGIS project, which is certainly the better choice if you’re only working with United States addresses.
  2. pgsql-postal, which may be a better choice for international addresses.

I’ll show the address_standardizer version for this address:

Chicken Ranch
10511 Homestead Rd
Pahrump, NV 89061

The standardize_address function from address_standardizer returns a composite type, stdaddr. First we install the extension and its US data set:

CREATE EXTENSION address_standardizer;
CREATE EXTENSION address_standardizer_data_us;
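
Before going further, it can help to confirm that both extensions and the US lookup tables are actually in place. The check below is only a sketch; it assumes the data extension created its us_lex, us_gaz and us_rules tables in a schema on your search_path.

```sql
-- List the installed address_standardizer extensions and their versions.
SELECT extname, extversion
FROM pg_extension
WHERE extname LIKE 'address_standardizer%';

-- Confirm that the lexicon, gazetteer and rules tables are populated.
SELECT
    (SELECT count(*) FROM us_lex)   AS lex_rows,
    (SELECT count(*) FROM us_gaz)   AS gaz_rows,
    (SELECT count(*) FROM us_rules) AS rules_rows;
```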

Then we can use it like this:

SELECT * FROM standardize_address('us_lex',
   'us_gaz', 'us_rules', '10511 Homestead Rd, Pahrump, NV 89061');
 building | house_num | predir | qual | pretype |   name    | suftype | sufdir | ruralroute | extra |  city   | state  | country | postcode | box | unit 
----------+-----------+--------+------+---------+-----------+---------+--------+------------+-------+---------+--------+---------+----------+-----+------
          | 10511     |        |      |         | HOMESTEAD | ROAD    |        |            |       | PAHRUMP | NEVADA | USA     | 89061    |     | 
(1 row)

As you can see, ROAD is pulled out into suftype, which leaves just HOMESTEAD in name.
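
Since the question only asks for the street name, you can pick that one field out of the composite result instead of selecting every column. This is a minimal sketch: the alias street_name is only illustrative, and initcap() is applied on the assumption that you want “Homestead” rather than the upper-cased form the standardizer emits.

```sql
-- The parentheses around the function call are required
-- when selecting a single field from a composite value.
SELECT initcap(
    (standardize_address('us_lex', 'us_gaz', 'us_rules',
                         '10511 Homestead Rd, Pahrump, NV 89061')).name
) AS street_name;
```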

Likewise, for the second address:

SELECT * FROM standardize_address('us_lex',
   'us_gaz', 'us_rules', '1600 Pennsylvania Avenue, NW Washington, D.C. 20500');
 building | house_num | predir | qual | pretype |     name     | suftype |  sufdir   | ruralroute | extra | city | state | country | postcode  | box |     unit     
----------+-----------+--------+------+---------+--------------+---------+-----------+------------+-------+------+-------+---------+-----------+-----+--------------
          | 1600      |        |      |         | PENNSYLVANIA | AVENUE  | NORTHWEST |            |       |      |       | USA     | D C 20500 |     | # WASHINGTON
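
To run this over many rows rather than one literal, one option is a LATERAL join against the function. The sketch below uses an inline VALUES list (the CTE name addresses and the column raw are placeholders) so that it is self-contained; in practice you would replace it with your own table and address column. For addresses without a designation such as Rd or Avenue, name should still be filled in and suftype simply left empty.

```sql
WITH addresses(raw) AS (
    VALUES
        ('10511 Homestead Rd, Pahrump, NV 89061'),
        ('1600 Pennsylvania Avenue, NW Washington, D.C. 20500')
)
SELECT a.raw,
       s.house_num,
       s.name,      -- street name without the designation
       s.suftype    -- the designation that was split off, if any
FROM addresses AS a
CROSS JOIN LATERAL standardize_address('us_lex', 'us_gaz', 'us_rules', a.raw) AS s;
```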

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.