Searching against a large list of people at the same time in Postgres

All we need is an easy explanation of the problem, so here it is.

I have a list of people that varies between 2-5k entries that has this general format:

firstname1 lastname1, firstname2 lastname2, firstname3 lastname3 ...

I have a user_table which contains about 250-300k entries with a firstname and lastname column. What I’m trying to do is let users input that list of names into a field, and return all users in the database with matching names. My first instinct was to create a view called name_search that looks like this ->

create view name_search as (
  select
    id,
    to_tsvector(concat(firstname, ' ', lastname)) as search_field
  from user_table
)

Then I can search against it by turning all the names in the comma separated list into a string like this

const queryString = "(firstname1 <-> lastname1) | (firstname2 <-> lastname2) | (firstname3 <-> lastname3) ..."

then I can pass that string into this query

select
  *
from name_search 
where search_field @@ to_tsquery(queryString)

I’m leaving out some of the details like escaping the string etc etc. but my issue is that this is horribly slow. I was wondering if someone could help me create a solution to this that is faster.

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

I don’t see a case for full text search here.
You can provide your input string as is to this query:

SELECT t.id, firstname, lastname
FROM   user_table t
JOIN  (
   SELECT split_part(fullname, ' ', 1) AS firstname
        , split_part(fullname, ' ', 2) AS lastname
   FROM   unnest(string_to_array('firstname1 lastname1, firstname2 lastname2, firstname3 lastname3', ', ')) fullname
   ) input USING (firstname, lastname);

An index on (firstname, lastname) should help performance while the input list isn’t too long. For retrieving more than a couple percent of all rows, a sequential scan is faster anyway.

Related:

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply