PostgreSQL to_tsvector is case-sensitive on macOS


On PostgreSQL 10.4 (Ubuntu package 10.4-2.pgdg16.04+1), the query

select to_tsvector('simple', 'БОЛЬШИЕ БУКВЫ');

returns 'большие':1 'буквы':2

But on PostgreSQL 10.3 (installed with Homebrew) on macOS High Sierra 10.13.3, it returns 'БОЛЬШИЕ':1 'БУКВЫ':2, so non-English letters are not lowercased.

How can I fix this on macOS?

How to solve :


Method 1

The ability to identify the case of characters depends on the LC_CTYPE of your database, which by default depends on the environment in which the PostgreSQL instance was created (with initdb).

For instance, on Ubuntu with PostgreSQL 11:

tstc=# show lc_ctype;
 lc_ctype 
----------
 C

tstc=# select to_tsvector('simple', 'БОЛЬШИЕ БУКВЫ');
      to_tsvector      
-----------------------
 'БОЛЬШИЕ':1 'БУКВЫ':2
(1 row)

That’s the result you got on your database on macOS.

But when I’m connected to a different database, this time with a UTF-8 locale:

postgres=# show lc_ctype;
  lc_ctype   
-------------
 fr_FR.UTF-8
(1 row)

postgres=# select to_tsvector('simple', 'БОЛЬШИЕ БУКВЫ');
      to_tsvector      
-----------------------
 'большие':1 'буквы':2
(1 row)

Now the letters are converted to lower case.
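
To see which locale each database on a server was created with, the pg_database catalog can be queried directly. This is a minimal sketch rather than part of the original answer; datctype holds the LC_CTYPE value and datcollate holds the LC_COLLATE value:

```sql
-- List every database with its encoding and locale settings.
-- With the default (libc) locale provider, a datctype of 'C' or 'POSIX'
-- typically means non-ASCII letters are not case-folded.
SELECT datname,
       pg_encoding_to_char(encoding) AS encoding,
       datcollate,
       datctype
FROM pg_database
ORDER BY datname;
```

Unlike show lc_ctype, which reports only the current database, this covers all databases in the cluster in one query.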

The fix is to create the database with the correct LC_CTYPE; it cannot be changed afterwards. By default, this setting is copied from template1, but it can be overridden by using template0 as the template instead, for instance:

CREATE DATABASE newDB lc_ctype='C.UTF-8' template template0;

The locale specified with lc_ctype must also be supported by the operating system. If the command above fails on macOS, list the available locales with locale -a (or an equivalent) and pick a UTF-8 one.
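
Putting these steps together, the fix might look like the sketch below. The database name newdb and the locale en_US.UTF-8 are assumptions chosen for illustration; substitute a UTF-8 locale that locale -a actually lists on your machine.

```sql
-- Assumption: 'en_US.UTF-8' appears in the output of `locale -a`.
-- template0 is required when the locale differs from the one in template1.
CREATE DATABASE newdb
    ENCODING 'UTF8'
    LC_CTYPE 'en_US.UTF-8'
    TEMPLATE template0;

-- After connecting to the new database (\c newdb in psql),
-- confirm the setting and re-run the original query.
SELECT datctype FROM pg_database WHERE datname = current_database();
SELECT to_tsvector('simple', 'БОЛЬШИЕ БУКВЫ');
```

Because LC_CTYPE cannot be changed on an existing database, existing data would have to be moved into the new database, for example with pg_dump and a restore.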


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
