MySQL What charset/collation for Case insensitivity and Accent sensitivity?


I am looking for a charset/collation such that when I run

SELECT * FROM table_name WHERE username = "Warrior"

it returns only the rows where username is "Warrior", "warrior" or "WARRIOR", and not "WÂRRÎOR", "Wârrîor", etc.

I found a partial solution: changing the charset to "utf8mb4" and the collation to "utf8mb4_bin". It is now accent sensitive, so it differentiates "Wârrîor" from "Warrior", but it is also case sensitive, so "Warrior" is different from "WARRIOR", which is not what I want.

I tried several different collations but couldn’t find one that does exactly what I want. Any ideas?

Below is a screenshot of the different collations available to me in the "utf8mb4" charset:


How to solve :


Method 1

Note that _bin is both case sensitive and accent sensitive, so in your case you shouldn’t use utf8mb4_bin.

You could use:

utf8mb4_0900_as_ci

as means accent sensitive, and ci means case insensitive.

Demo:

CREATE TABLE t (
  s1 VARCHAR(15) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_ci
);

INSERT INTO t VALUES
  ('WÂRRÎOR'),
  ('Warrior'),
  ('warrior'),
  ('WARRIOR'),
  ('Wârrîor');

SELECT *
FROM t
WHERE s1 = 'Warrior';

Result:

s1
Warrior
warrior
WARRIOR
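
To apply the same idea to the table from the question, here is a minimal sketch. It assumes the column is a VARCHAR(15) named username in table_name (the type and length are a guess, since the question does not give them) and that the server is MySQL 8.0 or later, where utf8mb4_0900_as_ci is available.

```sql
-- Option A: change the column's collation permanently.
-- VARCHAR(15) is an assumed definition; substitute your real type and length.
-- MODIFY replaces the whole column definition, so restate NOT NULL, DEFAULT,
-- and similar attributes if your column has them.
ALTER TABLE table_name
  MODIFY username VARCHAR(15)
  CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_ci;

-- Option B: leave the column as it is and override the collation for one query.
-- Assumes the column already uses the utf8mb4 character set.
SELECT *
FROM table_name
WHERE username = 'Warrior' COLLATE utf8mb4_0900_as_ci;
```

Option A is probably the better fit when every lookup on the column should behave this way. Option B is handy for a one-off query, but overriding the collation in the WHERE clause may prevent MySQL from using an index on username.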

Method 2

  • _bin — accent sensitive and case sensitive
  • _as_ci (MySQL 8.0 only) — accent sensitive and case insensitive
  • _ci — accent insensitive and case insensitive

This lets you see what will compare equal and what won’t:

(Caveat: Those were taken from specific versions; the set of available collations changes between versions, but the behavior of an existing collation does not.)
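
Since the set of available collations depends on the server version, it may be worth checking what your own server offers before picking one. A small sketch, assuming you only care about utf8mb4 collations whose names end in as_ci:

```sql
-- List utf8mb4 collations that are accent sensitive and case insensitive.
-- An empty result suggests the server predates the _as_ci collations.
SHOW COLLATION LIKE 'utf8mb4%as_ci';
```

Dropping the as_ci part of the pattern (that is, using 'utf8mb4%') lists every utf8mb4 collation on the server.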

Most, maybe not all, _ci and _ai_ci collations will treat "Wârrîor" = "Warrior"

All _ci or _ai_ci collations will treat "WARRIOR" = "Warrior"
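
You can also test a specific collation directly, without creating a table. The sketch below assumes the connection character set is utf8mb4 (hence the SET NAMES line); the column aliases are made up for illustration.

```sql
-- Make sure string literals are sent and interpreted as utf8mb4.
SET NAMES utf8mb4;

-- Each comparison yields 1 (equal) or 0 (not equal) under the named collation.
SELECT
  'Warrior' = 'WARRIOR' COLLATE utf8mb4_0900_as_ci AS differs_by_case,
  'Warrior' = 'Wârrîor' COLLATE utf8mb4_0900_as_ci AS differs_by_accent;
```

With utf8mb4_0900_as_ci the first column should be 1 and the second 0, which matches the demo in Method 1. Swapping in another collation name shows how that collation treats the same pair of strings.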


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
