MySQL 8 trailing spaces being evaluated in equals comparison


I am getting different behavior after upgrading a 5.6 database to 8.0.

On 5.6, if I run the following statement I get zero rows, which is what I expect to get:

SELECT * FROM EntityCustomerContact where CustomerContactID <> TRIM(CustomerContactID);

On 8.0 I get hundreds of rows returned. CustomerContactID is a varchar(32).

My understanding is that all CHAR, VARCHAR, and TEXT values in MySQL are compared without regard to any trailing spaces.

The upgrade process consisted of importing a mysqldump from 5.6 into 8.0. The only other thing I changed, besides moving to 8.0, was updating the collation on all tables and columns from utf8mb4_unicode_ci to utf8mb4_0900_ai_ci. This is the first upgrade I’ve done that moves to the utf8mb4_0900_ai_ci collation.

I have performed a number of other upgrades from 5.6/5.7 to 8.0 and never encountered this issue. I ran my test query on some other 8.0 databases (specifically 8.0.29) that were upgraded the same way, except for the collation change, and received the expected results.

I have searched for possible mysqld settings to address this but have come up short.

How to solve:


Method 1

MySQL collations have a pad attribute, whose value is either PAD SPACE or NO PAD.

Reference (https://dev.mysql.com/doc/refman/8.0/en/char.html):

For nonbinary strings (CHAR, VARCHAR, and TEXT values), the string
collation pad attribute determines treatment in comparisons of
trailing spaces at the end of strings. NO PAD collations treat
trailing spaces as significant in comparisons, like any other
character. PAD SPACE collations treat trailing spaces as insignificant
in comparisons; strings are compared without regard to trailing spaces.
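To make the two rules concrete, here is a minimal Python sketch of how the pad attribute affects equality. It is a simplified model, not MySQL's implementation. It compares code points exactly, with no case or accent folding. It also relies on the fact that, for equality (not ordering), padding the shorter string with spaces is the same as ignoring trailing spaces on both sides. The function name pad_equal is hypothetical.

```python
def pad_equal(a: str, b: str, pad_attribute: str) -> bool:
    """Simplified model of string equality under a collation's pad attribute.

    Assumption: case/accent folding is ignored; only the handling of
    trailing spaces (U+0020) is modelled.
    """
    if pad_attribute == "PAD SPACE":
        # Trailing spaces are insignificant: for equality, padding the shorter
        # string with spaces is equivalent to stripping trailing spaces.
        return a.rstrip(" ") == b.rstrip(" ")
    if pad_attribute == "NO PAD":
        # Trailing spaces are compared like any other character.
        return a == b
    raise ValueError(f"unknown pad attribute: {pad_attribute!r}")


if __name__ == "__main__":
    # Mirrors SELECT 'a' = 'a ' under each kind of collation.
    print(pad_equal("a", "a ", "PAD SPACE"))  # True, e.g. utf8mb4_unicode_ci
    print(pad_equal("a", "a ", "NO PAD"))     # False, e.g. utf8mb4_0900_ai_ci
```

Note that a leading space is significant under both attributes, so pad_equal(" a", "a", "PAD SPACE") is still False.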

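You changed from utf8mb4_unicode_ci to utf8mb4_0900_ai_ci. The following query shows how their PAD_ATTRIBUTE values differ. It lists utf8mb4_0900_as_ci, but every 0900 collation, including utf8mb4_0900_ai_ci, is NO PAD: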

mysql> SELECT COLLATION_NAME, 
              PAD_ATTRIBUTE        
       FROM INFORMATION_SCHEMA.COLLATIONS        
       WHERE CHARACTER_SET_NAME = 'utf8mb4' 
       AND COLLATION_NAME in ('utf8mb4_unicode_ci','utf8mb4_0900_as_ci');

+--------------------+---------------+
| COLLATION_NAME     | PAD_ATTRIBUTE |
+--------------------+---------------+
| utf8mb4_0900_as_ci | NO PAD        |
| utf8mb4_unicode_ci | PAD SPACE     |
+--------------------+---------------+
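The hypothetical Python sketch below shows why the question's query behaves differently after the collation change. It models CustomerContactID <> TRIM(CustomerContactID) under each pad attribute. MySQL's TRIM() with no options removes leading and trailing spaces. The sample IDs are invented for illustration, and case/accent folding is ignored.

```python
def mysql_trim(s: str) -> str:
    """Model of MySQL TRIM(str) with no options: strips leading and trailing spaces."""
    return s.strip(" ")


def pad_equal(a: str, b: str, pad_attribute: str) -> bool:
    """Equality under a pad attribute (simplified: no case/accent folding)."""
    if pad_attribute == "PAD SPACE":
        return a.rstrip(" ") == b.rstrip(" ")  # trailing spaces ignored
    return a == b  # NO PAD: trailing spaces are significant


def rows_differing_from_trim(values, pad_attribute):
    """Model of: SELECT ... WHERE CustomerContactID <> TRIM(CustomerContactID)."""
    return [v for v in values if not pad_equal(v, mysql_trim(v), pad_attribute)]


# Hypothetical data: two values with trailing spaces, one with a leading space.
sample_ids = ["C001", "C002 ", "C003  ", " C004"]

print(rows_differing_from_trim(sample_ids, "PAD SPACE"))  # [' C004']
print(rows_differing_from_trim(sample_ids, "NO PAD"))     # ['C002 ', 'C003  ', ' C004']
```

Under PAD SPACE, only a leading space can make a value differ from its TRIM(). That would explain zero rows on 5.6 if no values start with a space. Under NO PAD, every value with a trailing space is returned as well, which is consistent with the hundreds of rows seen on 8.0.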

Method 2

I’ve been wondering about this. Perhaps this explains it (from the changelog for 8.0.1):

2017-04-10 8.0.1 Development Milestone, Character Set Support:

The pad attribute for Unicode 9.0.0 collations was changed from PAD
SPACE to NO PAD. Consequently, these collations now treat spaces at
the end of strings like any other character. The affected collations
have names that contain the string 0900.

Comparisons of VARCHAR columns that
have a 9.0.0 collation differ from other collations with respect to
trailing spaces. For example, 'a' and 'a ' compare as different
strings, not the same string. Example:

mysql> SET NAMES 'latin1' COLLATE 'latin1_swedish_ci';

mysql> SELECT 'a' = 'a ';
+------------+
| 'a' = 'a ' |
+------------+
|          1 |
+------------+
mysql> SET NAMES 'utf8mb4' COLLATE 'utf8mb4_0900_ai_ci';
mysql> SELECT 'a' = 'a ';
+------------+
| 'a' = 'a ' |
+------------+
|          0 |
+------------+

The INFORMATION_SCHEMA COLLATIONS table now has a
PAD_ATTRIBUTE column that indicates the pad attribute for each
collation.

A problem with the latin1_de collation involving early weight string
truncation has been corrected. The only likely effect is for
WEIGHT_STRING() function results.

I think the bottom line is to avoid CHAR and stick with VARCHAR. I don’t think the collation's other rules (case and accent sensitivity) have much, if anything, to do with the issue; rather, it is the collation's pad attribute ("padding").
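If the goal is to find the values that actually carry extra spaces, comparing lengths sidesteps the pad attribute. CHAR_LENGTH() counts the trailing spaces stored in a VARCHAR no matter which collation is in effect. In SQL that could be a condition such as CHAR_LENGTH(CustomerContactID) <> CHAR_LENGTH(TRIM(CustomerContactID)). The Python below is only a minimal model of that check, run on hypothetical values. The helper name has_surrounding_spaces is made up for illustration.

```python
def has_surrounding_spaces(value: str) -> bool:
    """Model of CHAR_LENGTH(col) <> CHAR_LENGTH(TRIM(col)).

    Lengths do not depend on the pad attribute, so this flags the same values
    whether the column uses a PAD SPACE or a NO PAD collation.
    """
    # TRIM() with no options strips leading and trailing spaces only.
    return len(value) != len(value.strip(" "))


sample_ids = ["C001", "C002 ", " C004"]  # hypothetical data
print([v for v in sample_ids if has_surrounding_spaces(v)])  # ['C002 ', ' C004']
```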


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
