Why does casting this string as a decimal fail?


Why does casting this result of REGEXP_SUBSTR() to a DECIMAL fail?

SELECT
    REGEXP_SUBSTR('Cost (-$14.18)', '(?<=Cost [(]-[$])[0-9.]+') AS _extracted,
    CAST(REGEXP_SUBSTR('Cost (-$14.18)', '(?<=Cost [(]-[$])[0-9.]+') AS DECIMAL(8,2)) AS cost_1,
    CAST((SELECT _extracted) AS DECIMAL(8,2)) AS cost_2,
    CAST((SELECT _extracted) * 1 AS DECIMAL(8,2)) AS cost_3,
    CAST('14.18' AS DECIMAL(8,2)) AS cost_4;
+------------+--------+--------+--------+--------+
| _extracted | cost_1 | cost_2 | cost_3 | cost_4 |
+------------+--------+--------+--------+--------+
| 14.18      |  14.00 |  14.00 |  14.18 |  14.18 |
+------------+--------+--------+--------+--------+

Casting a plain string, as in cost_4, seems to work. Multiplying the REGEXP_SUBSTR() result by 1 also appears to work. But simply casting the result, as I’ve done with cost_1 and cost_2, fails to produce the correct fixed-point version of _extracted.

Oddly, in my application, using the backreference as I’ve done for cost_2 actually produces the correct result. I was unable to reproduce this elsewhere, but thought it worth mentioning.

How to solve:


Method 1

This has been a long-standing issue in MySQL, and people have been reporting it as a bug since 2011. I have found that the problem depends almost entirely on the collation used by the REGEXP_SUBSTR() function.

For instance, if you cast the result of REGEXP_SUBSTR() as a CHAR(100), your decimals remain intact:

mysql> SELECT CAST(CAST(REGEXP_SUBSTR('Cost (-$14.18)', '[0-9.]+') AS CHAR(100)) AS DECIMAL(8,2)) AS result;

result
----- 
14.18

Before MySQL 8.0.17, the result returned by REGEXP_SUBSTR() used the UTF-16 character set. Later versions are documented to use the character set and collation of the expression being searched (see bug #94203, reported by Rick James), but this does not appear to be accurate in practice. My SQL client is configured to use UTF-8 everywhere, yet running your initial query in my client produces exactly the same results as you shared in the question.
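To see which character set and collation your own server reports for the extracted value, a quick diagnostic along these lines may help. It is only a sketch: the column aliases are illustrative, and the values you get back will depend on your MySQL version and connection settings.

```sql
SELECT
    VERSION() AS server_version,
    -- character set and collation of the value REGEXP_SUBSTR() returns
    CHARSET(REGEXP_SUBSTR('Cost (-$14.18)', '[0-9.]+')) AS result_charset,
    COLLATION(REGEXP_SUBSTR('Cost (-$14.18)', '[0-9.]+')) AS result_collation,
    -- character set the connection uses for string literals
    @@character_set_connection AS connection_charset;
```

If result_charset differs from connection_charset, that mismatch is a likely candidate for the truncated cast.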

However, if I CONVERT( ... USING 'UTF8'):

SELECT CAST(CONVERT(REGEXP_SUBSTR('Cost (-$14.18)', '[0-9.]+') USING 'UTF8') AS DECIMAL(8,2)) AS result;

result
----- 
14.18

Surprise, surprise. A correct number.
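In recent MySQL versions, utf8 is a deprecated alias for utf8mb3, so you may prefer to name the character set explicitly. The variant below is a sketch that assumes converting to utf8mb4 has the same effect as the UTF8 conversion above; I have not checked it against every 8.0 release.

```sql
-- Same workaround, with the target character set spelled out as utf8mb4
SELECT CAST(
           CONVERT(REGEXP_SUBSTR('Cost (-$14.18)', '[0-9.]+') USING utf8mb4)
           AS DECIMAL(8,2)
       ) AS result;
```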

Generally in this situation I do the very same thing that you did for cost_3; I multiply the returned value by 1, then cast it to the desired type. You can save a step by casting as FLOAT, but this sometimes has precision implications.
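Written out as a standalone query, the multiply-by-1 workaround looks roughly like this. It mirrors the cost_3 expression from the question, with the simpler pattern used in the examples above.

```sql
-- Multiplying by 1 forces a numeric conversion before the cast,
-- which sidesteps the string-to-DECIMAL cast that loses the fraction
SELECT CAST(
           REGEXP_SUBSTR('Cost (-$14.18)', '[0-9.]+') * 1
           AS DECIMAL(8,2)
       ) AS result;
```

Because the intermediate value is a floating-point number, the final cast to DECIMAL(8,2) is what gives you a fixed-point result with two decimal places.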

It is not a great answer, but it is one that can be used across multiple versions of MySQL.

Method 2

Don't use CAST. Instead, use:

FORMAT(expression, 2)  -- for displaying with 2 decimal places

ROUND(expression, 2)   -- for further computation
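As a rough illustration of the difference, the sketch below applies both functions to the value from the question. It assumes the REGEXP_SUBSTR() result is implicitly converted to a number the same way it is when multiplied by 1; that has not been verified on every MySQL version, and the aliases are illustrative.

```sql
SELECT
    -- FORMAT() returns a string, so use it only for display
    FORMAT(REGEXP_SUBSTR('Cost (-$14.18)', '[0-9.]+'), 2) AS cost_display,
    -- ROUND() returns a number that can be used in further arithmetic
    ROUND(REGEXP_SUBSTR('Cost (-$14.18)', '[0-9.]+'), 2) AS cost_numeric;
```

Keep in mind that FORMAT() also inserts thousands separators (for example, FORMAT(1234.5, 2) returns '1,234.50'), so its output should not be fed back into arithmetic.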


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
