Why do people sometimes use CHAR instead of INT for a column that holds only numbers, with no leading zeros and no values outside the range of INT?


As the title says, I noticed something strange while looking at other people’s projects: some tables declare columns that meet the conditions in the title as CHAR instead of a numeric type such as TINYINT, for example:

create table A(
  id int not null auto_increment primary key,
  a_seq char(9) comment 'The first digit is 1, 2 or 3 (national, private, foreign)',
  a_type char(1) comment '0 is normal, 1 is disabled',
  a_status char(1) comment '0 is visible, 1 is not'
);

For a_type and a_status, both CHAR(1) and TINYINT(1) take one byte, and comparing numeric characters may be as fast as comparing numbers, so the difference between the two is small.

But for a_seq, why not use INT? An INT takes only 4 bytes, while CHAR(9) takes 9. If a_seq has a UNIQUE index, doesn’t CHAR(9) take more space and make lookups slower?


I have also seen someone store the year (2020, 2021) in CHAR(4) instead of SMALLINT.


Can anyone explain, from experience, why people do this? I’m confused.


Method 1

Why do people sometimes use CHAR instead of INT?

Roughly half of database access libraries/frameworks pass parameter values as strings unconditionally. If the column is CHAR, the server then compares the column and the literal as strings, without any type conversion.

comparing numeric characters may be as fast as comparing numbers

I doubt it. If one of the values being compared is numeric and the other is a string, both values are converted to floating-point (double-precision) values before comparison. See “Type Conversion in Expression Evaluation” in the MySQL manual.
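To make the two cases concrete, here is a rough Python model (not MySQL’s actual implementation) of what the server does: a CHAR value compared with a string parameter stays a plain string comparison, while a CHAR value compared with a numeric literal is converted to double on both sides. The helper names `compare_mixed` and `compare_strings` are hypothetical, and MySQL’s real string-to-number conversion has more rules (for example, for trailing non-numeric characters) than Python’s `float()`.

```python
# Simplified model (an assumption, not MySQL source code) of comparing a
# string with an integer: both sides are converted to double precision.
def compare_mixed(column_value: str, literal: int) -> bool:
    """Hypothetical helper: equality test with a string and a number."""
    return float(column_value) == float(literal)


# When both sides are strings, they are compared as strings, no conversion.
def compare_strings(column_value: str, literal: str) -> bool:
    """Hypothetical helper: equality test with two strings."""
    return column_value == literal


# Numeric comparison: '007' and 7 both become 7.0, so they match.
print(compare_mixed("007", 7))
# String comparison: '007' and '7' are different strings, so they do not match.
print(compare_strings("007", "7"))
```

The same stored value can therefore match or not match depending on whether the application sends `7` or `'7'`, which is one reason the choice of column type and parameter type matters.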

If a_seq has a UNIQUE index, doesn’t CHAR(9) take more space and make lookups slower?

More disk space? Of course. Slower? It depends.

Method 2

INT takes 4 bytes. CHAR(9) takes at least 9 bytes. If you put a number in CHAR(9), it can go only up to 999,999,999 (just under 1 billion), while a signed INT goes up to 2,147,483,647.
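The size and range arithmetic can be checked with a short Python sketch. It assumes a single-byte character set (such as latin1 or plain ASCII digits) for the CHAR column, and uses a 4-byte little-endian signed integer to stand in for MySQL’s INT; MySQL’s on-disk row format adds its own overhead that this ignores.

```python
import struct

# One value stored two ways: as a 4-byte signed integer (like INT) and as
# one byte per digit (like CHAR(9) with a single-byte character set).
value = 123456789
as_int = struct.pack("<i", value)       # 4 bytes
as_char = str(value).encode("ascii")    # 9 bytes, one per digit

print(len(as_int), len(as_char))

# Largest value each column type can hold.
max_char9 = int("9" * 9)                # 999,999,999
max_signed_int = 2**31 - 1              # 2,147,483,647 (INT)
max_unsigned_int = 2**32 - 1            # 4,294,967,295 (INT UNSIGNED)

print(max_char9, max_signed_int, max_unsigned_int)
```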

More importantly, putting numbers in CHAR (or VARCHAR) makes them compare ‘incorrectly’ in inequality tests:

 2  <  10    -- numeric comparison
"2" > "10"   -- because "2" > "1"

Which do you need?
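The difference is easy to see in Python, whose string comparison is character by character. For strings made only of ASCII digits this should order values the same way a typical MySQL collation orders a CHAR column, but treat that as an assumption rather than a guarantee for every collation.

```python
# Strings compare character by character; numbers compare by value.
print("2" > "10")   # '2' > '1' at the first character
print(2 < 10)       # numeric comparison

codes_as_strings = ["2", "10", "1", "100"]
codes_as_numbers = [2, 10, 1, 100]

# Sorting shows the same effect as ORDER BY on a CHAR vs an INT column.
print(sorted(codes_as_strings))  # ['1', '10', '100', '2']
print(sorted(codes_as_numbers))  # [1, 2, 10, 100]
```

Range conditions such as `BETWEEN '2' AND '10'` suffer from the same ordering, so they can return surprising rows on a CHAR column.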

There is a data type called YEAR, which takes 1 byte; why not use that?

VARCHAR stores only the characters needed, plus a 1- or 2-byte length prefix; CHAR pads the value with spaces to the declared length. Don’t use CHAR unless you really need the padding.
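A rough Python model of the bytes used per value makes the padding trade-off visible. It assumes a single-byte character set (latin1) and a VARCHAR short enough to need only a 1-byte length prefix; the helper names are hypothetical, and InnoDB’s handling of CHAR with multi-byte character sets is more involved than this.

```python
# Rough per-value storage model, assuming latin1 (1 byte per character)
# and a VARCHAR(n) with n <= 255, so the length prefix is 1 byte.

def char_bytes(value: str, n: int) -> int:
    """Hypothetical helper: CHAR(n) pads the value with spaces to n characters."""
    return len(value.ljust(n).encode("latin-1"))


def varchar_bytes(value: str) -> int:
    """Hypothetical helper: VARCHAR stores a length byte plus the actual characters."""
    return 1 + len(value.encode("latin-1"))


for v in ["1", "12345", "123456789"]:
    print(v, char_bytes(v, 9), varchar_bytes(v))
# 1 9 2
# 12345 9 6
# 123456789 9 10
```

For values that always use the full length (like a fixed 9-digit code), CHAR avoids the length byte; for anything shorter, VARCHAR is smaller.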

There are similar arguments for storing dates and times in DATETIME instead of VARCHAR.

In general, use the appropriate datatype!

As for speed, … fetching the rows involved is far more costly than something as trivial as comparing one value with another. The time difference is probably so small that it is essentially impossible to measure.

The default collation for utf8mb4 in MySQL 8.0, utf8mb4_0900_ai_ci, is a complex UTF-8 collation. Even for a simple equality test, it must compare characters according to the collation’s rules rather than just matching bytes. A simple example is "B" = "b", which is true because of case folding. Add accented letters or non-spacing accents and it becomes much more complex.
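To show why a collation-aware comparison does more work than a byte comparison, here is a very rough Python approximation of an accent-insensitive, case-insensitive (“ai_ci”) equality test. It is only an illustration built on `unicodedata` decomposition and `str.casefold()`; MySQL’s real collations are based on the Unicode Collation Algorithm and will differ in many details.

```python
import unicodedata


def ai_ci_key(s: str) -> str:
    """Approximate accent- and case-insensitive key (illustration only)."""
    decomposed = unicodedata.normalize("NFD", s)   # 'é' -> 'e' + combining accent
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base.casefold()                         # 'B' -> 'b'


def bytes_equal(a: str, b: str) -> bool:
    """Plain byte comparison, like a binary collation."""
    return a.encode("utf-8") == b.encode("utf-8")


def ai_ci_equal(a: str, b: str) -> bool:
    """Equality after normalizing, stripping accents and case folding."""
    return ai_ci_key(a) == ai_ci_key(b)


print(bytes_equal("B", "b"), ai_ci_equal("B", "b"))
print(bytes_equal("résumé", "RESUME"), ai_ci_equal("résumé", "RESUME"))
```

An INT comparison needs none of this normalization, although, as noted above, the difference is usually dwarfed by the cost of fetching rows.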


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
