Converting large database with many tables from latin1 to utf8mb4

So this is not a question about "how do I convert a table from latin1 to utf8?" I totally know that and get it. The question I’m trying to ask is, "How can I make this conversion have as little pain as possible during the transition?" I know I need to convert the columns on each table, and then at some point change the PHP MySQL connection from latin1 to UTF8, and I could easily do all of that if my database was 1 GB, not 1 TB.

We are on MariaDB 10.3. There are about 600 tables in the database, all using the InnoDB storage engine. I would say that probably 50 of those are north of 1 GB, with about 20 in the tens or hundreds of GB. The problem with those ~20 tables is that they are the core of the application itself, and one of them (currently 66 GB) is where a LOT of the UTF8 problems occur.

So converting ~90% of the tables will involve basically no downtime, but that last 10% will be a doozy. Any advice on what steps I should take and in what order? Here’s what I’m generally thinking:

  1. Convert the 90% to utf8mb4
  2. Set the PHP MySQL connection charset from latin1 to utf8mb4
  3. Use a script I built to convert each column of the remaining tables from latin1 to binary, then from binary to utf8mb4. Set aside probably 3-4 hours of downtime? Our app is very busy, and 3-4 hours of downtime is a lot.
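
The latin1 to binary to utf8mb4 step in item 3 could be generated per column along these lines. This is only a sketch, not the asker's script: the function name, the type mapping, and the table and column names are hypothetical. The two-step route is appropriate only when the latin1 columns actually hold UTF-8 bytes. MODIFY also replaces the whole column definition, so defaults and comments would have to be restated in a real script.

```python
import re

# Hypothetical mapping from text types to their binary counterparts.
# CHAR is left out on purpose: BINARY pads values, so it needs separate care.
_BINARY_TYPE = {
    "varchar": "varbinary",
    "tinytext": "tinyblob",
    "text": "blob",
    "mediumtext": "mediumblob",
    "longtext": "longblob",
}


def two_step_alter(table, column, column_type, nullable=True):
    """Return the two ALTER statements for one latin1 column.

    column_type is the declared type, e.g. "varchar(255)" or "text".
    Step 1 turns the column into raw bytes; step 2 relabels those bytes
    as utf8mb4 without transcoding them.
    """
    match = re.fullmatch(r"\s*([a-z]+)\s*(\(\d+\))?\s*", column_type.lower())
    if match is None or match.group(1) not in _BINARY_TYPE:
        raise ValueError(f"not a supported text type: {column_type!r}")
    base, length = match.group(1), match.group(2) or ""
    null_sql = "NULL" if nullable else "NOT NULL"
    return [
        f"ALTER TABLE `{table}` MODIFY `{column}` "
        f"{_BINARY_TYPE[base]}{length} {null_sql}",
        f"ALTER TABLE `{table}` MODIFY `{column}` {base}{length} "
        f"CHARACTER SET utf8mb4 {null_sql}",
    ]


if __name__ == "__main__":
    for statement in two_step_alter("posts", "body", "text"):
        print(statement + ";")
```

Generating the statements rather than executing them makes it easy to review the full list, or to feed the same column definitions to another tool.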

Has anybody tried Percona’s pt-online-schema-change with success and do you think it’d be helpful in this case?

The only other thing I can think of is to get a new slave up and running that is a fresh copy of the master, make all the utf8mb4 changes on that slave, and then promote it to master. I guess I could also convert all the slaves beforehand, rotating them in and out of service while I do that. The only unknown is what would happen with a latin1 master while the slaves are all utf8mb4. All the converted data will be fine, but I assume new data arriving through the binlog may be latin1 and not charset agnostic?

How to solve :

Method 1

At my last job we used pt-online-schema-change for such changes, or any other ALTER TABLE changes, hundreds of times per week on tables much larger than yours. I worked on an internal service and dashboard to allow developers to run schema changes on their own. I know — that’s madness!

For such large tables, you have to be careful about restarts. If the database has a failover event or if the host where pt-online-schema-change runs has a restart, then you have to start over. We actually developed patches for pt-online-schema-change to save its state, so we could resume where it left off if the script was interrupted. Unfortunately those patches are not public and I’ve left that job.

At least run pt-online-schema-change in a screen or tmux session, so you don’t have to depend on an uninterrupted ssh session.

How much time does it take for a very large table? It varies, because pt-online-schema-change monitors a couple of performance indicators, and it slows itself down dynamically if it thinks the table copying workload is causing a performance drop. So if your database is normally serving high traffic levels, pt-online-schema-change would take more time than it would if the database were idle. Therefore it’s worthwhile to schedule your schema changes during off-hours if possible.
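
The self-throttling idea can be pictured with a small loop. This is an illustration of the general technique, not pt-online-schema-change's actual code: the function, its parameters, and the threshold value are hypothetical, and the load reading stands in for whatever indicator is being monitored.

```python
import time


def copy_in_chunks(chunks, copy_chunk, current_load, max_load=25,
                   pause_seconds=1.0, sleep=time.sleep):
    """Copy each chunk, pausing while the load reading is above max_load.

    chunks:       iterable of work units (e.g. primary-key ranges)
    copy_chunk:   callable that copies one chunk to the new table
    current_load: callable returning the current load indicator
    Returns the number of pauses taken, which is why a busy server
    stretches the total run time.
    """
    pauses = 0
    for chunk in chunks:
        # Wait until the server has headroom before copying more rows.
        while current_load() > max_load:
            sleep(pause_seconds)
            pauses += 1
        copy_chunk(chunk)
    return pauses
```

On an idle server the inner loop never runs and the copy proceeds at full speed; under heavy traffic most of the elapsed time is spent waiting.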

Large tables may take more than 24 hours to complete a schema change. I think the longest I saw was 4 weeks. That was probably a single table over 1 TB, on a very busy database server. It was unfortunate, because I recall in that case the developers thought they could drop an index. Once they dropped it, it turned out they really did need that index after all for certain queries. But it took 4 weeks to do the ALTER TABLE to recreate the dropped index. Because we used pt-online-schema-change, the table could still be queried during those 4 weeks, but the performance of certain queries was bad without the needed index. That was painful.

I kept telling the developers that allowing tables to grow so large is asking for trouble, for reasons like that. But they didn’t listen.

Another caveat of pt-online-schema-change is that, since it must create triggers at the start and do a rename at the end, it needs exclusive access to the table briefly at both points. This means it waits for an exclusive metadata lock if there are any transactions outstanding against the table. So if you have long-running queries, or even short queries that leave their transaction uncommitted, they will block the startup or the rename at the end. And while pt-online-schema-change is blocked waiting for that metadata lock, every new query against the table queues up behind it. This can cause a serious problem.

So we found a way to invoke pt-online-schema-change with a 2-second timeout on the metadata locking. If it can’t get its job done in 2 seconds, it stops waiting, and must try again. This prevents long logjams like I described. Sometimes it means pt-online-schema-change must retry many times to start or to finish. But that’s better than an outage.
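
One way to get a short metadata-lock timeout with retries is sketched below. The answer does not say how its authors did it, so this is an assumption: it uses the tool's `--set-vars` option to lower `lock_wait_timeout` and its `--tries` option to retry trigger creation and the table swap. The function name, retry counts, and database and table names are hypothetical.

```python
def build_pt_osc_command(database, table, alter, lock_wait_seconds=2,
                         tries=20, execute=False):
    """Build an argv list for pt-online-schema-change.

    alter is the clause that would follow ALTER TABLE, supplied by the caller.
    The command defaults to --dry-run so nothing changes until execute=True.
    """
    # Retry the two steps that need the exclusive metadata lock,
    # waiting 1 second between attempts (operation:tries:wait).
    retry_spec = ",".join(
        f"{operation}:{tries}:1"
        for operation in ("create_triggers", "swap_tables")
    )
    return [
        "pt-online-schema-change",
        "--alter", alter,
        "--set-vars", f"lock_wait_timeout={lock_wait_seconds}",
        "--tries", retry_spec,
        "--execute" if execute else "--dry-run",
        f"D={database},t={table}",
    ]


if __name__ == "__main__":
    command = build_pt_osc_command(
        "app", "messages", "MODIFY body TEXT CHARACTER SET utf8mb4"
    )
    print(" ".join(command))
```

Building the argument list separately from running it makes it easy to log the exact command and to start it inside a screen or tmux session.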

Ideally, you would not have such long-running transactions, but that’s up to your application code. It might be difficult to know if you have such cases, or which code is responsible for them.

The final caveat I can think of right now: if you join on string columns anywhere, changing the character set (and hence the collation) of one table means that joins which depended on indexes before may no longer be able to use them, because the two sides of the join no longer have compatible collations. Such joins may perform much worse until you alter the joined table to be compatible. This has nothing to do with pt-online-schema-change; it applies to any method you use to change character sets.
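
To spot joins at risk during a staged conversion, one could compare the character set and collation on both sides of each string join. The sketch below assumes you can list the joins your application performs; that list, the function name, and the sample table names are hypothetical. The query reads standard `information_schema.COLUMNS` fields.

```python
# Lists every string column in one schema with its charset and collation.
# Non-string columns have a NULL CHARACTER_SET_NAME and are skipped.
STRING_COLUMNS_SQL = """
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = %s AND CHARACTER_SET_NAME IS NOT NULL
"""


def mismatched_joins(columns, join_pairs):
    """Return the join pairs whose two sides differ in charset or collation.

    columns:    rows of (table, column, charset, collation), for example
                the result of running STRING_COLUMNS_SQL.
    join_pairs: list of ((table, column), (table, column)) describing
                the string joins the application performs (hypothetical input).
    """
    settings = {(t, c): (cs, coll) for t, c, cs, coll in columns}
    return [
        (left, right)
        for left, right in join_pairs
        if settings[left] != settings[right]
    ]
```

Running this before each batch of conversions shows which tables should be converted together so that joined columns never sit on different collations for long.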

I hope you are upgrading to utf8mb4, not just utf8. utf8mb4 is becoming the preferred character set, and utf8 (the 3-byte type) is being deprecated.
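
The practical difference is easy to see by counting bytes. In this small sketch (the function name is hypothetical), any character above U+FFFF takes 4 bytes in UTF-8 and therefore cannot be stored in the 3-byte utf8 type.

```python
def needs_utf8mb4(text):
    """Return True if text has a character outside the Basic Multilingual Plane.

    Such characters (most emoji, for example) encode to 4 bytes in UTF-8.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


if __name__ == "__main__":
    samples = ["caf\u00e9", "\u20ac", "\U0001F600"]
    for sample in samples:
        encoded = sample.encode("utf-8")
        longest = max(len(ch.encode("utf-8")) for ch in sample)
        print(repr(sample), len(encoded), "bytes, longest character:", longest,
              "needs utf8mb4:", needs_utf8mb4(sample))
```

A check like this can be run over existing application data to find out whether any values already require utf8mb4.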

I’m not sure about the replication issue you mentioned. I suggest you test it, not with your production database, but in a test environment. My suspicion is that statement-based replication would work, but I’m not sure that row-based replication would.

All methods were sourced from, or are licensed under, CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.