I have a database on latin1 historically. All tables are on utf8 but for one table which is on latin1. I need to convert this to utf8. There are two approaches. As per MySQL doc, both are blockers of DML operation. The second approach can be inplace and faster but is a DML blocker. The first approach can work only with COPY and not INPLACE. These tests were done in Aurora MySQL 5.7.
- ALTER TABLE tbl_name CONVERT TO CHARACTER SET charset_name ;
- ALTER TABLE t1 CHARACTER SET = utf8;
I tried both methods on two different clones. While the convert is a long process (1.5 hours on a 35M rows table in RDS), it worked correctly when we updated an existing value to utf8. The one with the second approach, worked instantaneously (0.12 secs), understandably as it only updated the metadata, it didn’t seem to do the right thing. When I updated an existing value with utf8, it produced garbage. What’s missing? latin1 is a subset of utf8, so the second approach should have worked. With the table definition having utf8, the future values be correctly saved? Doesn’t seem to be so.
if first method is the only right approach, what’s the purpose of the second method? Is it to be used only on a newly created table?
if all characters are in latin1 (1 byte) which is subset of utf8 (3 bytes), what does the convert do? Even in utf8, it will still store it in 1 byte (being originally single byte) so what does convert physically do?