Not the answer you need?
Register and ask your own question!

Controlling the character set of the binlog

mtalucmtaluc EntrantLegacy User Role Beginner
We came across an issue with Tungsten replicator where it was incorrectly interpreting multi-byte characters. After some deeper investigation, it looks like it's not an issue with Tungsten, but with the way data is being written to the binlogs by MySQL. The following illustrates the problem:

I created a table in mysql (Percona 5.6). The default character set of the server is latin1. It has row level logging.

create table mt_test(`machine_name` varchar(64) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL) default charset=utf8;

I did a setnames and ran an insert of a right-hand upper quote (unicode hex=E28099):

mysql> set names utf8;
mysql> insert into mt_test select machine_name from play_wide where id=13130;

Here we confirm the value of the quote:

mysql> select machine_name, hex(substring(machine_name,8,1)) from mt_test;
+
+
+
| machine_name | hex(substring(machine_name,8,1)) |
+
+
| Treessa’s MacBook Pro |E28099 |
+
+
+

For reference:

mysql> show variables like '%char%';
+
+
+
| Variable_name | Value |
+
+
+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+
+
+

But in the binlog I see the character misinterpreted as though it were latin1 where the unicode bytes get split into 3 separate characters:

/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
BEGIN
/*!*/;
# at 1010181
#170317 19:13:39 server id 18913 end_log_pos 1010239 CRC32 0x061bfc35 Table_map: `usage_data`.`mt_test` mapped to number 96
# at 1010239
#170317 19:13:39 server id 18913 end_log_pos 1010299 CRC32 0xe903ce6f Write_rows: table id 96 flags: STMT_END_F
### INSERT INTO `usage_data`.`mt_test`
### SET
### @1='Treessaâ<80><99>s MacBook Pro'

I've tried various permutations of configuration settings and table settings, but can't seem to make the log write the correct value. Note that the OS locale is UTF8. Any ideas of how this can be controlled?
Sign In or Register to comment.

MySQL, InnoDB, MariaDB and MongoDB are trademarks of their respective owners.
Copyright ©2005 - 2020 Percona LLC. All rights reserved.