Controlling the character set of the binlog

We came across an issue with Tungsten replicator where it was incorrectly interpreting multi-byte characters. After some deeper investigation, it looks like it’s not an issue with Tungsten, but with the way data is being written to the binlogs by MySQL. The following illustrates the problem:

I created a table in mysql (Percona 5.6). The default character set of the server is latin1. It has row level logging.

create table mt_test(machine_name varchar(64) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL) default charset=utf8;

I did a setnames and ran an insert of a right-hand upper quote (unicode hex=E28099):

mysql> set names utf8;
mysql> insert into mt_test select machine_name from play_wide where id=13130;

Here we confirm the value of the quote:

mysql> select machine_name, hex(substring(machine_name,8,1)) from mt_test;
| machine_name | hex(substring(machine_name,8,1)) |
| Treessa’s MacBook Pro |E28099 |
±---------------------------------+ ----------------------------------+

For reference:

mysql> show variables like ‘%char%’;
| Variable_name | Value |
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |

But in the binlog I see the character misinterpreted as though it were latin1 where the unicode bytes get split into 3 separate characters:

/!\C utf8 //!/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/!/;

at 1010181

#170317 19:13:39 server id 18913 end_log_pos 1010239 CRC32 0x061bfc35 Table_map: usage_data.mt_test mapped to number 96

at 1010239

#170317 19:13:39 server id 18913 end_log_pos 1010299 CRC32 0xe903ce6f Write_rows: table id 96 flags: STMT_END_F

INSERT INTO usage_data.mt_test


@1=‘Treessaâ<80><99>s MacBook Pro’

I’ve tried various permutations of configuration settings and table settings, but can’t seem to make the log write the correct value. Note that the OS locale is UTF8. Any ideas of how this can be controlled?