pt-online-schema-change on MySQL Aurora reports "Cannot connect to h=xx.xx.xx.xxx,p=...,u=myuser"

I have been using the pt-online-schema-change tool to convert the character set/collation of a large table from latin1 to utf8. I connect from my local laptop to Aurora over a VPN. It worked well in all our other environments. In prod, however, it intermittently started reporting "Cannot connect to h=xx.xx.xx.xxx, p=…, u=myuser" (the data is masked here, but that is all it reports). It does not abort; it keeps going. Google searches suggest this can happen when the tool cannot connect to replicas, although the messages themselves do not say so. When I looked up the reported host in the EC2 dashboard, it turned out to be a bastion host used by a third-party app (Fivetran).
The question is: why is the bastion host involved at all? One possibility is that pt-online-schema-change mistakes the bastion host for a replica and tries to connect to it to check replication lag. That doesn't make sense to me, since it really is not a replica. Any thoughts? Has anyone else encountered this or a similar issue?
Thank you!

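Hi, indeed, what you describe is possible. By default, pt-online-schema-change discovers replicas through SHOW PROCESSLIST (the default --recursion-method is processlist,hosts), so a client that reads the binary log, such as a CDC tool connecting through a bastion host, can be mistaken for a replica. For Aurora we recommend the dsn recursion method instead, so that only the replicas you list are checked for lag. Create a DSN table and insert the replicas you want monitored: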

CREATE DATABASE IF NOT EXISTS percona_schema;
USE percona_schema;

CREATE TABLE `percona_schema`.`dsns` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`parent_id` int(11) DEFAULT NULL,
`dsn` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
);

INSERT INTO percona_schema.dsns(dsn) values ('h=your_replica1....rds.amazonaws.com');
INSERT INTO percona_schema.dsns(dsn) values ('h=your_replica2....rds.amazonaws.com');
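If there are many readers, the INSERT statements can be generated from a list of endpoints instead of typed by hand. This is a minimal POSIX shell sketch, not part of Percona Toolkit; the function name dsn_inserts is hypothetical, and it only sets the h= part of each DSN, as in the statements above. Review its output before piping it into the mysql client connected to the primary.

```shell
#!/bin/sh
# Hypothetical helper (not a Percona Toolkit command): print one INSERT
# into percona_schema.dsns for each replica endpoint given as an argument.
# Only the h= (host) part of the DSN is set.
dsn_inserts() {
  for host in "$@"; do
    # Double any single quotes so the value is a valid SQL string literal.
    escaped=$(printf '%s' "$host" | sed "s/'/''/g")
    printf "INSERT INTO percona_schema.dsns(dsn) VALUES ('h=%s');\n" "$escaped"
  done
}
```

For example, dsn_inserts reader-1.example.com reader-2.example.com (hypothetical endpoint names) prints two INSERT statements, one per endpoint.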

Then run the tool against the primary, pointing --recursion-method at that table:

pt-online-schema-change \
--alter '...' \
h=auroraprimary.....rds.amazonaws.com,D=yourdb,t=your_table \
--recursion-method=dsn=D=percona_schema,t=dsns \
--no-check-replication-filters \
--user=percona \
--password="$pw" \
--execute
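To confirm that the bastion host is what the default processlist-based discovery picked up, you can look for binary-log dump threads on the primary. If Fivetran reads the binlog through that bastion, its connection would likely show up with a Command of "Binlog Dump" (or "Binlog Dump GTID") and the bastion's IP as the client host. A minimal sketch, assuming the mysql client's batch output (-B -N: tab-separated, no header, columns Id, User, Host, db, Command, ...); the function name find_binlog_clients is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read SHOW FULL PROCESSLIST output in mysql batch mode
# (-B -N: tab-separated, no header) on stdin and print the distinct client
# hosts whose Command (5th column) starts with "Binlog Dump".
find_binlog_clients() {
  awk -F '\t' '$5 ~ /^Binlog Dump/ {
    host = $3
    sub(/:[0-9]+$/, "", host)   # drop the trailing :port from "host:port"
    print host
  }' | sort -u
}
```

Usage: mysql -h auroraprimary.....rds.amazonaws.com -u percona -p -B -N -e 'SHOW FULL PROCESSLIST' | find_binlog_clients. The account needs the PROCESS privilege to see other users' threads.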

Hope that helps!