Very undesirable side effects in case of key vault plugin startup failure

Hello,

I am using the latest Percona:

Ver 14.14 Distrib 5.7.21-21, for debian-linux-gnu (x86_64) using 7.0

When there’s any issue with the key vault plugin (for example the key vault server is unreachable / down), the encrypted tables cause issues that I believe should not happen.

Accessing, querying, or deleting them causes “MySQL server has gone away” errors. The server should “degrade gracefully” without crashing processes. It should check for some “magic number” in the files and, if needed, just print a “cannot open encrypted table” message, not cause a hard crash.

Furthermore, I had to reinstall two servers from scratch because after the “gone away” errors, the server sometimes actually deleted / opened the tables but heavily corrupted “ibdata1”. From that moment on, I could not even drop the whole databases to restore them from backup (“server gone away” errors galore). I also tried recovering / replacing the now missing files, but even with various tricks (to realign the tablespace IDs) it did not work.

Here’s the relevant portion of the config file:

# Database encryption at rest section
early-plugin-load="keyring_vault=keyring_vault.so"
keyring_vault_config="/var/lib/mysql-keyring/keyring_vault.conf"
keyring-vault-timeout=30
innodb_encrypt_tables=ON
innodb_temp_tablespace_encrypt=ON
binlog_checksum=CRC32
master_verify_checksum=ON
encrypt_binlog=ON
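
Since everything in this section depends on the Vault server being reachable, a pre-flight check before starting mysqld may help. The following is only a minimal sketch in Python, not part of Percona Server. It assumes the file named in keyring_vault_config uses plain "key = value" lines with a vault_url entry and an optional vault_ca entry, and it probes Vault's /v1/sys/health HTTP endpoint. The function names are made up for illustration.

```python
import ssl
import urllib.error
import urllib.request


def parse_keyring_conf(text):
    """Parse 'key = value' lines (assumed format of keyring_vault.conf) into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comment lines, and anything that is not key = value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def vault_is_reachable(settings, timeout=5):
    """Return True only if Vault answers its health endpoint with HTTP 200."""
    url = settings["vault_url"].rstrip("/") + "/v1/sys/health"
    context = None
    if url.startswith("https://"):
        # Trust the CA file from the config if present (assumed key: vault_ca);
        # otherwise fall back to the system trust store.
        context = ssl.create_default_context(cafile=settings.get("vault_ca"))
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, TLS failure, or a non-2xx status
        # (e.g. sealed or standby Vault) all count as "not reachable" here.
        return False
```

A wrapper script could read /var/lib/mysql-keyring/keyring_vault.conf, pass its contents through parse_keyring_conf, and refuse to start or restart mysqld while vault_is_reachable returns False. This only reduces the chance of starting with a dead keyring; it does not change how the server behaves if Vault goes away later.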

Hi dfumagalli, thanks for this report, and I’m sorry to hear that you are having these issues.
Do you have anything in the logs that you can share with us, please? I am trying to track down whether this is a previously reported issue or whether we need to raise a new bug report in our Jira system.
Any additional details you might have would be appreciated.

Note: user has a second post: https://www.percona.com/forums/questions-discussions/mysql-and-percona-server/percona-server-5-7/51580-activating-encryption-on-one-table-instantly-encrypts-all-tables-in-all-databases

Hi dfumagalli.

Could you please see if there are any errors in the log file? You have binlog encryption turned on (encrypt_binlog=ON); however, keyring_vault is not functional (you should also see errors about that in the log file). I suspect you have the option binlog_error_action set to ABORT_SERVER. This means that when encryption of the binlog is not possible, the server will abort. Please check in the logs if this is the case.

Thanks,
Robert

Thank you Robert and Lorraine for your kind interest!

Sadly - as I stated in the original post - I had to reinstall the servers, so the logs were lost.

As for the keyring_vault being not functional, that’s exactly what made me learn about this issue.

Expected behavior: server prints a warning or error about it being unable to open the tables (happens). Then a “graceful behavior”, where if by chance some software uses those tables, it just gets returned an error. Maybe a nice: “cannot open encrypted tables without key” error.

Observed behavior: software or the console uses the tables (or deletes them, or drops the database) => the reported error is “table locked” or “table in use”. If you do it from the console, you can see a message about the backend “running away” and - insisting a bit - ibdata1 corruption.

I do still have the configuration file, though, and I did not see an ABORT_SERVER setting in it.

Server version:

Server version: 5.7.21-21 Percona Server (GPL), Release '21', Revision '2a37e4e'

and

mysql Ver 14.14 Distrib 5.7.21-21, for debian-linux-gnu (x86_64) using 7.0

Configuration file:


#
# The Percona Server 5.7 configuration file.
#
# One can use all long options that the program supports.
# Run program with --help to get a list of available options and with
# --print-defaults to see which it would actually understand and use.
#
# For explanations see
# http://dev.mysql.com/doc/mysql/en/server-system-variables.html

[mysqld]
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
basedir = /usr
datadir = /var/lib/mysql
tmpdir = /tmp
lc-messages-dir = /usr/share/mysql
explicit_defaults_for_timestamp

log-error = /var/log/mysql/error.log

# SSL certificates
ssl-ca=/var/lib/mysql/ca.pem
ssl-cert=/var/lib/mysql/server-cert.pem
ssl-key=/var/lib/mysql/server-key.pem
# wsrep_provider_options="socket.ssl=yes;socket.ssl_key=/etc/mysql/certs/server-key.pem;socket.ssl_cert=/etc/mysql/certs/server-cert.pem;socket.ssl_ca=/etc/mysql/certs/ca.pem"
bind-address=0.0.0.0
innodb_file_per_table=ON

# Recommended in standard MySQL setup
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_ALL_TABLES

# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

# Database encryption at rest section
early-plugin-load="keyring_vault=keyring_vault.so"
keyring_vault_config="/var/lib/mysql-keyring/keyring_vault.conf"
keyring-vault-timeout=30
innodb_encrypt_tables=ON
innodb_temp_tablespace_encrypt=ON
binlog_checksum=CRC32
master_verify_checksum=ON
encrypt_binlog=ON

[client]
ssl-ca=/var/lib/mysql/ca.pem
ssl-cert=/var/lib/mysql/server-cert.pem
ssl-key=/var/lib/mysql/server-key.pem

ABORT_SERVER is the default value, so even if it is not in the configuration file, it is in effect. Also, there should be plenty of errors in the log file related to the binary log.
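
To illustrate the point about defaults, here is a small Python sketch (not a Percona tool) that reads an option file and reports the value of an option, falling back to the default when the option is absent. The helper name effective_option is hypothetical. It only looks at one file and one section, so it ignores other option files and command-line arguments.

```python
import configparser

# Default of binlog_error_action in MySQL / Percona Server 5.7, as stated above.
DEFAULT_BINLOG_ERROR_ACTION = "ABORT_SERVER"


def effective_option(cnf_text, name, default, section="mysqld"):
    """Return the value of `name` in `section` of an option file, or `default`."""
    # allow_no_value lets flag-style lines such as
    # "explicit_defaults_for_timestamp" parse without an "=".
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(cnf_text)
    if not parser.has_section(section):
        return default
    # Option files accept dashes and underscores interchangeably in names.
    wanted = name.replace("-", "_").lower()
    for key, value in parser.items(section):
        if key.replace("-", "_").lower() == wanted and value is not None:
            return value.strip().strip('"')
    return default
```

Run against the configuration file posted above, effective_option(text, "binlog_error_action", DEFAULT_BINLOG_ERROR_ACTION) would fall through to the default, because the option never appears in [mysqld]. On a running server, SHOW VARIABLES LIKE 'binlog_error_action'; is the authoritative way to see the value actually in effect.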

Kind regards,
Robert

That’s fine. But why, with the default settings, is a server meant to crash, get corrupted, break ibdata1 and require a reinstall when it cannot connect to the key vault server? I mean, it’s not like connection errors are so rare. All it takes is for the key vault server to be rebooting / updating the vault daemon / taking a snapshot backup, or just turned off. Pretty common stuff.

If I ever told my customers their years of data risk getting lost because another (possibly a third party) server is off… they’d do very ugly things to me.

Hi dfumagalli,

Both encrypt-binlog and log-bin are static variables, i.e. they can only be set at server startup. When the binlog is opened, the server goes to the keyring (which goes to Vault), fetches the key, and encrypts the binlog. From then on, the binlog key is not re-fetched from Vault but is cached in the keyring. So the scenario you described should not be valid: if the binlog cannot be encrypted, Percona Server will abort during startup.
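
The fetch-once-then-cache behavior described above can be pictured with a toy model. This Python sketch is purely illustrative and is not Percona Server's implementation. The class and function names, the key id "binlog_key", and the fetch callback are all hypothetical.

```python
class VaultUnavailable(Exception):
    """Raised by the (hypothetical) fetch callback when Vault cannot be reached."""


class CachingKeyring:
    """Toy keyring: asks Vault for a key only the first time it is requested."""

    def __init__(self, fetch_from_vault):
        self._fetch = fetch_from_vault  # callable: key_id -> key bytes
        self._cache = {}

    def get_key(self, key_id):
        if key_id not in self._cache:
            # First request: this is the only point where Vault must be up.
            self._cache[key_id] = self._fetch(key_id)
        return self._cache[key_id]


def start_server(keyring, binlog_key_id="binlog_key"):
    """Model of startup with ABORT_SERVER: no binlog key means no server."""
    try:
        keyring.get_key(binlog_key_id)
    except VaultUnavailable:
        return "aborted at startup"
    return "running"
```

In this model, a Vault outage after a successful start_server call does not affect later get_key calls for the binlog key, because they are served from the cache. Only a start with a fresh, empty keyring while Vault is down ends in "aborted at startup".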

Having made clear that the scenario you described is not possible, I also want to make clear that re-installing the keyring while binlog encryption is ON is a potentially dangerous operation (given that the newly installed keyring is not operational). The server can then abort, because it will try to fetch the binlog key from the keyring, and since the keyring is fresh, it will try to fetch it from Vault. Since Vault is not accessible, PS will not be able to encrypt the binlog and will abort. We let the user decide whether to re-install the keyring, but we are planning on blocking this when binlog encryption is ON.

We believe it is better to stop the server from starting than to allow the binlog to be left silently unencrypted. Of course, this is the default behavior and can be altered by changing binlog_error_action.

Kind regards,
Robert