I’ve hit a bit of a wall in benchmarking, and it’s not making much sense to me, so I’m hoping that someone here can help me.
My company is currently evaluating moving its primary databases into a full-disk encryption environment (likely AES-256 LUKS), and I’m trying to benchmark FDE vs. non-FDE to see what kind of performance hit we’d be looking at. I was expecting some manner of hit, but what I’ve actually found is a hit so large as to be absolutely absurd, and I don’t know how to explain it away.
I’m using Percona Server 5.5.27-rel28.0-291.squeeze on Debian Linux 2.6.32-5-amd64, on two identical hosts (somewhat beefy Dell servers: dual Xeon E5620, 24GB RAM, SAS RAID-1 system block device, and SAS RAID-10 data block device). The systems are configured entirely the same, with the sole difference being the use of FDE on the block devices:
On the encrypted server:
/dev/sda holds the primary root partition (RAID-1), the MySQL binary log files (in /var/lib/mysqllogs), etc. This is all within an EXT3 filesystem on top of LVM which is in turn within an FDE block device.
/dev/sdc is the primary data block device (mounted at /var/lib/mysql). It is likewise formatted with an EXT3 filesystem on top of LVM, which is in turn within an FDE block device.
On the non-encrypted server:
/dev/sda holds the primary root partition (RAID-1), the MySQL binary log files (in /var/lib/mysqllogs), etc. This is all within an EXT3 filesystem on top of LVM (no FDE).
/dev/sdc is the primary data block device (mounted at /var/lib/mysql). It is also formatted with an EXT3 filesystem on top of LVM (no FDE).
As you can see, the only configuration difference is that the encrypted server has an FDE block device between LVM and the RAID device (the hardware RAID device on these machines is a virtual drive provided by the PERC6/i), whereas the unencrypted server has LVM sitting directly on the RAID device.
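For concreteness, the encrypted stack was built roughly along these lines (an illustrative sketch only: the cipher spec, device names, and VG/LV names here are placeholders, not the exact commands I ran):

```shell
# Sketch of the FDE stack on the encrypted server (names are placeholders).
# Note: on a 2.6.32-era kernel the default LUKS cipher may differ;
# --cipher/--key-size shown here are one way to get AES-256.
cryptsetup luksFormat --cipher aes-xts-plain64 --key-size 512 /dev/sdc
cryptsetup luksOpen /dev/sdc data_crypt

# LVM sits on top of the opened LUKS device instead of the raw RAID device:
pvcreate /dev/mapper/data_crypt
vgcreate vgdata /dev/mapper/data_crypt
lvcreate -n mysql -l 100%FREE vgdata
```

On the unencrypted server, pvcreate runs directly against /dev/sdc and the rest of the stack is identical.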
MySQL’s configuration is the same across the hosts, the filesystem creation and mount options, etc. Everything is the same:
RAID Strip Size: default as per the PERC6/i, 64kB
Filesystem creation options: -t ext3 -b 4096 -E stride=16 -E stripe-width=32 -j -O large_file
Filesystem mount options: defaults,relatime,data=journal,barrier=0
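Spelled out as full commands, that setup looks roughly like this (illustrative; the LV path is a placeholder):

```shell
# Filesystem creation with the options above (stride/stripe-width chosen
# to match the 64kB PERC6/i strip size on 4kB blocks):
mkfs -t ext3 -b 4096 -E stride=16 -E stripe-width=32 -j -O large_file /dev/vgdata/mysql

# Mount with full data journaling and write barriers disabled:
mount -o defaults,relatime,data=journal,barrier=0 /dev/vgdata/mysql /var/lib/mysql
```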
To benchmark the systems, I loaded a copy of production-like data, and then used mysqlslap to hit each server, with fairly low concurrency (30) and 5 iterations (though in further benchmarking I’m lowering to 3 iterations to save time).
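The benchmark invocation is along these lines (illustrative; the query file, schema name, and host are placeholders for my actual production-like workload):

```shell
# 30 concurrent clients, 5 iterations of the query set against each host:
mysqlslap --host=foo \
  --concurrency=30 \
  --iterations=5 \
  --create-schema=bench_db \
  --query=/path/to/benchmark_queries.sql
```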
Note that due to EXT3 being mounted as data=journal, MySQL’s O_DIRECT flush method isn’t available and both hosts are using flush method O_DSYNC.
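For clarity, the relevant my.cnf settings are identical on both hosts and look roughly like this (a sketch; everything else omitted):

```ini
[mysqld]
# O_DIRECT is unavailable with data=journal, so both hosts use O_DSYNC:
innodb_flush_method = O_DSYNC
# Initial benchmark runs; later retested with a value of 2 (see below):
innodb_flush_log_at_trx_commit = 1
```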
Now, what I am seeing is that:
doing the initial data load on each system (zcat .sql.gz | time mysql --host=foo) took 1h45m on the unencrypted server and 2h35m on the encrypted server.
the average for the queries that I’m throwing at the unencrypted server is 2,494 seconds per iteration (so 2,494 × 5 = 12,470 seconds for the run of 5 iterations, or ~3.5 hours).
I aborted the benchmark run on the encrypted server before it completed, but after it had been running for 7h29m48s (!!!)
On the encrypted server, in SHOW PROCESSLIST, I can see a lot (nearly all) of write queries that are in state “query end”.
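(I’m eyeballing the thread states with something like the following; the exact counts vary from moment to moment:)

```sql
-- Summarize thread states instead of scrolling SHOW PROCESSLIST output;
-- on the encrypted server, "query end" dominates.
SELECT state, COUNT(*) AS threads
FROM information_schema.PROCESSLIST
GROUP BY state
ORDER BY threads DESC;
```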
The above was a run with innodb_flush_log_at_trx_commit = 1.
After some research, I am rerunning the benchmark with innodb_flush_log_at_trx_commit = 2 in the hopes that it will help; but:
While the unencrypted server shows the load of being benchmarked (i.e. 50–60% CPU use, high amounts of data read/written from disk, CPU I/O wait of ~2%), the encrypted server shows almost no load (~5–10% userspace CPU usage, ~5% CPU I/O wait, and very slow reading/writing from the disk). This was the case with innodb_flush_log_at_trx_commit = 1 as well (and I think innodb_flush_log_at_trx_commit = 2 has helped increase/even out the load on the encrypted database server), but it still isn’t anywhere near equivalent; just looking at the load numbers, I can tell that the encrypted database server won’t finish this benchmark in anything close to an adequate time.

Just for reference, the encrypted database server still shows many (… all?) writes getting stalled in state “query end” for a significant amount of time.
There is clearly a massive bottleneck between MySQL and the disk on the encrypted system, but I don’t know where it is (and, more importantly, if it can be worked around). Any help that can be given would be greatly appreciated!