This is a rather technical question to satisfy my curiosity.
I know that both SQL Server and Oracle acquire latches on in-memory data structures (such as pages) to serialize access and writes to the physical representation of a page, preventing lost updates and corruption. These latches cause contention in SQL Server, at least under high-concurrency workloads. For instance, the last page in a clustered index is constantly latched by one thread, causing other threads to block and wait until the exclusive latch is released. This waiting causes performance issues. (More in the Microsoft whitepaper: Diagnose & resolve latch contention - SQL Server | Microsoft Docs.)
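If anyone wants to observe this on their own instance, the documented sys.dm_os_wait_stats DMV exposes the accumulated page-latch waits; a quick sketch:

```sql
-- Accumulated page-latch waits since the last restart.
-- PAGELATCH_% waits are on in-memory pages; PAGEIOLATCH_% waits
-- (pages being read from disk) are a different thing, excluded here.
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       max_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'PAGELATCH_%'
ORDER BY wait_time_ms DESC;
```

A hot last page on a clustered index typically shows up as a large PAGELATCH_EX figure here.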
My question is: how do MySQL and Postgres protect their in-memory data structures? I couldn’t find anything online about latches in Postgres or MySQL. Do they use single-threaded access to update memory? (That sure sounds simpler.)
I’ll speak to MySQL here. Firstly, I know absolutely zero about SQL Server/Oracle, so I’m going 100% off what you typed. For background, you can go read up on InnoDB’s internals, like the REDO log and the doublewrite buffer. Also, I’m going to speak for the InnoDB engine within MySQL (MySQL supports multiple storage engines, and each can implement locking/flushing/etc. in its own way).

Page writes to disk are done asynchronously, outside the context of a transaction. When you COMMIT a change, only the data that is changing (not the entire 16K page) is written to the REDO log and synchronously flushed to disk as part of the transaction. This is what gives MySQL/InnoDB ACID compliance and functions as the crash-recovery mechanism.

The page you modified (row data) is in memory (the InnoDB buffer pool) and is now dirty and needs to be flushed to disk. This happens asynchronously and periodically within InnoDB via its flush list. Nothing prevents another transaction from updating that same row again, or another row in this page; there is no lock on the page preventing this. If another transaction modifies this page, a copy is created for this transaction. Once committed, this page too will be flushed to disk.
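Two things worth looking at here (stock MySQL names, nothing exotic): innodb_flush_log_at_trx_commit controls the synchronous redo flush at COMMIT described above, and INFORMATION_SCHEMA shows how many dirty pages are waiting for the asynchronous flush. A sketch:

```sql
-- 1 (the default) = flush the redo log to disk on every COMMIT,
-- which is the synchronous write described above.
SHOW GLOBAL VARIABLES LIKE 'innodb_flush_log_at_trx_commit';

-- Dirty pages sitting in the buffer pool, awaiting the async flush.
SELECT POOL_ID,
       POOL_SIZE,
       MODIFIED_DATABASE_PAGES AS dirty_pages
FROM information_schema.INNODB_BUFFER_POOL_STATS;
```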
“If another transaction modifies this page, a copy is created for this transaction”
This did the trick for me, I appreciate it. So MySQL (InnoDB) copies the page in memory before modifying it, creating a new dirty page that gets added to the flush list.
Btw, I added a link to the Microsoft whitepaper on latches. I’m still interested to know why they went the route of latching the page in memory instead of creating a copy of it and modifying that. If I had to guess, it might be to minimize the number of disk I/O writes for the same page when it is dirtied multiple times. E.g., transactions 1, 2, and 3 update the same row, which results in three dirty copies of the same page added to the flush list, translating to three asynchronous I/O writes. Versus writing all those changes to a single page in memory, suffering the latching contention, but writing the page to disk only once. Appreciate any correction here, thanks again!
It’s Matthew. I think that because disk I/O in InnoDB is asynchronous, writing three times gets spread out and the cost gets amortized over time. I’ll have to verify the make-a-copy-of-the-page thing. There are only a small number of cases where InnoDB aggressively flushes, such as when the percentage of dirty pages in the buffer pool exceeds 90%. Flushing a dirty page to disk is not necessary for ACID compliance.
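For reference, that threshold is configurable; on stock MySQL these are the relevant variables (defaults differ by version, e.g. 75 in 5.7 vs 90 in 8.0):

```sql
-- Above innodb_max_dirty_pages_pct, InnoDB flushes aggressively.
-- The _lwm variant is the low-water mark where pre-flushing begins.
SHOW GLOBAL VARIABLES LIKE 'innodb_max_dirty_pages_pct%';
```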
Reading another article on MySQL, it looks like updating rows in the same page might update the same dirty page. That is why having a UUID as the primary key in MySQL degrades write performance, based on this article: UUIDs are Popular, but Bad for Performance — Let’s Discuss - Percona Database Performance Blog. Here is a snippet of the article that indicates that dirty pages can receive multiple writes.
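One way to see a single dirty page absorbing multiple writes is INFORMATION_SCHEMA.INNODB_BUFFER_POOL_PAGE: a page whose OLDEST_MODIFICATION LSN is lower than its NEWEST_MODIFICATION LSN has been modified more than once since it was last clean. A sketch (the table-name filter is a placeholder, and this view is expensive to query on a big buffer pool):

```sql
-- Dirty pages for one table ('t1' is a placeholder name).
-- OLDEST_MODIFICATION < NEWEST_MODIFICATION means the page took
-- several writes while sitting dirty in the buffer pool.
SELECT PAGE_NUMBER,
       OLDEST_MODIFICATION,
       NEWEST_MODIFICATION
FROM information_schema.INNODB_BUFFER_POOL_PAGE
WHERE TABLE_NAME LIKE '%t1%'       -- placeholder filter
  AND OLDEST_MODIFICATION > 0;     -- 0 = page is clean
```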
It looks like MySQL uses a buffer pool mutex to make sure only a single thread has access to a resource at a time. That looks equivalent to SQL Server and Oracle latches. You can even use instrumentation to measure wait time on those in-memory structures, such as dirty-page writes in the buffer pool.
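That instrumentation lives in the Performance Schema. Assuming the wait/synch instruments are enabled in performance_schema.setup_instruments, something like this shows where threads wait on InnoDB’s internal mutexes and rw-locks:

```sql
-- Aggregated waits on InnoDB synchronization objects since startup.
-- Timer values are in picoseconds.
SELECT EVENT_NAME,
       COUNT_STAR,
       SUM_TIMER_WAIT
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE EVENT_NAME LIKE 'wait/synch/%innodb%'
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;
```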
This makes sense to me now: allowing multiple threads to write to the same dirty page in memory would cause lost updates, and using a mutex protects against that. It is a double-edged sword though, as it can cause waits if that dirty page gets a lot of writes in memory. An example is a clustered primary key on a serial integer in a table with few columns, where a lot of rows can fit in a single page.
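As a concrete illustration of that hot-page scenario (a hypothetical schema, the names are made up): with a narrow table and a monotonically increasing key, every concurrent INSERT lands on the same rightmost leaf page of the clustered index.

```sql
-- Hypothetical narrow table: with InnoDB's 16K pages and rows this
-- small, hundreds of rows fit per page, so concurrent inserts from
-- many threads all contend on the same rightmost leaf page.
CREATE TABLE events (
    id   BIGINT AUTO_INCREMENT PRIMARY KEY,  -- serial key: always appends
    flag TINYINT NOT NULL
) ENGINE=InnoDB;

INSERT INTO events (flag) VALUES (1);  -- every session hits the same page
```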
Thanks! I am interested to see how Postgres handles this now.