MariaDB apparent table locking

We recently upgraded to MariaDB 10.3. All of our tables use InnoDB.

Occasionally we have a process that may take a while inserting records into a table with about 200 million records. Here’s the export for it:

CREATE TABLE `assigned_dates` (
  `id` int(10) UNSIGNED NOT NULL,
  `uid` int(11) UNSIGNED NOT NULL,
  `cid` int(11) UNSIGNED NOT NULL,
  `assign_date` int(11) UNSIGNED NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

ALTER TABLE `assigned_dates`
  ADD PRIMARY KEY (`id`),
  ADD UNIQUE KEY `cid_uid` (`cid`,`uid`),
  ADD KEY `cid` (`cid`),
  ADD KEY `uid` (`uid`);

ALTER TABLE `assigned_dates`
  MODIFY `id` int(10) UNSIGNED NOT NULL AUTO_INCREMENT;

ALTER TABLE `assigned_dates`
  ADD CONSTRAINT `assigned_dates_ibfk_1` FOREIGN KEY (`uid`) REFERENCES `users` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
  ADD CONSTRAINT `assigned_dates_ibfk_2` FOREIGN KEY (`cid`) REFERENCES `content` (`id`) ON DELETE CASCADE ON UPDATE CASCADE;

We have a process that inserts records into this table using INSERT IGNORE, which may run batches trying to insert a few thousand records at a time, ignoring them if they already exist. While this process runs, I see other database processes that are paused, apparently waiting for this process to finish first. If I use SHOW PROCESSLIST, the status of all of the other queries that are waiting is “Updating.” This is the export for one table where the insert queries are typically waiting for the above process to finish:

CREATE TABLE `oauth_access_tokens` (
  `access_token` varchar(40) COLLATE utf8_unicode_ci NOT NULL,
  `client_id` varchar(80) COLLATE utf8_unicode_ci NOT NULL,
  `user_id` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
  `expires` timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(),
  `scope` varchar(2000) COLLATE utf8_unicode_ci DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

ALTER TABLE `oauth_access_tokens`
  ADD PRIMARY KEY (`access_token`),
  ADD KEY `user_id` (`user_id`);

ALTER TABLE `oauth_access_tokens`
  ADD CONSTRAINT `oauth_access_tokens_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `users` (`username`) ON DELETE CASCADE ON UPDATE CASCADE;

These tables really have nothing in common other than they both have a foreign key to the users table (but not even the same column).

How can I figure out why hundreds of other database processes are waiting for the INSERT IGNORE into assigned_dates to finish before they finish? This wasn’t a problem in MySQL 5.6, which we upgraded from, obviously a lot of changes have been made since then though.

Some more notes:

I’m only assuming that the INSERT IGNORE queries are causing the pause, because those are the queries that I see when I use SHOW PROCESSLIST. The process which ends with the INSERT IGNORE does run several other SELECT queries, but none of them are selecting, inserting, or updating the oauth_access_tokens table.

If I keep refreshing the process list, I see the INSERT IGNORE queries changing, which means that the process isn’t hung up on one user and one set of inserts, but continues to finish each user, get the next user to update, run all of the select queries to build the list of inserts, and then runs the INSERT IGNORE at the end of the process, all while the other processes trying to insert into other tables are paused in the “Updating” status.

This update process might run on a few thousand users at once. The process runs a lot of select queries before building and running the final INSERT UPDATE that I always actually see running. I usually don’t catch any of the SELECT queries in the process list, just the final insert.

I have a comment in the code saying that fetching the current list of CIDs for a UID, and only inserting the records that don’t currently exist, is slower overall than using INSERT IGNORE.

1 Like

Some more:

These queries are sent by PHP using the mysqli extension, and not using prepared statements.

The final INSERT IGNORE query itself is a single query with a list of values, e.g. INSERT IGNORE INTO assigned_dates (uid,cid,assign_date) VALUES (…), (…), (…)… The list of values could have more than 1,000 records per user. Each individual INSERT query is for a single user, users are processed one at a time, but the other waiting queries do not complete after one user finishes and the next starts. It might take 20 minutes or more for the entire batch of users to finish processing, depending on how many there are. All of the pending queries will finish after the entire list of users is finished processing.

1 Like

I’m only assuming that the other queries do not complete after a single user completes, because if I use SHOW PROCESSLIST then the age of the oldest waiting queries is pretty close to the age of the entire user update process.

I might be contradicting myself when I say I assume the INSERT IGNORE queries are causing the pause, but that the other queries don’t complete after a single user finishes. If they were waiting for only that one query, then they would finish between the time one user’s INSERT IGNORE finishes and the next one begins (it might take a full second or two between INSERT queries). It seems more like the entire user update process, which is a cron job that runs a lot of different queries, is pausing every other MariaDB process.

Obviously, I don’t know if any other queries do complete while this is going on, I wouldn’t see them in the process list unless I clicked at exactly the right time.

This is running on a CentOS 7.9 server with Apache 2.4.46, running WHM/cPanel 92.0.7. When we upgraded from MySQL 5.6 to MariaDB 10.3, the only database config change was to set the SQL mode.

Here’s our config file:

[mysqld]
log-error=/var/lib/mysql/server7.err
performance-schema=0
slow-query-log=1
long-query-time=1
slow-query-log-file="/var/log/mysql-slow-queries.log"
max_connections=2048

max_allowed_packet=1G

character-set-server=utf8
collation-server=utf8_general_ci
character-set-server=utf8
collation-server=utf8_general_ci
init-connect="SET NAMES utf8"
query_cache_type=1
query_cache_limit=32M
query_cache_size=512M

key_buffer_size=512M
sort_buffer_size=48M
read_buffer_size=48M
join_buffer_size=48M
read_rnd_buffer_size=48M
table_open_cache=8192
key-buffer-size=512M
tmp_table_size=512MB
myisam_sort_buffer_size=128M
bulk_insert_buffer_size=64M
myisam_max_sort_file_size=20G
myisam_repair_threads=2

tmpdir="/mysqltemp"

connect_timeout=10
wait_timeout=28800
interactive_timeout=1200

# Buffer pool size
innodb_buffer_pool_size=32G
innodb_flush_method=O_DIRECT
open_files_limit=40000
default-storage-engine=InnoDB
innodb_file_per_table=1

net_read_timeout = 120
net_write_timeout = 900

sql-mode="ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
1 Like

Hi!

Can you send us the output of SHOW PROCESSLIST and SHOW ENGINE INNODB STATUS with sensitive data masked? I want to see how blocked and blocking threads are related.

1 Like

I’m working on it, I think traffic on the server is too low right now maybe, I can’t provoke it to happen.

I did notice I was wrong about one thing, though. The time column in the processlist I thought was going to show the connection time for the entire PHP process that was doing the work, but when I run this on my own now I see the time column is almost always 0 or 1, so that’s the time for individual queries, not the entire connection. Before, when I was seeing the locking, the time on the top query was over 1100, so that was a single insert query that was not finishing, before I said that I saw the values in that query changing and that couldn’t have been the case or else the time would have reset.

The queries do actually finish though, but right now I can’t force the problem to happen. I’ll work with our client to see about doing updates when they know there’s going to be more traffic on the server and see if we can get it to happen again.

1 Like

innodb_status.txt (479.3 KB)
db_process.csv.txt (90.6 KB)

The process list should be opened as a CSV file.

There are a lot of queries to update launhcount in content, those are all part of a trigger.

1 Like

It’s happening again now, and I see another query trying to insert into content_session doing a rollback, like the spreadsheet that I posted also shows.

There are several hundred queries in the process list, and the progress column for all of them is 0.000.

1 Like

It looks like this might be a clue:

Process ID=35137, Main thread ID=139797217666816, state: enforcing dict cache limit

1 Like

I do not see in the data provided that queries on tables assigned_dates and oauth_access_tokens affect each other. But I see that queries on content_session are waiting for a lock.

I also noticed that both tables refer to table users as a parent. This can add to the issue.

You wrote you upgraded to MariaDB 10.3 Which version did you use before? Have you seen the same issue for the same workload? Asking, because issue by itself does not look MariaDB-specific.

1 Like

We upgraded from MySQL 5.6. We didn’t see this issue previously.

There are more tables at play, I think I was too specific with my initial description. This database has over 200 tables in it, but when I see the issue it’s because it’s trying to do a batch insert into assigned_dates, which has about 200 million records on it (and several indexes - the indexes alone are 12.5GB), and while it’s inserting those records I see hundreds of other processes fill up and wait for the other one to finish.

Those processes are from several other tables, but I think that all of them have a foreign key pointing to the users table (I saw a few queries into course_tracking_data, which does not have a key to users). The users table itself is sometimes one of the tables that’s waiting. I think that I frequently see oauth_access_tokens and users in the list because both of those are updated on login, and the content_session table is the one that people update the most frequently after they log in.

The process list I posted has a lot of queries waiting on the content table also. The content table has no link to the users table, but the content_session table, which does have a link to the users table, has a trigger which updates the content table.

The assignment_cache table is similar to assigned_dates, the same process inserts records into both of those tables. But assignment_cache only has about 2 million records, so I don’t see it hang around for very long.

I tried to double the RAM allocated to InnoDB buffer pool from 32GB to 64GB, and I did see the issue occur once after that, although it didn’t seem to last very long. I’m not sure if that had any effect.

I guess I’m confused why this happens at all. I understand that assigned_dates has a foreign key to users, is the reason why we’re seeing this because while it’s trying to insert a batch of records into assigned_dates, it needs to validate those user IDs with the users table, so it locks the users table until the insert finishes? But why would all of the other queries on other tables be locked also? We aren’t changing the users table, so why would an insert query on content_session, for example, need to wait for a query on assigned_dates to finish, even if users was locked for writing? Content_session has a key into users, but if users is locked for writing I don’t understand why that would lock content_session.

The insert query which inserts into assigned_dates doesn’t have a limit on its size, if there are 1,200 records to insert, it builds a query to insert all 1,200 records and then executes the entire thing. If it’s going to be difficult to prevent the lock at all, then maybe I can set a limit on that to insert in batches of, say, 50, so that we can be sure that it’s not going to take very long for that query to finish, which will let everything else finish, even if it’s about to start another insert query with the next 50 records.

1 Like

Would this be an ideal place for a prepared statement? I built one large insert query because I thought that was going to be faster than issuing many queries to insert single records (and I’m sure it is, in general). If I prepared the query once, and then just ran it once for each row to insert, might that end up being faster?

1 Like