mysql tables crashing

If you want the SQL commands to fix your indexes, ask me.

[B]jcn50 wrote on Fri, 14 September 2007 10:28[/B]

UNIQUE(device_id, timestamp,data_type,object) means:

  • device_id contains no duplicates;
  • timestamp contains no duplicates;
  • data_type contains no duplicates;
  • object contains no duplicates.

UNIQUE(device_id, timestamp,data_type,object) doesn’t mean: “there’s only ONE row for any given combination of device_id AND timestamp AND data_type AND object”.

HMMM!!! Very very interesting…

Let me find some of my DELETE commands to see exactly how I’m doing it. But what you said is not how I intended my scripts to work; that would explain a lot…

I always thought that including 4 unique columns would mean that the combination needs to be unique, which is what I intended. So, to make it simpler, let’s play with 2 columns…

So I could insert a row with:
device_id = 10
timestamp = 12345677

But if I wanted to insert again:

device_id = 10
timestamp = 12345677

It shouldn’t let me do that… at least that is what I thought I was doing…

I will confirm my DELETEs, but usually the deletes are based on timestamp, not even the device_id…

So if I have 10 records with device_id 10,12,13,14,15 etc…

My cleanup script would just delete any record that has a timestamp older than say 24 hours…

So I didn’t think that I needed to be specific in that sense when deleting???

Thoughts?

OK, if you didn’t intend to do what I wrote… it simply means that you have NO PRIMARY INDEX! :smiley:

I’m not saying that what you want to do is wrong, I’m just saying that the columns which are set to UNIQUE (or “PRIMARY INDEX”) are wrong/misused.

SQL commands are:
ALTER TABLE data_day DROP PRIMARY KEY
(should drop the primary key)

ALTER TABLE data_day DROP INDEX device_id
ALTER TABLE data_day DROP INDEX timestamp
ALTER TABLE data_day DROP INDEX data_type
ALTER TABLE data_day DROP INDEX object
(should remove the unique index)

ALTER TABLE data_day ADD INDEX device_id (device_id)
ALTER TABLE data_day ADD INDEX timestamp (timestamp)
ALTER TABLE data_day ADD INDEX data_type (data_type)
ALTER TABLE data_day ADD INDEX object (object)
(should add a simple index)

And you’re done!

Copy/paste your field types again; they should show 5 beautiful “MUL” and no more “PRI”.
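To see what this swap changes, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption: SQLite has no DESCRIBE/“MUL” output, so PRAGMA index_list is used to read each index's unique flag instead). The index names uniq_row and idx_<column> are hypothetical. It also makes the trade-off visible: once only plain indexes remain, an exact duplicate row is accepted.

```python
import sqlite3

COLUMNS = ("device_id", "timestamp", "data_type", "object")

def make_table():
    """Create an in-memory stand-in for data_day with a composite UNIQUE index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data_day (device_id INTEGER, timestamp INTEGER, "
        "data_type TEXT, object TEXT, actual INTEGER, gdata TEXT)"
    )
    # Hypothetical index name; it covers the 4-column combination.
    conn.execute(
        "CREATE UNIQUE INDEX uniq_row ON data_day "
        "(device_id, timestamp, data_type, object)"
    )
    return conn

def index_flags(conn):
    """Map each index on data_day to whether it is UNIQUE."""
    # PRAGMA index_list rows start with (seq, name, unique, ...).
    return {row[1]: bool(row[2]) for row in conn.execute("PRAGMA index_list(data_day)")}

def swap_to_plain_indexes(conn):
    """Drop the composite UNIQUE index and add one non-unique index per column."""
    conn.execute("DROP INDEX uniq_row")
    for col in COLUMNS:
        conn.execute(f"CREATE INDEX idx_{col} ON data_day ({col})")

conn = make_table()
print(index_flags(conn))  # {'uniq_row': True}
swap_to_plain_indexes(conn)
print(index_flags(conn))  # four idx_* entries, all False

# With no UNIQUE index left, the same row can be inserted twice.
row = (10, 12345677, "disk", "C", 1, "42")
conn.executemany("INSERT INTO data_day VALUES (?, ?, ?, ?, ?, ?)", [row, row])
print(conn.execute("SELECT COUNT(*) FROM data_day").fetchone()[0])  # 2
```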

To answer your question, yes, the data_day table should not have any column that is a primary key. The device_id is a primary key in another table and we use that in the data_day table for other purposes. Basically we dump a lot of data into the data_day table, and I guess the reason why we used UNIQUE(4 columns) is that I didn’t want a duplicate entry if an INSERT matched all 4 columns exactly…

Does that make any sense? So…

col1, col2, col3, col4 could have values 1,2,3,4 respectively, and if I wanted to INSERT those again, it should reject it…

But 1,2,2,3, 1,1,1,1 or any other combination would work…

I think we’re pretty much on the same page, just want to confirm…

You have suggested other things to try earlier in this thread
so I want to try those first…you’ve given me plenty of troubleshooting ideas!!!

[B]allworknoplay wrote on Fri, 14 September 2007 19:18[/B]
To answer your question, yes the data_day table should not have any column that is a primary key. The device_id is a primary key in another table and we use that in the data_day table for other purposes. Basically we dump a lot of data into the data_day table and I guess, the reason why we used Unique( 4 columns ) is that I didn't want any duplicate entry if any of the INSERTS all matched the 4 columns exactly..

Does that make any sense? So…

Ok I got it... but it doesn't work like that. (see below)
[B]allworknoplay wrote on Fri, 14 September 2007 19:18[/B]
col1, col2, col3, col4 could have values 1,2,3,4 respectively and if I wanted to INSERT those again, it should reject it..

But 1,2,2,3 or 1,1,1,1 any other combo would work…

No, it will not work:
  • if you already have 1,2,3,4, AND
  • you try to insert 1,1,1,1,
=> mySQL won't process the INSERT (because the first value is not distinct in the column), and mySQL will simply discard 1,1,1,1.
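This claim is easy to test directly. The sketch below uses Python's sqlite3 module instead of MySQL (an assumption made to keep the example self-contained; the table t and columns col1..col4 are hypothetical, mirroring the example above). In SQLite a multi-column UNIQUE constraint compares the whole tuple of values, and MySQL treats a multi-column UNIQUE index the same way, but re-running the equivalent INSERTs against the real data_day table is the safest check.

```python
import sqlite3

def try_inserts(rows):
    """Insert rows into a table with UNIQUE(col1..col4); return (accepted, rejected)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (col1 INTEGER, col2 INTEGER, col3 INTEGER, col4 INTEGER, "
        "UNIQUE (col1, col2, col3, col4))"
    )
    accepted, rejected = [], []
    for row in rows:
        try:
            conn.execute("INSERT INTO t VALUES (?, ?, ?, ?)", row)
            accepted.append(row)
        except sqlite3.IntegrityError:
            # Raised only when the full 4-value combination already exists.
            rejected.append(row)
    conn.close()
    return accepted, rejected

accepted, rejected = try_inserts([(1, 2, 3, 4), (1, 1, 1, 1), (1, 2, 2, 3), (1, 2, 3, 4)])
print(accepted)  # [(1, 2, 3, 4), (1, 1, 1, 1), (1, 2, 2, 3)]
print(rejected)  # [(1, 2, 3, 4)]
```

Here 1,1,1,1 is accepted even though 1 already appears in col1; only the exact repeat of 1,2,3,4 is rejected.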
[B]allworknoplay wrote on Fri, 14 September 2007 19:18[/B]

You have suggested other things to try earlier in this thread
so I want to try those first…you’ve given me plenty of troubleshooting ideas!!!

Debugging is part of programming :)... good luck!

Ok here is one of my codes…

INSERT INTO data_day (device_id,timestamp,data_type,object,actual,gdata) VALUES ('".$device_id."','".$timestamp."','disk','".$VLetter."','1','".$PercentUsedDisk."')

The first 4 obviously are the ones in question; the last 2 (actual, gdata) can be any value, and they don’t and shouldn’t have any effect on the INSERTs…
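For comparison, here is a sketch of the same INSERT using bound parameters instead of string concatenation, written in Python with sqlite3 as a stand-in for MySQL (assumptions: the function name insert_disk_reading and the sample values are hypothetical, and gdata is stored as text). Catching the constraint error makes it easy to count how many INSERTs the composite key actually rejects, and the final COUNT(*) is the same check as a SELECT on data_type = 'disk'.

```python
import sqlite3

def open_db():
    """In-memory stand-in for data_day with a composite UNIQUE key on 4 columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data_day (device_id INTEGER, timestamp INTEGER, data_type TEXT, "
        "object TEXT, actual INTEGER, gdata TEXT, "
        "UNIQUE (device_id, timestamp, data_type, object))"
    )
    return conn

def insert_disk_reading(conn, device_id, timestamp, v_letter, percent_used):
    """Insert one 'disk' row using placeholders; return False if it was a duplicate."""
    try:
        conn.execute(
            "INSERT INTO data_day (device_id, timestamp, data_type, object, actual, gdata) "
            "VALUES (?, ?, 'disk', ?, 1, ?)",
            (device_id, timestamp, v_letter, str(percent_used)),
        )
        return True
    except sqlite3.IntegrityError:
        return False

conn = open_db()
results = [
    insert_disk_reading(conn, 10, 12345677, "C", 55),
    insert_disk_reading(conn, 10, 12345677, "D", 20),  # different object: accepted
    insert_disk_reading(conn, 10, 12345677, "C", 56),  # same 4-column key: rejected
]
print(results)  # [True, True, False]
print(conn.execute("SELECT COUNT(*) FROM data_day WHERE data_type = 'disk'").fetchone()[0])  # 2
```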

If you don’t really need to have 4 separate columns, you could merge columns device_id,timestamp,data_type,object into a single UNIQUE column (let’s call it “merged”).
You will separate the values with a separator (let’s pick the semicolon).
So your “merged” column’s data could look like this:
1175214041;1189629601;disk;C
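A minimal sketch of building and splitting such a “merged” key, in Python (the function names merged_key and split_key are hypothetical). One detail worth guarding against: if a value could itself contain the separator, two different rows could produce the same merged key, so this sketch simply refuses such values.

```python
SEPARATOR = ";"

def merged_key(device_id, timestamp, data_type, obj):
    """Join the four key columns into one string for a single UNIQUE column."""
    parts = [str(device_id), str(timestamp), data_type, obj]
    # A separator inside a value would make two different rows collide.
    if any(SEPARATOR in p for p in parts):
        raise ValueError("key part contains the separator")
    return SEPARATOR.join(parts)

def split_key(key):
    """Recover (device_id, timestamp, data_type, object) from a merged key."""
    device_id, timestamp, data_type, obj = key.split(SEPARATOR)
    return int(device_id), int(timestamp), data_type, obj

print(merged_key(1175214041, 1189629601, "disk", "C"))  # 1175214041;1189629601;disk;C
```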

I can’t think of any other suggestion (at the mySQL level) for now.

This particular script doesn’t have any DELETES.

I have another script that loops through this table (2.5 million rows) and just deletes anything with a timestamp older than a certain period, like 24 hours…

So I don’t think I have any WHERE clauses in the DELETE statements, because I assumed I didn’t have to be specific; all I care about is deleting any record with a timestamp older than, say, 24 hours…

[B]jcn50 wrote on Fri, 14 September 2007 15:48[/B]
If you don't really need to have 4 separate columns, you could merge columns device_id,timestamp,data_type,object into a single UNIQUE column (let's call it "merged"). You will separate the values by a separator (let's pick up the semicolon). So your "merged" column's data could look like this: 1175214041;1189629601;disk;C

I have no other suggestion for now (at the mySQL level) I could think of (yet).

Well, I can’t merge them because I have other programs that query this table and look for specific columns, like data_type and object, to do something else with them.

That’s what makes this whole thing difficult: everything is in production, and one change to one column or table could affect other scripts that I have completely forgotten use it…

I think my first two steps at this point are to drop the index on the ID column and start using DELETE QUICK everywhere I can think of!!!

[B]allworknoplay wrote on Fri, 14 September 2007 19:45[/B]
Ok here is one of my codes...

INSERT INTO data_day (device_id,timestamp,data_type,object,actual,gdata) VALUES ('".$device_id."','".$timestamp."','disk','".$VLetter."','1','".$PercentUsedDisk."')

The first 4 obviously are the ones in question, and the last 2 (actual,gdata) could be of any value, that doesn’t and shouldn’t have any effect on the INSERTS…

I bet you won’t have a lot of completed INSERTs, because data_type is not UNIQUE on this line of the script!

How many results do you have for this query?
SELECT * FROM data_day WHERE data_type = 'disk';

[B]allworknoplay wrote on Fri, 14 September 2007 19:49[/B]
I have another script that loops through this table 2.5 million rows and just deletes anything with a timestamp older than a certain period like 24 hours...

So I don’t think I have any WHERE clauses in the DELETE statements because I assume I didn’t have to be specific, all I care is that if any record with a timestamp older than say 24 hours, then DELETE…

2.5 million rows?! Wow, DELETE QUICK is a must :D… Because after each DELETE, your indexes are rebuilt! That’s a waste of resources.
I hope your mySQL server automatically locks the table “data_day”, because if an INSERT happens while the INDEX is being rebuilt, you are likely to get a bunch of crashes!

Here is one of my DELETE scripts, short and simple. I guess all I have to do is add QUICK, right?

$timeframe_day = strtotime("-24 hours");
$query_day = "DELETE FROM data_day WHERE timestamp <= '".$timeframe_day."'";
$result_day = @mysql_query($query_day);
unset($query_day,$result_day);
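The same cleanup logic, sketched in Python with sqlite3 and a bound parameter instead of a quoted string (assumptions: SQLite stands in for MySQL, so there is no QUICK modifier here, and the function name purge_older_than_a_day is hypothetical). Like strtotime("-24 hours"), the cutoff is simply now minus 86,400 seconds; passing now explicitly makes the function easy to test.

```python
import sqlite3
import time

DAY_SECONDS = 24 * 60 * 60

def purge_older_than_a_day(conn, now=None):
    """Delete data_day rows whose Unix timestamp is <= now - 24h; return rows deleted."""
    if now is None:
        now = int(time.time())
    cutoff = now - DAY_SECONDS
    cur = conn.execute("DELETE FROM data_day WHERE timestamp <= ?", (cutoff,))
    conn.commit()
    return cur.rowcount

# Usage with a minimal hypothetical table and a fixed "now".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_day (device_id INTEGER, timestamp INTEGER)")
now = 1_189_800_000
conn.executemany(
    "INSERT INTO data_day VALUES (?, ?)",
    [(10, now - 2 * DAY_SECONDS), (10, now - DAY_SECONDS), (10, now - 60)],
)
print(purge_older_than_a_day(conn, now=now))  # 2
```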

So…

DELETE FROM data_day WHERE timestamp

Becomes…

DELETE QUICK FROM data_day WHERE timestamp

Also, do you think the UNSET variable is necessary?
I use it in my other program that has over 2000 lines of code. I just want to release any memory PHP is using and avoid any possible overlapping of variables…

[B]allworknoplay wrote on Fri, 14 September 2007 20:25[/B]
Also, do you think the UNSET variable is necessary? I use it in my other program that has over 2000 lines of code. I just want to release any memory PHP is using and avoid any possible overlapping of variables..
If your script does something else after this DELETE QUICK, yes, it could help. Otherwise, if it's only a cron job that runs once a day (a "short script" as you call it), the variables will be freed anyway when the script finishes. If you use $query_day and $result_day again later, it's a waste of time, because you free the memory allocated to them and then allocate it again. I guess UNSET is useful if your script runs for, let's say, a week...

Not to derail the topic, but how well do you know PHP?

[B]allworknoplay wrote on Fri, 14 September 2007 20:38[/B]
Not to derail the topic, but how well do you know PHP?

Well, I use it a lot, including building PHP executables for Windows. But I haven’t upgraded to PHP5 yet :(… Also, I have never used object-oriented programming (for me, code should be sequential).

OK, when I think about it, the crash may come from the “thread_concurrency” variable being set to >1.

  • Imagine that one of your scripts is running a lot of DELETE queries;
  • At the same time, another script is running a lot of INSERT queries.

A crash can occur when an INSERT query runs while the INDEXes are being rebuilt: the server will miss some inserted keys, ending up with more rows than keys.
mysql> check table data_day;
+--------------------+-------+----------+-------------------------------+
| Table              | Op    | Msg_type | Msg_text                      |
+--------------------+-------+----------+-------------------------------+
| collector.data_day | check | warning  | Table is marked as crashed    |
| collector.data_day | check | error    | Found 2434089 keys of 2434281 |
| collector.data_day | check | error    | Corrupt                       |
+--------------------+-------+----------+-------------------------------+

This is the only logical case I could think of! :o
As we said earlier, the advantage of using DELETE QUICK is that the INDEX is not rebuilt. Also, when you run OPTIMIZE TABLE, the entire table will be locked, as per mySQL documentation:
http://dev.mysql.com/doc/refman/5.0/en/optimize-table.html

So my #1 suggestion would be to set thread_concurrency to 1, even if you have 2 CPUs…
Also, previous mySQL versions (< v5) didn’t have this thread_concurrency variable!..

I have NO IDEA what this means…

DELETE QUICK is not useful when deleted values lead to underfilled index blocks spanning a range of index values for which new inserts occur again. In this case, use of QUICK can lead to wasted space in the index that remains unreclaimed. Here is an example of such a scenario:

[B]jcn50 wrote on Fri, 14 September 2007 17:04[/B]
OK, when I think about it, the crash may come from the "thread_concurrency" variable set to >1.
  • Imagine that one of your scripts is running a lot of DELETE queries;
  • At the same time, another script is running a lot of INSERT queries.

A crash can occur when an INSERT query runs while the INDEXes are being rebuilt: the server will miss some inserted keys, ending up with more rows than keys.
mysql> check table data_day;
+--------------------+-------+----------+-------------------------------+
| Table              | Op    | Msg_type | Msg_text                      |
+--------------------+-------+----------+-------------------------------+
| collector.data_day | check | warning  | Table is marked as crashed    |
| collector.data_day | check | error    | Found 2434089 keys of 2434281 |
| collector.data_day | check | error    | Corrupt                       |
+--------------------+-------+----------+-------------------------------+

This is the only logical case I could think of! :o
As we said earlier, the advantage of using DELETE QUICK is that the INDEX is not rebuilt. Also, when you run OPTIMIZE TABLE, the entire table will be locked, as per mySQL documentation:
http://dev.mysql.com/doc/refman/5.0/en/optimize-table.html

So my #1 suggestion will be to set thread_concurrency to 1, even if you have 2 CPUs…
Also, previous mySQL version (< v5) didn’t have this thread_concurrency variable!..

That sounds like the best answer/solutions so far.

This is what I will do, change the thread_concurrency to 1.
And use DELETE QUICK in all my programs, starting with all my garbage cleanup scripts…

Then we can see how stable this son of a gun can be!!!

I will do these first, and if the crash still happens, I will drop my INDEX on the ID column…

My MySQL version is 5.0.21.

[B]allworknoplay wrote on Fri, 14 September 2007 21:06[/B]
I have NO IDEA what this means...

DELETE QUICK is not useful when deleted values lead to underfilled index blocks spanning a range of index values for which new inserts occur again. In this case, use of QUICK can lead to wasted space in the index that remains unreclaimed. Here is an example of such a scenario:

I guess it has something to do with “contiguous space use”.
It’s the same idea as “disk optimization”.

INDEX:
0: data
1: data
2: data

DELETE QUICK 1;

[B]NEW INDEX:
0: data

2: data[/B]

=> result: no more “1: data” in the INDEX; the INDEX is not optimized and has a “hole” at line #2.
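The diagram above can be mimicked with a toy model in Python. This is only an illustration of the “hole” idea, assuming a plain list stands in for the index; real MyISAM index blocks are B-tree pages and are not managed like this. The function names delete_quick and optimize are hypothetical.

```python
def delete_quick(slots, key):
    """Toy model: blank out the slot holding `key` but keep the slot (leaves a hole)."""
    idx = slots.index(key)
    slots[idx] = None
    return slots

def optimize(slots):
    """Toy model of OPTIMIZE TABLE: rebuild the list without holes."""
    return [s for s in slots if s is not None]

index = ["0: data", "1: data", "2: data"]
delete_quick(index, "1: data")
print(index)            # ['0: data', None, '2: data']
print(optimize(index))  # ['0: data', '2: data']
```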

Ok, don’t forget to change the thread_concurrency to 1 too.