Is It max performance on one server or did I missed something ?

Hello experts,

I developped sometimes ago an indexer + search engine with mysql.
Now the database weight is 15Go and the query on it became sometimes very long.
For today on 75 request 24 are longer than 2 seconds :

SELECT * FROM vtpLog where timeResponse > 2 order by date desc; Time ResultCountvétérinaire 126.35 54482perte d’identité 8.76 948%de femmes vétérinaires 9.04 73pancreatite rat 3.23 23dépilation chinchilla 3.75 1trichophyton 3.2 319tumeur de la prostate 8.63 96pathologies du dindon 2.26 15Hypothyroidie canine 3.62 274gestation de la chienne 18.49 363vermifugation chienne 4.53 26vermifugation chienne gestante 6.33 9hypothyroidie 5.07 1032acupuncture 18.33 11association antibiotiques 26.63 1055pyothorax chat 5.68 45

When mysql takes long time to respond the servers state seems ok :
cpu : 20 - 40 %
mem : 1.7Go used, 2Go Page cache, 300mo free
hdd : reading only

I documented myself, tuned-up mysql, re-check indexes for month but no real progress confused: so I’d like an mysql expert advice to tell me what I’ve done stupid )

I don’t plan (for now) to recode all to use “full text”.

So here are the informations :

Server :
This server is dedicated for this search engine; there are only apache + mysql with maximum memory allocated on mysql.

[B]Quote:[/B]
CPU : Core(TM)2 E8400 @3.00GHz GenuineIntel MEM : 4 Go DDR2 HDD : 2x 750 Go - SATA2 (RAID 1)

Script :

[B]Quote:[/B]
/*================================================= ========= ====*/ create table vtpCategory ( Id int not null auto_increment, Parent_Id int, Value varchar(254), primary key (Id) ) type = MyISAM;

/================================================= ========= ====/
create table vtpDomain
(
Id int not null auto_increment,
vtpCategory_Id int not null,
IsOff bool default 0 ,
LastFarmSuccess bool default 0 ,
Rank int default 50,
StartUri varchar(254) not null ,
Title varchar(254) ,
Language varchar(254) ,
LastVisitDate datetime ,
primary key (Id),
key AK_StartUriUnique (StartUri)
)
type = MyISAM;

/================================================= ========= ====/
create table vtpHtmlSource
(
Id int not null auto_increment,
Uri_Id int,
Value mediumtext,
primary key (Id)
)
type = MyISAM;

/================================================= ========= ====/
create table vtpUri
(
Id int not null auto_increment,
Domain_Id int not null,
Value varchar(2040) not null,
ValueHash int not null,
IsDead bool default 0 ,
Content mediumtext ,
ContentHash int,
Title varchar(254) ,
Description varchar(254) ,
UpdateDate datetime ,
Rank int default 0 ,
primary key (Id),
key AK_ValueHash (ValueHash)
)
type = MyISAM;

/================================================= ========= ====/
create table vtpUriOutsideDomain
(
Id int not null auto_increment,
Domain_Id int,
Value varchar(254),
ValueHash int,
FromUriValueHash int ,
FromUriValue varchar(254) ,
primary key (Id)
)
type = MyISAM;

/================================================= ========= ====/
create table vtpWord
(
Id int not null auto_increment,
Value varbinary (254) not null ,
primary key (Id),
key AK_Value (Value)
)
type = MyISAM;

/================================================= ========= ====/
create table vtpWordScore
(
Uri_Id int not null,
Word_Id int not null,
Score int ,
primary key (Uri_Id, Word_Id)
)
type = MyISAM;

alter table vtpCategory add constraint FK_CategoryCategory foreign key (Parent_Id)
references vtpCategory (Id) on delete restrict on update restrict;

alter table vtpDomain add constraint FK_DomainCategory foreign key (vtpCategory_Id)
references vtpCategory (Id) on delete restrict on update restrict;

alter table vtpHtmlSource add constraint FK_HtmlSourceUri foreign key (Uri_Id)
references vtpUri (Id) on delete restrict on update restrict;

alter table vtpUri add constraint FK_UriDomain foreign key (Domain_Id)
references vtpDomain (Id) on delete restrict on update restrict;

alter table vtpUriOutsideDomain add constraint FK_OutsideDomainDomain foreign key (Domain_Id)
references vtpDomain (Id) on delete restrict on update restrict;

alter table vtpWordScore add constraint FK_Uri foreign key (Uri_Id)
references vtpUri (Id) on delete restrict on update restrict;

alter table vtpWordScore add constraint FK_Word foreign key (Word_Id)
references vtpWord (Id) on delete restrict on update restrict;

Query in two times :
1- get WordIds

[B]Quote:[/B]

SELECT Id FROM vtpWord WHERE (CAST(vtpWord.Value AS CHAR CHARACTER SET utf8) = ‘vétérinaire’);

Returns 41 here; This one is quite fast (0.1610 sec without query cache)

2- sort by scores

[B]Quote:[/B]
SET SESSION SQL_BIG_SELECTS = 1; SELECT SQL_CALC_FOUND_ROWS vtpUri.Id as UriId, SUM( ws0.Score) * (vtpDomain.Rank + vtpUri.Rank) as uriScore, vtpUri.Domain_Id FROM vtpUri INNER JOIN vtpWordScore AS ws0 ON (ws0.Uri_Id = vtpUri.Id) INNER JOIN vtpDomain ON (vtpDomain.Id = vtpUri.Domain_Id) where ws0.Word_Id = 41 AND vtpDomain.vtpCategory_Id != 2 AND vtpDomain.IsOff != 1 GROUP BY ws0.Uri_Id ORDER BY uriScore desc LIMIT 0,400

Mysql States :

[B]Quote:[/B]
MySQL work since 35 Days 5 hours 16 minutes 15 seconds. Slow_queries 4 304 Handler_read_rnd 1 670 k Handler_read_rnd_next 19 G Created_tmp_disk_tables 1 765 Opened_tables 157 Table_locks_waited 829 k

my.cnf :

[B]Quote:[/B]
skip-locking max_connections = 40 key_buffer_size = 2500M # cf mysqltuner.pl max_allowed_packet = 500M table_cache = 200 # table_open_cache empeche de demarrer open_files_limit = 600 # il y avais 4200 # should typically be set to at least 2x-3x of table_cache sort_buffer_size = 4M read_buffer_size = 4M read_rnd_buffer_size = 4M myisam_sort_buffer_size = 32M thread_cache_size = 8 query_cache_size = 16M # == query_cache_limit query_cache_limit = 16M join_buffer_size = 3M low_priority_updates = 1

tmp_table_size = 3400M # doit == max_heap_table_size
max_heap_table_size = 3400M

#net_buffer_length = 8K
#skip-external-locking
#log-slow-queries = /var/log/mysql/slow_queries.log

Try number of CPU’s*2 for thread_concurrency

thread_concurrency = 8

I hope you can show me the light :wink:

SELECT Id FROM vtpWord WHERE (CAST(vtpWord.Value AS CHAR CHARACTER SET utf8) = ‘vétérinaire’);

Rewrite this query to the form WHERE vtpWord.Value = blahblah(‘veterinaire’) and it will be fast.

Explain output for query 2?

If you install a real search engine, you will be much better off.

hello gmouse, i ordered answers by difficulty :

1°/ 1st query : ok cast moved, thanks ^^

3°/

Quote:

If you install a real search engine, you will be much better off.

Can you give me advice on which one to choose, and possibly a list of names?

2°/

Quote:

Explain output for query 2?

The query 2: returns :

UriId uriScore Domain_Id175247 3600 46178075 3600 4690506 3600 46

It returns an ordered list of the 400 1st UriId according uriScore.
vtpWordScore is a sum representing the number of occurrences of a word for an uri, it is calculated and set by the BDD indexer.
uriScore is the sum of vtpWordScore each search word multiplied by Rank.
The Rank score serve to up/degrade certain uriScore according to some criteria (if it is an index or a technical paper … etc.)

)

Explain has a different meaning in MySQL-world ) [URL]http://dev.mysql.com/doc/refman/5.1/en/explain.html[/URL]

Try Sphinx or DBSight as search engine.

Oups, shame on me, I forgot this one :o

Do you ever query vtpWordScore on uri_id? If so, you could probably gain a lot by switching the order of the PK fields (and removing the 2nd index). Otherwise, check if adding uri_id to the second index as a postfix makes a big difference.
And consider creating covering indices for the joined tables.

I don’t very well understood what you mean by “Do you ever query it vtpWordScore uri_id?”, I simply use query 2 to get all results id that’s all.
Anyway I have duplicated database and changed indexes according to your Recommendation:

Explain have less rows on the first line but unfortunately there is no significant time gain, query 2 still take 30 to 60 seconds to execute.

and covering indices for the joined tables?

I don’t think it will help because of this “degraded query 2” takes the same time to execute and it works only on vtpWordScore :

So the problem comes from vtpWordScore… or maybe 86 187 809 rows is too much on this system … but in this hypothesis why the system is not workloaded ?

here the system loads (I launched the 53sec query at 19h43) :

I’m suspicious about your disk IO.

Can you check the I/O wait when you are running this query by running “vmstat 2 10” and/or “top -b -n 10” and attach the text output from these to this thread?

Because your chart of Bytes/s doesn’t really tell us so much if:

  1. There are a lot of random very small reads = bytes/sec is very small but IO wait on disk is very high.
  2. The chart shows an average value and only samples ever so often then the average value of a few samples can give a very distorted impression.

Now for some of your server variables.
As I understood it you have 4GB of RAM, right?
And on the same server you are running both MySQL and Apache.

Then some of these values sound very high:

key_buffer_size = 2500M # cf mysqltuner.pl### This sounds very high, usually set to about 25% of available RAM since MyISAM tables relies on the file cache, unless you know that basically all your queries can fetch all data from the indexes and doesn’t have to read from the table.###max_allowed_packet = 500M ### Also sounds insanely high, any reason? ###sort_buffer_size = 4M### Sounds pretty low, I recommend about 16MB, but this shouldn’t be such a big deal breaker ###tmp_table_size = 3400M # doit == max_heap_table_sizemax_heap_table_size = 3400M### These two are also insanely high, with only 4GB in the machine and even though you are alone on it the key_buffer_size setting combined with this would probably put you over the top.###

hi sterin71 ),

Here are results of asked commands launched when server idle and during query, so you can compare results if you want :

sterin71: For some of your server variables :
-Yes 4GB Ram on server.
-Yes Apache + Mysql + named (forgot this one, it’s a pre-installed machine with some services but I dont uses them)

sterin71: Then some of these values sound very high:
Yes I know but for now the goal is just performance not stability, so I tried very big values to see if … ^^ to keep little stability, I lowed max_connections to 40.

key_buffer_size 2.5G because I tried to fit all indexes in RAM.
max_allowed_packet = 500M by convenience, just to raise max upload size scripts in phpMyAdmin, and “max_…” don’t impact on performances, I’m Wrong ?
sort_buffer_size = 4M I just remember It’s a memory eater so I put it to 4M to not to be too far over memory limit for 40 connexions.
tmp_table_size = 3400M + max_heap_table_size = 3400M Same as max_allowed_packet there are “max_…” values so it can crash but will not drop performances.

Anyway I’m ok to try all values you’ll give me to test )

I changed some “values” on Id and FK, examples :
Id INT(11) => Id MEDIUMINT(6)
Reduced sizes and some perf gain.

@Stein71 : I changed these :
key_buffer_size = 1000M
tmp_table_size = 1000M
max_heap_table_size = 1000M
sort_buffer_size = 16M then commented it

but no noticeable changes.
And something quite strange Virt memory seems in use even if there ram free.

top -b | grep mysql13696 mysql 40 0 1058m 198m 4556 S 0 5.0 0:02.39 mysqld13696 mysql 40 0 1066m 198m 4556 S 8 5.0 0:02.62 mysqld13696 mysql 40 0 1067m 199m 4556 S 3 5.0 0:02.72 mysqld

Is there a way to keep Virt to 0 ?

Another strange thing is there it seems that only one cpu is used.

Why mysql wouldn’t use both processors ?
I have this in my.cnf

Try number of CPU’s*2 for thread_concurrencythread_concurrency = 8

I’m trying to uping mysql’s processes priority.

Ps : excuse my poor english, I can read it quite easely but it’s hard for me to make phrases… I’m probably not well configured too :roll:

OK, as I suspected you are getting a huge IO-wait during the execution of the query. The last column is the amount of time the cpu is spending waiting for the disk and at 48-49% it is basically not doing anything else except waiting:

procs -----------memory---------- —swap-- -----io---- --system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 1 54696 731996 40928 2396816 0 0 10510 64 991 2195 0 2 49 49 0 1 54696 723076 40932 2405640 0 0 4436 0 660 1205 0 1 50 49 0 1 54696 715132 40932 2413724 0 0 3998 0 621 1061 0 2 50 48

The reason why you don’t have 99% is that you have a dual core server (I’m assuming) and in MySQL one query can only utilize one core.

The problem here is that you have a very large index and table size since it looks like vtpWordScore is 3.5GB and you are joining with a table that is 2GB of data, etc.

So the steps for the server is to:

  1. Find the correct page in the index with Word_Id = 41
  2. For each of the records found in 1 fetch the corresponding data from the next table that is also very big.
  3. Then join this with the records from the two tables, etc.

So what can we do about it?
1.
Have you run “OPTIMIZE TABLE vtpWordScore;”?
I don’t know how long time it will take on your machine (but it might be a long time), but it would probably be worth it due to the size of your table compared to the RAM.
By running the optimize table you are resorting the entire index and finding the right records should be much faster.

  1. Also try rewriting your query this way (not tested):

SELECT SQL_CALC_FOUND_ROWSvtpUri.Id as UriId,SUM( ws0.Score) * (vtpDomain.Rank + vtpUri.Rank) as uriScore,vtpUri.Domain_IdFROM vtpUriINNER JOIN ( SELECT Uri_Id ,score FROM vtpWordScore WHERE ws0.Word_Id = 41 ) AS ws0 ON ws0.Uri_Id = vtpUri.Id INNER JOIN vtpDomain ON vtpDomain.Id = vtpUri.Domain_Id AND vtpDomain.vtpCategory_Id != 2 AND vtpDomain.IsOff != 1WHEREGROUP BY ws0.Uri_IdORDER BY uriScore descLIMIT 0,400

My theory is that since you don’t seem to select too many rows from the vtpWordScore table an execution that involves first getting all records from one table creating a derived table and then performing a search on the next table would be better.
Please test it and post the result and the EXPLAIN output here.

Long term solution buy more RAM!!!
If you are on a 32bit system then it’s unfortunate since you can’t utilize any more than 4GB, but with such big database and I’m guessing performance constraints you should at least consider for the future moving it to 64Bit and more RAM.

Stein71 : Have you run “OPTIMIZE TABLE vtpWordScore;”?
No I thought it was useless because I created a new database “vs2” reimported struct+datas from script; anyway I launched the command but times are same as before.

2. Also try rewriting your query this way
Changed few things to make it run :

  • removed 2nd “where”
  • replaced ws0.Word_Id =41 by vtpWordScore.Word_Id =41
    Sadly it still takes time to execute (20 sec) when query is not in query_cache.

I found some things witch may help :
1- The use of SQL_NO_CACHE seems so save some seconds (hard to be sure because of query_cache keeps me from doing same test two times with same execution time)

2- SHOW PROFILE source FOR QUERY 1; gives us :

| Copying to tmp table | 21.697665 | exec | sql_select.cc | 1999 |

All the time is there so :

  • why does mysql do that ?
  • Is there a way to keep it from doing this copy or if not :
  • Is there a way to force him to copy to a “memory” tmp table ?

Long term solution buy more RAM!!!

no, the real solution is switching to a real search engine.

Here I am :

I revisited the query to avoid the “copy to tmp table”.
I throwed away the sum() and so the group by and no more “copy to tmp table” this time with one searched word and performance gain seems great.

SELECT SQL_NO_CACHE SQL_CALC_FOUND_ROWS vtpUri.Id as UriId, (ws0.Score) as uriScoreFROM vtpUriINNER JOIN vtpWordScore AS ws0 ON (ws0.Uri_Id = vtpUri.Id)WHERE ws0.Word_Id = 136ORDER BY uriScore desc LIMIT 0,400

Explain

id select_type table type possible_keys key key_len ref rows Extra1 SIMPLE ws0 ref PRIMARY PRIMARY 3 const 24746 Using where; Using filesort1 SIMPLE vtpUri eq_ref PRIMARY PRIMARY 3 vetSpider6.ws0.Uri_Id 1 Using index

Thought I reached the goal BUT …

When I search more than one word explain says “Using temporary” again.

SELECT SQL_NO_CACHE SQL_CALC_FOUND_ROWS vtpUri.Id as UriId, (ws0.Score + ws1.Score) as uriScoreFROM vtpUriINNER JOIN vtpWordScore AS ws0 ON (ws0.Uri_Id = vtpUri.Id)INNER JOIN vtpWordScore AS ws1 ON (ws1.Uri_Id = vtpUri.Id) WHERE ws0.Word_Id = 136 AND ws1.Word_Id = 5110ORDER BY uriScore desc LIMIT 0,400

Explain

id select_type table type possible_keys key key_len ref rows Extra1 SIMPLE ws1 ref PRIMARY PRIMARY 3 const 627 Using temporary; Using filesort1 SIMPLE ws0 eq_ref PRIMARY PRIMARY 6 const,vetSpider6.ws1.Uri_Id 1 1 SIMPLE vtpUri eq_ref PRIMARY PRIMARY 3 vetSpider6.ws1.Uri_Id 1 Using index

I don’t understand why MySql uses temporary on such simple jointure.
Do you know ?

In reading back through this thread, it is clear that you are getting a variety of different advice, which is all trial-and-error kind of stuff. This in my experience can waste months. I would encourage you to take a more scientific approach, such as Method R or Goal-Driven Performance Optimization (see [URL]http://www.percona.com/about-us/white-papers[/URL])

Theses advices have given already some performance gains,
but they mostly gave me clues on where to look and a much better understanding of the possible causes of the problem.
I am very grateful for that )

I’m not even sure on this material without full-text like search engine it is possible to do better.

Anyway, thank you for your link I will read it carefully.

Yes optimize table was not needed, I didn’t think of that you probably had recreated the entire table.

If we break down the query a bit.
How long does it take to run only this query:

SELECT Uri_Id ,score FROM vtpWordScore ws0 WHERE ws0.Word_Id = 41

If that query is slow, try it with this index:

ALTER TABLE vtpWordScore ADD INDEX ws_ix_wordid_uriid_score(word_id, uri_id, score);

The advantage with this index is that your DBMS don’t have to read any data from the table since all data is available in the index and hereby saving search times.
The drawback with it is that the amount of data for that table on your harddrive will almost double in size and inserts/updates will be slower.
But since your problem is disk seeks I think it might be worth it.

[B]Quote:[/B]

I don’t understand why MySql uses temporary on such simple jointure.
Do you know ?

I would say that the reason why MySQL uses a temporary table is because you are sorting on a calculated value retrieved from two tables. MySQL doesn't know this value until the join has been performed so it needs to store the intermediate result of the join in a temporary table before it can sort it.

Sterin71 : do know that you now allow more than one …
Ooops , I missed that ) so I will add contraint or replace index + primary at the end.

Sterin71 : No don’t move them to another table…
Oh yes you right, I should see it.
I added index but,there still “f#%?! copying to temp table ^^” it seems because of “… * (vtpDomain.Rank + vtpUri.Rank)”, without those EXPLAIN no more says Using Temporary.

SELECT SQL_NO_CACHE HIGH_PRIORITY SQL_NO_CACHE SQL_CALC_FOUND_ROWS vtpUri.Id as UriId, (ws0.Score) * (vtpDomain.Rank + vtpUri.Rank) as uriScore, vtpUri.Domain_Id FROM vtpUri INNER JOIN vtpWordScore AS ws0 ON (ws0.Uri_Id = vtpUri.Id) INNER JOIN vtpDomain ON (vtpDomain.Id = vtpUri.Domain_Id) where ws0.Word_Id = 4598 AND vtpDomain.vtpCategory_Id != 2 AND vtpDomain.IsOff != 1 ORDER BY uriScore desc LIMIT 0,400

So I will try to place the result of this operation vtpWordScore.Score * (vtpDomain.VetupRank + vtpUri.Rank) directly in vtpWordScore.Score to remove them from the request.
The application will be less configurable but, I think and hope everything will be fine in term of performance.