Hey guys, I have tried searching for difference between varchar(255) and tinytext in terms of performance but I haven’t found a straight answer to this. Is it because there is no significant difference in performance? I know the difference in terms of spaces being trimmed and case sensitivity.
I have a table with about 200+ columns. Most of the small text types are tinytext rather than varchar. Is there a good reason to be using one type over the other in terms for performance? Would the order by be faster with varchar than tinytext when the table has a big row length? I read somewhere that varchar is stored right along with the row data on disk while tinytext is stored else where on disk and such(I’m not sure about that).
Thanks for the response. I have been messing around this a bit more. Have a column called “region” and it was tinytext. Also created an index for it region(100). But I couldn’t get mysql to use the index for a simple order by query (even with force index):-
select * from table order by region
The explain for it always used filesort. Am i missing something here? Is there a way you can force mysql to use the index that I created for tinytext type?
So then I duplicated that column and made it a varchar(100) instead and created another index region_varchar(100) on that new column. and issued the query again:-
select * from table order by region_varchar
Again it did not use the index and resorted to filesort. But it did work when i specified force index. Then I modified the index to use only first 10 characters region_varchar(10). Now again I couldn’t get mysql to use the index even with force index.
The only other way I could force it to use the index was to put in a where clause on that column.
So I’m missing something in the way mysql uses indexes and am not sure what that is…I’m kind of new to this stuff so if anyone could point out what I’m missing, its much appreciated.
Sorry the name “region” is bit misleading but thats the way it was when I got it. It has a bunch of codes and stuff embedded in it but it has a high cardinality (show index shows something 140K cardinality for first 10 char index out of a total of 150K rows in the table).
This doesn’t really make sense to me then, if there is high cardinality then it would be very efficient to resort to an index lookup over that of a full table scan.
Posting the table schema, a sample result and a sample query would help give a better responce.
First of all thanks for looking into this.
Alright, to keep things simple here’s the sample table I’m using to test :-
CREATE TABLE foo ( id int(11) unsigned NOT NULL auto_increment, name tinytext NOT NULL, name_varchar varchar(100) NOT NULL default ‘’, PRIMARY KEY (id), UNIQUE KEY name_index (name(100)), KEY name_varchar_index (name_varchar(7))) ENGINE=MyISAM DEFAULT CHARSET=latin1 ROW_FORMAT=FIXED;
name and name_varchar contains the same data which are 7 digit numbers(in the form of strings).
mysql> select * from foo limit 5;±—±--------±-------------+| id | name | name_varchar |±—±--------±-------------+| 1 | 1550620 | 1550620 || 2 | 1554011 | 1554011 || 3 | 1554864 | 1554864 || 4 | 1571814 | 1571814 || 5 | 1582609 | 1582609 |±—±--------±-------------+5 rows in set (0.00 sec)mysql> select count() from foo;±---------+| count() |±---------+| 131951 |±---------+1 row in set (0.00 sec)mysql> show index from foo;±------±-----------±-------------------±-------------±-------------±----------±------------±---------±-------±-----±-----------±--------+| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |±------±-----------±-------------------±-------------±-------------±----------±------------±---------±-------±-----±-----------±--------+| foo | 0 | PRIMARY | 1 | id | A | 131951 | NULL | NULL | | BTREE | || foo | 0 | name_index | 1 | name | A | 131951 | 100 | NULL | | BTREE | || foo | 1 | name_varchar_index | 1 | name_varchar | A | 131951 | 7 | NULL | | BTREE | |±------±-----------±-------------------±-------------±-------------±----------±------------±---------±-------±-----±-----------±--------+3 rows in set (0.00 sec)mysql> explain select * from foo order by name;±—±------------±------±-----±--------------±-----±--------±-----±-------±---------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±------±-----±--------------±-----±--------±-----±-------±---------------+| 1 | SIMPLE | foo | ALL | NULL | NULL | NULL | NULL | 131951 | Using filesort |±—±------------±------±-----±--------------±-----±--------±-----±-------±---------------+1 row in set (0.00 sec)mysql> explain select * from foo order by name_varchar;±—±------------±------±-----±--------------±-----±--------±-----±-------±---------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±------±-----±--------------±-----±--------±-----±-------±---------------+| 1 | SIMPLE | foo | ALL | NULL | NULL | NULL | NULL | 131951 | Using filesort |±—±------------±------±-----±--------------±-----±--------±-----±-------±---------------+1 row in set (0.00 sec)mysql> explain select * from foo force index(name_varchar_index) order by name_varchar;±—±------------±------±-----±--------------±-----±--------±-----±-------±---------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±------±-----±--------------±-----±--------±-----±-------±---------------+| 1 | SIMPLE | foo | ALL | NULL | NULL | NULL | NULL | 131951 | Using filesort |±—±------------±------±-----±--------------±-----±--------±-----±-------±---------------+1 row in set (0.00 sec)
I have no idea why its using filesort.
Also like I mentioned earlier the only way i got it to use the index is by using a where clause
mysql> explain select * from foo force index(name_varchar_index) where name_varchar=‘blah’ order by name_varchar;±—±------------±------±-----±-------------------±-------------------±--------±------±-----±------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±------±-----±-------------------±-------------------±--------±------±-----±------------+| 1 | SIMPLE | foo | ref | name_varchar_index | name_varchar_index | 7 | const | 1 | Using where |±—±------------±------±-----±-------------------±-------------------±--------±------±-----±------------+1 row in set (0.00 sec)
yea I know there are differences between the numeric/text types but I just used numbers just for that test table. I will have various types varchar/int/text etc in the actual table but for this test case I’m just using this so I can figure out why mysql isn’t using the indexes for order by.
Also I just did this:-
mysql> explain select * from foo order by id ;±—±------------±------±-----±--------------±-----±--------±-----±-------±---------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±------±-----±--------------±-----±--------±-----±-------±---------------+| 1 | SIMPLE | foo | ALL | NULL | NULL | NULL | NULL | 131951 | Using filesort |±—±------------±------±-----±--------------±-----±--------±-----±-------±---------------+1 row in set (0.00 sec)
Now id is an auto-increment primary key so should be indexed and its still using filesort? This is suggesting to me that mysql always uses filesort for order by. Can you maybe verify this on some table on your machine? If it is the case that mysql always uses filesort for order by then thats probably what I missed. But that would be quite odd…
Update: Ignore the above post about the id column. I created another column of type int and created an index on that and it worked ok.
mysql> explain select * from foo force index(name_int_index) order by name_int;±—±------------±------±------±--------------±---------------±--------±-----±-------±------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±------±------±--------------±---------------±--------±-----±-------±------+| 1 | SIMPLE | foo | index | NULL | name_int_index | 5 | NULL | 131951 | |±—±------------±------±------±--------------±---------------±--------±-----±-------±------+1 row in set (0.00 sec)
So…it would seem the indexes aren’t being used for simple order by when dealing with columns of type VARCHAR and TEXT. I still dont understand why it doesn’t use it…since text type cols are quite common.
Choosing all rows since you don’t have a WHERE clause.
Compare these:
mysql> explain select * from foo order by id;±—±------------±------±-----±--------------±-----±--------±-----±-----±---------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±------±-----±--------------±-----±--------±-----±-----±---------------+| 1 | SIMPLE | foo | ALL | NULL | NULL | NULL | NULL | 2000 | Using filesort |±—±------------±------±-----±--------------±-----±--------±-----±-----±---------------+1 row in set (0.00 sec)mysql> explain select id from foo order by id;±—±------------±------±------±--------------±--------±--------±-----±-----±------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±------±------±--------------±--------±--------±-----±-----±------------+| 1 | SIMPLE | foo | index | NULL | PRIMARY | 4 | NULL | 2000 | Using index |±—±------------±------±------±--------------±--------±--------±-----±-----±------------+1 row in set (0.00 sec)
Notice that the second query with only id column uses index while the first don’t.
And these:
mysql> explain select * from foo order by id;±—±------------±------±-----±--------------±-----±--------±-----±-----±---------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±------±-----±--------------±-----±--------±-----±-----±---------------+| 1 | SIMPLE | foo | ALL | NULL | NULL | NULL | NULL | 2000 | Using filesort |±—±------------±------±-----±--------------±-----±--------±-----±-----±---------------+1 row in set (0.00 sec)mysql> explain select * from foo where id < 100 order by id;±—±------------±------±------±--------------±--------±--------±-----±-----±------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±------±------±--------------±--------±--------±-----±-----±------------+| 1 | SIMPLE | foo | range | PRIMARY | PRIMARY | 4 | NULL | 98 | Using where |±—±------------±------±------±--------------±--------±--------±-----±-----±------------+1 row in set (0.00 sec)
Notice how when we limit the amount of rows returned that it starts to use a index even though I still have SELECT *.
The reason for this is that using an index is actually expensive in it’s own way.
If you are scanning an index and all columns that you want to return is part of the index it means that it can return the data right away and you have no extra cost (performance wise).
But if you are selecting columns that are not part of the index that is used. It means that the DBMS has to read the index and then jump to the correct row in the table to return the data from the missing columns.
So each row read = one jump to the table.
And the big problem with this is that random reads against disk devices is very expensive compared to sequential reads (the whole table in one go).
So somewhere there is a trade off where there is more expensive to use an index than it is to read in the entire table, sort it in RAM memory and return it.
It’s hard to really say where this limit is depending on your DB size and hardware. But as a rule of thumb I usually say that when you return more than somewhere around 5-10% of the rows you should start to think about it.
So that is why you get strange results with your explain.
BTW filesort doesn’t necessarily mean that it is written to disk. That depends on the size of your sort_buffer and the size of the result set returned by the query.
Only when the sort_buffer is full it will start to write temporary files on disk.
That is actually the first thing I tried (specifying the reqd cols only) after I started seeing this issue but:-
mysql> explain select name_varchar FROM foo force index (name_varchar_index) order by name_varchar;±—±------------±------±-----±--------------±-----±--------±-----±-------±---------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |±—±------------±------±-----±--------------±-----±--------±-----±-------±---------------+| 1 | SIMPLE | foo | ALL | NULL | NULL | NULL | NULL | 131951 | Using filesort |±—±------------±------±-----±--------------±-----±--------±-----±-------±---------------+1 row in set (0.00 sec)
It doesn’t seem to matter if I specify the columns or use *, I get the same result for explain and its using filesort. However it does seem to work for INT columns which is what “id” is. As before mysql seems to use the index for int cols and not varchar and text.
[B]Quote:[/B]
BTW filesort doesn’t necessarily mean that it is written to disk. That depends on the size of your sort_buffer and the size of the result set returned by the query.
Only when the sort_buffer is full it will start to write temporary files on disk.
[B]Quote:[/B]
From the ini:-
This buffer is allocated when MySQL needs to rebuild the index in
REPAIR, OPTIMZE, ALTER table statements as well as in LOAD DATA INFILE
into an empty table. It is allocated per thread so be careful with
large settings.
sort_buffer_size=64M
From the mysql manual:-
If you want to increase ORDER BY speed, check whether you can get MySQL to use indexes rather than an extra sorting phase. If this is not possible, you can try the following strategies:
Increase the size of the sort_buffer_size variable.
Increase the size of the read_rnd_buffer_size variable.
Change tmpdir to point to a dedicated filesystem with large amounts of empty space. This option accepts several paths that are used in round-robin fashion. Paths should be separated by colon characters (‘:’) on Unix and semicolon characters (‘;’) on Windows, NetWare, and OS/2. You can use this feature to spread the load across several directories. Note: The paths should be for directories in filesystems that are located on different physical disks, not different partitions on the same disk.
From what I understood, that variable is only used if mysql can’t use the index that I created and thats whats puzzling me as to why it isn’t using the index for varchar and text cols. And if my table is really big, it cant quite fit all the indexes in memory and it will most likely start writing to disk and in any case the index rebuilding will take time until its cached and hence the first query will always be slow.
If you have more suggestions on why this is happening please post away. I’m all ears at this point to understand how this thing works.
The reason why it doesn’t use the index for your text/varchar columns is because you have defined the index to contain only the first 100 (for tinytext) and first 7 characters (for varchar).
This means that mysql will have to read each row of data from the table anyway to get the complete string.
And since reading randomly is a very slow operation it chooses to go with reading all data and sort it instead.
ok I gathered that it does use index if I use the whole length of column for creating the index (post #3 of this thread). So what you are saying then is that there is no way to force mysql to use the index(created using leading x chars) when doing a simple order by.
ok I gathered that it does use index if I use the whole length of column for creating the index (post #3 of this thread).
yes
[B]myrddin wrote on Tue, 12 June 2007 17:01[/B]
So what you are saying then is that there is no way to force mysql to use the index(created using leading x chars) when doing a simple order by.
The thing is that the index does not contain enough data to perform a proper sort on since it only contains truncated strings.
Consider this:
Your column is VARCHAR(10) index is (4).
Data i table:
[B]Quote:[/B]
1234A
1234Z
How will it be able to use the index to sort that data properly if the index only contains the first 4 characters?
I’m curious why you want it to use the index for a “select * from foo order by”?
Because since you are selecting all rows it will impose a lot of random reads of the table and they are very expensive.
So I don’t think that you will gain anything.
But back to your original question you should avoid text/blob columns unless you need them.
Other cases use VARCHAR for strings.
The big reason here is what post #2 said about temporary tables since text/blob is intended to be large objects.
What you said about TEXT/BLOB being handled differently internally in the database is true. Blob objects are allocated in a different way which means that you will get more seeks.
So the recommendation is that you avoid BLOB/TEXT columns unless you really need them.
How will it be able to use the index to sort that data properly if the index only contains the first 4 characters?
I figured if you issue a force index it will use the index to sort granted in cases like you mentioned those parts of the sort will be undefined(i.e. it will do a sort based on those 4 chars only, after 4 chars is undefined). But apparently thats not how it works.
[B]Quote:[/B]
I’m curious why you want it to use the index for a “select * from foo order by”?
Because since you are selecting all rows it will impose a lot of random reads of the table and they are very expensive.
So I don’t think that you will gain anything.
My actual table has about 200+ cols and has about 200K rows right now and will be increasing. I will be selecting approximately 15 cols(of both numeric and text types) from that and will be doing an order by and limit(x,100). Are you saying that using the index in such a case will be slower than just sticking with filesort?
[B]Quote:[/B]
So the recommendation is that you avoid BLOB/TEXT columns unless you really need them.
I know I got sidetracked there and thanks for getting back on topic. I will be following that recommendation.
I appreciate all the help/suggestions everyone. Thanks!