CHAR vs. INT indexes

Here’s a little bit of an academic question:

Let’s assume I have an indexed 32-bit integer column. I want to change this column to be a CHAR(7) containing an alpha numeric encoding of the original integer value. Other than the obvious nearly 100% increase in index size, are there any other, less obvious draw backs to using a CHAR column over an INT? I’m only interested in using the index for lookups, so I will never perform a range scan on the index.

Please remember this is an academic question. I’m not looking for responses like “why would you do that, its stupid, just use an INT and do your conversion in application code”. I’m looking for something more like “use CHAR(8) to have proper byte alignment” or “when using a GROUP BY on a CHAR column, the MySQL 5.0 query optimizer will ignore your index every other Friday due to some fictitious bug”.

If you’re using it as a lookup table, and really just pecking single rows from it, it may be that its not going to make an enormous difference to performance.

But since its an academic question…

Personally I would always avoid it where possible, but Im fairly experienced and so just try to do everything ‘the right way’ (as in, keep rows as compact as possible).

You say you wont do range scans, what about joins?

Does the table have many other fields which might be queried (or rather, does it have other indices?

I believe the PK of an innodb table results in a clustered index. Non clustered indices, will store the clustered index key as a lookup into the non clustered index.
This means that whilst on the face of it, your row key is only 3 bytes more than it would be, it will also increase the size of all indices too (not just the PK one).
This is bad in general, because bigger indices mean less cache efficiency and more disk io (slow).

I know this is exactly how it works in sql server. I think innodb too, but Im not 100% sure maybe someone else will confirm.

Other than that, I wouldnt expect any enormous problems. A char(7) is fixed width, and so mysql can still predict the row size (assuming all other fields are fixed width too). Id be much more concerned if it was varchar or char(50) etc

You’re absolutely right about InnoDB primary keys being clustered, so the increased size will effect all indexes, though in my case there is at most one other index on any of my tables. They’re also always implemented as B-Trees, so range scanning is no problem.

BETWEEN may fail, eg

mysql> SELECT 55 BETWEEN 1 AND 9;±-------------------+| 55 BETWEEN 1 AND 9 |±-------------------+| 0 |±-------------------+1 row in set (0.00 sec)mysql> SELECT ‘55’ BETWEEN ‘1’ AND ‘9’;±-------------------------+| ‘55’ BETWEEN ‘1’ AND ‘9’ |±-------------------------+| 1 |±-------------------------+1 row in set (0.00 sec)