Large varchar unique keys?

balmark · September 2, 2010, 4:24am

This is sort of on the edge of performance, as in, the way I’d implemented it involves a trigger on a table which might not be necessary.

I’ve a requirement to have a table with potentially very long strings, at the moment I’ve the column set to 1024 length which might still be too small.

The problem is that this long string has to be unique.

The MySQL key size is limited to 1000bytes or so, and with utf8, that means you’re limited to 333 for the key… so if I have a String in the database length 500 eg.
aaaaaaa…aaaa (length 500)

and I add

aaaaaaa…aaab (length 500)

this will fail with a duplicate key since it’ll only ever check the first 333 characters for the unique key

I fixed this by just having a normal index on this column for searches, but making a trigger to do a md5(upper(‘newvalue’)) on newly added strings and setting this value into a column called ‘myhash’ and putting a unique key on ‘myhash’

In the real world, I never should come across duplicates (the table should really only get to about 1million rows max) … but on the offchance, should I make another column ‘myhash2’ and use another encryption method and put a unique constraint on myhash+myhash2?

Changes of getting a duplicate MD5 hash is minute, but its the ‘birthday paradox’, the more you have the better chance of a duplicate

balmark · September 2, 2010, 4:42am

Looking around again, this wiki is better than the last with some values of probability ) Birthday_attack

It shows I’d need
26,000,000,000,000 hashs to have 10^-12 chance of two hashes being the same

unless I’m reading it wrong

I can live with that )

carpii · September 13, 2010, 10:14pm

I guess it really depends on what the MD5 hash is being used for, and how critical it is.

Whats the worst that could happen if there was an MD5 collision? Its extremely unlikely for arbitrary data, but if you rely on it being 100% unique, sods law dictates that it might just bite you one day

If its absolutely critical, then sure, I guess you could also apply a 2nd hash, but to save on performance only check that in the unlikely event the first hash collides.
Personally I use MD5 for hashes but Ive never found a reason to use them for keys

On the other hand, you can over-engineer stuff too, so if its not critical (ie security, financial) then in most cases it would suffice to just rely on it being unique, but at least have some monitoring in place for when there is a collision, so it will notify you that an insert failed.

making a trigger to do a md5(upper(‘newvalue’))

I’m assuming you are using a case-insensitive collation. Otherwise you probably do not want to UPPER the plaintext

Topic		Replies	Views
Long Primary Keys in MySQL Other MySQL® Questions	2	709	February 20, 2008
Percona MySQL 5.6 databases ignoring key length at constrait creation Other MySQL® Questions	0	786	April 13, 2015
Large database (billions of rows) with minimal writes, best setup for fast searches? Other MySQL® Questions	2	575	April 30, 2008
Pt-online-schema-change failed when creating new tablle, specified key was too long; max key length is 767 bytes issue MySQL & MariaDB percona	1	539	June 27, 2022
Percona mysql 8.0.21 unique key column but still insert duplicate data Percona Server for MySQL 8.0 mysql , bugs	4	741	October 17, 2023

Large varchar unique keys?

Related topics