This is on the edge of being a performance question: the way I've implemented it involves a trigger on a table, which might not be necessary.
I have a requirement for a table with potentially very long strings; at the moment the column is set to a length of 1024, which might still be too small.
The problem is that this long string has to be unique.
The MySQL key size is limited to around 1000 bytes, and with utf8 (up to 3 bytes per character) that means you're limited to roughly 333 characters for the key. So if I have a string of length 500 in the database, e.g.
aaaaaaa…aaaa (length 500)
and I add
aaaaaaa…aaab (length 500)
this will fail with a duplicate-key error, since only the first 333 characters are ever checked for the unique key.
I fixed this by keeping a normal index on the column for searches, but adding a trigger that runs md5(upper('newvalue')) on newly added strings, stores the result in a column called 'myhash', and puts a unique key on 'myhash'.
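A minimal sketch of that setup, assuming a hypothetical table `mytable` with the long-string column `myvalue` (names are placeholders, not from the original post):

```sql
-- Hash column sized for a hex MD5 digest, with the unique key on it
ALTER TABLE mytable ADD COLUMN myhash CHAR(32) NOT NULL;
ALTER TABLE mytable ADD UNIQUE KEY uk_myhash (myhash);

-- Populate the hash automatically on insert
DELIMITER //
CREATE TRIGGER mytable_hash_bi BEFORE INSERT ON mytable
FOR EACH ROW
BEGIN
  SET NEW.myhash = MD5(UPPER(NEW.myvalue));
END//
DELIMITER ;
```

A matching BEFORE UPDATE trigger would be needed too if the string column can ever be updated, otherwise the stored hash goes stale.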
In the real world, I should never come across duplicates (the table should really only reach about 1 million rows max)… but on the off chance, should I add another column 'myhash2' computed with a different hash function and put a unique constraint on (myhash, myhash2)?
Chances of getting a duplicate MD5 hash are minute, but it's the 'birthday paradox': the more rows you have, the better the chance of a collision.
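To put a number on the birthday-paradox worry: for an n-bit hash and k rows, the probability of at least one collision is approximately k(k-1)/2^(n+1). A quick sketch of that estimate for MD5 (128 bits) at the 1 million rows mentioned above:

```python
# Birthday-paradox approximation: probability of at least one collision
# among k uniformly random n-bit hashes is roughly k*(k-1) / (2 * 2**n).
def collision_probability(k: int, bits: int = 128) -> float:
    return k * (k - 1) / (2 * 2**bits)

p = collision_probability(1_000_000)
print(p)  # on the order of 1e-27
```

So at 1 million rows an accidental MD5 collision is vanishingly unlikely, and a second hash column adds little.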