I’m designing a schema for a full featured blog aggregator. I’ll have table for articles, feed, sites, hubbubs, users’ profile and other data.
Even if many of those tables will have an URL as the best natural key, I’m not sure if it could be a good primary key as well.
URL could be very long (indeed there’s any limit, I’m going to put an arbitrary one at 255 chars) and I’m not sure what will happen with joins.
So why not choose an auto-incremented ID as the key? I would prefer to not have a key completely unrelated to the real data.
Anyway since I will have million of records in not so much time from the start of the service, I’d like to avoid to modify the database schema to change the primary key on several tables, which implies changing relations, constraints, tables for n:m relations and moreover the application itself.
Some technical details: MySQL 5.1 with InnoDB storage engine. I’ll have also a master-master replication.
The very major part of write will be done via batch script during day. I’ll have a sphinx search on the database too.
So I need some opinion: will I have performance problem using an URL VARCHAR(255) as primary key at some point? It is always better to choose an integer ID? If i choose to use an ID I have to properly configure the replication options on it, which I would not with URLs (they cannot conflict).
Thank you for any help/opinion.