Hi!
I’m glad I found this website, and the fact that Peter decided to start a forum makes me twice as glad :) As the site’s name implies, my question is about MySQL performance.
I need to estimate which of the following approaches performs better.
I’m implementing a simple mapping table, which operates in terms of numerical object IDs (OID) and up to “N” numerical aliases (Alias1, …, AliasN) of different alias types (AliasType1, …, AliasTypeN). One OID can have zero or one Alias of the given AliasType associated with it. So, a pair of OID and AliasType uniquely identifies an Alias (but it may not exist at the time of the query). Also, a pair of an Alias and AliasType uniquely identifies an OID.
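To pin down the two uniqueness rules before looking at table layouts, here is a minimal in-memory Python sketch (not MySQL). It assumes integer OIDs, alias types and aliases, and keeps one dictionary per lookup direction. The class AliasMap and its methods are hypothetical names for illustration only.

```python
from typing import Dict, Optional, Tuple


class AliasMap:
    """Hypothetical in-memory model of the OID <-> Alias mapping.

    Each uniqueness rule is enforced by its own dictionary.
    """

    def __init__(self) -> None:
        # (oid, alias_type) -> alias: an OID has at most one alias per type
        self._by_oid: Dict[Tuple[int, int], int] = {}
        # (alias, alias_type) -> oid: an alias of a given type names one OID
        self._by_alias: Dict[Tuple[int, int], int] = {}

    def add(self, oid: int, alias_type: int, alias: int) -> None:
        if (oid, alias_type) in self._by_oid:
            raise ValueError("OID already has an alias of this type")
        if (alias, alias_type) in self._by_alias:
            raise ValueError("alias of this type already maps to an OID")
        self._by_oid[(oid, alias_type)] = alias
        self._by_alias[(alias, alias_type)] = oid

    def alias_of(self, oid: int, alias_type: int) -> Optional[int]:
        # None means no alias of this type exists (yet)
        return self._by_oid.get((oid, alias_type))

    def oid_of(self, alias: int, alias_type: int) -> Optional[int]:
        return self._by_alias.get((alias, alias_type))


m = AliasMap()
m.add(oid=42, alias_type=1, alias=1001)
m.add(oid=42, alias_type=2, alias=2001)
print(m.alias_of(42, 2), m.oid_of(1001, 1), m.alias_of(42, 3))  # 2001 42 None
```

Each database design below has to reproduce exactly these two dictionaries as indexes.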
Since it is unlikely that any given OID has Aliases of all defined AliasTypes, I ruled out the most obvious implementation: a MySQL table with the columns (OID, Alias1, Alias2, …, AliasN), where every column is a unique key. Such a table could get rather sparse, and with tens of millions of records its storage efficiency would be rather pathetic.
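To make the sparseness concrete, here is a minimal sketch of the wide design, assuming a hypothetical N = 3 and using Python's built-in sqlite3 as a stand-in for MySQL (the DDL would differ slightly in MySQL). The table name wide and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical N = 3 alias types; SQLite stands in for MySQL here.
N = 3
alias_cols = [f"alias{i}" for i in range(1, N + 1)]

conn = sqlite3.connect(":memory:")
# One nullable column per alias type, each with its own unique key.
# A UNIQUE index accepts multiple NULLs in both SQLite and MySQL.
ddl = ", ".join(f"{c} INTEGER UNIQUE" for c in alias_cols)
conn.execute(f"CREATE TABLE wide (oid INTEGER PRIMARY KEY, {ddl})")

# Most OIDs have only one or two alias types populated.
rows = [(1, 101, None, None), (2, None, 202, None), (3, 103, None, 303)]
placeholders = ", ".join("?" * (N + 1))
conn.executemany(f"INSERT INTO wide VALUES ({placeholders})", rows)

# Count empty (NULL) alias cells to see how sparse the table is.
null_cells = sum(
    conn.execute(f"SELECT COUNT(*) FROM wide WHERE {c} IS NULL").fetchone()[0]
    for c in alias_cols
)
total_cells = len(rows) * N
print(f"{null_cells} of {total_cells} alias cells are NULL")  # 5 of 9 ...
```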
Another approach is a single three-column table (OID, AliasType, Alias) with two unique indices: (OID, AliasType) and (Alias, AliasType). In this case I shrink the width of the table but multiply the row count.
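A minimal sketch of the three-column design, again with sqlite3 standing in for MySQL; the table name oid_alias and the sample data are hypothetical. Making (OID, AliasType) the primary key covers both "alias of this OID for type T" and, through its leftmost prefix, "all aliases of this OID". In MySQL with InnoDB the primary key is also the clustered index, so all rows for one OID are stored together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Narrow design: one row per existing alias (SQLite stands in for MySQL).
# PRIMARY KEY (oid, alias_type) enforces "one alias per OID and type";
# UNIQUE (alias, alias_type) enforces "one OID per alias and type" and
# serves reverse lookups.
conn.execute(
    """
    CREATE TABLE oid_alias (
        oid        INTEGER NOT NULL,
        alias_type INTEGER NOT NULL,
        alias      INTEGER NOT NULL,
        PRIMARY KEY (oid, alias_type),
        UNIQUE (alias, alias_type)
    )
    """
)
conn.executemany(
    "INSERT INTO oid_alias VALUES (?, ?, ?)",
    [(1, 1, 101), (1, 3, 301), (2, 2, 202)],
)

# Reverse lookup: which OID has alias 301 of type 3?
oid = conn.execute(
    "SELECT oid FROM oid_alias WHERE alias = ? AND alias_type = ?", (301, 3)
).fetchone()[0]
print(oid)  # 1

# A second type-3 alias for OID 1 violates the primary key and is rejected.
try:
    conn.execute("INSERT INTO oid_alias VALUES (1, 3, 999)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```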
The last approach I thought of is a variation of the one above: I can split this long 3-column table into N 2-column tables, each containing the mappings of OIDs to Aliases of one particular AliasType.
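A minimal sketch of the split design, assuming a hypothetical N = 3 and per-type table names alias_type_1 … alias_type_N (sqlite3 again stands in for MySQL). The AliasType no longer appears in any row; it only selects the table. Because table names cannot be bound as query parameters, the hypothetical add_alias helper validates the type before formatting it into the SQL.

```python
import sqlite3

# Hypothetical: alias types numbered 1..N, one table per type named
# alias_type_<k>. SQLite stands in for MySQL.
N = 3
conn = sqlite3.connect(":memory:")
for k in range(1, N + 1):
    # oid is the primary key (one alias per OID for this type);
    # alias is UNIQUE (one OID per alias for this type).
    conn.execute(
        f"CREATE TABLE alias_type_{k} ("
        "oid INTEGER PRIMARY KEY, alias INTEGER NOT NULL UNIQUE)"
    )


def add_alias(oid: int, alias_type: int, alias: int) -> None:
    # Validate the type first: it is formatted into the table name.
    if not 1 <= alias_type <= N:
        raise ValueError(f"unknown alias type {alias_type}")
    conn.execute(f"INSERT INTO alias_type_{alias_type} VALUES (?, ?)", (oid, alias))


add_alias(1, 1, 101)
add_alias(1, 3, 301)
add_alias(2, 2, 202)

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['alias_type_1', 'alias_type_2', 'alias_type_3']
```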
Please correct me if I’m wrong in my expectations.
- I think that looking up a particular Alias or OID, given the AliasType and either the OID or the Alias, should be fast in the last design. My reasoning is that the MySQL server would only have to search a relatively small data set, with the luxury of a unique index built for each of the two kinds of queries.
- If I need to get all the aliases of all types for a given OID, the 3-column design should perform significantly better. I think this is true because with the 2-column tables I would have to fire N queries, one per table, versus a single query against the 3-column table that returns multiple rows.
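The two query patterns compared in the points above can be sketched side by side. This is a self-contained illustration with sqlite3 standing in for MySQL, hypothetical N = 3 and made-up data; the function names aliases_narrow and aliases_split are hypothetical. It shows the shape of the work (one statement vs. N statements), not measured performance.

```python
import sqlite3
from typing import Dict

# Hypothetical setup: N = 3 alias types, both designs filled with the same
# data. SQLite stands in for MySQL.
N = 3
data = [(1, 1, 101), (1, 3, 301), (2, 2, 202)]  # (oid, alias_type, alias)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE oid_alias (oid INTEGER NOT NULL, alias_type INTEGER NOT NULL,"
    " alias INTEGER NOT NULL, PRIMARY KEY (oid, alias_type),"
    " UNIQUE (alias, alias_type))"
)
conn.executemany("INSERT INTO oid_alias VALUES (?, ?, ?)", data)
for k in range(1, N + 1):
    conn.execute(
        f"CREATE TABLE alias_type_{k} (oid INTEGER PRIMARY KEY,"
        " alias INTEGER NOT NULL UNIQUE)"
    )
for oid, t, alias in data:
    conn.execute(f"INSERT INTO alias_type_{t} VALUES (?, ?)", (oid, alias))


def aliases_narrow(oid: int) -> Dict[int, int]:
    # One statement: a range scan on the (oid, alias_type) key prefix.
    rows = conn.execute(
        "SELECT alias_type, alias FROM oid_alias WHERE oid = ?", (oid,))
    return dict(rows)


def aliases_split(oid: int) -> Dict[int, int]:
    # N statements: one primary-key probe per per-type table.
    result: Dict[int, int] = {}
    for k in range(1, N + 1):
        row = conn.execute(
            f"SELECT alias FROM alias_type_{k} WHERE oid = ?", (oid,)).fetchone()
        if row is not None:
            result[k] = row[0]
    return result


print(aliases_narrow(1))  # {1: 101, 3: 301}
print(aliases_split(1))   # {1: 101, 3: 301}
```

One design note: the N round trips of the split version could be folded into a single statement with UNION ALL, but the server would still probe N separate indexes.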
Am I correct in the statements above?
Maybe there is a better solution for MySQL than the ones I’m thinking about?
Thanks a lot!
/Sergey