Self-join and sorting

Hi all.

I’m wondering if anyone has a solution for a problem I’m facing. I’m giving a simplified example for clarity (I hope), but I think it illustrates the problem.

Let’s say I have a table called “Things” as follows:

Things------ThingID BIGINTTimestamp BIGINT Attribute VARCHAR(20)

And there are two indexes on this table as follows:

Indexes-----AttributeTimestampIndex (Attribute, Timestamp)ThingIDIndex (ThingID)

My table is denormalised, so each ThingID can have multiple Attributes for each timestamp, all stored in different rows.

However, importantly, each ThingID cannot have multiple timestamps.

For example:

Thing ID | Timestamp | Attribute1 | 1 | 'foo’1 | 1 | 'bar’2 | 1 | 'foo’2 | 1 | 'baz’3 | 2 | ‘foo’

Let’s say I want to find the timestamp of all Things that have Attributes of ‘bar’ and ‘foo’, I execute the following query:

SELECT t1.Timestamp FROM Things t1, Things t2 WHERE t1.AttributeA = ‘foo’ AND t2.ThingID = t1.ThingID AND t2.AttributeA = ‘bar’ ORDER BY Timestamp

The query optimizer (correctly) figures out that ‘bar’ is a more selective attribute, and evaluates t1 first, then joins to t2 on the ThingIDIndex index.

The problem is, even though I have Timestamp in my index (which it should hopefully be able to use for sorting), MySQL ends up doing a Filesort, because I am selecting t1.Timestamp rather than t2.Timestamp (even though they are always the same, since there can’t be multiple Timestamps for a ThingID).

The problem is I don’t know in advance whether t1 or t2 will have the more selective clause applied to it, so I don’t know whether to select from t1 or t2 in advance.

If I remove the t1 alias from Timestamp when I select it, then MySQL complains about an ambiguous column name (of course).

I can add the following clause to my SQL to enforce the fact that t1 and t2 have the same Timestamp.

t2.Timestamp = t1.Timestamp

But the effect is the same: MySQL still complains about an ambiguous column, and I still have to choose one of t1 or t2 to select from, thus I still have a 50% chance of MySQL having to do an unnecessary Filesort.

I’m not very keen on having to do a Filesort for a couple of reasons: 1. it’s unnecessary work, and, more importantly 2. I can’t start streaming the results to the client early, since I have to wait for the entire query to complete before I can get any results.

Anybody got any ideas as to how to solve this?

Honestly it sounds like a flawed design to me.