Nearest value joins

Hi,

I have two tables, both containing measurement values with (unix_)timestamps. What is the best way to left join one of these with the other on the nearest timestamp value?

Currently I am doing something like this:

CREATE TABLE Session
(Session_id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
Session_Data INT UNSIGNED NOT NULL);

CREATE TABLE Meas_Data
(Time INT UNSIGNED NOT NULL,
Id DEC(15) UNSIGNED ZEROFILL NOT NULL,
Data DEC(15) UNSIGNED ZEROFILL NOT NULL,
Session_id BIGINT UNSIGNED NOT NULL,
PRIMARY KEY (Time, Id),
FOREIGN KEY (Session_id) REFERENCES Session (Session_id)
ON DELETE NO ACTION ON UPDATE CASCADE);

CREATE TABLE GPS
(Session_id BIGINT UNSIGNED NOT NULL,
UTC_s INT UNSIGNED NOT NULL COMMENT 'Timestamp',
UTC_us DEC(3) UNSIGNED NOT NULL COMMENT 'microseconds',
Longitude DOUBLE NOT NULL,
Latitude DOUBLE NOT NULL,
HDOP DEC(2,1) UNSIGNED,
Altitude FLOAT,
Fix ENUM('Invalid', '2D', '3D'),
COG FLOAT,
Speed FLOAT,
Sat DEC(2) UNSIGNED,
PRIMARY KEY (Session_id, UTC_s),
FOREIGN KEY (Session_id) REFERENCES Session (Session_id)
ON DELETE CASCADE ON UPDATE NO ACTION);

DELIMITER //
CREATE FUNCTION get_gps_distance(mt_ INT UNSIGNED, si_ BIGINT UNSIGNED)
RETURNS INT UNSIGNED
READS SQL DATA
COMMENT 'Returns the time difference to the nearest GPS fix'
BEGIN
DECLARE rv_ INT UNSIGNED;
-- CAST to SIGNED so the subtraction cannot underflow when UTC_s > mt_
SELECT MIN(ABS(CAST(mt_ AS SIGNED) - CAST(UTC_s AS SIGNED))) INTO rv_ FROM GPS
WHERE Session_id = si_;
RETURN rv_;
END//
DELIMITER ;

SELECT *
FROM Meas_Data NATURAL JOIN Session
LEFT JOIN GPS
ON GPS.Session_id = Meas_Data.Session_id
AND ABS(CAST(Meas_Data.Time AS SIGNED) - CAST(GPS.UTC_s AS SIGNED))
= get_gps_distance(Meas_Data.Time, Meas_Data.Session_id);

When I added the LEFT JOIN, my tests on realistic data slowed down by a factor of ten. Is there a better way to do such joins?

Thx

ImRe

ImRe,

What does EXPLAIN show for this query?

In general it is not surprising at all, as your nearest-distance check may be pretty expensive.

I would also build the query as a plain query (without the stored function) first, as otherwise you hide the complexity.
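
For example, the function could be inlined as a correlated subquery, roughly like this (a sketch only, reusing the table and column names from the schema above):

-- Sketch: the stored function inlined as a correlated subquery
SELECT *
FROM Meas_Data NATURAL JOIN Session
LEFT JOIN GPS
ON GPS.Session_id = Meas_Data.Session_id
AND ABS(CAST(Meas_Data.Time AS SIGNED) - CAST(GPS.UTC_s AS SIGNED)) =
(SELECT MIN(ABS(CAST(Meas_Data.Time AS SIGNED) - CAST(g2.UTC_s AS SIGNED)))
FROM GPS g2
WHERE g2.Session_id = Meas_Data.Session_id);

This way EXPLAIN shows the real access pattern instead of hiding the per-row GPS lookup inside the function.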

Peter,

I rewrote the query with a subquery instead of the function. Here is the EXPLAIN output:

+----+--------------------+-----------+-------+---------------+---------+---------+--------------------------+------+--------------------------+
| id | select_type        | table     | type  | possible_keys | key     | key_len | ref                      | rows | Extra                    |
+----+--------------------+-----------+-------+---------------+---------+---------+--------------------------+------+--------------------------+
|  1 | PRIMARY            | meas_data | range | PRIMARY       | PRIMARY | 4       | NULL                     |  308 | Using where              |
|  1 | PRIMARY            | GPS       | ref   | PRIMARY       | PRIMARY | 8       | gi2.meas_data.Session_id |  498 |                          |
|  2 | DEPENDENT SUBQUERY | GPS       | ref   | PRIMARY       | PRIMARY | 8       | func                     |  498 | Using where; Using index |
+----+--------------------+-----------+-------+---------------+---------+---------+--------------------------+------+--------------------------+

It looks pretty innocent to me …

Well, the number of rows is pretty large.

300 * 500 * 500 is 75,000,000, which is quite a number. The real number may be much smaller due to the "Using where", but it can still be significant.

You can profile your query and see how many rows each part really needs to examine.
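
One simple way is with the handler status counters:

FLUSH STATUS;
-- run the query under test here
SHOW SESSION STATUS LIKE 'Handler_read%';

Handler_read_next counts rows read in index order and Handler_read_rnd_next counts rows read by table scan, so comparing the counters before and after the query shows how many rows each access path really touched.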