I have a table with datestamped info (DATETIME type) and some tinyint and smallint columns. I want to calculate averages and sums over any date interval the user in my application wants.
I have tried many ways but none seem to work properly. Has anybody done this before and has a definitive solution?
The things I have tried:
- Convert to unix time:
SELECT … FROM … GROUP BY FLOOR(UNIX_TIMESTAMP(TimeStamp)/86400)
Where 86400 is the number of seconds in the grouping period (e.g. if you want to group by hour, then it is 60 * 60, for grouping by 2 hours it is 2 * 60 * 60 ).
There are problems with this approach although I don’t remember the exact details I’m afraid.
- Group by year, month, day, hour separately:
SELECT ZoneId, GROUP_CONCAT(DATE_FORMAT(timestamp, ‘%Y%m%d-%H:%i:%s’)),STR_TO_DATE( CONCAT( DATE_FORMAT(timeStamp, ‘%Y’),DATE_FORMAT(timeStamp, ‘%m’), DATE_FORMAT(timeStamp, ‘%d’), (FLOOR(DATE_FORMAT(timeStamp, ‘%H’) / 2) ) * 2) ,‘%Y%m%d%H’) as CalculatedTimeStamp, sum(Length*(nVehicles1+nVehicles2+nVehicles3+nVehicles4+nVehicles5))/sum(nVehicles1+nVehicles2+nVehicles3+nVehicles4+nVehicles5), sum(nVehicles1), sum(nVehicles2), sum(nVehicles3), sum(nVehicles4), sum(nVehicles5) FROM tzonedata WHERE TimeStamp >= “2008-05-09 11:28:09” AND TimeStamp <= “2008-06-08 11:28:00” AND ( zoneid=1 OR zoneid=2 OR zoneid=3 OR zoneid=4 OR zoneid=5 OR zoneid=6 OR zoneid=7 OR zoneid=8) GROUP BY DATE_FORMAT(timeStamp, ‘%Y’),DATE_FORMAT(timeStamp, ‘%m’), DATE_FORMAT(timeStamp, ‘%d’), FLOOR(DATE_FORMAT(timeStamp, ‘%H’) / 2), zoneid ORDER BY CalculatedTimeStamp LIMIT 0, 5000
This example groups per 2 hours. Notice the ‘CalculatedTimeStamp’, this calculation makes it so that each returned row starts at the start of an hour. The returned resultset will contains times like this:
0:00
2:00
4:00
6:00
…
BUT:
→ It does not work when you have summer and wintertime! If you group by hour:
SELECT ZoneId, GROUP_CONCAT(DATE_FORMAT(timeStamp, ‘%H’)), STR_TO_DATE( CONCAT( DATE_FORMAT(timeStamp, ‘%Y’),DATE_FORMAT(timeStamp, ‘%m’), DATE_FORMAT(timeStamp, ‘%d’), DATE_FORMAT(timeStamp, ‘%H’)) ,‘%Y%m%d%H’) as CalculatedTimeStamp, sum(nVehicles1), sum(nVehicles2), sum(nVehicles3), sum(nVehicles4), sum(nVehicles5) FROM tzonedata WHERE TimeStamp >= “2008-03-30 0:01:00” AND TimeStamp <= “2008-03-30 14:01:00” AND ( zoneid=1) GROUP BY DATE_FORMAT(timeStamp, ‘%Y’),DATE_FORMAT(timeStamp, ‘%m’), DATE_FORMAT(timeStamp, ‘%d’), DATE_FORMAT(timeStamp, ‘%H’), zoneid ORDER BY CalculatedTimeStamp LIMIT 0, 5000
Then you get the following result:
The group by happens correctly, the GROUP_CONCAT column shows:
00,00,00,00
01,01,01,01
02,02,02,02
03,03,03,03
However, the calculated timestamp contains 3:00 2 times!
30-mrt-2008 00:00:00
30-mrt-2008 01:00:00
30-mrt-2008 03:00:00
30-mrt-2008 03:00:00
This is due to the fact that at that time we switch from winter to summertime.
Another problem with this approarch is that if the user selects to group by 7 days, it always gets truncated at the month. E.g.:
1 jan → 7 jan
8 jan → 16 jan
17 jan → 23 jan
23 jan → 30 jan
31 jan → 1 feb !!!
That is ofcourse because I group by year first, then by month and then by day.
Any help would be greatly appriciated, I already spend 3 days trying to figure out the correct way to do this.
regards,
Wim