 # Group by any date interval

I have a table with datestamped info (DATETIME type) and some tinyint and smallint columns. I want to calculate averages and sums over any date interval the user in my application wants.

I have tried many ways but none seem to work properly. Has anybody done this before and has a definitive solution?

The things I have tried:

1. Convert to unix time:

SELECT … FROM … GROUP BY FLOOR(UNIX_TIMESTAMP(TimeStamp)/86400)

Where 86400 is the number of seconds in the grouping period (e.g. if you want to group by hour, then it is 60 * 60, for grouping by 2 hours it is 2 * 60 * 60 ).

There are problems with this approach although I don’t remember the exact details I’m afraid.

1. Group by year, month, day, hour separately:

SELECT ZoneId, GROUP_CONCAT(DATE_FORMAT(timestamp, ‘%Y%m%d-%H:%i:%s’)),STR_TO_DATE( CONCAT( DATE_FORMAT(timeStamp, ‘%Y’),DATE_FORMAT(timeStamp, ‘%m’), DATE_FORMAT(timeStamp, ‘%d’), (FLOOR(DATE_FORMAT(timeStamp, ‘%H’) / 2) ) * 2) ,’%Y%m%d%H’) as CalculatedTimeStamp, sum(Length*(nVehicles1+nVehicles2+nVehicles3+nVehicles4+nVehicles5))/sum(nVehicles1+nVehicles2+nVehicles3+nVehicles4+nVehicles5), sum(nVehicles1), sum(nVehicles2), sum(nVehicles3), sum(nVehicles4), sum(nVehicles5) FROM tzonedata WHERE TimeStamp >= “2008-05-09 11:28:09” AND TimeStamp <= “2008-06-08 11:28:00” AND ( zoneid=1 OR zoneid=2 OR zoneid=3 OR zoneid=4 OR zoneid=5 OR zoneid=6 OR zoneid=7 OR zoneid=8) GROUP BY DATE_FORMAT(timeStamp, ‘%Y’),DATE_FORMAT(timeStamp, ‘%m’), DATE_FORMAT(timeStamp, ‘%d’), FLOOR(DATE_FORMAT(timeStamp, ‘%H’) / 2), zoneid ORDER BY CalculatedTimeStamp LIMIT 0, 5000

This example groups per 2 hours. Notice the ‘CalculatedTimeStamp’, this calculation makes it so that each returned row starts at the start of an hour. The returned resultset will contains times like this:

0:00
2:00
4:00
6:00

BUT:
-> It does not work when you have summer and wintertime! If you group by hour:

SELECT ZoneId, GROUP_CONCAT(DATE_FORMAT(timeStamp, ‘%H’)), STR_TO_DATE( CONCAT( DATE_FORMAT(timeStamp, ‘%Y’),DATE_FORMAT(timeStamp, ‘%m’), DATE_FORMAT(timeStamp, ‘%d’), DATE_FORMAT(timeStamp, ‘%H’)) ,’%Y%m%d%H’) as CalculatedTimeStamp, sum(nVehicles1), sum(nVehicles2), sum(nVehicles3), sum(nVehicles4), sum(nVehicles5) FROM tzonedata WHERE TimeStamp >= “2008-03-30 0:01:00” AND TimeStamp <= “2008-03-30 14:01:00” AND ( zoneid=1) GROUP BY DATE_FORMAT(timeStamp, ‘%Y’),DATE_FORMAT(timeStamp, ‘%m’), DATE_FORMAT(timeStamp, ‘%d’), DATE_FORMAT(timeStamp, ‘%H’), zoneid ORDER BY CalculatedTimeStamp LIMIT 0, 5000

Then you get the following result:
The group by happens correctly, the GROUP_CONCAT column shows:
00,00,00,00
01,01,01,01
02,02,02,02
03,03,03,03

However, the calculated timestamp contains 3:00 2 times!
30-mrt-2008 00:00:00
30-mrt-2008 01:00:00
30-mrt-2008 03:00:00
30-mrt-2008 03:00:00

This is due to the fact that at that time we switch from winter to summertime.

Another problem with this approarch is that if the user selects to group by 7 days, it always gets truncated at the month. E.g.:
1 jan -> 7 jan
8 jan -> 16 jan
17 jan -> 23 jan
23 jan -> 30 jan
31 jan -> 1 feb !!!

That is ofcourse because I group by year first, then by month and then by day.

Any help would be greatly appriciated, I already spend 3 days trying to figure out the correct way to do this.

regards,

Wim