Unexpected behaviour of GROUP_CONCAT

Hi,
I hope I am posting this question in the appropriate section and was wondering if you could help me.

I came accross the following HAVING clause recently:


“HAVING GROUP_CONCAT(DISTINCT(y)) NOT IN (a,b,c) GROUP BY pk”
-

which was not supposed (in my humble opinion) to filter out any returned results except where group_concat would only return a OR b OR c as a single value or in the worst case scenario it should throw an error.

The table being queried is something like this:

CREATE TABLE x (pk INT UNSIGNED NOT NULL,y INT UNSIGNED NOT NULL,KEY (pk));INSERT INTO xVALUES (1,106),(1,107),(1,119), (2,103),(2,107),(2,102),(2,119), (3,101),(3,102),(3,119), (4,119),(4,108),(4,110), (5,106),(5,110), (6,119);SELECT * FROM x;±—±----+| pk | y |±—±----+| 1 | 106 || 1 | 107 || 1 | 119 || 2 | 103 || 2 | 107 || 2 | 102 || 2 | 119 || 3 | 101 || 3 | 102 || 3 | 119 || 4 | 119 || 4 | 108 || 4 | 110 || 5 | 106 || 5 | 110 || 6 | 119 |±—±----+

I was surpprised to see that it would indeed work the following way:
-*- The last concatenated element will never be matched:

SELECT pk,CAST(GROUP_CONCAT(y) as char) FROM x GROUP BY pk HAVING GROUP_CONCAT(y) NOT IN (119);

always returns rows with 119** (which is quite logical I would say since [IN] does not work as [LIKE “%%”])

±—±------------------------------+| pk | CAST(GROUP_CONCAT(y) as char) |±—±------------------------------+| 1 | 106,107,119 || 2 | 103,107,102,119 || 3 | 101,102,119 || 5 | 106,110 |±—±------------------------------+

**except where pk 4 (119 is the first concatinated element) and 6 (is the only concatinated element)

-*- The first concatenated element is always matched(hence filtered out by ‘NOT IN’) and this is strange to me as IN is working the same as LIKE “106%” on the concatinated string:

SELECT pk,CAST(GROUP_CONCAT(y) as char) FROM x GROUP BY pk HAVING GROUP_CONCAT(y) NOT IN (106);±—±------------------------------+| pk | CAST(GROUP_CONCAT(y) as char) |±—±------------------------------+| 2 | 103,107,102,119 || 3 | 101,102,119 || 4 | 119,108,110 || 6 | 119 |±—±------------------------------+

-*- The full list of concatenated elements always matches in any order:

mysql> SELECT pk,CAST(GROUP_CONCAT(y) as char) FROM x GROUP BY pk HAVING GROUP_CONCAT(y) NOT IN (119,108,110);±—±------------------------------+| pk | CAST(GROUP_CONCAT(y) as char) |±—±------------------------------+| 1 | 106,107,119 || 2 | 103,107,102,119 || 3 | 101,102,119 || 5 | 106,110 |±—±------------------------------+4 rows in set (0.00 sec)mysql> SELECT pk,CAST(GROUP_CONCAT(y) as char) FROM x GROUP BY pk HAVING GROUP_CONCAT(y) NOT IN (119,110,108);±—±------------------------------+| pk | CAST(GROUP_CONCAT(y) as char) |±—±------------------------------+| 1 | 106,107,119 || 2 | 103,107,102,119 || 3 | 101,102,119 || 5 | 106,110 |±—±------------------------------+4 rows in set (0.00 sec)mysql> SELECT pk,CAST(GROUP_CONCAT(y) as char) FROM x GROUP BY pk HAVING GROUP_CONCAT(y) NOT IN (110,119,108);±—±------------------------------+| pk | CAST(GROUP_CONCAT(y) as char) |±—±------------------------------+| 1 | 106,107,119 || 2 | 103,107,102,119 || 3 | 101,102,119 || 5 | 106,110 |±—±------------------------------+4 rows in set (0.00 sec)mysql> SELECT pk,CAST(GROUP_CONCAT(y) as char) FROM x GROUP BY pk HAVING GROUP_CONCAT(y) NOT IN (108,119,110);±—±------------------------------+| pk | CAST(GROUP_CONCAT(y) as char) |±—±------------------------------+| 1 | 106,107,119 || 2 | 103,107,102,119 || 3 | 101,102,119 || 5 | 106,110 |±—±------------------------------+4 rows in set (0.00 sec)

Is this related to the specifics of GROUP_CONCAT or rather IN?
Is there something I am missing out or not understanding? (

I would not expect that GROUP_CONCAT(anything) would work well with IN.

The IN clause works in the opposite direction to determine if a value X is IN a list (X,Y,Z).

I believe this query solves your problem:

SELECT pk, GROUP_CONCAT(y) as foo
FROM x
GROUP BY pk
HAVING GROUP_CONCAT(y) not like ‘%107%’;

Problem solved!!! )

Basically the reason fot this unexpected results returned by the clause working in some cases is the Implicit Casting (lol) when every element which is in the IN() is casted as a string and then compared with another string (the concatinated one). In this case concatenated string is truncated to the length of the string being compared to. This sort of comparison will return true as long as the first character in the concatenated string is present in the IN()

if you do SELECT “106,110,111” IN ( 111, 106, 110 ) is the same as doing
SELECT “106,110,111” = 111 OR
“106,110,111” = 106 OR
“106,110,111” = 110

the “106,110,111” = 106 returns true and hence the pattern has been matched.

Bosiwel thanks for your reply - I have re-written the clause as it does not make sense at all, but posted this comment since was puzzled why the heck it still worked in some cases…now i know )