Project / Component
Product: Percona Server for MySQL
Version: 8.4.8-8
Component: SQL / Optimizer / GROUP_CONCAT / Derived table materialization
Severity: Major or Critical
Type: Bug
Description
We observe a wrong query result in Percona Server for MySQL 8.4.8-8 when group_concat_max_len is set to specific values.
The issue is reproducible with:
group_concat_max_len = 268435456 -- 2^28
group_concat_max_len = 536870912 -- 2^29
group_concat_max_len = 805306368 -- 3 * 2^28
For nearby values such as 268435455, 268435457, and other tested values, the query returns the expected result.
The query uses SUBSTRING_INDEX(GROUP_CONCAT(...), ',', 1) inside a derived table which is then joined by an outer query. The direct GROUP_CONCAT() result is correct, but the SUBSTRING_INDEX(GROUP_CONCAT(...)) expression returns an empty string.
This appears to be a wrong-result bug related to the internal result metadata or materialization of GROUP_CONCAT() when group_concat_max_len has specific boundary-like values.
Environment
Server version:
8.4.8-8
Version comment:
Percona Server (GPL), Release '8', Revision '1c288264'
Architecture:
x86_64
OS:
Distributor ID: Debian
Description: Debian GNU/Linux 13 (trixie)
Release: 13
Codename: trixie
Storage engine:
InnoDB
binlog_format:
ROW
transaction_isolation:
REPEATABLE-READ
character_set_connection:
utf8mb4
collation_connection:
utf8mb4_general_ci
Additional observed environment details:
The issue was reproduced on a live Percona Server 8.4.8-8 system.
A Docker-based Percona Server 8.4.8-8 comparison instance returns correct results unless the affected group_concat_max_len values are used.
The issue is not explained by:
- CPU architecture: amd64 Docker also returns correct results with non-affected values.
- optimizer_switch: identical between affected and non-affected systems.
- internal_tmp_mem_storage_engine: MEMORY vs TempTable was tested and excluded as the root cause.
- character set / collation differences: not sufficient to explain the behavior.
Steps to Reproduce
Run the following SQL on Percona Server for MySQL 8.4.8-8.
Case 1: affected value 2^28
SET SESSION group_concat_max_len = 268435456;
SELECT
b.id,
f1.calculated_arrival AS calc_arrival,
f1.concated_arrival AS con_arr,
CONCAT('[', f1.calculated_arrival, ']') AS zfh,
LENGTH(f1.calculated_arrival) AS len_calc,
HEX(f1.calculated_arrival) AS hex_calc,
LENGTH(f1.concated_arrival) AS len_con,
HEX(f1.concated_arrival) AS hex_con
FROM (SELECT 123 AS id) b
LEFT JOIN (
SELECT
fn1.id AS id,
SUBSTRING_INDEX(
GROUP_CONCAT(fn1.arrivalairport ORDER BY fn1.segment ASC),
',',
1
) AS calculated_arrival,
GROUP_CONCAT(fn1.arrivalairport ORDER BY fn1.segment ASC) AS concated_arrival
FROM (
SELECT 'ABC' AS arrivalairport, 1 AS segment, 123 AS id
) AS fn1
GROUP BY fn1.id
) f1 ON b.id = f1.id;
Case 2: affected value 2^29
SET SESSION group_concat_max_len = 536870912;
SELECT
b.id,
f1.calculated_arrival AS calc_arrival,
f1.concated_arrival AS con_arr,
CONCAT('[', f1.calculated_arrival, ']') AS zfh,
LENGTH(f1.calculated_arrival) AS len_calc,
HEX(f1.calculated_arrival) AS hex_calc,
LENGTH(f1.concated_arrival) AS len_con,
HEX(f1.concated_arrival) AS hex_con
FROM (SELECT 123 AS id) b
LEFT JOIN (
SELECT
fn1.id AS id,
SUBSTRING_INDEX(
GROUP_CONCAT(fn1.arrivalairport ORDER BY fn1.segment ASC),
',',
1
) AS calculated_arrival,
GROUP_CONCAT(fn1.arrivalairport ORDER BY fn1.segment ASC) AS concated_arrival
FROM (
SELECT 'ABC' AS arrivalairport, 1 AS segment, 123 AS id
) AS fn1
GROUP BY fn1.id
) f1 ON b.id = f1.id;
Case 3: affected value 3 * 2^28
SET SESSION group_concat_max_len = 805306368;
SELECT
b.id,
f1.calculated_arrival AS calc_arrival,
f1.concated_arrival AS con_arr,
CONCAT('[', f1.calculated_arrival, ']') AS zfh,
LENGTH(f1.calculated_arrival) AS len_calc,
HEX(f1.calculated_arrival) AS hex_calc,
LENGTH(f1.concated_arrival) AS len_con,
HEX(f1.concated_arrival) AS hex_con
FROM (SELECT 123 AS id) b
LEFT JOIN (
SELECT
fn1.id AS id,
SUBSTRING_INDEX(
GROUP_CONCAT(fn1.arrivalairport ORDER BY fn1.segment ASC),
',',
1
) AS calculated_arrival,
GROUP_CONCAT(fn1.arrivalairport ORDER BY fn1.segment ASC) AS concated_arrival
FROM (
SELECT 'ABC' AS arrivalairport, 1 AS segment, 123 AS id
) AS fn1
GROUP BY fn1.id
) f1 ON b.id = f1.id;
Expected Result
For all group_concat_max_len values, the result should be:
id: 123
calc_arrival: ABC
con_arr: ABC
zfh: [ABC]
len_calc: 3
hex_calc: 414243
len_con: 3
hex_con: 414243
Reasoning:
GROUP_CONCAT('ABC') = 'ABC'
SUBSTRING_INDEX('ABC', ',', 1) = 'ABC'
Therefore, SUBSTRING_INDEX(GROUP_CONCAT(...), ',', 1) should return ABC.
Actual Result
For the affected values:
group_concat_max_len = 268435456
group_concat_max_len = 536870912
group_concat_max_len = 805306368
the observed result is:
id: 123
calc_arrival: ''
con_arr: ABC
zfh: []
len_calc: 0
hex_calc: ''
len_con: 3
hex_con: 414243
The direct GROUP_CONCAT() output is correct:
con_arr = ABC
but the nested expression returns an empty string:
SUBSTRING_INDEX(GROUP_CONCAT(...), ',', 1) = ''
This is inconsistent and appears to be a wrong result.
Control Tests
The same query returns correct results for nearby values:
SET SESSION group_concat_max_len = 268435455;
-- result is correct
SET SESSION group_concat_max_len = 268435457;
-- result is correct
SET SESSION group_concat_max_len = 536870911;
-- result is correct
SET SESSION group_concat_max_len = 536870913;
-- result is correct
Known affected values:
268435456 = 2^28
536870912 = 2^29
805306368 = 3 * 2^28
Optimizer Trace Notes
Optimizer traces from affected and non-affected environments show the same logical transformation. The derived tables are materialized.
Relevant trace observations from comparison:
The derived table f1 is materialized in both cases.
The expanded query is identical.
The wrong result occurs despite:
- identical optimizer_switch
- same Percona Server version
- same x86_64 architecture in amd64 Docker comparison
Example trace excerpt:
"derived": {
"table": " `f1`",
"select#": 3,
"materialized": true
}
The issue appears to happen during or after evaluation/materialization of the derived table containing:
SUBSTRING_INDEX(GROUP_CONCAT(...), ',', 1) AS calculated_arrival,
GROUP_CONCAT(...) AS concated_arrival
Workaround
Avoid affected group_concat_max_len values. For example, instead of:
SET GLOBAL group_concat_max_len = 268435456;
use:
SET GLOBAL group_concat_max_len = 268435455;
This returns the expected result even when the original query fails.
Additional Diagnostic SQL
@@version: 8.4.8-8
@@version_comment: Percona Server (GPL), Release '8', Revision '1c288264'
@@version_compile_machine: x86_64
@@version_compile_os: Linux
@@sql_mode: STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
@@optimizer_switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,use_index_extensions=on,condition_fanout_filter=on,derived_merge=on,use_invisible_indexes=off,skip_scan=on,hash_join=on,subquery_to_derived=off,prefer_ordering_index=on,hypergraph_optimizer=off,derived_condition_pushdown=on,hash_set_operations=on,favor_range_scan=off
@@group_concat_max_len: 268435456
@@character_set_server: utf8mb4
@@collation_server: utf8mb4_general_ci
@@character_set_database: utf8mb4
@@collation_database: utf8mb4_general_ci
@@character_set_connection: utf8mb4
@@collation_connection: utf8mb4_general_ci
@@character_set_results: utf8mb4
@@internal_tmp_mem_storage_engine: MEMORY
@@tmp_table_size: 134217728
@@max_heap_table_size: 134217728
@@sort_buffer_size: 4194304
@@join_buffer_size: 262144
@@read_buffer_size: 2097152
@@read_rnd_buffer_size: 262144
@@binlog_format: ROW
@@transaction_isolation: REPEATABLE-READ
@@default_storage_engine: InnoDB
@@lower_case_table_names: 0
Optimizer trace:
TRACE: {
"steps": [
{
"join_preparation": {
"select#": 1,
"steps": [
{
"join_preparation": {
"select#": 2,
"steps": [
{
"expanded_query": "/* select#2 */ select 123 AS `id`"
}
]
}
},
{
"derived": {
"table": " `b`",
"select#": 2,
"materialized": true
}
},
{
"join_preparation": {
"select#": 3,
"steps": [
{
"join_preparation": {
"select#": 4,
"steps": [
{
"expanded_query": "/* select#4 */ select 'ABC' AS `arrivalairport`,1 AS `segment`,123 AS `id`"
}
]
}
},
{
"derived": {
"table": " `fn1`",
"select#": 4,
"materialized": true
}
},
{
"expanded_query": "/* select#3 */ select `fn1`.`id` AS `id`,substring_index(group_concat(`fn1`.`arrivalairport` order by `fn1`.`segment` ASC separator ','),',',1) AS `calculated_arrival`,group_concat(`fn1`.`arrivalairport` order by `fn1`.`segment` ASC separator ',') AS `concated_arrival` from (/* select#4 */ select 'ABC' AS `arrivalairport`,1 AS `segment`,123 AS `id`) `fn1` group by `fn1`.`id`"
}
]
}
},
{
"derived": {
"table": " `f1`",
"select#": 3,
"materialized": true
}
},
{
"expanded_query": "/* select#1 */ select `b`.`id` AS `id`,`f1`.`calculated_arrival` AS `calc_arrival`,`f1`.`concated_arrival` AS `con_arr`,concat('[',`f1`.`calculated_arrival`,']') AS `zfh`,length(`f1`.`calculated_arrival`) AS `len_calc`,hex(`f1`.`calculated_arrival`) AS `hex_calc`,length(`f1`.`concated_arrival`) AS `len_con`,hex(`f1`.`concated_arrival`) AS `hex_con` from ((/* select#2 */ select 123 AS `id`) `b` left join (/* select#3 */ select `fn1`.`id` AS `id`,substring_index(group_concat(`fn1`.`arrivalairport` order by `fn1`.`segment` ASC separator ','),',',1) AS `calculated_arrival`,group_concat(`fn1`.`arrivalairport` order by `fn1`.`segment` ASC separator ',') AS `concated_arrival` from (/* select#4 */ select 'ABC' AS `arrivalairport`,1 AS `segment`,123 AS `id`) `fn1` group by `fn1`.`id`) `f1` on((`b`.`id` = `f1`.`id`)))"
},
{
"transformations_to_nested_joins": {
"transformations": [
"parenthesis_removal"
],
"expanded_query": "/* select#1 */ select `b`.`id` AS `id`,`f1`.`calculated_arrival` AS `calc_arrival`,`f1`.`concated_arrival` AS `con_arr`,concat('[',`f1`.`calculated_arrival`,']') AS `zfh`,length(`f1`.`calculated_arrival`) AS `len_calc`,hex(`f1`.`calculated_arrival`) AS `hex_calc`,length(`f1`.`concated_arrival`) AS `len_con`,hex(`f1`.`concated_arrival`) AS `hex_con` from (/* select#2 */ select 123 AS `id`) `b` left join (/* select#3 */ select `fn1`.`id` AS `id`,substring_index(group_concat(`fn1`.`arrivalairport` order by `fn1`.`segment` ASC separator ','),',',1) AS `calculated_arrival`,group_concat(`fn1`.`arrivalairport` order by `fn1`.`segment` ASC separator ',') AS `concated_arrival` from (/* select#4 */ select 'ABC' AS `arrivalairport`,1 AS `segment`,123 AS `id`) `fn1` group by `fn1`.`id`) `f1` on((`b`.`id` = `f1`.`id`))"
}
}
]
}
},
{
"join_optimization": {
"select#": 1,
"steps": [
{
"join_optimization": {
"select#": 2,
"steps": [
]
}
},
{
"creating_tmp_table": {
"tmp_table_info": {
"table": "b",
"columns": 1,
"row_length": 5,
"key_length": 0,
"unique_constraint": false,
"makes_grouped_rows": false,
"cannot_insert_duplicates": false,
"location": "memory (heap)",
"row_limit_estimate": 26843545
}
}
},
{
"join_execution": {
"select#": 2,
"steps": [
]
}
},
{
"join_optimization": {
"select#": 3,
"steps": [
{
"join_optimization": {
"select#": 4,
"steps": [
]
}
},
{
"creating_tmp_table": {
"tmp_table_info": {
"table": "fn1",
"columns": 3,
"row_length": 22,
"key_length": 0,
"unique_constraint": false,
"makes_grouped_rows": false,
"cannot_insert_duplicates": false,
"location": "memory (heap)",
"row_limit_estimate": 6100805
}
}
},
{
"join_execution": {
"select#": 4,
"steps": [
]
}
},
{
"substitute_generated_columns": {
}
},
{
"table_dependencies": [
{
"table": " `fn1`",
"row_may_be_null": false,
"map_bit": 0,
"depends_on_map_bits": [
]
}
]
},
{
"rows_estimation": [
{
"table": " `fn1`",
"rows": 1,
"cost": 1,
"table_type": "system",
"empty": false
}
]
},
{
"attaching_conditions_to_tables": {
"original_condition": null,
"attached_conditions_computation": [
],
"attached_conditions_summary": [
]
}
},
{
"optimizing_distinct_group_by_order_by": {
}
},
{
"refine_plan": [
]
}
]
}
},
{
"creating_tmp_table": {
"tmp_table_info": {
"table": "f1",
"columns": 3,
"row_length": 18,
"key_length": 0,
"unique_constraint": false,
"makes_grouped_rows": false,
"cannot_insert_duplicates": false,
"location": "disk (InnoDB)",
"record_format": "packed"
}
}
},
{
"join_execution": {
"select#": 3,
"steps": [
]
}
},
{
"condition_processing": {
"condition": "WHERE",
"original_condition": null,
"steps": [
{
"transformation": "equality_propagation",
"resulting_condition": null
}
]
}
},
{
"table_dependencies": [
{
"table": " `b`",
"row_may_be_null": false,
"map_bit": 0,
"depends_on_map_bits": [
]
},
{
"table": " `f1`",
"row_may_be_null": true,
"map_bit": 1,
"depends_on_map_bits": [
0
]
}
]
},
{
"ref_optimizer_key_uses": [
]
},
{
"rows_estimation": [
{
"table": " `b`",
"rows": 1,
"cost": 1,
"table_type": "system",
"empty": false
},
{
"table": " `f1`",
"rows": 1,
"cost": 1,
"table_type": "system",
"empty": false
}
]
},
{
"condition_on_constant_tables": "true",
"condition_value": true
},
{
"attaching_conditions_to_tables": {
"original_condition": "true",
"attached_conditions_computation": [
],
"attached_conditions_summary": [
]
}
},
{
"refine_plan": [
]
}
]
}
},
{
"join_execution": {
"select#": 1,
"steps": [
]
}
}
]
}
Suggested Impact Statement
This is a wrong-result issue. The query completes successfully and returns no error or warning, but one projected expression returns an incorrect empty string while another expression based on the same GROUP_CONCAT() value returns the expected value.
This can lead to silent data corruption at the application level when group_concat_max_len is configured to affected values.
THX and Greetings
Thomas