Wrong result with SUBSTRING_INDEX(GROUP_CONCAT(...)) when group_concat_max_len is 2^28, 2^29 or 3*2^28 in Percona Server 8.4.8-8

Project / Component

Product: Percona Server for MySQL
Version: 8.4.8-8
Component: SQL / Optimizer / GROUP_CONCAT / Derived table materialization
Severity: Major or Critical
Type: Bug

Description

We observe a wrong query result in Percona Server for MySQL 8.4.8-8 when group_concat_max_len is set to specific values.

The issue is reproducible with:

group_concat_max_len = 268435456  -- 2^28
group_concat_max_len = 536870912  -- 2^29
group_concat_max_len = 805306368  -- 3 * 2^28

For nearby values such as 268435455, 268435457, and other tested values, the query returns the expected result.

The query uses SUBSTRING_INDEX(GROUP_CONCAT(...), ',', 1) inside a derived table which is then joined by an outer query. The direct GROUP_CONCAT() result is correct, but the SUBSTRING_INDEX(GROUP_CONCAT(...)) expression returns an empty string.

This appears to be a wrong-result bug related to the internal result metadata or materialization of GROUP_CONCAT() when group_concat_max_len has specific boundary-like values.

Environment

Server version:
8.4.8-8

Version comment:
Percona Server (GPL), Release '8', Revision '1c288264'

Architecture:
x86_64

OS:
Distributor ID:	Debian
Description:	Debian GNU/Linux 13 (trixie)
Release:	13
Codename:	trixie

Storage engine:
InnoDB

binlog_format:
ROW

transaction_isolation:
REPEATABLE-READ

character_set_connection:
utf8mb4

collation_connection:
utf8mb4_general_ci

Additional observed environment details:

The issue was reproduced on a live Percona Server 8.4.8-8 system.

A Docker-based Percona Server 8.4.8-8 comparison instance returns correct results unless the affected group_concat_max_len values are used.

The issue is not explained by:
- CPU architecture: amd64 Docker also returns correct results with non-affected values.
- optimizer_switch: identical between affected and non-affected systems.
- internal_tmp_mem_storage_engine: MEMORY vs TempTable was tested and excluded as the root cause.
- character set / collation differences: not sufficient to explain the behavior.

Steps to Reproduce

Run the following SQL on Percona Server for MySQL 8.4.8-8.

Case 1: affected value 2^28

SET SESSION group_concat_max_len = 268435456;

SELECT
    b.id,
    f1.calculated_arrival AS calc_arrival,
    f1.concated_arrival AS con_arr,
    CONCAT('[', f1.calculated_arrival, ']') AS zfh,
    LENGTH(f1.calculated_arrival) AS len_calc,
    HEX(f1.calculated_arrival) AS hex_calc,
    LENGTH(f1.concated_arrival) AS len_con,
    HEX(f1.concated_arrival) AS hex_con
FROM (SELECT 123 AS id) b
LEFT JOIN (
    SELECT
        fn1.id AS id,
        SUBSTRING_INDEX(
            GROUP_CONCAT(fn1.arrivalairport ORDER BY fn1.segment ASC),
            ',',
            1
        ) AS calculated_arrival,
        GROUP_CONCAT(fn1.arrivalairport ORDER BY fn1.segment ASC) AS concated_arrival
    FROM (
        SELECT 'ABC' AS arrivalairport, 1 AS segment, 123 AS id
    ) AS fn1
    GROUP BY fn1.id
) f1 ON b.id = f1.id;

Case 2: affected value 2^29

SET SESSION group_concat_max_len = 536870912;

SELECT
    b.id,
    f1.calculated_arrival AS calc_arrival,
    f1.concated_arrival AS con_arr,
    CONCAT('[', f1.calculated_arrival, ']') AS zfh,
    LENGTH(f1.calculated_arrival) AS len_calc,
    HEX(f1.calculated_arrival) AS hex_calc,
    LENGTH(f1.concated_arrival) AS len_con,
    HEX(f1.concated_arrival) AS hex_con
FROM (SELECT 123 AS id) b
LEFT JOIN (
    SELECT
        fn1.id AS id,
        SUBSTRING_INDEX(
            GROUP_CONCAT(fn1.arrivalairport ORDER BY fn1.segment ASC),
            ',',
            1
        ) AS calculated_arrival,
        GROUP_CONCAT(fn1.arrivalairport ORDER BY fn1.segment ASC) AS concated_arrival
    FROM (
        SELECT 'ABC' AS arrivalairport, 1 AS segment, 123 AS id
    ) AS fn1
    GROUP BY fn1.id
) f1 ON b.id = f1.id;

Case 3: affected value 3 * 2^28

SET SESSION group_concat_max_len = 805306368;

SELECT
    b.id,
    f1.calculated_arrival AS calc_arrival,
    f1.concated_arrival AS con_arr,
    CONCAT('[', f1.calculated_arrival, ']') AS zfh,
    LENGTH(f1.calculated_arrival) AS len_calc,
    HEX(f1.calculated_arrival) AS hex_calc,
    LENGTH(f1.concated_arrival) AS len_con,
    HEX(f1.concated_arrival) AS hex_con
FROM (SELECT 123 AS id) b
LEFT JOIN (
    SELECT
        fn1.id AS id,
        SUBSTRING_INDEX(
            GROUP_CONCAT(fn1.arrivalairport ORDER BY fn1.segment ASC),
            ',',
            1
        ) AS calculated_arrival,
        GROUP_CONCAT(fn1.arrivalairport ORDER BY fn1.segment ASC) AS concated_arrival
    FROM (
        SELECT 'ABC' AS arrivalairport, 1 AS segment, 123 AS id
    ) AS fn1
    GROUP BY fn1.id
) f1 ON b.id = f1.id;

Expected Result

For all group_concat_max_len values, the result should be:

id:           123
calc_arrival: ABC
con_arr:      ABC
zfh:          [ABC]
len_calc:     3
hex_calc:     414243
len_con:      3
hex_con:      414243

Reasoning:

GROUP_CONCAT('ABC') = 'ABC'
SUBSTRING_INDEX('ABC', ',', 1) = 'ABC'

Therefore, SUBSTRING_INDEX(GROUP_CONCAT(...), ',', 1) should return ABC.

Actual Result

For the affected values:

group_concat_max_len = 268435456
group_concat_max_len = 536870912
group_concat_max_len = 805306368

the observed result is:

id:           123
calc_arrival: ''
con_arr:      ABC
zfh:          []
len_calc:     0
hex_calc:     ''
len_con:      3
hex_con:      414243

The direct GROUP_CONCAT() output is correct:

con_arr = ABC

but the nested expression returns an empty string:

SUBSTRING_INDEX(GROUP_CONCAT(...), ',', 1) = ''

This is inconsistent and appears to be a wrong result.

Control Tests

The same query returns correct results for nearby values:

SET SESSION group_concat_max_len = 268435455;
-- result is correct

SET SESSION group_concat_max_len = 268435457;
-- result is correct

SET SESSION group_concat_max_len = 536870911;
-- result is correct

SET SESSION group_concat_max_len = 536870913;
-- result is correct

Known affected values:

268435456 = 2^28
536870912 = 2^29
805306368 = 3 * 2^28

Optimizer Trace Notes

Optimizer traces from affected and non-affected environments show the same logical transformation. The derived tables are materialized.

Relevant trace observations from comparison:

The derived table f1 is materialized in both cases.

The expanded query is identical.

The wrong result occurs despite:
- identical optimizer_switch
- same Percona Server version
- same x86_64 architecture in amd64 Docker comparison

Example trace excerpt:

"derived": {
  "table": " `f1`",
  "select#": 3,
  "materialized": true
}

The issue appears to happen during or after evaluation/materialization of the derived table containing:

SUBSTRING_INDEX(GROUP_CONCAT(...), ',', 1) AS calculated_arrival,
GROUP_CONCAT(...) AS concated_arrival

Workaround

Avoid affected group_concat_max_len values. For example, instead of:

SET GLOBAL group_concat_max_len = 268435456;

use:

SET GLOBAL group_concat_max_len = 268435455;

This returns the expected result even when the original query fails.

Additional Diagnostic SQL

                        @@version: 8.4.8-8
                @@version_comment: Percona Server (GPL), Release '8', Revision '1c288264'
        @@version_compile_machine: x86_64
             @@version_compile_os: Linux
                       @@sql_mode: STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
               @@optimizer_switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,use_index_extensions=on,condition_fanout_filter=on,derived_merge=on,use_invisible_indexes=off,skip_scan=on,hash_join=on,subquery_to_derived=off,prefer_ordering_index=on,hypergraph_optimizer=off,derived_condition_pushdown=on,hash_set_operations=on,favor_range_scan=off
           @@group_concat_max_len: 268435456
           @@character_set_server: utf8mb4
               @@collation_server: utf8mb4_general_ci
         @@character_set_database: utf8mb4
             @@collation_database: utf8mb4_general_ci
       @@character_set_connection: utf8mb4
           @@collation_connection: utf8mb4_general_ci
          @@character_set_results: utf8mb4
@@internal_tmp_mem_storage_engine: MEMORY
                 @@tmp_table_size: 134217728
            @@max_heap_table_size: 134217728
               @@sort_buffer_size: 4194304
               @@join_buffer_size: 262144
               @@read_buffer_size: 2097152
           @@read_rnd_buffer_size: 262144
                  @@binlog_format: ROW
          @@transaction_isolation: REPEATABLE-READ
         @@default_storage_engine: InnoDB
         @@lower_case_table_names: 0

Optimizer trace:

TRACE: {
  "steps": [
    {
      "join_preparation": {
        "select#": 1,
        "steps": [
          {
            "join_preparation": {
              "select#": 2,
              "steps": [
                {
                  "expanded_query": "/* select#2 */ select 123 AS `id`"
                }
              ]
            }
          },
          {
            "derived": {
              "table": " `b`",
              "select#": 2,
              "materialized": true
            }
          },
          {
            "join_preparation": {
              "select#": 3,
              "steps": [
                {
                  "join_preparation": {
                    "select#": 4,
                    "steps": [
                      {
                        "expanded_query": "/* select#4 */ select 'ABC' AS `arrivalairport`,1 AS `segment`,123 AS `id`"
                      }
                    ]
                  }
                },
                {
                  "derived": {
                    "table": " `fn1`",
                    "select#": 4,
                    "materialized": true
                  }
                },
                {
                  "expanded_query": "/* select#3 */ select `fn1`.`id` AS `id`,substring_index(group_concat(`fn1`.`arrivalairport` order by `fn1`.`segment` ASC separator ','),',',1) AS `calculated_arrival`,group_concat(`fn1`.`arrivalairport` order by `fn1`.`segment` ASC separator ',') AS `concated_arrival` from (/* select#4 */ select 'ABC' AS `arrivalairport`,1 AS `segment`,123 AS `id`) `fn1` group by `fn1`.`id`"
                }
              ]
            }
          },
          {
            "derived": {
              "table": " `f1`",
              "select#": 3,
              "materialized": true
            }
          },
          {
            "expanded_query": "/* select#1 */ select `b`.`id` AS `id`,`f1`.`calculated_arrival` AS `calc_arrival`,`f1`.`concated_arrival` AS `con_arr`,concat('[',`f1`.`calculated_arrival`,']') AS `zfh`,length(`f1`.`calculated_arrival`) AS `len_calc`,hex(`f1`.`calculated_arrival`) AS `hex_calc`,length(`f1`.`concated_arrival`) AS `len_con`,hex(`f1`.`concated_arrival`) AS `hex_con` from ((/* select#2 */ select 123 AS `id`) `b` left join (/* select#3 */ select `fn1`.`id` AS `id`,substring_index(group_concat(`fn1`.`arrivalairport` order by `fn1`.`segment` ASC separator ','),',',1) AS `calculated_arrival`,group_concat(`fn1`.`arrivalairport` order by `fn1`.`segment` ASC separator ',') AS `concated_arrival` from (/* select#4 */ select 'ABC' AS `arrivalairport`,1 AS `segment`,123 AS `id`) `fn1` group by `fn1`.`id`) `f1` on((`b`.`id` = `f1`.`id`)))"
          },
          {
            "transformations_to_nested_joins": {
              "transformations": [
                "parenthesis_removal"
              ],
              "expanded_query": "/* select#1 */ select `b`.`id` AS `id`,`f1`.`calculated_arrival` AS `calc_arrival`,`f1`.`concated_arrival` AS `con_arr`,concat('[',`f1`.`calculated_arrival`,']') AS `zfh`,length(`f1`.`calculated_arrival`) AS `len_calc`,hex(`f1`.`calculated_arrival`) AS `hex_calc`,length(`f1`.`concated_arrival`) AS `len_con`,hex(`f1`.`concated_arrival`) AS `hex_con` from (/* select#2 */ select 123 AS `id`) `b` left join (/* select#3 */ select `fn1`.`id` AS `id`,substring_index(group_concat(`fn1`.`arrivalairport` order by `fn1`.`segment` ASC separator ','),',',1) AS `calculated_arrival`,group_concat(`fn1`.`arrivalairport` order by `fn1`.`segment` ASC separator ',') AS `concated_arrival` from (/* select#4 */ select 'ABC' AS `arrivalairport`,1 AS `segment`,123 AS `id`) `fn1` group by `fn1`.`id`) `f1` on((`b`.`id` = `f1`.`id`))"
            }
          }
        ]
      }
    },
    {
      "join_optimization": {
        "select#": 1,
        "steps": [
          {
            "join_optimization": {
              "select#": 2,
              "steps": [
              ]
            }
          },
          {
            "creating_tmp_table": {
              "tmp_table_info": {
                "table": "b",
                "columns": 1,
                "row_length": 5,
                "key_length": 0,
                "unique_constraint": false,
                "makes_grouped_rows": false,
                "cannot_insert_duplicates": false,
                "location": "memory (heap)",
                "row_limit_estimate": 26843545
              }
            }
          },
          {
            "join_execution": {
              "select#": 2,
              "steps": [
              ]
            }
          },
          {
            "join_optimization": {
              "select#": 3,
              "steps": [
                {
                  "join_optimization": {
                    "select#": 4,
                    "steps": [
                    ]
                  }
                },
                {
                  "creating_tmp_table": {
                    "tmp_table_info": {
                      "table": "fn1",
                      "columns": 3,
                      "row_length": 22,
                      "key_length": 0,
                      "unique_constraint": false,
                      "makes_grouped_rows": false,
                      "cannot_insert_duplicates": false,
                      "location": "memory (heap)",
                      "row_limit_estimate": 6100805
                    }
                  }
                },
                {
                  "join_execution": {
                    "select#": 4,
                    "steps": [
                    ]
                  }
                },
                {
                  "substitute_generated_columns": {
                  }
                },
                {
                  "table_dependencies": [
                    {
                      "table": " `fn1`",
                      "row_may_be_null": false,
                      "map_bit": 0,
                      "depends_on_map_bits": [
                      ]
                    }
                  ]
                },
                {
                  "rows_estimation": [
                    {
                      "table": " `fn1`",
                      "rows": 1,
                      "cost": 1,
                      "table_type": "system",
                      "empty": false
                    }
                  ]
                },
                {
                  "attaching_conditions_to_tables": {
                    "original_condition": null,
                    "attached_conditions_computation": [
                    ],
                    "attached_conditions_summary": [
                    ]
                  }
                },
                {
                  "optimizing_distinct_group_by_order_by": {
                  }
                },
                {
                  "refine_plan": [
                  ]
                }
              ]
            }
          },
          {
            "creating_tmp_table": {
              "tmp_table_info": {
                "table": "f1",
                "columns": 3,
                "row_length": 18,
                "key_length": 0,
                "unique_constraint": false,
                "makes_grouped_rows": false,
                "cannot_insert_duplicates": false,
                "location": "disk (InnoDB)",
                "record_format": "packed"
              }
            }
          },
          {
            "join_execution": {
              "select#": 3,
              "steps": [
              ]
            }
          },
          {
            "condition_processing": {
              "condition": "WHERE",
              "original_condition": null,
              "steps": [
                {
                  "transformation": "equality_propagation",
                  "resulting_condition": null
                }
              ]
            }
          },
          {
            "table_dependencies": [
              {
                "table": " `b`",
                "row_may_be_null": false,
                "map_bit": 0,
                "depends_on_map_bits": [
                ]
              },
              {
                "table": " `f1`",
                "row_may_be_null": true,
                "map_bit": 1,
                "depends_on_map_bits": [
                  0
                ]
              }
            ]
          },
          {
            "ref_optimizer_key_uses": [
            ]
          },
          {
            "rows_estimation": [
              {
                "table": " `b`",
                "rows": 1,
                "cost": 1,
                "table_type": "system",
                "empty": false
              },
              {
                "table": " `f1`",
                "rows": 1,
                "cost": 1,
                "table_type": "system",
                "empty": false
              }
            ]
          },
          {
            "condition_on_constant_tables": "true",
            "condition_value": true
          },
          {
            "attaching_conditions_to_tables": {
              "original_condition": "true",
              "attached_conditions_computation": [
              ],
              "attached_conditions_summary": [
              ]
            }
          },
          {
            "refine_plan": [
            ]
          }
        ]
      }
    },
    {
      "join_execution": {
        "select#": 1,
        "steps": [
        ]
      }
    }
  ]
}

Suggested Impact Statement

This is a wrong-result issue. The query completes successfully and returns no error or warning, but one projected expression returns an incorrect empty string while another expression based on the same GROUP_CONCAT() value returns the expected value.

This can lead to silent data corruption at the application level when group_concat_max_len is configured to affected values.

THX and Greetings
Thomas