Does the replica have the same hardware as the primary?
Did you monitor system resources to find out whether the bottleneck is CPU, disk, or concurrency, or whether the replica is swapping?
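To check the swapping question, one rough approach is to compare two snapshots of the kernel swap counters. The sketch below assumes the replica runs on Linux, where `/proc/vmstat` exposes the cumulative `pswpin` and `pswpout` counters; the function names and the 5-second interval are only illustrative.

```python
import time


def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_delta(before, after):
    """Pages swapped in and out between two snapshots (text of /proc/vmstat)."""
    b, a = parse_vmstat(before), parse_vmstat(after)
    return {key: a.get(key, 0) - b.get(key, 0) for key in ("pswpin", "pswpout")}


def is_swapping(before, after):
    """True if any page was swapped in or out between the two snapshots."""
    return any(value > 0 for value in swap_delta(before, after).values())


def sample_host(interval_s=5.0, path="/proc/vmstat"):
    """Take two snapshots on the local host, interval_s seconds apart.

    Assumes a Linux host; the path and interval are illustrative defaults.
    """
    with open(path) as f:
        first = f.read()
    time.sleep(interval_s)
    with open(path) as f:
        second = f.read()
    return swap_delta(first, second)
```

Running `sample_host()` on the replica while the lag is growing should show whether pages are actually moving to or from swap. Non-zero deltas during that window suggest memory pressure is worth investigating before anything else.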
If SBM (Seconds_Behind_Master) keeps increasing, there are a few possible causes: the system resources may not be enough for the workload; the workload may not be parallelizable (e.g. a hot table receives most of the writes); or there may be long-running or frequent DDLs that can execute in parallel on the primary, whereas the replica has to wait for each DDL to finish before it can continue applying relay logs.
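To get a feel for the hot-table case, you can look at how writes are distributed across tables. The sketch below assumes you have already collected per-table write counts into a dict (for example from `COUNT_WRITE` in `performance_schema.table_io_waits_summary_by_table`, if it is enabled on your server); the helper names and the sample numbers are made up for illustration.

```python
def write_share(write_counts):
    """Return each table's fraction of the total writes."""
    total = sum(write_counts.values())
    if total == 0:
        return {}
    return {table: count / total for table, count in write_counts.items()}


def hottest_table(write_counts):
    """Return (table, share) for the table receiving the most writes, or None."""
    shares = write_share(write_counts)
    if not shares:
        return None
    table = max(shares, key=shares.get)
    return table, shares[table]


# Hypothetical counts, not taken from a real server.
sample_counts = {"app.orders": 900, "app.users": 50, "app.audit": 50}
table, share = hottest_table(sample_counts)
print(f"{table} receives {share:.0%} of writes")
```

If one table takes the large majority of writes, adding more parallel workers is unlikely to help much, since transactions touching the same rows still have to be applied in order.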
That said, it seems you have already configured parallel replication correctly, so the lag is most likely caused by system resources, other configuration, or the workload itself, and it requires further analysis.