Galera Cluster hung state with wsrep: preparing for TO isolation

Hi,
I have a backend database for orchestrator running on a Galera cluster. When wsrep_OSU_method is TOI, the whole cluster goes into a hung state, with all connections waiting for the DDL to complete.

When I change wsrep_OSU_method to NBO, I get errors from orchestrator whenever I run any operation:
ERROR Error 1235 (42000): This version of MySQL doesnt yet support this query in wsrep_OSU_method NBO

Please help, as this is happening too frequently now.

This is correct and expected behavior.

What operation are you trying to do? You should not be making any topology changes or other DDLs while an existing DDL is running.
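To see which statement is holding things up, something along these lines on each node might help. This is only a sketch: the exact state text can differ between versions, so treat the LIKE pattern as an assumption and adjust it to what your processlist actually shows.

```sql
-- Sessions currently in a wsrep-related state, longest-running first.
-- The 'wsrep%' pattern is a guess based on the state in the thread title.
SELECT id, user, host, db, time, state, info
FROM information_schema.processlist
WHERE state LIKE 'wsrep%'
ORDER BY time DESC;

-- Confirm the node itself still reports as Synced.
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';
```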

It appears to be MySQL orchestrator trying to run
"create database if not exists orchestrator". This statement is run often.

Are there any alternatives to avoid this behavior, @matthewb?

NBO does not support “CREATE DATABASE …”

You can change wsrep_OSU_method on a per-session basis, so keep the global setting at TOI and switch only the session that runs your long-running DDL to NBO.
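As a rough sketch of that per-session switch (the schema, table, and index names below are made up, and whether NBO accepts a given statement and lock mode depends on your PXC version, so check the docs for your release first):

```sql
-- The global setting stays TOI; only this connection switches to NBO.
SET SESSION wsrep_OSU_method = 'NBO';

-- Hypothetical long-running DDL; app_db.big_table and idx_created_at
-- are placeholders, not names from this thread.
ALTER TABLE app_db.big_table
  ADD INDEX idx_created_at (created_at),
  ALGORITHM=INPLACE, LOCK=SHARED;

-- Switch back so any later DDL in this session replicates under TOI again.
SET SESSION wsrep_OSU_method = 'TOI';
```

This only helps for DDL you run yourself from your own session; it does not change what orchestrator sends over its own connections.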

I would investigate why this statement runs so often; it should run once and never again. Find a way to turn this off.

@matthewb I really doubt orchestrator has any option to set this per session for specific statements. Please correct me if I am wrong here.

Also, create database appears to be on this line. Worst case, I can maybe disable it and have the database created before starting the orchestrator service.
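Roughly, pre-creating it by hand would look like the following, run once on a single node before the service starts. This is just a sketch on my side, and it only helps if the create database call in orchestrator is actually disabled or skipped.

```sql
-- Run once, manually; under TOI this replicates to the other nodes.
CREATE DATABASE IF NOT EXISTS orchestrator;

-- Verify on each node that the schema is present before starting orchestrator.
SELECT schema_name
FROM information_schema.schemata
WHERE schema_name = 'orchestrator';
```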

Do you happen to know of any option to make create database run only once?

Are you using your main PXC as both the application DB and the Orchestrator backend? If so, this is not recommended.

Also, looking at the code you provided, it says “first time ever we talk to MySQL”, so Orc should only execute this DDL once. If it is executing often, you should investigate whether Orc is crashing or otherwise restarting on its own.