Application connection failing intermittently with MySQL Exception

Dear Team,

Our application team is facing strange intermittent connection drop issue with below logs being captured from application side. However, from database side we dont see any errors or load on the srevers is very minimal.

Below is the configuration:
Innodb Cluster ( 3 node - 1 RW and 2 RO) with GTID replication. We have mysqlrouter with keepalive configured.

Below is the log extract:

Error Message: ------------------------------ERROR START-----------------------------------

Error Message: CheckSocialSubscriberCredentials

Error Message: HandlingInstanceID: 1cb6322a-584a-4dce-bb50-38dffd227325

An exception of type ‘MySql.Data.MySqlClient.MySqlException’ occurred and was caught.


09/01/2023 17:52:18

Type : MySql.Data.MySqlClient.MySqlException, MySql.Data, Version=8.0.32.0, Culture=neutral, PublicKeyToken=c5687fc88969c44d

Message : Unable to connect to any of the specified MySQL hosts.

Source : MySql.Data

Help link :

Number : 1042

SqlState :

Code : 0

ErrorCode : -2147467259

Data : System.Collections.ListDictionaryInternal

TargetSite : Void Open()

HResult : -2147467259

Stack Trace : at MySql.Data.MySqlClient.NativeDriver.Open()

at MySql.Data.MySqlClient.Driver.Open()

at MySql.Data.MySqlClient.Driver.Create(MySqlConnectionStringBuilder settings)

at MySql.Data.MySqlClient.MySqlPool.CreateNewPooledConnection()

at MySql.Data.MySqlClient.MySqlPool.GetPooledConnection()

at MySql.Data.MySqlClient.MySqlPool.TryToGetDriver()

at MySql.Data.MySqlClient.MySqlPool.GetConnection()

at MySql.Data.MySqlClient.MySqlConnection.Open()

at Microsoft.Practices.EnterpriseLibrary.Data.Database.GetNewOpenConnection()

at Microsoft.Practices.EnterpriseLibrary.Data.Database.GetWrappedConnection()

at Microsoft.Practices.EnterpriseLibrary.Data.Database.ExecuteReader(DbCommand command)

at Helper.Database.MySqlManager.ExecuteReader(DbCommand dbCommand)

at eMDMS.Domain.SubscriberDetails.DAL.GetExternalLoginInformationusingUAEPASS(LoginRequestDTO data)

at eMDMS.Domain.SubscriberDetails.SubscriberBL.eMDMS.Domain.SubscriberDetails.ISubscriber.GetExternalLoginInformationusingUAEPASS(LoginRequestDTO loginData)

at eMDMS.Facade.SubscriberFacade.GetExternalLoginInformationusingUAEPASS(LoginRequestDTO data)

at eMDMS.Discovery.Subscriber.GetExternalLoginInformationusingUAEPASS(LoginRequestDTO data)

Additional Info:

MachineName : XXXXXXXXXS001B

TimeStamp : 01-09-2023 13:52:18

FullName : Microsoft.Practices.EnterpriseLibrary.ExceptionHandling, Version=5.0.414.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35

AppDomainName : /LM/W3SVC/2/ROOT-1-133380380334763419

ThreadIdentity :

WindowsIdentity : NT AUTHORITY\NETWORK SERVICE

          Inner Exception

          ---------------

          Type : MySql.Data.MySqlClient.MySqlException, MySql.Data, Version=8.0.32.0, Culture=neutral, PublicKeyToken=c5687fc88969c44d

          Message : Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.

          Source : MySql.Data

          Help link : 

          Number : 0

          SqlState : 

          Code : 0

          ErrorCode : -2147467259

          Data : System.Collections.ListDictionaryInternal

          TargetSite : System.IO.Stream GetTcpStream(MySql.Data.MySqlClient.MySqlConnectionStringBuilder, MySql.Data.Common.MyNetworkStream ByRef)

          HResult : -2147467259

          Stack Trace :    at MySql.Data.Common.StreamCreator.GetTcpStream(MySqlConnectionStringBuilder settings, MyNetworkStream& networkStream)

             at MySql.Data.MySqlClient.NativeDriver.Open()



                         Inner Exception

                         ---------------

                         Type : System.TimeoutException, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089

                         Message : The operation has timed out.

                         Source : 

                         Help link : 

                         Data : System.Collections.ListDictionaryInternal

                         TargetSite : 

                         HResult : -2146233083

                         Stack Trace : The stack trace is unavailable.

Error Message: -------------------------------ERROR END------------------------------------

Kindly help me to understand the issue and suggestions on the error resolution.

How did you configure mysqlrouter? Did you bootstrap it correctly? What is its status? Have you viewed the logs of mysqlrouter to ensure that it has no issues connecting to the backends?

Hi @matthewb ,
mysqlrouter is running. I can see below latest mysqlrouter log entries.

2023-09-01 15:24:13 routing ERROR [7ff0dfefe700] classic::loop() processor failed: Bad message (generic:74)
2023-09-01 15:24:13 routing ERROR [7ff0dfefe700] classic::loop() processor failed: Bad message (generic:74)
2023-09-01 15:24:13 routing ERROR [7ff0e2f25700] classic::loop() processor failed: Bad message (generic:74)
2023-09-01 15:24:13 routing ERROR [7ff0de6da700] classic::loop() processor failed: Bad message (generic:74)
2023-09-01 15:24:13 routing ERROR [7ff0de6da700] classic::loop() processor failed: Bad message (generic:74)
2023-09-01 15:24:13 routing ERROR [7ff0e1722700] classic::loop() processor failed: Bad message (generic:74)
2023-09-01 15:24:13 routing ERROR [7ff0e2f25700] classic::loop() processor failed: Bad message (generic:74)
2023-09-01 15:24:13 routing ERROR [7ff0e2f25700] classic::loop() processor failed: Bad message (generic:74)
2023-09-01 15:24:13 routing ERROR [7ff0e06ff700] classic::loop() processor failed: Bad message (generic:74)
2023-09-01 15:24:13 routing ERROR [7ff0e1f23700] classic::loop() processor failed: Bad message (generic:74)
2023-09-02 12:50:08 metadata_cache INFO [7ff0d868c700] Potential changes detected in cluster after metadata refresh (view_id=3)
2023-09-02 12:50:08 metadata_cache INFO [7ff0d868c700] Target cluster(s) are part of a ClusterSet: accepting RW connections
2023-09-02 12:50:08 metadata_cache INFO [7ff0d868c700] Cluster ‘dr_esigcls’: role of a cluster within a ClusterSet is ‘primary’;
2023-09-02 12:50:08 metadata_cache INFO [7ff0d868c700] Metadata for cluster ‘dr_esigcls’ has 3 member(s), single-primary:
2023-09-02 12:50:08 metadata_cache INFO [7ff0d868c700] lxxxxx1b:15433 / 33060 - mode=RO
2023-09-02 12:50:08 metadata_cache INFO [7ff0d868c700] lxxxxx2b:15433 / 33060 - mode=n/a
2023-09-02 12:50:08 metadata_cache INFO [7ff0d868c700] lxxxxx3b:15433 / 33060 - mode=RW
2023-09-02 12:50:23 metadata_cache WARNING [7ff0d868c700] Member lxxxxx2b:15433 (eaeaa5b9-f95a-11ed-88b3-005056a420d3) defined in metadata not found in actual Group Replication
2023-09-02 12:50:23 metadata_cache INFO [7ff0d868c700] Potential changes detected in cluster after metadata refresh (view_id=3)
2023-09-02 12:50:23 metadata_cache INFO [7ff0d868c700] Target cluster(s) are part of a ClusterSet: accepting RW connections
2023-09-02 12:50:23 metadata_cache INFO [7ff0d868c700] Cluster ‘dr_esigcls’: role of a cluster within a ClusterSet is ‘primary’;
2023-09-02 12:50:23 metadata_cache INFO [7ff0d868c700] Metadata for cluster ‘dr_esigcls’ has 3 member(s), single-primary:
2023-09-02 12:50:23 metadata_cache INFO [7ff0d868c700] lxxxxx1b:15433 / 33060 - mode=RO
2023-09-02 12:50:23 metadata_cache INFO [7ff0d868c700] lxxxxx2b:15433 / 33060 - mode=n/a
2023-09-02 12:50:23 metadata_cache INFO [7ff0d868c700] lxxxxx3b:15433 / 33060 - mode=RW
2023-09-02 12:52:09 metadata_cache INFO [7ff0d868c700] Potential changes detected in cluster after metadata refresh (view_id=3)
2023-09-02 12:52:09 metadata_cache INFO [7ff0d868c700] Target cluster(s) are part of a ClusterSet: accepting RW connections
2023-09-02 12:52:09 metadata_cache INFO [7ff0d868c700] Cluster ‘dr_esigcls’: role of a cluster within a ClusterSet is ‘primary’;
2023-09-02 12:52:09 metadata_cache INFO [7ff0d868c700] Metadata for cluster ‘dr_esigcls’ has 3 member(s), single-primary:
2023-09-02 12:52:09 metadata_cache INFO [7ff0d868c700] lxxxxx1b:15433 / 33060 - mode=RO
2023-09-02 12:52:09 metadata_cache INFO [7ff0d868c700] lxxxxx2b:15433 / 33060 - mode=n/a
2023-09-02 12:52:09 metadata_cache INFO [7ff0d868c700] lxxxxx3b:15433 / 33060 - mode=RW
2023-09-02 12:52:10 metadata_cache INFO [7ff0d868c700] Potential changes detected in cluster after metadata refresh (view_id=3)
2023-09-02 12:52:10 metadata_cache INFO [7ff0d868c700] Target cluster(s) are part of a ClusterSet: accepting RW connections
2023-09-02 12:52:10 metadata_cache INFO [7ff0d868c700] Cluster ‘dr_esigcls’: role of a cluster within a ClusterSet is ‘primary’;
2023-09-02 12:52:10 metadata_cache INFO [7ff0d868c700] Metadata for cluster ‘dr_esigcls’ has 3 member(s), single-primary:
2023-09-02 12:52:10 metadata_cache INFO [7ff0d868c700] lxxxxx1b:15433 / 33060 - mode=RO
2023-09-02 12:52:10 metadata_cache INFO [7ff0d868c700] lxxxxx2b:15433 / 33060 - mode=RO
2023-09-02 12:52:10 metadata_cache INFO [7ff0d868c700] lxxxxx3b:15433 / 33060 - mode=RW
2023-09-03 18:38:42 metadata_cache WARNING [7ff0d868c700] Updating the authentication credentials failed: Error executing MySQL query “SELECT user, authentication_string, privileges, authentication_method FROM mysql_innodb_cluster_metadata.v2_router_rest_accounts WHERE cluster_id=(SELECT cluster_id FROM mysql_innodb_cluster_metadata.v2_clusters WHERE cluster_name=‘dr_esigcls’)”: Lost connection to MySQL server during query (2013)
2023-09-03 18:38:57 metadata_cache WARNING [7ff0d868c700] Failed connecting with Metadata Server lxxxxx1b:15433: Unknown MySQL server host ‘lxxxxx1b’ (-2) (2005)
2023-09-03 18:39:07 metadata_cache WARNING [7ff0d868c700] Failed connecting with Metadata Server lxxxxx2b:15433: Unknown MySQL server host ‘lxxxxx2b’ (-2) (2005)
2023-09-03 18:39:22 metadata_cache WARNING [7ff0d868c700] Failed connecting with Metadata Server lxxxxx1p:15433: Unknown MySQL server host ‘lxxxxx1p’ (-2) (2005)
2023-09-03 18:39:42 metadata_cache WARNING [7ff0d868c700] Failed connecting with Metadata Server lxxxxx2p:15433: Unknown MySQL server host ‘lxxxxx2p’ (-2) (2005)
2023-09-03 18:39:57 metadata_cache WARNING [7ff0d868c700] Failed connecting with Metadata Server lxxxxx1b:15433: Can’t connect to MySQL server on ‘lxxxxx1b:15433’ (110) (2003)
2023-09-03 18:40:17 metadata_cache WARNING [7ff0d868c700] While updating metadata, could not establish a connection to cluster ‘dr_esigcls’ through lxxxxx1b:15433
2023-09-03 18:40:32 metadata_cache INFO [7ff0d868c700] Connected with metadata server running on lxxxxx2p:15433
2023-09-03 18:40:37 metadata_cache INFO [7ff0d868c700] Connected with metadata server running on lxxxxx1b:15433
2023-09-03 18:40:37 metadata_cache INFO [7ff0d868c700] Connected with metadata server running on lxxxxx1p:15433
2023-09-05 10:29:53 routing ERROR [7ff0e06ff700] classic::loop() processor failed: Bad message (generic:74)

Hi @matthewb ,
Backend connections are happening fine but sudden exceptions occured which cause entire application to stall without allowing any login. After 5 minutes everythign was normal.

What’s the status of the cluster during this time? This sounds like the cluster is recovering from some event.

Cluster status is normal during the issue time.

Well, these messages from mysqlrouter say there’s some issue in connecting to the backends. If you say the cluster status is normal during this time, then there must be something else having an issue. DNS? It says “Unknown host”. When you see this error, can you connect directly using the mysql CLI client on the router host?

Hi @matthewb ,

We didn’t tried connecting usying mysql during that time. Actually, this incident reported to us after the issue resolved.

The next time this happens, check everything. Check connections directly to each MySQL, check connections through mysqlrouter, check status of cluster using mysqlsh. something went wrong, but without more information and more test, it is hard to determine what happened.

Sure @matthewb ,

Thank you for suggestion. We will try to have comprehensive checks. Since we have monitoring only for the service and we need to check a way for monitoring solution when service goes intermittently
Do we have any recommended parameters/values across the cluster for quorum or OS related (like internode communication network values or related to mysql database and router) that you can suggest or advice on the document related to these, will be of great help.

Since we don’t have much info especially, the system is not heavily loaded in terms of number of sessions or any other activities on the server, can you please suggest where else we can check for or what might cause a router to lose connection with backend database intermittently.

2023-09-03 18:38:57 metadata_cache WARNING [7ff0d868c700] Failed connecting with Metadata Server lxxxxx1b:15433: Unknown MySQL server host ‘lxxxxx1b’ (-2) (2005)

Regards,

Is there anyway once bootstrapping is done without any errors, it can fail intermittently.

Of course. There are many reasons why a cluster can fail: networking, DNS, OS, etc You need to diagnose these basic networking issues when they occur. I’ve seen many networks at the switch level be “flakey” and replacing the switch fixed all database issues. Nothing to do with the database itself.

You can monitor cluster status using these native tools:
https://dev.mysql.com/doc/mysql-shell/8.0/en/monitoring-innodb-cluster.html

Configure your monitoring to check this status. it’s all JSON so easily read/parsed by any language.