Pgbouncer server DNS lookup failed

Hello,

I have installed a Percona Postgres cluster using the following supplemental values:

backup:
  volumeSpec:
    size: 20G  # or whatever the maximum amount of storage space required
    storageclass: longhorn

pgPrimary:
  volumeSpec:
    size: 20G  # or whatever the maximum amount of storage space required
    storageclass: longhorn

replicas:
  volumeSpec:
    size: 20G  # or whatever the maximum amount of storage space required
    storageclass: longhorn

pgBouncer:
  size: 2

replicas:
  size: 1

Next I installed using:

helm install postgres-test percona/pg-db --version 1.3.1 --namespace development-database -f percona-postgres-test.yml

I then grabbed the username and password using:

echo $(kubectl -n development-database get secrets postgres-test-pg-db-pguser-secret -o jsonpath="{.data.username}" | base64 --decode)
echo $(kubectl -n development-database get secrets postgres-test-pg-db-pguser-secret -o jsonpath="{.data.password}" | base64 --decode)

NOTE: As an aside, I noticed that if you uninstall and reinstall the cluster, the operator correctly picks up the storage volume that was used before but DOES NOT reuse the password. This leaves the database with one password and the secret with another.

Unfortunately, when I try to connect to the database using:

kubectl run -n development-database -i --rm --tty percona-client --image=perconalab/percona-distribution-postgresql:15.1 --restart=Never -- psql "postgres://[my-password]@postgres-test-pg-db-pgbouncer.development-database.svc.cluster.local/pgdb"

(where [my-password] is the password that came from the secret)

I get:

If you don't see a command prompt, try pressing enter.
psql: error: connection to server at "postgres-test-pg-db-pgbouncer.development-database.svc.cluster.local" (10.43.151.177), port 5432 failed: FATAL:  query_wait_timeout
pod "percona-client" deleted
pod development-database/percona-client terminated (Error)

and the pgbouncer logs contain:

postgres-test-pg-db-pgbouncer-645df74c5f-6pbpj pgbouncer 2023-01-23 12:03:04.445 UTC [34] WARNING DNS lookup failed: postgres-test-pg-db: result=0
postgres-test-pg-db-pgbouncer-645df74c5f-6pbpj pgbouncer 2023-01-23 12:03:04.445 UTC [34] LOG S-0x560348b3c820: pgdb/pgbouncer@(bad-af):0 closing because: server DNS lookup failed (age=0s)

I then tried to connect from the database server itself using:

kubectl exec -it postgres-test-pg-db-747cb8c964-g47cz bash

PGPASSWORD=[my-password] psql -h postgres-test-pg-db-pgbouncer.development-database.svc.cluster.local -U pgadmin postgres

I get:

psql: error: connection to server at "postgres-test-pg-db-pgbouncer.development-database.svc.cluster.local" (10.43.151.177), port 5432 failed: FATAL:  query_wait_timeout

with the pgbouncer logs showing:

postgres-test-pg-db-pgbouncer-645df74c5f-6pbpj pgbouncer 2023-01-23 12:14:23.430 UTC [34] LOG C-0x560348b358c0: postgres/(nouser)@10.42.0.55:35402 closing because: server login has been failing, try again later (server_login_retry) (age=0s)
postgres-test-pg-db-pgbouncer-645df74c5f-6pbpj pgbouncer 2023-01-23 12:14:23.430 UTC [34] WARNING C-0x560348b358c0: postgres/(nouser)@10.42.0.55:35402 pooler error: server login has been failing, try again later (server_login_retry)
postgres-test-pg-db-pgbouncer-645df74c5f-6pbpj pgbouncer 2023-01-23 12:14:23.430 UTC [34] LOG C-0x560348b358c0: postgres/(nouser)@10.42.0.55:35402 closing because: server login has been failing, try again later (server_login_retry) (age=0s)
postgres-test-pg-db-pgbouncer-645df74c5f-6pbpj pgbouncer 2023-01-23 12:14:23.430 UTC [34] WARNING C-0x560348b358c0: postgres/(nouser)@10.42.0.55:35402 pooler error: server login has been failing, try again later (server_login_retry)
postgres-test-pg-db-pgbouncer-645df74c5f-6pbpj pgbouncer 2023-01-23 12:14:23.433 UTC [34] WARNING DNS lookup failed: postgres-test-pg-db: result=0
postgres-test-pg-db-pgbouncer-645df74c5f-6pbpj pgbouncer 2023-01-23 12:14:23.433 UTC [34] LOG S-0x560348b3c820: postgres/pgbouncer@(bad-af):0 closing because: server DNS lookup failed (age=0s)
postgres-test-pg-db-pgbouncer-645df74c5f-6pbpj pgbouncer 2023-01-23 12:14:23.433 UTC [34] WARNING DNS lookup failed: postgres-test-pg-db: result=0
postgres-test-pg-db-pgbouncer-645df74c5f-6pbpj pgbouncer 2023-01-23 12:14:23.433 UTC [34] LOG S-0x560348b3c820: postgres/pgbouncer@(bad-af):0 closing because: server DNS lookup failed (age=0s)

So the logs show both a failing server login and a DNS lookup failure.

Strangely, if I log into the pgbouncer pod and check whether it can resolve the name:

kubectl exec -it postgres-test-pg-db-pgbouncer-645df74c5f-6pbpj bash -n development-database

nslookup postgres-test-pg-db-pgbouncer.development-database.svc.cluster.local
or
nslookup postgres-test-pg-db-pgbouncer

I get:

Server:         10.43.0.10
Address:        10.43.0.10#53

Name:   postgres-test-pg-db-pgbouncer.development-database.svc.cluster.local
Address: 10.43.151.177

So… it CAN resolve the name… DNS is NOT an issue.
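
One caveat: the lookup above tests the pgbouncer service name, while the log complains about postgres-test-pg-db, which is the name pgbouncer itself has to resolve. A minimal sketch for pulling the failing name out of the log so the same check can be repeated against it. It assumes the log format shown above; failing_dns_names is a hypothetical helper, not part of pgbouncer or kubectl:

```shell
#!/bin/sh
# Print each distinct host name that pgbouncer reported as unresolvable.
# Reads pgbouncer log lines on stdin and matches the
# "DNS lookup failed: <name>: result=<n>" warnings shown above.
failing_dns_names() {
  sed -n 's/.*DNS lookup failed: \([^:]*\): result=.*/\1/p' | sort -u
}

# Example (pod and namespace names are the ones from this post):
#   kubectl -n development-database logs postgres-test-pg-db-pgbouncer-645df74c5f-6pbpj \
#     | failing_dns_names
# Then, inside the pgbouncer pod, check each printed name, for example:
#   nslookup postgres-test-pg-db
```

Even if nslookup resolves the short name, pgbouncer may use a different resolver than nslookup does, so this only narrows the problem down.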

What’s up?? Everything is in the same zone and the same subnet, and everything can see everything else, but I just cannot connect through pgbouncer.

Also, a direct connection to the database (bypassing pgbouncer) using psql WORKS:

PGPASSWORD=[my-password] psql -h postgres-test-pg-db -U pguser postgres

So I know that the password is good, the username is good, it’s just the pgbouncer passthrough that’s wrong.

If useful I can include a copy of the pgbouncer configuration file, but I’ve not changed anything… it is how the operator made it.

Anyone have any idea what’s going on?

Next I am going to try to create a fresh cluster with a new persistent volume claim.

Thanks
Chris

Hi christopher,
I have the same problem, and I noticed in the CoreDNS query log that a wrong query comes from pgbouncer.
It adds a “.” to the service name and the query fails:

[INFO] 10.42.1.85:46650 - 45863 "AAAA IN data-db-pg-db. udp 34 false 512" SERVFAIL qr,rd,ra 34 0.001903861s
[INFO] 10.42.1.85:46650 - 45863 "AAAA IN data-db-pg-db. udp 34 false 512" SERVFAIL qr,aa,rd,ra 34 0.000106711s
[INFO] 10.42.1.85:46650 - 45863 "AAAA IN data-db-pg-db. udp 34 false 512" SERVFAIL qr,aa,rd,ra 34 0.000100125s
[INFO] 10.42.1.85:46650 - 45863 "AAAA IN data-db-pg-db. udp 34 false 512" SERVFAIL qr,aa,rd,ra 34 0.000089476s

If I edit the pgbouncer ConfigMap manually and restart the deployment, it works fine.
But I would like to do it via the operator. In my case I use the PerconaPGCluster kind. How did you manage to solve it?

Hi @Gabriele_Pennacchia and @christopher,

I hit the same issue described above, under the same conditions.
As @Gabriele_Pennacchia pointed out in his comment, I edited the pgBouncer ConfigMap, but I put in the whole FQDN, like xxx.yyy.svc.cluster.local, and restarted the deployment. After that I was able to connect to the pgbouncer server successfully.
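
For anyone repeating this workaround, a minimal sketch of how the FQDN is put together. It assumes the default cluster domain cluster.local; svc_fqdn is a hypothetical helper, and the deployment name in the comment is only inferred from the pod name earlier in the thread:

```shell
#!/bin/sh
# Build the cluster-internal FQDN of a Service: <service>.<namespace>.svc.<cluster domain>.
# The cluster domain defaults to "cluster.local"; pass a third argument if yours differs.
svc_fqdn() {
  printf '%s.%s.svc.%s\n' "$1" "$2" "${3:-cluster.local}"
}

# Service "cluster1" in namespace "pgo":
svc_fqdn cluster1 pgo

# After putting that name into the pgBouncer ConfigMap by hand, restart
# pgbouncer so it rereads its configuration, for example:
#   kubectl -n development-database rollout restart deployment/postgres-test-pg-db-pgbouncer
```

The call above prints cluster1.pgo.svc.cluster.local, the same name that resolves in the dig output below.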

Digging a bit deeper, if I run dig from a test pod in the same namespace, I get the following:

root@test-pod:/# dig cluster1 

; <<>> DiG 9.18.12-0ubuntu0.22.04.1-Ubuntu <<>> cluster1
.....
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 98782c2fd530f98a (echoed)
;; QUESTION SECTION:
;cluster1.                      IN      A
....

root@test-pod:/# dig cluster1.pgo.svc.cluster.local

; <<>> DiG 9.18.12-0ubuntu0.22.04.1-Ubuntu <<>> cluster1.pgo.svc.cluster.local
....
;; ANSWER SECTION:
cluster1.pgo.svc.cluster.local. 5 IN    A       10.110.178.83
....

It could be that the answer for the short name is missing the service IP address, and that this is what causes the issue…
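
One thing worth ruling out here: dig does not apply the resolv.conf search list unless it is given +search, so a bare "dig cluster1" can come back without an answer even where other tools resolve the short name. A minimal sketch, assuming a standard /etc/resolv.conf in the pod; search_candidates is a hypothetical helper, not part of any tool:

```shell
#!/bin/sh
# List the expansions of a short name against the "search" line of a
# resolv.conf-style file (default: /etc/resolv.conf), one per line,
# in the order the search domains are listed.
search_candidates() {
  awk -v name="$1" '$1 == "search" { for (i = 2; i <= NF; i++) print name "." $i }' \
    "${2:-/etc/resolv.conf}"
}

# In the test pod:
#   search_candidates cluster1   # names a search-aware resolver would try
#   dig +search cluster1         # make dig apply the search list as well
```

If "dig +search cluster1" returns the service IP while pgbouncer still fails on the short name, that would suggest pgbouncer's resolver is not applying the search list, which would fit with the FQDN workaround.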

Yes, that’s the problem.
But how do we set it directly in the PerconaPGCluster manifest, without editing the ConfigMap manually?

Please create a ticket in Jira (jira.percona.com) in the DISTPG project so that we can investigate this issue.

Issue opened: [K8SPG-333] Pgbouncer server DNS lookup failed (Percona JIRA). I opened it in the K8SPG project because it is related to the PG Kubernetes operator.