Pgbouncer server DNS lookup failed

Hello,

I have installed a Percona Postgres cluster using the following supplemental values:

backup:
  volumeSpec:
    size: 20G  # or whatever the maximum amount of storage space required
    storageclass: longhorn

pgPrimary:
  volumeSpec:
    size: 20G  # or whatever the maximum amount of storage space required
    storageclass: longhorn

replicas:
  volumeSpec:
    size: 20G  # or whatever the maximum amount of storage space required
    storageclass: longhorn

pgBouncer:
  size: 2

replicas:
  size: 1

Next I installed using:

helm install postgres-test percona/pg-db --version 1.3.1 --namespace development-database -f percona-postgres-test.yml

I then grabbed the username and password using:

echo $(kubectl -n development-database get secrets test-pg-db-pguser-secret -o jsonpath="{.data.username}" | base64 --decode)
echo $(kubectl -n development-database get secrets postgres-test-pg-db-pguser-secret -o jsonpath="{.data.username}" | base64 --decode)

NOTE: as an aside I noticed that if you uninstall and reinstall the cluster the operator correctly picks up the storage volume that has been used before but DOES NOT reuse the password. This leads to a situation where the database contains one password and the secret contains another.

Unfortunately, when I try to connect to the database using:

kubectl run -n development-database -i --rm --tty percona-client --image=perconalab/percona-distribution-postgresql:15.1 --restart=Never -- psql "postgres://[my-password]@postgres-test-pg-db-pgbouncer.development-database.svc.cluster.local/pgdb"

(where [my-password] the the password that came from the secret)

I get:

If you don't see a command prompt, try pressing enter.
psql: error: connection to server at "postgres-test-pg-db-pgbouncer.development-database.svc.cluster.local" (10.43.151.177), port 5432 failed: FATAL:  query_wait_timeout
pod "percona-client" deleted
pod development-database/percona-client terminated (Error)

and the pgbouncer logs contains:

postgres-test-pg-db-pgbouncer-645df74c5f-6pbpj pgbouncer 2023-01-23 12:03:04.445 UTC [34] WARNING DNS lookup failed: postgres-test-pg-db: result=0
postgres-test-pg-db-pgbouncer-645df74c5f-6pbpj pgbouncer 2023-01-23 12:03:04.445 UTC [34] LOG S-0x560348b3c820: pgdb/pgbouncer@(bad-af):0 closing because: server DNS lookup failed (age=0s)

I then tried to connect from the database server itself using:

kubectl exec -it postgres-test-pg-db-747cb8c964-g47cz bash

PGPASSWORD=[my-password] psql -h postgres-test-pg-db-pgbouncer.development-database.svc.cluster.local -U pgadmin postgres

I get:

psql: error: connection to server at "postgres-test-pg-db-pgbouncer.development-database.svc.cluster.local" (10.43.151.177), port 5432 failed: FATAL:  query_wait_timeout

with the pgbouncer logs showing:

postgres-test-pg-db-pgbouncer-645df74c5f-6pbpj pgbouncer 2023-01-23 12:14:23.430 UTC [34] LOG C-0x560348b358c0: postgres/(nouser)@10.42.0.55:35402 closing because: server login has been failing, try again later (server_login_retry) (age=0s)
postgres-test-pg-db-pgbouncer-645df74c5f-6pbpj pgbouncer 2023-01-23 12:14:23.430 UTC [34] WARNING C-0x560348b358c0: postgres/(nouser)@10.42.0.55:35402 pooler error: server login has been failing, try again later (server_login_retry)
postgres-test-pg-db-pgbouncer-645df74c5f-6pbpj pgbouncer 2023-01-23 12:14:23.430 UTC [34] LOG C-0x560348b358c0: postgres/(nouser)@10.42.0.55:35402 closing because: server login has been failing, try again later (server_login_retry) (age=0s)
postgres-test-pg-db-pgbouncer-645df74c5f-6pbpj pgbouncer 2023-01-23 12:14:23.430 UTC [34] WARNING C-0x560348b358c0: postgres/(nouser)@10.42.0.55:35402 pooler error: server login has been failing, try again later (server_login_retry)
postgres-test-pg-db-pgbouncer-645df74c5f-6pbpj pgbouncer 2023-01-23 12:14:23.433 UTC [34] WARNING DNS lookup failed: postgres-test-pg-db: result=0
postgres-test-pg-db-pgbouncer-645df74c5f-6pbpj pgbouncer 2023-01-23 12:14:23.433 UTC [34] LOG S-0x560348b3c820: postgres/pgbouncer@(bad-af):0 closing because: server DNS lookup failed (age=0s)
postgres-test-pg-db-pgbouncer-645df74c5f-6pbpj pgbouncer 2023-01-23 12:14:23.433 UTC [34] WARNING DNS lookup failed: postgres-test-pg-db: result=0
postgres-test-pg-db-pgbouncer-645df74c5f-6pbpj pgbouncer 2023-01-23 12:14:23.433 UTC [34] LOG S-0x560348b3c820: postgres/pgbouncer@(bad-af):0 closing because: server DNS lookup failed (age=0s)

Bad login and DNS issue.

Strangely if I log into the pgbouncer pod and check to see if it can resolve the name I get:

kubectl exec -it postgres-test-pg-db-pgbouncer-645df74c5f-6pbpj bash -n development-database

nslookup postgres-test-pg-db-pgbouncer.development-database.svc.cluster.local
or
nslookup postgres-test-pg-db-pgbouncer

I get:

Server:         10.43.0.10
Address:        10.43.0.10#53

Name:   postgres-test-pg-db-pgbouncer.development-database.svc.cluster.local
Address: 10.43.151.177

So… it CAN resolve the name… DNS is NOT an issue.

What’s up?? Everything is in the same zone, the same subnet and everything can see everything but I just can not connect through pgbouncer.

Also, a direct connection to the database (bypassing pgbouncer) using psql WORKS:

PGPASSWORD=[my-password] psql -h postgres-test-pg-db -U pguser postgres

So I know that the password is good, the username is good, it’s just the pgbouncer passthrough that’s wrong.

If useful I can include a copy of the pgbouncer configuration file, but I’ve not changed anything… it is how the operator made it.

Anyone have any idea what’s going on?

Next I am going to try to create a fresh cluster with a new persistent volume claim.

Thanks
Chris

1 Like

Hi christopher,
i have the same problem, and I noticed that in the dns queries on coredns a wrong call comes from pgbuncer.
Adds a “.” to the service name and the query fails:

[INFO] 10.42.1.85:46650 - 45863 "AAAA IN data-db-pg-db. udp 34 false 512" SERVFAIL qr,rd,ra 34 0.001903861s
[INFO] 10.42.1.85:46650 - 45863 "AAAA IN data-db-pg-db. udp 34 false 512" SERVFAIL qr,aa,rd,ra 34 0.000106711s
[INFO] 10.42.1.85:46650 - 45863 "AAAA IN data-db-pg-db. udp 34 false 512" SERVFAIL qr,aa,rd,ra 34 0.000100125s
[INFO] 10.42.1.85:46650 - 45863 "AAAA IN data-db-pg-db. udp 34 false 512" SERVFAIL qr,aa,rd,ra 34 0.000089476s

If I edit the pgbouncer configmap manually and restart the deployment it works fine.
But I would like to do it via operator, in my case i use the PerconaPGCluster kind, how did you manage to solve it?

Hi @Gabriele_Pennacchia and @christopher,

had same issue fired above with same conditions.
As @Gabriele_Pennacchia pointed out in his comment I edited the pgBouncer ConfigMap, but I put the whole FQDN like: xxx.yyy.svc.cluster.local, restarted the deployment and now was able to connect to the pgbouncer server successfully.

Digging a bit deeper, if I try dig command from a test pod in the same ns and I get as following

root@test-pod:/# dig cluster1 

; <<>> DiG 9.18.12-0ubuntu0.22.04.1-Ubuntu <<>> cluster1
.....
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 98782c2fd530f98a (echoed)
;; QUESTION SECTION:
;cluster1.                      IN      A
....

root@test-pod:/# dig cluster1.pgo.svc.cluster.local

; <<>> DiG 9.18.12-0ubuntu0.22.04.1-Ubuntu <<>> cluster1.pgo.svc.cluster.local
....
;; ANSWER SECTION:
cluster1.pgo.svc.cluster.local. 5 IN    A       10.110.178.83
....

Could be the answer that miss the service IP address that cause issue…

Yes, that’s the problem.
But how do we edit it directly from the PerconaPGCluster manifest without editing the configmap manualy?

Please create ticket in Jira (jira.percona.com) in DISTPG project so we would be able to investigate this issue

Issue opened: [K8SPG-333] Pgbouncer server DNS lookup failed - Percona JIRA, opened in K8SPG, cause it is related to PG k8s operator

This is still happening with v2 operator

dns gives response only for FQDM

bash-4.4$ dig pg-cluster-pg-db-pgbouncer

; <<>> DiG 9.11.36-RedHat-9.11.36-8.el8 <<>> pg-cluster-pg-db-pgbouncer
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 40499
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: a12e6964752a4752 (echoed)
;; QUESTION SECTION:
;pg-cluster-pg-db-pgbouncer.    IN      A

;; Query time: 61 msec
;; SERVER: 10.43.0.10#53(10.43.0.10)
;; WHEN: Mon Aug 14 07:34:05 UTC 2023
;; MSG SIZE  rcvd: 67

bash-4.4$ dig pg-cluster-pg-db-pgbouncer.percona.svc.cluster.local

; <<>> DiG 9.11.36-RedHat-9.11.36-8.el8 <<>> pg-cluster-pg-db-pgbouncer.percona.svc.cluster.local
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 14358
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 7a3ec8287a638307 (echoed)
;; QUESTION SECTION:
;pg-cluster-pg-db-pgbouncer.percona.svc.cluster.local. IN A

;; ANSWER SECTION:
pg-cluster-pg-db-pgbouncer.percona.svc.cluster.local. 30 IN A 10.43.109.10

;; Query time: 1 msec
;; SERVER: 10.43.0.10#53(10.43.0.10)
;; WHEN: Mon Aug 14 07:34:21 UTC 2023
;; MSG SIZE  rcvd: 161

maybe has something to do with this code

(https://github.com/percona/percona-postgresql-operator/blob/main/internal/naming/dns.go)

I never did solve it. It’s still an issue for us and is keeping us from using pgbouncer.

In fact, I am seeing a sililar issue on the 2.2.0 operator on minikube. See: Turn off SSL when connecting to bouncer - PostgreSQL / Percona Operator for PostgreSQL - Percona Community Forum

If somebody needs this, I solved it with rewrite rule mapping in coredns, but it’s not the real solution for this issue

(https://coredns.io/2017/05/08/custom-dns-entries-for-kubernetes/)

Good idea!! I’ll try that now.

There may be another way that is baked into pgbouncer. The /docs/content/tutorial/connection-pooling.md file (in the repo) describes a way to change cr.yaml to inject other databases into pgbouncer.ini. It says:

There are several ways you can customize the configuration:

  • spec.proxy.pgBouncer.config.global: Accepts key-value pairs that apply changes globally to PgBouncer.
  • spec.proxy.pgBouncer.config.databases: Accepts key-value pairs that represent PgBouncer database definitions.

I believe this means that we can add a spec.proxy.pgBouncer.config.databases section to cr.yaml something like this:

spec:
  proxy:
    pgBouncer:
       config:
         databases:
         - developer: "host=pg-cluster1-ha port=5432"

but this did not work. Applying cr.yaml did not generate a cluster so I am sure that I got the syntax wrong.

The online docs do not show this option: Custom Resource options - Percona Operator for PostgreSQL so perhaps it is something new or perhaps they have not got around to properly implementing it yet. Once configured/working it should allow us to define our own, specific database / host mappings in pgbouncer.ini

I have also posted this question here: Pgbouncer not finding cluster1-primary - PostgreSQL - Percona Community Forum

Seems strange that I independently ran into the same issue twice when approaching deployment in 2 different ways. I wonder why they have not dealt with it yet. I raised this issue way back in January!

Does anyone here know how to raise this as a bug report on Jira? The existing issue is marked for resolution on version 1.4 so I’m not sure that it will be addressed on version 2. I have tried signing up for an account on Percona Jira but it does not seem to work.

It’s even worse in version 2 of the operator. If you edit the configmap (in place) then the operators just changes it back.

According to the docs in the repo there is a way to add global and db specific configuration entries in the pgbouncer.ini file but I’ve not yet figured out how to do that… See my recent post re this new/untested/un-online-documented capability.

I cannot seem to be able to create a jira account… every time I try it just returns me to the ‘provide your details’ screen.

Is there some magic that I am not properly performing to create an account. The last issue posted, noted that the issue related to version 1.4. This also seems to be a problem in version 2.2.0 but now much more serious because one can no longer edit the configMap directly.

1 Like

hmmm…still not working! I am detailing for anyone who happens across this thread in future.

I have added this line to coredns config:

rewrite name pg-cluster1-primary.default.svc.cluster pg-cluster1-ha.default.svc.cluster
rewrite name pg-cluster1-primary pg-cluster1-ha.default.svc.cluster

(just in case I needed both)

and I know that rewrite works and can be resolved in the pgbouncer pod because in the pgbouncer pod I can see that IP as follows:

$ kubectl exec -it pg-cluster1-pgbouncer-57fb5cfbfd-rfg87 bash
$ bash-4.4$ nslookup
> pg-cluster1-primary
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   pg-cluster1-primary.default.svc.cluster.local
Address: 10.103.237.221
>

(10.103.237.221 is the same IP as pg-cluster1-ha)

From elsewhere:

$ ping  pg-cluster1-pgbouncer
PING pg-cluster1-pgbouncer.default.svc.cluster.local (10.111.245.47) 56(84) bytes of data.

When I use:

$ psql -h pg-cluster1-primary -U developer
psql (15.4 (Ubuntu 15.4-1.pgdg22.04+1), server 15.2 - Percona Distribution)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

developer=#

it works… coredns rewrite seems to be doing it’s job.

but when I try to:

$ psql -h pg-cluster1-pgbouncer -U developer
psql: error: connection to server at "pg-cluster1-pgbouncer" (10.111.245.47), port 5432 failed: FATAL:  server login has been failing, try again later (server_login_retry)
connection to server at "pg-cluster1-pgbouncer" (10.111.245.47), port 5432 failed: FATAL:  SSL required

(or it hangs)

and the pgbouncer logs show:

2023-09-05 14:40:26.215 UTC [7] LOG S-0x55b6fd6c9540: developer/_crunchypgbouncer@10.103.237.221:5432 new connection to server (from 10.244.0.75:38056)
2023-09-05 14:40:26.217 UTC [7] LOG S-0x55b6fd6c9540: developer/_crunchypgbouncer@10.103.237.221:5432 SSL established: TLSv1.3/TLS_AES_256_GCM_SHA384/ECDH=prime256v1
2023-09-05 14:40:54.845 UTC [7] LOG C-0x55b6fd6b8900: developer/(nouser)@10.244.0.40:59998 login failed: db=developer
2023-09-05 14:40:54.845 UTC [7] LOG C-0x55b6fd6b8900: developer/(nouser)@10.244.0.40:59998 closing because: no such user (age=0s)
2023-09-05 14:40:54.845 UTC [7] LOG S-0x55b6fd6c9540: developer/_crunchypgbouncer@10.103.237.221:5432 closing because: client disconnect before everything was sent to the server (age=28s)
2023-09-05 14:40:54.845 UTC [7] WARNING C-0x55b6fd6b8900: developer/(nouser)@10.244.0.40:59998 pooler error: no such user
2023-09-05 14:40:54.845 UTC [7] LOG C-0x55b6fd6b8900: (nodb)/(nouser)@10.244.0.40:60002 closing because: SSL required (age=0s)
2023-09-05 14:40:54.845 UTC [7] WARNING C-0x55b6fd6b8900: (nodb)/(nouser)@10.244.0.40:60002 pooler error: SSL required

(could the pgbouncer errors be more useless? Everything to is seems to be an SSL error.)

So… one step closer… looks like pgbouncer is rejecting the user I specified.

pgbouncer is configured as follows:

apiVersion: v1
data:
  pgbouncer-empty: ""
  pgbouncer.ini: |
    # Generated by postgres-operator. DO NOT EDIT.
    # Your changes will not be saved.

    [pgbouncer]
    %include /etc/pgbouncer/pgbouncer.ini

    [pgbouncer]
    auth_file = /etc/pgbouncer/~postgres-operator/users.txt
    auth_query = SELECT username, password from pgbouncer.get_auth($1)
    auth_user = _crunchypgbouncer
    client_tls_ca_file = /etc/pgbouncer/~postgres-operator/frontend-ca.crt
    client_tls_cert_file = /etc/pgbouncer/~postgres-operator/frontend-tls.crt
    client_tls_key_file = /etc/pgbouncer/~postgres-operator/frontend-tls.key
    client_tls_sslmode = require
    conffile = /etc/pgbouncer/~postgres-operator.ini
    ignore_startup_parameters = extra_float_digits
    listen_addr = *
    listen_port = 5432
    server_tls_ca_file = /etc/pgbouncer/~postgres-operator/backend-ca.crt
    server_tls_sslmode = verify-full
    unix_socket_dir =

    [databases]
    * = host=pg-cluster1-primary port=5432
kind: ConfigMap
metadata:
  creationTimestamp: "2023-09-04T23:59:18Z"
  labels:
    pgv2.percona.com/version: 2.2.0
    postgres-operator.crunchydata.com/cluster: pg-cluster1
    postgres-operator.crunchydata.com/role: pgbouncer
  name: pg-cluster1-pgbouncer
  namespace: default
  ownerReferences:
  - apiVersion: postgres-operator.crunchydata.com/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: PostgresCluster
    name: pg-cluster1
    uid: 2bc2d171-541f-4443-8283-eedc39ce57ab
  resourceVersion: "409939"
  uid: e7806c04-d5d2-4d70-ae77-cf8638d93a1f

with auth users coning from : /etc/pgbouncer/~postgres-operator/users.txt , which contains:

"_crunchypgbouncer" "R3QxELX;xsab<]CFhq7@dH1PKXV}Mp}n"

I’m not sure who the ‘_crunchypgbouncer’ user is or why it’s there but ok… it seems to be in there by default so there must be a reason.

The developer user has already been defined in cr.yaml (in spec.users)… so why is it not defined in pgbouncer. Is this actually working at all? For anyone?

Looking at the online docs: Custom Resource options - Percona Operator for PostgreSQL

Again the repo doc files contain a reference to this in the tutorial files : percona-postgresql-operator/docs/content/tutorial/connection-pooling.md at main · percona/percona-postgresql-operator (github.com) as follows:

PgBouncer configuration can be customized through spec.proxy.pgBouncer.config. After making configuration changes, PGO will roll them out to any PgBouncer instance and automatically issue a "reload".

There are several ways you can customize the configuration:
* spec.proxy.pgBouncer.config.users: Accepts key-value pairs that represent connection settings applied to specific users.
* spec.proxy.pgBouncer.config.files: Accepts a list of files that are mounted in the /etc/pgbouncer directory and loaded before any other options are considered using PgBouncer's include directive.

but, as before (with the databases config), there is NO specific indication of what those key-value pairs might be… What are the keys??

This is insanely frustrating… Been a week at this trying to get something that is normally incredibly easy to configure (without Percona)… pgbouncer, working…

… (some time later)

After more research… this all seems to be a CruncyData thing: Connection Pooling (crunchydata.com) … perhaps there are examples there!

… (some time later)

ok… Got it… The mistake was making my test user a SUPERUSER. Looks like superusers cannot use pgbouncer. If the user is not SUPERUSER then the CrunchyData code lets the user in… perhaps their code is doing the authenticating.

FYI anyone who’s reading… Percona is using the CrunchData PGO… so do check in thier docs for more recent info: Documentation | Enterprise PostgreSQL Support & Kubernetes Solutions | Crunchy Data

That just leaves the issue with DNS. @Slavisa_Milojkovic 's solution works but does require a bit of fiddling… So I am not marking as solved just yet.

1 Like

Hi everyone!
I hope my comment will be helpful.

I’ve changed two files:

  • pg-db/templates/cluster.yaml:
...
      customTLSSecret:
        name: {{ .Values.proxy.pgBouncer.customTLSSecret.name }}
      {{- end }}

      {{- if .Values.proxy.pgBouncer.config }}
      {{ print "config:" }}
            {{- toYaml .Values.proxy.pgBouncer.config | indent 8 }}
      {{- end }}

      {{- if .Values.proxy.pgBouncer.topologySpreadConstraints }}
...
  • values.yaml:
...
proxy:
  pgBouncer:
    config: 
    config:
      databases:
        '*': 'host=postgresql-pg-db-primary.<your_namespace>.svc.cluster.local. port=5432'
      global:
        ignore_startup_parameters: "extra_float_digits,search_path"
        pool_mode: "session"
...

And it works fine!

@tty8747 To address this issue, we need to reproduce it, but unfortunately, we’re unable to do so at the moment. Could you please provide more detailed information about your deployment, including the Kubernetes version and the platform you are using?