Pgbouncer not finding cluster1-primary

I started this journey thinking that the problem keeping pgbouncer from working was SSL-related, but I now think it is happening because the cluster has configured pgbouncer to look for the primary database server at the hostname pg-cluster1-primary (we renamed the cluster from the default cluster1 to pg-cluster1 in our cr.yaml).

The [databases] section of the Operator-generated pgbouncer.ini file looks like this:

    [databases]
    * = host=pg-cluster1-primary port=5432

and the Services that the Operator generated look like this:

❯ kubectl get services -o wide
NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE     SELECTOR
kubernetes              ClusterIP   10.96.0.1        <none>        443/TCP    5d4h    <none>
pg-cluster1-ha          ClusterIP   10.103.237.221   <none>        5432/TCP   7m20s   <none>
pg-cluster1-ha-config   ClusterIP   None             <none>        <none>     7m20s   <none>
pg-cluster1-pgbouncer   ClusterIP   10.111.245.47    <none>        5432/TCP   7m19s   postgres-operator.crunchydata.com/cluster=pg-cluster1,postgres-operator.crunchydata.com/role=pgbouncer
pg-cluster1-pods        ClusterIP   None             <none>        <none>     7m20s   postgres-operator.crunchydata.com/cluster=pg-cluster1
pg-cluster1-primary     ClusterIP   None             <none>        5432/TCP   7m20s   <none>
pg-cluster1-replicas    ClusterIP   10.104.216.214   <none>        5432/TCP   7m20s   postgres-operator.crunchydata.com/cluster=pg-cluster1,postgres-operator.crunchydata.com/role=replica

Note that pg-cluster1-primary does not have an IP address.
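For what it's worth, a CLUSTER-IP of None means pg-cluster1-primary is a headless Service. Roughly what the Operator seems to create (my reconstruction, not a dump of the actual manifest):

    # Sketch of pg-cluster1-primary as a headless Service. clusterIP: None
    # is deliberate: DNS for a headless Service is supposed to resolve the
    # name directly to the backing endpoint IPs, so the missing cluster IP
    # only becomes fatal if the Endpoints behind it are also empty, which
    # would line up with the "DNS lookup failed" error in the pgbouncer
    # log shown below.
    apiVersion: v1
    kind: Service
    metadata:
      name: pg-cluster1-primary
    spec:
      clusterIP: None   # headless: no virtual IP is allocated on purpose
      ports:
      - name: postgres
        port: 5432
        targetPort: 5432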

So when we try to connect via pg-cluster1-pgbouncer using:

$ PGPASSWORD='my-password' psql -h pg-cluster1-pgbouncer -U developer
psql: error: connection to server at "pg-cluster1-pgbouncer" (10.107.250.90), port 5432 failed: FATAL:  server login has been failing, try again later (server_login_retry)
connection to server at "pg-cluster1-pgbouncer" (10.107.250.90), port 5432 failed: FATAL:  SSL required

(It either freezes with no reply at all, or returns the above error.)

The pgbouncer log shows:

2023-09-05 00:21:22.182 UTC [7] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2023-09-05 00:21:24.939 UTC [7] LOG C-0x55b6fd6b8900: developer/(nouser)@10.244.0.40:53398 closing because: server login has been failing, try again later (server_login_retry) (age=0s)
2023-09-05 00:21:24.939 UTC [7] WARNING C-0x55b6fd6b8900: developer/(nouser)@10.244.0.40:53398 pooler error: server login has been failing, try again later (server_login_retry)
2023-09-05 00:21:24.940 UTC [7] LOG C-0x55b6fd6b8900: (nodb)/(nouser)@10.244.0.40:53414 closing because: SSL required (age=0s)
2023-09-05 00:21:24.940 UTC [7] WARNING C-0x55b6fd6b8900: (nodb)/(nouser)@10.244.0.40:53414 pooler error: SSL required
2023-09-05 00:21:24.943 UTC [7] WARNING DNS lookup failed: pg-cluster1-primary: result=0
2023-09-05 00:21:24.943 UTC [7] LOG S-0x55b6fd6c9540: developer/_crunchypgbouncer@(bad-af):0 closing because: server DNS lookup failed (age=0s)

It shows an SSL error, but I think the actual issue is that pgbouncer cannot reach a primary database server. We know that the database IS working, because these commands work fine:

$ PGPASSWORD='my-password' psql -h pg-cluster1-ha -U developer
psql (15.4 (Ubuntu 15.4-1.pgdg22.04+1), server 15.2 - Percona Distribution)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
$ PGPASSWORD='my-password' psql -h 10.244.0.30 -U developer
psql (15.4 (Ubuntu 15.4-1.pgdg22.04+1), server 15.2 - Percona Distribution)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)

To test this theory we tried to modify the pgbouncer ConfigMap to change the database it connects to, but the Operator just resets its contents automatically.

The /docs/content/tutorial/connection-pooling.md file (in the repo) describes a way to change cr.yaml to inject other databases into pgbouncer.ini. It says:

There are several ways you can customize the configuration:

  • spec.proxy.pgBouncer.config.global: Accepts key-value pairs that apply changes globally to PgBouncer.
  • spec.proxy.pgBouncer.config.databases: Accepts key-value pairs that represent PgBouncer database definitions.

I believe this means that we can add a spec.proxy.pgBouncer.config.databases section to cr.yaml, something like this:

spec:
  proxy:
    pgBouncer:
      config:
        databases:
        - developer: "host=pg-cluster1-ha port=5432"

but this did not work. Applying cr.yaml did not generate a cluster, so I am sure that I got the syntax wrong.
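My best guess, going by the docs' "key-value pairs" wording, is that databases should be a map rather than a list. A sketch under that assumption (untested):

    spec:
      proxy:
        pgBouncer:
          config:
            databases:
              # Guess: key = database name as seen by clients,
              # value = a pgbouncer.ini-style connection string.
              developer: "host=pg-cluster1-ha port=5432"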

The online docs do not show this option: Custom Resource options - Percona Operator for PostgreSQL

So… 2 questions:

  1. Why would the Operator inject the hostname pg-cluster1-primary into pgbouncer but not set an IP for it in the Service resource? Should the pg-cluster1-primary Service have an IP?

  2. What is the correct syntax for configuration entries in spec.proxy.pgBouncer.config.databases?

I think that this is related to a helm deployment issue that I highlighted 9 months ago: Pgbouncer server DNS lookup failed - PostgreSQL - Percona Community Forum

We are using Operator 2.2.0 on minikube after following the instructions here: Install on Minikube - Percona Operator for PostgreSQL.

Our cr.yaml file looks like this:

apiVersion: pgv2.percona.com/v2
kind: PerconaPGCluster
metadata:
  name: pg-cluster1
#  finalizers:
#  - percona.com/delete-pvc
#  - percona.com/delete-ssl
spec:
  crVersion: 2.2.0
#  secrets:
#    customTLSSecret:
#      name: cluster1-cert
#    customReplicationTLSSecret:
#      name: replication1-cert

#  standby:
#    enabled: true
#    host: "<primary-ip>"
#    port: "<primary-port>"
#    repoName: repo1

#  openshift: true

  users:
  - name: developer
    databases:
    - developer
    options: "SUPERUSER"
    password:
      type: ASCII
    secretName: "pg-cluster1-developer-credentials"

#  databaseInitSQL:
#    key: init.sql
#    name: cluster1-init-sql

#  pause: true
#  unmanaged: true

#  dataSource:
#    postgresCluster:
#      clusterName: cluster1
#      repoName: repo1
#      options:
#      - --type=time
#      - --target="2021-06-09 14:15:11-04"
#    pgbackrest:
#      stanza: db
#      configuration:
#      - secret:
#          name: pgo-s3-creds
#      global:
#        repo1-path: /pgbackrest/postgres-operator/hippo/repo1
#      repo:
#        name: repo1
#        s3:
#          bucket: "my-bucket"
#          endpoint: "s3.ca-central-1.amazonaws.com"
#          region: "ca-central-1"

  image: percona/percona-postgresql-operator:2.2.0-ppg15-postgres
  imagePullPolicy: Always
  postgresVersion: 15
#  port: 5432

#  expose:
#    annotations:
#      my-annotation: value1
#    labels:
#      my-label: value2
#    type: LoadBalancer

  instances:
  - name: instance1
    replicas: 1
#    resources:
#      limits:
#        cpu: 2.0
#        memory: 4Gi
#
#    sidecars:
#    - name: testcontainer
#      image: mycontainer1:latest
#    - name: testcontainer2
#      image: mycontainer1:latest
#
#    topologySpreadConstraints:
#      - maxSkew: 1
#        topologyKey: my-node-label
#        whenUnsatisfiable: DoNotSchedule
#        labelSelector:
#          matchLabels:
#            postgres-operator.crunchydata.com/instance-set: instance1
#
#    tolerations:
#    - effect: NoSchedule
#      key: role
#      operator: Equal
#      value: connection-poolers
#
#    priorityClassName: high-priority
#
#    walVolumeClaimSpec:
#       accessModes:
#       - "ReadWriteOnce"
#       resources:
#         requests:
#           storage: 1Gi
#
    dataVolumeClaimSpec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi

  proxy:
    pgBouncer:
      replicas: 1
      image: percona/percona-postgresql-operator:2.2.0-ppg15-pgbouncer
#      exposeSuperusers: true
#      resources:
#        limits:
#          cpu: 200m
#          memory: 128Mi
#
#      expose:
#        annotations:
#          my-annotation: value1
#        labels:
#          my-label: value2
#        type: LoadBalancer
#
#      affinity:
#        podAntiAffinity:
#          preferredDuringSchedulingIgnoredDuringExecution:
#          - weight: 1
#            podAffinityTerm:
#              labelSelector:
#                matchLabels:
#                  postgres-operator.crunchydata.com/cluster: keycloakdb
#                  postgres-operator.crunchydata.com/role: pgbouncer
#              topologyKey: kubernetes.io/hostname
#
#      tolerations:
#      - effect: NoSchedule
#        key: role
#        operator: Equal
#        value: connection-poolers
#
#      topologySpreadConstraints:
#        - maxSkew: 1
#          topologyKey: my-node-label
#          whenUnsatisfiable: ScheduleAnyway
#          labelSelector:
#            matchLabels:
#              postgres-operator.crunchydata.com/role: pgbouncer
#
#      sidecars:
#      - name: bouncertestcontainer1
#        image: mycontainer1:latest
#
#      customTLSSecret:
#        name: keycloakdb-pgbouncer.tls
#
#      config:
#        databases:
#        - developer: "host=pg-cluster1-ha port=5432"

#        global:
#          pool_mode: transaction

  backups:
    pgbackrest:
#      metadata:
#        labels:
      image: percona/percona-postgresql-operator:2.2.0-ppg15-pgbackrest
#      configuration:
#        - secret:
#            name: cluster1-pgbackrest-secrets
#      jobs:
#        priorityClassName: high-priority
#        resources:
#          limits:
#            cpu: 200m
#            memory: 128Mi
#        tolerations:
#        - effect: NoSchedule
#          key: role
#          operator: Equal
#          value: connection-poolers
#
#      global:
#        repo1-retention-full: "14"
#        repo1-retention-full-type: time
#        repo1-path: /pgbackrest/postgres-operator/cluster1/repo1
#        repo1-cipher-type: aes-256-cbc
#        repo1-s3-uri-style: path
#        repo2-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo2
#        repo3-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo3
#        repo4-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo4
#      repoHost:
#        priorityClassName: high-priority
#
#        topologySpreadConstraints:
#        - maxSkew: 1
#          topologyKey: my-node-label
#          whenUnsatisfiable: ScheduleAnyway
#          labelSelector:
#            matchLabels:
#              postgres-operator.crunchydata.com/pgbackrest: ""
#        affinity:
#          podAntiAffinity:
#            preferredDuringSchedulingIgnoredDuringExecution:
#            - weight: 1
#              podAffinityTerm:
#                labelSelector:
#                  matchLabels:
#                    postgres-operator.crunchydata.com/cluster: keycloakdb
#                    postgres-operator.crunchydata.com/role: pgbouncer
#                topologyKey: kubernetes.io/hostname
#
      manual:
        repoName: repo1
        options:
         - --type=full
      repos:
      - name: repo1
        schedules:
          full: "0 0 * * 6"
#          differential: "0 1 * * 1-6"
        volume:
          volumeClaimSpec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 1Gi
#      - name: repo2
#        s3:
#          bucket: "<YOUR_AWS_S3_BUCKET_NAME>"
#          endpoint: "<YOUR_AWS_S3_ENDPOINT>"
#          region: "<YOUR_AWS_S3_REGION>"
#      - name: repo3
#        gcs:
#          bucket: "<YOUR_GCS_BUCKET_NAME>"
#      - name: repo4
#        azure:
#          container: "<YOUR_AZURE_CONTAINER>"
#
#    restore:
#      enabled: true
#      repoName: repo1
#      options:
#       PITR restore in place
#       - --type=time
#       - --target="2021-06-09 14:15:11-04"
#       restore individual databases
#       - --db-include=hippo

  pmm:
    enabled: false
    image: percona/pmm-client:2.37.0
#    imagePullPolicy: IfNotPresent
    secret: pg-cluster1-pmm-secret
    serverHost: monitoring-service

  patroni:
    dynamicConfiguration:
      postgresql:
        parameters:
          max_parallel_workers: 2
          max_worker_processes: 2
          shared_buffers: 1GB
          work_mem: 2MB
        pg_hba:
        - host    all all 0.0.0.0/0 md5
        - hostssl all all 0.0.0.0/0 md5
        - local   all all trust
        - host    all all ::1/128   md5

(Nothing too exotic… we just reduced replicas to 1, gave the cluster a unique name, and opened up Postgres via the pg_hba settings in the patroni section.)

For anyone who stumbles upon this thread… a clever hack was suggested by Slavisa_Milojkovic for question 1 (why does the Operator inject the hostname pg-cluster1-primary into pgbouncer but not set an IP for it in the Service resource?).

You can create a CoreDNS rewrite rule that redirects lookups for pg-cluster1-primary to pg-cluster1-ha… See here: Custom DNS Entries For Kubernetes
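The gist of the hack, as I understand it (a sketch only: it assumes the cluster runs in the default namespace, and everything apart from the rewrite line should match whatever is already in the coredns ConfigMap in kube-system):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: coredns
      namespace: kube-system
    data:
      Corefile: |
        .:53 {
            errors
            health
            # The only new line: answer lookups for the endpoint-less
            # primary Service with the -ha Service's record instead.
            rewrite name pg-cluster1-primary.default.svc.cluster.local pg-cluster1-ha.default.svc.cluster.local
            kubernetes cluster.local in-addr.arpa ip6.arpa {
                pods insecure
                fallthrough in-addr.arpa ip6.arpa
            }
            forward . /etc/resolv.conf
            cache 30
            loop
            reload
            loadbalance
        }

With the reload plugin present, CoreDNS should pick the change up on its own after a short delay.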

For more info, see here: Pgbouncer server DNS lookup failed - PostgreSQL - Percona Community Forum

Also note that pgbouncer as rolled out here WILL NOT bounce SUPERUSERs (the commented-out exposeSuperusers: true option in cr.yaml above looks like the relevant switch, though I have not verified it).

I have checked here, on the CrunchyData forum, and elsewhere on the internet, but there is still no real solution, nor a reason why pg-cluster1-primary does not have an IP address. Percona… what do you have to say?

Question 2 (What is the correct syntax for configuration entries in spec.proxy.pgBouncer.config.databases?) is still open and something I would like an answer to.

A related question 3: What is the correct syntax for configuration entries in spec.proxy.pgBouncer.config.users? Apparently we can define per-user settings here too, but there are no usable examples; my guess is sketched below.
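If config.users parallels config.databases (key-value pairs rendered into the [users] section of pgbouncer.ini), I would expect something like this (a guess, untested; the per-user settings shown are standard pgbouncer [users] options):

    spec:
      proxy:
        pgBouncer:
          config:
            users:
              # Guess: key = user name, value = per-user pgbouncer settings.
              developer: "pool_mode=session max_user_connections=10"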

Come on Percona… a little help would be great!

For now, I need to move on to other things… A week is already more time than I have to spend on this.