I wonder if the error message might be something of a red herring.
I’ve just completely rebuilt the Postgres cluster using this configuration:
apiVersion: pgv2.percona.com/v2
kind: PerconaPGCluster
metadata:
  name: pg-cluster1
#  finalizers:
#  - percona.com/delete-pvc
#  - percona.com/delete-ssl
spec:
  crVersion: 2.2.0
#  secrets:
#    customTLSSecret:
#      name: cluster1-cert
#    customReplicationTLSSecret:
#      name: replication1-cert
#  standby:
#    enabled: true
#    host: "<primary-ip>"
#    port: "<primary-port>"
#    repoName: repo1
#  openshift: true
  users:
    - name: developer
      databases:
        - developer
      options: "SUPERUSER"
      password:
        type: ASCII
      secretName: "pg-cluster1-developer-credentials"
#  databaseInitSQL:
#    key: init.sql
#    name: cluster1-init-sql
#  pause: true
#  unmanaged: true
#  dataSource:
#    postgresCluster:
#      clusterName: cluster1
#      repoName: repo1
#      options:
#      - --type=time
#      - --target="2021-06-09 14:15:11-04"
#    pgbackrest:
#      stanza: db
#      configuration:
#      - secret:
#          name: pgo-s3-creds
#      global:
#        repo1-path: /pgbackrest/postgres-operator/hippo/repo1
#      repo:
#        name: repo1
#        s3:
#          bucket: "my-bucket"
#          endpoint: "s3.ca-central-1.amazonaws.com"
#          region: "ca-central-1"
  image: percona/percona-postgresql-operator:2.2.0-ppg15-postgres
  imagePullPolicy: Always
  postgresVersion: 15
#  port: 5432
#  expose:
#    annotations:
#      my-annotation: value1
#    labels:
#      my-label: value2
#    type: LoadBalancer
  instances:
  - name: instance1
    replicas: 1
#    resources:
#      limits:
#        cpu: 2.0
#        memory: 4Gi
#
#    sidecars:
#    - name: testcontainer
#      image: mycontainer1:latest
#    - name: testcontainer2
#      image: mycontainer1:latest
#
#    topologySpreadConstraints:
#    - maxSkew: 1
#      topologyKey: my-node-label
#      whenUnsatisfiable: DoNotSchedule
#      labelSelector:
#        matchLabels:
#          postgres-operator.crunchydata.com/instance-set: instance1
#
#    tolerations:
#    - effect: NoSchedule
#      key: role
#      operator: Equal
#      value: connection-poolers
#
#    priorityClassName: high-priority
#
#    walVolumeClaimSpec:
#      accessModes:
#      - "ReadWriteOnce"
#      resources:
#        requests:
#          storage: 1Gi
#
    dataVolumeClaimSpec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
  proxy:
    pgBouncer:
      replicas: 1
      image: percona/percona-postgresql-operator:2.2.0-ppg15-pgbouncer
#      exposeSuperusers: true
#      resources:
#        limits:
#          cpu: 200m
#          memory: 128Mi
#
#      expose:
#        annotations:
#          my-annotation: value1
#        labels:
#          my-label: value2
#        type: LoadBalancer
#
#      affinity:
#        podAntiAffinity:
#          preferredDuringSchedulingIgnoredDuringExecution:
#          - weight: 1
#            podAffinityTerm:
#              labelSelector:
#                matchLabels:
#                  postgres-operator.crunchydata.com/cluster: keycloakdb
#                  postgres-operator.crunchydata.com/role: pgbouncer
#              topologyKey: kubernetes.io/hostname
#
#      tolerations:
#      - effect: NoSchedule
#        key: role
#        operator: Equal
#        value: connection-poolers
#
#      topologySpreadConstraints:
#      - maxSkew: 1
#        topologyKey: my-node-label
#        whenUnsatisfiable: ScheduleAnyway
#        labelSelector:
#          matchLabels:
#            postgres-operator.crunchydata.com/role: pgbouncer
#
#      sidecars:
#      - name: bouncertestcontainer1
#        image: mycontainer1:latest
#
#      customTLSSecret:
#        name: keycloakdb-pgbouncer.tls
#
#      config:
#        global:
#          pool_mode: transaction
  backups:
    pgbackrest:
#      metadata:
#        labels:
      image: percona/percona-postgresql-operator:2.2.0-ppg15-pgbackrest
#      configuration:
#      - secret:
#          name: cluster1-pgbackrest-secrets
#      jobs:
#        priorityClassName: high-priority
#        resources:
#          limits:
#            cpu: 200m
#            memory: 128Mi
#        tolerations:
#        - effect: NoSchedule
#          key: role
#          operator: Equal
#          value: connection-poolers
#
#      global:
#        repo1-retention-full: "14"
#        repo1-retention-full-type: time
#        repo1-path: /pgbackrest/postgres-operator/cluster1/repo1
#        repo1-cipher-type: aes-256-cbc
#        repo1-s3-uri-style: path
#        repo2-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo2
#        repo3-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo3
#        repo4-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo4
#      repoHost:
#        priorityClassName: high-priority
#
#        topologySpreadConstraints:
#        - maxSkew: 1
#          topologyKey: my-node-label
#          whenUnsatisfiable: ScheduleAnyway
#          labelSelector:
#            matchLabels:
#              postgres-operator.crunchydata.com/pgbackrest: ""
#        affinity:
#          podAntiAffinity:
#            preferredDuringSchedulingIgnoredDuringExecution:
#            - weight: 1
#              podAffinityTerm:
#                labelSelector:
#                  matchLabels:
#                    postgres-operator.crunchydata.com/cluster: keycloakdb
#                    postgres-operator.crunchydata.com/role: pgbouncer
#                topologyKey: kubernetes.io/hostname
#
      manual:
        repoName: repo1
        options:
        - --type=full
      repos:
      - name: repo1
        schedules:
          full: "0 0 * * 6"
#          differential: "0 1 * * 1-6"
        volume:
          volumeClaimSpec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 1Gi
#      - name: repo2
#        s3:
#          bucket: "<YOUR_AWS_S3_BUCKET_NAME>"
#          endpoint: "<YOUR_AWS_S3_ENDPOINT>"
#          region: "<YOUR_AWS_S3_REGION>"
#      - name: repo3
#        gcs:
#          bucket: "<YOUR_GCS_BUCKET_NAME>"
#      - name: repo4
#        azure:
#          container: "<YOUR_AZURE_CONTAINER>"
#
#      restore:
#        enabled: true
#        repoName: repo1
#        options:
#        # PITR restore in place
#        - --type=time
#        - --target="2021-06-09 14:15:11-04"
#        # restore individual databases
#        - --db-include=hippo
  pmm:
    enabled: false
    image: percona/pmm-client:2.37.0
#    imagePullPolicy: IfNotPresent
    secret: pg-cluster1-pmm-secret
    serverHost: monitoring-service
  patroni:
    dynamicConfiguration:
      postgresql:
        parameters:
          max_parallel_workers: 2
          max_worker_processes: 2
          shared_buffers: 1GB
          work_mem: 2MB
        pg_hba:
        - host all all 0.0.0.0/0 md5
        - hostssl all all 0.0.0.0/0 md5
        - local all all trust
        - host all all ::1/128 md5
(Nothing too exotic… I just reduced replicas to 1, gave the cluster a unique name, and opened up Postgres access using pg_hba.conf settings in the patroni section.)
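In case it matters, this is roughly how I apply the manifest and confirm the cluster comes up before testing connections. The file name is just whatever I saved the YAML above as, and the label selector is the one the operator appears to put on cluster pods (both assumptions on my part):

$ kubectl apply -f pg-cluster1.yaml
$ kubectl get perconapgcluster pg-cluster1
$ kubectl get pods -l postgres-operator.crunchydata.com/cluster=pg-cluster1

Nothing looks obviously wrong at that level.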
And the first time I try to connect through PgBouncer on its pod IP, as follows:
$ psql -h 10.244.0.59 -U developer
I get this in the PgBouncer log:
2023-09-04 17:00:25.046 UTC [7] LOG C-0x55e7d94d3900: (nodb)/(nouser)@10.244.0.40:41480 registered new auto-database: db=developer
2023-09-04 17:00:25.050 UTC [7] WARNING DNS lookup failed: pg-cluster1-primary: result=0
2023-09-04 17:00:25.050 UTC [7] LOG S-0x55e7d94e4670: developer/_crunchypgbouncer@(bad-af):0 closing because: server DNS lookup failed (age=0s)
2023-09-04 17:00:40.110 UTC [7] WARNING DNS lookup failed: pg-cluster1-primary: result=0
2023-09-04 17:00:40.110 UTC [7] LOG S-0x55e7d94e4670: developer/_crunchypgbouncer@(bad-af):0 closing because: server DNS lookup failed (age=0s)
2023-09-04 17:00:55.104 UTC [7] LOG S-0x55e7d94e4670: developer/_crunchypgbouncer@(bad-af):0 closing because: server DNS lookup failed (age=0s)
2023-09-04 17:01:00.972 UTC [7] LOG C-0x55e7d94d3b60: developer/(nouser)@10.244.0.40:56204 closing because: server login has been failing, try again later (server_login_retry) (age=0s)
2023-09-04 17:01:00.972 UTC [7] WARNING C-0x55e7d94d3b60: developer/(nouser)@10.244.0.40:56204 pooler error: server login has been failing, try again later (server_login_retry)
2023-09-04 17:01:00.973 UTC [7] LOG C-0x55e7d94d3b60: (nodb)/(nouser)@10.244.0.40:56206 closing because: SSL required (age=0s)
2023-09-04 17:01:00.973 UTC [7] WARNING C-0x55e7d94d3b60: (nodb)/(nouser)@10.244.0.40:56206 pooler error: SSL required
2023-09-04 17:01:10.448 UTC [7] WARNING DNS lookup failed: pg-cluster1-primary: result=0
2023-09-04 17:01:10.448 UTC [7] LOG S-0x55e7d94e4670: developer/_crunchypgbouncer@(bad-af):0 closing because: server DNS lookup failed (age=0s)
I still get exactly the same error if I connect via the pgbouncer Service. So… same issue as before… very strange.
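Since the log keeps complaining about resolving pg-cluster1-primary, the next thing I plan to check is whether that Service actually exists and has endpoints, and (assuming the pgbouncer image ships nslookup, which I haven't verified) whether the name resolves from inside the pod itself. The pod name below is just a placeholder:

$ kubectl get svc,endpoints pg-cluster1-primary
$ kubectl exec -it <pgbouncer-pod> -- nslookup pg-cluster1-primary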
Could the SSL error be a red herring? Perhaps it’s the DNS issue that’s causing the problem?
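One way I can think of to separate the two is to point psql straight at the primary, bypassing PgBouncer, with SSL explicitly required; if that connection works, the "SSL required" lines are presumably just noise and the DNS lookup failure is the real problem. The host value below is a placeholder for the primary Service's ClusterIP (or its DNS name, if it resolves):

$ psql "host=<pg-cluster1-primary ClusterIP> sslmode=require user=developer dbname=developer"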
Perhaps this is related to the issue that I reported earlier this year: Pgbouncer server DNS lookup failed - PostgreSQL - Percona Community Forum
Any/all suggestions welcome!