I started this journey thinking that the issue keeping PgBouncer from working was SSL-related, but I now think it is occurring because the Operator has configured PgBouncer to look for the primary database server at the host pg-cluster1-primary (the default would be cluster1-primary; we renamed the cluster to pg-cluster1 in our cr.yaml).
The [databases] section of the Operator-generated pgbouncer.ini file looks like this:
[databases]
* = host=pg-cluster1-primary port=5432
and the services that the Operator generated look like this:
❯ kubectl get services -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 5d4h <none>
pg-cluster1-ha ClusterIP 10.103.237.221 <none> 5432/TCP 7m20s <none>
pg-cluster1-ha-config ClusterIP None <none> <none> 7m20s <none>
pg-cluster1-pgbouncer ClusterIP 10.111.245.47 <none> 5432/TCP 7m19s postgres-operator.crunchydata.com/cluster=pg-cluster1,postgres-operator.crunchydata.com/role=pgbouncer
pg-cluster1-pods ClusterIP None <none> <none> 7m20s postgres-operator.crunchydata.com/cluster=pg-cluster1
pg-cluster1-primary ClusterIP None <none> 5432/TCP 7m20s <none>
pg-cluster1-replicas ClusterIP 10.104.216.214 <none> 5432/TCP 7m20s postgres-operator.crunchydata.com/cluster=pg-cluster1,postgres-operator.crunchydata.com/role=replica
Note that pg-cluster1-primary does not have a cluster IP (its ClusterIP is None).
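A service with ClusterIP None is a headless service: clients are expected to resolve the service name via DNS directly to pod IPs, so the missing cluster IP is not necessarily a problem by itself. What matters is whether the service has endpoints behind it. These diagnostic commands could confirm that (the busybox test pod is just a throwaway image, not part of the deployment):

```shell
# Does the headless primary service have any endpoints?
# If this shows no addresses, PgBouncer's DNS lookup for
# pg-cluster1-primary will fail even though the Service object exists.
kubectl get endpoints pg-cluster1-primary -o wide

# Resolve the name from inside the cluster using a throwaway pod:
kubectl run dns-test --rm -it --restart=Never --image=busybox -- \
  nslookup pg-cluster1-primary
```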
So when we try to connect via pg-cluster1-pgbouncer using:
$ PGPASSWORD='my-password' psql -h pg-cluster1-pgbouncer -U developer
psql: error: connection to server at "pg-cluster1-pgbouncer" (10.107.250.90), port 5432 failed: FATAL: server login has been failing, try again later (server_login_retry)
connection to server at "pg-cluster1-pgbouncer" (10.107.250.90), port 5432 failed: FATAL: SSL required
(It either hangs with no reply at all, or returns the above error.)
The pgbouncer log shows:
2023-09-05 00:21:22.182 UTC [7] LOG stats: 0 xacts/s, 0 queries/s, in 0 B/s, out 0 B/s, xact 0 us, query 0 us, wait 0 us
2023-09-05 00:21:24.939 UTC [7] LOG C-0x55b6fd6b8900: developer/(nouser)@10.244.0.40:53398 closing because: server login has been failing, try again later (server_login_retry) (age=0s)
2023-09-05 00:21:24.939 UTC [7] WARNING C-0x55b6fd6b8900: developer/(nouser)@10.244.0.40:53398 pooler error: server login has been failing, try again later (server_login_retry)
2023-09-05 00:21:24.940 UTC [7] LOG C-0x55b6fd6b8900: (nodb)/(nouser)@10.244.0.40:53414 closing because: SSL required (age=0s)
2023-09-05 00:21:24.940 UTC [7] WARNING C-0x55b6fd6b8900: (nodb)/(nouser)@10.244.0.40:53414 pooler error: SSL required
2023-09-05 00:21:24.943 UTC [7] WARNING DNS lookup failed: pg-cluster1-primary: result=0
2023-09-05 00:21:24.943 UTC [7] LOG S-0x55b6fd6c9540: developer/_crunchypgbouncer@(bad-af):0 closing because: server DNS lookup failed (age=0s)
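The "DNS lookup failed: pg-cluster1-primary: result=0" line appears to mean the name resolved to zero addresses. One way to confirm is to run the lookup from inside the PgBouncer pod itself (the pod name below is illustrative; substitute the real one, and note I am assuming getent is present in the pgbouncer image):

```shell
# Find the PgBouncer pod for this cluster
kubectl get pods -l postgres-operator.crunchydata.com/role=pgbouncer

# Resolve the primary service name from inside that pod
# (replace the pod name with the one returned above)
kubectl exec -it pg-cluster1-pgbouncer-6c9f8d7b4-abcde -- \
  getent hosts pg-cluster1-primary
```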
It shows an SSL error, but I think the actual issue is that PgBouncer cannot connect to the primary database server. We know the database IS working because these commands work fine:
$ PGPASSWORD='my-password' psql -h pg-cluster1-ha -U developer
psql (15.4 (Ubuntu 15.4-1.pgdg22.04+1), server 15.2 - Percona Distribution)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
$ PGPASSWORD='my-password' psql -h 10.244.0.30 -U developer
psql (15.4 (Ubuntu 15.4-1.pgdg22.04+1), server 15.2 - Percona Distribution)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
To test this, we tried modifying the PgBouncer ConfigMap to change the database host it connects to, but the Operator just resets its contents automatically.
The /docs/content/tutorial/connection-pooling.md file (in the repo) describes a way to change cr.yaml to inject other databases into pgbouncer.ini. It says:
There are several ways you can customize the configuration:
spec.proxy.pgBouncer.config.global
: Accepts key-value pairs that apply changes globally to PgBouncer.

spec.proxy.pgBouncer.config.databases
: Accepts key-value pairs that represent PgBouncer database definitions.
I believe this means that we can add a spec.proxy.pgBouncer.config.databases section to cr.yaml something like this:
spec:
  proxy:
    pgBouncer:
      config:
        databases:
          - developer: "host=pg-cluster1-ha port=5432"
but this did not work: applying cr.yaml did not generate a cluster at all, so I am fairly sure I got the syntax wrong.
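Since the docs describe these as "key-value pairs", my current guess (untested) is that databases should be a YAML map rather than a list of single-key maps, something like:

```yaml
spec:
  proxy:
    pgBouncer:
      config:
        databases:
          developer: host=pg-cluster1-ha port=5432
```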
The online docs do not show this option: Custom Resource options - Percona Operator for PostgreSQL
So… two questions:

- Why would the Operator point PgBouncer at the host pg-cluster1-primary but not set an IP for it in the service resource? Should the pg-cluster1-primary service have an IP?

- What is the correct syntax for configuration entries in spec.proxy.pgBouncer.config.databases?
I think this is related to a Helm deployment issue that I highlighted 9 months ago: Pgbouncer server DNS lookup failed - PostgreSQL - Percona Community Forum
We are using Operator 2.2.0 on minikube after following the instructions here: Install on Minikube - Percona Operator for PostgreSQL.
Our cr.yaml file looks like this:
apiVersion: pgv2.percona.com/v2
kind: PerconaPGCluster
metadata:
  name: pg-cluster1
#  finalizers:
#  - percona.com/delete-pvc
#  - percona.com/delete-ssl
spec:
  crVersion: 2.2.0
#  secrets:
#    customTLSSecret:
#      name: cluster1-cert
#    customReplicationTLSSecret:
#      name: replication1-cert
#  standby:
#    enabled: true
#    host: "<primary-ip>"
#    port: "<primary-port>"
#    repoName: repo1
#  openshift: true
  users:
    - name: developer
      databases:
        - developer
      options: "SUPERUSER"
      password:
        type: ASCII
      secretName: "pg-cluster1-developer-credentials"
#  databaseInitSQL:
#    key: init.sql
#    name: cluster1-init-sql
#  pause: true
#  unmanaged: true
#  dataSource:
#    postgresCluster:
#      clusterName: cluster1
#      repoName: repo1
#      options:
#      - --type=time
#      - --target="2021-06-09 14:15:11-04"
#    pgbackrest:
#      stanza: db
#      configuration:
#      - secret:
#          name: pgo-s3-creds
#      global:
#        repo1-path: /pgbackrest/postgres-operator/hippo/repo1
#      repo:
#        name: repo1
#        s3:
#          bucket: "my-bucket"
#          endpoint: "s3.ca-central-1.amazonaws.com"
#          region: "ca-central-1"
  image: percona/percona-postgresql-operator:2.2.0-ppg15-postgres
  imagePullPolicy: Always
  postgresVersion: 15
#  port: 5432
#  expose:
#    annotations:
#      my-annotation: value1
#    labels:
#      my-label: value2
#    type: LoadBalancer
  instances:
  - name: instance1
    replicas: 1
#    resources:
#      limits:
#        cpu: 2.0
#        memory: 4Gi
#
#    sidecars:
#    - name: testcontainer
#      image: mycontainer1:latest
#    - name: testcontainer2
#      image: mycontainer1:latest
#
#    topologySpreadConstraints:
#      - maxSkew: 1
#        topologyKey: my-node-label
#        whenUnsatisfiable: DoNotSchedule
#        labelSelector:
#          matchLabels:
#            postgres-operator.crunchydata.com/instance-set: instance1
#
#    tolerations:
#    - effect: NoSchedule
#      key: role
#      operator: Equal
#      value: connection-poolers
#
#    priorityClassName: high-priority
#
#    walVolumeClaimSpec:
#      accessModes:
#      - "ReadWriteOnce"
#      resources:
#        requests:
#          storage: 1Gi
#
    dataVolumeClaimSpec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
  proxy:
    pgBouncer:
      replicas: 1
      image: percona/percona-postgresql-operator:2.2.0-ppg15-pgbouncer
#      exposeSuperusers: true
#      resources:
#        limits:
#          cpu: 200m
#          memory: 128Mi
#
#      expose:
#        annotations:
#          my-annotation: value1
#        labels:
#          my-label: value2
#        type: LoadBalancer
#
#      affinity:
#        podAntiAffinity:
#          preferredDuringSchedulingIgnoredDuringExecution:
#          - weight: 1
#            podAffinityTerm:
#              labelSelector:
#                matchLabels:
#                  postgres-operator.crunchydata.com/cluster: keycloakdb
#                  postgres-operator.crunchydata.com/role: pgbouncer
#              topologyKey: kubernetes.io/hostname
#
#      tolerations:
#      - effect: NoSchedule
#        key: role
#        operator: Equal
#        value: connection-poolers
#
#      topologySpreadConstraints:
#        - maxSkew: 1
#          topologyKey: my-node-label
#          whenUnsatisfiable: ScheduleAnyway
#          labelSelector:
#            matchLabels:
#              postgres-operator.crunchydata.com/role: pgbouncer
#
#      sidecars:
#      - name: bouncertestcontainer1
#        image: mycontainer1:latest
#
#      customTLSSecret:
#        name: keycloakdb-pgbouncer.tls
#
#      config:
#        databases:
#          - developer: "host=pg-cluster1-ha port=5432"
#        global:
#          pool_mode: transaction
  backups:
    pgbackrest:
#      metadata:
#        labels:
      image: percona/percona-postgresql-operator:2.2.0-ppg15-pgbackrest
#      configuration:
#        - secret:
#            name: cluster1-pgbackrest-secrets
#      jobs:
#        priorityClassName: high-priority
#        resources:
#          limits:
#            cpu: 200m
#            memory: 128Mi
#        tolerations:
#        - effect: NoSchedule
#          key: role
#          operator: Equal
#          value: connection-poolers
#
#      global:
#        repo1-retention-full: "14"
#        repo1-retention-full-type: time
#        repo1-path: /pgbackrest/postgres-operator/cluster1/repo1
#        repo1-cipher-type: aes-256-cbc
#        repo1-s3-uri-style: path
#        repo2-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo2
#        repo3-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo3
#        repo4-path: /pgbackrest/postgres-operator/cluster1-multi-repo/repo4
#      repoHost:
#        priorityClassName: high-priority
#
#        topologySpreadConstraints:
#        - maxSkew: 1
#          topologyKey: my-node-label
#          whenUnsatisfiable: ScheduleAnyway
#          labelSelector:
#            matchLabels:
#              postgres-operator.crunchydata.com/pgbackrest: ""
#        affinity:
#          podAntiAffinity:
#            preferredDuringSchedulingIgnoredDuringExecution:
#            - weight: 1
#              podAffinityTerm:
#                labelSelector:
#                  matchLabels:
#                    postgres-operator.crunchydata.com/cluster: keycloakdb
#                    postgres-operator.crunchydata.com/role: pgbouncer
#                topologyKey: kubernetes.io/hostname
#
      manual:
        repoName: repo1
        options:
        - --type=full
      repos:
      - name: repo1
        schedules:
          full: "0 0 * * 6"
#          differential: "0 1 * * 1-6"
        volume:
          volumeClaimSpec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 1Gi
#      - name: repo2
#        s3:
#          bucket: "<YOUR_AWS_S3_BUCKET_NAME>"
#          endpoint: "<YOUR_AWS_S3_ENDPOINT>"
#          region: "<YOUR_AWS_S3_REGION>"
#      - name: repo3
#        gcs:
#          bucket: "<YOUR_GCS_BUCKET_NAME>"
#      - name: repo4
#        azure:
#          container: "<YOUR_AZURE_CONTAINER>"
#
#      restore:
#        enabled: true
#        repoName: repo1
#        options:
#        PITR restore in place
#        - --type=time
#        - --target="2021-06-09 14:15:11-04"
#        restore individual databases
#        - --db-include=hippo
  pmm:
    enabled: false
    image: percona/pmm-client:2.37.0
#    imagePullPolicy: IfNotPresent
    secret: pg-cluster1-pmm-secret
    serverHost: monitoring-service
  patroni:
    dynamicConfiguration:
      postgresql:
        parameters:
          max_parallel_workers: 2
          max_worker_processes: 2
          shared_buffers: 1GB
          work_mem: 2MB
        pg_hba:
          - host all all 0.0.0.0/0 md5
          - hostssl all all 0.0.0.0/0 md5
          - local all all trust
          - host all all ::1/128 md5
(Nothing too exotic: we just reduced replicas to 1, gave the cluster a unique name, and opened up Postgres access via the pg_hba settings in the patroni section.)