Request for guidance: pg_tde WAL encryption behavior with pgBackRest PITR

Hello team :waving_hand:

We are currently testing pg_tde with pgBackRest on Percona Distribution for PostgreSQL 18, and we would really appreciate your guidance to better understand the expected and supported behavior around WAL encryption and PITR.

Our environment
PostgreSQL 18.1
Percona Server for PostgreSQL 18.1.1
pg_tde 2.1
pgBackRest 2.57
Patroni-based HA cluster

What we are trying to achieve

Our goal is to have:
:white_check_mark: Data-at-rest encryption using pg_tde
:white_check_mark: Encrypted backups
:white_check_mark: Point-in-Time Recovery (PITR) using pgBackRest

What we observed during testing

We ran multiple controlled tests with the following setup:

shared_preload_libraries = 'pg_tde'
pg_tde.wal_encrypt = on

archive_command =
'/usr/lib/postgresql/18/bin/pg_tde_archive_decrypt %f %p "pgbackrest --stanza=cluster_1 archive-push %%p"'

restore_command =
'/usr/lib/postgresql/18/bin/pg_tde_restore_encrypt %f %p "pgbackrest --stanza=cluster_1 archive-get %%f %%p"'

Test flow

Insert data before backup
Take a full pgBackRest backup
Insert additional data
Perform pgBackRest PITR (target time before step 3)
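In shell terms, the flow is roughly the following (the table name and row counts are illustrative, not our exact workload):

```shell
# Illustrative sketch of our PITR test flow (table name and counts are arbitrary).
run_pitr_test_flow() {
    # 1. Insert data before the backup
    psql -c "CREATE TABLE pitr_test (id int);
             INSERT INTO pitr_test SELECT generate_series(1, 100);"
    # 2. Take a full pgBackRest backup
    pgbackrest --stanza=cluster_1 backup --type=full
    # Note the current timestamp -- this becomes the PITR target
    psql -c "SELECT now();"
    # 3. Insert additional data that should NOT survive the PITR
    psql -c "INSERT INTO pitr_test SELECT generate_series(101, 150);"
    # 4. Stop PostgreSQL, clear the data directory, then restore to the
    #    timestamp noted above (i.e. a target time before step 3)
    pgbackrest --stanza=cluster_1 restore --delta --type=time \
        --target='<timestamp noted above>' --target-action=promote
}
```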

Result we are seeing

Full backups restore correctly
WAL archiving and restore commands work

However, during PITR with WAL encryption enabled, WAL replay does not stop at the target time, or PostgreSQL fails during recovery with WAL-related errors. If we instead disable WAL encryption before taking the backup:

pg_tde.wal_encrypt = off

then PITR works consistently and as expected.

Our current understanding (please correct us if wrong)

Based on our testing, it appears that:

pgBackRest PITR may not be compatible with WAL encryption at this time

However, we are not fully sure if this is:

An expected limitation
A configuration mistake on our side
Or something that requires a different workflow

Where we need help :folded_hands:
We would really appreciate clarification on the three points above.

Hi @asad121,

Your archive_command and restore_command wrappers look correct. I reproduced your setup (PG 18.1, pg_tde 2.1.1, pgBackRest 2.57) and found two issues that explain the PITR failures.

Issue 1: async archiving is incompatible with pg_tde

Check whether pgBackRest async archiving is enabled (grep -i archive-async /etc/pgbackrest/pgbackrest.conf). pg_tde does not support it. pg_tde_archive_decrypt decrypts WAL to a temp directory in /dev/shm/, and pgBackRest async mode rejects any path outside pg1-path:

ASSERT: [025]: absolute path '/dev/shm/pg_tde_archive.../archive_status'
is not in base path '/data/db'

WAL never reaches the archive, so backups time out waiting for segments that will never arrive. If async is on, set archive-async=n in [global]. This is a documented limitation: Limitations of pg_tde - Percona Transparent Data Encryption for PostgreSQL
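For reference, the relevant setting lives in the `[global]` section of pgbackrest.conf; a minimal sketch (all other options elided):

```ini
# /etc/pgbackrest/pgbackrest.conf -- disable async archiving
[global]
archive-async=n
```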

Issue 2: pgBackRest overrides your restore_command wrapper

pgBackRest restore writes its own restore_command into postgresql.auto.conf. This is expected behavior, but it overrides your pg_tde_restore_encrypt wrapper because PostgreSQL uses last-value-wins for config entries. PG then reads plaintext WAL from the archive but expects encrypted WAL, producing:

invalid magic number 0DDA in WAL segment ...
FATAL: could not locate required checkpoint record

The fix is to pass the pg_tde wrapper via --recovery-option during restore:

pgbackrest --stanza=cluster_1 restore \
  --delta --type=time \
  --target='<your target time>' \
  --target-action=promote \
  --recovery-option=restore_command='/usr/lib/postgresql/18/bin/pg_tde_restore_encrypt %f %p "pgbackrest --stanza=cluster_1 archive-get %%f %%p"'

This tells pgBackRest to use your wrapper instead of its default. Alternatively, you can edit postgresql.auto.conf manually between restore and PG startup: remove the # Recovery settings generated by pgBackRest block and re-add with the pg_tde wrapper. With either approach, my PITR test correctly restored to the target time (100 rows inserted before backup, 50 after, got 100 back).
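If you go the manual route, a lighter-weight variant of that edit can be sketched as a small shell helper: rather than deleting the whole generated block, it swaps only restore_command and keeps the generated recovery targets. The file path in the usage line is an assumption; adjust it for your data directory:

```shell
# Sketch: swap pgBackRest's generated restore_command for the pg_tde wrapper.
# Run this between `pgbackrest restore` and starting PostgreSQL.
fix_auto_conf() {
    local conf="$1"   # e.g. /var/lib/postgresql/18/main/postgresql.auto.conf
    # Drop the plaintext restore_command that pgBackRest wrote
    sed -i '/^restore_command/d' "$conf"
    # Re-add restore_command pointing at the pg_tde wrapper
    cat >> "$conf" <<'EOF'
restore_command = '/usr/lib/postgresql/18/bin/pg_tde_restore_encrypt %f %p "pgbackrest --stanza=cluster_1 archive-get %%f %%p"'
EOF
}
# Usage: fix_auto_conf /var/lib/postgresql/18/main/postgresql.auto.conf
```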

Additional checks

If neither issue above matches your setup, verify data checksums (SHOW data_checksums;). The pg_tde docs recommend disabling them when using pgBackRest with WAL encryption, and Patroni enables checksums by default. Also confirm no key rotation happened between backup and target time (SELECT * FROM pg_tde_principal_key_info();).

Since you’re running Patroni, how are you triggering the PITR restore? Patroni manages postgresql.auto.conf and restore_command on its own, which could interact with both the pgBackRest override and the pg_tde wrapper.

Hi Anderson,

Thank you very much for replying to my query and for setting up the lab environment to test this.

Yes, I had read in the Percona documentation that async archiving is incompatible, so I did not enable it. I also did not set any custom value for it.

Secondly, I had also disabled data checksums, so that should not be causing the issue.

I tested again using your suggested command. However, when I restarted percona-patroni, it replayed all available WAL again and did not stop at the PITR target as expected.

After that, I manually started PostgreSQL using:

$ sudo -iu postgres /usr/lib/postgresql/18/bin/pg_ctl -D /var/lib/postgresql/18/main start

With this manual start, PITR completed successfully.

Could you please confirm whether percona-patroni pushes or overrides any recovery-related parameters during restart that might affect PITR behavior?

For reference, below are the exact steps I performed:

$ systemctl stop percona-patroni
$ rm -rf /var/lib/postgresql/18/main/*
$ sudo -iu postgres pgbackrest --stanza=cluster_1 restore \
    --delta --type=time \
    --target="2026-02-21 21:45:05+05" \
    --target-action=promote \
    --recovery-option=restore_command='/usr/lib/postgresql/18/bin/pg_tde_restore_encrypt %f %p "pgbackrest --stanza=cluster_1 archive-get %%f %%p"'

$ sudo -iu postgres /usr/lib/postgresql/18/bin/pg_ctl -D /var/lib/postgresql/18/main start

I would appreciate your clarification on this behavior.

Hi @asad121,

Good news: your pg_ctl result confirms the --recovery-option workaround works. The reason Patroni behaves differently is that on every managed startup it runs _sanitize_auto_conf(), which strips all recovery parameters (restore_command, recovery_target_time, recovery_target_action, etc.) from postgresql.auto.conf and writes its own into postgresql.conf from the DCS config. I confirmed this by tracing the code in Patroni’s config.py and reproducing it on a single-node cluster with pg_tde 2.1, pgBackRest 2.57, and etcd.
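You can watch this happen by listing the recovery parameters in postgresql.auto.conf right after the restore and again after Patroni starts PostgreSQL; a tiny helper (the path in the usage line is an assumption) might look like:

```shell
# Sketch: show which recovery parameters are present in a config file.
# Run after `pgbackrest restore`, then again after a Patroni-managed start;
# on the Patroni-managed start the entries disappear from postgresql.auto.conf.
list_recovery_settings() {
    grep -E '^(restore_command|recovery_target)' "$1" \
        || echo "(no recovery settings found)"
}
# Usage: list_recovery_settings /var/lib/postgresql/18/main/postgresql.auto.conf
```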

For restore_command specifically, you can make it persistent by adding it to the DCS dynamic config:

patronictl edit-config <scope> --force \
  -s 'postgresql.recovery_conf.restore_command=/usr/lib/postgresql/18/bin/pg_tde_restore_encrypt %f %p "pgbackrest --stanza=cluster_1 archive-get %%f %%p"'

This ensures the pg_tde wrapper is used whenever Patroni sets up recovery (e.g. for replicas). However, Patroni’s recovery build logic intentionally filters out PITR targets like recovery_target_time and recovery_target_action, even if you add them to the DCS. There is no patronictl command for ad-hoc PITR on an existing cluster. When Patroni starts PostgreSQL after your restore, it replays all available WAL without a target and promotes immediately, which is why you got a full replay instead of stopping at your target time.

Workaround: use Patroni’s pause mode and pg_ctl for the recovery phase, then hand back to Patroni:

# 1. Pause Patroni HA (prevents automatic actions)
patronictl pause <scope>

# 2. Stop PostgreSQL
pg_ctl -D /var/lib/postgresql/18/main stop -m fast

# 3. Restore with pg_tde wrapper
pgbackrest --stanza=cluster_1 restore \
  --delta --type=time \
  --target="2026-02-21 21:45:05+05" \
  --target-action=promote \
  --recovery-option=restore_command='/usr/lib/postgresql/18/bin/pg_tde_restore_encrypt %f %p "pgbackrest --stanza=cluster_1 archive-get %%f %%p"'

# 4. Start PG directly (recovery completes, PG promotes at target)
pg_ctl -D /var/lib/postgresql/18/main start

# 5. After recovery finishes and PG is accepting connections:
pg_ctl -D /var/lib/postgresql/18/main stop -m fast

# 6. Resume Patroni
patronictl resume <scope>

Steps 4-5 are the key difference from your previous attempt. PostgreSQL must complete recovery and promote before Patroni takes over, otherwise Patroni strips the targets and promotes prematurely. While paused, Patroni will not try to start or stop PostgreSQL, so manual pg_ctl operations are safe.

If your Patroni cluster has replicas, they will need patronictl reinit after the PITR completes because the timeline will have diverged. Also make sure the cluster is fully paused before starting manual operations to avoid split-brain.

I tested this full flow (100 rows inserted before backup, 50 after, PITR to before the second insert) and confirmed 100 rows after recovery, with Patroni cleanly taking over as leader afterward.


Hi Anderson,

Thank you very much for understanding my query and providing a solution accordingly.

By following the steps you shared, I was able to successfully perform the PITR restore on the cluster. The recovery completed properly, and PostgreSQL promoted at the correct target point without any issues.

After that, I reinitialized the replicas using the following commands:

patronictl -c /etc/patroni/patroni.yml reinit cluster_1 node2
patronictl -c /etc/patroni/patroni.yml reinit cluster_1 node3

Both replicas are now synchronized and following the new timeline correctly. The cluster is healthy and working as expected.

Thank you again for your guidance and support — it was greatly appreciated.
