On my production server, the backup sometimes fails with the messages below. The backup is scheduled on the standby node (streaming replication is configured in my environment). No other process is running during the backup window; it is a nightly cron job on an Ubuntu machine.
2020-07-01 05:21:07.567 CEST  postgres@DBname LOG: process 27925 still waiting for AccessShareLock on relation 2610 of database 17948 after 1000.096 ms
2020-07-01 05:21:07.567 CEST  postgres@DBname DETAIL: Process holding the lock: 25802. Wait queue: 1559, 27925.
2020-07-01 05:21:17.120 CEST  postgres@DBname ERROR: canceling statement due to conflict with recovery
2020-07-01 05:21:17.120 CEST  postgres@DBname DETAIL: User was holding a relation lock for too long.
2020-07-01 05:21:17.120 CEST  postgres@DBname STATEMENT: COPY public.tablename(id, col1,col2,col3, geom) TO stdout;
2020-07-01 05:21:17.127 CEST  postgres@DBname LOG: process 27925 acquired AccessShareLock on relation 2610 of database 17948 after 10560.447 ms
Relation 2610 is pg_index. I tried locking this table and reproducing the error, but I didn't get any error.
Has anyone faced such an issue? Any hint/fix is appreciated.
This generally happens when a vacuum on the primary cleans up tuples that are still required, under the consistency rules, by queries running on the standby. PostgreSQL allows the standby to send feedback to the primary so that this cleanup can be prevented. Please try setting the parameter
hot_standby_feedback = on
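A minimal sketch of applying this on the standby, assuming you can change settings via ALTER SYSTEM (available since 9.4, so fine on v10/v11). Note that hot_standby_feedback is reloadable, so no restart is needed; the trade-off is that the primary retains dead tuples needed by standby queries, which can cause some bloat there.

```sql
-- On the standby: ask the primary to keep tuples that standby queries still need.
ALTER SYSTEM SET hot_standby_feedback = on;
SELECT pg_reload_conf();     -- reloadable parameter, no restart required
SHOW hot_standby_feedback;   -- verify the new value took effect
```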
You may also adjust max_standby_archive_delay and max_standby_streaming_delay. These values are specified in milliseconds and control how long the standby will delay applying conflicting WAL before cancelling the standby query.
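For example (the values here are illustrative, not recommendations; -1 means wait forever, which keeps the backup alive at the cost of replication lag while the dump runs):

```sql
-- On the standby: let WAL apply wait longer before cancelling conflicting queries.
ALTER SYSTEM SET max_standby_streaming_delay = '15min';  -- or -1 to never cancel
ALTER SYSTEM SET max_standby_archive_delay = '15min';
SELECT pg_reload_conf();
```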
Thanks for your comments.
On my primary server, VACUUM + ANALYZE is scheduled at 1:00 am and takes no more than 15 minutes for all databases. The backup then starts via pg_dump on the standby server at 2:00 am. Do you think vacuum can still affect the backup in this case? Also, the backup does not fail every day: it may fail once a week, or 2-3 times a week, and sometimes it works fine the whole week.
Given this setup, do I still need to set hot_standby_feedback = on?
max_standby_archive_delay and max_standby_streaming_delay are set to 30s (the default value), because the standby is also used for high availability. If I increase them further, there is a greater chance of data loss on failover.
I'm planning to take the backup before the vacuum and reindex processes run. That way, if these processes are causing the issue during the backup on the standby, it can be avoided.
As I understand it, doing this should not cause any problem for pg_dump, such as longer runtime or larger backup size.
Does anybody have an idea/comment/hint on this?
Vacuum and reindex happen only on the primary side; PostgreSQL only applies WAL on a physical standby. Normally no one schedules backups around vacuum or reindex if a properly configured hot backup for PITR is in place. Also note that a pg_dump backup contains the data as of the beginning of the backup, which won't be good for real-world high-transaction tables.
We already have streaming replication configured, and the backup runs on the standby. As mentioned in the issue above, pg_dump sometimes fails on the standby server with this error:
2020-07-01 05:21:17.120 CEST  postgres@DBname ERROR: canceling statement due to conflict with recovery
2020-07-01 05:21:17.120 CEST  postgres@DBname DETAIL: User was holding a relation lock for too long.
This issue started after migrating the PostgreSQL server from 10 to 11. I also didn't find any expensive process running on the master node during the backup window. That's why I'm thinking that, if the vacuum process is the cause of this error, I should schedule the backup first and the vacuum process afterwards.
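One way to check whether the cancellations on the standby really come from lock or vacuum conflicts is the standard pg_stat_database_conflicts view, which is populated only on a standby. A sketch (replace DBname with the actual database name):

```sql
-- Run on the standby: counts of queries cancelled per recovery-conflict type.
SELECT datname,
       confl_lock,       -- cancelled due to lock conflicts (matches the error above)
       confl_snapshot,   -- cancelled due to old snapshots (vacuum cleanup on primary)
       confl_tablespace,
       confl_deadlock
FROM pg_stat_database_conflicts
WHERE datname = 'DBname';
```

If confl_lock grows in step with the failed backups while confl_snapshot stays flat, the conflict is lock-related (e.g. locks on pg_index taken by WAL replay) rather than caused by vacuum tuple cleanup.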