pg_dump failed on standby server in postgresql on ubuntu.

Hi, 
On my production server sometimes backup failed with below messages. This backup is scheduled on standby node. (In my environment streaming replication is configured.) There is no any process running during this backup time. this is nightly cron job on ubuntu machine. 

2020-07-01 05:21:07.567 CEST [27925] postgres@DBname LOG:  process 27925 still waiting for AccessShareLock on relation 2610 of database 17948 after 1000.096 ms2020-07-01 05:21:07.567 CEST [27925] postgres@DBname DETAIL:  Process holding the lock: 25802. Wait queue: 1559, 27925.2020-07-01 05:21:17.120 CEST [25802] postgres@DBname ERROR:  canceling statement due to conflict with recovery2020-07-01 05:21:17.120 CEST [25802] postgres@DBname DETAIL:  User was holding a relation lock for too long.2020-07-01 05:21:17.120 CEST [25802] postgres@DBname STATEMENT:  COPY public.tablename(id, col1,col2,col3, geom) TO stdout;2020-07-01 05:21:17.127 CEST [27925] postgres@DBname LOG:  process 27925 acquired AccessShareLock on relation 2610 of database 17948 after 10560.447 ms

relation 2610 = pg_index.  I tried to lock this table and reproduce the table but didn’t get any error. 

Could anyone  faced such issue ? Any hint /fix  is appreciated.

Thanks 
Darshan.

Hi, @DarshanShah
This generally happens when a vacuum on the Primary cleans up the tuples which are required for consistency rules for the queries running on the standby side. PostgreSQL allows the standby to send feedback to primary so that it can be prevented. please try setting the parameter
hot_standby_feedback = on 
You may set max_standby_archive_delay and max_standby_streaming_delay values. These values are specified in milliseconds. This can hold any conflicting WAL apply.


Hello @“jobin.augustine”
Thanks for your comments. 
On my  primary server vacuum+ Analyze  is scheduled at 1:00 am. And it takes not more than  15 minutes for all databases. And then backup starts using pg_dump on standby server at 2:00 am . Do you think in this case also Vacuum can effect backup ? Again this fail backup issue is not daily. It can be failed once in a week or 2-3 times in a week also.  Sometimes it works whole week properly. 
As per above set up  Shall I still need to set  hot_standby_feedback = on ?
max_standby_archive_delay and max_standby_streaming_delay are set 30s (default value) as stand by can be used for high availability.  So if I increased it more than there may be a chance of data loss. 

Hello, 
I’m planning to take backup before doing vacuum and reindex process. So if  these process cause the issue during the backup on standby it can be handled.
As I understand if we do this process  there will be no issue in pg_dump like more time or more  backup size. 
Anybody has idea/comment/hint on this ?

What difference does it make to backup PostgreSQL before or after vacuum?


Vacuum and reindex happens only on Primary side.  PostgreSQL does only the WAL apply on the physical standby. Normally no one takes backups because of vacuum or reindex, if there is properly configured hot backup for PITR. pg_dump backup will be containing the data as at the beginning of the backup which won’t be good for real-world high transaction tables.


Agree Jobin,
we have already streaming replication is already configured, and backup is configured on standby. As mentioned  in above issue pg_dump failed on standby server sometimes with error :  
2020-07-01 05:21:17.120 CEST [25802] postgres@DBname ERROR:  canceling statement due to conflict with recovery2020-07-01 05:21:17.120 CEST [25802] postgres@DBname DETAIL:  User was holding a relation lock for too long.
This issue started after migrating postgresql server  from 10 to 11.  Also didn’t find any  expensive process running on master node during the backup time.  That’s why thinking  if Vacuum process is the cause of this error then schedule first backup and then vacuum process.