New here?
Check out the Percona Distribution for PostgreSQL!
Not the answer you need?
Register and ask your own question!

pg_dump failed on standby server in postgresql on ubuntu.

DarshanShahDarshanShah Current User Role Contributor
Hi, 
On my production server sometimes backup failed with below messages. This backup is scheduled on standby node. (In my environment streaming replication is configured.) There is no any process running during this backup time. this is nightly cron job on ubuntu machine. 

2020-07-01 05:21:07.567 CEST [27925] [email protected] LOG:  process 27925 still waiting for AccessShareLock on relation 2610 of database 17948 after 1000.096 ms
2020-07-01 05:21:07.567 CEST [27925] [email protected] DETAIL:  Process holding the lock: 25802. Wait queue: 1559, 27925.
2020-07-01 05:21:17.120 CEST [25802] [email protected] ERROR:  canceling statement due to conflict with recovery
2020-07-01 05:21:17.120 CEST [25802] [email protected] DETAIL:  User was holding a relation lock for too long.
2020-07-01 05:21:17.120 CEST [25802] [email protected] STATEMENT:  COPY public.tablename(id, col1,col2,col3, geom) TO stdout;
2020-07-01 05:21:17.127 CEST [27925] [email protected] LOG:  process 27925 acquired AccessShareLock on relation 2610 of database 17948 after 10560.447 ms

relation 2610 = pg_index.  I tried to lock this table and reproduce the table but didn't get any error. 

Could anyone  faced such issue ? Any hint /fix  is appreciated.

Thanks 
Darshan.

Answers

  • jobin.augustinejobin.augustine Percona Percona Staff Role Poster
    edited July 16
    Hi, @DarshanShah
    This generally happens when a vacuum on the Primary cleans up the tuples which are required for consistency rules for the queries running on the standby side. PostgreSQL allows the standby to send feedback to primary so that it can be prevented. please try setting the parameter
    hot_standby_feedback = on 
    You may set max_standby_archive_delay and max_standby_streaming_delay values. These values are specified in milliseconds. This can hold any conflicting WAL apply.


  • DarshanShahDarshanShah Current User Role Contributor
    edited July 16
    Hello @jobin.augustine
    Thanks for your comments. 
    On my  primary server vacuum+ Analyze  is scheduled at 1:00 am. And it takes not more than  15 minutes for all databases. And then backup starts using pg_dump on standby server at 2:00 am . Do you think in this case also Vacuum can effect backup ? Again this fail backup issue is not daily. It can be failed once in a week or 2-3 times in a week also.  Sometimes it works whole week properly. 
    As per above set up  Shall I still need to set  hot_standby_feedback = on ?
    max_standby_archive_delay and max_standby_streaming_delay are set 30s (default value) as stand by can be used for high availability.  So if I increased it more than there may be a chance of data loss. 
  • DarshanShahDarshanShah Current User Role Contributor
    Hello, 
    I'm planning to take backup before doing vacuum and reindex process. So if  these process cause the issue during the backup on standby it can be handled.
    As I understand if we do this process  there will be no issue in pg_dump like more time or more  backup size. 
    Anybody has idea/comment/hint on this ?

    What difference does it make to backup PostgreSQL before or after vacuum?


  • jobin.augustinejobin.augustine Percona Percona Staff Role Poster
    Vacuum and reindex happens only on Primary side.  PostgreSQL does only the WAL apply on the physical standby. Normally no one takes backups because of vacuum or reindex, if there is properly configured hot backup for PITR. pg_dump backup will be containing the data as at the beginning of the backup which won't be good for real-world high transaction tables.


  • DarshanShahDarshanShah Current User Role Contributor
    Agree Jobin,
    we have already streaming replication is already configured, and backup is configured on standby. As mentioned  in above issue pg_dump failed on standby server sometimes with error :  
    2020-07-01 05:21:17.120 CEST [25802] [email protected] ERROR:  canceling statement due to conflict with recovery
    2020-07-01 05:21:17.120 CEST [25802] [email protected] DETAIL:  User was holding a relation lock for too long.
    This issue started after migrating postgresql server  from 10 to 11.  Also didn't find any  expensive process running on master node during the backup time.  That's why thinking  if Vacuum process is the cause of this error then schedule first backup and then vacuum process. 

Sign In or Register to comment.

MySQL, InnoDB, MariaDB and MongoDB are trademarks of their respective owners.
Copyright ©2005 - 2020 Percona LLC. All rights reserved.