Is there a better way to resolve this issue permanently?
Problem
PMM databases intermittently showed a “down” status because the scrape timeouts were too short:
- HR jobs: 9s (too short)
- MR jobs: 13.5s (too short)
- LR jobs: 54s (too short)
- Global: 54s (overridden by job-level settings)
Solution Applied
Temporarily increased scrape timeouts in /etc/victoriametrics-promscrape.yml:
- HR jobs: 9s → 60s
- MR jobs: 13.5s → 60s
- LR jobs: 54s → 120s
- Global: 54s → 120s
Step-by-Step Commands & Outputs
Step 1: Open a bash shell in the pod
kubectl exec -it pmm-server-0 -n pmm -- bash
Output:
[pmm@pmm-server-0 opt] #
Step 2: Create backup
cp /etc/victoriametrics-promscrape.yml /tmp/victoriametrics-promscrape.yml.backup.$(date +%Y%m%d_%H%M%S)
ls -lh /tmp/victoriametrics-promscrape.yml.backup.*
Output:
-rw-r--r-- 1 pmm pmm 122K Feb 10 14:15 /tmp/victoriametrics-promscrape.yml.backup.20260210_141530
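The backup step can be wrapped so that the copy is also verified before anything is edited. A minimal sketch, assuming the same timestamped naming as above; backup_config is a hypothetical helper name, not something PMM provides:

```shell
# Sketch: back up a config file and confirm the copy is byte-identical.
# backup_config is a hypothetical helper, not part of PMM.
backup_config() {
  local src dest_dir dest
  src="$1"
  dest_dir="${2:-/tmp}"
  dest="$dest_dir/$(basename "$src").backup.$(date +%Y%m%d_%H%M%S)"
  # -p preserves mode and timestamps so the backup mirrors the original
  cp -p "$src" "$dest" || return 1
  # cmp -s exits non-zero if the two files differ
  cmp -s "$src" "$dest" || return 1
  # Print the backup path so the caller can record it
  printf '%s\n' "$dest"
}
```

Usage: backup_config /etc/victoriametrics-promscrape.yml /tmp prints the path of the verified backup, or returns non-zero if the copy failed or differs.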
Step 3: Check current timeout values
grep "scrape_timeout:" /etc/victoriametrics-promscrape.yml | sort | uniq -c
Output:
47 scrape_timeout: 13500ms
43 scrape_timeout: 54s
1 scrape_timeout: 54s
45 scrape_timeout: 9s
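The 54s value is listed twice in the count above. That is consistent with the same setting appearing at two indentation levels (for example one global entry plus the job-level entries), because uniq -c compares whole lines including leading whitespace; the indentation itself is not visible in the output shown, so treat this as a likely explanation rather than a confirmed one. A small sketch that strips leading whitespace before counting (count_timeouts is a hypothetical name):

```shell
# Sketch: count scrape_timeout values regardless of indentation.
# count_timeouts is a hypothetical helper name.
count_timeouts() {
  # Remove leading whitespace so identical values at different
  # indentation levels collapse into a single count
  grep 'scrape_timeout:' "$1" | sed 's/^[[:space:]]*//' | sort | uniq -c
}
```

Usage: count_timeouts /etc/victoriametrics-promscrape.yml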
Step 4: Check file permissions (discovered permission issue)
ls -la /etc/victoriametrics-promscrape.yml
touch /etc/test-write 2>&1
Output:
-rw-rw-r-- 1 pmm root 124237 Feb 10 14:30 /etc/victoriametrics-promscrape.yml
touch: cannot touch '/etc/test-write': Permission denied
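Finding: New files cannot be created in /etc, but the existing config file can be overwritten in place because the pmm user owns it. This is why the edit goes through /tmp: sed -i writes a temporary file next to its target, which would fail in /etc.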
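The same check can be done without creating a test file, using the shell's -w test on both the file and its directory. A rough sketch; write_mode and its three labels are made up for illustration:

```shell
# Sketch: classify how a file can be modified by the current user.
# write_mode and its output labels are hypothetical.
write_mode() {
  local file dir
  file="$1"
  dir=$(dirname "$file")
  if [ -w "$file" ] && [ -w "$dir" ]; then
    echo "direct"          # in-place tools can create temp files next to the target
  elif [ -w "$file" ]; then
    echo "overwrite-only"  # edit a copy elsewhere, then cp it over the original
  else
    echo "read-only"
  fi
}
```

Usage: write_mode /etc/victoriametrics-promscrape.yml would be expected to print overwrite-only for the permissions shown above.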
Step 5: Apply changes to a working copy in /tmp
# Copy to /tmp for editing
cp /etc/victoriametrics-promscrape.yml /tmp/victoriametrics-promscrape.yml.edit
# Apply all three replacements
sed -i 's/scrape_timeout: 54s$/scrape_timeout: 120s/' /tmp/victoriametrics-promscrape.yml.edit
sed -i 's/scrape_timeout: 9s$/scrape_timeout: 60s/' /tmp/victoriametrics-promscrape.yml.edit
sed -i 's/scrape_timeout: 13500ms$/scrape_timeout: 60s/' /tmp/victoriametrics-promscrape.yml.edit
# Verify edited file
grep "scrape_timeout:" /tmp/victoriametrics-promscrape.yml.edit | sort | uniq -c
Output:
43 scrape_timeout: 120s
1 scrape_timeout: 120s
92 scrape_timeout: 60s
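The three substitutions can also be applied in a single sed pass that writes to stdout, which leaves the input untouched until the result has been checked. A sketch using the same patterns as above; bump_timeouts is a hypothetical name:

```shell
# Sketch: apply all three timeout replacements in one pass.
# bump_timeouts is a hypothetical helper; the patterns match the commands above.
bump_timeouts() {
  # Prints the rewritten config to stdout; the input file is not modified.
  # The trailing $ anchors each match to the end of the line.
  sed -e 's/scrape_timeout: 54s$/scrape_timeout: 120s/' \
      -e 's/scrape_timeout: 9s$/scrape_timeout: 60s/' \
      -e 's/scrape_timeout: 13500ms$/scrape_timeout: 60s/' "$1"
}
```

Usage: bump_timeouts /etc/victoriametrics-promscrape.yml > /tmp/victoriametrics-promscrape.yml.edit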
Step 6: Copy edited file back to /etc
cp /tmp/victoriametrics-promscrape.yml.edit /etc/victoriametrics-promscrape.yml
# Verify real file
grep "scrape_timeout:" /etc/victoriametrics-promscrape.yml | sort | uniq -c
Output:
43 scrape_timeout: 120s
1 scrape_timeout: 120s
92 scrape_timeout: 60s
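Counting values confirms the totals, but not that nothing else changed. One way to check that is to diff the live file against the backup from Step 2 and report any changed line that is not a scrape_timeout line. A minimal sketch; unexpected_changes is a hypothetical name:

```shell
# Sketch: list changed lines that are NOT scrape_timeout lines.
# unexpected_changes is a hypothetical helper. Empty output means only
# scrape_timeout lines differ between the two files.
unexpected_changes() {
  # $1 = backup file, $2 = edited file
  # diff prefixes removed lines with "<" and added lines with ">"
  diff "$1" "$2" | grep '^[<>]' | grep -v 'scrape_timeout:' || true
}
```

Usage: unexpected_changes /tmp/victoriametrics-promscrape.yml.backup.20260210_141530 /etc/victoriametrics-promscrape.yml should print nothing if only timeouts were edited.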
Step 7: Verify file structure
grep -A 2 "^global:" /etc/victoriametrics-promscrape.yml
grep -A 10 "postgres_exporter.*_hr" /etc/victoriametrics-promscrape.yml | grep -A 2 "scrape_timeout:" | head -3
Output:
global:
scrape_interval: 1m
scrape_timeout: 120s
scrape_timeout: 60s
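The global block above now has a scrape_timeout (120s) longer than its scrape_interval (1m). Prometheus itself refuses to load a configuration where the timeout exceeds the interval; whether the VictoriaMetrics version bundled with PMM rejects, adjusts, or accepts such a value is not shown in these logs and is an assumption to verify. A sketch for comparing the two values; to_seconds and timeout_exceeds_interval are hypothetical helpers that handle only simple durations:

```shell
# Convert a simple duration such as 13500ms, 54s, 1m or 2h to whole seconds.
# Compound values such as 1m30s are not handled in this sketch.
to_seconds() {
  case "$1" in
    *ms) echo $(( ${1%ms} / 1000 )) ;;   # integer division: 13500ms becomes 13
    *s)  echo "${1%s}" ;;
    *m)  echo $(( ${1%m} * 60 )) ;;
    *h)  echo $(( ${1%h} * 3600 )) ;;
    *)   return 1 ;;
  esac
}

# Succeeds (exit 0) when the timeout is longer than the interval.
timeout_exceeds_interval() {
  local t i
  t=$(to_seconds "$1") || return 2
  i=$(to_seconds "$2") || return 2
  [ "$t" -gt "$i" ]
}
```

Usage: timeout_exceeds_interval 120s 1m && echo "timeout is longer than interval"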
Step 8: Monitor config reload
tail -f /srv/logs/victoriametrics.log | grep -i "SIGHUP\|reloading"
Output:
2026-02-10T14:40:02.358Z info SIGHUP received; reloading Prometheus configs from "/etc/victoriametrics-promscrape.yml"
2026-02-10T14:41:01.200Z info SIGHUP received; reloading Prometheus configs from "/etc/victoriametrics-promscrape.yml"
...
Step 9: Verify no errors and service status
tail -100 /srv/logs/victoriametrics.log | grep -i "error" | grep -v "warn\|cannot scrape" | tail -10
supervisorctl status victoriametrics
tail -20 /srv/logs/victoriametrics.log | grep -i "reloading\|nothing changed"
Output:
(no errors found)
victoriametrics RUNNING pid 3064549, uptime 0:05:46
2026-02-10T14:43:01.243Z info nothing changed in "/etc/victoriametrics-promscrape.yml"
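The error filter from this step can be kept as a small function so it is easy to rerun after later reloads. A sketch that mirrors the grep chain above; real_errors is a hypothetical name:

```shell
# Sketch: show recent error lines, ignoring warnings and per-target scrape failures.
# real_errors is a hypothetical helper. Empty output means no such errors
# were found in the last 100 lines of the given log.
real_errors() {
  tail -n 100 "$1" | grep -i 'error' | grep -v -e 'warn' -e 'cannot scrape' || true
}
```

Usage: real_errors /srv/logs/victoriametrics.log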