Disk usage of VictoriaMetrics is increasing

Description:

I’m setting the DATA_RETENTION environment variable on the PMM Server to 1 day so that VictoriaMetrics retains only one day of data, and I can confirm that VictoriaMetrics picks up this value from the log line below.

2023-06-12T07:28:47.245Z	info	/home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.60.0/app/vmstorage/main.go:86	opening storage at "/srv/victoriametrics/data" with -retentionPeriod=1d
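
For reference, a minimal sketch of how DATA_RETENTION can be passed to the PMM Server container (assuming a plain Docker deployment; the container name, volume name, and image tag are placeholders, and the exact value format may differ in your setup):

# 1-day retention, passed as an environment variable to the PMM Server container
docker run -d --name pmm-server \
  -e DATA_RETENTION=24h \
  -v pmm-data:/srv \
  percona/pmm-server:2.25.0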

However, the disk usage of VictoriaMetrics keeps increasing.
VictoriaMetrics monitors about 36 Pods, and its data grows by roughly 200 MB per day, so it looks like no data has been removed for about 65 days even though the retention period is set to 1 day.

Please see the result below to confirm the disk usage of VictoriaMetrics.

# du -sh *
...
13G     victoriametrics
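
To see which monthly partitions are still on disk, the per-partition usage can be checked as well (a diagnostic sketch; the paths are taken from the VictoriaMetrics log messages quoted below):

# size of each monthly data partition and index database
du -sh /srv/victoriametrics/data/data/small/*
du -sh /srv/victoriametrics/data/indexdb/*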

Steps to Reproduce:

Make VictoriaMetrics monitor 36 Pods.
Set METRICS_RESOLUTION_LR to 10s.
Set DATA_RETENTION to 1 day.

See if the old data is being removed correctly.
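
To double-check the retention value that is actually applied, the command line of the running victoriametrics process can be inspected (a sketch, assuming the PMM Server runs as a container named pmm-server):

# the output should contain -retentionPeriod=1d
docker exec pmm-server sh -c 'ps aux | grep [v]ictoriametrics'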

Version:

pmm-admin --version

ProjectName: pmm-admin
Version: 2.25.0
PMMVersion: 2.25.0
Timestamp: 2021-12-13 09:34:50 (UTC)
FullCommit: 4b81157ad02975c417daef2600cb0dcd3907ffa1

Logs:

Due to my company's security policy I cannot upload the full log, but I can observe the following.

  1. I CANNOT see any data removal log such as the one below during those ~65 days:
partition "YYYY_MM" has been created
  2. I CAN see the following kinds of log lines (a grep sketch for these messages follows after this excerpt):
2023-07-02T00:00:47.438Z	info	/home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.60.0/lib/storage/partition.go:652	waiting for stale parts remover to stop on "/srv/victoriametrics/data/data/small/2023_06"...
....
2023-07-02T04:00:00.007Z	info	/home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.60.0/lib/storage/index_db.go:336	dropping indexDB "/srv/victoriametrics/data/indexdb/1767D8C5446BAD0B"
2023-07-02T04:00:00.138Z	info	/home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.60.0/lib/storage/index_db.go:338	indexDB "/srv/victoriametrics/data/indexdb/1767D8C5446BAD0B" has been dropped
....
2023-07-02T05:45:00.106Z	info	/home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.60.0/lib/storage/partition.go:1368	removing part "/srv/victoriametrics/data/data/small/2023_07/203508029_128525_20230701000000.029_20230701054321.950_176D954E96E5520D", since its data is out of the configured retention (86400 secs)
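
For completeness, these messages can be searched for without sharing the full log, for example (a sketch; the log path is an assumption based on the default PMM Server layout):

# show the most recent retention-related events, if any
grep -E 'has been dropped|out of the configured retention' /srv/logs/victoriametrics.log | tail -n 20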

Expected Result:

VictoriaMetrics should retain only 1 day of data.

Actual Result:

VictoriaMetrics retains data for about 65 days.

Additional Information:

None.

@chadr You can change the data retention policy in PMM. Please check Configure - Percona Monitoring and Management, in particular the Data Retention part. Thanks.

Please see the result above. I can see that it has been set to 1 day correctly.

@chadr Can you still see the old data in the PMM UI? This will give an indication of whether the data is actually being purged or not. If you set retention to 1d, but when you load the PMM UI you can see >2d of data, then that’s an issue to resolve.

If you set retention to 1d, but when you load PMM UI, you can see >2d, then that’s an issue to resolve.

I set retention to 1d via the environment variable, and the PMM UI shows retention as 1d, but the old data is still there. :frowning:
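
One way to verify this outside the UI is to ask the PMM VictoriaMetrics endpoint for samples older than the retention window (a sketch; the credentials, metric, and time range are placeholders). With a 1d retention, points older than roughly one day should not come back:

# request 3 days of a metric through the PMM /prometheus endpoint
curl -sk -u admin:**** "https://localhost/prometheus/api/v1/query_range" \
  -d 'query=up{instance="pmm-server"}' \
  -d "start=$(date -d '3 days ago' +%s)" \
  -d "end=$(date +%s)" \
  -d 'step=1h'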

After some time, the disk quota was exceeded, so VictoriaMetrics could not write any more data to disk and got restarted. Please see the logs below; the trailing error code 28 in the panic message is errno 28 (ENOSPC, no space left on device).

2023-08-23T21:44:40.116Z	panic	/home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.60.0/lib/filestream/filestream.go:221	FATAL: cannot sync file "/srv/victoriametrics/data/data/small/2023_08/tmp/1777194C2D1BBFFF/index.bin": &{%!d(string=sync) %!d(string=/srv/victoriametrics/data/data/small/2023_08/tmp/1777194C2D1BBFFF/index.bin) 28}
panic: FATAL: cannot sync file "/srv/victoriametrics/data/data/small/2023_08/tmp/1777194C2D1BBFFF/index.bin": &{%!d(string=sync) %!d(string=/srv/victoriametrics/data/data/small/2023_08/tmp/1777194C2D1BBFFF/index.bin) 28}

goroutine 878759613 [running]:
github.com/VictoriaMetrics/VictoriaMetrics/lib/logger.logMessage(0xd983a6, 0x5, 0xc0001a9260, 0xd5, 0x4)
	/home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.60.0/lib/logger/logger.go:270 +0xc69
github.com/VictoriaMetrics/VictoriaMetrics/lib/logger.logLevelSkipframes(0x1, 0xd983a6, 0x5, 0xdaafa3, 0x1e, 0xc0092d9830, 0x2, 0x2)
	/home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.60.0/lib/logger/logger.go:138 +0xd1
github.com/VictoriaMetrics/VictoriaMetrics/lib/logger.logLevel(...)
	/home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.60.0/lib/logger/logger.go:130
github.com/VictoriaMetrics/VictoriaMetrics/lib/logger.Panicf(...)
	/home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.60.0/lib/logger/logger.go:126
github.com/VictoriaMetrics/VictoriaMetrics/lib/filestream.(*Writer).MustClose(0xc02b14c1e0)
	/home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.60.0/lib/filestream/filestream.go:221 +0x405
github.com/VictoriaMetrics/VictoriaMetrics/lib/storage.(*blockStreamWriter).MustClose(0xc01a4c8b40)
	/home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.60.0/lib/storage/block_stream_writer.go:174 +0x1b0
github.com/VictoriaMetrics/VictoriaMetrics/lib/storage.mergeBlockStreams(0xc0092d9a80, 0xc01a4c8b40, 0xc02b14c0f0, 0x5, 0x5, 0x0, 0xc000992a20, 0x18a1f34f982, 0xc01dbf2f28, 0xc01dbf2f38, ...)
	/home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.60.0/lib/storage/merge.go:27 +0x18a
github.com/VictoriaMetrics/VictoriaMetrics/lib/storage.(*partition).mergeParts(0xc01dbf2f00, 0xc01cdaa180, 0x5, 0x8, 0x0, 0x0, 0x0)
	/home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.60.0/lib/storage/partition.go:1170 +0x811
github.com/VictoriaMetrics/VictoriaMetrics/lib/storage.(*partition).mergePartsOptimal(0xc01dbf2f00, 0xc01cdaa180, 0x5, 0x8, 0x0, 0x0, 0x0)
	/home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.60.0/lib/storage/partition.go:856 +0x225
github.com/VictoriaMetrics/VictoriaMetrics/lib/storage.(*partition).flushInmemoryParts(0xc01dbf2f00, 0xc01cdaa180, 0x5, 0x8, 0x0, 0xc01cdaa101, 0x0, 0x1, 0x0, 0x0)
	/home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.60.0/lib/storage/partition.go:830 +0x20b
github.com/VictoriaMetrics/VictoriaMetrics/lib/storage.(*partition).inmemoryPartsFlusher(0xc01dbf2f00)
	/home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.60.0/lib/storage/partition.go:802 +0x13f
github.com/VictoriaMetrics/VictoriaMetrics/lib/storage.(*partition).startInmemoryPartsFlusher.func1(0xc01dbf2f00)
	/home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.60.0/lib/storage/partition.go:787 +0x2b
created by github.com/VictoriaMetrics/VictoriaMetrics/lib/storage.(*partition).startInmemoryPartsFlusher
	/home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.60.0/lib/storage/partition.go:786 +0x5f
2023-08-23T21:44:40.209Z	info	/home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.60.0/lib/logger/flag.go:12	build version: victoria-metrics-20210615-100652-pmm-6401-v1.60.0
2023-08-23T21:44:40.209Z	info	/home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.60.0/lib/logger/flag.go:13	command line flags
2023-08-23T21:44:40.209Z	info	/home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.60.0/lib/logger/flag.go:20	flag "bigMergeConcurrency" = "0"
2023-08-23T21:44:40.209Z	info	/home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.60.0/lib/logger/flag.go:20	flag "csvTrimTimestamp" = "1ms"
2023-08-23T21:44:40.209Z	info	/home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.60.0/lib/logger/flag.go:20	flag "dedup.minScrapeInterval" = "0s"
2023-08-23T21:44:40.209Z	info	/home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.60.0/lib/logger/flag.go:20	flag "deleteAuthKey" = "secret"
....

After that, the old data was purged while VictoriaMetrics was restarting. Please see the logs below.

2023-08-23T21:45:41.481Z	info	/home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.60.0/lib/storage/partition.go:249	partition "2023_07" has been dropped
....
2023-08-23T21:45:42.615Z	info	/home/builder/rpm/BUILD/VictoriaMetrics-pmm-6401-v1.60.0/lib/storage/partition.go:249	partition "2023_06" has been dropped

As you can see, there was data for about 2 months. After the purge, the data size of VictoriaMetrics shrank.

Before: 13G     victoriametrics
After: 2.4G     victoriametrics

Hi. I can see that the files below are created while VictoriaMetrics removes old data.

# find /srv/victoriametrics/data/ -name .nfs*
/srv/victoriametrics/data/data/small/2023_08/210107030_79865_20230828191044.229_20230829055654.005_177FA1C31B7D598D/.nfs00000000700cca2e02066890
/srv/victoriametrics/data/data/small/2023_08/210107030_79865_20230828191044.229_20230829055654.005_177FA1C31B7D598D/.nfs00000000700cca2f02066891
/srv/victoriametrics/data/data/small/2023_08/210107030_79865_20230828191044.229_20230829055654.005_177FA1C31B7D598D/.nfs00000000700cca3002066892

But the files above are never deleted, so the disk usage keeps growing. As far as I understand, such .nfs* files are created by the NFS client when a file is deleted while a process still holds it open, and they disappear only after the file handle is closed. I can also check the metric below to see whether there are pending directories left to remove.

# curl -sk -u admin:**** https://localhost/prometheus/api/v1/query -d 'query=vm_nfs_pending_dirs_to_remove{job="victoriametrics"}' | python -m json.tool
{
    "data": {
        "result": [
            {
                "metric": {
                    "__name__": "vm_nfs_pending_dirs_to_remove",
                    "instance": "pmm-server",
                    "job": "victoriametrics"
                },
                "value": [
                    1693483167,
                    "1"
                ]
            }
        ],
        "resultType": "vector"
    },
    "status": "success"
}
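
To see whether these pending directories and the .nfs* files ever get cleaned up, the checks above can be repeated periodically, for example (a sketch based on the commands already shown; credentials are placeholders):

# print the pending-dirs metric and the number of leftover .nfs* files every 10 minutes
while true; do
  date
  curl -sk -u admin:**** https://localhost/prometheus/api/v1/query \
    -d 'query=vm_nfs_pending_dirs_to_remove{job="victoriametrics"}' | python -m json.tool
  find /srv/victoriametrics/data/ -name '.nfs*' | wc -l
  sleep 600
done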