Hi, everyone!
I upgraded from 1.1.5 to 1.2.0 right after the release, and after that I was not able to resolve the problems that appeared after the upgrade. I just had no time.
For now the result looks like this.
Before the upgrade the configuration was: AWS m4.xlarge instance, 6 clients, version 1.1.5. It worked fine.
After the upgrade to 1.2.0 I got problems: gaps in the graphs. The upgrade was performed following the official instructions for Docker.
For various reasons I recreated the AWS instance from scratch with a different file system (XFS). I added 1 client. No gaps, it looked good, except for one problem: "no data points" in "Current QPS" on "MySQL Overview". I added a second client, and the problems came back: gaps in the graphs plus "no data points" in "Current QPS". It looks like there are not enough resources (on an m4.xlarge!) even for 2 clients. Prometheus takes all the CPU time.
After that I dropped 1.2.0 and installed 1.1.5. I did not even downgrade the clients from 1.2.0 to 1.1.5, and it works great. I have connected 4 of the 6 clients. Perfect. The problems of version 1.2.0 disappeared, and there is no CPU load from Prometheus.
The question is: what is wrong with 1.2.0? How many resources does this version need? It is not normal that an m4.xlarge is not enough even for 2 clients.
This is quite strange. I have a box running 1.2.0 that handles a few MySQL instances on an Intel(R) Celeron(R) CPU N3050 @ 1.60GHz, which is much slower than what you have.
I would encourage you to troubleshoot it one issue at a time. Why did your MySQL QPS disappear? It is especially strange if other data is present, as many of those graphs come from the same MySQL output.
What does your Samples Ingested look like on the "Prometheus" dashboard? Your box should easily be able to handle 50K+.
Hi, Peter!
Honestly, I do not know why so many interesting things happen with our PMM. I myself wonder why nobody besides me has similar problems. Maybe our environment is somehow special, I do not know. Maybe I just do not know how to cook it, and I lack the time to be a tester.
But in any case, let's do some tests and measurements on each version with the same clients, under conditions as similar as possible, so that you can do a proper analysis:
- You will tell exactly what and how to measure.
- I will do this on versions 1.1.5 and 1.2.0 and give it to you.
Aleksey,
It must be luck…
In any case I appreciate all the time and effort you put into helping us to make PMM better.
Let me ask the basic question: so you do a basic install of PMM on an m4.xlarge instance (4 vCPU and 16GB of memory). You do not do any special configuration, right?
How do you have your EBS configured?
When you're adding one node, what is it? A MySQL server? Any special options you disable?
Once you have it enabled, could you upload what your Prometheus dashboard looks like, and describe what problem you're seeing? Note that, among other things, the Prometheus dashboard should show CPU and memory usage by the prometheus process.
Hi Aleksey and Peter!
I have had the same problem after upgrading from 1.1.5 to 1.2.0 (on both the server and a few clients) yesterday. I started to get gaps in the graphs immediately. I reverted the server to 1.1.5 this morning and the gaps have disappeared on (at least what I think are) most of the graphs. Not sure about InnoDB Log Buffer Performance, for example.
The difference between my setup and Aleksey's is that we run on physical hardware (lots of RAM and SSD RAID) in our own datacentre.
We haven’t reinstalled any clients and haven’t noticed anything special with CPU usage on the server.
So yes, something seems to be weird with version 1.2.0
BR
Johan
1. "so you do a basic install of PMM on an m4.xlarge instance (4 vCPU and 16GB of memory)"
Yes
2. "You do not do any special configuration, right?"
- the instance was upgraded to the latest CentOS 7
- Docker was moved from /var/lib/docker to /data/docker using a symlink (a sketch of how that was done follows the listing below):
ls -l /data/docker/
drwx------ 5 root root 222 Aug 6 16:54 containers
drwx------ 3 root root 21 Aug 5 18:16 image
drwxr-x--- 3 root root 19 Aug 5 18:16 network
drwx------ 27 root root 4096 Aug 6 16:54 overlay
drwx------ 4 root root 32 Aug 5 18:16 plugins
drwx------ 2 root root 6 Aug 5 18:16 swarm
drwx------ 2 root root 6 Aug 6 16:54 tmp
drwx------ 2 root root 6 Aug 5 18:16 trust
drwx------ 6 root root 313 Aug 5 18:28 volumes
[root@MySQL-PMMC ~]# ls -l /var/lib/ | grep docker
lrwxrwxrwx 1 root root 12 Aug 5 18:25 docker -> /data/docker
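For reference, the relocation was presumably done along these lines (the exact steps are my assumption; the point is simply that /var/lib/docker ends up as a symlink onto the XFS-backed /data volume):
systemctl stop docker
mv /var/lib/docker /data/docker
ln -s /data/docker /var/lib/docker
systemctl start docker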
3. "How do you have your EBS configured ? "
Volume 1 - Volume type "gp2", IOPS 100/3000, Mountpoint "/", Size 20GB, Filesystem XFS, Not Encrypted
Volume 2 - Volume type "gp2", IOPS 330/3000, Mountpoint "/data", Size 110GB, Filesystem XFS, Encrypted
4. "When you're adding one node, what is it? A MySQL server?"
Yes, it is Percona MySQL Server 5.7.18: 60 databases (the number changes all the time) and 3500 tables at the moment.
5. “Any special options you disable”
Some are disabled, some are enabled.
PMM server Docker setup:
docker run -d \
  -p 80:80 \
  --volumes-from pmm-data \
  --name pmm-server \
  --restart always \
  --env TZ="Europe/Kiev" \
  --env METRICS_RESOLUTION=5s \
  percona/pmm-server:latest
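Side note: METRICS_RESOLUTION=5s sets the Prometheus scrape interval to 5 seconds (PMM 1.x defaults to 1 second). If you want to confirm the setting actually took effect, something like the following should work; the config path inside the container is my assumption based on the PMM 1.x image layout:
docker exec pmm-server grep scrape_interval /etc/prometheus.yml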
Client setup:
yum install -y pmm-client
pmm-admin config --server pmm.srv --client-name mysql.db
pmm-admin add linux:metrics
pmm-admin add mysql:metrics --disable-tablestats-limit 3000
pmm-admin add mysql:queries --query-source slowlog
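As an aside, when comparing 1.1.5 vs 1.2.0 behaviour it may also help to capture what the client itself reports as running and reachable, for example:
pmm-admin list
pmm-admin check-network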
6. "And what problem are you seeing? Note that, among other things, the Prometheus dashboard should show CPU and memory usage by the prometheus process."
I will answer this one later.
I tried to find what you asked for in question 6 in Prometheus on PMM 1.1.5, without success :). I upgraded to 1.2.0 (the problems came back) and was unsuccessful again. Can you show me a screenshot or instructions for what exactly I have to find in Prometheus and share? A screenshot would be better.
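For reference, a rough way to see what the prometheus process inside the container is consuming is from the Docker host itself; this assumes the standard pmm-server container name and that ps is present in the image:
docker stats pmm-server --no-stream
docker exec pmm-server ps aux | grep prometheus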
FWIW, I am also seeing gaps and no value for "Current QPS" after upgrading from 1.1.5 to 1.2.
Wow, I’m not alone
https://www.percona.com/forums/questions-discussions/percona-monitoring-and-management/48687-prometheus-high-cpu
https://www.percona.com/forums/questions-discussions/percona-monitoring-and-management/49047-pmm-1-2-0-a-lot-of-data-is-not-shown
I will try the advice from the related post: METRICS_MEMORY=786432.
Thank you. Yes, please try the metrics memory increase. I'm still puzzled why it can't handle even a single MySQL instance, especially considering you're using 5 sec as the resolution.
increasing “Metrics Memory” didn’t help in my case
Hn,
OK, can you upload an image of what you're seeing on your Prometheus dashboard, such as this:
https://pmmdemo.percona.com/graph/dashboard/db/prometheus?refresh=1m&orgId=1
docker run -d -p 80:80 --volumes-from pmm-data --name pmm-server -e SERVER_USER=pmm -e SERVER_PASSWORD=123456 -e METRICS_MEMORY=786432 --restart always --init percona/pmm-server:1.2.0
Shame on me. I should have read the previous topics more carefully.
Yes, METRICS_MEMORY resolves the problem with the gaps. I think it would be a good idea to fix this in the next release, or at least mention it in the documentation.
docker run -d \
  -p 80:80 \
  --volumes-from pmm-data \
  --name pmm-server \
  --restart always \
  --env TZ="Europe/Kiev" \
  --env METRICS_RESOLUTION=5s \
  --env METRICS_MEMORY=7864320 \
  percona/pmm-server:latest
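One note in case anyone copies this: as I understand the documentation, METRICS_MEMORY is given in kilobytes, so 7864320 is roughly 7.5GB. Since the metrics themselves live in the pmm-data volume, applying a new value only requires recreating the pmm-server container, roughly:
docker stop pmm-server && docker rm pmm-server
# then re-run the "docker run" command above with the new METRICS_MEMORY value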
So, the biggest problem is resolved.
liuqian, roma.novikov, Mykola, Peter - thanks guys!
Only one problem was not resolved: "no value" in "Current QPS" on "MySQL Overview".
Hi all,
Setting METRICS_MEMORY seems to have fixed the gaps for me. Happily running 1.2.0 now. Have also updated my docs
BR
Johan
aleksey.filippov, I think the 5 sec resolution is the core of the "QPS problem". Created PMM-1275 ("QPS SingleStat broken with scrape_interval: 5s") in the Percona JIRA, and we'll take a look at what we can do with this.
Thanks, Roman. Waiting for the result.
fwiw, I also have resolution set to 5 seconds
Hi,
Edit the graph and replace:
rate(mysql_global_status_queries{instance="$host"}[1s]) or irate(mysql_global_status_queries{instance="$host"}[2s])
with:
rate(mysql_global_status_queries{instance="$host"}[1s]) or irate(mysql_global_status_queries{instance="$host"}[5m])
With a 5-second scrape interval, a 1s or 2s range contains at most one sample, so both rate() and irate() come back empty; widening the irate() window gives it at least two samples to work with.
that works, thx