Hi, everyone!
I upgraded from 1.1.5 to 1.2.0 right after the release, and after that I was not able to resolve the problems that appeared after the upgrade. I just had no time.
For now the result looks like this.
Before the upgrade the configuration was: AWS m4.xlarge instance, 6 clients, version 1.1.5. It worked fine.
After the upgrade to 1.2.0 I got problems: gaps in the graphs. The upgrade was performed following the official instructions for Docker.
For various reasons I recreated the AWS instance from scratch with a different file system (XFS). I added 1 client. No gaps, it looked good, except for one problem: "no data points" in "Current QPS" on "MySQL Overview". I added a second client, and the problems came back: gaps in the graphs plus "no data points" in "Current QPS". It looks like there are not enough resources (on an m4.xlarge!) even for 2 clients. Prometheus takes all the CPU time.
After that I dropped 1.2.0 and installed 1.1.5. I did not even downgrade the clients from 1.2.0 to 1.1.5, and it works great. I have connected 4 of the 6 clients. Perfect. The problems of version 1.2.0 disappeared, and there is no CPU load from Prometheus.
The question is: what is wrong with 1.2.0? How many resources does this version need? It is not normal that an m4.xlarge is not enough even for 2 clients.
This is quite strange. I have a box running 1.2.0 that handles a few MySQL instances on an Intel(R) Celeron(R) CPU N3050 @ 1.60GHz, which is much slower than what you have.
I would encourage you to troubleshoot it one issue at a time. Why did your MySQL QPS disappear? It is especially strange if other data is present, as many of those graphs come from the same MySQL output.
What does your Samples Ingested look like on the "Prometheus" dashboard? Your box should easily be able to handle 50K+.
Hi, Peter!
Honestly, I do not know why so many interesting things happen with our PMM. I myself wonder why nobody besides me has similar problems. Maybe our environment is somehow special, I do not know. Maybe I just do not know how to cook it, and I lack the time to be a tester.
But in any case, let's do some tests and measurements on each version with the same clients, under conditions as similar as possible, so that you can do a proper analysis:
- You will tell exactly what and how to measure.
- I will do this on versions 1.1.5 and 1.2.0 and give it to you.
Aleksey,
It must be luck…
In any case I appreciate all the time and effort you put into helping us to make PMM better.
Let me ask the basic question: so you do a basic install of PMM on an m4.xlarge instance (4 vCPU and 16GB of memory). You do not do any special configuration, right?
How do you have your EBS configured?
When you're adding one node, what is it? A MySQL server? Any special options you disable?
Once you have it enabled, could you upload what your Prometheus dashboard looks like, and describe what problem you're seeing? Note that, among other things, the Prometheus dashboard should show CPU and memory usage by the prometheus process.
Hi Aleksey and Peter!
I have had the same problem after upgrading from 1.1.5 to 1.2.0 (on both the server and a few clients) yesterday. I started to get gaps in the graphs immediately. I reverted the server to 1.1.5 this morning and the gaps have disappeared on (at least what I think are) most of the graphs. Not sure about InnoDB Log Buffer Performance, for example.
The difference between my setup and Aleksey's is that we run on physical hardware (lots of RAM and SSD RAID) in our own datacentre.
We haven’t reinstalled any clients and haven’t noticed anything special with CPU usage on the server.
So yes, something seems to be weird with version 1.2.0
BR
Johan
1. "so you do a basic install of PMM on an m4.xlarge instance (4 vCPU and 16GB of memory)"
Yes
2. "You do not do any special configuration, right?"
- the instance was upgraded to the latest CentOS 7
- Docker was moved from /var/lib/docker to /data/docker using a symlink (a sketch of how that was done follows the listing below):
ls -l /data/docker/
drwx------ 5 root root 222 Aug 6 16:54 containers
drwx------ 3 root root 21 Aug 5 18:16 image
drwxr-x--- 3 root root 19 Aug 5 18:16 network
drwx------ 27 root root 4096 Aug 6 16:54 overlay
drwx------ 4 root root 32 Aug 5 18:16 plugins
drwx------ 2 root root 6 Aug 5 18:16 swarm
drwx------ 2 root root 6 Aug 6 16:54 tmp
drwx------ 2 root root 6 Aug 5 18:16 trust
drwx------ 6 root root 313 Aug 5 18:28 volumes
[root@MySQL-PMMC ~]# ls -l /var/lib/ | grep docker
lrwxrwxrwx 1 root root 12 Aug 5 18:25 docker -> /data/docker
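For reference, the relocation was presumably done along these lines (the exact steps are my assumption; the point is simply that /var/lib/docker ends up as a symlink onto the XFS-backed /data volume):
systemctl stop docker
mv /var/lib/docker /data/docker
ln -s /data/docker /var/lib/docker
systemctl start docker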
3. "How do you have your EBS configured ? "
Volume 1 - Volume type "gp2", IOPS 100/3000, Mountpoint "/", Size 20GB, Filesystem XFS, Not Encrypted
Volume 2 - Volume type "gp2", IOPS 330/3000, Mountpoint "/data", Size 110GB, Filesystem XFS, Encrypted
4. "When you're adding one node, what is it? A MySQL server?"
Yes, it is Percona MySQL Server 5.7.18: 60 databases (the number changes all the time) and 3500 tables at the moment.
5. “Any special options you disable”
Some are disabled, some are enabled.
PMM server Docker setup:
docker run -d \
  -p 80:80 \
  --volumes-from pmm-data \
  --name pmm-server \
  --restart always \
  --env TZ="Europe/Kiev" \
  --env METRICS_RESOLUTION=5s \
  percona/pmm-server:latest
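Side note: METRICS_RESOLUTION=5s sets the Prometheus scrape interval to 5 seconds (PMM 1.x defaults to 1 second). If you want to confirm the setting actually took effect, something like the following should work; the config path inside the container is my assumption based on the PMM 1.x image layout:
docker exec pmm-server grep scrape_interval /etc/prometheus.yml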
Client setup:
yum install -y pmm-client
pmm-admin config --server pmm.srv --client-name mysql.db
pmm-admin add linux:metrics
pmm-admin add mysql:metrics --disable-tablestats-limit 3000
pmm-admin add mysql:queries --query-source slowlog
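As an aside, when comparing 1.1.5 vs 1.2.0 behaviour it may also help to capture what the client itself reports as running and reachable, for example:
pmm-admin list
pmm-admin check-network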
6. "And what problem are you seeing? Note that, among other things, the Prometheus dashboard should show CPU and memory usage by the prometheus process."
I will answer this one later.
I tried to find what you asked for in question 6 in Prometheus on PMM 1.1.5, without success :). I upgraded to 1.2.0 (the problems came back) and was unsuccessful again. Can you show me a screenshot or instructions for what exactly I have to find in Prometheus and share? A screenshot would be better.
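For reference, a rough way to see what the prometheus process inside the container is consuming is from the Docker host itself; this assumes the standard pmm-server container name and that ps is present in the image:
docker stats pmm-server --no-stream
docker exec pmm-server ps aux | grep prometheus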
FWIW, I am also seeing gaps and no value for "Current QPS" after upgrading from 1.1.5 to 1.2.
Wow, I’m not alone
https://www.percona.com/forums/questions-discussions/percona-monitoring-and-management/48687-prometheus-high-cpu
https://www.percona.com/forums/questions-discussions/percona-monitoring-and-management/49047-pmm-1-2-0-a-lot-of-data-is-not-shown
I will try the advice from the related post: METRICS_MEMORY=786432.
Thank you. Yes, please try the metrics memory increase. I'm still puzzled why it can't handle even a single MySQL instance, especially considering you're using 5 sec as the resolution.
increasing “Metrics Memory” didn’t help in my case
Hn,
OK, can you upload an image of what you're seeing on your Prometheus dashboard, such as this:
https://pmmdemo.percona.com/graph/dashboard/db/prometheus?refresh=1m&orgId=1
docker run -d -p 80:80 --volumes-from pmm-data --name pmm-server -e SERVER_USER=pmm -e SERVER_PASSWORD=123456 -e METRICS_MEMORY=786432 --restart always --init percona/pmm-server:1.2.0
Shame on me. I should have read the previous topics more carefully.
Yes, METRICS_MEMORY resolves the problem with the gaps. I think it would be a good idea to fix this in the next release, or at least mention it in the documentation.
docker run -d \
  -p 80:80 \
  --volumes-from pmm-data \
  --name pmm-server \
  --restart always \
  --env TZ="Europe/Kiev" \
  --env METRICS_RESOLUTION=5s \
  --env METRICS_MEMORY=7864320 \
  percona/pmm-server:latest
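One note in case anyone copies this: as I understand the documentation, METRICS_MEMORY is given in kilobytes, so 7864320 is roughly 7.5GB. Since the metrics themselves live in the pmm-data volume, applying a new value only requires recreating the pmm-server container, roughly:
docker stop pmm-server && docker rm pmm-server
# then re-run the "docker run" command above with the new METRICS_MEMORY value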
So, the biggest problem is resolved.
liuqian, roma.novikov, Mykola, Peter - thanks guys!
Only one problem was not resolved: "no value" in "Current QPS" on "MySQL Overview".
Hi all,
Setting METRICS_MEMORY seems to have fixed the gaps for me. Happily running 1.2.0 now. Have also updated my docs
BR
Johan
aleksey.filippov, I think the 5 sec resolution is the core of the "QPS problem". Created PMM-1275 ("QPS SingleStat broken with scrape_interval: 5s") in the Percona JIRA, and we'll take a look at what we can do with this.
Thanks, Roman. Waiting for the result.
fwiw, I also have resolution set to 5 seconds
Hi,
Edit the graph and replace:
rate(mysql_global_status_queries{instance="$host"}[1s]) or irate(mysql_global_status_queries{instance="$host"}[2s])
with:
rate(mysql_global_status_queries{instance="$host"}[1s]) or irate(mysql_global_status_queries{instance="$host"}[5m])
With a 5-second scrape interval, a 1s or 2s range contains at most one sample, so both rate() and irate() come back empty; widening the irate() window gives it at least two samples to work with.
that works, thx