15:30:02 #startmeeting Performance Team
15:30:03 Meeting started Tue Sep 6 15:30:02 2016 UTC and is due to finish in 60 minutes. The chair is DinaBelova. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:30:05 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:30:07 The meeting name has been set to 'performance_team'
15:30:14 hey folks :)
15:30:22 let's see who's around :)
15:30:29 o/
15:30:33 rcherrueau, rohanion ? :)
15:30:35 gokrokve o/
15:30:38 hello
15:30:49 rcherrueau o/
15:30:53 hi!
15:30:54 msimonin o/
15:30:58 ilyashakhat wow
15:31:10 ilyashakhat really glad to see you :)
15:31:13 hi all
15:31:28 yep!
15:31:28 hi
15:31:29 Hey DinaBelova, I can't attend today :(
15:31:38 But rcherrueau is here :)
15:31:43 JorgeCardoso o/ would you like to present yourself yo the audience?
15:31:46 msimonin ck
15:32:11 yes
15:32:14 to the audience*** :) sorry
15:32:26 My name is Jorge Cardoso.
15:32:26 please go ahead
15:33:05 I am Chief Architect at Huawei Research, working on Cloud Reliability and Resiliency
15:33:31 I am currently working on a project with Deutsche Telekom on OpenStack Reliability
15:34:05 Very nice to meet you!
15:34:19 You can find more about my interests in reliability at: http://jorge-cardoso.github.io/research/
15:34:29 JorgeCardoso nice to see you here :) do you have something to share regarding this activity? We're really interested in OpenStack
15:34:35 Likewise!
15:34:35 reliability
15:34:41 JorgeCardoso indeed :)
15:35:03 #info JorgeCardoso 's research regarding reliability at: http://jorge-cardoso.github.io/research/
15:35:15 JorgeCardoso I'll definitely take a look at this
15:35:27 thank you so much
15:35:51 ok, so it looks like we may jump to the agenda
15:36:05 We are looking into Chaos Engineering and Google SRE to understand what a Fault Injection framework for OpenStack would look like
15:36:16 #link https://wiki.openstack.org/wiki/Meetings/Performance#Agenda_for_next_meeting
15:36:29 JorgeCardoso I think you may also speak to ilyashakhat
15:36:53 as we're working on the same thing right now - and we came up with the rally + os-faults library
15:37:25 #link https://github.com/openstack/os-faults
15:37:54 right now we're mostly interested in investigating how particular operations survive a single fault
15:37:58 I looked at it, and it is a good starting point IMHO
15:38:59 JorgeCardoso ok, thanks for the feedback, lemme take a look at the points described in your doc, it looks like we can somehow merge our efforts on this
15:39:55 so we can jump to current progress on the planned tests
15:39:59 #topic Current progress on the planned tests
15:40:05 rcherrueau would you like to start?
15:40:10 yes
15:40:21 so we continue on the 1000 node experiment
15:40:32 There is a pad with our roadmap
15:40:42 #link https://pad.inria.fr/p/FK18ZSyjFyLPITfE
15:41:22 we currently collect a lot of information
15:41:28 rcherrueau it looks like you had several runs
15:41:54 results of rally / network traffic / cpu & mem usage / memcache usage ...
15:42:08 several timeouts were observed?
15:42:21 DinaBelova several runs ?
15:42:34 rcherrueau I see pass #1, pass #2, etc
15:42:38 in the doc :)
15:42:57 Oh yes, you should look at ligne 169
15:43:05 163 sorry
15:43:14 line*
15:43:23 rcherrueau ack, thanks
15:43:37 rcherrueau any issues / interesting results observed so far?
15:44:22 nothing special, so far we have mainly spent time on developing our framework to run tests and collect information
15:44:56 now we have to interpret the results of these experiments
15:45:15 rcherrueau ack
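
For reference on the rally + os-faults combination linked above, here is a minimal sketch of driving os-faults from Python. The driver, address, credentials and service name are illustrative assumptions rather than the team's actual configuration; supported drivers and service names are listed in the os-faults documentation.

    import os_faults

    # Illustrative cloud description: the driver name, address and username
    # below are placeholders and depend on the actual deployment.
    cloud_config = {
        'cloud_management': {
            'driver': 'fuel',
            'args': {
                'address': '10.20.0.2',   # hypothetical Fuel master node
                'username': 'root',
            }
        }
    }

    # Connect to the cloud and verify that the connection works.
    cloud = os_faults.connect(cloud_config)
    cloud.verify()

    # Inject a single fault (restart one control plane service), then let a
    # Rally scenario check that the operation under test survives it.
    service = cloud.get_service(name='keystone-api')
    service.restart()
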
15:45:30 rcherrueau btw do you have any monitoring for the containers?
15:45:38 yes
15:45:48 rcherrueau what are you using for this purpose?
15:45:59 what kind of tool?
15:46:06 we go with cadvisor
15:46:42 we also use some plugins from collectd
15:46:46 rcherrueau ack, I'm asking as I was experimenting with various pods / containers monitoring for openstack on top of k8s deployments
15:46:53 Did you integrate cadvisor with InfluxDB+Grafana?
15:46:59 especially to monitor rabbitmq and mariadb
15:47:10 gokrokve yes
15:47:11 That's what we did in our 1000 nodes experiments
15:47:15 the same for collectd
15:47:31 rcherrueau and frankly speaking, monitoring ALL containers for all OpenStack services looks a bit overkill sometimes
15:47:45 Did you see problems with InfluxDB slowness?
15:48:22 no, not me, but I will ask msimonin tomorrow
15:48:24 When we used grafana, sometimes Influx used 100% CPU and was not very responsive.
15:49:02 rcherrueau I can confirm
15:49:06 I'm sure msimonin tuned grafana in such a manner that there isn't too much writing into the database
15:49:40 That is cool.
15:49:44 rcherrueau ok, it would probably be nice to include this information (about what monitoring tools were used and how they were tuned) in this document
15:49:55 probably at the very end of it, just for information
15:49:57 We just did some dirty wiring to have some graphs.
15:50:03 DinaBelova yes you're right
15:50:40 rcherrueau I'll add an action item
15:51:10 #action rcherrueau msimonin add information about what monitoring tools were used to the https://pad.inria.fr/p/FK18ZSyjFyLPITfE
15:51:24 rcherrueau thank you for sharing this
15:51:38 you are welcome
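
A side note on the cAdvisor-based container monitoring discussed above: per-container CPU and memory samples can also be pulled directly from cAdvisor's REST API, as in the minimal sketch below. The host, port and field names reflect a default cAdvisor setup and are assumptions here; in the 1000 node experiments the metrics are shipped to InfluxDB and visualized in Grafana instead.

    import requests

    # Hypothetical cAdvisor endpoint; cAdvisor listens on port 8080 by default.
    # The root container aggregates machine-wide stats; individual containers
    # (e.g. rabbitmq or mariadb) are exposed under their own subcontainer paths.
    CADVISOR_URL = 'http://localhost:8080/api/v1.3/containers/'

    response = requests.get(CADVISOR_URL)
    response.raise_for_status()
    info = response.json()

    # Each entry in 'stats' is one sample; take the most recent one.
    latest = info['stats'][-1]
    print('cpu usage, total (ns):', latest['cpu']['usage']['total'])
    print('memory usage (bytes): ', latest['memory']['usage'])
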
15:52:02 from our side: we have merged the 400 nodes baseline testing results into the performance docs
15:52:06 #link http://docs.openstack.org/developer/performance-docs/test_results/control_plane_400_nodes/index.html
15:52:22 dataplane and Neutron testing results are under construction
15:52:26 to get published
15:52:47 probably some first part of them will be shared before the next meeting
15:53:42 we're planning to start fuel-ccp (containerized control plane) + k8s scale testing this week
15:53:48 let's see how it'll go
15:54:29 jfyi: I'm attending the k8s Scale Special Interest Group meetings
15:55:08 to understand how k8s itself, as a foundation for OpenStack deployments, can scale and what kind of performance it can support
15:55:39 so if anyone is interested in that kind of information, I really advise you to attend these meetings - they are quite interesting
15:56:21 also reliability tests with the os-faults library are coming soon
15:56:32 so I think that's all from our side right now
15:56:58 let's proceed to the osprofiler
15:57:04 #topic OSProfiler weekly update
15:57:16 we have several new interesting changes on review
15:57:39 #info new Redis-based driver for OSProfiler among them
15:57:41 #link https://review.openstack.org/#/c/364487/
15:58:12 frankly speaking I did not have a chance to try it yet on the test env, I'm having it in my todo list for this week
15:58:50 #info ElasticSearch driver still under development https://review.openstack.org/#/c/340936/
15:59:53 afaik rohanion you did not have a chance to publish your script for enabling profiling on vanilla OpenStack yet?
16:00:39 rohanion ? :)
16:00:46 actually, there are 2 scripts
16:00:58 a script that patches conf files
16:01:05 and a script that restarts the services
16:01:19 the latter requires Fuel
16:01:36 rohanion well yeah, that is the deployment-specific thing
16:01:39 the first just hacks the config files, so I can publish it somewhere
16:01:53 rohanion yes please, this will be useful
16:01:54 it's really short btw
16:02:02 rohanion I believe it should :)
16:02:18 #action @rohanion publish osprofiler config patcher
16:02:40 ok, so it looks like that's all regarding the profiler
16:03:00 for today we did not have more specific items on the agenda
16:03:03 #topic Open Discussion
16:03:12 anything to share, folks?
16:04:01 it looks like nope
16:04:16 so thanks everyone for coming, have a nice day/evening
16:04:22 #endmeeting
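
rohanion's scripts had not been published at the time of this meeting, so purely as an illustration, a conf-file patcher of the kind described above could look roughly like the sketch below. The option names are standard OSProfiler settings; the hmac key and connection string values are placeholders, and the helper itself is hypothetical rather than rohanion's actual script.

    import configparser
    import sys

    # Hypothetical helper in the spirit of the config-patching script discussed
    # above: it adds a [profiler] section to an OpenStack service config file.
    def enable_profiling(conf_path, hmac_key, connection_string):
        parser = configparser.ConfigParser()
        parser.read(conf_path)

        if not parser.has_section('profiler'):
            parser.add_section('profiler')

        # Standard OSProfiler options; the values passed in are placeholders.
        parser.set('profiler', 'enabled', 'True')
        parser.set('profiler', 'trace_sqlalchemy', 'True')
        parser.set('profiler', 'hmac_keys', hmac_key)
        parser.set('profiler', 'connection_string', connection_string)

        with open(conf_path, 'w') as conf_file:
            parser.write(conf_file)

    if __name__ == '__main__':
        # e.g.: python enable_profiling.py /etc/nova/nova.conf SECRET_KEY redis://127.0.0.1:6379
        enable_profiling(sys.argv[1], sys.argv[2], sys.argv[3])

Note that ConfigParser rewrites the file and drops comments when it writes, so a real script might prefer a tool like crudini. The second script rohanion mentions, restarting the services so they pick up the new config, is the deployment-specific part that depends on Fuel.
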