15:30:02 <DinaBelova> #startmeeting Performance Team
15:30:03 <openstack> Meeting started Tue Sep 13 15:30:02 2016 UTC and is due to finish in 60 minutes. The chair is DinaBelova. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:30:04 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:30:07 <openstack> The meeting name has been set to 'performance_team'
15:30:14 <DinaBelova> hey folks o/
15:30:20 <AugieMena> o/
15:30:20 <ad_rien_> o/
15:30:24 <aignatev> o/
15:30:37 <DinaBelova> rohanion, rcherrueau o/
15:30:38 <rcherrueau> Hi,
15:30:44 <msimonin> hello
15:30:49 <JorgeCardoso> o/
15:31:04 <DinaBelova> JorgeCardoso nice to see you sir
15:31:11 <DinaBelova> ilyashakhat are you around today? :)
15:31:13 <JorgeCardoso> likewise :)
15:31:39 <DinaBelova> ok, so good evening / good morning :)
15:31:50 <DinaBelova> let's start with action items from last meeting
15:31:57 <DinaBelova> #topic Action Items
15:32:07 <DinaBelova> msimonin rcherrueau - one of them was on you folks
15:32:12 <DinaBelova> add information about what monitoring tools were used to the https://pad.inria.fr/p/FK18ZSyjFyLPITfE
15:32:40 <msimonin> yep sorry we are doing it right now
15:32:54 <DinaBelova> msimonin ack, lemme keep this action item
15:33:00 <rohanion> hey everyone!
15:33:08 <DinaBelova> #action msimonin rcherrueau add information about what monitoring tools were used to the https://pad.inria.fr/p/FK18ZSyjFyLPITfE
15:33:17 <DinaBelova> rohanion o/
15:33:35 <DinaBelova> rohanion second action item is about you
15:33:46 <DinaBelova> publish osprofiler config patcher
15:33:55 <DinaBelova> rohanion afaik you pushed it somewhere
15:34:01 <DinaBelova> may you please share?
15:34:12 <rohanion> yes
15:34:29 <rohanion> https://github.com/rohanion/osprofiler-cfg
15:34:33 <rohanion> here we go
15:34:36 <DinaBelova> #link https://github.com/rohanion/osprofiler-cfg
15:34:54 <rohanion> is the readme good enough or should I change it?
15:35:27 <DinaBelova> rohanion I'd propose to add info that this needs to be run on all OpenStack nodes
15:35:40 <DinaBelova> rohanion and mention that services need to be restarted separately
15:35:46 <DinaBelova> as this depends much on the deployment
15:36:04 <DinaBelova> rohanion what do you think?
15:36:23 <rohanion> I guess it's pretty obvious... but will do, thanks
15:36:35 <DinaBelova> rohanion it's obvious for you :)
15:36:46 <DinaBelova> rohanion thanks
15:37:00 <DinaBelova> that's all about our action items for today
15:37:11 <DinaBelova> #topic Current progress on the planned tests
15:37:23 <DinaBelova> okay msimonin ad_rien_ rcherrueau - the floor is yours
15:37:29 <ad_rien_> thanks
15:37:35 <DinaBelova> I see some new details added to the etherpad
15:37:41 <ad_rien_> actually we have lots of questions
15:37:47 <ad_rien_> :-)
15:37:51 <DinaBelova> #link https://pad.inria.fr/p/FK18ZSyjFyLPITfE
15:37:53 <DinaBelova> ad_rien_ sure
15:38:12 <ad_rien_> msimonin will start
15:38:14 <msimonin> so,
15:38:16 <msimonin> :)
15:39:02 <msimonin> first set of experiments: collect metrics of an "idle" openstack with different numbers of computes
15:39:28 <msimonin> this one is done
15:39:34 <msimonin> we are analysing the results
15:39:35 <DinaBelova> msimonin may you please clarify what "idle" means here?
15:39:44 <msimonin> DinaBelova: sure
15:40:11 <msimonin> For us idle means an openstack without any incoming requests
15:40:32 <msimonin> the goal is to measure the noise generated by openstack services
15:40:43 <msimonin> and maybe find some patterns
15:40:48 <DinaBelova> msimonin ack, gotcha
15:41:31 <msimonin> second set of experiments: collect metrics of a "non-idle" openstack with different numbers of computes
15:41:56 <msimonin> basically during rally benchmarks
15:42:13 <DinaBelova> msimonin due to what I see in the etherpad - some timeouts happen?
15:42:21 <msimonin> which line ?
15:42:30 <DinaBelova> 376
15:42:52 <msimonin> yep right this has to be explained
15:43:17 <DinaBelova> msimonin is this happening for boot VM I suppose?
15:43:39 <msimonin> the fact is that with a bigger deployment (1000 computes) we got some timeouts in the rally reports
15:44:03 <DinaBelova> msimonin was rally timed out waiting for server to become active?
15:45:06 <rcherrueau> DinaBelova: Here is an example of the error we got: 'Timed out waiting for a reply to message ID b0628f97f1f942c1962859a3c290a731'
15:45:20 <DinaBelova> rcherrueau oh, interesting
15:45:52 <ad_rien_> Actually, we are working on these experiments now so we can dive into details. I'm not sure whether it is the best time to discuss such points right now ;) Maybe we can just raise all our questions and see which ones need deeper investigation.
15:46:12 <DinaBelova> ad_rien_ probably yes
15:46:15 <ad_rien_> ok thanks
15:46:23 <DinaBelova> rcherrueau btw - where are you observing these logs?
15:46:28 <DinaBelova> the final question :)
15:46:36 <ad_rien_> :)
15:46:37 <rcherrueau> From the rally report
15:46:38 <DinaBelova> in rally? or in one of the services?
15:46:41 <DinaBelova> rcherrueau ack
15:47:04 <rcherrueau> Yes I have to dig into this and take a look at the logs of other services
15:47:08 <rcherrueau> maybe rabbitmq
15:47:25 <DinaBelova> ad_rien_ rcherrueau if you won't be able to find the root cause quickly, may you please send me rally logs / rally report?
15:47:31 <ad_rien_> sure
15:47:53 <ad_rien_> Can we switch to the next questions ;) ?
15:47:54 <DinaBelova> I'd jump to our rally folks :) they have observed all types of issues :)
15:47:58 <DinaBelova> ad_rien_ sure
15:48:35 <msimonin> DinaBelova: cool :)
15:48:45 <ad_rien_> OK we observed another issue. If we do not redeploy our testbed between scenarios, we got weird behaviors
15:49:09 <ad_rien_> For that point we need to better understand what does happen….
15:49:19 <ad_rien_> maybe you can just tell us if one your side you saw such issues ?
15:49:51 <ad_rien_> We plan to launch 10 times the same scenario on a small deployment to see whether we got same results
15:50:28 <ad_rien_> So DinaBelova others ? did you see such issue ? Maybe we should ask the rally team ?
15:50:29 <rohanion> folks, Dina's laptop froze to death, she's rebooting
15:50:37 <msimonin> lol
15:50:40 <rohanion> please wait for a couple of minutes
15:50:45 <ad_rien_> sorry
15:50:55 <msimonin> she's already tired of our questions
15:51:11 <ad_rien_> ;)
15:52:07 <ad_rien_> meanwhile AugieMena may I ask you if you received our email regarding the IBM scheduling presentation ?
15:52:37 <DinaBelova> folks, I'm back, sorry, my laptop went crazy
15:52:40 <ad_rien_> we are definitely interested by this work
15:52:44 <ad_rien_> ok
15:52:48 <DinaBelova> ad_rien_ may you please repeat everything I missed? :)
15:53:00 <DinaBelova> the last msg I see is about second question
15:53:15 <AugieMena> ad_rien_ Yes, got the email. I believe Chris K replied to that
15:53:52 <ad_rien_> We observed another issue. If we do not redeploy our testbed between scenarios, we got weird behaviors
15:53:52 <ad_rien_> For this point we need to better understand what does happen….
15:53:52 <ad_rien_> Maybe you can just tell us if one your side you saw such issues ?
15:53:52 <ad_rien_> We plan to launch 10 times the same scenario on a small deployment to see whether we got same results
15:53:52 <ad_rien_> So DinaBelova others ? did you see such issue ? Maybe we should ask the rally team ?
15:54:26 <ad_rien_> s/one your side/on your side sorry for the typo
15:54:41 <DinaBelova> ad_rien_ usually we do not observe behavior changing - rally clean up is usually enough
15:54:50 <DinaBelova> I mean clean up done after every scenario
15:55:09 <ad_rien_> we also did that (actually Rally is doing that :()
15:55:18 <ad_rien_> ok we will confirm that for the next meeting
15:55:22 <DinaBelova> ad_rien_ - what kind of changes? andreykurilin - may you please try to help?
15:55:48 <DinaBelova> ad_rien_ ack
15:55:49 <ad_rien_> increase in the memory of the DB
15:55:54 <andreykurilin> DinaBelova ad_rien_: hi hi
15:56:06 <ad_rien_> Hi andreykurilin
15:56:22 <andreykurilin> ad_rien_ I'm rally core and I'm ready to help
15:56:23 <andreykurilin> )
15:56:28 <ad_rien_> thanks
15:56:34 <ad_rien_> that's good news
15:57:07 <ad_rien_> Actually we would like to be able to reproduce such weird behaviors
15:57:12 <DinaBelova> ad_rien_ what DB? if Nova - lines with deleted VMs will still be there :) with deleted == true
15:57:22 <msimonin> soft-delete ?
15:57:24 <ad_rien_> so we can come back to you with a concrete example
15:57:31 <andreykurilin> ok
15:57:31 <DinaBelova> ad_rien_ sure
15:57:52 <DinaBelova> ad_rien_ anything else?
15:58:05 <msimonin> so the state of openstack isn't the same when running a benchmark a second time ?
15:58:06 <ad_rien_> Since the DB is not in the same state, the question is whether there will be an impact on the performance we measured
15:58:45 <ad_rien_> Memcached is also not cleaned
15:59:33 <DinaBelova> msimonin ad_rien_ well, that's a complex system - if you'll run tests always on a purely new env, you'll always observe OpenStack warming up - with all caches not filled yet, etc.
15:59:35 <ad_rien_> so I think we need time to identify all aspects and get back to andreykurilin
15:59:50 <andreykurilin> ad_rien_: feel free to ping me here or at #openstack-rally (there you can find more friendly rally cores) :)
16:00:09 <ad_rien_> DinaBelova: yes that's why we would like to see whether we can ensure reproducibility criteria between experiments
16:00:11 <DinaBelova> so the ideal test case is to run every case twice (each pair on a 100% clean env) and count only the second attempt
16:00:22 <ad_rien_> andreykurilin: thanks
16:00:43 <DinaBelova> ad_rien_ yes, that's tricky
16:00:49 <ad_rien_> Ok
16:01:01 <ad_rien_> so the last question for today is related to the scheduling process on top of the fake driver
16:01:11 <DinaBelova> ad_rien_ sure
16:01:19 <ad_rien_> actually the fake driver presents 1000 vcpus and a lot of memory
16:01:36 <ad_rien_> we are wondering how 1000 VM instances are scheduled on top of 1000 fake drivers
16:02:25 <ad_rien_> Can all VMs be scheduled on the same fake driver ?
16:02:34 <ad_rien_> What prevents getting such a placement ?
16:03:54 <DinaBelova> ad_rien_ afaik the whole process is the same as for usual runs - the filter scheduler uses filters to schedule servers to the compute nodes; some of the filters define whether one node should be filled as a priority or all nodes should be filled in a more or less even fashion (nova has info about the resources already "eaten" on each hypervisor)
16:04:18 <DinaBelova> ad_rien_ I guess you need to take a look at the scheduler filters used
16:04:29 <DinaBelova> and see what kind of configurations are used
16:05:00 <sai> DinaBelova: i have a few networking questions, let me know when i can go
16:05:09 <DinaBelova> sai sure
16:05:19 <ad_rien_> Ok we will perform a small test to answer this question
16:05:28 <ad_rien_> thanks DinaBelova this is ok from our side
16:05:30 <DinaBelova> sai we'll jump soon to the open discussions
16:05:35 <DinaBelova> ad_rien_ ack, thanks
16:05:39 <DinaBelova> please keep us updated
16:05:51 <DinaBelova> rohanion are you still around? ready for osprofiler?
16:06:01 <rohanion> always am!
16:06:04 <DinaBelova> #topic OSProfiler weekly update
16:06:19 <DinaBelova> rohanion please proceed
16:07:20 <rohanion> 5 sec..
16:07:59 <rohanion> We're ready to merge the neutron patch https://review.openstack.org/#/c/342505/, waiting for CR from the neutron core team
16:08:44 <rohanion> cinder https://review.openstack.org/#/c/315676/ - add one test and it's ready to merge too.
16:09:13 <DinaBelova> #link https://review.openstack.org/#/c/342505/
16:09:19 <DinaBelova> #link https://review.openstack.org/#/c/315676/
16:09:29 <DinaBelova> rohanion ack, thanks
16:09:37 <rohanion> that's not everything :)
16:09:52 <DinaBelova> it's just thanks :) you can proceed :D
16:09:56 <rohanion> glance https://review.openstack.org/#/c/316799/ - work in progress, debugging failed tests
16:10:07 <DinaBelova> #link https://review.openstack.org/#/c/316799/
16:10:30 <rohanion> sahara https://review.openstack.org/#/c/344370/ - nothing, no active development. will ask the sahara team for support
16:10:47 <rohanion> internal osprofiler stuff:
16:10:56 <DinaBelova> #link https://review.openstack.org/#/c/344370/
16:11:16 <rohanion> elasticsearch https://review.openstack.org/#/c/340936/ - on review, almost ready to merge. will finish reviewing that tomorrow
16:12:01 <rohanion> redis driver https://review.openstack.org/#/c/364487/ - I had no time to review it, will work on that tomorrow as well.
16:12:10 <DinaBelova> rohanion ack
16:12:46 <rohanion> influxdb driver - no active development. if the performance team decides that we have to implement it asap, I'll code it in an hour.
16:12:55 <DinaBelova> rohanion :D
16:13:03 <DinaBelova> you'll be quick :D
16:13:14 <DinaBelova> anything else sir?
16:13:23 <rohanion> 1 hour writing - 1 week testing :)
16:13:34 <rohanion> no, that's everything regarding OSProfiler
16:13:43 <DinaBelova> rohanion thanks for the extended update
16:13:46 <DinaBelova> #topic Open Discussion
16:13:52 <DinaBelova> sai please go ahead
16:13:55 <sai> sure
16:14:19 <sai> so DinaBelova was wondering what kind of networking tests you are running currently and I could update with some of the network perf work we are doing at red hat
16:14:55 <DinaBelova> sai so mostly we're running data plane tests using shaker
16:15:05 <sai> ok shaker is what we are using too
16:15:13 <DinaBelova> sai lemme share a test plan
16:15:29 <DinaBelova> #link http://docs.openstack.org/developer/performance-docs/test_plans/tenant_networking/plan.html
16:15:33 <DinaBelova> sai here it is ^^
16:15:38 <sai> cool
16:15:44 <DinaBelova> and the related report http://docs.openstack.org/developer/performance-docs/test_results/tenant_networking/neutron_vxlan_dvr_200/index.html#tenant-networking-report-vxlan-dvr-200-nodes
16:15:45 <sai> any specific neutron features u r targeting?
16:15:53 <sai> we r working on DVR currently
16:16:03 <DinaBelova> sai for neutron features we have separate tests being run
16:16:08 <sai> and had some previous work looking at control plane performance tuning neutron workers
16:16:18 <DinaBelova> sai http://docs.openstack.org/developer/performance-docs/test_plans/hardware_features/index.html
16:16:30 <DinaBelova> as for DVR - we usually run with it, yes
16:16:45 <DinaBelova> sai some time ago we ran both DVR and non-DVR topologies
16:17:05 <DinaBelova> right now mostly DVR (as requested by Mirantis)
16:17:07 <sai> cool, i have some of those results, i can share upstream and contribute to the repo
16:17:13 <sai> the comparison
16:17:17 <DinaBelova> sai this will be perfect
16:17:40 <DinaBelova> so please submit a lab description here as well http://docs.openstack.org/developer/performance-docs/labs/index.html
16:17:43 <DinaBelova> sai ^^
16:17:51 <DinaBelova> so we can compare the test bed as well
16:18:31 <sai> DinaBelova: what im really curious about is the tunings that u put into the network tests
16:18:37 <sai> threads of netperf/iperf
16:18:40 <sai> and TCP packet sizes
16:18:43 <sai> or u run defaults
16:18:48 <DinaBelova> sai sorry I added a wrong link when targeting the neutron features - here is the right one http://docs.openstack.org/developer/performance-docs/test_plans/neutron_features/index.html
16:19:41 <DinaBelova> sai you can take a look here http://docs.openstack.org/developer/performance-docs/test_results/tenant_networking/neutron_vxlan_dvr_200/index.html#tenant-networking-report-vxlan-dvr-200-nodes - if you click through to every scenario - e.g. http://docs.openstack.org/developer/performance-docs/test_results/tenant_networking/neutron_vxlan_dvr_200/perf_l2_dense/index.html - you'll see the full scenario
16:19:41 <sai> DinaBelova: because we are running uperf with different IP packet sizes (outside of shaker) and seeing better throughput at higher packet sizes
16:20:00 <sai> cool thanks DinaBelova
16:20:11 <ilyashakhat_mobi> sai: we ran with defaults, but our networking guys tried different settings, mostly the number of threads
16:20:19 <DinaBelova> sai np, you're really welcome
16:20:22 <sai> hopefully i will contribute to performance-docs
16:20:37 <sai> and do i need to have a test plan or will just test results with a scenario help?
16:20:49 <sai> ilyashakhat_mobi: ack
16:20:59 <ilyashakhat_mobi> sai: it would be interesting to run tests with the same parameters as you do (meaning packet size)
16:21:17 <ilyashakhat_mobi> you can follow the existing plan or update it
16:21:38 <ilyashakhat_mobi> it's not restricted :)
16:21:50 <DinaBelova> sai if your tests are a subset of http://docs.openstack.org/developer/performance-docs/test_plans/tenant_networking/plan.html (just with different params) you can just push test results with other configs and explanations - if scenarios are different, you can add yours to the test plan - http://docs.openstack.org/developer/performance-docs/test_plans/tenant_networking/plan.html
16:22:14 <sai> DinaBelova: ilyashakhat_mobi https://github.com/openstack/browbeat/tree/master/rally/rally-plugins/nova-create-pbench-uperf
16:22:36 <DinaBelova> #link https://github.com/openstack/browbeat/tree/master/rally/rally-plugins/nova-create-pbench-uperf
16:22:43 <sai> that is the rally plugin we use combined with pbench (in-house but opensourced)
16:22:53 <sai> kicks off uperf with varying packet sizes and threads
16:23:39 <sai> prints a rally summary with throughput results as well
16:23:46 <sai> so interesting data plane perf through rally
16:23:52 <DinaBelova> sai ok, so the tool is different - so you can add a new section to the test plan I believe
16:24:11 <DinaBelova> as it's still the same testing field (tenant networking) but a really different type of tool used
16:24:12 <sai> DinaBelova: my test results are with shaker
16:24:17 <DinaBelova> sai ah
16:24:19 <DinaBelova> I got it
16:24:25 <sai> but i have some other results too with the tool i mentioned
16:24:27 <sai> :)
16:24:50 <sai> i think thats all the network perf questions i have
16:25:10 <DinaBelova> sai :D ack :) so shaker stuff can be just published as a test report - against your env with your configs; for browbeat stuff I'd propose to add one more section to the test plan
16:25:30 <DinaBelova> and publish results regarding it
16:25:37 <DinaBelova> sai you're welcome
16:25:40 <sai> sure, we index shaker results into elastic
16:25:48 <sai> i can pull up the graphs thru kibana
16:25:50 <sai> :)
16:25:53 <DinaBelova> cool
16:25:57 <sai> that way we average out results
16:26:05 <sai> because im seeing varying results due to NUMA
16:26:12 <sai> when a vm is numa local vs non local
16:26:29 <sai> so we figured the best way is to rerun the same test multiple times and average out using elasticsearch/kibana
16:26:41 <sai> so we can get something really representative
16:26:41 <DinaBelova> sai I believe our networking folks were looking at this as well, but not sure
16:26:55 <DinaBelova> sai that's a good approach
16:27:18 <sai> cool, is there a specific subset of openstack-network-perf somewhere on irc
16:27:23 <sai> or is this the best channel
16:27:30 <sai> i know ilyashakhat_mobi does networking perf
16:27:34 <sai> not sure who else
16:27:35 <DinaBelova> sai I believe this is the best channel
16:27:42 <sai> awesome
16:27:43 <DinaBelova> and ilyashakhat_mobi is probably the best person to ping
16:27:45 <sai> ill stick to this then
16:27:49 <DinaBelova> sure
16:28:12 <DinaBelova> ok, anything else here?
16:28:17 <DinaBelova> msimonin rcherrueau ad_rien_ ?
16:28:30 <msimonin> nope
16:28:31 <ad_rien_> nothing on my side
16:28:32 <msimonin> thanks
16:28:35 <rcherrueau> nope
16:28:37 <DinaBelova> thanks folks!
16:28:39 <DinaBelova> bye!
16:28:41 <DinaBelova> #endmeeting
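Editor's note: the fake-driver scheduling question (16:01-16:04 above) can be sketched with a toy model. This is an illustration only, not Nova code - all class and function names here are invented for the sketch, and Nova's real filter scheduler and weighers are far more involved. It shows the point DinaBelova makes: filters only rule out hosts that lack resources, while a weigher decides whether instances spread evenly across hosts or stack onto one, so a fake driver advertising 1000 vCPUs could absorb every VM if a stacking weigher were configured.

```python
# Toy filter-scheduler sketch (hypothetical names, not Nova's API).
# Each fake-driver host advertises huge resources, as in the discussion.

class Host:
    def __init__(self, name, vcpus=1000, ram_mb=1_000_000):
        self.name = name
        self.free_vcpus = vcpus
        self.free_ram_mb = ram_mb

def core_filter(host, flavor):
    # Analogue of a CPU filter: reject hosts without enough free vCPUs.
    return host.free_vcpus >= flavor["vcpus"]

def ram_filter(host, flavor):
    # Analogue of a RAM filter: reject hosts without enough free RAM.
    return host.free_ram_mb >= flavor["ram_mb"]

def schedule(hosts, flavor, spread=True):
    """Filter out invalid hosts, then weigh the survivors."""
    candidates = [h for h in hosts
                  if core_filter(h, flavor) and ram_filter(h, flavor)]
    if not candidates:
        raise RuntimeError("no valid host found")
    # Weigher analogue: spreading prefers the host with the MOST free RAM,
    # stacking prefers the one with the LEAST (like flipping the sign of a
    # RAM weigher multiplier).
    key = (lambda h: h.free_ram_mb) if spread else (lambda h: -h.free_ram_mb)
    best = max(candidates, key=key)
    # The scheduler tracks resources already "eaten" on each hypervisor.
    best.free_vcpus -= flavor["vcpus"]
    best.free_ram_mb -= flavor["ram_mb"]
    return best.name

hosts = [Host(f"fake{i}") for i in range(4)]
flavor = {"vcpus": 1, "ram_mb": 64}
placements = [schedule(hosts, flavor) for _ in range(8)]
# With spread=True the 8 instances land round-robin across the 4 hosts.
```

With `spread=False` every instance lands on the same host until it runs out of resources, which is exactly how all VMs could end up on a single fake driver; which behavior you get depends on the filters and weigher configuration, hence the suggestion to check the scheduler filters in use.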