16:02:45 #startmeeting neutron_performance
16:02:46 Meeting started Mon Oct 22 16:02:45 2018 UTC and is due to finish in 60 minutes. The chair is mlavalle. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:02:47 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:02:49 The meeting name has been set to 'neutron_performance'
16:02:50 hi
16:03:31 hello
16:03:46 njohnston, slaweq, rubasov: thanks for attending
16:03:51 hello again
16:04:33 mlavalle: ah well they stopped ~12 hours ago so I don't know if anyone is still around to do it for you :)
16:05:06 #topic Discussion
16:05:33 Earlier today I sent each one of you (and others) a message with html files
16:05:42 did everyone see it?
16:05:56 I'm just looking at them now
16:06:06 mlavalle: yes, I'm looking too
16:06:29 mlavalle: could you please give access to the add-router-interface.html too?
16:06:44 mlavalle: for some reason I can't open it
16:06:58 yes, I don't have access to this one either
16:07:31 sure, I'll resend it soon
16:08:09 in the meantime, let me make a few general statements about them
16:09:46 if you want to deploy a test environment with devstack with osprofiler enabled, you just need to add two lines to your local.conf file:
16:09:59 enable_plugin osprofiler https://git.openstack.org/openstack/osprofiler
16:10:13 OSPROFILER_COLLECTOR=redis
16:10:25 it doesn't have to be redis
16:10:46 if you use redis, do you have to already have that up and running, or does devstack handle setting it up for you?
16:10:54 there are other options. But I don't have any preference and redis seems to be the one most used by the osprofiler team
16:11:09 njohnston: devstack sets it up for you
16:11:14 nice
16:11:50 It must be part of what the osprofiler plugin for devstack does for you
16:13:11 when stack.sh is done, you will see the following section added to neutron.conf: http://paste.openstack.org/show/732600/
16:13:47 Please note that with this setup, you enable tracing in general and sqlalchemy tracing
16:14:14 and as you can see, the 'SECRET_KEY' is the default
16:14:36 side note: a similar config should also be added to all other openstack component config files when devstack is done
16:14:51 does it have a big performance impact on the running neutron server? if not, maybe we could enable it in gate jobs?
16:14:53 rubasov: you got it
16:15:06 slaweq: the docs say it shouldn't
16:15:12 slaweq: I cannot see any difference in performance
16:15:25 whatsoever
16:15:39 +1 for adding it to the CI
16:16:14 ok, and how do we get such a report from the API call? is it generated for each call then?
16:16:29 slaweq: it's integrated with osc
16:17:06 it's like this: openstack ... --os-profile SECRET_KEY
16:17:18 this is an example of a command execution along with the generation of the corresponding html file: http://paste.openstack.org/show/732601/
16:17:18 then a trace is generated for that call
16:17:46 and osc gives back an id of the trace
16:17:50 Note that "Trace ID: b723c5c7-f87e-49e3-a51e-f1071a00c433" is added to the output of the command
16:18:13 and you use that in the osprofiler command to generate the html file
16:18:22 eg: osprofiler trace show --html ... --connection-string redis://localhost:6379
16:19:20 makes sense?
16:19:57 yep
16:20:16 but this means that we can't easily integrate it with tempest jobs
16:20:18 right?
16:20:36 well, let me explain
16:20:49 apparently it is integrated with Rally....
16:21:19 ok, sorry for disturbing then. Go on mlavalle :)
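[editor's note: a minimal end-to-end sketch of the setup and workflow described above, assuming a devstack all-in-one host with the default 'SECRET_KEY' hmac key; the network name and output file name are made up for illustration]

    # local.conf additions (as given above)
    enable_plugin osprofiler https://git.openstack.org/openstack/osprofiler
    OSPROFILER_COLLECTOR=redis

    # after stack.sh finishes, run any osc command with profiling enabled
    openstack network create demo-net --os-profile SECRET_KEY
    # the command output includes a line like:
    #   Trace ID: b723c5c7-f87e-49e3-a51e-f1071a00c433

    # render that trace as an html report from the redis collector
    osprofiler trace show --html b723c5c7-f87e-49e3-a51e-f1071a00c433 \
        --connection-string redis://localhost:6379 --out trace.html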
16:21:49 yes, as mlavalle said, the test runner must know about osprofiler, and rally seems to know that
16:21:51 these early sessions of this meeting are more brainstorming than anything else
16:22:14 slaweq, for the tempest job, you can see this patch https://review.openstack.org/#/c/523935/
16:22:18 #link https://docs.openstack.org/rally/latest/quick_start/tutorial/step_10_profiling_openstack_internals.html
16:22:55 #link Add profiler support into Tempest https://review.openstack.org/#/c/523935/
16:23:06 tovin07: thx, I will check that
16:23:29 tovin07: ahhh, that's so cool. Is this patch something that will be merged soon? or do we take the patch and deploy it?
16:23:54 that's nice
16:24:46 I'm working on the other half - enabling osprofiler in a tempest gate check job: https://review.openstack.org/611835
16:26:06 that's a good step forward rubaso
16:26:10 rubasov:
16:26:39 I spent some time over the weekend looking at results of our rally jobs
16:26:57 and it doesn't seem that we are generating the traces
16:26:57 last friday I thought ps3 was working already, then today I realized I misread the logs
16:27:34 but I'm progressing slowly as the gate gives me a chance once every few hours
16:27:49 but if we were, according to the rally docs we would generate an osprofiler id for each request
16:27:53 mlavalle: yes, it's only enabling osprofiler in devstack, not tracing any call yet
16:28:33 mlavalle, it depends on the tempest team
16:28:33 and then we could retrieve the generated file and generate the html analysis
16:28:48 tovin07: thanks :-)
16:28:57 I think the patch looks good to me for now
16:29:16 mlavalle: exactly
16:29:36 so I think one action item would be to create a rally job with osprofiler enabled
16:29:58 if you guys feel plain html output from osprofiler is boring
16:30:15 you can use jaeger as backend instead of redis
16:30:37 mlavalle: I can take care of this rally job
16:30:44 I think it should be a periodic job
16:30:48 like this one for a simple glance command (sorry, it's not related to neutron :D )
16:30:49 http://137.116.141.93:16686/trace/92dbc0064374abfc
16:31:25 nice tovin07 !
16:31:47 what do others think of a periodic job?
16:32:04 mlavalle: a periodic rally job with osprofiler enabled?
16:32:11 yeah
16:32:26 I just don't want to have too many jobs in the check queue
16:32:27 for the rally job, we can look at this patch, hope that it can help https://review.openstack.org/#/c/467252/
16:32:58 would it be a problem to just enable it for the neutron-rally-task job which we have now in the check queue?
16:32:59 let's put it in experimental to start, so we can trigger it when we want it
16:33:20 slaweq++
16:33:26 ok, experimental sounds good
16:33:31 mlavalle: at first it likely can't vote anyway, so it makes sense to make it periodic instead of check
16:34:04 and no, I don't see a problem using neutron-rally-task as a base
16:34:11 so....
16:34:17 agree?
16:34:24 with the experimental job
16:34:26 mlavalle: but I was asking if we need a new job for that
16:34:45 slaweq: let's not make that decision today
16:34:57 I want to see how much effect it has first
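[editor's note: per the rally docs linked above, osprofiler is enabled on the rally side by putting the hmac key into the deployment definition, so that every api call made during a task carries a profiler id. A hedged sketch only: it assumes the existing@openstack deployment spec accepts a profiler_hmac_key field and that the key matches the one devstack configured; the credentials and file names below are placeholders]

    # hypothetical existing-profiled.json (most credential fields omitted)
    {
        "openstack": {
            "auth_url": "http://127.0.0.1/identity",
            "admin": {"username": "admin", "password": "secret", "project_name": "admin"},
            "profiler_hmac_key": "SECRET_KEY"
        }
    }

    # register it and run the rally task as usual; trace ids should then appear in the report
    rally deployment create --file existing-profiled.json --name devstack-profiled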
16:34:58 maybe stupid question: how do I trigger an experimental job?
16:35:00 ok, it can be an experimental job at the beginning
16:35:14 rubasov: You write a comment "check experimental" in gerrit
16:35:22 slaweq: thanks
16:35:54 ok, I have one more question about this rally experimental job then
16:36:02 #action slaweq to create an experimental rally job with osprofiler enabled
16:36:21 from the commit message in https://review.openstack.org/#/c/467252/ and from the rally docs it looks like this integration "just" puts a profiler id on each api call
16:36:33 yes
16:36:42 how do You think we should then generate reports from each call in the job?
16:36:44 that's my understanding
16:36:56 should we add some script which will do it in post-run?
16:37:30 my understanding is that we should be able to download the generated database and do the analysis locally
16:37:41 but it is an assumption
16:37:56 ok, I will investigate that
16:38:03 and ask infra if it will be necessary
16:38:04 I would be looking at the person enabling the job to help us answer this
16:38:14 i.e. slaweq
16:38:20 :-)
16:38:22 :)
16:38:29 mlavalle: you mean the whole tracepoint db, right?
16:38:38 rubasov: yeah
16:39:08 this is the process I foresee:
16:39:18 1) We run the rally job
16:39:19 we may want to log the trace ids somewhere
16:39:43 rubasov: trace ids will be stored in the rally result page for each iteration of each test
16:39:48 2) We identify problematic requests from the rally report
16:39:58 slaweq: cool
16:40:28 3) with the trace ids associated with the problematic requests, we analyze from the traces db
16:40:32 makes sense?
16:40:47 again, that is what I am assuming based on what I know so far
16:40:58 mlavalle: I think so, let's test that in practice :)
16:41:01 it does to me
16:41:46 The other thing is that someone should look at the requests that we are currently testing in our Rally job....
16:42:16 the problematic ones that slaweq identified in his original analysis are not all there
16:42:32 #link http://paste.openstack.org/show/731683/
16:42:46 #link http://paste.openstack.org/show/731684/
16:43:05 any volunteer to do that?
16:43:38 mlavalle: I can take that
16:44:10 rubasov: thanks. I think the action item is to see what is missing and start adding some of them
16:44:26 mlavalle: ack
16:44:28 Let's focus on the top 10 offenders in slaweq's analysis
16:45:01 #action rubasov to add top request offenders to rally job
16:45:03 maybe we can (for now) skip the first one and focus on the 2nd-10th :)
16:45:15 slaweq: absolutely. good point
16:45:20 slaweq: of course
16:45:26 LOL
16:45:36 :)
16:45:36 :-)
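[editor's note: a hedged sketch of the local analysis step assumed above (steps 2 and 3), supposing the job archives the redis collector's dump file as an artifact and that trace ids are copied from the rally report; paths and file names are illustrative]

    # load the downloaded dump into a throwaway local redis instance
    redis-server --dir ./job-artifacts --dbfilename dump.rdb --port 6379 &

    # then render the trace of any problematic request, as shown earlier
    osprofiler trace show --html <trace-id-from-rally-report> \
        --connection-string redis://localhost:6379 --out slow-request.html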
16:46:24 The other thing that drew my attention looking at the html file is that we are duplicating or triplicating the information for DB related traces
16:46:59 that is due to the fact that we enable the tracing both in the neutron-lib and the neutron DB api modules
16:47:30 so if you look at each DB query, it shows up several times
16:47:44 I can take an action item to experiment and simplify that
16:47:52 agree?
16:48:02 sounds good
16:48:05 mlavalle: yes
16:48:20 let me know if you'd like any help with that
16:48:33 #action mlavalle to simplify the DB traces
16:48:37 njohnston: will do
16:49:24 I have one final point related to the traces
16:50:03 if you look for example at the subnet creation ones, the largest amount of time is spent with the dhcp server rpc
16:50:14 about two thirds
16:50:59 and even in other requests where the relative weight is not that much (for example router create), you will see the dhcp rpc time is significant
16:51:07 even the largest individually
16:51:17 yes, I also noticed that now
16:51:26 I am not drawing any conclusions just yet about that
16:51:46 because I tested without concurrent load
16:52:01 I just wanted to draw your attention to the fact
16:52:11 I also found in the router-create call that there are a lot of "SELECT 1" calls there
16:52:19 do You know what it is maybe?
16:52:26 nope
16:52:34 my guess was some sqlalchemy connection check
16:52:42 but that's just a guess
16:52:46 yes, I believe rubasov is correct
16:52:47 there is e.g. such a call which took 50 ms, which is quite significant compared to other calls
16:53:03 interesting
16:53:20 ok
16:53:34 I think we've had a very productive meeting today :-)
16:53:43 anything else we should discuss?
16:54:05 about Berlin summit
16:54:08 one more note about traces I noticed
16:54:15 will anyone join?
16:54:23 tovin07: I'll be there
16:54:34 and rubasov as well
16:54:40 if you trace nova too, the SECRET_KEY can leak into periodic jobs
16:54:58 just ignore that if the trace is suspiciously long
16:55:11 rubasov: nice point! thanks
16:55:12 mlavalle, see you there :D
16:55:13 tovin07: yep, as mlavalle said
16:55:38 tovin07: maybe we can pick your brains in Berlin, if we have questions :-)
16:56:09 yep, hope that I can help
16:56:14 :-)
16:56:42 tovin07: this meeting is bi-weekly. you are more than welcome to contribute
16:57:02 ok team, thanks for your attendance
16:57:12 thank you guys
16:57:12 #endmeeting