16:00:14 #startmeeting neutron_performance
16:00:15 Meeting started Mon Oct 8 16:00:14 2018 UTC and is due to finish in 60 minutes. The chair is mlavalle. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:16 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:19 The meeting name has been set to 'neutron_performance'
16:00:56 hi everyone!
16:01:03 hi rubasov_
16:01:44 let's give other guys a minute to join the meeting
16:02:22 sure, it's a first
16:03:23 ok, let's get going
16:03:33 #topic Brainstorm
16:04:12 slaweq sent me an email indicating that he will be unable to attend the meeting, due to personal reasons
16:05:12 I also know that njohnston will not attend. Today is Columbus Day in the US and some people take the day off (mostly because their children get the day off from school)
16:05:12 sorry to hear that, he did most of the work to kick this off
16:06:08 before getting to a solid agenda for this meeting, I wanted to use the first gathering to brainstorm how to tackle this problem
16:06:32 based on this we can gradually evolve an agenda for future meetings
16:06:46 is that ok?
16:06:59 makes sense
16:07:07 so....
16:07:30 in preparation for today's meeting I spent some time getting myself up to speed on osprofiler
16:07:53 https://github.com/openstack/osprofiler
16:08:16 to tell you the truth, I had seen some of the calls in the code
16:08:50 I'm still lagging in that regard, because I just returned to work from vacation
16:08:51 but had never sat down to understand exactly what it does
16:08:58 but I want to catch up on that
16:09:21 the good news is that it seems it is exactly what we need
16:10:02 it allows you to instrument the call to trace where the bottlenecks are located
16:10:13 the code^^^^
16:10:32 that sounds good
16:10:37 and we even already have hooks in the code:
16:10:57 what did you use to generate load? rally?
16:11:00 http://codesearch.openstack.org/?q=osprofiler&i=nope&files=&repos=neutron
16:11:34 I haven't gotten so far as to generate load
16:11:51 but it seems Rally uses osprofiler anyways
16:12:12 sorry for getting too far ahead
16:12:21 that's ok
16:12:55 look at https://docs.openstack.org/developer/rally/quick_start/tutorial/step_11_profiling_openstack_internals.html
16:15:11 there is also a recent Vancouver Summit presentation on osprofiler: https://www.youtube.com/watch?v=Gvi8NfDjxxM
16:16:41 do we want to create a gate job using these tools?
16:17:26 and the other thing that I want to mention is that slaweq refined the script he used to get data and produced this: http://paste.openstack.org/show/731684/ and http://paste.openstack.org/show/731683/
16:19:34 so based on all this here are the action items I propose between now and the next meeting:
16:20:44 1) We all should get more familiar with osprofiler. I propose we enable it in our local devstack and start playing with it locally, possibly combined with Rally
16:21:54 agree on 1) and sorry for not having done that already
16:22:54 2) We should make sure that we have calls to osprofiler in the code to capture the problematic areas that are reflected in the data provided by slaweq. It seems to me that this means the L3 code (I see a lot of router and floatingip related requests in the top offenders) as well as ports and networks
16:23:46 we should probably throw in subnets, just to make sure we cover the fundamental abstractions
16:24:01 do you agree with my reading of the dat?
16:24:05 data^^^
16:25:06 yes, especially the router-interface operations
16:26:38 it seems the l3 agent is instrumented: http://git.openstack.org/cgit/openstack/neutron/tree/neutron/agent/l3/agent.py#n179
16:27:05 but I am not sure that's the only place we want to look at
16:28:20 server-side code may also be interesting
16:28:53 exactly
16:29:11 I am trying to find the places in the code where I think we are profiling
16:29:15 * mlavalle digging
16:30:35 here they are: http://codesearch.openstack.org/?q=from%20neutron.common%20import%20profiler&i=nope&files=&repos=neutron
16:31:20 there are a couple of places there that seem server related:
16:32:14 http://git.openstack.org/cgit/openstack/neutron/tree/neutron/server/__init__.py#n77
16:32:19 and
16:32:54 http://git.openstack.org/cgit/openstack/neutron/tree/neutron/service.py#n71
16:33:19 but I am not sure that's enough to start profiling
16:34:32 the next action item is
16:35:16 3) as rubasov_ already suggested, we need to start looking at creating a job in the check queue where we routinely gather profiling data
16:36:32 this job should maybe be a mix of tests similar to the jobs where we see the most timeouts, which is where I assume slaweq has been gathering data
16:36:44 I'd be happy to work on that as soon as I catch up learning the tools
16:36:58 ok
16:37:03 let's do that
16:37:26 from my point of view, I think we have enough homework here for two weeks
16:37:34 what do you think?
16:37:41 I definitely do
16:37:52 ok, let's summarize:
16:38:12 #action mlavalle and rubasov to deepen knowledge on osprofiler
16:39:06 #action mlavalle to make sure we have calls to osprofiler in the code that correspond to problematic areas: routers, fips, ports, networks, subnets
16:39:31 #action rubasov to start defining a job for check queue to gather performance data
16:39:44 am I missing something?
16:39:51 sounds complete
16:40:33 btw, I think this presentation is pretty good: https://www.youtube.com/watch?v=Gvi8NfDjxxM
16:40:40 to get started with osprofiler
16:41:14 it's on my watchlist and next meeting I shall be more prepared
16:41:20 ok
16:41:37 Thanks for attending the first ever Neutron performance meeting!
16:41:46 Have a nice week
16:41:51 #endmeeting
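
Editor's note: the meeting describes instrumenting the code with trace points to find where the bottlenecks are. The sketch below illustrates that general idea using only the Python standard library. It is not osprofiler's API and not the script slaweq used; the `trace` decorator, the `TIMINGS` store, the `slowest` helper, and the two sample operations are all hypothetical names chosen for illustration, and the sleeps merely simulate work.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-memory store: trace point name -> list of durations (seconds).
TIMINGS = defaultdict(list)


def trace(name):
    """Record the wall-clock duration of every call under the given name."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs even if func raises, so failed calls are timed too.
                TIMINGS[name].append(time.perf_counter() - start)
        return wrapper
    return decorator


def slowest(limit=3):
    """Return (name, total_seconds, calls) tuples, largest total first."""
    totals = [(name, sum(d), len(d)) for name, d in TIMINGS.items()]
    return sorted(totals, key=lambda item: item[1], reverse=True)[:limit]


# Hypothetical stand-ins for operations named in the action items.
@trace("router.add_interface")
def add_router_interface():
    time.sleep(0.05)  # simulated slow operation


@trace("port.create")
def create_port():
    time.sleep(0.005)  # simulated fast operation


if __name__ == "__main__":
    for _ in range(3):
        add_router_interface()
        create_port()
    for name, total, calls in slowest():
        print(f"{name}: {total:.3f}s over {calls} calls")
```

A real profiler such as osprofiler additionally links trace points across services and stores them in a backend; this sketch only shows per-process timing of named code paths and ranking them by total time.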