16:02:45 <mlavalle> #startmeeting neutron_performance
16:02:46 <openstack> Meeting started Mon Oct 22 16:02:45 2018 UTC and is due to finish in 60 minutes.  The chair is mlavalle. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:02:47 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:02:49 <openstack> The meeting name has been set to 'neutron_performance'
16:02:50 <slaweq> hi
16:03:31 <njohnston> hello
16:03:46 <mlavalle> njohnston, slaweq, rubasov: thanks for attending
16:03:51 <rubasov> hello again
16:05:06 <mlavalle> #topic Discussion
16:05:33 <mlavalle> Earlier today I sent each one of you (and others) a message with html files
16:05:42 <mlavalle> did everyone see it?
16:05:56 <slaweq> I'm just looking at them now
16:06:06 <rubasov> mlavalle: yes, I'm looking too
16:06:29 <rubasov> mlavalle: could you please give access to the add-router-interface.html too?
16:06:44 <rubasov> mlavalle: for some reason I can't open it
16:06:58 <slaweq> yes, I don't have access to that one either
16:07:31 <mlavalle> sure I'll resend it soon
16:08:09 <mlavalle> in the meantime, let me make a few general statements about them
16:09:46 <mlavalle> if you want to deploy a test environment with devstack enabling osprofiler, you just need to add two lines to your local.conf file:
16:09:59 <mlavalle> enable_plugin osprofiler https://git.openstack.org/openstack/osprofiler
16:10:13 <mlavalle> OSPROFILER_COLLECTOR=redis
16:10:25 <mlavalle> it doesn't have to be redis
16:10:46 <njohnston> if you use redis, do you have to already have that up and running, or does devstack handle setting it up for you?
16:10:54 <mlavalle> there are other options. But I don't have any preference and redis seems to be the most used by the osprofiler team
16:11:09 <mlavalle> njohnston: devstack sets it up for you
16:11:14 <njohnston> nice
16:11:50 <mlavalle> It must be part of what the osprofiler plugin for devstack does for you
16:13:11 <mlavalle> when stack.sh is done, you will see the following section added to neutron.conf: http://paste.openstack.org/show/732600/
16:13:47 <mlavalle> Please note that with this setup, you enable tracing in general and sqlalchemy tracing
16:14:14 <mlavalle> and as you can see, the 'SECRET_KEY' is the default
16:14:36 <rubasov> side note: and similar config should also be added to all other openstack component config files when devstack is done
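For reference, the [profiler] section that devstack adds typically looks roughly like this (a sketch assuming the redis collector from above; the exact contents are in the paste mlavalle linked):

    [profiler]
    enabled = True
    trace_sqlalchemy = True
    hmac_keys = SECRET_KEY
    connection_string = redis://localhost:6379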
16:14:51 <slaweq> does it have a big performance impact on the running neutron server? if not, maybe we could enable it in gate jobs?
16:14:53 <mlavalle> rubasov: you got it
16:15:06 <rubasov> slaweq: the docs say it shouldn't
16:15:12 <mlavalle> slaweq: I cannot see any difference in performance
16:15:25 <mlavalle> whatsoever
16:15:39 <njohnston> +1 for adding it to the CI
16:16:14 <slaweq> ok, and how do we get such a report from the API call? is it generated for each call then?
16:16:29 <rubasov> slaweq: it's integrated with osc
16:17:06 <rubasov> it's like this: openstack ... --os-profile SECRET_KEY
16:17:18 <mlavalle> this is an example of a command execution along with the generation of the corresponding html file: http://paste.openstack.org/show/732601/
16:17:18 <rubasov> then a trace is generated for that call
16:17:46 <rubasov> and osc gives back an id of the trace
16:17:50 <mlavalle> Note that "Trace ID: b723c5c7-f87e-49e3-a51e-f1071a00c433" is added to the output of the command
16:18:13 <mlavalle> and you use that in the osprofiler command to generate the html file
16:18:22 <rubasov> eg: osprofiler trace show --html ... --connection-string redis://localhost:6379
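Putting the pieces together, a minimal end-to-end sketch (the network name is just an example; it assumes the redis collector and the default SECRET_KEY, and uses osprofiler's --out option to write the report to a file):

    # run any command with tracing enabled; the Trace ID is printed with the output
    openstack network create test-net --os-profile SECRET_KEY

    # turn the returned trace id into an html report
    osprofiler trace show b723c5c7-f87e-49e3-a51e-f1071a00c433 --html \
        --connection-string redis://localhost:6379 --out trace.html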
16:19:20 <mlavalle> makes sense?
16:19:57 <slaweq> yep
16:20:16 <slaweq> but this means that we can't easily integrate it with tempest jobs
16:20:18 <slaweq> right?
16:20:36 <mlavalle> well, let me explain
16:20:49 <mlavalle> apparently it is integrated with Rally....
16:21:19 <slaweq> ok, sorry for disturbing then. Go on mlavalle :)
16:21:49 <rubasov> yes as mlavalle said, the test runner must know about osprofiler, and rally seems to know that
16:21:51 <mlavalle> these early sessions of this meeting are more brainstorming than anything else
16:22:14 <tovin07> slaweq, for tempest job, you can see this patch https://review.openstack.org/#/c/523935/
16:22:18 <mlavalle> #link https://docs.openstack.org/rally/latest/quick_start/tutorial/step_10_profiling_openstack_internals.html
16:22:55 <tovin07> #link Add profiler support into Tempest https://review.openstack.org/#/c/523935/
16:23:06 <slaweq> tovin07: thx, I will check that
16:23:29 <mlavalle> tovin07: ahhh, that's so cool. Is this patch something that will be merged soon? or do we take the patch and deploy it?
16:23:54 <rubasov> that's nice
16:24:46 <rubasov> I'm working on the other half - enabling osprofiler in a tempest gate check job: https://review.openstack.org/611835
16:26:06 <mlavalle> that's a good step forward rubasov
16:26:39 <mlavalle> I spent some time over the weekend looking at results of our rally jobs
16:26:57 <mlavalle> and it doesn't seem that we are generating the traces
16:26:57 <rubasov> last friday I thought ps3 was working already, then today I realized I misread the logs
16:27:34 <rubasov> but I'm progressing slowly as the gate gives me a chance once every few hours
16:27:49 <mlavalle> but if we were, according to the rally docs we would generate an osprofiler id for each request
16:27:53 <rubasov> mlavalle: yes, it's only enabling osprofiler in devstack, not tracing any call yet
16:28:33 <tovin07> mlavalle, it depends on tempest team
16:28:33 <mlavalle> and then we could retrieve the generated file and produce the html analysis
16:28:48 <mlavalle> tovin07: thanks :-)
16:28:57 <tovin07> I think the patch looks good for now
16:29:16 <rubasov> mlavalle: exactly
16:29:36 <mlavalle> so I think one action item would be to create a rally job with osprofiler enabled
16:29:58 <tovin07> if you guys feel plain html output from osprofiler is boring
16:30:15 <tovin07> you can use Jaeger as the backend instead of redis
16:30:37 <slaweq> mlavalle: I can take care of this rally job
16:30:44 <mlavalle> I think it should be a periodic job
16:30:48 <tovin07> like this one for a simple glance command (sorry, it's not related to neutron :D )
16:30:49 <tovin07> http://137.116.141.93:16686/trace/92dbc0064374abfc
16:31:25 <mlavalle> nice tovin07 !
16:31:47 <mlavalle> what do others think of a periodic job?
16:32:04 <slaweq> mlavalle: periodic rally job with enabled osprofiler?
16:32:11 <mlavalle> yeah
16:32:26 <mlavalle> I just don't want to have too many jobs in the check queue
16:32:27 <tovin07> for rally job, we can look at this patch, hope that it can help https://review.openstack.org/#/c/467252/
16:32:58 <slaweq> would it be a problem to just enable it for the neutron-rally-task job which we have now in the check queue?
16:32:59 <njohnston> let's put it in experimental to start, so we can trigger it when we want it
16:33:20 <njohnston> slaweq++
16:33:26 <mlavalle> ok, experimental sounds good
16:33:31 <rubasov> mlavalle: at first it likely can't vote anyway, so it makes sense to make it periodic instead of check
16:34:04 <mlavalle> and no, I don't see a problem using neutron-rally-task as a base
16:34:11 <mlavalle> so....
16:34:17 <mlavalle> agree?
16:34:24 <mlavalle> with the experimental job
16:34:26 <slaweq> mlavalle: but I was asking if we need a new job for that
16:34:45 <mlavalle> slaweq: let's not make that decision today
16:34:57 <mlavalle> I want to see how much effect it has first
16:34:58 <rubasov> maybe stupid question: how do I trigger an experimental job?
16:35:00 <slaweq> ok, it can be experimental job at the beginning
16:35:14 <slaweq> rubasov: You write a comment "check experimental" in gerrit
16:35:22 <rubasov> slaweq: thanks
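As a rough sketch, the experimental job discussed here could look something like the following zuul config (the job name and variables are hypothetical, only illustrating the idea of reusing neutron-rally-task with the osprofiler devstack plugin enabled):

    - job:
        name: neutron-rally-task-osprofiler
        parent: neutron-rally-task
        vars:
          devstack_plugins:
            osprofiler: https://git.openstack.org/openstack/osprofiler
          devstack_localrc:
            OSPROFILER_COLLECTOR: redis

    - project:
        experimental:
          jobs:
            - neutron-rally-task-osprofiler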
16:35:54 <slaweq> ok, I have one more question about this rally experimental job then
16:36:02 <mlavalle> #action slaweq to create an experimental rally job with osprofiler enabled
16:36:21 <slaweq> from the commit message in https://review.openstack.org/#/c/467252/ and from the rally docs it looks like this integration "just" puts a profiler id into each api call
16:36:33 <mlavalle> yes
16:36:42 <slaweq> how do you think we should then generate reports from each call in the job?
16:36:44 <mlavalle> that's my understanding
16:36:56 <slaweq> should we add some script which will do it in post-run?
16:37:30 <mlavalle> my understanding is that we should be able to download the generated database and do the analysis locally
16:37:41 <mlavalle> but it is an assumption
16:37:56 <slaweq> ok, I will investigate that
16:38:03 <slaweq> and ask infra if it will be necessary
16:38:04 <mlavalle> I would look to the person enabling the job to help us answer this
16:38:14 <mlavalle> i.e. slaweq
16:38:20 <mlavalle> :-)
16:38:22 <slaweq> :)
16:38:29 <rubasov> mlavalle: you mean the whole tracepoint db, right?
16:38:38 <mlavalle> rubasov: yeah
16:39:08 <mlavalle> this is the process I foresee:
16:39:18 <mlavalle> 1) We run the rally job
16:39:19 <rubasov> we may want to log the trace ids somewhere
16:39:43 <slaweq> rubasov: trace ids will be stored in rally result page for each iteration of each test
16:39:48 <mlavalle> 2) We identify problematic requests from the rally report
16:39:58 <rubasov> slaweq: cool
16:40:28 <mlavalle> 3) with the trace ids associated with the problematic requests, we analyze from the traces db
16:40:32 <mlavalle> makes sense?
16:40:47 <mlavalle> again, that is what I am assuming based on what I know so far
16:40:58 <slaweq> mlavalle: I think so, let's test that in practice :)
16:41:01 <rubasov> it does to me
16:41:46 <mlavalle> The other thing is that someone should look at the requests that we are currently testing in our Rally job....
16:42:16 <mlavalle> the problematic ones that slaweq identified in his original analysis are not all there
16:42:32 <mlavalle> #link http://paste.openstack.org/show/731683/
16:42:46 <mlavalle> #link http://paste.openstack.org/show/731684/
16:43:05 <mlavalle> any volunteer to do that?
16:43:38 <rubasov> mlavalle: I can take that
16:44:10 <mlavalle> rubasov: thanks. I think the action item is to see what is missing and start adding some of them
16:44:26 <rubasov> mlavalle: ack
16:44:28 <mlavalle> Let's focus on the top 10 offenders in slaweq's analysis
16:45:01 <mlavalle> #action rubasov to add top request offenders to rally job
16:45:03 <slaweq> maybe we can (for now) skip first one and focus on 2-10th :)
16:45:15 <mlavalle> slaweq: absolutely. good point
16:45:20 <rubasov> slaweq: of course
16:45:26 <mlavalle> LOL
16:45:36 <slaweq> :)
16:45:36 <njohnston> :-)
16:46:24 <mlavalle> The other thing that drew my attention looking at the html file is that we are duplicating or triplicating the information for DB related traces
16:46:59 <mlavalle> that is due to the fact that we enable the tracing both in the neutron-lib and the neutron DB api modules
16:47:30 <mlavalle> so if you look at each DB query, it shows up several times
16:47:44 <mlavalle> I can take an action item to experiment and simplify that
16:47:52 <mlavalle> agree?
16:48:02 <njohnston> sounds good
16:48:05 <rubasov> mlavalle: yes
16:48:20 <njohnston> let me know if you'd like any help with that
16:48:33 <mlavalle> #action mlavalle to simplify the DB traces
16:48:37 <mlavalle> njohnston: will do
16:49:24 <mlavalle> I have one final point related to the traces
16:50:03 <mlavalle> if you look for example at the subnet creation ones, the largest amount of time is spent with the dhcp server rpc
16:50:14 <mlavalle> about two thirds
16:50:59 <mlavalle> and even in other requests where the relative weight is not that much (for example router create), you will see the dhcp rpc time is significant
16:51:07 <mlavalle> even the largest individually
16:51:17 <slaweq> yes, I also noticed that now
16:51:26 <mlavalle> I am not drawing any conclusions just yet about that
16:51:46 <mlavalle> because I tested without concurrent load
16:52:01 <mlavalle> I just wanted to draw your attention to the fact
16:52:11 <slaweq> I also found in the router-create call that there are a lot of "SELECT 1" calls
16:52:19 <slaweq> do you know what it is, maybe?
16:52:26 <mlavalle> nope
16:52:34 <rubasov> my guess was some sqlalchemy connection check
16:52:42 <rubasov> but that's just a guess
16:52:46 <njohnston> yes, I believe rubasov is correct
16:52:47 <slaweq> there is e.g. a call which took 50 ms, which is quite significant compared to other calls
16:53:03 <mlavalle> interesting
16:53:20 <mlavalle> ok
16:53:34 <mlavalle> I think we've had a very productive meeting today :-)
16:53:43 <mlavalle> anything else we should discuss?
16:54:05 <tovin07> about Berlin summit
16:54:08 <rubasov> one more note about traces I noticed
16:54:15 <tovin07> will anyone join?
16:54:23 <mlavalle> tovin07: I'll be there
16:54:34 <mlavalle> and rubasov as well
16:54:40 <rubasov> if you trace nova too, the SECRET_KEY can leak into periodic jobs
16:54:58 <rubasov> just ignore that if the trace is suspiciously long
16:55:11 <mlavalle> rubasov: nice point! thanks
16:55:12 <tovin07> mlavalle, see you there :D
16:55:13 <rubasov> tovin07: yep, as mlavalle said
16:55:38 <mlavalle> tovin07: maybe we can pick your brains in Berlin, if we have questions :-)
16:56:09 <tovin07> yep, hope that i can help
16:56:14 <mlavalle> :-)
16:56:42 <mlavalle> tovin07: this meeting is bi-weekly. you are more than welcome to contribute
16:57:02 <mlavalle> ok team, thanks for your attendance
16:57:12 <rubasov> thank you guys
16:57:12 <mlavalle> #endmeeting