15:30:02 <DinaBelova> #startmeeting Performance Team
15:30:03 <openstack> Meeting started Tue Sep 13 15:30:02 2016 UTC and is due to finish in 60 minutes. The chair is DinaBelova. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:30:04 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:30:07 <openstack> The meeting name has been set to 'performance_team'
15:30:14 <DinaBelova> hey folks o/
15:30:20 <AugieMena> o/
15:30:20 <ad_rien_> o/
15:30:24 <aignatev> o/
15:30:37 <DinaBelova> rohanion, rcherrueau o/
15:30:38 <rcherrueau> Hi,
15:30:44 <msimonin> hello
15:30:49 <JorgeCardoso> o/
15:31:04 <DinaBelova> JorgeCardoso nice to see you sir
15:31:11 <DinaBelova> ilyashakhat are you around today? :)
15:31:13 <JorgeCardoso> likewise :)
15:31:39 <DinaBelova> ok, so good evening / good morning :)
15:31:50 <DinaBelova> let's start with action items from last meeting
15:31:57 <DinaBelova> #topic Action Items
15:32:07 <DinaBelova> msimonin rcherrueau - one of them was on you folks
15:32:12 <DinaBelova> add information about what monitoring tools were used to the https://pad.inria.fr/p/FK18ZSyjFyLPITfE
15:32:40 <msimonin> yep sorry we are doing it right now
15:32:54 <DinaBelova> msimonin ack, lemme keep this action item
15:33:00 <rohanion> hey everyone!
15:33:08 <DinaBelova> #action msimonin rcherrueau add information about what monitoring tools were used to the https://pad.inria.fr/p/FK18ZSyjFyLPITfE
15:33:17 <DinaBelova> rohanion o/
15:33:35 <DinaBelova> rohanion second action item is about you
15:33:46 <DinaBelova> publish osprofiler config patcher
15:33:55 <DinaBelova> rohanion afaik you pushed it somewhere
15:34:01 <DinaBelova> may you please share?
15:34:12 <rohanion> yes
15:34:29 <rohanion> https://github.com/rohanion/osprofiler-cfg
15:34:33 <rohanion> here we go
15:34:36 <DinaBelova> #link https://github.com/rohanion/osprofiler-cfg
15:34:54 <rohanion> is the readme good enough or should I change it?
15:35:27 <DinaBelova> rohanion I'd propose to add info that this needs to be run on all OpenStack nodes
15:35:40 <DinaBelova> rohanion and mention that services need to be restarted separately
15:35:46 <DinaBelova> as this depends much on the deployment
15:36:04 <DinaBelova> rohanion what do you think?
15:36:23 <rohanion> I guess it's pretty obvious... but will do, thanks
15:36:35 <DinaBelova> rohanion it's obvious for you :)
15:36:46 <DinaBelova> rohanion thanks
15:37:00 <DinaBelova> that's all about our action items for today
15:37:11 <DinaBelova> #topic Current progress on the planned tests
15:37:23 <DinaBelova> okay msimonin ad_rien_ rcherrueau - the floor is yours
15:37:29 <ad_rien_> thanks
15:37:35 <DinaBelova> I see some new details added to the etherpad
15:37:41 <ad_rien_> actually we have lots of questions
15:37:47 <ad_rien_> :-)
15:37:51 <DinaBelova> #link https://pad.inria.fr/p/FK18ZSyjFyLPITfE
15:37:53 <DinaBelova> ad_rien_ sure
15:38:12 <ad_rien_> msimonin will start
15:38:14 <msimonin> so,
15:38:16 <msimonin> :)
15:39:02 <msimonin> first set of experiments: collect metrics of an "idle" openstack with different numbers of computes
15:39:28 <msimonin> this one is done
15:39:34 <msimonin> we are analysing the results
15:39:35 <DinaBelova> msimonin may you please clarify what "idle" means here?
15:39:44 <msimonin> DinaBelova: sure
15:40:11 <msimonin> For us idle means an openstack without any incoming requests
15:40:32 <msimonin> the goal is to measure the noise generated by openstack services
15:40:43 <msimonin> and maybe find some patterns
15:40:48 <DinaBelova> msimonin ack, gotcha
15:41:31 <msimonin> second set of experiments: collect metrics of a "non-idle" openstack with different numbers of computes
15:41:56 <msimonin> basically during rally benchmarks
15:42:13 <DinaBelova> msimonin due to what I see in the etherpad - some timeouts happen?
15:42:21 <msimonin> which line ?
15:42:30 <DinaBelova> 376
15:42:52 <msimonin> yep right this has to be explained
15:43:17 <DinaBelova> msimonin is this happening for boot VM I suppose?
15:43:39 <msimonin> the fact is that with a bigger deployment (1000 computes) we got some timeouts in the rally reports
15:44:03 <DinaBelova> msimonin was rally timed out waiting for server to become active?
15:45:06 <rcherrueau> DinaBelova: Here is an example of the error we got: 'Timed out waiting for a reply to message ID b0628f97f1f942c1962859a3c290a731'
15:45:20 <DinaBelova> rcherrueau oh, interesting
15:45:52 <ad_rien_> Actually, we are working on these experiments now so we can dive into details. I'm not sure whether it is the best time to discuss such points right now ;) Maybe we can just raise all our questions and see which ones need deeper investigation.
15:46:12 <DinaBelova> ad_rien_ probably yes
15:46:15 <ad_rien_> ok thanks
15:46:23 <DinaBelova> rcherrueau btw - where are you observing these logs?
15:46:28 <DinaBelova> the final question :)
15:46:36 <ad_rien_> :)
15:46:37 <rcherrueau> From the rally report
15:46:38 <DinaBelova> in rally? or in one of the services?
15:46:41 <DinaBelova> rcherrueau ack
15:47:04 <rcherrueau> Yes I have to dig into this and take a look at the logs of other services
15:47:08 <rcherrueau> maybe rabbitmq
15:47:25 <DinaBelova> ad_rien_ rcherrueau if you won't be able to find the root cause quickly, may you please send me rally logs / rally report?
15:47:31 <ad_rien_> sure
15:47:53 <ad_rien_> Can we switch to the next questions ;) ?
15:47:54 <DinaBelova> I'd jump to our rally folks :) they have observed all types of issues :)
15:47:58 <DinaBelova> ad_rien_ sure
15:48:35 <msimonin> DinaBelova: cool :)
15:48:45 <ad_rien_> OK we observed another issue. If we do not redeploy our testbed between scenarios, we got weird behaviors
15:49:09 <ad_rien_> For that point we need to better understand what does happen….
15:49:19 <ad_rien_> maybe you can just tell us if one your side you saw such issues ?
15:49:51 <ad_rien_> We plan to launch 10 times the same scenario on a small deployment to see whether we got same results
15:50:28 <ad_rien_> So DinaBelova others ? did you see such issue ? Maybe we should ask the rally team ?
15:50:29 <rohanion> folks, Dina's laptop froze to death, she's rebooting
15:50:37 <msimonin> lol
15:50:40 <rohanion> please wait for a couple of minutes
15:50:45 <ad_rien_> sorry
15:50:55 <msimonin> she's already tired of our questions
15:51:11 <ad_rien_> ;)
15:52:07 <ad_rien_> meanwhile AugieMena may I ask you if you received our email regarding the IBM scheduling presentation ?
15:52:37 <DinaBelova> folks, I'm back, sorry, my laptop went crazy
15:52:40 <ad_rien_> we are definitely interested by this work
15:52:44 <ad_rien_> ok
15:52:48 <DinaBelova> ad_rien_ may you please repeat everything I missed? :)
15:53:00 <DinaBelova> the last msg I see is about second question
15:53:15 <AugieMena> ad_rien_ Yes, got the email. I believe Chris K replied to that
15:53:52 <ad_rien_> We observed another issue. If we do not redeploy our testbed between scenarios, we got weird behaviors
15:53:52 <ad_rien_> For this point we need to better understand what does happen….
15:53:52 <ad_rien_> Maybe you can just tell us if one your side you saw such issues ?
15:53:52 <ad_rien_> We plan to launch 10 times the same scenario on a small deployment to see whether we got same results
15:53:52 <ad_rien_> So DinaBelova others ? did you see such issue ? Maybe we should ask the rally team ?
15:54:26 <ad_rien_> s/one your side/on your side sorry for the typo
15:54:41 <DinaBelova> ad_rien_ usually we do not observe behavior changing - rally clean up is usually enough
15:54:50 <DinaBelova> I mean clean up done after every scenario
15:55:09 <ad_rien_> we also did that (actually Rally is doing that :()
15:55:18 <ad_rien_> ok we will confirm that for the next meeting
15:55:22 <DinaBelova> ad_rien_ - what kind of changes? andreykurilin - may you please try to help?
15:55:48 <DinaBelova> ad_rien_ ack
15:55:49 <ad_rien_> increase in the memory of the DB
15:55:54 <andreykurilin> DinaBelova ad_rien_: hi hi
15:56:06 <ad_rien_> Hi andreykurilin
15:56:22 <andreykurilin> ad_rien_ I'm rally core and I'm ready to help
15:56:23 <andreykurilin> )
15:56:28 <ad_rien_> thanks
15:56:34 <ad_rien_> that's good news
15:57:07 <ad_rien_> Actually we would like to be able to reproduce such weird behaviors
15:57:12 <DinaBelova> ad_rien_ what DB? if Nova - lines with deleted VMs will still be there :) with deleted == true
15:57:22 <msimonin> soft-delete ?
15:57:24 <ad_rien_> so we can come back to you with a concrete example
15:57:31 <andreykurilin> ok
15:57:31 <DinaBelova> ad_rien_ sure
15:57:52 <DinaBelova> ad_rien_ anything else?
15:58:05 <msimonin> so the state of openstack isn't the same when running a benchmark a second time ?
15:58:06 <ad_rien_> Since the DB is not in the same state, the question is whether there will be an impact on the performance we measured
15:58:45 <ad_rien_> Memcached is also not cleaned
15:59:33 <DinaBelova> msimonin ad_rien_ well, that's a complex system - if you'll run tests always on a purely new env, you'll always observe OpenStack warming up - with all caches not filled yet, etc.
15:59:35 <ad_rien_> so I think we need time to identify all aspects and get back to andreykurilin
15:59:50 <andreykurilin> ad_rien_: feel free to ping me here or at #openstack-rally (there you can find more friendly rally cores) :)
16:00:09 <ad_rien_> DinaBelova: yes that's why we would like to see whether we can ensure reproducibility criteria between experiments
16:00:11 <DinaBelova> so the ideal test case is to run every case twice (each pair on a 100% clean env) and count only the second attempt
16:00:22 <ad_rien_> andreykurilin: thanks
16:00:43 <DinaBelova> ad_rien_ yes, that's tricky
16:00:49 <ad_rien_> Ok
16:01:01 <ad_rien_> so the last question for today is related to the scheduling process on top of the fake driver
16:01:11 <DinaBelova> ad_rien_ sure
16:01:19 <ad_rien_> actually the fake driver presents 1000 vcpus and a lot of memory
16:01:36 <ad_rien_> we are wondering how 1000 VM instances are scheduled on top of 1000 fake drivers
16:02:25 <ad_rien_> Can all VMs be scheduled on the same fake driver ?
16:02:34 <ad_rien_> What prevents getting such a placement ?
16:03:54 <DinaBelova> ad_rien_ afaik the whole process is the same as for usual runs - the filter scheduler uses filters to schedule servers to the compute nodes; some of the filters define whether one node should be filled as a priority or all nodes should be filled in a more or less even fashion (nova has info about the resources already "eaten" on each hypervisor)
16:04:18 <DinaBelova> ad_rien_ I guess you need to take a look at the scheduler filters used
16:04:29 <DinaBelova> and see what kind of configurations are used
16:05:00 <sai> DinaBelova: i have a few networking questions, let me know when i can go
16:05:09 <DinaBelova> sai sure
16:05:19 <ad_rien_> Ok we will perform a small test to answer this question
16:05:28 <ad_rien_> thanks DinaBelova this is ok from our side
16:05:30 <DinaBelova> sai we'll jump soon to the open discussions
16:05:35 <DinaBelova> ad_rien_ ack, thanks
16:05:39 <DinaBelova> please keep us updated
16:05:51 <DinaBelova> rohanion are you still around? ready for osprofiler?
16:06:01 <rohanion> always am!
16:06:04 <DinaBelova> #topic OSProfiler weekly update
16:06:19 <DinaBelova> rohanion please proceed
16:07:20 <rohanion> 5 sec..
16:07:59 <rohanion> We're ready to merge the neutron patch https://review.openstack.org/#/c/342505/, waiting for CR from the neutron core team
16:08:44 <rohanion> cinder https://review.openstack.org/#/c/315676/ - add one test and it's ready to merge too.
16:09:13 <DinaBelova> #link https://review.openstack.org/#/c/342505/
16:09:19 <DinaBelova> #link https://review.openstack.org/#/c/315676/
16:09:29 <DinaBelova> rohanion ack, thanks
16:09:37 <rohanion> that's not everything :)
16:09:52 <DinaBelova> it's just thanks :) you can proceed :D
16:09:56 <rohanion> glance https://review.openstack.org/#/c/316799/ - work in progress, debugging failed tests
16:10:07 <DinaBelova> #link https://review.openstack.org/#/c/316799/
16:10:30 <rohanion> sahara https://review.openstack.org/#/c/344370/ - nothing, no active development. will ask the sahara team for support
16:10:47 <rohanion> internal osprofiler stuff:
16:10:56 <DinaBelova> #link https://review.openstack.org/#/c/344370/
16:11:16 <rohanion> elasticsearch https://review.openstack.org/#/c/340936/ - on review, almost ready to merge. will finish reviewing that tomorrow
16:12:01 <rohanion> redis driver https://review.openstack.org/#/c/364487/ - I had no time to review it, will work on that tomorrow as well.
16:12:10 <DinaBelova> rohanion ack
16:12:46 <rohanion> influxdb driver - no active development. if the performance team decides that we have to implement it asap, I'll code it in an hour.
16:12:55 <DinaBelova> rohanion :D
16:13:03 <DinaBelova> you'll be quick :D
16:13:14 <DinaBelova> anything else sir?
16:13:23 <rohanion> 1 hour writing - 1 week testing :)
16:13:34 <rohanion> no, that's everything regarding OSProfiler
16:13:43 <DinaBelova> rohanion thanks for the extended update
16:13:46 <DinaBelova> #topic Open Discussion
16:13:52 <DinaBelova> sai please go ahead
16:13:55 <sai> sure
16:14:19 <sai> so DinaBelova was wondering what kind of networking tests you are running currently and I could update with some of the network perf work we are doing at red hat
16:14:55 <DinaBelova> sai so mostly we're running data plane tests using shaker
16:15:05 <sai> ok shaker is what we are using too
16:15:13 <DinaBelova> sai lemme share a test plan
16:15:29 <DinaBelova> #link http://docs.openstack.org/developer/performance-docs/test_plans/tenant_networking/plan.html
16:15:33 <DinaBelova> sai here it is ^^
16:15:38 <sai> cool
16:15:44 <DinaBelova> and the related report http://docs.openstack.org/developer/performance-docs/test_results/tenant_networking/neutron_vxlan_dvr_200/index.html#tenant-networking-report-vxlan-dvr-200-nodes
16:15:45 <sai> any specific neutron features u r targeting?
16:15:53 <sai> we r working on DVR currently
16:16:03 <DinaBelova> sai for neutron features we have separate tests being run
16:16:08 <sai> and had some previous work looking at control plane performance tuning neutron workers
16:16:18 <DinaBelova> sai http://docs.openstack.org/developer/performance-docs/test_plans/hardware_features/index.html
16:16:30 <DinaBelova> as for DVR - we usually run with it, yes
16:16:45 <DinaBelova> sai some time ago we ran both DVR and non-DVR topologies
16:17:05 <DinaBelova> right now mostly DVR (as requested by Mirantis)
16:17:07 <sai> cool, i have some of those results, i can share upstream and contribute to the repo
16:17:13 <sai> the comparison
16:17:17 <DinaBelova> sai this will be perfect
16:17:40 <DinaBelova> so please submit a lab description here as well http://docs.openstack.org/developer/performance-docs/labs/index.html
16:17:43 <DinaBelova> sai ^^
16:17:51 <DinaBelova> so we can compare the test bed as well
16:18:31 <sai> DinaBelova: what im really curious about is the tunings that u put into the network tests
16:18:37 <sai> threads of netperf/iperf
16:18:40 <sai> and TCP packet sizes
16:18:43 <sai> or u run defaults
16:18:48 <DinaBelova> sai sorry I added a wrong link when targeting the neutron features - here is the right one http://docs.openstack.org/developer/performance-docs/test_plans/neutron_features/index.html
16:19:41 <DinaBelova> sai you can take a look here http://docs.openstack.org/developer/performance-docs/test_results/tenant_networking/neutron_vxlan_dvr_200/index.html#tenant-networking-report-vxlan-dvr-200-nodes - if you click through to every scenario - e.g. http://docs.openstack.org/developer/performance-docs/test_results/tenant_networking/neutron_vxlan_dvr_200/perf_l2_dense/index.html - you'll see the full scenario
16:19:41 <sai> DinaBelova: because we are running uperf with different IP packet sizes (outside of shaker) and seeing better throughput at higher packet sizes
16:20:00 <sai> cool thanks DinaBelova
16:20:11 <ilyashakhat_mobi> sai: we ran with defaults, but our networking guys tried different settings, mostly the number of threads
16:20:19 <DinaBelova> sai np, you're really welcome
16:20:22 <sai> hopefully i will contribute to performance-docs
16:20:37 <sai> and do i need to have a test plan or will just test results with a scenario help?
16:20:49 <sai> ilyashakhat_mobi: ack
16:20:59 <ilyashakhat_mobi> sai: it would be interesting to run tests with the same parameters as you do (meaning packet size)
16:21:17 <ilyashakhat_mobi> you can follow the existing plan or update it
16:21:38 <ilyashakhat_mobi> it's not restricted :)
16:21:50 <DinaBelova> sai if your tests are a subset of http://docs.openstack.org/developer/performance-docs/test_plans/tenant_networking/plan.html (just with different params) you can just push test results with other configs and explanations - if scenarios are different, you can add yours to the test plan - http://docs.openstack.org/developer/performance-docs/test_plans/tenant_networking/plan.html
16:22:14 <sai> DinaBelova: ilyashakhat_mobi https://github.com/openstack/browbeat/tree/master/rally/rally-plugins/nova-create-pbench-uperf
16:22:36 <DinaBelova> #link https://github.com/openstack/browbeat/tree/master/rally/rally-plugins/nova-create-pbench-uperf
16:22:43 <sai> that is the rally plugin we use combined with pbench (in-house but opensourced)
16:22:53 <sai> kicks off uperf with varying packet sizes and threads
16:23:39 <sai> prints a rally summary with throughput results as well
16:23:46 <sai> so interesting data plane perf through rally
16:23:52 <DinaBelova> sai ok, so the tool is different - so you can add a new section to the test plan I believe
16:24:11 <DinaBelova> as it's still the same testing field (tenant networking) but a really different type of tool used
16:24:12 <sai> DinaBelova: my test results are with shaker
16:24:17 <DinaBelova> sai ah
16:24:19 <DinaBelova> I got it
16:24:25 <sai> but i have some other results too with the tool i mentioned
16:24:27 <sai> :)
16:24:50 <sai> i think thats all the network perf questions i have
16:25:10 <DinaBelova> sai :D ack :) so shaker stuff can be just published as a test report - against your env with your configs; for browbeat stuff I'd propose to add one more section to the test plan
16:25:30 <DinaBelova> and publish results regarding it
16:25:37 <DinaBelova> sai you're welcome
16:25:40 <sai> sure, we index shaker results into elastic
16:25:48 <sai> i can pull up the graphs thru kibana
16:25:50 <sai> :)
16:25:53 <DinaBelova> cool
16:25:57 <sai> that way we average out results
16:26:05 <sai> because im seeing varying results due to NUMA
16:26:12 <sai> when a vm is numa local vs non local
16:26:29 <sai> so we figured the best way is to rerun the same test multiple times and average out using elasticsearch/kibana
16:26:41 <sai> so we can get something really representative
16:26:41 <DinaBelova> sai I believe our networking folks were looking at this as well, but not sure
16:26:55 <DinaBelova> sai that's a good approach
16:27:18 <sai> cool, is there a specific subset of openstack-network-perf somewhere on irc
16:27:23 <sai> or is this the best channel
16:27:30 <sai> i know ilyashakhat_mobi does networking perf
16:27:34 <sai> not sure who else
16:27:35 <DinaBelova> sai I believe this is the best channel
16:27:42 <sai> awesome
16:27:43 <DinaBelova> and ilyashakhat_mobi is probably the best person to ping
16:27:45 <sai> ill stick to this then
16:27:49 <DinaBelova> sure
16:28:12 <DinaBelova> ok, anything else here?
16:28:17 <DinaBelova> msimonin rcherrueau ad_rien_ ?
16:28:30 <msimonin> nope
16:28:31 <ad_rien_> nothing on my side
16:28:32 <msimonin> thanks
16:28:35 <rcherrueau> nope
16:28:37 <DinaBelova> thanks folks!
16:28:39 <DinaBelova> bye!
16:28:41 <DinaBelova> #endmeeting
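Editor's note: the fake-driver scheduling question (16:01-16:04 above) can be sketched with a toy model. This is an illustration only, not Nova code - all class and function names here are invented for the sketch, and Nova's real filter scheduler and weighers are far more involved. It shows the point DinaBelova makes: filters only rule out hosts that lack resources, while a weigher decides whether instances spread evenly across hosts or stack onto one, so a fake driver advertising 1000 vCPUs could absorb every VM if a stacking weigher were configured.

```python
# Toy filter-scheduler sketch (hypothetical names, not Nova's API).
# Each fake-driver host advertises huge resources, as in the discussion.

class Host:
    def __init__(self, name, vcpus=1000, ram_mb=1_000_000):
        self.name = name
        self.free_vcpus = vcpus
        self.free_ram_mb = ram_mb

def core_filter(host, flavor):
    # Analogue of a CPU filter: reject hosts without enough free vCPUs.
    return host.free_vcpus >= flavor["vcpus"]

def ram_filter(host, flavor):
    # Analogue of a RAM filter: reject hosts without enough free RAM.
    return host.free_ram_mb >= flavor["ram_mb"]

def schedule(hosts, flavor, spread=True):
    """Filter out invalid hosts, then weigh the survivors."""
    candidates = [h for h in hosts
                  if core_filter(h, flavor) and ram_filter(h, flavor)]
    if not candidates:
        raise RuntimeError("no valid host found")
    # Weigher analogue: spreading prefers the host with the MOST free RAM,
    # stacking prefers the one with the LEAST (like flipping the sign of a
    # RAM weigher multiplier).
    key = (lambda h: h.free_ram_mb) if spread else (lambda h: -h.free_ram_mb)
    best = max(candidates, key=key)
    # The scheduler tracks resources already "eaten" on each hypervisor.
    best.free_vcpus -= flavor["vcpus"]
    best.free_ram_mb -= flavor["ram_mb"]
    return best.name

hosts = [Host(f"fake{i}") for i in range(4)]
flavor = {"vcpus": 1, "ram_mb": 64}
placements = [schedule(hosts, flavor) for _ in range(8)]
# With spread=True the 8 instances land round-robin across the 4 hosts.
```

With `spread=False` every instance lands on the same host until it runs out of resources, which is exactly how all VMs could end up on a single fake driver; which behavior you get depends on the filters and weigher configuration, hence the suggestion to check the scheduler filters in use.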