15:30:03 #startmeeting Performance Team
15:30:04 Meeting started Tue Sep 20 15:30:03 2016 UTC and is due to finish in 60 minutes. The chair is DinaBelova. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:30:05 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:30:08 The meeting name has been set to 'performance_team'
15:30:15 ok, so hello everyone!
15:30:22 let's see who's around :)
15:30:24 o/
15:30:36 ad_rien_ o/
15:30:41 hello
15:30:48 rohanion, bvanhav o/
15:31:01 but I will be the only one from Inria today (rcherrueau is finalizing his PhD thesis and msimonin had to go)
15:31:15 ad_rien_ ack
15:31:30 heeeey
15:31:32 o/
15:32:07 from my side I need to apologize as well: Mirantis folks are having a big all-hands meeting right now, so not much traction from our side either
15:32:37 ok, so let's get started
15:32:48 DinaBelova: Error: Can't start another meeting, one is in progress. Use #endmeeting first.
15:32:58 oh sorry :)
15:32:59 #topic Action Items
15:33:03 wrong command :)
15:33:18 so regarding action items - we had only one last time
15:33:33 ad_rien_ it was about adding monitoring info to your etherpad
15:33:55 ad_rien_ I see some details were added - cadvisor/influx/grafana
15:34:20 yes but this is internal information
15:34:40 From our side, we do not have much information to share today
15:34:49 ad_rien_ ack
15:34:56 #topic Current progress on the planned tests
15:34:58 we are working on analyzing the traces we collected for
15:35:11 (DinaBelova sorry you were too fast ;))
15:35:25 the IDLE deployments
15:35:49 and are still fighting issues with the 1000 fake drivers deployment
15:35:53 ad_rien_ well, the test plans section is better for what you're discussing, so I switched to it
15:35:56 :)
15:35:57 running rally benchmark suite
15:36:02 :-)
15:36:33 We arranged a meeting on Thursday with msimonin to prepare content for the presentation (and before our Friday meeting)
15:37:04 ad_rien_ cool! I think that I'll fill in my part as well
15:37:35 rcherreau developed a basic script to let us browse the influxdb information we collected (i.e. this enables us to do postmortem analysis in a friendly way)
15:37:35 ad_rien_ as for Alex's part, he's helping one of our customers right now, so not sure if he'll be able to finish everything
15:37:51 DinaBelova: ack
15:37:55 hello, kinda late :)
15:38:01 sai o/
15:38:13 ok, so what can I share from our side
15:38:34 lots of work in progress regarding preparing the test plans and test results
15:38:42 for the tests that were performed during q3
15:38:55 + we're still fighting with OpenStack on top of k8s
15:39:18 right now we have a 350-node baremetal k8s installation
15:39:22 with Calico overlay
15:39:34 + OpenStack installed on top of this k8s cluster
15:39:39 Merged openstack/performance-docs: modify the home-page info with the developer documentation https://review.openstack.org/372923
15:39:59 we planned to run baseline tests against this cloud, but are constantly observing issues with k8s / docker
15:40:27 docker is not that scalable when we're talking about huge load running at once
15:40:29 :)
15:40:38 :-)
15:40:55 as for the 1000 nodes experiment, we've decided to give a first try to a non-containerized deployment
15:41:11 so we have 1000 nodes under installation with fuel + some dirty hacks
15:41:16 to make it work smoothly
15:41:17 it would be great if you could share the issues you faced
15:41:32 ad_rien_ sure, the first set of issues was related to the installer itself
15:41:55 we used Kargo for this purpose (due to the right topology these folks are proposing)
15:42:00 lemme share the links
15:42:15 #link https://github.com/kubespray/kargo
15:42:21 so that's an installer
15:43:05 we filed two original issues
15:43:12 #link https://github.com/kubespray/kargo/issues/478
15:43:19 #link https://github.com/kubespray/kargo/issues/479
15:43:27 that later led to several more :)
15:43:32 https://github.com/kubespray/kargo/issues/487
15:43:32 https://github.com/kubespray/kargo/issues/485
15:43:33 https://github.com/kubespray/kargo/issues/480
15:43:57 so the installation part is now ok, and we have observed several issues with the installed cluster
15:44:07 the most annoying is the following:
15:44:20 the cluster seems to degrade with time
15:44:47 what do you mean by cluster ''degrades'' ?
15:45:04 OpenStack VMs were stuck in the scheduling state - with higher probability the longer the k8s cluster has been up
15:45:45 so OpenStack was reinstalled several times with no issues, but for some reason OpenStack services running as k8s apps began failing to reach each other
15:45:51 ok …. maybe some issues we observed previously with kolla (docker) but we did not succeed in reproducing them on our side :( …. did you find a way to reproduce such issues ?
15:46:30 ad_rien_ only by running OpenStack... one of our k8s engineers suspects Docker in this behaviour
15:46:33 under research right now
15:46:41 but nothing specific today, sorry
15:46:48 the same on our side....
15:46:57 ad_rien_ ok, so this seems to be common
15:47:14 I'll share details once (and I hope I will!) I have them
15:47:20 thanks
15:47:32 so that's all from our side regarding the plans
15:47:52 does anyone have anything else to share in this section?
15:48:20 it looks like we may jump to the osprofiler
15:48:36 #topic OSProfiler weekly update
15:48:41 rohanion the floor is yours
15:48:44 :)
15:48:44 hey guys
15:48:56 so what we have done for the last week
15:49:08 1. neutron patch is still waiting for CR
15:49:34 2. cinder patch is documented and tested
15:49:55 3. glance patch is finished, verified, waiting to merge
15:50:10 rohanion do you need any help pinging the core reviewers?
15:50:13 4. sahara - still nothing, no updates
15:50:18 YES PLEASE
15:50:21 rohanion ack
15:50:37 because it's a challenge finding who is responsible enough to merge the changes
15:50:53 rohanion I'll make sure core reviewers are aware of these patches and ask for their help
15:51:04 rohanion ack :)
15:51:18 5. elasticsearch driver - reviewed, ready to merge, harlowja please look at it
15:51:36 rohanion: mind if i ask u what the elasticsearch driver is abt
15:51:41 or if u can point me to some docs
15:52:03 sure, I'll post a link right here
15:52:14 https://review.openstack.org/#/c/340936/
15:52:30 this driver is basically another way of storing internal osprofiler data
15:52:56 gotcha
15:52:58 thanks
15:53:17 rohanion anything else?
15:53:19 6. redis driver - I reviewed it, I didn't like it, the setup will go nuts if the database server goes down
15:53:55 I asked Ben if he could add a subclass that will use sentinel instead of a single redis instance
15:54:07 no response since thursday
15:54:13 bvanhav ^^
15:54:33 I'm working on making the suggested change
15:54:39 bvanhav can you please take a look?
15:54:40 ack!
15:54:51 bvanhav good to know, thank you :)
15:54:58 thank you Ben!
15:55:18 rohanion what about nova -- sorry if I missed it
15:55:31 oh you mean Dina's patches?
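The Sentinel concern raised for the Redis driver above boils down to master failover: a client pinned to one Redis instance fails permanently when that instance dies, while a Sentinel-aware client re-discovers the new master and retries (redis-py exposes this as `redis.sentinel.Sentinel(...).master_for(...)`). A minimal stdlib-only sketch of that retry pattern, using a stand-in `discover_master` callable in place of a real Sentinel query - illustrative only, not osprofiler's actual driver code:

```python
class FailoverClient:
    """Retries a write once after re-discovering the master.

    `discover_master` stands in for asking the Sentinel quorum which
    Redis instance is currently the master (hypothetical helper name).
    """

    def __init__(self, discover_master):
        self._discover = discover_master
        self._conn = self._discover()

    def notify(self, payload):
        try:
            return self._conn.set(payload)
        except ConnectionError:
            # Master went away: re-discover the new master and retry once,
            # instead of "going nuts" the way a single-instance setup would.
            self._conn = self._discover()
            return self._conn.set(payload)
```

In a real Sentinel-backed subclass, `discover_master` would query the Sentinel service for the current master's address rather than holding a fixed connection.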
15:55:47 osprofiler w/ nova
15:55:57 yeah, rohanion was going to continue my patches
15:56:13 I have not started looking at them yet
15:56:29 but thanks for reminding me, I'll start doing that tomorrow
15:56:36 as I am simply drowning in the q3 deliverables we (Mirantis Scale RnD team) need to achieve
15:56:45 so thanks rohanion for grabbing the flag here
15:56:54 hope to finish work on some of them before the next meeting
15:57:11 rohanion ack
15:57:55 ok, so it looks like we may proceed
15:58:04 #topic Open Discussion
15:58:14 so I have some small summit news :)
15:58:32 #info I was told we're going to have a separate time slot at the summit for performance team needs
15:58:34 DinaBelova: yeah wanted to ask abt that, are the perf guys meeting at summit?
15:58:47 by summit authorities
15:58:51 sai yes we are :)
15:59:00 we are going to have some presence from redhat perf, would be great to sync
15:59:11 and now (after some conversations) we'll have a separate time slot
15:59:14 sai sure
15:59:34 #info exact time not available so far, most probably Thursday morning
15:59:46 will keep you folks updated
15:59:47 awesome
16:00:09 DinaBelova: so wrt the DVR work, seeing issues when trying to ping a VM with a floating ip when the router is DVR
16:00:10 sai ;)
16:00:17 some cases take 200 seconds
16:00:26 sai ack
16:00:26 filed https://bugs.launchpad.net/neutron/+bug/1625333
16:00:28 Launchpad bug 1625333 in neutron "Booting VM with a Floating IP and pinging it via that takes a long time with errors in L3-Agent logs when using DVR" [Undecided,New]
16:00:30 anything to share?
16:00:32 ok, I see
16:00:40 #link https://bugs.launchpad.net/neutron/+bug/1625333
16:01:08 sai lemme share this bug with our neutron team
16:01:10 rook: and i looked at it, seems to be an issue with floating ip assignment at the l3-agent
16:01:17 sure
16:02:02 sai cool, thank you for the update
16:02:24 also helping some folks at other places get started with shaker
16:02:35 seems to be a good amount of interest
16:03:04 sai our neutron team did not run into the issue you filed during their experiments
16:03:07 it's interesting
16:03:19 as they had plenty of tests done on various topologies
16:03:25 including DVR-based
16:03:25 DinaBelova: I have linked the rally plugin we used and rally results too
16:03:36 sai ack, I'll take a look
16:03:45 DinaBelova: our neutron team is also looking at it
16:03:56 :) good to have neutron teams :)
16:03:58 :D
16:04:16 ok, anything else to share?
16:04:29 nothing from my end
16:04:38 ad_rien_ rook ?
16:04:42 nothing thanks
16:04:47 nada
16:04:59 ok, so thanks everyone for participating, stay tuned!
16:05:02 #endmeeting
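As background on the driver patches discussed under the OSProfiler topic: a storage driver's job is to persist the `-start`/`-stop` notifications OSProfiler emits for each traced call (each carrying a base trace id and a parent span id) so that a report tool can later rebuild the call tree. A simplified in-memory model of that flow - hypothetical, not osprofiler's real driver interface:

```python
import uuid

class InMemoryDriver:
    """Stores notifications in a list, standing in for what the
    elasticsearch or redis drivers do against a real database."""

    def __init__(self):
        self.notifications = []

    def notify(self, payload):
        self.notifications.append(payload)

    def children_of(self, parent_id):
        """Span ids started directly under the given parent span."""
        return [n["trace_id"] for n in self.notifications
                if n.get("parent_id") == parent_id
                and n["name"].endswith("-start")]

def record_span(driver, name, base_id, parent_id):
    """Emit the start/stop notification pair one traced call produces."""
    trace_id = str(uuid.uuid4())
    common = {"base_id": base_id, "parent_id": parent_id,
              "trace_id": trace_id}
    driver.notify(dict(common, name=name + "-start"))
    driver.notify(dict(common, name=name + "-stop"))
    return trace_id
```

The choice of backend (elasticsearch, redis, messaging) only changes where `notify` writes; the start/stop pairing and the parent-id links that let reports reconstruct the tree stay the same.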