15:58:42 #startmeeting kolla
15:58:43 Meeting started Wed Mar 1 15:58:42 2017 UTC and is due to finish in 60 minutes. The chair is inc0. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:58:44 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:58:47 The meeting name has been set to 'kolla'
15:58:58 #topic rollcall
15:59:02 hello all :)
15:59:07 hey!
15:59:11 0/
15:59:13 o/
15:59:19 woot /
15:59:59 woot /
16:00:04 o/
16:00:07 o/
16:00:14 o/
16:00:52 woot
16:01:33 w00t
16:01:43 o/ :)
16:02:40 woot
16:02:48 ok, we have a busy agenda so I'll move on
16:02:56 #topic Announcements
16:03:05 I have 2: 1. thank you all for a great PTG
16:03:07 * krtaylor was happy to meet everyone at PTG
16:03:32 notes and session list are available here:
16:03:35 #link https://etherpad.openstack.org/p/kolla-pike-ptg-schedule
16:03:47 and 2. We release ocata next week!
16:03:56 I'd like to encourage everyone to step up testing
16:04:08 and fix these last few things that are broken
16:04:41 also we'd need a good answer about whether or not Kolla runs on docker 1.13 :)
16:04:49 o/
16:04:51 sorry
16:04:55 any announcements from the community?
16:05:04 no worries zhubingbing, welcome
16:06:17 guess no announcements
16:06:31 #topic Applying for the Stable project maturity tag
16:06:36 sdake: I assume it's yours
16:06:46 can we make it shorter than 20 min plz?
16:07:02 there is a topic in the agenda I'd love to talk about today
16:07:08 yup
16:07:09 not sure if it can be
16:07:09 but will try
16:07:11 so - we applied for the stable maturity tag in the past
16:07:22 now that liberty is EOL we can do so again
16:07:32 and daviey will be our liaison (confirmed on irc)
16:07:54 from the release team?
16:07:55 the one thing that Jeffrey4l had a q about was the lifetime of mitaka (2.0.2)
16:08:00 daviey is a core reviewer
16:08:05 and he is also on the stable maint team
16:08:12 ok makes sense
16:08:24 I can submit the review if you like
16:08:30 I'll do it
16:08:35 cool
16:08:41 i guess we can leave the unanswered question unanswered
16:08:53 although liberty (1.x) is EOL
16:08:58 and 2.0.2 (newton) is about to be EOLed March 3rd
16:09:08 mitaka
16:09:10 so heads up to everyone involved ;)
16:09:17 newton still has 6 months in it
16:09:18 sorry newton^mitaka
16:09:27 we need to release one last tag before that
16:09:35 for the mitaka branch.
16:09:37 right, we do need one last tag for pip
16:09:47 maybe it's march 10th
16:09:50 i forget which it is :)
16:09:53 no matter whether the kolla branch is removed or not.
16:10:00 this is the undefined part
16:10:01 yeah I agree
16:10:18 Jeffrey4l: can we tag this week?
16:10:20 trailing cycle projects don't have a defined lifetime
16:10:28 I don't expect any patches merging to stable/mitaka
16:10:33 or stable/newton
16:10:38 possible.
16:10:42 there are a bunch in the backlog
16:10:48 there is nothing much for the mitaka branch.
16:10:50 although we can punt on mitaka
16:10:55 oh cool then all good :)
16:11:17 #action inc0 to submit review for maturity tag
16:11:31 ok i think that sums it up - inc0 i'll point you at my last take at this PMT
16:11:51 I'd say we should all observe this review and address issues if the release team has any
16:11:54 inc0 there are a bunch of requirements needed - and we are hitting almost all of them
16:12:06 i'd love to pull it up now, however, i don't have it handy
16:12:11 I'll take that up with you later
16:12:15 if you can move on i can link it in the closing of the meeting
16:12:25 ok, let's move on
16:12:30 tia ;-)
16:12:35 #topic serial in kolla-ansible
16:12:39 Jeffrey4l: you're up
16:12:43 thanks.
16:12:49 this may be related to the next topic.
16:12:54 by duonghq
16:13:13 i saw some issues when testing the upgrade from newton to ocata.
16:13:20 check this link https://etherpad.openstack.org/p/kolla-ansible-serial
16:13:45 serial tries to upgrade the service one node after another.
16:13:59 which will cause some unexpected issues.
16:14:26 for example the sighup part in nova.
16:14:32 but that's what rolling upgrade is
16:14:38 ahh
16:14:42 I see what you mean
16:14:55 i am not trying to use rolling upgrade.
16:14:59 we are hitting an ansible wall
16:15:05 serial causes some issues ;(
16:15:12 agreed with Jeffrey4l
16:15:18 and serial only works at the playbook level, which is bad, too.
16:15:19 right
16:15:23 yeah
16:15:34 this is an example where serial would really be needed at the task level
16:15:40 one proposal i made is to disable serial.
16:15:54 inc0, yes. but found nothing about this.
16:15:56 we need some tasks to run only on the 1st node, and some tasks to run only on the last or 1st node at the end, egonzalez also faced it
16:16:02 but then it's not a rolling upgrade or no-downtime upgrade
16:16:16 one possible solution is to use a dynamic delete_to variable, but still testing this.
16:16:43 kolla does not promise no-downtime upgrades.
16:16:59 yeah
16:17:09 but if a service has native support for zero-downtime upgrade, we should support it too
16:17:09 however if we have a clear problem with ansible
16:17:15 and no-downtime is what duonghq is doing and solving.
16:17:28 yeah we want to optimize it as much as possible
16:17:34 Jeffrey4l: yes
16:17:38 no downtime is not a promise but it is a goal
16:17:47 Jeffrey4l, you mean delegate_to?
16:17:53 anyway, we face some issues during upgrade.
16:18:08 duonghq, yep. i haven't tested it. But i guess it should work.
16:18:10 that's a good observation, maybe we should talk on #ansible to ask ansible people for opinions?
16:18:19 inc0, good idea.
16:18:25 inc0: yes actual zero down time would be possible, it's our goal
16:18:37 would not be*
16:18:44 any kolla-k8s people around? will we face the same issue with k8s?
16:18:51 not all projects support zero-downtime upgrade
16:19:02 we need to solve the serial issue, and better to implement zero downtime later. but they are two different things, right?
16:19:03 if Ansible will make this impossible, we should note that and work with them to make it possible
16:19:19 egonzalez: right, but more and more do
16:19:29 which means this is a problem we definitely need to fix
16:19:43 yeah, no doubt of it
16:19:45 duonghq, i guess kolla-k8s is starting to try to solve the upgrade issue. sdake right?
16:20:01 k8s will not have this issue
16:20:02 Jeffrey4l no - we are working on basic upgrades at some point in the future for 1.0.0
16:20:08 will have different issues ;)
16:20:18 but yeah, that's a goal
16:20:29 at last, one proposal i want is to disable serial, or at least disable it by default.
16:20:39 I wouldn't surrender just yet tho, let's talk about it on #ansible after the meeting
16:20:54 yeah I tend to agree on that
16:20:55 ok.
16:21:05 we should also change the "stop all schedulers" tasks
16:21:15 sdake: basic upgrade means with downtime, right? for kolla-k8s 1.0.0
16:21:18 as without serial they will only cause downtime we don't need
16:21:26 sp_, noop
16:21:32 ah, sorry, yes
16:21:33 openstack is more than a control plane. stopping the services won't affect vms.
16:21:34 sp_ possibly
16:21:37 well it's still gonna be ~a minute of downtime per service
16:21:57 for services that need a db migration, it'll take quite a long time
16:22:12 but we don't need to turn it off because of this issue
16:22:13 i think zero downtime upgrades are a great objective for kolla-ansible, whereas any upgrades are a great objective for kolla-kubernetes
16:22:18 duonghq, during the db migration, the service is still running.
16:22:35 it's only about restarting containers
16:22:38 Jeffrey4l, not sure, it depends on the service
16:22:40 although we can punt zero downtime upgrades if it won't make the deadline (for kolla-ansible)
16:22:54 sdake: yes,
16:23:04 duonghq: right, but again, that's not this issue
16:23:15 duonghq, yes. but i checked nova/neutron/cinder/glance, these all support this.
16:23:26 some services cannot work while the db migration is in progress
16:23:28 sdake: we are discussing that it's just impossible today with ansible being what it is
16:23:38 it's not about us, it's about ansible
16:23:46 got it
16:23:57 duonghq, that's OK before we implement zero-downtime upgrade.
16:24:05 serial cannot solve such issues either.
16:24:16 duonghq: right, but we can't help with it, that's on the services themselfes
16:24:20 themselves*
16:24:24 inc0, right
16:24:46 ah, we can do something
16:24:49 Jeffrey4l: on the other hand
16:24:58 we *need* serial for compute nodes
16:25:13 duonghq, delete_to may do the magic
16:25:16 so we're back at square one
16:25:30 inc0, hrm reason?
16:25:34 Jeffrey4l, delegate_to?
16:25:37 o/
16:25:41 hi srwilkers
16:25:44 duonghq, yep. but i will try it.
16:25:47 test it.
16:25:50 if you start pushing new containers to all of the compute nodes at the same time
16:25:56 it can be really bad
16:25:57 we already use it at some point
16:25:59 really quick
16:26:12 we can pull the image before the upgrade, and this should be recommended too.
16:26:18 ofc you can do *quasi* serial by modifying "forks"
16:26:32 Jeffrey4l: still good to do it in serial
16:27:19 I'd hold on before we talk to #ansible
16:27:25 forks handles things task by task. it is different. but it may be helpful.
16:27:37 ok. let's move on.
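[editor's note: the serial-vs-delegate_to discussion above can be sketched in Ansible terms. This is a hypothetical illustration, not taken from the kolla-ansible tree; group names, image names, and tasks are made up.]

```yaml
# Play-level rolling batches: `serial` applies to the whole play,
# never to a single task -- the limitation discussed above.
- hosts: compute
  serial: 1          # upgrade one compute node at a time
  tasks:
    - name: Pull the new image ahead of the restart (pre-pull, as suggested above)
      command: docker pull registry.example.com/nova-compute:new   # hypothetical image
    - name: Restart the nova-compute container
      command: docker restart nova_compute

# Without serial, tasks that must run on exactly one node can use
# run_once + delegate_to, the workaround Jeffrey4l and duonghq mention.
- hosts: control
  tasks:
    - name: Run the db migration from the first control node only
      command: nova-manage db sync          # executes once, delegated to one host
      run_once: true
      delegate_to: "{{ groups['control'][0] }}"
```

The "forks" alternative mentioned above (`forks = 1` under `[defaults]` in ansible.cfg) throttles every task to one host at a time, which is only quasi-serial: it orders hosts per task, not per play, so it does not give true per-node rolling batches.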
16:27:55 #topic ks-rolling-upgrade
16:28:00 duonghq: you're up
16:28:09 thank you inc0
16:28:09 (almost the same topic lol)
16:28:26 basically, it's about how we test the upgrade process
16:28:31 Jeffrey4l: same :)
16:28:41 especially when doing zero-downtime and rolling upgrade
16:28:46 Jeffrey4l: it means it's important ;)
16:28:59 yep.
16:29:16 upgrading with load?
16:29:33 testing the service before and after the upgrade is easier than testing whether it is still working while the upgrade is being done
16:29:35 up
16:29:38 yup
16:29:41 Jeffrey4l: yes
16:29:53 in the gate or in a local env?
16:29:59 duonghq: there was a session at the ptg about gates to do it
16:30:01 Jeffrey4l: that's why we call it zero downtime
16:30:08 best way I can think of to test rolling upgrade
16:30:13 Jeffrey4l, in the gate
16:30:33 is deploy old -> test old -> upgrade 50% of nodes -> test -> upgrade all -> test
16:30:38 for load, we can use rally, right?
16:30:55 for gates, we had a session at the ptg
16:31:08 +1
16:31:10 seems that I missed many things at the PTG :(
16:31:11 I'd say let's not start by focusing on the most complex scenario
16:31:17 looks so good
16:31:23 let's start by:
16:31:27 1. multinode deploy gates
16:31:39 duonghq: me too :( missed things at the PTG
16:31:53 2. multinode upgrade gates in a way:
16:32:00 duonghq indeed, it's important to attend ptgs - even when remote participation is an option
16:32:17 duonghq we have a travel program to help out those who don't have funding to make it
16:32:20 we deploy old from tarballs/registry -> we run test suite -> pull new from tarballs/registry -> upgrade -> test
16:32:27 we being the broader OpenStack here
16:32:36 ;)
16:32:45 (sdake: i'm here, but in another meeting)
16:32:59 and frankly? I'd love to have this as one of the highest priorities in Kolla for Pike
16:33:05 inc0, I think it's ok atm,
16:33:06 Daviey_ all good - you did agree to serve as our stable liaison - correct?
if so, then I think we are gtg
16:33:13 sdake: ack
16:33:13 inc0, ++ the first thing kolla-ansible needs to do is a multi node gate.
16:33:13 it's worth a highest-priority bp
16:33:15 if we can make full upgrade gates, that's going to be awesome
16:33:18 maybe a long running bp
16:33:20 sdake: Unless anyone else is super keen
16:33:31 Daviey_ i think everyone else is stretched thin
16:33:38 agree, duonghq also volunteers :) I volunteer myself but I'll need help
16:33:48 sdake, I applied for the PTG :(
16:33:59 Daviey_ we need someone to coach and guide us on stable processes, kolla has become good at handling backports
16:34:00 this should be split into multiple bps. multi gate, upgrade, load, and combine all of those.
16:34:06 duonghq: If you really want to do it, i don't mind. :)
16:34:17 i'll work on zero-downtime upgrades
16:34:19 yeah, and for Pike I'd focus on the first 2
16:34:27 Daviey_ nah he meant he applied for the ptg travel support and it was not accepted
16:34:29 and get them rock solid
16:34:44 *applied for TSP of PTG
16:34:46 sdake: can we plz move this outside of the meeting? ;)
16:34:55 let's stick to a single topic
16:35:03 inc0 wfm
16:35:03 Sorry, my fault.
16:35:24 yes. we really need a multi node gate.
16:35:29 ok, so duonghq is there anything else on your topic?
16:35:35 Jeffrey4l zuul v3 is COMING
16:35:41 Jeffrey4l: my next topic will help getting this done ;)
16:35:44 no, I think some bps are good atm
16:35:45 Jeffrey4l we have to wait for that
16:35:59 hrm, actually i do not think zuul v3 is required. even though sam thinks so.
16:36:09 i think zuulv3 is required
16:36:12 sdake: no, we need to get it in place now and extend it when zuul lands, that'd be my approach
16:36:14 and sam thinks it so
16:36:23 why is it required?
16:36:26 2 nodes is something
16:36:29 2 people think it - my thoughts are not based upon sam's opinion
16:36:31 I'll work on gating this cycle
16:36:38 yep sam thinks so, but i do not.
16:36:43 we still need to crack the networking
16:36:44 2 nodes is something - so it shouldn't block multinode, but for more than 2 nodes, zuulv3 is needed
16:36:46 all the same
16:36:53 infra won't enable 3+ nodes without zuulv3
16:37:06 ok, but let's not "wait" for v3
16:37:06 why do we need 3+ nodes?
16:37:11 ya no blocking
16:37:13 let's do 2 nodes now and extend
16:37:19 inc0, +
16:37:23 agreed - so let's rock :)
16:37:28 ok
16:37:33 +1
16:37:33 next topic then
16:37:39 (we are off topic ...)
16:37:40 +1
16:37:42 #topic post-ptg bps
16:37:51 ok that's an experiment for the next 20 min ;)
16:38:04 I'd like everyone to look at the ptg notes
16:38:14 #link https://etherpad.openstack.org/p/kolla-pike-ptg-schedule
16:38:33 inc0: gone through
16:38:49 #link https://etherpad.openstack.org/p/kolla-pike-ptg-blueprints
16:39:05 pick a session and draft the blueprints you see in ^ this etherpad
16:39:23 then, in the following meetings we'll add blueprints with notes
16:39:33 I don't want our ptg effort to go to waste
16:40:02 we should record blueprints out of the notes and assign ourselves if we feel we want to do something
16:40:21 agree
16:40:34 inc0: +1
16:40:47 So I'll start writing down the upgrade bps
16:41:01 inc0, nice
16:41:04 I'd encourage everyone to do the same for the other sessions
16:41:10 timebox - till 16:55
16:42:43 sup sdake
16:43:20 zhubingbing working my ass off :)
16:43:37 ok
16:43:41 zhubingbing although I was at the ptg, I want other people to write the blueprints
16:44:01 i'll add what I think is necessary in future meetings
16:44:03 understood
16:47:04 should we have one BP for all `blocking 1.0 reqs`?
16:51:16 jascott1 i think we need to be careful with what we define as blocking 1.0 reqs, as some of the zero downtime upgrades are not really blocking - however put in whatever you think is helpful and we can follow the standard openstack blueprint process
16:51:57 sdake was talking about this one https://etherpad.openstack.org/p/kolla-pike-ptg-k8s-release-roadmap
16:52:42 jascott1 right i know - i think it makes sense to sort those out into blueprints
16:52:58 oh ok
16:53:02 jascott1 if you want to get that started, that would rock :)
16:53:02 agree with sdake about zero-downtime upgrade for kolla-k8s 1.0.0, it seems that we have a lot of work for 1.0.0
16:53:27 duonghq we can record them all and then use the standard blueprint process to select the ones that are essential
16:53:43 sdake, ack
16:53:51 ok, a few last remarks on this
16:53:52 sdake: +1
16:54:15 we'll repeat this at the next meeting too, as we surely can get more blueprints out of these notes
16:54:23 sdake, do we need some Y-stream version before 1.0.0?
16:54:26 also I encourage everyone to take some time to do it outside the meeting
16:54:40 also feel free to post a blueprint and link it to the etherpad
16:55:11 that will make it easier for us to track how useful the ptg sessions were and how much of them turned into code
16:55:18 questions?
16:55:41 #topic open discussion
16:55:45 4 minutes :)
16:56:32 anyone?
16:56:44 or can we end the meeting and give our lives back? :)
16:57:05 right... ok, thank you all for coming and see you in #openstack-kolla!
16:57:10 #endmeeting kolla