20:00:02 #startmeeting Octavia
20:00:03 Meeting started Wed Dec 13 20:00:02 2017 UTC and is due to finish in 60 minutes. The chair is johnsom. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:04 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:06 The meeting name has been set to 'octavia'
20:00:13 Hi folks!
20:00:26 o/
20:00:29 Another fine week in OpenStack land
20:00:29 o/
20:00:32 hi
20:00:48 hey
20:00:50 (well, ok, zuul had its issues...)
20:00:59 #topic Announcements
20:01:00 o/
20:01:09 I have two items
20:01:20 One, Queens MS2 did finally release
20:01:31 I also released a new python-octaviaclient
20:01:36 So, good stuff there.
20:02:05 Also I wanted to make you aware of a lively discussion on the mailing list about changing the release cycle of OpenStack to once per year.
20:02:12 #link http://lists.openstack.org/pipermail/openstack-dev/2017-December/125473.html
20:02:47 guess we are maturing…
20:02:49 If you have any feelings about the PTG meetings or release timing, feel free to jump into the conversation
20:02:56 johnsom, what do you think about this?
20:03:10 I have mixed feelings.
20:04:23 I think overall it will slow innovation on the project. I worry about trying to do stable-branch bug backports from a master that has up to a year of feature work. I do think we are on a bit of a treadmill with the PTGs and PTL elections, etc. Finally, the goals do hurt the projects with the six-month cycles since some require a lot of work.
20:05:22 I think it may mean that the PTL needs to take on more management work to keep the project on track. Maybe more defined sprints or team milestones.
20:05:31 Just some top-of-head thoughts.
20:05:42 Yeah... Some of the stuff is too fast, but... Longer cycles mean bigger releases, which does not help with upgrades
20:05:43 Any other comments?
20:05:43 yeah. i wonder how exactly we shift to 1-year planning, feature-wise
20:06:03 Yep
20:06:06 one PTG per year clearly makes attending the summits more important
20:06:27 Yeah, on my last read of the list the proposal was one PTG, still two summits
20:06:52 Ok, any other announcements?
20:06:55 xgerman_, so virtual PTGs? :)
20:07:03 midcycles?
20:07:04 nmagnezi I thought about that
20:07:21 could be tricky, timezone-wise
20:07:31 but I'd be in favor of that.
20:07:58 #topic Brief progress reports / bugs needing review
20:08:08 Ok, moving along to try to give time for other topics
20:08:15 Do we need to pick up anything from last week?
20:08:25 I have been focused on getting reviews done and updates to the specs in flight.
20:08:33 rm_mobile Yes, I have a long list
20:08:45 Lol k
20:08:48 5 topics including the carry-over
20:09:10 I hope we can merge QoS this week.
20:10:02 Provider driver spec is looking good, but we really would like feedback on some topics. Specifically on my mind: should the Octavia API create the VIP port and pass it to the driver, or should the driver be responsible for creating the VIP port?
20:10:21 Please comment on that topic if you have a driver in your future.
20:10:40 The UDP spec is also coming along nicely. Feedback welcome.
20:10:56 #link https://review.openstack.org/509957
20:11:03 I commented on that (VIP port creation); would be happy to elaborate more if needed
20:11:13 #link https://review.openstack.org/503606
20:11:31 Any other progress updates?
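To make the VIP-port question above concrete, here is a purely hypothetical sketch of the two interface shapes under discussion; the class, method, and parameter names are invented for illustration and are not taken from the provider driver spec under review.

```python
# Purely hypothetical sketch of the two options for the provider driver
# question above. Names are invented for illustration only.


class ProviderDriver(object):

    # Option A: the Octavia API creates the neutron VIP port itself and
    # passes the finished port details to the driver with the request.
    def loadbalancer_create(self, loadbalancer, vip_port_details):
        raise NotImplementedError()

    # Option B: the driver is responsible for creating the VIP port and
    # returns its details so the Octavia API can report the VIP address.
    def create_vip_port(self, loadbalancer_id, vip_request):
        raise NotImplementedError()
```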
20:11:59 #topic (rm_work / dayou) Element Flag for Disabling DHCP
20:12:07 #link https://review.openstack.org/#/c/520590/
20:12:17 We didn't get to this last week, so I put it at the top this week
20:12:20 Getting to a real keyboard for this lol
20:12:46 My understanding was that the issue dayou was having was/can be resolved without this.
20:12:52 o/
20:12:59 hmmm can it?
20:13:13 I mean, the DHCP issue is one that I have as well (just use an internal patch to fix it)
20:13:14 My biggest concern with the patch is issues with cloud-init overriding those changes.
20:13:33 i think the problem with cloud-init is it doesn't support slaac
20:13:40 expects dhcp or static
20:14:02 jniesz No, it uses slaac. That is all I have in my lab
20:14:09 my issue is that our cloud HAS NO DHCP (i believe we're not alone in this either), and images stall on the dhcp step for ipv4 even
20:14:31 maybe there is a better way to fix this that I'm missing?
20:14:32 johnsom it wasn't in the source code, i didn't see anything for it to write out an inet6 auto line
20:14:38 but then cloud-init should set it as static?
20:14:42 but, it seems that it stalls on dhcp even before cloud-init runs to change it
20:14:54 because cloud-init DOES have a static config to send it
20:15:04 static != slaac
20:15:08 rm_work That is a different issue
20:15:12 right
20:15:18 but this patch was designed to fix both
20:15:24 yea
20:16:00 i guess if it's just my issue, I can just stick to my current internal patch that ... well, does exactly this (which works)
20:16:01 It's been a while since I booted a v6-enabled amp, so I'm not sure how cloud-init handles it. I know we have code in for the v6 auto case
20:16:19 johnsom: is that dhcpv6 though?
20:16:30 No, I don't have dhcpv6
20:16:40 johnsom, IIRC the jinja templates handle a static ipv6 as well. but I didn't test it
20:17:01 this is what I apply internally: http://paste.openstack.org/show/628898/
20:17:17 so rm_work says it comes up eventually after dhcp gives up
20:17:26 yes, like 5 minutes >_>
20:17:40 which is way outside the timeout for my cloud
20:17:41 So, I think we should break this into two stories: 1. for the dhcp delay, 2. for booting v6-only amps.
20:17:52 +1
20:18:25 k, just figured it was a workable solution for both, but yeah if his issue needs to be solved in a different way, then this would just be for me
20:18:52 i just pushed this because at the time he was duplicating my work for dhcp disabling
20:18:58 yea this would work for both
20:19:01 The concern is that hacking on the network scripts cloud-init manages/overwrites is worrisome, especially if we don't explicitly tell cloud-init that we are managing those.
20:19:18 Like, that paste, doesn't a reboot overwrite that?
20:19:19 +1
20:19:40 johnsom: it doesn't seem to <_< though a reboot causes a failover and recycle for me
20:19:45 so it's hard to tell
20:19:52 but i mean, it doesn't seem to be overwritten by cloud-init
20:19:54 cloud-init assumes wrong things : )
20:19:57 looking at existing amps
20:20:11 I'm also not as familiar with cloud-init under CentOS as I am with Ubuntu
20:20:36 nmagnezi?
20:20:44 would have to check
20:20:49 cloud-init does what neutron and nova tell it. That was the solution we came up with for your case jniesz
20:21:02 yea, but it doesn't work
20:21:06 for slaac
20:21:21 cloud-init needs to be enhanced to add that
20:21:28 and then would have to wait for distros to add the new cloud-init
20:21:50 jniesz Something doesn't jive with that. All I have is slaac here and I had working IPv6, so ...
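On johnsom's point about explicitly telling cloud-init that the image owns those scripts: cloud-init does support being told to leave network configuration alone via a cloud.cfg.d drop-in. Below is a minimal sketch of that opt-out, written as a Python helper an image element might call; the file name and helper are illustrative assumptions, not part of the patch under review, and fully disabling cloud-init's network handling would mean the element or agent has to own all interface configuration itself.

```python
# Sketch of the "explicitly tell cloud-init we manage the network scripts"
# idea. The drop-in mechanism (network: {config: disabled}) is real
# cloud-init behavior; the file name and helper are illustrative only.
import os

CLOUD_CFG_D = '/etc/cloud/cloud.cfg.d'
DROP_IN = '99-octavia-disable-network-config.cfg'  # name is illustrative

DISABLE_NETWORK_CONFIG = 'network: {config: disabled}\n'


def disable_cloud_init_networking(root='/'):
    """Write the drop-in so cloud-init stops rewriting interface files
    that the image (or the amphora agent) manages itself."""
    path = os.path.join(root, CLOUD_CFG_D.lstrip('/'), DROP_IN)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, 'w') as f:
        f.write(DISABLE_NETWORK_CONFIG)
```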
20:22:32 it looks for dhcp
20:22:38 otherwise assumes static
20:22:44 The other part I didn't like about the script is that it is specific to the distros, whereas cloud-init handles the translations for us.
20:22:45 in the network metadata
20:23:12 oh actually yes, looked again, it IS overwritten by cloud-init :( but cloud-init does this AFTER the boot step that tries to DHCP an address
20:23:34 so yeah probably on reboot it would have issues :(
20:23:35 So, my ask: let's open stories that characterize the use cases, then we bind patches to those and work from there.
20:23:44 is there some way for me to send these options during cloud-init boot?
20:23:53 rm_work See, I'm not totally crazy... grin
20:23:59 but ...
20:24:04 see this is a problem still
20:24:08 i still need my patch
20:24:10 Yes, there are much better ways.
20:24:12 just ALSO need to update cloud-init
20:24:23 because again, cloud-init hasn't replaced it by the time it causes a problem
20:24:27 it replaces it AFTER
20:24:32 so i need both fixes
20:24:32 So, let's open stories and work on solutions for those use cases
20:24:35 k
20:25:37 #topic (rm_work / BAR_RH) Specify management IP addresses per amphora
20:25:43 #link https://review.openstack.org/#/c/505158/
20:26:22 So we discussed this last week. I think there was concern about the approach. We talked about the update-config API extension that we have wanted for a while.
20:26:28 Where did we leave this?
20:26:50 Was this the one where nmagnezi and I needed to talk after?
20:27:04 yes. and we didn't catch each other here in the past week
20:27:16 T_T
20:27:30 lol
20:27:46 So should we table this another week?
20:27:58 The summary I think was: instead of futzing with the flows, we just leave the binding as-is on boot, and then on first connect we update the config to point to the single management address
20:27:59 I must admit that the agent restart per API call sounds a bit like a hack to me, so I wanted to discuss this a bit more
20:28:16 nmagnezi: not every API call... just the update-agent-config call
20:28:27 which we wanted for a while anyway
20:28:32 for things like updating the HM list
20:28:42 why did we want it to begin with?
20:28:49 We can use the oslo reload stuff too, no need for a full restart
20:28:56 +1
20:29:14 The HM IP:Port list is my #1
20:29:25 same
20:29:30 well, I'm open to discussion about this. if there's a nice way to achieve this I'm all ears
20:29:44 nmagnezi Why did we want the IP fixed? It's a very old bug from before we added the namespace
20:30:00 if you ever hack the system you can then sink the whole fleet by sending bogus configs
20:30:34 ?
20:30:39 how
20:30:39 johnsom, because it might be risky to bind to *
20:30:48 nmagnezi: that's how it CURRENTLY works
20:30:50 johnsom, the agent runs in the root namespace
20:30:51 nmagnezi Correct
20:31:11 nmagnezi: and what i'm saying is, on the *initial* call, we move it to the correct IP
20:31:18 xgerman_ ??? what? no
20:31:36 It uses the same two-way SSL that all of our config work happens over
20:31:43 nmagnezi: and if the initial call fails (someone somehow beat us to it???) we fail to finish amp creation and we trash it anyway
20:31:43 if we allow configs to be changed, that is a theoretical possibility
20:31:45 So, if you get that far it's game over anyway
20:32:53 xgerman_ We are talking about the amphora-agent config file inside the amp, not the controllers
20:32:54 rm_work, let's discuss this further. I'm open to discussion about this, as I said.
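On the "oslo reload stuff, no need for a full restart" point: oslo.config can re-read configuration files at runtime for options registered as mutable. Below is a minimal sketch of what an update-agent-config handler could do; the option and group names, file path, and handler are assumptions for illustration and are not the amphora-agent's actual API.

```python
# Sketch of "update the agent config, then reload without a full restart"
# using oslo.config mutable options. Names/paths are illustrative only.
from oslo_config import cfg

CONF = cfg.CONF
# Options marked mutable=True can be re-read while the service is running.
CONF.register_opts([
    cfg.ListOpt('controller_ip_port_list', default=[], mutable=True,
                help='Health manager endpoints the amphora reports to.'),
], group='health_manager')

AGENT_CONF_PATH = '/etc/octavia/amphora-agent.conf'  # assumed path


def update_agent_config(new_config_text):
    """Hypothetical handler for an agent "update config" call."""
    # 1. Write the new config pushed by the controller.
    with open(AGENT_CONF_PATH, 'w') as f:
        f.write(new_config_text)
    # 2. Re-read config files in place; mutable options (like the HM
    #    ip:port list) pick up new values without restarting the agent.
    CONF.mutate_config_files()
```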
20:32:55 :)
20:33:02 lolk
20:33:06 Ok
20:33:06 so, next year? :P
20:33:23 #topic (sanfern / rm_work) Changes to DB schema for VRRP ports
20:33:24 rm_mobile, ¯\_(ツ)_/¯
20:33:31 #link https://review.openstack.org/#/c/521138/
20:33:44 So next up was the rename patch.
20:33:56 I guess all I wanted to know was: is everyone else OK with us changing field names in a DB schema?
20:34:09 I think some work has occurred over the last week to resolve some of the backwards-compatibility issues.
20:34:11 as an upgrade it is scary to me, since it absolutely kills zero-downtime
20:34:18 hmm
20:34:32 it doesn't bring the amps down, just the control plane
20:34:42 yes
20:34:44 I know sanfern added a fix for amp agents
20:34:50 just a control-plane outage
20:34:53 so they wouldn't all have to be upgraded
20:34:59 yeah that's good :P
20:35:09 i hadn't even noticed that part yet, that would have been bad
20:35:14 Right, my stance has been that we have not asserted any of the upgrade support tags yet (nor do we have a gate).
20:35:21 i just saw the migration and immediately stopped
20:35:43 so as i keep saying, if everyone else is OK with this, then I am fine
20:36:04 yeah, I think we can do a rename/update — I also think we are still in the window to change the /octavia/amphora endpoint
20:36:15 yes
20:36:21 Yeah, I agree on that
20:36:53 I have to abstain once more. Since we have yet to ship Octavia, this doesn't affect us in any way.
20:37:05 so.. If it's in I'm fine with it.
20:37:05 so I just wanted a vote on "Can we change field names in our existing DB schema"
20:37:18 I'm concerned about my deployment, but also others
20:37:20 * johnsom gives nmagnezi a glare of shame.... grin
20:37:31 johnsom, lol, why? :-)
20:37:40 You haven't shipped
20:37:42 :)
20:37:48 one of the things we get yelled at about during project update sessions every summit is that our upgrades are not clean (though usually it's around the APIs)
20:37:53 johnsom, TripleO.. soon enough!
20:38:54 rm_work I hear you. We should be working towards clean upgrades. My question to you is whether all of the glue code required to do a smooth transition to reasonably sane names is worth it.
20:39:56 I mean really it's a migration that adds the new column, copies the data, and maintains both for some deprecation cycle.
20:40:14 i absolutely would love to have these column names fixed to be less dumb
20:40:29 Long term I would hate to have one set of terms in the models and a different set in the DB
20:40:30 i am just concerned about whether we're supposed to be doing this kind of change
20:40:39 yeah
20:41:10 Well, from an OpenStack perspective, if we have not asserted the upgrade tags (we have not), a downtime upgrade is fair game
20:41:49 As long as we don't require the amps to go down, I think I'm ok with it. (pending reviewing the full patch again)
20:42:03 kk
20:42:12 #link https://review.openstack.org/#/c/521138/
20:42:15 yeah that's why i asked we vote not on the patch but on the core concept
20:42:32 "Can we change field names in our existing DB schema"
20:42:39 seems like everyone is thinking "yes"
20:42:40 But you, jniesz and mnaser have this running in some form of production, so your voices matter here
20:43:03 for me it matters a lot, since this upgrade will happen for me just about the moment it merges
20:43:06 when we upgrade OpenStack regions we have to schedule control-plane downtime anyway
20:43:09 since I run master, deployed weekly <_<
20:43:27 rm_work, master? really?
20:43:29 yes
20:43:33 man..
20:43:41 Yes, see nmagnezi you should ship... grin
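For reference, the "adds the new column, copies the data, and maintains both for some deprecation cycle" migration johnsom describes would look roughly like the following alembic sketch; the table and column names are placeholders, not the ones in the patch under review.

```python
"""Rough sketch of the compatible-rename approach: add the new column next
to the legacy one, copy the data, and drop the old column only after a
deprecation cycle. Table/column names and revision ids are placeholders.
"""
from alembic import op
import sqlalchemy as sa

revision = 'abcdef123456'       # placeholder revision id
down_revision = '123456abcdef'  # placeholder parent revision


def upgrade():
    # 1. Add the better-named column alongside the legacy one.
    op.add_column('amphora',
                  sa.Column('new_column_name', sa.String(64), nullable=True))
    # 2. Backfill it from the legacy column so old and new stay in sync.
    op.execute('UPDATE amphora SET new_column_name = old_column_name')
    # 3. Keep old_column_name in place; a later migration drops it once the
    #    deprecation window closes and all control-plane code uses the
    #    new name.
```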
20:43:47 lol.
20:43:51 grin
20:43:57 * rm_work laughs maniacally
20:44:08 yea, I would like to get us to the point where we are deploying master
20:44:22 jniesz: it isn't that hard actually IME
20:44:30 just switch what tag you pull :P
20:44:52 hopefully existing CI+tests take care of issues
20:44:52 yea, I think that's something we will look at after the Pike upgrades
20:45:08 Ok, so I hear an operator wanting an upgrade path. We should try to support them.
20:45:10 yeah since Octavia is cloud-version-agnostic
20:45:18 rm_work are you willing to help with the coding for the patch?
20:45:30 the coding for this patch?
20:45:34 Yep
20:45:39 I mean... how do we even do this
20:45:46 thanks for highlighting me
20:45:48 i thought it was basically "rename or don't"
20:46:03 my comment is: control-plane downtime is acceptable, because that sort of thing can be scheduled
20:46:05 mnaser Are you caught up? Do you have an opinion here?
20:46:15 yeah I am leaning towards that as well honestly
20:46:26 also, API downtime in this case doesn't really cause that many issues; it's not as critically hit as a service like nova, for example
20:46:27 but I was literally not sure if this was a smart idea politically
20:46:37 but johnsom asserts we have not tagged ourselves with anything that would cause a problem
20:46:38 so
20:46:50 to be honest, it is very rare that you can get downtime-less upgrades, even with big projects
20:46:55 I think we would have to do the migration to have both columns in parallel so the models can continue using the old until upgraded.
20:47:05 ah yes.
20:47:06 well
20:47:18 johnsom brings a good point, unless you make it very clear that the old services must be shut down when starting a new one
20:47:29 honestly -- i think if we don't have any big political issues with this -- we just do it, and make a big flashy release node
20:47:32 *release note
20:47:42 Right, this would have rich release notes for upgrade....
20:47:51 +1
20:47:59 we run 3 replicas of octavia but our upgrade procedure usually is, turn off all replicas except one, upgrade this single replica, when everything is back up and working properly, start up the rest
20:48:42 perhaps add some sort of failsafe to prevent it from starting up with a newer database and failing unpredictably?
20:48:51 mnaser This would be a "shutdown control plane", "run DB migration", "upgrade control plane", "start control plane" type of release
20:49:10 you forgot backup DB
20:49:14 perhaps if db_migration_version > release_migration_version: refuse_to_start()
20:49:20 As it is written now.
20:49:44 mnaser Hmm, we don't have that today
20:50:03 yeah, we need that so we don't willingly corrupt the DB
20:50:15 Currently our upgrade is "db migration", then update control plane
20:50:35 This is the first patch that would not work that way
20:51:24 So, let me ask again: are there folks willing to help on this patch to make it a smooth upgrade?
20:52:08 I'm unable to commit any effort to that, so for our side it will likely be: turn off all replicas, upgrade packages of 1 replica, sync db, start 1 replica, check if all ok, start up other replicas after upgrading
20:52:19 as an operator I'm perfectly content with that procedure (it's what we use for many others)
20:52:30 i'm pretty much booked for this week and then i'm out essentially until January
20:52:31 so
20:52:35 ....
20:52:56 and i'm ok with this upgrade procedure
20:53:01 personally
20:53:33 my concern was that it's the ... like, 4 of us that deploy AND are active enough in the project to be talking in the weekly meetings
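mnaser's refuse_to_start() failsafe above could lean on alembic's own bookkeeping; a minimal sketch follows, with the function name, error type, and config handling as assumptions. Only the alembic and SQLAlchemy calls are real APIs.

```python
"""Minimal sketch of a startup guard: refuse to run if the database has been
migrated past anything this release knows about."""
from alembic import config as alembic_config
from alembic import script as alembic_script
from alembic.runtime import migration
import sqlalchemy as sa


def check_db_revision(db_url, alembic_ini_path):
    cfg = alembic_config.Config(alembic_ini_path)
    script_dir = alembic_script.ScriptDirectory.from_config(cfg)
    head = script_dir.get_current_head()           # newest revision we ship

    engine = sa.create_engine(db_url)
    with engine.connect() as conn:
        context = migration.MigrationContext.configure(conn)
        current = context.get_current_revision()   # revision stamped in DB

    # If the DB revision is not among the revisions this code ships with,
    # the schema is newer than us -- bail out instead of corrupting data.
    known = {s.revision for s in script_dir.walk_revisions()}
    if current is not None and current not in known:
        raise RuntimeError(
            'DB revision %s is newer than this release (head %s); '
            'refusing to start.' % (current, head))
```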
20:53:37 Ok, so I think I hear all three stakeholders that are present saying they are ok with this upgrade procedure. Correct?
20:53:39 that are saying it's ok
20:53:52 people who aren't this active but are deploying octavia ... those are the ones I worry about
20:54:03 Understood.
20:54:04 and i wonder about what kind of support load this will cause
20:54:18 I guess the last option is to sit on this until someone can work on it
20:54:22 but ... i also don't have a lot of time to devote to this patch, so
20:54:34 we COULD patch just the amphora API
20:54:40 so we get it before it releases
20:54:43 I don't think that is a good option given its scope
20:55:10 and then fix the actual DB later
20:55:11 the only place it's exposed is that API
20:55:14 rm_work, that's a valid point. The way I see it, the best option to ensure we don't break existing deployments is with alembic migrations
20:55:19 so if it's looking like it won't make it to Queens, it's simple to update just those fields
20:55:44 It's the internals that are important for L3 Act/Act work
20:55:48 yeah ...
20:55:57 alright, maybe the best approach IS to "add columns"
20:56:04 and duplicate the writes?
20:56:10 these columns aren't large
20:56:16 so it's extra data but not too much
20:56:17 right?
20:56:22 Yeah, that is the work I described.
20:56:28 I think that would be acceptable
20:56:34 Then at some later point drop the duplicates
20:56:38 yeah
20:57:00 when is the last day we could do this
20:57:04 We have four minutes. I would like to close on this
20:57:11 Q-3?
20:57:15 Yes
20:57:16 is it a "feature"?
20:57:25 k and that's ... mid-Jan?
20:57:36 Yes.
20:57:43 k... I just won't have time until January
20:57:50 but THEN maybe I could look at it ... MAYBE.
20:57:52 if it's not done yet
20:57:59 Ok, so it's decided to make it upgrade-compatible. We can figure out how/when it gets done later.
20:58:12 k
20:58:15 what DIDN'T we get to
20:58:31 Interface driver support and Members API Improvements Proposal
20:58:44 bar is not around
20:59:01 and.. we have like a min for it..
20:59:10 yeah...
20:59:10 yeah i was mostly just curious
20:59:18 Right.
20:59:35 Ok, thanks for a lively meeting today. We got through some of it.
20:59:52 cgoncalves If you can hang around for a minute we can talk about your topic
21:00:03 yeah i'm good for a few min too
21:00:06 #endmeeting