19:01:31 <clarkb> #startmeeting infra
19:01:32 <openstack> Meeting started Tue Sep  4 19:01:31 2018 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:33 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:35 <openstack> The meeting name has been set to 'infra'
19:01:35 <AJaeger> o/
19:02:11 <clarkb> #topic Announcements
19:02:12 <ianw> o/
19:02:42 <clarkb> The PTG is less than a week a way. I know a good chunk of us will be there. I think we can go ahead and cancel next weeks Infra meeting as a result
19:02:57 <clarkb> we'll be in the help room instead
19:03:08 <fungi> we'll be in the helproom at that time, i guess (or at lunch)
19:03:38 <corvus> or both
19:03:51 <fungi> timezone conversion says we can help lunchishly at that time
19:04:15 <corvus> i mean, i'm eating lunch now
19:04:34 <clarkb> #topic Actions from last meeting
19:04:42 <pabelanger> o/
19:04:45 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2018/infra.2018-08-28-19.01.txt minutes from last meeting
19:05:18 <clarkb> Last week was actually a fairly "quiet" meeting overall. I think many of us were (and still are) heads down on the bridge.openstack.org migration
19:05:49 <clarkb> May as well talk about that then.
19:05:53 <clarkb> #topic Priority Efforts
19:06:02 <clarkb> #topic Update Config Management
19:06:23 <clarkb> The review topic of update-cfg-mgmt and puppet-4 continue to have changes that need review
19:07:04 <clarkb> corvus has helped me get a set of changes up to manage clouds.yaml on nodepool nodes and on bridge.openstack.org. They are tested and I think passing tests now so we can probably take the plunge when ready
19:07:16 <clarkb> this is a necessary pre step to running the cloud launcher playbook against our clouds again
19:07:20 <corvus> i'll +3 after lunch^Wmeeting
19:07:32 <Shrews> clarkb: i think i saw one fail (timeout)
19:07:40 * diablo_rojo sneaks in late
19:07:47 <clarkb> Shrews: darn ok, will have to take a look then
19:08:11 <clarkb> Once those changes are in and happy I'll unwip my change to deploy the cloud launcher playbook on bridge.openstack.org
19:08:42 <clarkb> at that point we should be in a good spot to add new clouds or cloud regions again (I had to manually run the launcher to set up the new vexxhost region)
19:08:54 <clarkb> pabelanger: has also proposed a change to run our ansible with mitogen
19:09:00 <corvus> it looks like we should be able to get the per-project ssh keys feature into zuul soon -- maybe we can land that this week and restart with it.  then it'll be ready for us to experiment with zuul-driven CD at ptg.
19:09:26 <corvus> pabelanger: are the timings from the test output useful with mitogen, or do you think we need to run it in production to see if it's effective?
19:09:32 <clarkb> #link https://review.openstack.org/597297/ Adds mitogen to our ansible for hopeful performance improvements
19:10:54 <clarkb> corvus: my hunch is we may need to see it in production since I think one of the ways it is helpful is mitigating network latencies
19:11:09 <clarkb> corvus: the gains in a small localized test env may be minimal compared to our multicloud control plane
19:11:11 <pabelanger> corvus: I think we need more hosts to see how it scales.  I can try and look at tests in system-config to see if I can see any differences however
19:11:15 <fungi> mordred will probably be interested in the pbr entry on mitogen's importer wall of shame too
19:11:17 <corvus> clarkb: we do have network latencies in the tests...
19:11:36 <clarkb> corvus: we do but it si all in the same cloud region isn't it?
19:11:48 <corvus> clarkb: yes -- much like our control plane
19:13:09 <clarkb> in any case I'm willing to try it in production. I do note that I don't think we can use python3.5 on xenial with mitogen which changes my thoughts on python3 migration for ansible somewhat (assuming we stick with mitogen)
19:14:09 <corvus> i'm not eager to use mitogen.  i think we have enough work cut out for us as it is, and i've heard mixed reports on its effectiveness.  unless it's like two orders of magnitude better, i'd rather we just focus on the stuff in the spec3.
19:14:49 <corvus> i'm not going to -2 it or anything, but i'd prefer not to +2 it either, and instead defer it until later.  unless there's some reason we need to incorporate it now.
19:15:10 <clarkb> thats fair, the current cycle is about an hour long which isn't terrible
19:15:33 <corvus> clarkb: we should (have) increase(ed) the size of bridge and run with more forks
19:15:50 <corvus> if speed is what we're after, let's start with the basics
19:16:11 <corvus> we couldn't even run without ooming before, i don't know why (mitogen or no) we wouldn't run into the same problem this time.
19:16:11 <pabelanger> yah, this was mostly to see if we could get better performance with current bridge.o.o, but also okay to skip to later
19:17:07 <corvus> as a staw-man: i'm happy to click the button in the web ui to tell rax to embiggen the existing node
19:17:12 <clarkb> corvus: we don't appear to have memory pressure issues on the new bridge (at least with the current number of forks) http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=65004&rra_id=all is the cacti info
19:17:25 <corvus> clarkb: cool.  then we should increase the number of forks until we do :)
19:17:51 <clarkb> but that instance is quite small, not opposed to increasing the number of forks and then the ram if we hit the memory limits before the runtime improvements stop
19:18:50 <fungi> i don't think node embiggening is a thing in rax any longer
19:18:57 <fungi> at least not for pvhvm flavors
19:19:22 <corvus> fungi: bummer
19:19:29 <fungi> i'm checking now
19:19:47 <clarkb> nova supports resizing at least
19:20:07 <corvus> we *should* be able to launch a new bridge, but i don't know what we don't know about that.
19:20:37 <clarkb> corvus: things like clouds.yaml definitely were (and technically still are) managed by hand
19:21:16 <clarkb> ianw: is the change to make ansible more non root friendly on bridge.o.o still outstanding?
19:21:29 <clarkb> ianw: pabelanger is it worth talking about that here if so? might help reach an agreement?
19:21:37 <fungi> #link https://support.rackspace.com/how-to/upgrading-resources-for-general-purpose-or-io-optimized-cloud-servers/
19:21:55 <fungi> tl;dr is you can add more disk and you can create a new bigger server
19:22:18 <corvus> fungi: :(
19:22:32 <ianw> clarkb: yep
19:22:34 <ianw> https://review.openstack.org/#/c/596630/
19:22:42 <fungi> oh, there's a thing at the bottom about "auto scale"
19:23:02 <clarkb> ah looks like pabelanger +1'd it to not be a -1
19:23:06 <fungi> which seems to have something to do with load balancing
19:23:08 <clarkb> I guess I just need to rereview it today then
19:23:11 <corvus> fungi: i guess we could do all that.  but it sounds like for now we can probably increase forks and kick the reprovisioning can down the road.
19:23:26 <clarkb> #link https://review.openstack.org/#/c/596630/ Change to make ansible on bridge.openstack.org useable as not root user
19:23:44 <pabelanger> clarkb: yah, other eyes would be good there. Not going to block
19:24:30 <corvus> ansible runs as root; zuul is going to run ansible as root.... is doing this worthwhile?
19:25:10 <clarkb> corvus: I'm a fan of it as we often run ansible commands from puppetmaster (and now bridge) as not root
19:25:21 <clarkb> we can su every time we want to run it I guess
19:25:23 <fungi> the idea was for if we're still going to have humans running launch-node.py
19:25:40 <corvus> clarkb: huh, i always did "sudo ansible" on puppetmaster
19:26:02 <fungi> as the vast majority of the things the launch-node.py script covers doesn't need elevated privileges on the management bastion
19:26:02 <clarkb> fungi: ya that was where I ran into it first on bridge.openstack.org
19:27:21 <clarkb> I'm fine either way, forcing root will require some docs updates. Continuing to allow the admin group to run ansible commands requires ianw's change
19:27:57 <fungi> i mean, we were able to run launch-node.py previously by just being in the puppet group, and then later when we switched to using ansible for the cron we made the inventory admin-group-writeable so we still didn't need to sudo launch-node.py
19:28:17 <fungi> up until a very recent change to get rid of the admin group
19:28:54 <corvus> when i run ansible as corvus on puppetmaster it just hangs.  dunno what it's doing.
19:29:19 <clarkb> corvus: it may be generating an inventory for you if you've not done it before
19:29:26 <fungi> so this was mainly a request to restore that property in a way that doesn't offend anyone's sensibilities (apparently the admin group existing rubs some folks the wrong way)
19:29:30 <clarkb> I can't remember if it uses the global cache if run as a regularl user or not
19:29:34 <ianw> hanging usually to me indicates the inventory, and probably that is root only at this point and it's going off doing ... something?
19:29:36 <corvus> clarkb: that sounds like a fabulous reason to run it as root :)
19:30:27 <pabelanger> I personally find shared history as root user a little confusing, but not a deal breaker
19:30:42 <pabelanger> that was the main bonus I liked about user with sudo
19:31:06 <clarkb> pabelanger: we can still user + sudo in the current setup fwiw
19:31:15 <clarkb> pabelanger: the proposed change allows user without sudo.
19:31:35 <fungi> i just personally get especially uneasy when i'm running things as root or via sudo somewhere with as many sensitive contents as the management bastion
19:31:48 <corvus> ansible cron runs as root, zuul will run ansible as root; if we make all the bits necessary to run ansible and launch-node read/writable to all of us -- is there really much of a point to that?
19:31:49 <pabelanger> Right, that too. would need to check how I did launch-node.py in the past
19:32:12 <corvus> fungi: i hear that -- everything about the bastion makes me uneasy :)
19:32:35 <clarkb> ya I'm mostly wtih fungi. Many years of only using sudo/root when necesary
19:32:42 <corvus> but is the proposed change actually safer?
19:32:52 <fungi> if the idea is that zuul and/or the ansible cron will take over launching new servers for us, then i'll no longer be uneasy with the fact that i might typo a root-privileged command that needed almost no root access
19:33:21 <clarkb> corvus: the ansible commands themselves shouldn't be, but if it helps keep us from just sudoing everything that doesn't immediately work then maybe?
19:33:30 <clarkb> its mostly a discipline thing and not one we've had problems with ime
19:33:31 <corvus> i guess what i'm saying is -- the change is really about making all the currently-root-needing things not need root.  and there is nothing else on the server other than the root-needing things.
19:34:40 <clarkb> corvus: well the ansible stuff only needs root beacuse that is how we've set it up (the invocation side at least)
19:35:04 <clarkb> but that is a bigger change to move invocation side off of root to only being root on the remote
19:35:12 <fungi> i'm okay switching to using sudo if that's the plan going forward. we also had a rocky past with things like envvars not getting passed through to those scripts but i guess they're all being put in command-line options now so that shouldn't be as much of an issue
19:35:21 <clarkb> (and effecitlvey gives you root on the invocation side if you choose so may be a noop)
19:35:32 <corvus> clarkb: (it still needs to become_true on bridge itself to self-manage, but yes, agreed)
19:36:02 <pabelanger> what is we decided to revert the admin removal change? Then things go back to just working.
19:36:32 <pabelanger> if*
19:36:47 <clarkb> maybe we can follow up after the meeting with ianw a bit more? we have 24 minutes of meeting allotment left and it would be good to move on to PTG things if we can
19:36:49 <fungi> seems like the admin group was objectionable because it used to be a standard group on ubuntu and then they deprecated it in more recent releases
19:37:16 <ianw> honestly if we're going to do it, i'd rather we have an "ansible" group to avoid confusion over this in the future.  one group, one thing seems logical
19:37:19 <fungi> and for whatever reason having a group named "admin" is a mistake
19:37:45 <fungi> yeah, we used the "puppet" group for this in the past
19:37:46 <pabelanger> right, that was my main suggestion just to rename root to sudo, we'd get the new admin group for bionic.
19:38:03 <pabelanger> or even us adm
19:38:06 <pabelanger> use*
19:38:32 <corvus> in this particular instance, i'm unconvinced that an admin-like-group system is better.  if someone feels strongly about it and wants to drive that, i'm not opposed.  but it sounds like a time sink with little upside.  especially if, at the end of the day, we're going to end up doing "sudo ansible ..." anyway.
19:39:01 <fungi> i don't want to drag this out. if most of us have no objection to doing it all as root/via sudo i'll adapt
19:39:10 <clarkb> corvus: I think the only thing we would want to do is update our docs (for launch node in particular) to reflect that change in process. But otherwise I agree I think we can move forward with forcing use of root
19:39:13 <corvus> maybe i just need more convincing :)
19:39:24 <clarkb> but ya lets move on so that we can go over ptg things really quickly
19:39:30 <corvus> k
19:39:34 <clarkb> and can pick this up again after in #openstack-infra if we want
19:39:45 <clarkb> #topic General Topics
19:39:56 <clarkb> The PTG is next week
19:40:06 <clarkb> #link https://etherpad.openstack.org/p/infra-ptg-denver-2018 PTG planning document
19:40:46 <clarkb> The PTG runs Monday to Friday. On Monday and Tuesday the infra team is aprt of the helproom at the PTG. I will be there helping people with infra related items as they come up
19:41:06 <clarkb> if you are attending the PTG and have time Monday and Tuesday please join us
19:41:21 <clarkb> Then Wednesday and Thursday we will have dedicated infra time in our own room
19:41:47 <clarkb> Friday is a free day, but the openstack TC is having open meetings in the afternoon that people may be interested in
19:42:09 <clarkb> I'm open to trying to do some sort of team activity on that day if others are interested (maybe we do breakfast off site so people can get back to the TC meetings)
19:42:52 <clarkb> that is the high level overview, should be very similar in format to Dublin with exception of the snow storm
19:43:36 <clarkb> Oh, team bier garten outing. Looking at a schedule of official ptg activities Monday night would be easiest. does that work?
19:43:45 <fungi> so in other words, _nothing_ like dublin ;)
19:44:10 * fungi remembers the format for dublin was "blizzard"
19:44:16 <clarkb> #link https://ethercalc.openstack.org/fgry72wu42qw availability for biergarten outing. Monday would be best based on PTG event scheduling. Please let me know if that doesn't work
19:44:49 <clarkb> I'll go ahead and pencil in Monday now and send details to the infra list later this week if I hear no objections
19:45:15 <clarkb> That leaves us with working out an agenda. If you have things you want to work on at the PTG that are not already on the etherpad please add them
19:45:31 <clarkb> I'm goign to try and categorize and organize them a bit so that when we show up wednesday we can peel them off of the list
19:45:50 <corvus> i'm arriving on monday, so will join the help-room asap.  bier monday sounds great.  :)
19:46:02 <corvus> so does tuesday.  and wednesday.  and thursday.
19:46:17 <pabelanger> I won't be at the PTG this time around, but hope to participate remotely.
19:46:36 <clarkb> pabelanger: we'll miss you
19:46:57 <clarkb> pabelanger: re remote participation did you have a specific format in mind (eg a phone call with speakerphone on? or asterisk call from corvus' laptop? or?)
19:47:06 <clarkb> might be good to sort that out before we all get on places
19:47:09 <clarkb> *planes
19:47:12 <corvus> yeah, i'm hoping that our work on the config mgmt stuff should mostly be writing a bunch of changes, so remote participation may be effective.
19:47:21 <clarkb> corvus: ++
19:47:48 <pabelanger> yah, if people remember to log into asterisk, that would behelpful, but mostly going to be watching irc
19:48:13 <clarkb> pabelanger: good to know, I'll try to make a point of treating irc as an extension of the room
19:49:05 <pabelanger> yah, I'll keep an eye out for etherpad too
19:49:21 <clarkb> One thing worth  noting is that I think this is a good opportunity for us to get feedback on the winterscale effort from openstack and some of the newr projects. I hear that airship and starlingx will be there
19:50:21 <clarkb> info on what sort of services various projects feel should be under the generic branding vs their own in particular. Will have to resurrect my etherpad strawman breakdown
19:50:54 <clarkb> The helproom may also have a less openstack vibe to it. Unsure at this point but keep that in mind too
19:51:39 <clarkb> Any other PTG questions, concerns, thoughts?
19:52:45 <fungi> we've had some people recently pop up on the ml and in irc wanting to contribute to the team. any idea if we have new infra-curious folks attending?
19:53:11 <fungi> there always seem to be at least one or two
19:53:27 <clarkb> fungi: I've not heard of any beyond the non openstack projects at the foundation being there and possibly going to use some of that time to work out infra related things
19:54:06 <fungi> would be nice if those projects wanting to use our services also wanted to help maintain them
19:54:08 <fungi> so maybe!
19:54:12 <clarkb> indeed
19:55:11 <clarkb> #topic Open Discussion
19:55:15 <clarkb> Alright anything else?
19:55:32 <ianw> i just got notification on a ticket we have some more linaro resources in their london cloud
19:55:45 <ianw> will look into that today.  moving the builder there would probably work out well
19:56:03 <ianw> #link https://review.openstack.org/#/c/599270/ Switch unmodified readthedocs jobs to warning job
19:56:22 <clarkb> ianw: neat, keep in mind that until clouds.yaml and cloud launcher stuff is automated on bridge.o.o some of the new region bring up is still manual (I manually ran cloud launcher against vexxhost sjc1 for example)
19:56:35 <ianw> this adds a warning job for outstanding rtd publishers that aren't actually working.  actually a lot of them are infra projects
19:56:54 <ianw> if everyone could check their readthedocs login, and anything they own that is an infra project needs to have openstackci user added as an admin
19:57:05 <fungi> reminder that the stein cycle signing key is up for switching into production too
19:57:14 <ianw> i'm happy to help update the jobs, but can't do that until we have permission
19:57:39 <clarkb> huh I don't think I've ever configured rtd for an infra project
19:57:46 <ianw> clarkb: ++ thanks for reminding me on that
19:57:56 <fungi> #link https://review.openstack.org/599092 Replace the old Rocky cycle signing key with Stein
19:58:04 * AJaeger would appreciate some regular review on config repos. Especially clarkb and ianw help a lot - but more reactive. Could some of you regularly review, please?
19:58:24 <fungi> there's a process documentation update and corresponding releases site change under the same topic:artifact-signing
19:58:59 <clarkb> AJaeger: ++ that is a good reminder. I try when I have downtime to look at the open list in project-config but with everything else going on sometimes that doesn't happen
19:59:50 <AJaeger> clarkb: if you find over beer some ideas next week on what to do, I would appreciate it (I won't be at PTG but booked tickets for Berlin ;)
20:00:08 <clarkb> AJaeger: I hear they have beer in berlin too :)
20:00:10 <clarkb> And we are at time.
20:00:20 <clarkb> THanks everyone. See you back here in two weeks
20:00:24 <clarkb> #endmeeting