19:01:31 #startmeeting infra
19:01:32 Meeting started Tue Sep 4 19:01:31 2018 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:33 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:35 The meeting name has been set to 'infra'
19:01:35 o/
19:02:11 #topic Announcements
19:02:12 o/
19:02:42 The PTG is less than a week away. I know a good chunk of us will be there. I think we can go ahead and cancel next week's Infra meeting as a result
19:02:57 we'll be in the help room instead
19:03:08 we'll be in the helproom at that time, i guess (or at lunch)
19:03:38 or both
19:03:51 timezone conversion says we can help lunchishly at that time
19:04:15 i mean, i'm eating lunch now
19:04:34 #topic Actions from last meeting
19:04:42 o/
19:04:45 #link http://eavesdrop.openstack.org/meetings/infra/2018/infra.2018-08-28-19.01.txt minutes from last meeting
19:05:18 Last week was actually a fairly "quiet" meeting overall. I think many of us were (and still are) heads down on the bridge.openstack.org migration
19:05:49 May as well talk about that then.
19:05:53 #topic Priority Efforts
19:06:02 #topic Update Config Management
19:06:23 The review topics update-cfg-mgmt and puppet-4 continue to have changes that need review
19:07:04 corvus has helped me get a set of changes up to manage clouds.yaml on nodepool nodes and on bridge.openstack.org. They are tested and I think passing now, so we can probably take the plunge when ready
19:07:16 this is a necessary pre-step to running the cloud launcher playbook against our clouds again
19:07:20 i'll +3 after lunch^Wmeeting
19:07:32 clarkb: i think i saw one fail (timeout)
19:07:40 * diablo_rojo sneaks in late
19:07:47 Shrews: darn ok, will have to take a look then
19:08:11 Once those changes are in and happy I'll unwip my change to deploy the cloud launcher playbook on bridge.openstack.org
19:08:42 at that point we should be in a good spot to add new clouds or cloud regions again (I had to manually run the launcher to set up the new vexxhost region)
19:08:54 pabelanger has also proposed a change to run our ansible with mitogen
19:09:00 it looks like we should be able to get the per-project ssh keys feature into zuul soon -- maybe we can land that this week and restart with it. then it'll be ready for us to experiment with zuul-driven CD at ptg.
19:09:26 pabelanger: are the timings from the test output useful with mitogen, or do you think we need to run it in production to see if it's effective?
19:09:32 #link https://review.openstack.org/597297/ Adds mitogen to our ansible for hopeful performance improvements
19:10:54 corvus: my hunch is we may need to see it in production since I think one of the ways it is helpful is mitigating network latencies
19:11:09 corvus: the gains in a small localized test env may be minimal compared to our multicloud control plane
19:11:11 corvus: I think we need more hosts to see how it scales. I can try and look at tests in system-config to see if I can see any differences however
19:11:15 mordred will probably be interested in the pbr entry on mitogen's importer wall of shame too
19:11:17 clarkb: we do have network latencies in the tests...
19:11:36 corvus: we do, but it is all in the same cloud region, isn't it?
19:11:48 clarkb: yes -- much like our control plane
19:13:09 in any case I'm willing to try it in production. I do note that I don't think we can use python3.5 on xenial with mitogen, which changes my thoughts on the python3 migration for ansible somewhat (assuming we stick with mitogen)
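For context on the change linked above (597297): mitogen is normally wired into Ansible through ansible.cfg as a strategy plugin. A minimal sketch follows; the plugin path is an assumption about where a mitogen checkout might live, not what the actual change does.

    # Sketch only: enable mitogen as an Ansible strategy plugin.
    # The path is an assumption; it depends on how mitogen is installed.
    [defaults]
    strategy_plugins = /opt/mitogen/ansible_mitogen/plugins/strategy
    strategy = mitogen_linear

Even with such a config in place, whether it actually helps would still come down to the production timings discussed above.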
19:14:09 i'm not eager to use mitogen. i think we have enough work cut out for us as it is, and i've heard mixed reports on its effectiveness. unless it's like two orders of magnitude better, i'd rather we just focus on the stuff in the spec.
19:14:49 i'm not going to -2 it or anything, but i'd prefer not to +2 it either, and instead defer it until later. unless there's some reason we need to incorporate it now.
19:15:10 that's fair; the current cycle is about an hour long, which isn't terrible
19:15:33 clarkb: we should (have) increase(ed) the size of bridge and run with more forks
19:15:50 if speed is what we're after, let's start with the basics
19:16:11 we couldn't even run without ooming before, i don't know why (mitogen or no) we wouldn't run into the same problem this time.
19:16:11 yah, this was mostly to see if we could get better performance with the current bridge.o.o, but it's also okay to defer it until later
19:17:07 as a straw-man: i'm happy to click the button in the web ui to tell rax to embiggen the existing node
19:17:12 corvus: we don't appear to have memory pressure issues on the new bridge (at least with the current number of forks) http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=65004&rra_id=all is the cacti info
19:17:25 clarkb: cool. then we should increase the number of forks until we do :)
19:17:51 but that instance is quite small; not opposed to increasing the number of forks and then the ram if we hit memory limits before the runtime improvements stop
19:18:50 i don't think node embiggening is a thing in rax any longer
19:18:57 at least not for pvhvm flavors
19:19:22 fungi: bummer
19:19:29 i'm checking now
19:19:47 nova supports resizing at least
19:20:07 we *should* be able to launch a new bridge, but i don't know what we don't know about that.
19:20:37 corvus: things like clouds.yaml definitely were (and technically still are) managed by hand
19:21:16 ianw: is the change to make ansible more non-root friendly on bridge.o.o still outstanding?
19:21:29 ianw, pabelanger: is it worth talking about that here if so? might help reach an agreement
19:21:37 #link https://support.rackspace.com/how-to/upgrading-resources-for-general-purpose-or-io-optimized-cloud-servers/
19:21:55 tl;dr is you can add more disk and you can create a new bigger server
19:22:18 fungi: :(
19:22:32 clarkb: yep
19:22:34 https://review.openstack.org/#/c/596630/
19:22:42 oh, there's a thing at the bottom about "auto scale"
19:23:02 ah, looks like pabelanger +1'd it to not be a -1
19:23:06 which seems to have something to do with load balancing
19:23:08 I guess I just need to re-review it today then
19:23:11 fungi: i guess we could do all that. but it sounds like for now we can probably increase forks and kick the reprovisioning can down the road.
19:23:26 #link https://review.openstack.org/#/c/596630/ Change to make ansible on bridge.openstack.org usable as a non-root user
19:23:44 clarkb: yah, other eyes would be good there. Not going to block
19:24:30 ansible runs as root; zuul is going to run ansible as root.... is doing this worthwhile?
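On the fork count mentioned a few lines up: Ansible defaults to 5 forks, and raising it in ansible.cfg is the usual first knob for parallelism before resizing the host. The value below is purely illustrative, not a recommendation for bridge.

    [defaults]
    # Default is 5; higher values configure more hosts in parallel at the
    # cost of more memory on the bastion. 25 here is only an example.
    forks = 25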
19:25:10 corvus: I'm a fan of it, as we often run ansible commands from puppetmaster (and now bridge) as non-root
19:25:21 we can su every time we want to run it I guess
19:25:23 the idea was for the case where we're still going to have humans running launch-node.py
19:25:40 clarkb: huh, i always did "sudo ansible" on puppetmaster
19:26:02 as the vast majority of the things the launch-node.py script covers don't need elevated privileges on the management bastion
19:26:02 fungi: ya, that was where I ran into it first on bridge.openstack.org
19:27:21 I'm fine either way; forcing root will require some docs updates. Continuing to allow the admin group to run ansible commands requires ianw's change
19:27:57 i mean, we were able to run launch-node.py previously by just being in the puppet group, and then later when we switched to using ansible for the cron we made the inventory admin-group-writeable so we still didn't need to sudo launch-node.py
19:28:17 up until a very recent change to get rid of the admin group
19:28:54 when i run ansible as corvus on puppetmaster it just hangs. dunno what it's doing.
19:29:19 corvus: it may be generating an inventory for you if you've not done it before
19:29:26 so this was mainly a request to restore that property in a way that doesn't offend anyone's sensibilities (apparently the admin group existing rubs some folks the wrong way)
19:29:30 I can't remember if it uses the global cache if run as a regular user or not
19:29:34 hanging usually indicates the inventory to me, and that is probably root-only at this point and it's going off doing ... something?
19:29:36 clarkb: that sounds like a fabulous reason to run it as root :)
19:30:27 I personally find shared history as the root user a little confusing, but not a deal breaker
19:30:42 that was the main bonus I liked about a user with sudo
19:31:06 pabelanger: we can still do user + sudo in the current setup fwiw
19:31:15 pabelanger: the proposed change allows a user without sudo.
19:31:35 i just personally get especially uneasy when i'm running things as root or via sudo somewhere with as much sensitive content as the management bastion
19:31:48 the ansible cron runs as root, zuul will run ansible as root; if we make all the bits necessary to run ansible and launch-node read/writable to all of us -- is there really much of a point to that?
19:31:49 Right, that too. Would need to check how I did launch-node.py in the past
19:32:12 fungi: i hear that -- everything about the bastion makes me uneasy :)
19:32:35 ya, I'm mostly with fungi. Many years of only using sudo/root when necessary
19:32:42 but is the proposed change actually safer?
19:32:52 if the idea is that zuul and/or the ansible cron will take over launching new servers for us, then i'll no longer be uneasy with the fact that i might typo a root-privileged command that needed almost no root access
19:33:21 corvus: the ansible commands themselves shouldn't be, but if it helps keep us from just sudoing everything that doesn't immediately work then maybe?
19:33:30 it's mostly a discipline thing and not one we've had problems with, ime
19:33:31 i guess what i'm saying is -- the change is really about making all the currently-root-needing things not need root. and there is nothing else on the server other than the root-needing things.
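As a rough illustration of the group-based alternative being weighed here (this is not the content of change 596630), a hypothetical Ansible snippet that would let members of an "ansible" group read the generated inventory and write the shared log without sudo. The group name and paths are assumptions for the sketch.

    # Hypothetical sketch only; group name and paths are assumptions.
    - hosts: bridge
      become: true
      tasks:
        - name: Create a group for bastion admins
          group:
            name: ansible
            state: present

        - name: Let group members read the generated inventory
          file:
            path: /etc/ansible/hosts
            state: directory
            recurse: true
            group: ansible
            mode: "g+rX"

        - name: Let group members write the shared ansible log
          file:
            path: /var/log/ansible
            state: directory
            recurse: true
            group: ansible
            mode: "g+rwX"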
19:34:40 corvus: well, the ansible stuff only needs root because that is how we've set it up (the invocation side at least)
19:35:04 but it is a bigger change to move the invocation side off of root to only being root on the remote end
19:35:12 i'm okay switching to using sudo if that's the plan going forward. we also had a rocky past with things like envvars not getting passed through to those scripts, but i guess they're all being put in command-line options now so that shouldn't be as much of an issue
19:35:21 (and effectively gives you root on the invocation side if you choose, so it may be a noop)
19:35:32 clarkb: (it still needs to become_true on bridge itself to self-manage, but yes, agreed)
19:36:02 what is we decided to revert the admin removal change? Then things go back to just working.
19:36:32 if*
19:36:47 maybe we can follow up after the meeting with ianw a bit more? we have 24 minutes of meeting allotment left and it would be good to move on to PTG things if we can
19:36:49 seems like the admin group was objectionable because it used to be a standard group on ubuntu and then they deprecated it in more recent releases
19:37:16 honestly, if we're going to do it, i'd rather we have an "ansible" group to avoid confusion over this in the future. one group, one thing seems logical
19:37:19 and for whatever reason having a group named "admin" is a mistake
19:37:45 yeah, we used the "puppet" group for this in the past
19:37:46 right, that was my main suggestion, just to rename root to sudo; we'd get the new admin group for bionic.
19:38:03 or even us adm
19:38:06 use*
19:38:32 in this particular instance, i'm unconvinced that an admin-like-group system is better. if someone feels strongly about it and wants to drive that, i'm not opposed. but it sounds like a time sink with little upside. especially if, at the end of the day, we're going to end up doing "sudo ansible ..." anyway.
19:39:01 i don't want to drag this out. if most of us have no objection to doing it all as root/via sudo i'll adapt
19:39:10 corvus: I think the only thing we would want to do is update our docs (for launch node in particular) to reflect that change in process. But otherwise I agree; I think we can move forward with forcing use of root
19:39:13 maybe i just need more convincing :)
19:39:24 but ya, let's move on so that we can go over ptg things really quickly
19:39:30 k
19:39:34 and we can pick this up again afterwards in #openstack-infra if we want
19:39:45 #topic General Topics
19:39:56 The PTG is next week
19:40:06 #link https://etherpad.openstack.org/p/infra-ptg-denver-2018 PTG planning document
19:40:46 The PTG runs Monday to Friday. On Monday and Tuesday the infra team is part of the helproom at the PTG. I will be there helping people with infra-related items as they come up
19:41:06 if you are attending the PTG and have time Monday and Tuesday, please join us
19:41:21 Then Wednesday and Thursday we will have dedicated infra time in our own room
19:41:47 Friday is a free day, but the openstack TC is having open meetings in the afternoon that people may be interested in
19:42:09 I'm open to trying to do some sort of team activity on that day if others are interested (maybe we do breakfast off site so people can get back to the TC meetings)
19:42:52 that is the high-level overview; it should be very similar in format to Dublin, with the exception of the snow storm
19:43:36 Oh, team biergarten outing. Looking at the schedule of official PTG activities, Monday night would be easiest. Does that work?
19:43:45 so in other words, _nothing_ like dublin ;)
19:44:10 * fungi remembers the format for dublin was "blizzard"
19:44:16 #link https://ethercalc.openstack.org/fgry72wu42qw availability for biergarten outing. Monday would be best based on PTG event scheduling. Please let me know if that doesn't work
19:44:49 I'll go ahead and pencil in Monday now and send details to the infra list later this week if I hear no objections
19:45:15 That leaves us with working out an agenda. If you have things you want to work on at the PTG that are not already on the etherpad, please add them
19:45:31 I'm going to try and categorize and organize them a bit so that when we show up Wednesday we can peel them off of the list
19:45:50 i'm arriving on monday, so will join the help-room asap. bier monday sounds great. :)
19:46:02 so does tuesday. and wednesday. and thursday.
19:46:17 I won't be at the PTG this time around, but hope to participate remotely.
19:46:36 pabelanger: we'll miss you
19:46:57 pabelanger: re remote participation, did you have a specific format in mind (eg a phone call with speakerphone on? or asterisk call from corvus' laptop? or?)
19:47:06 might be good to sort that out before we all get on places
19:47:09 *planes
19:47:12 yeah, i'm hoping that our work on the config mgmt stuff should mostly be writing a bunch of changes, so remote participation may be effective.
19:47:21 corvus: ++
19:47:48 yah, if people remember to log into asterisk, that would be helpful, but I'm mostly going to be watching irc
19:48:13 pabelanger: good to know, I'll try to make a point of treating irc as an extension of the room
19:49:05 yah, I'll keep an eye on the etherpad too
19:49:21 One thing worth noting is that I think this is a good opportunity for us to get feedback on the winterscale effort from openstack and some of the newer projects. I hear that airship and starlingx will be there
19:50:21 in particular, info on what sort of services various projects feel should be under the generic branding vs their own. Will have to resurrect my etherpad strawman breakdown
19:50:54 The helproom may also have a less openstack vibe to it. Unsure at this point, but keep that in mind too
19:51:39 Any other PTG questions, concerns, thoughts?
19:52:45 we've had some people recently pop up on the ml and in irc wanting to contribute to the team. any idea if we have new infra-curious folks attending?
19:53:11 there always seem to be at least one or two
19:53:27 fungi: I've not heard of any beyond the non-openstack projects at the foundation being there and possibly going to use some of that time to work out infra-related things
19:54:06 would be nice if those projects wanting to use our services also wanted to help maintain them
19:54:08 so maybe!
19:54:12 indeed
19:55:11 #topic Open Discussion
19:55:15 Alright, anything else?
19:55:32 i just got notification on a ticket that we have some more linaro resources in their london cloud
19:55:45 will look into that today. moving the builder there would probably work out well
19:56:03 #link https://review.openstack.org/#/c/599270/ Switch unmodified readthedocs jobs to warning job
19:56:22 ianw: neat, keep in mind that until the clouds.yaml and cloud launcher stuff is automated on bridge.o.o, some of the new region bring-up is still manual (I manually ran cloud launcher against vexxhost sjc1 for example)
19:56:35 this adds a warning job for outstanding rtd publishers that aren't actually working. actually, a lot of them are infra projects
19:56:54 if everyone could check their readthedocs login; anything they own that is an infra project needs to have the openstackci user added as an admin
19:57:05 reminder that the stein cycle signing key is up for switching into production too
19:57:14 i'm happy to help update the jobs, but can't do that until we have permission
19:57:39 huh, I don't think I've ever configured rtd for an infra project
19:57:46 clarkb: ++ thanks for reminding me about that
19:57:56 #link https://review.openstack.org/599092 Replace the old Rocky cycle signing key with Stein
19:58:04 * AJaeger would appreciate some regular review on config repos. Especially clarkb and ianw help a lot, but mostly reactively. Could some of you review regularly, please?
19:58:24 there's a process documentation update and corresponding releases site change under the same topic:artifact-signing
19:58:59 AJaeger: ++ that is a good reminder. I try when I have downtime to look at the open list in project-config, but with everything else going on sometimes that doesn't happen
19:59:50 clarkb: if you come up with some ideas over beer next week on what to do, I would appreciate it (I won't be at the PTG but have booked tickets for Berlin ;)
20:00:08 AJaeger: I hear they have beer in berlin too :)
20:00:10 And we are at time.
20:00:20 Thanks everyone. See you back here in two weeks
20:00:24 #endmeeting