19:01:04 #startmeeting infra
19:01:05 Meeting started Tue Jun 9 19:01:04 2020 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:06 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:08 The meeting name has been set to 'infra'
19:01:12 #link http://lists.opendev.org/pipermail/service-discuss/2020-June/000034.html Our Agenda
19:01:20 #topic Announcements
19:01:36 No announcements were listed
19:02:14 o/
19:02:56 #topic Actions from last meeting
19:03:03 #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-06-02-19.03.txt minutes from last meeting
19:03:57 Last week's meeting was informal and we ended up debugging the meetpad/jitsimeet/etherpad/xmpp case sensitivity thing
19:04:25 No explicit actions came out of that that we recorded. But I think it gave us a better understanding of what we can do to make that case handling difference less confusing
19:04:27 o/
19:04:32 seems like we have a plan for it though
19:04:54 or at least some consensus of things we can do
19:04:59 ya I think what we've found is that case confusion is a thing and we should probably switch to enforcing lower case in etherpad to avoid that anyway
19:05:11 then we've got to deal with renaming/merging pads as necessary to handle that
19:06:06 #topic Specs approval
19:06:16 This spec isn't ready for approval yet, but I wanted to call it out
19:06:35 #link https://review.opendev.org/#/c/731838/ Central Authentication Service spec
19:06:45 yeah, it needs some heavy editing
19:06:50 fungi: I think we half expect a new PS based on conversation we had at the PTG?
19:06:53 good feedback in there from neal too
19:07:12 yes, you can half expect it, but i fully intend to provide it ;)
19:07:28 just might not come this week
19:07:38 we'll see
19:08:13 thanks
19:08:18 #topic Priority Efforts
19:08:28 #topic Update Config Management
19:08:58 The main topic I wanted to bring up here was the reorganization of our ansible inventory, groups, *vars, and base playbook
19:09:34 What we've realized is that the vast majority of the base playbook is not service specific. It configures admin users and exim for email and ntp and so on.
19:10:10 But the playbook runs against all hosts, which means if any one of them fails then the playbook fails. This can then cause problems if you wanted letsencrypt to run on a specific host or zuul to be updated and those hosts were fine
19:10:43 in order to make that more reliable we've split the iptables role out of base as it is service specific and put that into our service roles. Then we can decouple running base as a requirement before every service update
19:10:54 mordred: ^ is that a reasonable summary of the change? Anything else to add to that?
19:10:59 I think that's great
19:11:28 from the operator side of things be aware files have moved around and some config has been updated. You may need to rebase outstanding changes in system-config
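A minimal sketch of the split described above, assuming hypothetical playbook and role names (the actual layout in opendev/system-config may differ): each service playbook applies the service-specific iptables role itself, so a failure in the shared base playbook on an unrelated host no longer blocks a service deployment.

```yaml
# service-zuul.yaml (illustrative sketch, not the real system-config file):
# the firewall role runs as part of the service playbook rather than as
# part of the shared base playbook that targets every host.
- hosts: zuul
  roles:
    - iptables    # service-specific firewall rules
    - zuul        # the service role itself
```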
19:12:34 Any other configuration management items to bring up?
19:13:22 I think that's about it - we may have discovered we're actually ok to run zuul-executor in containers
19:13:37 corvus is going to verify - but I think I found that to be true now on friday
19:13:50 so I've got some patches up to do that
19:13:53 mordred: the thought there is we have to give the container some additional permissions?
19:14:23 clarkb: turns out we don't seem to need anything past privileged
19:14:23 locally i think i saw it working in bwrap but behaving weirdly inside docker itself. but it sounds like mordred saw something different when trying on ze01
19:14:32 yeah
19:14:47 so it's possible there are differences wrt kernel version or docker version from the original test - or who knows
19:14:57 but i did bwrap inside of docker and it SEEMED to do the right things
19:15:05 based on what i saw, i think we should be "okay" to do it without the seccomp stuff, but i think it might be more comfortable with seccomp
19:15:15 mordred: did you test out afs inside docker but not in bwrap?
19:15:27 corvus: I think so?
19:15:31 k
19:15:37 corvus: but - let's double-check :)
19:15:46 so if what mordred saw holds, then i agree, we should be gtg without anything else
19:16:00 i'll do this after the meeting
19:16:22 ^ = confirm mordred's tests
19:16:29 if that works - we'll just be down to nodepool builder on arm running non-containerized - and we need to swing back around to that issue anyway
19:16:56 the arm nodepool builder is hung up on the odd stream crossing we saw with multi arch docker builds right?
19:17:23 yeah - which we need to reproduce and figure out what's going on
19:17:59 i can probably make some time for at least reproduction
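A rough sketch of what running the executor privileged could look like in a docker-compose file; the service name, image, and mounts are illustrative assumptions rather than the actual system-config deployment.

```yaml
# docker-compose.yaml (illustrative): run zuul-executor with full privileges
# so bwrap and AFS can work inside the container, per the discussion above.
version: '2'
services:
  executor:
    image: zuul/zuul-executor        # assumed image name
    privileged: true                 # "nothing past privileged" appears needed
    volumes:
      - /etc/zuul:/etc/zuul          # assumed config mount
    restart: always
```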
19:19:00 #topic OpenDev
19:19:20 #link http://lists.opendev.org/pipermail/service-discuss/2020-May/000026.html Advisory Board thread.
19:19:52 The advisory board "recruiting" is still in progress. At the PTG we discussed that a gentle reminder to those who haven't responded is a good idea and then we'll move forward in a few weeks with who we get.
19:20:07 The thought is that by having some involvement we can generate interest and an example of what the system is there for
19:20:21 I plan to send out those gentle reminders today
19:20:59 like a snowball rolling downhill
19:21:07 in june?
19:21:21 On the service side of things Gitea 1.12.0 has had its second rc tag and I've got a change up to test a deployment of that. Looks like they've already added some additional bug fixes on top of that. We should hold off until the actual release I expect
19:21:25 corvus: in some parts of the world
19:21:33 corvus: feel like taking a trip to chile? ;)
19:21:38 fungi: yes
19:21:58 the good news is the templates have been very stable between rc1 and rc2 so any final release should be really close to ready and it's just a matter of updating the tag I hope
19:22:31 i've also got a change up for upgrading the version of etherpad. supposedly a major cause of the "broken" pads is addressed with it
19:22:41 I'm excited for this update as it adds caching of git commit info which should drastically speed up our rendering of repos with large histories like nova
19:22:51 now that the ptg is done, this may be a good time for etherpad upgrades again
19:23:04 fungi: ++ I think we can land and deploy that as soon as we are happy with the change and its testing
19:23:49 just double-checked and 1.8.4 is still the latest release
19:24:18 what does "broken" mean?
19:25:12 corvus: i think like the clarkb-test etherpad on the old etherpad-dev server
19:25:19 corvus: etherpads that eventually stop serving correctly
19:25:58 yeah, the ones which hang with "loading..."
19:26:22 ack
19:26:30 Anything else on OpenDev or shoudl we moev on?
19:26:34 (I can't type today)
19:26:36 i mentioned the change some weeks back in #opendev, but when we hit one of those there are telltale errors in the log which are referenced by the fix
19:26:54 so fingers crossed anyway
19:27:56 #topic General Topics
19:28:07 #topic Project Renames
19:28:18 I want to start with this one to make sure we get a chance to talk about it
19:28:36 we had pencilled in June 12, which is this Friday. Unfortunately I've discovered I have a kid's doctor visit at ~1800UTC that day
19:29:14 I'm happy to go ahead with it and help as I can (we can do it early friday or later friday and I'll be around) or move it to another day if we don't have enough people around
19:29:33 also we've added a few more renames since we last talked about this, the openstack foundation interop repos are getting moved now I guess
19:29:53 also it sounds like the openstack tc may want to rename a few more repos out of the openstack namespace into the osf namespace (relating to osf board of directors committees/working groups)
19:30:01 er, yeah what you just said
19:30:09 fungi: yup gmann added that to the list of things about half an hour ago
19:30:15 perfect
19:30:45 do we have any volunteers for Friday other than myself?
19:30:54 i'll be around
19:31:03 happy to do renames
19:31:15 fungi: cool do you have a preference on time and I'll do my best to be around to help?
19:31:31 let's say not 18:00 utc in that case...
19:31:41 I can start as early as 1400UTC, then have to cut off around 1730UTC, and expect to be back around 2030 UTC
19:31:57 (it'll likely be shorter than that but you never know with those visits)
19:32:31 i should be around but would like not to drive
19:32:33 my schedule is wide open friday. are there other volunteers with time constraints? i could certainly accommodate either of those windows
19:32:57 21:00 would work for me if that helps others
19:33:49 That works for me and should give me plenty of padding on my schedule
19:34:02 why don't we go with that then. Thank you fungi!
19:34:13 let's do that then, we can always do some prep earlier in the day in anticipation too
19:34:24 ++ thanks
19:34:51 Between now and then we'll want to construct the yaml input to the renaming process and commit it to opendev/project-config once the renames happen
19:34:58 yep
19:35:01 I can help coordinate with you to make sure we are ready by Friday
19:35:09 sounds good, thanks
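The rename input mentioned above is typically a small YAML file committed to opendev/project-config; the schema and repository names below are illustrative guesses, so check prior rename files for the exact format before Friday.

```yaml
# renames/20200612.yaml (illustrative): each entry maps a project's old
# location to its new one; the repositories shown here are placeholders.
- old: openstack/example-interop-repo
  new: osf/example-interop-repo
- old: openstack/example-board-repo
  new: osf/example-board-repo
```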
19:35:24 #topic Pip and Virtualenv Next Steps
19:35:40 ianw: ^ Any update on this subject?
19:35:56 I believe I saw at least one project (octavia) testing that the changes don't break them which was reassuring
19:36:08 yeah, i didn't get any complaints, and some people saying things worked
19:36:12 #link https://review.opendev.org/734428
19:36:25 that's the review to drop it, so ... i guess we just do it? i'm not sure what else to do
19:37:11 wfm
19:37:46 we've communicated it, at least some people have done testing and reinforced the expectation that this will be low impact, I think the next step is to land the change
19:38:49 ++
19:38:56 this is also early enough in openstack's release cycle that any resulting disruption can be addressed at a comfortable pace
19:39:10 the one to watch for is if people say virtualenv is missing
19:39:24 their best bet is to add the "ensure-virtualenv" role
19:39:41 ianw: please send an email once we merge the change
19:39:47 a followup to the announcement thread indicating we've landed the change would be good once that happens
19:39:48 AJaeger: ++
19:39:55 will do
19:40:29 anything else on this topic?
19:40:35 no, thanks
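For jobs that report virtualenv missing after the change lands, the fix suggested above is adding the ensure-virtualenv role from zuul-jobs to one of the job's playbooks. A minimal sketch, with the playbook path assumed for illustration:

```yaml
# playbooks/pre.yaml (illustrative): install virtualenv on the node
# before the job's main playbook runs.
- hosts: all
  roles:
    - ensure-virtualenv
```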
19:41:06 #topic DNS Cleanup
19:41:20 ianw: did we end up publishing the contents for comment yet?
19:41:55 it looks like the backup went into merge failure
19:41:57 #link https://review.opendev.org/#/c/728739/
19:42:02 but it would be good to merge that
19:42:16 the one to look through is
19:42:17 #link https://etherpad.opendev.org/p/rax-dns-openstack-org
19:43:24 perhaps to make it more manageable, if people want to delete from that things that should definitely stay, it will reduce it
19:43:26 thanks and I guess we can just mark that up with comments around what can be removed?
19:43:56 ah ya I see the note about removing things that should definitely stay, thanks
19:44:31 I'll try to take a look at that today
19:45:23 #topic PTG Recap
19:45:31 #link http://lists.opendev.org/pipermail/service-discuss/2020-June/000035.html Recap Email
19:45:43 I wrote a long email trying to cover the important bits of the PTG for us
19:45:49 Overall I think it went well.
19:46:01 From an operations side meetpad seemed to work with most of its scaling issues being client side
19:46:20 there were some annoying things like the etherpad focus going away when people talked sometimes and needing to reconnect because all sound went away
19:46:38 but overall it held up and the groups using it seemed happy (though groups with more than 20 had less success)
19:47:12 As participants we managed to get through our agenda. I think the total of 6 hours was about correct for us
19:47:27 #link https://etherpad.opendev.org/p/June2020-PTG-Feedback Provide your PTG event feedback
19:47:28 i was pleased with the way it worked out
19:47:42 the PTG organizers are soliciting feedback on the etherpad I just linked. Feel free to add your thoughts there
19:47:51 i have heard from folks they'd like to continue (trying) to use meetpad in the future; i think we can/should wind down pbx in favor of meetpad
19:48:00 corvus: ++
19:48:20 i concur
19:48:27 One of the things we talked about was getting off of python2 for our little tools and utilities as well as services.
19:48:30 we lose the dial-in trunk though
19:48:36 I've started to try and put together an audit of the todo list around that
19:48:37 #link https://etherpad.opendev.org/p/opendev-tools-still-running-python2 Python2 Audit
19:48:50 fungi: jitsi meet supports that and I think we can even use the same number
19:48:57 fungi: but that is new config we need to sort out
19:49:14 (I don't know how it maps phone calls to meeting rooms as an example)
19:49:50 One thing that was missing from the virtual event was unwind/decompression time
19:49:54 yeah, i figured it was something we could add
19:50:05 at the in person events there are game nights and dinner with people
19:50:16 I was wondering if anyone was interested in trying some virtual form of that
19:50:23 more likely to be game night than dinner :)
19:50:24 also beer you don't have to pour yourself ;)
19:50:41 i guess i can get over pouring my own
19:50:47 I've discovered hedgewars does remote multiplayer and maybe we can play a silly game of that with comms over meetpad
19:51:05 it's an open source clone of worms armageddon
19:51:35 I'm open to other ideas or being told that there isn't sufficient interest
19:52:36 Anything else to call out from the PTG?
19:53:25 #topic Trusty Updates
19:53:35 fungi: want to quickly recap the comodo cert situation?
19:53:49 sure
19:54:13 as of june 1, the old comodo/addtrust certificate authority ca cert expired
19:54:46 some of our sites used and still use certs which were validated through a chain including that as an intermediate
19:55:01 one in particular is openstackid.org
19:56:30 we discovered that on older python deployments, like that found on ubuntu trusty, the cert validation behavior of the requests module is to report a failure/exception if there is an expired cert in the chain bundle, even if another cert in the bundle is sufficient to validate the server's cert
19:56:57 this was causing people to be unable to log into refstack.openstack.org
19:57:44 it was ultimately "fixed" by updating the intermediate chain bundle on the openstackid.org server to no longer include the expired (and thus useless) addtrust cert
19:58:02 leaving only the newer sectigo cert
19:58:32 and that is something we should apply to our other sectigo certs?
19:58:40 this matches the current chain bundle recommended by sectigo (the ca of record for our non-le certs obtained from namecheap)
19:59:07 it likely depends on what's out there accessing those sites
19:59:51 we can safely remove the old addtrust ca from all our intermediate bundles, but a lot of the copies i found are stale from before we started moving stuff to le
20:00:08 ya so two layers of cleanup there I expect
20:00:15 so we could consider generally cleaning up old data in our hiera
20:00:18 ++
20:00:25 and that takes us to the end of our allotted time
20:00:27 thank you everyone
20:00:35 Feel free to continue conversation in #opendev
20:00:41 if someone knows a programmatic way to identify those, that would be great
20:00:46 but I'll end the meeting now to ensure people can eat breakfast/lunch or go to bed :)
20:00:53 #endmeeting