19:01:07 <clarkb> #startmeeting infra
19:01:08 <ianw> o/
19:01:09 <openstack> Meeting started Tue Aug  4 19:01:07 2020 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:10 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:12 <openstack> The meeting name has been set to 'infra'
19:01:16 <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2020-August/000068.html Our Agenda
19:01:27 <clarkb> #topic Announcements
19:01:38 <clarkb> Third and final Virtual OpenDev event next week: August 10-11
19:01:59 <fungi> well, probably not *final* just the last one planned for 2020 ;)
19:02:10 <clarkb> final of this round
19:02:15 <clarkb> The topic is Containers in Production
19:02:26 <clarkb> which may be interesting to this group as we do more and more of that
19:02:40 <fungi> great point
19:02:52 <clarkb> I also bring it up because they use the etherpad server for their discussions, similar to a ptg or forum session. We'll want to try to be slushy (avoid changes) with that service early next week
19:03:34 <clarkb> #topic Actions from last meeting
19:03:45 <diablo_rojo> o/
19:03:45 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-07-28-19.01.txt minutes from last meeting
19:04:01 <clarkb> ianw was going to look into incorporating non openstack python packages into our wheel caches
19:04:13 <clarkb> I'm not sure that has happened yet as there have been a number of other distractions recently
19:04:39 <ianw> umm started but haven't finished yet.  i got a bit distracted looking at trying to parallelize the jobs so we could farm it out across two/three/n nodes
19:04:57 <fungi> seems like the only real challenge there is in designing how we want to consume the lists of packages
19:05:23 <fungi> oh, but sharding the build will be good for scaling it, excellent point
19:06:05 <fungi> also i think we're not quite as clear as we could be on how to build for a variety of python interpreter versions?
19:06:31 <clarkb> fungi: yes we've only ever done distro version + distro python + cpu arch
19:06:37 <clarkb> which as a starting point is likely fine
19:07:48 <ianw> yeah, that was another distraction, looking at the various versions it builds
19:07:53 <fungi> building for non-default python on certain distros was not a thing for a while, don't recall if that got solved yet
19:08:53 <clarkb> I don't think so, but seems like that can be a followon without too much interference with pulling different lists of packages
19:08:54 <fungi> like bionic defaulting to python3.6 but people wanting to do builds for 3.7
19:09:17 <fungi> (which is packaged for bionic, just not the default)
19:09:51 <fungi> that one may specifically be less of an issue with focal available now, but will likely come up again
19:10:05 <ianw> yeah, similar with 3.8 ... which is used on bionic for some 3.8 tox jobs
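For context, building wheels for more than the distro default interpreter boils down to running the build once per interpreter. A minimal sketch of what that could look like as an Ansible task follows; the interpreter list, requirements file, and output path are illustrative assumptions, not what the wheel-cache jobs actually do:

```yaml
# Illustrative only: loop a pip wheel build over several interpreters on one
# node. The requirements file and wheel directory are placeholders.
- name: Build wheels with each requested interpreter
  command: >-
    {{ item }} -m pip wheel
    --wheel-dir /opt/wheels/{{ ansible_distribution_release }}
    -r /opt/requirements/upper-constraints.txt
  loop:
    - python3.6
    - python3.7
    - python3.8
```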
19:10:51 <clarkb> #topic Specs approval
19:10:58 <clarkb> lets keep moving as we have a few things on the agenda
19:11:08 <clarkb> #link https://review.opendev.org/#/c/731838/ Authentication broker service
19:11:34 <clarkb> that got a new patchset last week
19:11:42 <clarkb> I need to rereview it and other input would be appreciated as well
19:11:44 <fungi> it did
19:11:54 <clarkb> fungi: anything else you'd like to call out about it?
19:11:57 <fungi> please anyone feel free to take a look
19:13:12 <fungi> latest revision records keycloak as the consensus choice, and makes a more detailed note about knikolla's suggestion regarding simplesamlphp
19:13:33 <clarkb> great, I think that gives us a more concrete set of choices to evaluate
19:13:47 <clarkb> (and they seemed to be the strong consensus in earlier discussions)
19:14:27 <clarkb> #topic Priority Efforts
19:14:37 <clarkb> #topic Update Config Management
19:14:58 <clarkb> fungi: you manually updated gerritbot configs today (or was that yesterday?). Maybe we should prioritize getting that redeployed on eavesdrop
19:15:21 <clarkb> I believe we're building a container for it and now just need to deploy it with ansible and docker-compose?
19:15:34 <fungi> sure, for now i just did a wget of the file from gitea and copied it over the old one, then restarted the service
19:15:56 <fungi> but yeah, that sounds likely
19:16:13 <clarkb> ok, I may take a look at that later this week if I find time, as it seems like users are noticing more and more often
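A minimal sketch of the kind of docker-compose deployment being discussed for gerritbot on eavesdrop; the image name, volume path, and network mode here are assumptions for illustration, not the final configuration:

```yaml
# Hypothetical docker-compose.yaml for gerritbot; image name and paths are
# assumptions.
version: '2'
services:
  gerritbot:
    image: docker.io/opendevorg/gerritbot:latest
    network_mode: host
    restart: always
    volumes:
      - /etc/gerritbot:/etc/gerritbot:ro
```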
19:16:44 <clarkb> corvus' zuul and nodepool upgrade plan email reminds me of the other place we need to update our config management: nb03
19:16:58 <fungi> yep, we had some 50+ changed lines between all the project additions, retirements, renames, et cetera
19:17:13 <clarkb> I think we had assumed that we'd get containers built for arm64 and it would be a switch like the other 3 builders, but maybe we should add zk tls support to the ansible in the shorter term?
19:17:29 <clarkb> corvus: ianw ^ you've been working on various aspects of that and probably have a better idea of whether or not that is a good choice
19:18:26 <ianw> i guess the containers are so close, we should probably just hack in support for generic wheels quickly and switch to that
19:18:34 <corvus> did we come up with a plan for nodepool arm?
19:18:38 <corvus> containers
19:18:57 <corvus> i want to say we did discuss this.. last week... but i forgot the agreement
19:19:07 <clarkb> I think the last I remember was using an intermediate layer for nodepool
19:19:16 <clarkb> but I'm not sure if anyone is working on that yet
19:19:41 <clarkb> from the opendev side I want to make sure we don't forget that if nodepool and zuul start merging v4 changes that change expectations, nb03 may be left behind
19:19:44 <ianw> my understanding was that first we'd look at building the wheels so that the existing builds were just faster
19:19:59 <corvus> ah yep that was it
19:20:11 <corvus> build wheels, and also start on a new layer in parallel
19:20:17 <ianw> and if we still couldn't get there, look into intermediate layers
19:20:31 <corvus> k
19:20:33 <fungi> also help upstreams of various libs build arm wheels
19:20:47 <fungi> (hence the subsequent discussion about pyca/cryptography)
19:20:48 <clarkb> got it, in that case it seems we're making progress there and if we keep that up we'll probably be fine
19:21:17 <ianw> yep :)  upstream became/is somewhat of a distraction from getting the generic wheels built :)
19:21:30 <corvus> so the question is: should we pin zuul to 3.x?
19:21:31 <fungi> but a good distraction in my opinion
19:21:46 <corvus> since we could be really close to breaking ourselves
19:21:52 <clarkb> corvus: or short term add zk tls support to our ansible for nb03
19:22:23 <corvus> is nb03 all ansible, or is there puppet?
19:22:31 <clarkb> I believe it is all ansible now
19:22:53 <corvus> then it's probably not too hard to add zk tls; we should probably do that
19:23:02 <clarkb> oh hrm it still runs run-puppet on that host
19:23:13 <corvus> then i don't think we should touch it with a ten-foot pole
19:23:15 <clarkb> I guess it's ansible in that it runs puppet
19:23:40 <corvus> adding zk tls to the puppet is just a 6-month long rabbit hole
19:24:17 <clarkb> ok, in that case we should stay aware of when zuul will require tls and pin to a previous version if we don't have arm64 nb03 sorted out on containers yet
19:25:08 <ianw> i think we can definitely get it done quickly, like before next week
19:25:39 <clarkb> in that case we continue as is and push for arm64 images then imo. Thanks
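For reference, "adding zk tls support" to a builder amounts to roughly the following in its nodepool configuration; the hostname, port, and certificate paths are placeholders, not nb03's actual values:

```yaml
# Roughly what ZooKeeper TLS looks like in a nodepool builder config;
# hostname, port, and certificate paths are placeholders.
zookeeper-servers:
  - host: zk01.example.org
    port: 2281
zookeeper-tls:
  cert: /etc/nodepool/zk.pem
  key: /etc/nodepool/zk.key
  ca: /etc/nodepool/zk-ca.pem
```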
19:26:58 <clarkb> any other config management topics before we move on?
19:27:34 <clarkb> #topic OpenDev
19:28:10 <clarkb> we disabled gerrit's /p/ mirror serving in apache
19:28:18 <clarkb> haven't heard of any issues from that yet
19:28:27 <fungi> [and there was much rejoicing]
19:28:32 <clarkb> I figure I'll give it another week or so then disable replicating to the local mirror and clean it up on the server
19:28:43 <clarkb> (just in case we need a quick revert if something comes up)
19:28:59 <clarkb> the next set of tasks related to the branch management are in gerritlib and jeepyb
19:29:01 <clarkb> #link https://review.opendev.org/741277 Needed in Gerritlib first as well as a Gerritlib release with this change.
19:29:10 <clarkb> #link https://review.opendev.org/741279 Can land once Gerritlib release is made with above change.
19:29:30 <clarkb> if folks have a chance to review those it would be appreciated. I can approve and tag releases as well as monitor things as they land
19:30:15 <clarkb> The other Gerrit service related topic was status of review-test
19:30:27 <clarkb> does anyone know where it got to? I know when we did the project renames it error'd on that particular host
19:30:38 <clarkb> it being the project rename playbook since review-test is in our gerrit group
19:31:03 <clarkb> mordred: ^ if you are around you may have an update on that?
19:32:22 <clarkb> we can move on and return to this if mordred is able to update later
19:32:29 <clarkb> #topic General topics
19:32:39 <clarkb> #topic Bup and Borg Backups
19:32:59 <clarkb> ianw: I seem to recall you said that a bup recovery on hosts that had their indexes cleaned worked as expected
19:33:24 <ianw> yes, i checked on that, noticed that zuul wasn't backing up to the "new" bup server and fixed that
19:33:35 <ianw> i haven't brought up the new borg backup server and started with that, though
19:33:38 <clarkb> separately the borg change seems to have the reviews you need; land it, then start enrolling hosts as a next step
19:33:40 <clarkb> #link https://review.opendev.org/741366
19:33:53 <ianw> yep, thanks, just been leaving it till i start the server
19:34:09 <clarkb> no worries. Just making sure we're all caught up on the progress there
19:34:23 <clarkb> tl;dr is bup is working and borg has no major hurdles
19:34:27 <clarkb> (which is excellent news)
19:34:42 <fungi> also, resistance is futile
19:34:43 <clarkb> #topic github 3rd party ci
19:35:04 <clarkb> I think ianw has learned things about zuul and github and is making progress working with pyca?
19:35:12 <clarkb> #link https://review.opendev.org/#/q/topic:opendev-3pci
19:35:31 <ianw> yes, the only other comment there was about running tests on direct merge to master
19:35:46 <fungi> so something like our "post" pipeline?
19:35:56 <ianw> ... which is a thing that is done apparently ...
19:36:16 <fungi> or more like the "promote" pipeline maybe?
19:36:19 <ianw> fungi: well, yeah, except there's a chance the tests don't work in it :)
19:36:49 <fungi> okay, so like closing the barn door after the cows are out ;)
19:37:45 <clarkb> ianw: pabelanger or tobiash may have config snippets for making that work against github
19:37:48 <ianw> we can listen for merge events, so it can be done.  i was thinking of asking them to just start with pull-requests, and then once we have that stable we can make it listen for master merges if they want
19:38:18 <clarkb> ya starting with the most useful subset then expanding from there seems like a good idea
19:38:21 <ianw> yeah, it's hard to test, and i don't want it to go mad and make it look like zuul/me has no idea what's going on.  mostly the latter ;)
19:38:23 <clarkb> less noise if things need work to get reliable
19:38:27 <clarkb> ++
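A rough sketch of the kind of pipeline ianw describes for reacting to direct pushes to master via the github driver; the pipeline name is made up and reporters are intentionally omitted while things stabilize:

```yaml
# Illustrative pipeline definition for running jobs when commits land directly
# on master; name and scope are assumptions, reporters omitted.
- pipeline:
    name: post-merge
    manager: independent
    trigger:
      github:
        - event: push
          ref: ^refs/heads/master$
```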
19:39:34 <clarkb> ianw: from their side any feedback beyond the reporting and events that get jobs run?
19:40:11 <ianw> not so far, there was some discussion over the fact that it doesn't work with the python shipped on xenial
19:40:26 <fungi> "it" being their job workload?
19:40:26 <ianw> it being the pyca/cryptography tox testing
19:40:30 <fungi> got it
19:40:52 <ianw> that didn't seem to be something that bothered them; so xenial is running 2.7 tests but not 3.5
19:40:58 <fungi> right, so they're using travis with pyenv installed python or something like that?
19:41:32 <fungi> anything in particular they've found neat/been excited about so far?
19:41:35 <ianw> yes, well it wgets a python tarball from some travis address ...
19:41:51 <fungi> totally testing like production there ;)
19:42:43 <ianw> yeah ... i mean that's always the problem.  it's great that it works on 3.5, but not the 3.5 that someone might actually have i guess
19:43:17 <ianw> but, then again, people probably run out of their own env's they've built too.  at some point you have to decide what is in and out of the test matrix
19:43:58 <clarkb> ya, eventually you do what is reasonable, and that is, well, reasonable
19:44:01 <ianw> not much else to report, i'll give a gentle prod on the pull request and see what else comes back
19:44:27 <fungi> thanks for working on that!
19:45:28 <clarkb> #topic Open Discussion
19:45:43 <clarkb> A few things have popped up in the last day or so that didn't make it to the agenda that I thought I'd call out
19:46:04 <clarkb> the first is that the OpenEdge cloud is being turned back on and we need to build a new mirror there. There was an ipv6 routing issue yesterday that has since been fixed
19:46:19 <clarkb> I can work on deploying the mirror after lunch today. Are we deploying those on focal or bionic?
19:46:24 <clarkb> (I think it may still be bionic for afs?)
19:47:08 <clarkb> ianw: also I think you have an update to launch-node that adds sshfp records. I guess I should use that as part of reviewing the change when booting the new mirror
19:47:10 <fungi> ianw had mentioned something about rebuilding the linaro mirror on focal to rule out fixed kernel bugs for the random shutoff we're experiencing
19:47:34 <ianw> heh, yeah i just had a look at that
19:47:45 <ianw> there's two servers there at the moment?  did you start them?
19:47:52 <clarkb> no I haven't started anything yet
19:48:06 <clarkb> my plan was to start over to rule out any bootstrapping problems from the sad network
19:48:18 <fungi> oh, also probably worth highlighting, following discussion on the ml we removed the sshfp record for review.open{dev,stack}.org by splitting it to its own a/aaaa record instead of using a cname
19:48:31 <clarkb> but if we think that is unnecessary I'm happy to review changes to update dns and inventory instead
19:48:31 <clarkb> fungi: has that change merged?
19:48:51 <fungi> i think i saw it merge last night my time
19:49:11 <fungi> yeah, looks merged
19:49:11 <ianw> there were no servers yesterday, perhaps donnyd started them.  the two sticking points were ipv6 and that i also couldn't contact the volume endpoint
19:49:37 <clarkb> k, I'll check with donnyd and you and sort it out in an hour or two
19:49:39 <fungi> actually still need to update the openstack.org cname for it to point to review.opendev.org instead of review01, i'll do that now
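To make the DNS change concrete: as a CNAME, SSHFP lookups for review.opendev.org resolved through to review01's records; with its own A/AAAA records (and no SSHFP of its own) they no longer do. A hypothetical zone excerpt with made-up addresses and fingerprint:

```
; before: review followed the CNAME and so picked up review01's SSHFP records
; review    IN  CNAME  review01
; after: review has its own address records and no SSHFP
review01  IN  A      203.0.113.10
review01  IN  AAAA   2001:db8::10
review01  IN  SSHFP  4 2 0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef
review    IN  A      203.0.113.10
review    IN  AAAA   2001:db8::10
```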
19:50:07 <clarkb> other items of note: we removed the kata zuul tenant today
19:50:25 <clarkb> I kept an eye on it since it was the first time we've removed a tenant as far as I can remember, and it seemed to go smoothly
19:50:43 <clarkb> and pip 20.2 has broken version handling for packages with '.'s in their names
19:50:59 <clarkb> 20.2.1 has fixed that and I've triggered ubuntu focal, bionic, and xenial image builds in nodepool to pick that up
19:51:09 <clarkb> it was mostly openstack noticing as oslo packages have lots of '.'s in them
19:51:29 <clarkb> but if anyone else has noticed that problem with tox's pip version, new images should correct it
19:52:48 <clarkb> Anything else?
19:53:03 <fungi> also saw that crop up with dogpile.cache, but yeah within openstack context
19:53:26 <fungi> it mostly manifested as constraints not getting applied for anything with a . in the name
19:53:48 <clarkb> right it would update the package to the latest version despite other bounds
19:53:49 <fungi> so projects not using constraints files in the deps field for tox envs probably wouldn't have noticed regardless
19:53:56 <clarkb> which for e.g. zuul is probably a non-issue as it keeps up to date for most things
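For anyone outside the openstack context, the constraints-in-tox pattern referred to here looks roughly like the following in a project's tox.ini (the URL is the standard openstack upper-constraints location; the exact env var handling varies by project):

```ini
[testenv]
deps =
  -c{env:TOX_CONSTRAINTS_FILE:https://releases.openstack.org/constraints/upper/master}
  -r{toxinidir}/requirements.txt
  -r{toxinidir}/test-requirements.txt
```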
19:54:34 <clarkb> thanks everyone! we'll be here next week after the opendev event
19:54:38 <clarkb> #endmeeting