19:01:07 #startmeeting infra
19:01:08 o/
19:01:09 Meeting started Tue Aug 4 19:01:07 2020 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:10 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:12 The meeting name has been set to 'infra'
19:01:16 #link http://lists.opendev.org/pipermail/service-discuss/2020-August/000068.html Our Agenda
19:01:27 #topic Announcements
19:01:38 Third and final Virtual OpenDev event next week: August 10-11
19:01:59 well, probably not *final* just the last one planned for 2020 ;)
19:02:10 final of this round
19:02:15 The topic is Containers in Production
19:02:26 which may be interesting to this group as we do more and more of that
19:02:40 great point
19:02:52 I also bring it up because they use the etherpad server for their discussions similar to a ptg or forum session. We'll want to try and be slushy with that service early next week
19:03:34 #topic Actions from last meeting
19:03:45 o/
19:03:45 #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-07-28-19.01.txt minutes from last meeting
19:04:01 ianw was going to look into incorporating non-openstack python packages into our wheel caches
19:04:13 I'm not sure that has happened yet as there have been a number of other distractions recently
19:04:39 umm started but haven't finished yet. i got a bit distracted looking at trying to parallelize the jobs so we could farm it out across two/three/n nodes
19:04:57 seems like the only real challenge there is in designing how we want to consume the lists of packages
19:05:23 oh, but sharding the build will be good for scaling it, excellent point
19:06:05 also i think we're not quite as clear as we could be on how to build for a variety of python interpreter versions?
19:06:31 fungi: yes we've only ever done distro version + distro python + cpu arch
19:06:37 which as a starting point is likely fine
19:07:48 yeah, that was another distraction looking at the various versions it builds
19:07:53 building for non-default python on certain distros was not a thing for a while, don't recall if that got solved yet
19:08:53 I don't think so, but seems like that can be a follow-on without too much interference with pulling different lists of packages
19:08:54 like bionic defaulting to python3.6 but people wanting to do builds for 3.7
19:09:17 (which is packaged for bionic, just not the default)
19:09:51 that one may specifically be less of an issue with focal available now, but will likely come up again
19:10:05 yeah similar with 3.8 ... which is used on bionic for some 3.8 tox
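(A rough illustration of the per-interpreter wheel building and job sharding discussed above; the interpreter list, requirements file, and round-robin split are hypothetical placeholders, not the actual wheel-cache jobs.)

```python
#!/usr/bin/env python3
"""Sketch: build wheels per interpreter, sharding the package list across nodes."""
import subprocess
import sys

# Assumption: these interpreters are installed on the builder, e.g. the
# distro python plus non-default 3.7/3.8 packages on bionic.
INTERPRETERS = ["python3.6", "python3.7", "python3.8"]


def shard(packages, node_index, node_count):
    """Round-robin split of the package list across builder nodes."""
    return [p for i, p in enumerate(packages) if i % node_count == node_index]


def build_wheels(interpreter, packages, outdir):
    """Invoke 'pip wheel' under a specific interpreter version."""
    subprocess.run(
        [interpreter, "-m", "pip", "wheel", "--wheel-dir", outdir, *packages],
        check=True,
    )


if __name__ == "__main__":
    node_index, node_count = int(sys.argv[1]), int(sys.argv[2])
    with open("requirements.txt") as f:
        pkgs = [l.strip() for l in f if l.strip() and not l.startswith("#")]
    for python in INTERPRETERS:
        build_wheels(python, shard(pkgs, node_index, node_count), f"wheelhouse/{python}")
```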
19:10:51 #topic Specs approval
19:10:58 let's keep moving as we have a few things on the agenda
19:11:08 #link https://review.opendev.org/#/c/731838/ Authentication broker service
19:11:34 that got a new patchset last week
19:11:42 I need to rereview it and other input would be appreciated as well
19:11:44 it did
19:11:54 fungi: anything else you'd like to call out about it?
19:11:57 please anyone feel free to take a look
19:13:12 latest revision records keycloak as the consensus choice, and makes a more detailed note about knikolla's suggestion regarding simplesamlphp
19:13:33 great, I think that gives us a more concrete set of choices to evaluate
19:13:47 (and they seemed to be the strong consensus in earlier discussions)
19:14:27 #topic Priority Efforts
19:14:37 #topic Update Config Management
19:14:58 fungi: you manually updated gerritbot configs today (or was that yesterday). Maybe we should prioritize getting that redeployed on eavesdrop
19:15:21 I believe we're building a container for it and now just need to deploy it with ansible and docker-compose?
19:15:34 sure, for now i just did a wget of the file from gitea and copied it over the old one, then restarted the service
19:15:56 but yeah, that sounds likely
19:16:13 ok, I may take a look at that later this week if I find time. As it seems like users are noticing more and more often
19:16:44 corvus' zuul and nodepool upgrade plan email reminds me of the other place we need to update our config management: nb03
19:16:58 yep, we had some 50+ changed lines between all the project additions, retirements, renames, et cetera
19:17:13 I think we had assumed that we'd get containers built for arm64 and it would be switched like the other 3 builders but maybe we should add zk tls support to the ansible in the shorter term?
19:17:29 corvus: ianw ^ you've been working on various aspects of that and probably have a better idea for whether or not that is a good choice
19:18:26 i guess the containers are so close, we should probably just hack in support for generic wheels quickly and switch to that
19:18:34 did we come up with a plan for nodepool arm?
19:18:38 containers
19:18:57 i want to say we did discuss this.. last week... but i forgot the agreement
19:19:07 I think the last I remember was using an intermediate layer for nodepool
19:19:16 but I'm not sure if anyone is working on that yet
19:19:41 from the opendev side I want to make sure we don't forget that if nodepool and zuul start merging v4 changes that change expectations nb03 may be left behind
19:19:44 my understanding was that first we'd look at building the wheels so that the existing builds were just faster
19:19:59 ah yep that was it
19:20:11 build wheels, and also start on a new layer in parallel
19:20:17 and if we still couldn't get there, look into intermediate layers
19:20:31 k
19:20:33 also help upstreams of various libs build arm wheels
19:20:47 (hence the subsequent discussion about pyca/cryptography)
19:20:48 got it, in that case it seems we're making progress there and if we keep that up we'll probably be fine
19:21:17 yep :) upstream became/is somewhat of a distraction getting the generic wheels built :)
19:21:30 so the question is: should we pin zuul to 3.x?
19:21:31 but a good distraction in my opinion
19:21:46 since we could be really close to breaking ourselves
19:21:52 corvus: or short term add zk tls support to our ansible for nb03
19:22:23 is nb03 all ansible, or is there puppet?
19:22:31 I believe it is all ansible now
19:22:53 then it's probably not too hard to add zk tls; we should probably do that
19:23:02 oh hrm it still runs run-puppet on that host
19:23:13 then i don't think we should touch it with a ten-foot pole
19:23:15 I guess it's ansible in that it runs puppet
19:23:40 adding zk tls to the puppet is just a 6-month long rabbit hole
19:24:17 ok, in that case we should keep aware of when zuul will require tls and pin to a previous version if we don't have arm64 nb03 sorted out on containers yet
19:25:08 i think we can definitely get it done quickly, like before next week
19:25:39 in that case we continue as is and push for arm64 images then imo. Thanks
19:26:58 any other config management topics before we move on?
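(A minimal sketch of what the zk tls support discussed above has to produce on the client side; assumes a kazoo release with TLS support, roughly 2.7.0+, and uses placeholder hosts and certificate paths rather than the real ansible-managed values.)

```python
from kazoo.client import KazooClient

# Placeholder endpoints and cert paths; real values would come from the
# ansible-managed nodepool/zuul configuration.
ZK_HOSTS = "zk01.example.org:2281,zk02.example.org:2281,zk03.example.org:2281"

client = KazooClient(
    hosts=ZK_HOSTS,
    use_ssl=True,                      # TLS-only ZooKeeper listener (assumed port 2281)
    verify_certs=True,
    ca="/etc/zookeeper/ca.pem",        # CA that signed the ZK server certs
    certfile="/etc/zookeeper/client.pem",
    keyfile="/etc/zookeeper/client.key",
)
client.start()
print(client.server_version())         # simple connectivity check
client.stop()
```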
19:27:34 #topic OpenDev
19:28:10 we disabled gerrit's /p/ mirror serving in apache
19:28:18 haven't heard of any issues from that yet
19:28:27 [and there was much rejoicing]
19:28:32 I figure I'll give it another week or so then disable replicating to the local mirror and clean it up on the server
19:28:43 (just in case we need a quick revert if something comes up)
19:28:59 the next set of tasks related to the branch management are in gerritlib and jeepyb
19:29:01 #link https://review.opendev.org/741277 Needed in Gerritlib first as well as a Gerritlib release with this change.
19:29:10 #link https://review.opendev.org/741279 Can land once Gerritlib release is made with above change.
19:29:30 if folks have a chance to review those it would be appreciated. I can approve and tag releases as well as monitor things as they land
19:30:15 The other Gerrit service related topic was status of review-test
19:30:27 does anyone know where it got to? I know when we did the project renames it error'd on that particular host
19:30:38 it being the project rename playbook since review-test is in our gerrit group
19:31:03 mordred: ^ if you are around you may have an update on that?
19:32:22 we can move on and return to this if mordred is able to update later
19:32:29 #topic General topics
19:32:39 #topic Bup and Borg Backups
19:32:59 ianw: I seem to recall you said that a bup recovery on hosts that had their indexes cleaned worked as expected
19:33:24 yes, i checked on that, noticed that zuul wasn't backing up to the "new" bup server and fixed that
19:33:35 i haven't brought up the new borg backup server and started with that, though
19:33:38 separately the borg change seems to have the reviews you need to land it then to start enrolling hosts as a next step
19:33:40 #link https://review.opendev.org/741366
19:33:53 yep, thanks, just been leaving it till i start the server
19:34:09 no worries. Just making sure we're all caught up on the progress there
19:34:23 tl;dr is bup is working and borg has no major hurdles
19:34:27 (which is excellent news)
19:34:42 also, resistance is futile
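(For context on the borg side of the backup discussion above, a minimal sketch of the init/create/prune cycle that host enrollment would automate; the repository URL, paths, and retention settings are placeholders, and this simply wraps the borg CLI.)

```python
import subprocess
from datetime import datetime

# Placeholder repository and paths; real values would be per-host and
# driven by the ansible role, not hard-coded like this.
REPO = "ssh://borg@backup01.example.org/backups/myhost"
PATHS = ["/etc", "/var/lib/important-service"]


def borg(*args):
    subprocess.run(["borg", *args], check=True)


# One-time repository initialization (encryption mode is site policy).
borg("init", "--encryption=repokey", REPO)

# Nightly archive, named by timestamp.
archive = f"{REPO}::{datetime.utcnow():%Y-%m-%d-%H%M%S}"
borg("create", "--stats", archive, *PATHS)

# Keep a bounded retention window.
borg("prune", "--keep-daily", "7", "--keep-weekly", "4", REPO)
```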
19:34:43 #topic github 3rd party ci
19:35:04 I think ianw has learned things about zuul and github and is making progress working with pyca?
19:35:12 #link https://review.opendev.org/#/q/topic:opendev-3pci
19:35:31 yes the only other comment there was about running tests on direct merge to master
19:35:46 so something like our "post" pipeline?
19:35:56 ... which is a thing that is done apparently ...
19:36:16 or more like the "promote" pipeline maybe?
19:36:19 fungi: well, yeah, except there's a chance the tests don't work in it :)
19:36:49 okay, so like closing the barn door after the cows are out ;)
19:37:45 ianw: pabelanger or tobiash may have config snippets for making that work against github
19:37:48 we can listen for merge events, so it can be done. i was thinking of asking them to just start with pull-requests, and then once we have that stable we can make it listen for master merges if they want
19:38:18 ya starting with the most useful subset then expanding from there seems like a good idea
19:38:21 yeah, it's hard to test, and i don't want it to go mad and make it look like zuul/me has no idea what's going on. mostly the latter ;)
19:38:23 less noise if things need work to get reliable
19:38:27 ++
19:39:34 ianw: from their side any feedback beyond the reporting and events that get jobs run?
19:40:11 not so far, there was some discussion over the fact that it doesn't work with the python shipped on xenial
19:40:26 "it" being their job workload?
19:40:26 it being the pyca/cryptography tox testing
19:40:30 got it
19:40:52 that didn't seem to be something that bothered them; so xenial is running 2.7 tests but not 3.5
19:40:58 right, so they're using travis with pyenv installed python or something like that?
19:41:32 anything in particular they've found neat/been excited about so far?
19:41:35 yes, well it wgets a python tarball from some travis address ...
19:41:51 totally testing like production there ;)
19:42:43 yeah ... i mean that's always the problem. it's great that it works on 3.5, but not the 3.5 that someone might actually have i guess
19:43:17 but, then again, people probably run out of their own envs they've built too. at some point you have to decide what is in and out of the test matrix
19:43:58 ya eventually you do what is reasonable and that is well reasonable
19:44:01 not much else to report, i'll give a gentle prod on the pull request and see what else comes back
19:44:27 thanks for working on that!
19:45:28 #topic Open Discussion
19:45:43 A few things have popped up in the last day or so that didn't make it to the agenda that I thought I'd call out
19:46:04 the first is OpenEdge cloud is being turned back on and we need to build a new mirror there. There was an ipv6 routing issue yesterday that has since been fixed
19:46:19 I can work on deploying the mirror after lunch today, and are we deploying those on focal or bionic?
19:46:24 (I think it may still be bionic for afs?)
19:47:08 ianw: also I think you have an update to launch-node that adds sshfp records. I guess I should use that as part of reviewing the change when booting the new mirror
19:47:10 ianw had mentioned something about rebuilding the linaro mirror on focal to rule out fixed kernel bugs for the random shutoff we're experiencing
19:47:34 heh, yeah i just had a look at that
19:47:45 there's two servers there at the moment? did you start them?
19:47:52 no I haven't started anything yet
19:48:06 my plan was to start over to rule out any bootstrapping problems with sad network
19:48:18 oh, also probably worth highlighting, following discussion on the ml we removed the sshfp record for review.open{dev,stack}.org by splitting it to its own a/aaaa record instead of using a cname
19:48:20 but if we think that is unnecessary I'm happy to review changes to update dns and inventory instead
19:48:31 fungi: has that change merged?
19:48:51 i think i saw it merge last night my time
19:49:11 yeah, looks merged
19:49:11 there were no servers yesterday, perhaps donnyd started them. the two sticking points were ipv6 and i also couldn't contact the volume endpoint
19:49:37 k, I'll check with donnyd and you and sort it out in an hour or two
19:49:39 actually still need to update the openstack.org cname for it to point to review.opendev.org instead of review01, i'll do that now
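(A quick way to check which names publish SSHFP records directly versus via CNAME, as discussed above; a minimal sketch assuming dnspython 2.x is installed.)

```python
import dns.resolver

# The hostnames come from the discussion above; record types cover the
# A/AAAA split plus the CNAME and SSHFP records being juggled.
for name in ("review.opendev.org", "review.openstack.org"):
    for rdtype in ("A", "AAAA", "CNAME", "SSHFP"):
        try:
            answers = dns.resolver.resolve(name, rdtype)
        except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
            continue
        for rdata in answers:
            print(name, rdtype, rdata)
```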
19:50:07 other items of note: we removed the kata zuul tenant today
19:50:25 I kept an eye on it since it was the first time we've removed a tenant as far as I can remember and it seemed to go smoothly
19:50:43 and pip 20.2 has broken version handling for packages with '.'s in their names
19:50:59 20.2.1 has fixed that and I've triggered ubuntu focal, bionic, and xenial image builds in nodepool to pick that up
19:51:09 it was mostly openstack noticing as oslo packages have lots of '.'s in them
19:51:29 but if anyone else has noticed that problem with tox's pip version new images should correct it
19:52:48 Anything else?
19:53:03 also saw that crop up with dogpile.cache, but yeah within openstack context
19:53:26 it mostly manifested as constraints not getting applied for anything with a . in the name
19:53:48 right it would update the package to the latest version despite other bounds
19:53:49 so projects not using constraints files in the deps field for tox envs probably wouldn't have noticed regardless
19:53:56 which for eg zuul is probably a non issue as it keeps up to date for most things
19:54:34 thanks everyone! we'll be here next week after the opendev event
19:54:38 #endmeeting
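(Footnote on the pip 20.2 constraints issue discussed near the end of the meeting: constraint matching depends on PEP 503 name normalization, which maps dotted names like oslo.config and dogpile.cache to hyphenated canonical forms. A small illustration of that normalization, using the packaging library; the details of the actual pip regression aside.)

```python
import re

from packaging.utils import canonicalize_name


def pep503_normalize(name):
    # PEP 503: runs of '-', '_' and '.' collapse to a single '-', lowercased.
    return re.sub(r"[-_.]+", "-", name).lower()


for name in ("oslo.config", "dogpile.cache", "Foo_Bar.baz"):
    print(name, "->", pep503_normalize(name), canonicalize_name(name))
```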