Tuesday, 2023-04-25

clarkbOur team meeting will begin momentarily19:00
ianwo/19:00
clarkb#startmeeting infra19:01
opendevmeetMeeting started Tue Apr 25 19:01:02 2023 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
opendevmeetThe meeting name has been set to 'infra'19:01
clarkb#link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/UT4QRPDF3HXQNR3PTIJZR5MIP5WPOTAW/ Our Agenda19:01
clarkb#topic Announcements19:01
clarkbI don't have any announcements this week19:01
clarkb#topic Topics19:01
clarkbwe can dive right in19:01
clarkb#topic Migrating to Quay.io19:01
clarkbA fair bit has happened on this since we last met19:02
clarkbIn the Zuul world changes are up to convert all of the zuul image publications to quay.io instead of docker hub19:02
clarkbI think they all pass CI now too so Zuul should move soon and we'll want to update all our docker-compose.yaml files to match when that happens19:02
clarkbPiggybacking off of the hard work done in Zuul I've started doing some of the work for opendev as well. In particular I have copied our images from docker hub to quay.io to pull in old content. Most of our images only tag latest so for most things this isn't super necessary, but other images (like Gerrit) do tag versions.19:03
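A registry-to-registry copy like the one described above can be sketched with skopeo; this is a hedged illustration (the image names and the choice of skopeo here are assumptions for the minutes, not necessarily the exact commands that were run):

```shell
# Hypothetical sketch of mirroring image repositories from Docker Hub to
# quay.io. sync_image prints the skopeo command rather than running it
# (a dry run), since real copies need credentials; drop the echo to execute.
sync_image() {
    # skopeo sync copies every tag of the source repository (important for
    # images like Gerrit that tag versions, not just latest); --all also
    # brings over every architecture of multi-arch images
    echo skopeo sync --all --src docker --dest docker \
        "docker.io/$1" "quay.io/${1%/*}"
}

# Example image names; the namespace is appended automatically by sync
for image in opendevorg/gerrit opendevorg/etherpad; do
    sync_image "$image"
done
```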
corvustechnically the zuul-operator doesn't pass, but i think the current issue is bitrot and there's only a small chance there's a container image issue lurking there.19:03
clarkbI did skip four images that I don't plan to transfer over. opendevorg/bazel, opendevorg/grafana, opendevorg/jitsi-meet-prosody, and opendevorg/jitsi-meet-web since none of these are images we use today19:04
clarkb#link https://review.opendev.org/c/opendev/system-config/+/881285 WIP change to convert an image to quay.io using new jobs.19:04
fungisounds good to me, thanks for working through that19:04
clarkbThis change is WIP and open for feedback. corvus already had some feedback which i need to address19:04
clarkbcorvus: did you see my message in #opendev earlier today? I'm tempted to update ensure-quay-repo to check for a registry-type flag on each image in container_images and if it is set to quay then the role would act on it otherwise it would skip. Then we can run the role in the opendev base job19:05
clarkbthe major downside to this is that opendev quay users would need to include the api_token. I suppose I can skip if that isn't defined either19:05
corvusclarkb: ah missed that message, but i like the idea.  also, checking for the api_token is probably a good idea; i don't think we want to *require* the role :)19:06
clarkbok, I can work on that and that will result in changes to 881285 (and a new change)19:06
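The per-image flag idea discussed above could look something like this vars sketch. All variable names here are hypothetical, invented to illustrate the proposal rather than taken from the actual role's interface:

```yaml
# Hypothetical Zuul job vars illustrating the proposal: ensure-quay-repo
# would only act on images flagged for quay, and only when an API token
# is defined, so the role could run safely in the opendev base job.
container_images:
  - repository: quay.io/opendevorg/example-image
    registry_type: quay          # role acts on this image
  - repository: docker.io/opendevorg/legacy-image
    # no registry_type: the role skips this image entirely
# if this token is undefined, the role would also skip quietly
quay_api_token: "{{ secret_quay_credentials.api_token }}"
```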
clarkbI did test ensure-quay-repo this morning and it worked for me locally outside of zuul19:06
clarkbThe other thing to keep in mind for opendev image updates is that we should resync the images from docker hub to quay.io just before we update the zuul jobs to build the image. That way we have the latest and greatest on both sides before the publication method changes19:07
clarkbAnd I don't think we should try to do all images at once. It will be too much churn and potential broken stuff to debug. We should go an image or two at a time and ensure that we're deploying the image from its new home afterwards19:08
ianw++19:08
clarkbI'll try to pick this back up again later today. I think it is realistic that at least some of our images are published to quay.io automatically before our next meeting19:08
clarkbany other questions/concerns/comments on this effort?19:09
clarkb#topic Bastion Host Updates19:10
clarkbI think the backups changes still need reviews? Is any other infra root willing/able to look at those? I think there is value here but also getting it wrong is potentially dangerous so review is valuable19:11
ianwyeah it only makes sense if we have enough people who want to hold bits of it19:11
fungii can, keep meaning to19:12
clarkbthanks.19:12
fungiwhat's the review topic again?19:12
clarkb#link https://review.opendev.org/q/topic:bridge-backups19:12
fungithanks19:12
clarkbAnything else bastion related?19:13
ianwnot from me19:13
clarkb#topic Mailman 319:14
clarkbfungi: any progress with the held node?19:14
fungisorry, nothing to report yet. i've started playing with a new held node, but spent the past week catching up on urgent things that piled up while i was on vacation19:14
clarkb#topic Gerrit Updates19:15
clarkbbefore more important Gerrit items I want to note we are still running the most recent proper release :) but gerrit 3.8 seems imminent19:15
clarkb#link https://review.opendev.org/c/openstack/project-config/+/879906 Gerrit also normalized indentation of config files we should consider this to be in sync19:15
clarkbfungi: ^ is a WIP change. We had talked about modifying the normalizer to insert a comment to explain the format to users. Is that the main missing piece before we unwip this change?19:16
fungiyeah, i guess so19:16
ianwi did propose something for that19:17
clarkb#link https://review.opendev.org/c/openstack/project-config/+/880898/ better document gerrit acl config format19:17
ianw#link https://review.opendev.org/c/openstack/project-config/+/88089819:17
fungialso can't recall if i mentioned it already, but i saw something recently in the gerrit upstream matrix channel about changes to how links are done with ui plugins in 3.8, so we'll need to make sure we test our gitea integration19:17
ianwthat i think helps generally.  if something is wrong, it should give you a clue what it is without having to read the source19:17
clarkblooks like the stack needs a rebase too?19:18
ianwoh something might have merged, i can rebase19:18
clarkbthanks i think we can remove the wip from the parent change at that point too19:18
clarkbthen it is just a matter of reviewing and applying the changes19:18
fungiit'll need a rebase for every single change that touches or adds an acl, so we probably need to just make a decision to merge, then rebase, then review while other changes are on hold19:18
clarkbfungi: yes, that should be testable from our test instances since the links will send you to valid opendev.org links19:19
clarkbfungi: something worth checking when we do upgrade testing though19:19
ianwfungi: you're also still wip on it19:20
ianwoh sorry, clarkb mentioned that19:20
clarkbThe other Gerrit thing to call out is cleaning up 3.6 images and updating our upgrade job19:20
fungiianw: yeah, i had proposed it as a straw man seeking folks to convince me the benefits outweighed the negatives19:20
clarkbI put this on the list of things that new contributors could do, but I think we should probably tackle this sooner than later. I can probably poke at this later this week19:20
fungii guess i'm convinced enough to un-wip since others see that balance tipping toward benefit19:21
ianw++ i might have some time too19:21
clarkbfungi: ya having cleaner diffs in the future is my motivation and that seems worthwhile to me19:21
clarkbThe last Gerrit item I have is the replication plugin leaking all over the review_site/data dir19:22
clarkb#link https://review.opendev.org/c/opendev/system-config/+/880672 Dealing with leaked replication tasks on disk19:22
clarkbThis is where I started digging into the problem and while that will help a bit it won't fully solve the issue19:22
clarkbI ended up reading a fair bit of the plugin code and that resulted in two upstream bugs19:23
clarkb#link https://bugs.chromium.org/p/gerrit/issues/detail?id=1686719:23
clarkbThis first issue is related to trying to replicate All-Projects and All-Users. I'm pretty sure that what happens here is the replication plugin checks if it should replicate those repos after creating the waiting/ file on disk, but then short circuits when it shouldn't replicate without sending the finished event which cleans things up19:24
clarkb#link https://bugs.chromium.org/p/gerrit/issues/detail?id=1686819:24
clarkbThis second one is related to the plugin filtering out refs that it shouldn't replicate on changes it would otherwise replicate. For example, you update a change via the web ui and this updates three refs. Two of the three should be replicated but not the third. I think this creates a new file on disk that gets properly managed and deleted, orphaning the original with all three refs19:25
clarkbI think this situation is also related to the massive number of errors we get on startup. In those cases I suspect that we are actually filtering all refs for replication and it confuses the state machine19:26
clarkbMy workaround change addresses the first issue and some of the second issue.19:26
clarkbI think it is still worth landing to reduce mental overhead but it likely won't completely solve the noise problem on startup.19:27
clarkbI am hoping someone upstream can chime in on whether or not my ideas for fixing these problems are sound but I haven't seen any responses yet19:27
clarkbAnd after doing all that I'm reasonably sure that deleting the files is completely fine and that the only harm here is the leaking on disk19:28
ianwi'm guessing there's not a ton of activity on that plugin ... you might be the expert now :)19:28
clarkbya if I can find some time to set up a test instance (I'll likely hold one launched by zuul) and recompile etc I will try to fix it19:28
clarkbjust lots of other stuff going on now that I'm really certain this is almost harmless :)19:29
fungiand also the people who do have an interest in it are mostly interested in privileged gerrit-to-gerrit replication19:29
clarkbfungi: yup that was something I realized in debugging this. Seems like very few people use the plugin with the restrictions we have19:29
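Given the conclusion above that the leaked task files are harmless on disk, a cleanup along the lines of the workaround change might look like this rough sketch. The directory path and age threshold are assumptions for illustration, not opendev's actual configuration:

```shell
# Hypothetical sketch: prune leaked Gerrit replication task files from the
# review_site/data directory. Per the discussion, leaked "waiting" entries
# only waste disk space, so deleting old ones is safe.
prune_leaked_tasks() {
    # $1: replication task directory, $2: minimum age in days
    find "$1" -type f -mtime "+$2" -delete
}

# Example (the path is a guess at a typical Gerrit layout, not opendev's):
# prune_leaked_tasks /var/gerrit/data/replication/ref-updates/waiting 7
```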
clarkbThat was all I had for Gerrit. Anything else?19:29
clarkb#topic Upgrading Servers19:31
clarkbEtherpad is now running on the new server. The move went well and it stuck to the expected timeline19:31
clarkbWe have since removed etherpad01 from our inventory and dns. I shutdown the server via the nova api too.19:31
clarkbAt this point we're probably comfortable deleting the server and its cinder volume?19:31
fungifine by me, yeah19:32
ianwme too, new server lgtm19:32
ianwand we double checked the backups19:32
fungii've been using it heavily, no problems19:32
clarkbgreat I'll try to get that done (probably tomorrow)19:33
clarkbNameserver replacement is also in progress19:33
clarkbLooks like the change to deploy the new servers (but not use them as resolvers yet) has landed19:33
clarkb#link https://etherpad.opendev.org/p/2023-opendev-dns19:33
ianwyep they should be active today, i'll keep on top of it19:33
clarkbianw: is there anything else you need at this point?19:34
ianwnope, i'll let people know when we can change registry records19:34
clarkbthanks!19:34
ianwi am assuming gating.dev and zuul* on gandi are corvus?19:35
clarkbI doubt I'll have time to take on any replacements this week, but I'd like to grab another one or two and push on them next week. Maybe jitsi meet or mirrors or something19:35
clarkbianw: I think so19:35
ianwactually zuul-ci.org is different to zuulci.org19:35
fungithe zuulci.org domain is likely the foundation19:35
ianwat least the registrar is19:36
corvusyep19:36
fungiyeah, zuulci.org is using csc global for its registrar, which is who the foundation goes through19:36
fungiany whois for our domains with "Registrar: CSC Corporate Domains, Inc." is pretty much guaranteed to be foundation staff coordinated19:37
clarkbI did give the foundation registrar managers a heads up this was going to happen soon too. They are ready and willing to help us when the time comes19:37
fungihaving a clear list of the domains we want them to update will be key, of course19:38
clarkbfungi: sounds like opendev.org and zuulci.org19:38
clarkbgating.dev and zuul-ci.org are the other two domains hosted by our nameservers and corvus has control of those19:38
clarkbAnything else related to nameservers or replacing servers generally?19:39
ianwnope19:40
clarkb#topic AFS volume quotas19:40
clarkbGood news! The usage seems to have stabilized?19:41
clarkbit's still quite high but I don't think it has budged since last week19:41
fungithe universe has changed in your favor, collect three tokens19:41
ianwi hope to get back to wheel cleanup19:41
ianwand fedora upgrades19:41
clarkbsounds good and ya this continues to not be an emergency but I'm trying to keep an eye on it19:42
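For keeping an eye on quota headroom as mentioned above, something like this could flag volumes nearing their limit by parsing `fs listquota` style output. The column layout follows OpenAFS's usual format, but treat the whole thing as a sketch rather than opendev's monitoring:

```shell
# Hypothetical sketch: print AFS volumes at or above a %Used threshold,
# reading "fs listquota" style output on stdin. Assumed column layout:
#   Volume Name  Quota  Used  %Used  Partition
volumes_over() {
    awk -v limit="$1" 'NR > 1 {
        pct = $4
        sub(/%/, "", pct)          # strip the trailing percent sign
        if (pct + 0 >= limit) print $1
    }'
}

# Example: fs listquota /afs/openstack.org/mirror/* | volumes_over 90
```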
clarkb#topic Gitea 1.1919:43
clarkbAt this point I'm happy waiting for 1.19.2. One of ianw's bugs has been fixed and will be included in 1.19.2. The other is marked as should be fixed for 1.19.2 but hasn't been fixed yet19:43
clarkbThere are no critical features we are missing. 1.19 was largely cut to add their actions implementation from what I can see19:43
clarkbIf that changes we may accelerate and deploy 1.19.1 before .219:44
clarkb#topic Quo vadis Storyboard19:44
fungias an aside, are we any closer to being concerned about needing to switch to forgejo, or has that teapot stopped tempesting?19:44
clarkb#undo19:44
opendevmeetRemoving item from minutes: #topic Quo vadis Storyboard19:44
clarkbfungi: no, as far as I can tell upstream development is still happening in the open via loosely organized maintainers that write down their release goals at the beginning of their dev cycles19:44
clarkbNo features have been removed. Licensing has remained the same. etc19:45
fungiany idea if forgejo has hard-forked or is continuing to pull new commits from gitea?19:45
clarkbI think their intention is to avoid a hard fork19:45
fungijust wondering if it's diverging substantially over time, or sitting stagnant19:45
clarkbfrom what I've seen (and I'm totally not an expert) it's basically the same code base with a different name19:46
fungithe less it becomes like gitea, the harder it will be to switch if we decide we want to19:46
fungihence my curiosity19:46
fungianyway, i can read up on it, just didn't know if anyone had seen any news on that front19:47
clarkblooking at their release notes they seem to be making intermediate releases between gitea releases but following gitea releases overall19:48
clarkband ya double checking on that is probably reasonable. I just haven't seen anything indicating major differences yet19:48
clarkb#topic Pruning vexxhost backup server19:48
clarkbWe've gotten a few emails about this recently. We are at 92% of capacity. I suspect adding some of these new servers hasn't helped either19:48
clarkbI can probably run the pruning script tomorrow19:49
ianwi think we're about monthly on pruning that, so it probably tracks19:49
clarkband then maybe we check relative disk usage compared to prior prunes19:49
clarkbah ok19:49
ianwi can run it and watch it19:50
clarkbianw: if you like. I'm happy to do it tomorrow too (I just have enough on my plate for today)19:50
ianwit would be interesting to track individual backup size19:51
clarkbAnd with that we've made it to the end of our agenda (actually this last topic was added by me at the last minute)19:51
fungiyeah, i can also do it in the background tomorrow probably. i guess whoever gets to it first just let everyone else know ;)19:51
clarkb++19:51
clarkb#topic Open Discussion19:51
clarkbAnything else?19:51
fricklerdid you skip sb intentionally? not much to say probably anyway19:52
frickleralso I'll be away for 3 weeks starting thursday19:52
clarkbfrickler: oh you know what, I undid it and then forgot it19:52
clarkbthat's on me19:52
ianw(backup prune running in a screen now)19:52
clarkbfrickler: fungi: re sb I did want to ask about recent discussion around that. Is that mostly around the logistics of reflecting changes in projects within our project-config repo?19:53
fungiianw: noted, thanks!19:53
fricklerclarkb: the latest was just about removing it for retired projects19:54
fungiclarkb: sb's database schema supports marking projects as "inactive" in order to not present them as autocompletion options and stuff, so i've been trying to remember to set that when projects retire, but should do an audit to see if any have been missed19:54
clarkbgot it. I guess it is a good idea to mark retired projects that way, as well as those that move elsewhere19:54
fungialso removing "use-storyboard" and "groups" entries from retired projects in our gerrit/projects.yaml file if there are any19:55
frickleranother short note: the wheel builds are still broken, I think ianw has some patches up for that?19:55
fricklercurrent issue seems to be afs on centos once again19:55
fungion a similar note, there was at least one dib change to fix openeuler image builds, do we need a new dib release for that before we turn them back on, or did it already happen?19:56
ianwi do have something up that will release the builds separately.  i'll try to take a look at the build failures19:56
fricklerfungi: another dib release will be needed iirc19:56
ianwwe would need a dib release and nodepool rebuild.  i can look at that19:56
ianwi don't think we have anything else in the queue for dib19:56
ianwif centos is failing that often means our images are out of sync and have different kernel versions to the headers available from the mirror19:57
frickleroh, there's also the pypi org thing19:59
fricklerdo we want to register at least an opendev org there?19:59
clarkbI'll admit I haven't read it yet.19:59
clarkbThe trusted publisher thing kinda made me grumpy19:59
fungirelated, i registered an "opendev.org" user on pypi ("opendev" was already taken) in case we eventually want to replace the openstackci account there20:00
clarkbwe are officially at time20:01
fungithanks clarkb!20:01
clarkbI'll end the meeting here but feel free to continue discussion in #opendev or on the mailing list20:01
clarkb#endmeeting20:01
opendevmeetMeeting ended Tue Apr 25 20:01:19 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)20:01
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2023/infra.2023-04-25-19.01.html20:01
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2023/infra.2023-04-25-19.01.txt20:01
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2023/infra.2023-04-25-19.01.log.html20:01
clarkbthank you everyone!20:01
fricklero/20:02

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!