19:01:06 <clarkb> #startmeeting infra
19:01:06 <opendevmeet> Meeting started Tue Apr 18 19:01:06 2023 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:06 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:06 <opendevmeet> The meeting name has been set to 'infra'
19:01:12 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/246L5WVFVKR4XU6PIQRILQ6Z4PPG6NDZ/ Our Agenda
19:01:19 <clarkb> #topic Announcements
19:02:30 <clarkb> I didn't have any announcements
19:02:35 <clarkb> #topic Topics
19:02:38 <clarkb> We can jump right in
19:02:47 <clarkb> #topic Migrating container images to quay.io
19:03:28 <clarkb> Last week the promotion of container images through the intermediate registry ran successfully against zuul-client (this is the image that we've been using to test changes to jobs/playbooks/roles)
19:04:23 <corvus> i plan on copying over zuul images and updating zuul repos this week
19:04:25 <clarkb> I suspect that we are really close to taking action on this in OpenDev. In particular I expect the next sets of tasks to roughly be: copying existing image data from docker hub to quay, updating our jobs and possibly rebuilding images to test things, figuring out how to auto provision public repos in quay for new images
19:04:53 <clarkb> corvus: were you planning to do the copy of images for opendev as well?
19:05:09 <corvus> was just planning on zuul
19:05:12 <clarkb> corvus: and does the script handle deltas if we were to copy things today then rerun it again quickly before images moved?
19:05:36 <corvus> i think so
19:06:10 <corvus> with a quick change to omit :latest, could also run it again after the move, but probably no point to doing that
19:06:43 <clarkb> ok cool I can look at running it for opendev this week as an initial step with a plan to sync up any deltas as we actually move images on the zuul job side
19:07:02 <ianw> i know we figured out from old blog posts the bits to pre-make a public image on quay, but did we codify that in zuul-jobs yet?
19:07:05 <corvus> i think i shared the latest version of the script (which handles org renames and multi-arch) earlier in #opendev; i don't have the link handy right now though
19:07:30 <clarkb> ianw: I pushed a role for it https://review.opendev.org/c/zuul/zuul-jobs/+/877834 but I'm not sure where we would inject that in the current job setup
19:08:05 <clarkb> ianw: maybe it would be a separate job that runs before the promote job
19:08:10 <clarkb> to decouple things cleanly?
19:08:16 <ianw> ahh right yes i remember that now :)
19:08:27 <corvus> or could be a pre-run playbook for an inherited job
19:08:40 <clarkb> corvus: oh ya that should work too due to the nesting order
19:09:04 <ianw> it could be. since this requires an API key, my thinking was that you'd already have to have an api key to use the tag-based promotion path anyway
19:09:55 <clarkb> ianw: ya though the creation api token needs very little in terms of permissions so doing it through the intermediate registry with a very limited key may still make sense
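As a rough illustration of the API-key step discussed here, the following Python sketch builds (but does not send) a request to pre-create a public repository through Quay's REST API. It assumes Quay's `POST /api/v1/repository` endpoint with `namespace`, `repository`, `visibility` and `description` fields and a bearer token; verify those against Quay's API documentation before relying on them. The function name and the example namespace and repository names are hypothetical, and this is not the zuul-jobs role from change 877834.

```python
import json
import urllib.request

# Assumption: Quay's repository-creation endpoint; check Quay's API docs.
QUAY_REPO_API = "https://quay.io/api/v1/repository"


def build_create_repo_request(namespace: str, name: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request that pre-creates a public Quay repo.

    Hypothetical helper: the field names follow Quay's documented RepoCreate
    body as we understand it.
    """
    body = {
        "namespace": namespace,
        "repository": name,
        "visibility": "public",  # the whole point: public before the first push
        "description": "",
    }
    return urllib.request.Request(
        QUAY_REPO_API,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # a narrowly scoped API token
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it would be `urllib.request.urlopen(req)`. Keeping request construction separate from sending makes it easy to test without network access or a real token, and would fit either a standalone job before the promote job or a pre-run playbook.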
19:10:18 <clarkb> I can try to take a look at where to add this later this week too. I basically need to get through the etherpad (and possibly gitea?) stuff then I have a lot more time
19:10:30 <corvus> incidentally, i haven't heard anything more from the quay people about the zuul org.  that's a little disappointing.  :/
19:10:42 <ianw> ++ i'm happy to help out too.  agree we can sort out details later
19:10:51 <clarkb> sounds good.
19:11:34 <corvus> i don't think we need the api creation role for the zuul projects; we don't make new container images very often
19:11:44 <clarkb> Also as a side note, it doesn't look like docker hub accidentally made the April 14 doom change (we didn't expect them to, but images have kept updating on docker hub since then)
19:12:06 <corvus> i mean, once it shows up and is settled, i'm not opposed to having it there; just that it's not in the critical path for now.
19:12:42 <clarkb> ya we could manually create them in opendev too, but I think we end up adding/removing images often enough that would be annoying
19:12:53 <ianw> no they definitely backtracked on that one
19:12:58 <ianw> docker i mean
19:13:04 <corvus> yep, every python version...etc..
19:13:12 <clarkb> ianw: ya I know. They announced it too. I just wanted to make sure that reality panned out that way and it appears to have done so
19:13:46 <corvus> trust but verify.  also, maybe don't actually trust.
19:14:26 <clarkb> alright anything else on this? Hopefully we've got some exciting updates next week
19:14:53 <corvus> heh i'm hoping for boring updates :)
19:15:20 <ianw> nope -- irrespective of changes upstream, i think we've got something nicer giving us options to point at multiple places
19:15:23 <clarkb> exciting because it's done (at least for zuul), not due to any fireworks :)
19:15:33 <clarkb> #topic Bastion Host Updates
19:15:34 <corvus> ++
19:15:57 <clarkb> The only thing I'm aware of here is the multiway encrypted backups stack needing reviews still
19:16:27 <clarkb> Launch node appears to be managing reverse dns in rax now, the openstack command in the venv we install can talk to rax, and the dns helper output all appears to work now when launching nodes
19:17:01 <ianw> yeah i used it to launch some dns nodes and it finally worked to give me all output :)
19:17:47 <clarkb> any other bridge related items?
19:17:55 <ianw> i even thought, wow, this is close to being something that could be a zuul job ... :)
19:18:34 <fungi> that would certainly be a cool future
19:19:05 <clarkb> #topic Mailman 3
19:19:31 <clarkb> We'll keep moving along. I noticed some activity on the changes related to mailman 3 vhosting this morning but suspect it is still too early to have much to report?
19:19:44 <fungi> i've got a fresh held lists node from today (104.130.219.137) which includes the changes in 867986 and 867987, and am starting to try out the recommended commands on it this week for django site creation and association in postorius
19:19:49 <fungi> #link https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/message/I5MLJAESRXQARS3MZHF75YQCBY2OUL6G/ Re: Multi-domain oddities in Hyperkitty and Postorius
19:20:20 <fungi> but yeah, no actual progress to report
19:20:40 <clarkb> hopefully we'll have good news next week
19:20:48 <clarkb> #topic Gerrit Updates
19:21:18 <clarkb> There has been some movement on ACL synchronization to better align our project-config acl files with what is in Gerrit now post 3.7 upgrade migration
19:21:25 <clarkb> #link https://review.opendev.org/c/openstack/project-config/+/880115 Update project-config acls to match post migration acls in Gerrit
19:21:49 <clarkb> ianw: ^ the only reason I haven't +2'd that is you indicated we could land a couple of changes together to correct some of the concerns (post-review in particular)
19:22:08 <clarkb> I haven't seen that followup change yet, but I think what you've got in that first one is fine as long as we do have a followup
19:22:20 <clarkb> #link https://review.opendev.org/c/openstack/project-config/+/879906 Gerrit also normalized indentation of config files we should consider this to be in sync
19:22:28 <ianw> ohh yes sorry, i had that up in emacs and got distracted on dns yesterday.  will do
19:22:46 <clarkb> This second change is going to modify every single acl though and should be coordinated with a manual run of manage-projects I think
19:22:51 <clarkb> (that first one is small enough it should be fine to land)
19:23:22 <clarkb> But the idea here is that since Gerrit seems to insist on hard tabs in the config files, we should do the same thing to reduce deltas, making it easier to read diffs and understand changes when upgrades happen
19:23:45 <clarkb> I think this is less urgent, but something we should eventually get to.
19:24:23 <ianw> yeah, i mean ideally we don't have changes on upgrade that get us out of sync, but if we do, it's easier to look at without also reformatting the whole thing
19:25:09 <fungi> #link https://review.opendev.org/879906 Indent Gerrit ACL options
19:25:21 <fungi> oh that was linked already
19:25:41 <clarkb> ya I think we should get ianw's first update in and then look at the tabs situation
19:25:57 <clarkb> since tabs are less necessary and more painful to get applied cleanly
19:25:59 <fungi> it's still wip because i'm unconvinced we should enforce it, given people already struggle with the current acl normalization checks
19:26:30 <clarkb> and it gives us time to decide if we think it should be enforced. I'm really sad we don't think people can properly add tabs to files :(
19:26:43 <clarkb> I also started on trying to address the leaked replication files on disk
19:26:52 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/880672 Dealing with leaked replication tasks on disk
19:27:00 <clarkb> This is tested now (though somewhat artificially)
19:27:14 <corvus> would it maybe help to add comments to files telling people to use tabs?
19:27:34 <clarkb> corvus: I think fungi's concern is that editors don't always treat the tab button as a hard tab
19:27:39 <corvus> they're not hard to deal with, but it's sometimes unclear whether one should...
19:27:41 <clarkb> which will lead to confusing people when their changes fail
19:27:53 <fungi> oh, there's an idea, insert a boilerplate comment block summarizing the normalization enforcement rules
19:28:22 <clarkb> it certainly can't hurt to have that info available when making edits to those files
19:28:27 <corvus> could also maybe look at editor config lines...
19:28:35 <fungi> though i don't know if gerrit strips comments, it would at least be easier to ignore that block in the diffs
19:28:58 <ianw> it does also give you a diff to correct your mistake
19:29:36 <ianw> (it == the normalization path)
19:29:52 <corvus> oh yeah i was assuming comments work... i don't know.
19:30:04 <fungi> yeah, mainly hoping to avoid more review round-trips
19:30:10 <clarkb> if any of the current files have comments in them in project-config we could check pretty easily
19:30:45 <clarkb> looks like they don't :(
19:31:05 <fungi> none do, no
19:31:08 <clarkb> anyway I think that is the less urgent of the two sides of the acl normalization cleanups so we can tackle that once we're happy with the functional side
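A minimal sketch of the indentation normalization being discussed, assuming (as clarkb describes) that Gerrit writes each option line of an ACL file with a single leading hard tab. Both function names are hypothetical, and this is not the existing project-config normalization check; it only shows how a checker could rewrite space-indented lines and hand the author a diff instead of a bare failure.

```python
import difflib


def normalize_acl_indentation(text: str) -> str:
    """Rewrite any indented, non-blank line to use exactly one leading hard tab."""
    out = []
    for line in text.splitlines():
        stripped = line.lstrip(" \t")
        if stripped and stripped != line:
            out.append("\t" + stripped)  # replace spaces/mixed indent with one tab
        else:
            out.append(line)  # section headers and blank lines are left untouched
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")


def indentation_diff(text: str, filename: str) -> str:
    """Return a unified diff from the file to its normalized form ("" if clean)."""
    fixed = normalize_acl_indentation(text)
    return "".join(
        difflib.unified_diff(
            text.splitlines(keepends=True),
            fixed.splitlines(keepends=True),
            fromfile=filename,
            tofile=filename + " (normalized)",
        )
    )
```

Emitting a diff keeps the review round-trip short: the author can apply it directly, which matters most when an editor silently turns the tab key into spaces.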
19:31:20 <fungi> tangentially, came up in the gerrit matrix/discord today that 3.8 has a new plugin system for code linking so our overrides to swap gitiles out with gitea may need revisiting during the next upgrade
19:32:19 <clarkb> for the replication tasks cleanup workaround I'm fairly certain it is ok for those files to be deleted while gerrit is shut down because we weren't bind mounting the data dir previously. My change adds a script to the gerrit images that inspects the json content and deletes the tasks we know leak, leaving behind the tasks we want to replicate
19:32:40 <clarkb> it is possible that it may also leave behind other classes of task that can't be replicated, but there are 11k-ish files currently and skimming them I've found these three types so far
19:32:54 <clarkb> it will be easier to see any potential fourth type after we deal with these first
19:33:31 <clarkb> Feedback on whether or not this seems like a good approach would be good in addition to reviews for whether or not it does what it says on the tin
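A minimal sketch of the filter-and-delete approach described above, assuming each persisted replication task is a standalone JSON object stored as a regular file in one directory. The function name and the `ref` field in the example predicate are hypothetical; the real leaked-task types are whatever the change under review identifies. Files that don't parse are left in place for a human to inspect, and like the real script it is only meant to run while Gerrit is stopped.

```python
import json
from pathlib import Path
from typing import Callable, List


def prune_task_files(task_dir: Path, is_leaked: Callable[[dict], bool]) -> List[Path]:
    """Delete task files whose parsed JSON satisfies is_leaked; return deleted paths.

    Hypothetical helper: run only while Gerrit is shut down so the replication
    plugin is not reading or writing these files concurrently.
    """
    removed = []
    for path in sorted(p for p in task_dir.iterdir() if p.is_file()):
        try:
            task = json.loads(path.read_text())
        except (OSError, ValueError):
            continue  # unreadable or non-JSON: leave it for manual inspection
        if isinstance(task, dict) and is_leaked(task):
            path.unlink()
            removed.append(path)
    return removed
```

Passing the classification in as a predicate keeps the "which tasks are known to leak" decision in one small, reviewable function, so a newly discovered fourth type is a one-line addition.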
19:34:03 <fungi> what is upstream's position on the bug, or has anyone weighed in yet?
19:34:22 <clarkb> I haven't filed a bug on this one yet but probably should
19:34:30 <clarkb> now that I understand it a bit better.
19:34:59 <clarkb> Another user said that changing group perms for the replication targets didn't change it for them though which was my hunch (we replicate as if we are the anonymous user)
19:35:11 <ianw> it tried and failed, right? it does seem like it should unlink the file after that
19:35:35 <clarkb> ianw: yes however it also does retries and I suspect that is the problem here
19:36:01 <clarkb> the plugin probably can't tell the difference between failure due to gerrit acls (or some other internal mechanism) saying no and a network failure or temporary rejection from the remote
19:36:18 <ianw> ahh, yeah that sounds very likely
19:36:30 <fungi> i would totally buy that explanation
19:36:37 <clarkb> if I find time I can dig into the plugin implementation itself
19:37:00 <clarkb> in the meantime I suspect that what I've proposed is a safe way to manage the blast radius of leaking files to disk over time and avoiding errors in gerrit's error log at startup
19:37:01 <ianw> but also fixable; luckily clarkb is our honorary on-staff Java developer
19:37:05 <clarkb> heh
19:37:19 <clarkb> I'm happy for reviewers to say "this should be fixed upstream we don't want this hacky workaround" too
19:37:44 <ianw> it's only a git revert away from removal though
19:37:49 <clarkb> and the last gerrit related item I had was a reminder that we should clean up the 3.6 image at some point, add a 3.8 image, and update our upgrade job
19:38:12 <ianw> that might be a fun one to test quay creation
19:38:19 <clarkb> I don't think we'll revert at this point. I'm happy for us to remove the 3.6 image now
19:38:26 <clarkb> oh ya that could be a good one for adding 3.8 maybe
19:39:05 <clarkb> I can followup on this as I dig into the quay stuff more later this week to see if it makes sense in that process somewhere
19:39:10 <clarkb> #topic Upgrading Servers
19:39:32 <clarkb> static.opendev.org and the ~40-something other names it hosts are now on a jammy static02 host. static01 is removed and out of dns too
19:40:30 <clarkb> I've got a new etherpad02 server up and running and tested a data migration from etherpad01 to etherpad02. It takes about 30 minutes to dump the db and 30 minutes to restore it, plus time to copy the data between hosts and double check you aren't doing something silly. I notified service-announce that there would be a 90 minute outage of etherpad tomorrow at 22:00 UTC to do the actual move
19:40:45 <clarkb> #link https://paste.opendev.org/show/brRuhPssVLSi4UnF5hcN/ The etherpad move plan.
19:41:08 <clarkb> This is the plan I wrote down based on my local notes from testing the process. I put it in paste and not etherpad because etherpad will be shut down during this process to avoid data in the wrong location
19:41:36 <fungi> that's some serious foresight
19:41:37 <clarkb> Please review that if you have time before tomorrow at 22:00 UTC. It is relatively straightforward, but extra eyeballs making sure I haven't done something silly are appreciated
19:42:02 <clarkb> ianw: has also made progress on replacing nameservers
19:42:14 <clarkb> #link https://etherpad.opendev.org/p/2023-opendev-dns
19:42:31 <ianw> clarkb: plan lgtm.  you could also use "zcat dump.gz | ..." :)
19:42:58 <fungi> yeah, i'm good with what you have there
19:43:01 <clarkb> This etherpad has links to changes. I've reviewed most of those changes and left questions on a couple of them. One of which also failed testing for a valid reason (I left a note indicating what i think is the fix)
19:43:03 <clarkb> thanks!
19:43:16 <ianw> yep thanks for going through that.  and good catch on updating the other zones too
19:43:32 <clarkb> ianw: I also left notes on the etherpad about a few things I noticed were missing. We have 3 other zone files to update and reverse dns for the vexxhost nameserver would ideally be set
19:43:45 <ianw> i'm actually thinking maybe we template in the nameservers, but i'll think about that
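To make the templating idea concrete, here is a small Python sketch that renders the NS records for a zone from a single list of nameserver names, in BIND zone-file syntax. The function name, TTL default and example hostnames are hypothetical; this is not how OpenDev's zones are currently generated, only an illustration of keeping the nameserver list in one place so every zone picks it up.

```python
from typing import List


def render_ns_records(zone: str, nameservers: List[str], ttl: int = 3600) -> str:
    """Render one NS record per nameserver for the given zone (BIND syntax)."""
    origin = zone if zone.endswith(".") else zone + "."
    lines = []
    for ns in nameservers:
        fqdn = ns if ns.endswith(".") else ns + "."  # names must be fully qualified
        lines.append(f"{origin}\t{ttl}\tIN\tNS\t{fqdn}")
    return "\n".join(lines) + "\n"
```

With one shared list, replacing a nameserver means editing that list once rather than hand-updating each zone file, which is exactly the kind of miss that left three zone files needing follow-up.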
19:43:50 <clarkb> ianw: I approved the change to update le testing to jammy as well not sure if that merged or not
19:43:55 <fungi> i'll probably be mia between 18:30 and 20:30 or so, but will definitely be back by 22:00 for the maintenance
19:44:44 <clarkb> Next week I'll probably start looking at jitsi meet or mirror nodes. Say something if you want to help and have a preference for what is left to do
19:45:27 <clarkb> #topic AFS volume utilization
19:45:48 <clarkb> we have crept up to 92.2% from 91.7% since last week
19:46:41 <clarkb> if that growth rate holds we'll have about 15 weeks before there is a problem, which is about three and a half months?
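The estimate above is a straight-line projection: 7.8 points of headroom (100 − 92.2) at 0.5 points per week (92.2 − 91.7) is about 15.6 weeks. A tiny Python sketch of that arithmetic follows; the function name is hypothetical, and a single week-over-week delta is a noisy basis, so treat the result as a rough guide rather than a forecast.

```python
def weeks_until_full(previous_pct: float, current_pct: float, limit_pct: float = 100.0) -> float:
    """Linear projection of weeks of headroom left from two weekly readings."""
    growth_per_week = current_pct - previous_pct
    if growth_per_week <= 0:
        return float("inf")  # flat or shrinking usage: no projected exhaustion
    return (limit_pct - current_pct) / growth_per_week
```

Passing a lower `limit_pct` (for example, the point where you would want to act rather than where volumes actually fill) gives a more conservative deadline.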
19:46:47 <ianw> i still haven't got back to wheel clearouts or f36 (sigh, now f38 is out anyway)
19:47:13 <clarkb> ack I think we have time. But we should probably look into those tasks sooner than later to see where we end up disk wise and make our next decisions from there
19:48:14 <clarkb> #topic Gitea 1.19
19:48:30 <clarkb> I've got a change up to upgrade to gitea 1.19.1
19:48:35 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/877541 Upgrade opendev.org to 1.19.1
19:48:48 <clarkb> however, ianw rightly pointed out that the changes to api interaction are weird and likely unnecessary
19:49:18 <ianw> yeah i just noted a comment on that about removing the auth for more endpoints
19:49:28 <clarkb> digging into this, ianw filed two bugs against gitea: one about previously anonymous apis now requiring auth, and one about improper headers on responses indicating auth is required
19:49:38 <ianw> #link https://github.com/go-gitea/gitea/issues/24159#issuecomment-1513620323
19:49:45 <ianw> oh, and also a pull request now
19:49:50 <clarkb> ianw: yes it looks like the change to add scoped tokens to their api system made far reaching changes to the api that are problematic
19:50:57 <clarkb> thank you for looking into that. I was just focused on making it work and didn't even consider they might have problems (in particular we were passing auth creds to the request so it wasn't clear to me that we went from public to private)
19:51:21 <clarkb> Anyway I think we can wait for 1.19.2 to fix this to avoid any potential breakage for users anonymously talking to our gitea api
19:51:34 <clarkb> (we could scan the request logs for evidence of this if we were in a hurry)
19:51:50 <clarkb> but 1.19.2 should be out soon enough I hope and we don't currently have an urgent need to upgrade.
19:52:36 <clarkb> Reviews on that change would be helpful though, as I expect minimal deltas between now and when 1.19.2 is available (just cleaning up our api requests to reflect that they don't need auth anymore)
19:52:37 <ianw> yeah i agree that's unlikely -- nobody complained yet.  if we want to just go with it that's fine, but i think we should revert the user/pass/auth force when we can so we note any further regressions when it's fixed
19:53:25 <clarkb> #topic Storyboard
19:53:46 <clarkb> have we seen any more requests to mark things RO?
19:54:04 <clarkb> Mostly curious if the moves by some projects have been showing up on our radar yet
19:55:16 <clarkb> I'll take that as a no :)
19:55:19 <clarkb> #topic Open Discussion
19:55:21 <clarkb> Anything else?
19:55:33 <fungi> i have a handful of project moves off storyboard i need to clean up behind
19:55:48 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/880570?usp=dashboard
19:56:02 <fungi> see recent open/merged changes for gerrit/projects.yaml for a list of relevant exodus
19:56:05 <ianw> and a follow-on is a quick one to update links on the main page
19:56:17 <ianw> also i agree that usp= thing is very annoying
19:57:29 <clarkb> I noticed google uses usp in google docs too, I think it was yesterday
19:57:43 <clarkb> so ya it's basically something they appear to have added to gerrit for their purposes, but without any open source usage of it
19:59:03 <clarkb> they did at least confirm that it is unused in gerrit 3.7. You would need to write a plugin or something like that to consume the info
19:59:30 <corvus> are they interested in disabling it, or are they like "just ignore it"?
19:59:47 <clarkb> corvus: they say it has no effect in open source gerrit and you should ignore it
19:59:49 <clarkb> to tl;dr
20:00:11 <ianw> except everyone knows how i copied the review link
20:00:29 <ianw> which, i admit, i don't really care about, but, why do you need to know
20:00:57 <clarkb> right it betrays info of the context where you copied links (email, dashboards, related changes, etc)
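Until there is an upstream option, the parameter is easy to drop before pasting a link. A small Python sketch using only `urllib.parse` from the standard library is below; the function name is hypothetical, and it removes only `usp`, keeping any other query parameters.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_usp(url: str) -> str:
    """Drop Gerrit's 'usp' query parameter from a URL, keeping any others."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "usp"
    ]
    # urlunsplit omits the '?' entirely when the remaining query is empty.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

For example, `strip_usp("https://review.opendev.org/c/opendev/system-config/+/880570?usp=dashboard")` returns the bare change URL with no trailing `?`.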
20:01:16 <clarkb> we are at time. Thank you everyone! we'll be back next week same time and location
20:01:36 <clarkb> feel free to pick up or continue conversation in #opendev or the mailing list if we want to continue to discuss any of these items
20:01:38 <clarkb> #endmeeting