Tuesday, 2023-04-18

<fungi> ahoy! [19:00]
<clarkb> Hello [19:00]
<clarkb> #startmeeting infra [19:01]
<opendevmeet> Meeting started Tue Apr 18 19:01:06 2023 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. [19:01]
<opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. [19:01]
<opendevmeet> The meeting name has been set to 'infra' [19:01]
<clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/246L5WVFVKR4XU6PIQRILQ6Z4PPG6NDZ/ Our Agenda [19:01]
<clarkb> #topic Announcements [19:01]
<clarkb> I didn't have any announcements [19:02]
<clarkb> #topic Topics [19:02]
<clarkb> We can jump right in [19:02]
<clarkb> #topic Migrating container images to quay.io [19:02]
<clarkb> Last week the promotion of container images through the intermediate registry ran successfully against zuul-client (this is the image that we've been using to test changes to jobs/playbooks/roles) [19:03]
<corvus> i plan on copying over zuul images and updating zuul repos this week [19:04]
<clarkb> I suspect that we are really close to taking action on this in OpenDev. In particular I expect the next set of tasks to roughly be: copying existing image data from docker hub to quay, updating our jobs and possibly rebuilding images to test things, and figuring out how to auto-provision public repos in quay for new images [19:04]
<clarkb> corvus: were you planning to do the copy of images for opendev as well? [19:04]
<corvus> was just planning on zuul [19:05]
<clarkb> corvus: and does the script handle deltas if we were to copy things today then rerun it again quickly before images moved? [19:05]
<corvus> i think so [19:05]
<corvus> with a quick change to omit :latest, could also run it again after the move, but probably no point to doing that [19:06]
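corvus's copy script is not shown in this log. As a minimal sketch of the delta handling discussed above, assuming each registry's tags can be listed as a tag-to-digest mapping, a rerun only needs to copy tags that are missing from the destination or whose digest differs, optionally skipping `latest` as suggested for a post-move run. The helper `tags_to_copy` and the digests below are hypothetical.

```python
from typing import Dict, List


def tags_to_copy(src: Dict[str, str], dst: Dict[str, str],
                 skip_latest: bool = False) -> List[str]:
    """Return tags whose digest is missing from, or different in, dst.

    src and dst map tag name -> manifest digest (a hypothetical input format).
    """
    pending = []
    for tag, digest in sorted(src.items()):
        if skip_latest and tag == "latest":
            # After the move, the destination's :latest is the one to trust.
            continue
        if dst.get(tag) != digest:
            pending.append(tag)
    return pending


# Placeholder data: a rerun only picks up tags that changed since the first copy.
docker_hub = {"latest": "sha256:bbb", "3.7": "sha256:aaa", "3.8": "sha256:ccc"}
quay = {"latest": "sha256:old", "3.7": "sha256:aaa"}
print(tags_to_copy(docker_hub, quay))                    # ['3.8', 'latest']
print(tags_to_copy(docker_hub, quay, skip_latest=True))  # ['3.8']
```

Comparing digests rather than tag names alone is what makes a quick second run cheap: tags already copied with the same content are skipped.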
<clarkb> ok cool I can look at running it for opendev this week as an initial step with a plan to sync up any deltas as we actually move images on the zuul job side [19:06]
<ianw> i know we figured out from old blog posts the bits to pre-make a public image on quay, but did we codify that in zuul-jobs yet? [19:07]
<corvus> i think i shared the latest version of the script (which handles org renames and multi-arch) earlier in #opendev; i don't have the link handy right now though [19:07]
<clarkb> ianw: I pushed a role for it https://review.opendev.org/c/zuul/zuul-jobs/+/877834 but I'm not sure where we would inject that in the current job setup [19:07]
<clarkb> ianw: maybe it would be a separate job that runs before the promote job [19:08]
<clarkb> to decouple things cleanly? [19:08]
<ianw> ahh right yes i remember that now :) [19:08]
<corvus> or could be a pre-run playbook for an inherited job [19:08]
<clarkb> corvus: oh ya that should work too due to the nesting order [19:08]
<ianw> it could be.  since this requires an API key, that was my thinking that you'd already have to have an api key to use the tag-based promotion path anyway [19:09]
<clarkb> ianw: ya though the creation api token needs very little in terms of permissions so doing it through the intermediate registry with a very limited key may still make sense [19:09]
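The role in change 877834 is the actual proposal; it is not reproduced here. As a rough sketch of what pre-creating a public repository involves, assuming Quay's `POST /api/v1/repository` endpoint accepts `namespace`, `repository`, `visibility` and `description` with a bearer token (check Quay's API reference before relying on this), the request can be built with the standard library. The namespace, image name and token are placeholders, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_create_repo_request(namespace: str, name: str, token: str,
                              base_url: str = "https://quay.io") -> urllib.request.Request:
    """Build (but do not send) a request that pre-creates a public repository.

    Assumes Quay's POST /api/v1/repository endpoint and a narrowly scoped
    token that is only allowed to create repositories.
    """
    body = {
        "namespace": namespace,
        "repository": name,
        "visibility": "public",
        "description": "",
    }
    return urllib.request.Request(
        url=f"{base_url}/api/v1/repository",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_create_repo_request("example-org", "example-image", "TOKEN")
print(req.get_method(), req.full_url)  # POST https://quay.io/api/v1/repository
# Sending it would be: urllib.request.urlopen(req)
```

Keeping this step separate from promotion matches the point above: the token it needs can be far more limited than the one used to push images.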
<clarkb> I can try to take a look at where to add this later this week too. I basically need to get through the etherpad (and possibly gitea?) stuff then I have a lot more time [19:10]
<corvus> incidentally, i haven't heard anything more from the quay people about the zuul org.  that's a little disappointing.  :/ [19:10]
<ianw> ++ i'm happy to help out too.  agree we can sort out details later [19:10]
<clarkb> sounds good. [19:10]
<corvus> i don't think we need the api creation role for the zuul projects; we don't make new container images very often [19:11]
<clarkb> Also as a side note it doesn't look like docker hub accidentally did the april 14 doom change (we didn't expect them to but images are updating on docker hub since) [19:11]
<corvus> i mean, once it shows up and is settled, i'm not opposed to having it there; just that it's not in the critical path for now. [19:12]
<clarkb> ya we could manually create them in opendev too, but I think we end up adding/removing images often enough that it would be annoying [19:12]
<ianw> no they definitely backtracked on that one [19:12]
<ianw> docker i mean [19:12]
<corvus> yep, every python version... etc. [19:13]
<clarkb> ianw: ya I know. They announced it too. I just wanted to make sure that reality panned out that way and it appears to have done so [19:13]
<corvus> trust but verify.  also, maybe don't actually trust. [19:13]
<clarkb> alright anything else on this? Hopefully we've got some exciting updates next week [19:14]
<corvus> heh i'm hoping for boring updates :) [19:14]
<ianw> nope -- irrespective of changes upstream, i think we've got something nicer giving us options to point at multiple places [19:15]
<clarkb> exciting because it's done (at least for zuul), not due to any fireworks :) [19:15]
<clarkb> #topic Bastion Host Updates [19:15]
<corvus> ++ [19:15]
<clarkb> The only thing I'm aware of here is the multiway encrypted backups stack still needing reviews [19:15]
<clarkb> Launch node appears to be managing reverse dns in rax now, the openstack command in the venv we install can talk to rax, and the dns helper output all appears to work now when launching nodes [19:16]
<ianw> yeah i used it to launch some dns nodes and it finally worked to give me all output :) [19:17]
<clarkb> any other bridge-related items? [19:17]
<ianw> i even thought, wow, this is close to being something that could be a zuul job ... :) [19:17]
<fungi> that would certainly be a cool future [19:18]
<clarkb> #topic Mailman 3 [19:19]
<clarkb> We'll keep moving along. I noticed some activity on the changes related to mailman 3 vhosting this morning but suspect it is still too early to have much to report? [19:19]
<fungi> i've got a fresh held lists node from today (104.130.219.137) which includes the changes in 867986 and 867987, and am starting to try out the recommended commands on it this week for django site creation and association in postorius [19:19]
<fungi> #link https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/message/I5MLJAESRXQARS3MZHF75YQCBY2OUL6G/ Re: Multi-domain oddities in Hyperkitty and Postorius [19:19]
<fungi> but yeah, no actual progress to report [19:20]
<clarkb> hopefully we'll have good news next week [19:20]
<clarkb> #topic Gerrit Updates [19:20]
<clarkb> There has been some movement on ACL synchronization to better align our project-config acl files with what is in Gerrit now, after the 3.7 upgrade migration [19:21]
<clarkb> #link https://review.opendev.org/c/openstack/project-config/+/880115 Update project-config acls to match post migration acls in Gerrit [19:21]
<clarkb> ianw: ^ the only reason I haven't +2'd that is you indicated we could land a couple of changes together to correct some of the concerns (post-review in particular) [19:21]
<clarkb> I haven't seen that followup change yet, but I think what you've got in that first one is fine as long as we do have a followup [19:22]
<clarkb> #link https://review.opendev.org/c/openstack/project-config/+/879906 Gerrit also normalized indentation of config files; we should consider this to be in sync [19:22]
<ianw> ohh yes sorry, i had that up in emacs and got distracted on dns yesterday.  will do [19:22]
<clarkb> This second change is going to modify every single acl though, and should be coordinated with a manual run of manage-projects I think [19:22]
<clarkb> (that first one is small enough it should be fine to land) [19:22]
<clarkb> But the idea here is that since Gerrit seems to insist on hard tabs in the config files, we should do the same thing to reduce deltas, making it easier to read diffs and understand changes when upgrades happen [19:23]
<clarkb> I think this is less urgent, but something we should eventually get to. [19:23]
<ianw> yeah, i mean ideally we don't have changes on upgrade that get us out of sync, but if we do, it's easier to look at without also reformatting the whole thing [19:24]
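To make the hard-tab convention concrete, here is a minimal sketch of a normalization check, assuming the rule is simply "section headers, comments and blank lines stay flush left; every option line gets exactly one leading hard tab" (git-config style). This is an illustration, not the project-config normalization script, and the exact rules there may differ. The helpers `normalize_indentation` and `normalization_diff` are hypothetical; emitting a unified diff mirrors the point that a failing check can tell the author exactly what to change.

```python
import difflib


def normalize_indentation(text: str) -> str:
    """Re-indent option lines with a single hard tab (git-config style)."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("[", "#", ";")):
            out.append(stripped)  # section header, comment or blank line
        else:
            out.append("\t" + stripped)  # option line: exactly one hard tab
    return "\n".join(out) + "\n"


def normalization_diff(text: str, name: str = "project.config") -> str:
    """Return a unified diff from text to its normalized form ('' if clean)."""
    fixed = normalize_indentation(text)
    return "".join(difflib.unified_diff(
        text.splitlines(keepends=True),
        fixed.splitlines(keepends=True),
        fromfile=name,
        tofile=name + " (normalized)",
    ))


sample = '[access "refs/heads/*"]\n    read = group Anonymous Users\n'
print(normalization_diff(sample))
```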
<fungi> #link https://review.opendev.org/879906 Indent Gerrit ACL options [19:25]
<fungi> oh that was linked already [19:25]
<clarkb> ya I think we should get ianw's first update in and then look at the tabs situation [19:25]
<clarkb> since tabs are less necessary and more painful to get applied cleanly [19:25]
<fungi> it's still wip because i'm unconvinced we should enforce it, given people already struggle with the current acl normalization checks [19:25]
<clarkb> and it gives us time to decide if we think it should be enforced. I'm really sad we don't think people can properly add tabs to files :( [19:26]
<clarkb> I also started on trying to address the leaked replication files on disk [19:26]
<clarkb> #link https://review.opendev.org/c/opendev/system-config/+/880672 Dealing with leaked replication tasks on disk [19:26]
<clarkb> This is tested now (though somewhat artificially) [19:27]
<corvus> would it maybe help to add comments to files telling people to use tabs? [19:27]
<clarkb> corvus: I think fungi's concern is that editors don't always treat the tab key as a hard tab [19:27]
<corvus> they're not hard to deal with, but it's sometimes unclear whether one should... [19:27]
<clarkb> which will confuse people when their changes fail [19:27]
<fungi> oh, there's an idea, insert a boilerplate comment block summarizing the normalization enforcement rules [19:27]
<clarkb> it certainly can't hurt to have that info available when making edits to those files [19:28]
<corvus> could also maybe look at editor config lines... [19:28]
<fungi> though i don't know if gerrit strips comments, it would at least be easier to ignore that block in the diffs [19:28]
<ianw> it does also give you a diff to correct your mistake [19:28]
<ianw> (it == the normalization path) [19:29]
<corvus> oh yeah i was assuming comments work... i don't know. [19:29]
<fungi> yeah, mainly hoping to avoid more review round-trips [19:30]
<clarkb> if any of the current files in project-config have comments in them we could check pretty easily [19:30]
<clarkb> looks like they don't :( [19:30]
<fungi> none do, no [19:31]
<clarkb> anyway I think that is the less urgent of the two sides of the acl normalization cleanups so we can tackle that once we're happy with the functional side [19:31]
<fungi> tangentially, it came up in the gerrit matrix/discord today that 3.8 has a new plugin system for code linking so our overrides to swap gitiles out with gitea may need revisiting during the next upgrade [19:31]
<clarkb> for the replication tasks cleanup workaround I'm fairly certain it is ok for those files to be deleted while gerrit is shut down because we weren't bind mounting the data dir previously. My change adds a script to the gerrit images that inspects the json content and deletes the files we know leak, leaving behind the tasks we want to replicate [19:32]
<clarkb> it is possible that it may also leave behind other classes of task that can't be replicated, but there are 11k-ish files currently and skimming them I've found these three types so far [19:32]
<clarkb> it will be easier to see any potential fourth type after we deal with these first [19:32]
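The cleanup script in change 880672 is not shown here. The sketch below only illustrates the described approach (inspect each task file's JSON and delete the ones matching known-leak patterns, keeping everything else), assuming tasks are stored as one JSON object per file with fields such as `project` and `ref`. The leak rule is a pure placeholder, because the log does not say what the three leaked task types look like.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def prune_tasks(task_dir: Path, is_leaked: Callable[[Dict], bool]) -> List[Path]:
    """Delete task files whose JSON content matches is_leaked.

    Intended to run only while Gerrit is stopped. Files that cannot be read
    or parsed are left in place so a human can inspect them.
    """
    removed = []
    for path in sorted(task_dir.rglob("*")):
        if not path.is_file():
            continue
        try:
            task = json.loads(path.read_text())
        except (OSError, ValueError):
            continue  # unreadable or not JSON: keep it
        if isinstance(task, dict) and is_leaked(task):
            path.unlink()
            removed.append(path)
    return removed


def hypothetical_leak_rule(task: Dict) -> bool:
    # Placeholder only; a real rule would match the known leaked task types.
    return task.get("ref", "").startswith("refs/hypothetical/")


# Demo against a throwaway directory.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "a.json").write_text(json.dumps({"project": "p", "ref": "refs/hypothetical/x"}))
(demo_dir / "b.json").write_text(json.dumps({"project": "p", "ref": "refs/heads/master"}))
(demo_dir / "c.json").write_text("not json")
removed = prune_tasks(demo_dir, hypothetical_leak_rule)
print([p.name for p in removed])  # ['a.json']
```

Anything the rule does not recognize stays on disk, which is what makes a possible fourth task type visible once the known ones are cleared.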
<clarkb> Feedback on whether or not this seems like a good approach would be good, in addition to reviews for whether or not it does what it says on the tin [19:33]
<fungi> what is upstream's position on the bug, or has anyone weighed in yet? [19:34]
<clarkb> I haven't filed a bug on this one yet but probably should [19:34]
<clarkb> now that I understand it a bit better. [19:34]
<clarkb> Another user said that changing group perms for the replication targets didn't change it for them though, which was my hunch (we replicate as if we are the anonymous user) [19:34]
<ianw> it tried it and failed, right?  it does seem like it should unlink the file after that [19:35]
<clarkb> ianw: yes, however it also does retries and I suspect that is the problem here [19:35]
<clarkb> the plugin probably can't tell the difference between a failure due to gerrit acls (or some other internal mechanism) saying no and a network failure or temporary rejection from the remote [19:36]
<ianw> ahh, yeah that sounds very likely [19:36]
<fungi> i would totally buy that explanation [19:36]
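The suspected failure mode can be illustrated with a toy retry loop. This is not the replication plugin's code; `PermanentError`, `TransientError` and `run_task` are hypothetical. The point is that a task is only safe to drop when a failure is recognized as permanent, so an ACL refusal that surfaces like any other (retryable) failure leaves the task "kept" on disk indefinitely.

```python
class PermanentError(Exception):
    """Hypothetical: refused for a reason a retry cannot fix (e.g. Gerrit ACLs)."""


class TransientError(Exception):
    """Hypothetical: network failure or temporary rejection by the remote."""


def run_task(push, max_retries: int = 3) -> str:
    """Return "done", "dropped" or "kept" for one replication task."""
    for _ in range(max_retries + 1):
        try:
            push()
            return "done"
        except PermanentError:
            return "dropped"  # a retry cannot help, so the task file can go
        except TransientError:
            continue  # try again
    return "kept"  # retries exhausted: the task stays on disk


def acl_refusal():
    raise PermanentError("not permitted")


def acl_refusal_reported_as_transient():
    # The suspected bug: an ACL "no" that looks like any other failure.
    raise TransientError("not permitted")


print(run_task(acl_refusal))                        # dropped
print(run_task(acl_refusal_reported_as_transient))  # kept
```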
<clarkb> if I find time I can dig into the plugin implementation itself [19:36]
<clarkb> in the meantime I suspect that what I've proposed is a safe way to limit the blast radius of files leaking to disk over time and to avoid errors in gerrit's error log at startup [19:37]
<ianw> but also fixable, luckily clarkb is our honorary on-staff Java developer [19:37]
<clarkb> heh [19:37]
<clarkb> I'm happy for reviewers to say "this should be fixed upstream, we don't want this hacky workaround" too [19:37]
<ianw> it's only a git revert away from removal though [19:37]
<clarkb> and the last gerrit-related item I had was a reminder that we should clean up the 3.6 image at some point, add a 3.8 image, and update our upgrade job [19:37]
<ianw> that might be a fun one to test quay creation [19:38]
<clarkb> I don't think we'll revert at this point. I'm happy for us to remove the 3.6 image now [19:38]
<clarkb> oh ya that could be a good one for adding 3.8 maybe [19:38]
<clarkb> I can follow up on this as I dig into the quay stuff more later this week to see if it makes sense in that process somewhere [19:39]
<clarkb> #topic Upgrading Servers [19:39]
<clarkb> static.opendev.org and the ~40-something other names it hosts are now on a jammy static02 host. static01 is removed and out of dns too [19:39]
<clarkb> I've got a new etherpad02 server up and running and tested a data migration from etherpad01 to etherpad02. It takes about 30 minutes to dump the db and 30 minutes to restore it, plus time to copy the data between hosts and double-check you aren't doing something silly. I notified service-announce that there would be a 90-minute outage of etherpad tomorrow at 22:00 UTC to do the [19:40]
<clarkb> actual move [19:40]
<clarkb> #link https://paste.opendev.org/show/brRuhPssVLSi4UnF5hcN/ The etherpad move plan. [19:40]
<clarkb> This is the plan I wrote down based on my local notes of testing the process. I put it in paste and not etherpad because etherpad will be shut down during this process to avoid data in the wrong location [19:41]
<fungi> that's some serious foresight [19:41]
<clarkb> Please review that if you have time before tomorrow at 22:00 UTC. It is relatively straightforward, but extra eyeballs making sure I haven't done something silly are appreciated [19:41]
<clarkb> ianw has also made progress on replacing nameservers [19:42]
<clarkb> #link https://etherpad.opendev.org/p/2023-opendev-dns [19:42]
<ianw> clarkb: plan lgtm.  you could also use "zcat dump.gz | ..." :) [19:42]
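The `zcat dump.gz | ...` suggestion streams the decompressed dump straight into the restore step instead of writing an uncompressed copy to disk first. A minimal Python equivalent is sketched below. The real restore command is elided in the chat and depends on the database, so the demo uses a stand-in process that just copies its stdin to a file; `restore_from_gzip` is a hypothetical helper.

```python
import gzip
import os
import shutil
import subprocess
import sys
import tempfile
from typing import Sequence


def restore_from_gzip(dump_path: str, restore_cmd: Sequence[str]) -> int:
    """Stream a gzip-compressed dump into restore_cmd's stdin.

    Equivalent to `zcat dump.gz | restore_cmd`: nothing uncompressed is
    written to disk. Returns the restore command's exit status.
    """
    with gzip.open(dump_path, "rb") as src:
        proc = subprocess.Popen(list(restore_cmd), stdin=subprocess.PIPE)
        try:
            shutil.copyfileobj(src, proc.stdin)
        finally:
            proc.stdin.close()
        return proc.wait()


# Demo: a stand-in "restore" that just writes its stdin to a file.
tmp = tempfile.mkdtemp()
dump = os.path.join(tmp, "dump.gz")
with gzip.open(dump, "wb") as f:
    f.write(b"CREATE TABLE example (id INT);\n")
restored = os.path.join(tmp, "restored.sql")
stand_in = [sys.executable, "-c",
            f"import sys; open({restored!r}, 'wb').write(sys.stdin.buffer.read())"]
rc = restore_from_gzip(dump, stand_in)
```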
<fungi> yeah, i'm good with what you have there [19:42]
<clarkb> This etherpad has links to changes. I've reviewed most of those changes and left questions on a couple of them. One of them also failed testing for a valid reason (I left a note indicating what I think the fix is) [19:43]
<clarkb> thanks! [19:43]
<ianw> yep thanks for going through that.  and good catch on updating the other zones too [19:43]
<clarkb> ianw: I also left notes on the etherpad about a few things I noticed were missing. We have 3 other zone files to update, and reverse dns for the vexxhost nameserver would ideally be set [19:43]
<ianw> i'm actually thinking maybe we template in the nameservers, but i'll think about that [19:43]
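Templating the nameservers would mean keeping one shared list and rendering the NS records into every zone, so replacing a nameserver no longer requires editing each zone file by hand. The log does not say how the zone files are generated, so this is only a sketch in plain Python; the zone and nameserver names and the default TTL are hypothetical.

```python
from typing import Iterable


def ns_records(zone: str, nameservers: Iterable[str], ttl: int = 3600) -> str:
    """Render one NS record per nameserver for `zone`, in zone-file syntax."""
    origin = zone.rstrip(".") + "."
    lines = []
    for ns in nameservers:
        target = ns.rstrip(".") + "."  # fully qualified, trailing dot
        lines.append(f"{origin}\t{ttl}\tIN\tNS\t{target}")
    return "\n".join(lines) + "\n"


# Hypothetical names: one shared list, rendered into every zone.
NAMESERVERS = ["ns1.example.org", "ns2.example.org"]
for zone in ["example.org", "example.net"]:
    print(ns_records(zone, NAMESERVERS), end="")
```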
<clarkb> ianw: I approved the change to update le testing to jammy as well; not sure if that merged or not [19:43]
<fungi> i'll probably be MIA between 18:30 and 20:30 or so, but will definitely be back by 22:00 for the maintenance [19:43]
<clarkb> Next week I'll probably start looking at jitsi meet or mirror nodes. Say something if you want to help and have a preference for what is left to do [19:44]
<clarkb> #topic AFS volume utilization [19:45]
<clarkb> we have crept up to 92.2% from 91.7% since last week [19:45]
<clarkb> if that growth rate holds we'll have about 15 weeks before there is a problem, which is about 3-ish months? [19:46]
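The estimate above is a straight-line projection: 0.5 percentage points of growth per week against 7.8 points of remaining headroom. A small worked version, assuming "a problem" means reaching 100% and using an average month of 30.44 days:

```python
def weeks_until_full(previous_pct: float, current_pct: float,
                     limit_pct: float = 100.0) -> float:
    """Linear projection: remaining headroom divided by weekly growth."""
    growth_per_week = current_pct - previous_pct
    if growth_per_week <= 0:
        return float("inf")  # not growing: no projected exhaustion
    return (limit_pct - current_pct) / growth_per_week


weeks = weeks_until_full(91.7, 92.2)
print(f"{weeks:.1f} weeks, roughly {weeks * 7 / 30.44:.1f} months")
# 15.6 weeks, roughly 3.6 months
```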
<ianw> i still haven't got back to wheel clearouts or f36 (sigh, now f38 is out anyway) [19:46]
<clarkb> ack, I think we have time. But we should probably look into those tasks sooner rather than later to see where we end up disk-wise and make our next decisions from there [19:47]
<clarkb> #topic Gitea 1.19 [19:48]
<clarkb> I've got a change up to upgrade to gitea 1.19.1 [19:48]
<clarkb> #link https://review.opendev.org/c/opendev/system-config/+/877541 Upgrade opendev.org to 1.19.1 [19:48]
<clarkb> however, ianw rightly pointed out that the changes to api interaction are weird and likely unnecessary [19:48]
<ianw> yeah i just noted a comment on that about removing the auth for more endpoints [19:49]
<clarkb> digging into this, ianw filed two bugs against gitea: previously anonymous apis now requiring auth, and improper headers on responses indicating auth is required [19:49]
<ianw> #link https://github.com/go-gitea/gitea/issues/24159#issuecomment-1513620323 [19:49]
<ianw> oh, and also a pull request now [19:49]
<clarkb> ianw: yes it looks like the change to add scoped tokens to their api system made far-reaching changes to the api that are problematic [19:49]
<clarkb> thank you for looking into that. I was just focused on making it work and didn't even consider they might have problems (in particular we were passing auth creds with the request, so it wasn't clear to me that we went from public to private) [19:50]
<clarkb> Anyway I think we can wait for 1.19.2 to fix this to avoid any potential breakage for users anonymously talking to our gitea api [19:51]
<clarkb> (we could scan the request logs for evidence of this if we were in a hurry) [19:51]
<clarkb> but 1.19.2 should be out soon enough I hope and we don't currently have an urgent need to upgrade. [19:51]
<clarkb> Reviews on that change would be helpful though, as I expect minimal deltas between now and 1.19.2 when available (just cleaning up our api requests to reflect that they don't need auth anymore) [19:52]
<ianw> yeah i agree that's unlikely -- nobody complained yet.  if we want to just go with it that's fine, but i think we should revert the user/pass/auth force when we can so we note any further regressions when it's fixed [19:52]
<clarkb> #topic Storyboard [19:53]
<clarkb> have we seen any more requests to mark things RO? [19:53]
<clarkb> Mostly curious if the moves by some projects have been showing up on our radar yet [19:54]
<clarkb> I'll take that as a no :) [19:55]
<clarkb> #topic Open Discussion [19:55]
<clarkb> Anything else? [19:55]
<fungi> i have a handful of project moves off storyboard i need to clean up behind [19:55]
<ianw> #link https://review.opendev.org/c/opendev/system-config/+/880570?usp=dashboard [19:55]
<fungi> see recent open/merged changes for gerrit/projects.yaml for a list of relevant exodus [19:56]
<ianw> and a follow-on is a quick one to update links on the main page [19:56]
<ianw> also i agree that usp= thing is very annoying [19:56]
<clarkb> I noticed yesterday, I think, that google uses usp in google docs [19:57]
<clarkb> so ya it's basically something they appear to have added to gerrit for their purposes but without any open source usage of it [19:57]
<clarkb> they did at least confirm that it is unused in gerrit 3.7. You would need to write a plugin or something like that to consume the info [19:59]
<corvus> are they interested in disabling it, or are they like "just ignore it"? [19:59]
<clarkb> corvus: they say it has no effect in open source gerrit and you should ignore it [19:59]
<clarkb> to tl;dr [19:59]
<ianw> except everyone knows how i copied the review link [20:00]
<ianw> which, i admit, i don't really care about, but, why do you need to know? [20:00]
<clarkb> right, it leaks info about the context where you copied links from (email, dashboards, related changes, etc.) [20:00]
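Since upstream says `usp` has no effect in open source Gerrit, stripping it from a link before sharing loses nothing and avoids leaking where the link was copied from. A minimal sketch using only the standard library; `strip_query_param` is a hypothetical helper that keeps every other query parameter intact.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_query_param(url: str, name: str = "usp") -> str:
    """Return url with every occurrence of the query parameter `name` removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k != name]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_query_param(
    "https://review.opendev.org/c/opendev/system-config/+/880570?usp=dashboard"))
# https://review.opendev.org/c/opendev/system-config/+/880570
```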
<clarkb> we are at time. Thank you everyone! We'll be back next week, same time and location [20:01]
<clarkb> feel free to pick up or continue conversation in #opendev or the mailing list if we want to continue to discuss any of these items [20:01]
<clarkb> #endmeeting [20:01]
<opendevmeet> Meeting ended Tue Apr 18 20:01:38 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) [20:01]
<opendevmeet> Minutes:        https://meetings.opendev.org/meetings/infra/2023/infra.2023-04-18-19.01.html [20:01]
<opendevmeet> Minutes (text): https://meetings.opendev.org/meetings/infra/2023/infra.2023-04-18-19.01.txt [20:01]
<opendevmeet> Log:            https://meetings.opendev.org/meetings/infra/2023/infra.2023-04-18-19.01.log.html [20:01]

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!