19:01:25 <clarkb> #startmeeting infra
19:01:25 <opendevmeet> Meeting started Tue Jul 11 19:01:25 2023 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:25 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:25 <opendevmeet> The meeting name has been set to 'infra'
19:01:49 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/FV2S3YE62K34SWSZRQNISEERZU3IR5A7/ Our Agenda
19:02:15 <clarkb> #topic Announcements
19:02:25 <clarkb> I did make it to UTC+11
19:03:14 <clarkb> I'm finding that the best time to sit at a computer is something like 01:00/02:00 UTC and later simply due to weather. But we'll see as I get more settled; this is only day 5 or something there
19:04:36 <clarkb> #topic Topics
19:04:43 <clarkb> #topic Bastion Host Updates
19:04:56 <clarkb> #link https://review.opendev.org/q/topic:bridge-backups
19:05:12 <clarkb> Looks like this set of changes from ianw could still use some infra root review
19:05:42 <clarkb> if we can get that review done we can plan the sharing of the individual key portions
19:06:17 <clarkb> #topic Mailman 3
19:06:32 <clarkb> fungi: any updates on the vhosting? then we can talk about the http 429 error emails
19:06:38 <fungi> no new progress, though a couple of things to bring up yeah
19:06:59 <clarkb> go for it on the new things
19:07:03 <fungi> the first you mentioned, i'm looking to see if there's a way to create fallback error page templates for django
19:07:19 <fungi> but perhaps someone more familiar with django knows?
19:07:35 <fungi> i know we can create specific error page templates for each status
19:08:12 <fungi> so we could create a 429 error page template, but what i'm unsure about is if there's a way to have an error page template that applies to any error response which doesn't have its own separate template
19:08:49 <fungi> i think i recall tonyb mentioning some familiarity with django so i might pick his brain later if so
19:09:06 <clarkb> I'm unsure myself
19:09:14 <fungi> assuming my web searches and documentation digging turn up little of value
19:09:23 <clarkb> a default would be nice if possible but I suspect adding a 429 file would be a big improvement alone
19:09:53 <tonyb> I don't think it was me
19:10:01 <fungi> oh too bad
19:10:50 <tonyb> sorry I'll do better :P
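(Side note on the fallback-template question above: stock Django only ships per-status handlers and templates for 400, 403, 404 and 500, so a catch-all page for other statuses such as 429 would probably need a small middleware. The sketch below is hypothetical and untested; the FallbackErrorPageMiddleware class and the "error.html" template name are assumptions, not anything currently deployed.)

    from django.template.loader import render_to_string

    class FallbackErrorPageMiddleware:
        """Render a shared template for any error response that has no body."""

        def __init__(self, get_response):
            self.get_response = get_response

        def __call__(self, request):
            response = self.get_response(request)
            # Leave streaming responses and anything that already rendered a
            # page alone; only fill in bodyless error responses.
            if (response.status_code >= 400
                    and not getattr(response, "streaming", False)
                    and not response.content):
                response.content = render_to_string(
                    "error.html",
                    {"status_code": response.status_code},
                    request=request,
                )
            return response

(It would then need to be listed in MIDDLEWARE in the mailman-web Django settings to take effect.)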
19:10:55 <fungi> the other item is we've had a couple of (mild) spam incidents on the rust-vmm ml, similar to what hit zuul-discuss a few months back. for now it's just been one address i initially unsubscribed and then they resubscribed and sent more, so after the second time i switched the default moderation policy for their address to discard instead of unsubscribing them
19:11:58 <fungi> but still might consider switching the default moderation policy for all users on that list to moderate and then individually updating them to accept after they send good messages
19:12:17 <fungi> that is if the problem continues
19:12:38 <clarkb> I'm good with that but ideally if we can find a moderator in that community to do the filtering
19:12:50 <clarkb> I'm not sure we should be filtering for random lists like that.
19:13:44 <fungi> well, yes i stepped in as a moderator since i was already subscribed and the only current community moderator had gone on sabbatical, but we found another volunteer to take over now
19:14:03 <clarkb> great
19:14:06 <fungi> my concern is it seems like the killer feature of mm3, the ability for people to post via http, increases the spam risk as well
19:14:50 <fungi> which is going to mean a potentially increased amount of work for list moderators
19:15:13 <clarkb> Though two? incidents in ~6 months isn't too bad
19:16:08 <fungi> yeah, basically
19:16:21 <fungi> but these are also very low-volume and fairly low-profile lists
19:16:39 <fungi> so i don't know how that may translate to some of the more established lists once they get migrated
19:16:48 <fungi> something to keep an eye out for
19:16:55 <clarkb> there is probably only one way to find out unfortunately
19:17:00 <fungi> agreed
19:17:05 <fungi> anyway, that's all i had on this topic
19:17:15 <clarkb> #topic Gerrit Updates
19:17:47 <clarkb> We are still building a Gerrit 3.8 RC image. This is only used for testing the 3.7 to 3.8 upgrade as well as general gerrit tests on the 3.8 version but it would be good to fix that
19:17:59 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/885317?usp=dashboard Build final 3.8.0 release images
19:18:15 <clarkb> Additionally the Gerrit replication tasks stuff is still ongoing
19:18:42 <clarkb> I think my recommendation at this point is that we revert the bind mount for the task data so that when we periodically update our gerrit image and replace the gerrit container those files get automatically cleaned up
19:18:55 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/884779?usp=dashboard Stop bind mounting replication task file location
19:19:32 <clarkb> If we can get reviews on one or both of those then we can coordinate the moves on the server itself to ensure we're using the latest image and also cleaning up the leaked files etc
19:19:46 <fungi> what's the impact to container restarts?
19:19:59 <fungi> if we down/up the gerrit container, do we lose queued replication events?
19:20:27 <clarkb> fungi: yes. This was the case until very recently when I swapped out the giteas though so we were living with that for a while already
19:21:08 <clarkb> The tradeoff here is that having many leaked files on disk is potentially problematic when that number gets large enough. Also these bad replication tasks produce errors on gerrit startup that flood the logs
19:21:26 <clarkb> we'd be trading better replication resiliency for better service resiliency I think
19:22:53 <clarkb> fungi: that said having another set of eyes look over the situation may produce additional ideas. The alternative I've got is the gerrit container startup script updates that try to clean up the leaked files for us. I don't think the script will clear all the files currently but having a smaller set to look at will help identify the additional ones
19:23:39 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/880672 Clear leaked replication tasks at gerrit startup using a script
19:24:24 <clarkb> I'm happy to continue down that path as well, it's just the most risky and effort-heavy option
19:24:26 <fungi> thanks, makes sense
19:24:37 <clarkb> risky because we are automating file deletions
19:25:03 <clarkb> for a todo here maybe fungi can take a look this week and next week we can pick an option and proceed from there?
19:25:47 <clarkb> The other Gerrit item is disallowing implicit merges across branches in our All-Projects ACL
19:26:04 <clarkb> I can't think of any reason to not do this and I don't recall any objections to this in prior meetings where this was discussed
19:26:15 <fungi> yeah, i should be able to
19:26:28 <clarkb> receive.rejectImplicitMerges is the config option to reject those when set to true
19:26:32 <fungi> did i propose a change for that? i can't even remember now
19:26:48 <clarkb> fungi: I don't think so since it has to be done directly in All-Projects then simply recorded in our docs?
19:26:55 <clarkb> there may be a change to do the recording bit /me looks
19:27:19 <clarkb> https://review.opendev.org/c/opendev/system-config/+/885318
19:27:39 <clarkb> so ya if you have time to push that All-Projects update I think you can +A the change to record it in our docs
19:28:39 <fungi> oh, cool
19:28:45 <fungi> i guess someone did propose that
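(For reference, the All-Projects edit discussed above would be a one-line addition to project.config on the refs/meta/config branch, roughly as below; everything other than the rejectImplicitMerges line is illustrative context.)

    [receive]
        rejectImplicitMerges = true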
19:29:23 <clarkb> that was all I had for gerrit. Anything else before we move on?
19:31:04 <clarkb> #topic Server Upgrades
19:31:17 <clarkb> I'm not aware of any changes here since we last met
19:31:44 <clarkb> tonyb helped push the insecure ci registry upgrade through. I may still need to delete the old server; I can't recall if I did that right now
19:32:09 <corvus> ze04-ze06 upgraded to jammy today
19:32:23 <clarkb> I think tonyb is looking at other things now in order to diversify the bootstrapping process as an OpenDev contributor so I'll try to look at some of the remaining stragglers myself as I have time
19:32:26 <clarkb> corvus: excellent
19:32:44 <tonyb> the cleanup is removing it (01) from the inventory and then infra-root deleting the 01 vm?
19:32:59 <clarkb> tonyb: correct
19:33:12 <tonyb> Okay
19:34:11 <clarkb> #topic Fedora Cleanup
19:34:38 <clarkb> tonyb: I've lost track of where we were in the mirror configuration stuff. Are there changes you need reviews on or input on direction?
19:35:21 <tonyb> I need to update the mirror setup with the new mirrorinfo variable
19:35:59 <clarkb> tonyb: is that something where some dedicated time to work through it would be helpful? if so we can probably sort that out with newly overlapping timezones
19:36:44 <tonyb> Yeah, that's a good idea.  I understand the concept of what needs to happen but I'm in danger of overthinking it
19:37:04 <clarkb> ok lets sync up when it isn't first thing in the morning for both of us and take it from there
19:37:24 <tonyb> great
19:37:28 <clarkb> #topic Quo vadis Storyboard
19:38:16 <fungi> i think i switched a neutron deliverable repo over to inactive and updated its description to point to lp last week? openstack/networking-odl
19:38:18 <clarkb> One thing I noticed the other day is that some projects like starlingx are still creating subprojects in storyboard. We haven't told them to stop and I'm not sure we should, but they were confused that it seems to take some time to do that creation. I think we are only creating new storyboard projects once a day
19:38:56 <clarkb> At this point I'm not sure there is much benefit in having project-config updates trigger the storyboard job more quickly
19:39:16 <clarkb> But it was a thing people noticed so I'm mentioning it here
19:39:35 <fungi> there was also some discussion about sb in the #openstack-sdks channel, in particular a user was surprised to discover that an unescaped <script> tag in a story description resulted in truncating the text once displayed. fairly easy to avoid, but turned into some conversation about why getting a fix for such things implemented would be tough with our current deployment
19:39:56 <tonyb> Can we run the create, I assume via cron, twice (#gasp) a day?
19:40:35 <clarkb> tonyb: I believe the job that does it is infra-prod-remote-puppet-else
19:40:43 <clarkb> which at this point should mostly just be storyboard?
19:40:53 <clarkb> we have removed the vast majority of any remaining puppet
19:40:56 <tonyb> Ahhh okay
19:41:27 <clarkb> so ya we basically run that job more often or when necessary to decrease the wait time
19:42:51 <clarkb> anything else storyboard related?
19:43:18 <fungi> i got nothin'
19:43:27 <clarkb> #topic Gitea Upgrades
19:43:39 <clarkb> Gitea 1.19.4 exists and fungi has pushed an update to upgrade us
19:43:51 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/887734?usp=dashboard Upgrade Gitea to 1.19.4
19:44:06 <fungi> seems to be fairly minor for our purposes, fixes mostly to stuff we disable anyway
19:44:10 <clarkb> These bugfix point upgrades tend to be pretty safe and straightforward though this one has a small template update
19:44:15 <fungi> so also probably not urgent
19:44:58 <clarkb> The other gitea upgrade to think about is the 1.20 update. They only have RC releases so far and no changelog so also not urgent
19:45:08 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/886993?usp=dashboard Begin process to upgrade gitea to 1.20
19:45:18 <clarkb> The 1.20 upgrade should happen after we upgrade to 1.19.latest
19:46:08 <clarkb> fungi: I think I may be able to be around with overlap in your timezone (early tomorrow morning for me, later afternoon for you) if we want to land that 1.19.4 change and monitor
19:46:19 <clarkb> I'll ping you tomorrow if I manage that and we can take it from there?
19:46:44 <fungi> yeah, sure that works
19:47:01 <fungi> i should be at the keyboard by 1200z
19:47:04 <clarkb> I don't expect trouble but good to have people around if necessary
19:47:12 <clarkb> 1900 is probably about as early as I can manage :)
19:47:27 <clarkb> though maybe my evening overlaps with 1200 I need to math that out
19:47:39 <fungi> oh, you said early in your timezone not the other way around
19:47:39 <clarkb> #topic Etherpad Upgrade
19:47:45 <clarkb> fungi: ya
19:48:07 <clarkb> After a long release drought Etherpad made a 1.9.1 release
19:48:34 <fungi> i assume the commit we've been running on is included in that release
19:48:36 <clarkb> At first the tagged sha didn't actually build and I was forced to use a commit that fixed the build issues after the release. But I think they updated/replaced the tag and now it seems to work
19:48:46 <fungi> aha, cool
19:48:48 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/887006?usp=dashboard Etherpad 1.9.1
19:49:01 <clarkb> I updated that change to use the tag again and it did not fail
19:49:30 <clarkb> Now there is an issue where numbered lists don't properly increment the list number values so every entry is 1. basically making it a weird bulleted list
19:49:39 <tonyb> Going back to gitea (sorry), is there any merit to landing the bullseye -> bookworm update in the same window?
19:50:06 <clarkb> tonyb: no I think we should decouple those if we can. Basically swap gitea to the new debian with a fixed gitea version
19:50:13 <tonyb> okay
19:50:33 <clarkb> tonyb: the gitea upgrades are very low impact (we roll through them one by one and shouldn't lose any replication events and the haproxy should handle http requests too)
19:50:47 <tonyb> Okay
19:50:52 <clarkb> if the gitea upgrades were a bit more impactful then we should consider combining but they are super transparent to users
19:51:14 <clarkb> Going back to etherpad I think that this is also not very urgent given the known list bug
19:51:39 <clarkb> I also haven't held a node yet to interact with it which is probably a good idea to double check that we don't have any plugin interactions that will create problems for us
19:52:25 <clarkb> Reviews welcome and I'll try to get a held node up soon
19:53:09 <clarkb> #topic Open Discussion
19:53:12 <clarkb> Anything else?
19:53:35 <tonyb> Just to note that we started the bullseye to bookworm updates
19:53:42 <fungi> yay!
19:54:26 <tonyb> the first set of services I tried failed due to what I think is a missing requires to get the speculative images from the buildset registry
19:54:46 <clarkb> ya I looked at that briefly and wasn't able to understand what was missing. It looks like we have what we need
19:55:04 <tonyb> Hopefully with some push and TZ overlap we can make solid progress
19:55:09 <clarkb> I feel like this comes up semi-regularly though and I need to be better about writing down what the issue was / improving my understanding
19:55:36 <clarkb> corvus: any chance you might have a few minutes to look at that?
19:55:52 <clarkb> https://zuul.opendev.org/t/openstack/build/ab79e98cdd0242649cbc50593e87dae1/log/job-output.txt#723 is the failure
19:56:20 <corvus> yeah i'll take a look and followup in #opendev
19:56:42 <clarkb> thank you
19:57:00 <tonyb> Thanks
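(Background on the suspected missing requires above: as I understand it, Zuul's provides/requires job attributes are how a consuming job learns about, and waits for, speculatively built images in the buildset registry, so a job that omits the requires never sees the image. A minimal sketch of that pairing, using hypothetical job and artifact names rather than the real system-config ones, looks roughly like this.)

    - job:
        name: build-example-image
        provides: example-container-image

    - job:
        name: deploy-example
        requires: example-container-image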
19:58:45 <clarkb> sounds like that is everything for now. Thank you everyone!
19:58:51 <clarkb> #endmeeting