19:01:07 <clarkb> #startmeeting infra
19:01:07 <opendevmeet> Meeting started Tue Jun 15 19:01:07 2021 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:07 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:07 <opendevmeet> The meeting name has been set to 'infra'
19:01:14 <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2021-June/000254.html Our Agenda
19:01:19 <clarkb> #topic Announcements
19:01:41 <clarkb> I will not be around next week. We will either need a volunteer meeting chair or we can skip
19:01:56 <clarkb> I'll leave that up to those who will be around to decide :)
19:02:38 <clarkb> #topic Actions from last meeting
19:02:46 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-06-08-19.01.txt minutes from last meeting
19:02:54 <clarkb> #action clarkb Followup with OpenStack on ELK retirement
19:03:04 <clarkb> I have not done this yet
19:03:10 <clarkb> #action someone write spec to replace Cacti with Prometheus
19:03:22 <clarkb> I have not seen a spec for this either. I assume it hasn't been done
19:03:35 <clarkb> ianw: did centos ppc packages get cleaned up?
19:03:55 <ianw> not yet sorry
19:04:17 <clarkb> #action ianw Push change to cleanup ppc packages in our CentOS mirrors
19:04:23 <clarkb> no worries i think we had a number of distractions last week
19:04:34 <clarkb> Lets jump in and talk about them :)
19:04:36 <clarkb> #topic Topics
19:04:42 <clarkb> #topic Eavesdrop and Limnoria
19:05:03 <clarkb> I wanted to call out that we had to fix a bug in limnoria to handle joins to many channels properly
19:05:23 <clarkb> This morning fungi discovered that limnoria doesn't seem to aggressively flush files to disk, but there is a config option we can toggle to have it do that
19:05:40 <ianw> hrm i'm pretty sure i turned that on
19:05:43 <fungi> we don't know for certain this will fix the observed behavior
19:05:44 <clarkb> And gmann was asking about the ptg.openstack.org etherpad lists which were/are hosted on eavesdrop01.openstack.org
19:06:06 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/796513/ Limnoria flush channel logs
19:06:08 <fungi> yeah, those are in ptgbot's sqlite database
19:06:46 <clarkb> On the whole we seem to be tackling these issues as they pop up so I'm not super worried, but wanted to call them out in case people want to look into any of them
19:06:58 <fungi> the channel log flushing i'm not so sure is what we think it is. i watched some channels updating rather readily, while others it decided to not flush the log files for a day or so
19:07:11 <clarkb> on a related note at ~00:30UTC today freenode killed itself and split into a new freenode network with no user or channel migration
19:07:28 <clarkb> if there was any question about us making the right choice to move I think that is settled now.
19:08:35 <ianw> https://review.opendev.org/c/opendev/system-config/+/795978 is what i was thinking of
19:08:39 <ianw> re flushing
19:09:00 <clarkb> ah a different flush setting
19:09:05 <ianw> ohhh, that's flushing the config file
19:09:05 <clarkb> hopefully fungi's change sorts this problem out
19:09:08 <clarkb> yup
19:09:10 <ianw> yeah, ++
19:09:26 <fungi> i'm not convinced, but we'll see
19:09:26 <mordred> today's freenode-splosion is one of the most fascinating things to have happened in a while
19:10:02 <clarkb> fungi: I doubt it will hurt anything at least so seems safe to try
19:10:08 <fungi> agreed
19:10:28 <clarkb> also upstream has been super responsive which means if we can find and fix bugs pushing back upstream is worthwhile
19:10:50 <clarkb> alright, anything else on the topic of IRC and IRC bots?
19:10:52 <fungi> the pattern of what it was writing to disk and what it had seemingly decided to just no longer flush at all was not consistent
19:11:26 <ianw> templates/supybot.conf.erb:supybot.plugins.ChannelLogger.flushImmediately: False is in the old config
19:11:28 <fungi> but maybe there was more going on behind the scenes with fs write caching
19:11:57 <clarkb> fungi: it is running on a relatively new kernel inside a container with bind mounts too
19:12:17 <fungi> yeah
19:12:22 <ianw> there's also supybot.debug.flushVeryOften: False
19:12:28 <fungi> so lots of things can have changed under the covers
19:12:44 <fungi> supybot.debug.flushVeryOften seems to be about flushing its debug logs
19:12:56 <ianw> "automatically flush all flushers"
19:13:01 <fungi> which i figured was independent from channel logging
19:13:23 <fungi> but who knows how many toilets it flushes
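(For reference, the three settings discussed above are distinct knobs. The old puppet template shipped supybot.plugins.ChannelLogger.flushImmediately: False, and the fix in 796513 presumably flips that value so channel logs hit disk on every write; that is an assumption about what the change does, sketched against the registry line quoted below:

    # old puppet-managed value (from templates/supybot.conf.erb)
    supybot.plugins.ChannelLogger.flushImmediately: False
    # presumed new value: flush channel logs to disk immediately
    supybot.plugins.ChannelLogger.flushImmediately: True

The setting touched by 795978 covers flushing the config file, and supybot.debug.flushVeryOften covers the debug logs, neither of which affects the channel logs.)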
19:13:38 <clarkb> we don't need to debug it in the meeting :) just want to make sure it is called out as a problem with a potential fix pushed up
19:13:48 <ianw> sorry the ptg thing, is there something to migrate off the old server?  i'm not super familiar with that bit
19:14:08 <clarkb> I am not super familiar with it either. For some reason I thought the foundation was hosting that site
19:14:18 <clarkb> but dns said that was wrong when I asked dns about it
19:14:24 <fungi> ptgbot's ptg.openstack.org website was served from the old eavesdrop server, and the bot maintained some state in an sqlite file (mainly for things people set/updated via irc messages)
19:14:50 <ianw> hrm so it's some sort of cgi?
19:14:53 <clarkb> fungi: was the site cgi/wsgi then as part of the bot install?
19:15:06 <fungi> gmann was looking for the list of team etherpads from the last ptg, which the ptg.o.o site would still have been serving from data in ptgbot's sqlite db
19:15:19 <fungi> yeah, the puppet-ptgbot module handles all of that
19:16:10 <fungi> in theory it should get rewritten as ansible (+docker) including the website configuration
19:16:15 <clarkb> in that case I guess we can grab the sqlite db file then query against it until the site is up again if people need info from it?
19:16:24 <fungi> yeah, that was my thinking
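(As a minimal sketch of that idea, assuming the sqlite file has been copied off the old eavesdrop server to a local path and making no assumptions about ptgbot's schema, something like the following Python could dump whatever the database holds; the filename here is hypothetical:

    import sqlite3

    # Hypothetical local copy of ptgbot's database pulled from eavesdrop01.
    db_path = "ptgbot.sqlite"

    conn = sqlite3.connect(db_path)
    cur = conn.cursor()

    # Discover the tables rather than assuming ptgbot's schema.
    cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    tables = [row[0] for row in cur.fetchall()]

    for table in tables:
        print("== {} ==".format(table))
        # Dump every row; the PTG etherpad URLs should show up in whichever
        # table ptgbot uses to track them.
        for row in cur.execute("SELECT * FROM {}".format(table)):
            print(row)

    conn.close()

That would be enough to answer a one-off question like gmann's without standing the website back up.)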
19:16:27 <clarkb> fungi: yup and diablo_rojo volunteered to look at it starting next week
19:16:36 <diablo_rojo_phone> Yep!
19:16:38 <clarkb> *to look at the ansible (+docker) bits
19:16:53 <fungi> but also archive.org might be indexing that site, in which case there could be a list we can point people to there in the meantime
19:17:09 <diablo_rojo_phone> Almost down to that section of my to-do list. Probably by tomorrow.
19:17:31 <fungi> we can't confirm whether archive.org has an old copy until their power maintenance is over though
19:18:10 <ianw> it sounds like i should probably leave it alone then, but happy to help migrate things etc.
19:18:26 <ianw> it looks like possibly it's a javascript thing.  i'm not seeing cgi/wsgi
19:18:32 <fungi> yeah, mainly let's just not delete the old server yet
19:18:37 <clarkb> ya I think worst case we'll look at the instance disk that is shutdown and/or backups and pull the data off
19:18:47 <clarkb> but waiting a few days is probably fine too
19:18:56 <ianw> yeah it's only shut down, essentially to avoid accidentally restarting the daemons twice
19:20:06 <fungi> cool, thanks for confirming
19:20:12 <clarkb> sounds like that may be it for this topic. Lets move on.
19:20:18 <clarkb> #topic Gerrit Account Cleanup
19:20:54 <clarkb> Has anyone had a chance to look at this info yet? I think I need to go through it again myself just to page context back in. But it would be nice to disable more accounts once people have taken a look at the lists so that we can let them sit for a few weeks before permanently cleaning them up
19:20:57 <fungi> i've lost track of whether there was something which needed reviewing on this, sorry
19:21:15 <clarkb> ya there is a file on review in my homedir. Let me dig it up
19:21:41 <fungi> i'll try to look through it after dinner, apologies
19:22:11 <clarkb> ~clarkb/gerrit_user_cleanups/notes/proposed-cleanups.20210416 I think
19:22:32 <clarkb> but I need to repage things in myself too
19:22:39 <clarkb> anyway if you can take a look that would be helpful
19:22:47 <clarkb> #topic Server Upgrades
19:23:10 <clarkb> I have not made progress on the listserv upgrade testing as I have been distracted by things like server reboots and irc and all the things
19:23:20 <clarkb> it is still on my list but at this point I likely won't make progress on this until after next week
19:23:32 <clarkb> ianw: I think you have been making progress with review02. Anything new to report?
19:24:03 <ianw> i have just approved the container db bits that you've been working on and will monitor closely
19:24:04 <fungi> mnaser mentioned that the server got rebooted due to a host outage, so double-check things are still sane there i guess
19:24:22 <clarkb> fungi: ya that also took care of the reboot I was going to do on it :)
19:24:33 <clarkb> ianw: sounds good , thanks for pushing that along
19:24:40 <ianw> after that doesn't do anything to production, i will apply it to review02 and get the server mounting its storage and ready
19:25:11 <ianw> i think we'll be very close to deciding when to sync data and move dns at that point
19:25:21 <clarkb> exciting
19:25:37 <ianw> i also had something to up the heap
19:25:49 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/784003
19:25:54 <clarkb> And then after that we can resurrect the gerrit 3.3 and 3.4 changes (there are some threads about 3.4 and ssh host key problems, but 3.3 looks like it should be good for us at this point)
19:27:03 <clarkb> any other server upgrade notes to make?
19:28:11 <clarkb> #topic Draft Matrix Spec
19:28:19 <clarkb> corvus: did you want to introduce this topic?
19:28:30 <corvus> oh hi
19:28:36 <corvus> incoming prepared text dump:
19:28:47 <corvus> first i want to share a little update:
19:28:52 <corvus> i spent some time today talking with folks from the ansible, fedora, and gnome communities, all of which have serious plans to adopt matrix (they either have a homeserver or have plans to).
19:28:59 <mordred> I was there too
19:29:05 <corvus> #link gnome matrix spaces https://discourse.gnome.org/t/experimenting-with-matrix-spaces/6571
19:29:06 <corvus> #link gnome matrix sovereignty https://blog.ergaster.org/post/20210610-sovereignty-federated-system-gnome/
19:29:06 <corvus> #link fedora irc/matrix https://communityblog.fedoraproject.org/irc-announcement/
19:29:06 <corvus> #link fedora matrix plan https://discussion.fedoraproject.org/t/matrix-server-channel-setup/29844/7
19:29:07 <corvus> #link ansible matrix plan https://hackmd.io/FnpIUIrrRuec-gT3lrv-rQ?view#Current-plan-as-of-2021-06-14
19:29:13 <corvus> so we've got some really good company here, and people to collaborate with as we figure stuff out.
19:29:21 <corvus> just today i've learned way too much to share here in full, but in short: there are even more options for managing transitions from irc to matrix (including ways to take full admin control of the existing matrix.org portal rooms, rename those to :opendev.org rooms, and either maintain or retire the bridge at any point).  all of that to say that we'll make some choices that are appropriate for zuul, but there are other choices that may be
19:29:21 <corvus> more appropriate for other opendev projects which are equally valid.
19:29:32 <corvus> no matter what we do, the next step is for opendev to host a homeserver, so, on to the topic at hand, i uploaded a spec: https://review.opendev.org/796156
19:29:38 <corvus> and there are 2 big questions from my pov:
19:29:46 <corvus> 1) does opendev want a homeserver?
19:29:47 <corvus> 2) if so, does opendev want to run one itself or pay EMS to run it?
19:29:51 <corvus> and so, what order do we want to answer these questions, and how do we want to decide the second one?
19:29:57 <corvus> (fwiw, i advocate for OIF paying EMS to host the homeserver)
19:30:02 <corvus> [eof]
19:30:16 <mordred> in the realm of "learned too much"...
19:30:32 <mordred> I recommend very strongly reading the matrix sovereignty post above
19:30:42 <clarkb> for 2) I'm pretty strongly in the have someone else run it if at all possible
19:30:43 <fungi> any feel for what the recurring opex is on paying ems to host a homeserver?
19:31:10 <mordred> it made me pretty well convinced that there are way more sharp edges around having a homeserver that also has user accounts
19:31:13 <mordred> than value
19:31:20 <fungi> just wondering what additional research we need to do there before appealing to have funds applied
19:31:21 <corvus> fungi: could be as little as $10/mo.  i think it's wise for someone from the oif to talk to them and determine if that's appropriate.
19:31:31 <mordred> and so I think a homeserver that just hosts rooms and not users is what we'd be wanting
19:32:01 <clarkb> mordred: does that article cover the advantages of running a homeserver in that case without user accounts? eg why not just use matrix.org in that case?
19:32:11 <mordred> the $10/mo actually technically could do that - but we might be weasel-word reading the intent of that price tier - so we should likely talk to them
19:32:21 <mordred> clarkb: it does ... but I can summarize real quick
19:32:40 <mordred> if we have an opendev.org homeserver then we have control over things that brand themselves as being opendev channels
19:33:06 <clarkb> got it, its about channel management then. Makes sense
19:33:19 <mordred> so someone can be sure that #zuul:opendev.org is the zuul channel hosted by opendev - whereas #zuul:matrix.org might or might not have any relationship to us
19:33:28 <mordred> yeah
19:33:29 <mordred> we would also have a few more integration possibilities
19:33:41 <mordred> allowing us to think about things like logging bots slightly differently - or not, we could still use bots
19:33:56 <clarkb> ya and integration with other chat systems
19:33:57 <fungi> what is the process for moving herds of users from one matrix channel to a replacement channel? like could we use a matrix.org channel and later "forward" that to an opendev.org channel?
19:34:14 <corvus> (or slack bridges....)
19:34:36 <corvus> fungi: one of the options is to actually just rename the channel :)
19:34:49 <fungi> so "renames" are a thing then
19:35:02 <clarkb> and I guess the background on this topic is that Zuul has decided they would like to use matrix for primary synchronous comms rather than irc
19:35:07 <corvus> i just learned (moments ago) that's actually a possibility for the oftc portal rooms!
19:35:11 <fungi> and #zuul:matrix.org could be "renamed" to #zuul:opendev.org?
19:35:51 <corvus> well, there's no #zuul:matrix.org to my knowledge; i have no intention of creating any rooms :matrix.org
19:35:57 <clarkb> Looking at element pricing, it is done per user. The $10 option is for 5 users. I suspect we'd end up with ~500 users at any one time?
19:36:07 <mordred> clarkb: nope.
19:36:11 <mordred> clarkb: we'd just have rooms
19:36:16 <mordred> users would not log in to our homeserver
19:36:17 <clarkb> oh I see
19:36:41 <corvus> fungi: but if there were, we could rename that room.  more to the point, we can rename the `#_oftc_#zuul:matrix.org` portal room, with some help from the matrix.org admins.
19:36:44 <clarkb> there would be ~500 users interacting with the channels on that homeserver but none of them would be opendev.org users
19:36:45 <mordred> (email winds up being an excellent analogy fwiw)
19:36:52 <mordred> yah
19:36:58 <fungi> yeah, if we need to have foundation staff talking to matrix about pricing, we probably should be clear on what wording is relevant
19:37:11 <corvus> we may want a handful of admin/bot accounts, that's it.  5 accounts is the right number to be thinking of.
19:37:19 <clarkb> corvus: got it
19:37:23 <mordred> exactly - that's where I'd want the EMS folks to be cool with our intended use
19:37:38 <mordred> but also - it's other homeservers that would be federating with it
19:38:04 <fungi> so my user might be fungi:yuggoth.org if i decide to run my own homeserver
19:38:09 <mordred> yah
19:38:20 <fungi> which wouldn't count against the opendev.org user count
19:38:21 <clarkb> fungi: yup. I'm currently Clark:matrix.org or something
19:38:37 <corvus> otoh, it's like "hey we're going to increase your load on matrix.org by X hundred/thousand users".  they may be ":(" or they may hear "we're going to expose X hundred/thousand more people to matrix technology" and be ":)".  i dunno.
19:38:44 <fungi> right, i have fungicide:matrix.org because fungi was taken by someone else cool enough to like the name
19:39:40 <clarkb> if I understand correctly what we would want to ask about is whether or not the $10/month (or maybe even the $75/month) options fit our use case of running a homeserver where the vast majority of users are authenticating with their own homeservers or matrix.org
19:40:02 <clarkb> The hosted homeserver would primarily be used to set channel ownership and manage those channels
19:40:42 <corvus> ++
19:40:52 <mordred> yup
19:40:53 <clarkb> fungi and I should probably go read the spec and bring that up with OIF then
19:41:22 <clarkb> and then based on what we learn we can bring that feedback to the spec
19:41:27 <corvus> that sounds like a great next step -- you can do that, and then we can revise the spec to only include one hosting option
19:41:35 <fungi> clarkb: that would be great, happy to help in that discussion, and we can certainly involve anyone else who wants to be in on those conversations too
19:42:21 <mordred> I think corvus and I would be happy to chat with our OIF friends if that would be helpful
19:42:43 <mordred> you could tell sparky that he's welcome to come to Pal's and talk with me about it there
19:42:46 <corvus> yes, i am fully prepared to be a resource as needed :)
19:42:57 <clarkb> sounds good. I'll try to get started on that when I get back on Thursday
19:43:32 <clarkb> Anything else to bring up on the subject of Matrix? Or should we see where we end up after talking to OIF?
19:43:36 <fungi> is Pal's a bar?
19:43:48 <corvus> i think that's good for me
19:44:03 <clarkb> #topic arm64 cloud status
19:44:13 <clarkb> This wasn't on the agenda but it should've been so I'm adding it :{
19:44:15 <clarkb> er :P
19:44:23 <fungi> chef's choice
19:45:01 <clarkb> When I rebooted servers the osuosl mirror node did not come back with working openafs. Googling found that ianw had run into this in the past but I couldn't find out how we got past it previously. For this reason we ended up disabling osuosl in nodepool
19:45:22 <mordred> fungi: yes
19:45:24 <fungi> more specifically, it's throwing a kernel oops in cache setup
19:45:25 <clarkb> since then we've discovered that linaro has a bunch of leaked nodes limiting our total capacity there. That cloud is functioning just not at full capacity. I have emailed kevinz with those details
19:45:49 <ianw> sorry i must have missed this
19:46:04 <clarkb> I expect kevinz will be able to clean up the nodes i listed as leaked and we'll be back to happy again in linaro. But I'm not sure what the next steps for us in osuosl are
19:46:04 <fungi> ianw: it's partly my fault for being so scattered i forgot to mention it
19:46:31 <ianw> the usual case i've found is that the /var/cache/openafs is corrupt, and removing it helps
19:46:40 <clarkb> ianw: no worries. I think I remember from your initial query to the openafs list that this is focal specific. I suppose one option is to downgrade to bionic on the mirror
19:46:57 <fungi> we've tried a few things there, clearing the old cache files, reformatting and even recreating the cache volume in case it was a block level problem, manually loading the lkm before starting afsd...
19:46:59 <clarkb> ianw: we've cleared out the cache multiple times without it helping unfortunately. fungi even completely replaced the cinder volume that backed it
19:47:24 <fungi> yeah, still the same oops every time we try to start up afsd
19:47:57 <ianw> do you have a link to the oops, i can't even remember sorry
19:48:02 <clarkb> openafs upstream mentioned that 1.8.7 should include the expected fix
19:48:07 <clarkb> let me see if I can find it in scrollback
19:48:11 <fungi> i can scrape it from dmesg on the server, sure
19:48:23 <ianw> anyway, we can debug this today
19:48:26 <clarkb> ianw: https://www.mail-archive.com/openafs-info@openafs.org/msg41186.html should match the dmesg if I got the right thing
19:48:49 <clarkb> ianw: that would be great. Thanks!
19:49:13 <fungi> #link http://paste.openstack.org/show/806651 openafs kernel oops
19:49:18 <clarkb> #topic Gerrit Project Renames
19:49:31 <clarkb> fungi: do we have a change to update the playbook for this yet?
19:49:37 <fungi> i have not, no
19:49:43 <clarkb> ok, lets skip it for now then
19:49:45 <fungi> meant to do it late last week
19:49:49 <fungi> sorry!
19:49:57 <clarkb> no worries. It has been a fun few weeks
19:49:59 <clarkb> #topic Open Discussion
19:50:07 <clarkb> Is there anything else to talk about?
19:52:29 <clarkb> Sounds like that may be it. Thank you everyone!
19:52:57 <clarkb> #endmeeting