19:01:07 #startmeeting infra
19:01:07 Meeting started Tue Jun 15 19:01:07 2021 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:07 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:07 The meeting name has been set to 'infra'
19:01:14 #link http://lists.opendev.org/pipermail/service-discuss/2021-June/000254.html Our Agenda
19:01:19 #topic Announcements
19:01:41 I will not be around next week. We will either need a volunteer meeting chair or we can skip
19:01:56 I'll leave that up to those who will be around to decide :)
19:02:38 #topic Actions from last meeting
19:02:46 #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-06-08-19.01.txt minutes from last meeting
19:02:54 #action clarkb Followup with OpenStack on ELK retirement
19:03:04 I have not done this yet
19:03:10 #action someone write spec to replace Cacti with Prometheus
19:03:22 I have not seen a spec for this either. I assume it hasn't been done
19:03:35 ianw: did centos ppc packages get cleaned up?
19:03:55 not yet sorry
19:04:17 #action ianw Push change to cleanup ppc packages in our CentOS mirrors
19:04:23 no worries i think we had a number of distractions last week
19:04:34 Let's jump in and talk about them :)
19:04:36 #topic Topics
19:04:42 #topic Eavesdrop and Limnoria
19:05:03 I wanted to call out that we had to fix a bug in limnoria to handle joins to many channels properly
19:05:23 This morning fungi discovered that limnoria doesn't seem to aggressively flush files to disk, but there is a config option we can toggle to have it do that
19:05:40 hrm i'm pretty sure i turned that on
19:05:43 we don't know for certain this will fix the observed behavior
19:05:44 And gmann was asking about the ptg.openstack.org etherpad lists which were/are hosted on eavesdrop01.openstack.org
19:06:06 #link https://review.opendev.org/c/opendev/system-config/+/796513/ Limnoria flush channel logs
19:06:08 yeah, those are in ptgbot's sqlite database
19:06:46 On the whole we seem to be tackling these issues as they pop up so I'm not super worried, but wanted to call them out in case people want to look into any of them
19:06:58 the channel log flushing i'm not so sure is what we think it is. i watched some channels updating rather readily, while others it decided to not flush the log files for a day or so
19:07:11 on a related note at ~00:30UTC today freenode killed itself and split into a new freenode network with no user or channel migration
19:07:28 if there was any question about us making the right choice to move I think that is settled now.
19:08:35 https://review.opendev.org/c/opendev/system-config/+/795978 is what i was thinking of
19:08:39 re flushing
19:09:00 ah a different flush setting
19:09:05 ohhh, that's flushing the config file
19:09:05 hopefully fungi's change sorts this problem out
19:09:08 yup
19:09:10 yeah, ++
19:09:26 i'm not convinced, but we'll see
19:09:26 today's freenode-splosion is one of the most fascinating things to have happened in a while
19:10:02 fungi: I doubt it will hurt anything at least so seems safe to try
19:10:08 agreed
19:10:28 also upstream has been super responsive which means if we can find and fix bugs pushing back upstream is worthwhile
19:10:50 alright, anything else on the topic of IRC and IRC bots?
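[editor's note: the flush settings under discussion live in Limnoria's registry-style supybot.conf. As a rough sketch only (the values actually deployed, and exactly what change 796513 sets, are not verified here), enabling channel-log flushing while leaving the global debug flusher alone would look roughly like:

    supybot.plugins.ChannelLogger.flushImmediately: True
    supybot.debug.flushVeryOften: False

As noted in the discussion that follows, the second key controls Limnoria's general "flush all flushers" behavior rather than channel logging specifically.]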
19:10:52 the pattern of what it was writing to disk and what it had seemingly decided to just no longer flush at all was not consistent
19:11:26 templates/supybot.conf.erb:supybot.plugins.ChannelLogger.flushImmediately: False is in the old config
19:11:28 but maybe there was more going on behind the scenes with fs write caching
19:11:57 fungi: it is running on a relatively new kernel inside a container with bind mounts too
19:12:17 yeah
19:12:22 there's also supybot.debug.flushVeryOften: False
19:12:28 so lots of things can have changed under the covers
19:12:44 supybot.debug.flushVeryOften seems to be about flushing its debug logs
19:12:56 "automatically flush all flushers"
19:13:01 which i figured was independent from channel logging
19:13:23 but who knows how many toilets it flushes
19:13:38 we don't need to debug it in the meeting :) just want to make sure it is called out as a problem with a potential fix pushed up
19:13:48 sorry the ptg thing, is there something to migrate off the old server? i'm not super familiar with that bit
19:14:08 I am not super familiar with it either. For some reason I thought the foundation was hosting that site
19:14:18 but dns said that was wrong when I asked dns about it
19:14:24 ptgbot's ptg.openstack.org website was served from the old eavesdrop server, and the bot maintained some state in an sqlite file (mainly for things people set/updated via irc messages)
19:14:50 hrm so it's some sort of cgi?
19:14:53 fungi: was the site cgi/wsgi then as part of the bot install?
19:15:06 gmann was looking for the list of team etherpads from the last ptg, which the ptg.o.o site would still have been serving from data in ptgbot's sqlite db
19:15:19 yeah, the puppet-ptgbot module handles all of that
19:16:10 in theory it should get rewritten as ansible (+docker) including the website configuration
19:16:15 in that case I guess we can grab the sqlite db file then query against it until the site is up again if people need info from it?
19:16:24 yeah, that was my thinking
19:16:27 fungi: yup and diablo_rojo volunteered to look at it starting next week
19:16:36 Yep!
19:16:38 *to look at the ansible (+docker) bits
19:16:53 but also archive.org might be indexing that site, in which case there could be a list we can point people to there in the meantime
19:17:09 Almost down to that section of my to-do list. Probably by tomorrow.
19:17:31 we can't confirm whether archive.org has an old copy until their power maintenance is over though
19:18:10 it sounds like i should probably leave it alone then, but happy to help migrate things etc.
19:18:26 it looks like possibly it's a javascript thing. i'm not seeing cgi/wsgi
19:18:32 yeah, mainly let's just not delete the old server yet
19:18:37 ya I think worst case we'll look at the instance disk that is shutdown and/or backups and pull the data off
19:18:47 but waiting a few days is probably fine too
19:18:56 yeah it's only shut down, essentially to avoid accidentally restarting the daemons twice
19:20:06 cool, thanks for confirming
19:20:12 sounds like that may be it for this topic. Let's move on.
19:20:18 #topic Gerrit Account Cleanup
19:20:54 Has anyone had a chance to look at this info yet? I think I need to go through it again myself just to page context back in. But it would be nice to disable more accounts when we have had people take a look at the lists so that we can let them sit for a few weeks before permanently cleaning them up
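[editor's note: on the ptgbot data discussed under the previous topic, a minimal Python sketch of how a copy of the bot's sqlite file could be inspected to recover the etherpad list; the filename is a placeholder and the table names are deliberately discovered rather than assumed, since ptgbot's actual schema is not described in this meeting:

    import sqlite3

    # Open a copy of the database pulled from the old eavesdrop server.
    conn = sqlite3.connect('ptgbot-copy.db')
    cur = conn.cursor()
    # List the tables first; query for the etherpad entries once the schema is known.
    for (name,) in cur.execute("SELECT name FROM sqlite_master WHERE type='table'"):
        print(name)
    conn.close()
]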
19:20:57 i've lost track of whether there was something which needed reviewing on this, sorry
19:21:15 ya there is a file on review in my homedir. Let me dig it up
19:21:41 i'll try to look through it after dinner, apologies
19:22:11 ~clarkb/gerrit_user_cleanups/notes/proposed-cleanups.20210416 I think
19:22:32 but I need to repage things in myself too
19:22:39 anyway if you can take a look that would be helpful
19:22:47 #topic Server Upgrades
19:23:10 I have not made progress on the listserv upgrade testing as I have been distracted by things like server reboots and irc and all the things
19:23:20 it is still on my list but at this point I likely won't make progress on this until after next week
19:23:32 ianw: I think you have been making progress with review02. Anything new to report?
19:24:03 i have just approved the container db bits that you've been working on and will monitor closely
19:24:04 mnaser mentioned that the server got rebooted due to a host outage, so double-check things are still sane there i guess
19:24:22 fungi: ya that also took care of the reboot I was going to do on it :)
19:24:33 ianw: sounds good, thanks for pushing that along
19:24:40 after that doesn't do anything to production, i will apply it to review02 and get the server mounting its storage and ready
19:25:11 i think we'll be very close to deciding when to sync data and move dns at that point
19:25:21 exciting
19:25:37 i also had something to up the heap
19:25:49 #link https://review.opendev.org/c/opendev/system-config/+/784003
19:25:54 And then after that we can resurrect the gerrit 3.3 and 3.4 changes (there are some threads about 3.4 and ssh host key problems, but 3.3 looks like it should be good for us at this point)
19:27:03 any other server upgrade notes to make?
19:28:11 #topic Draft Matrix Spec
19:28:19 corvus: did you want to introduce this topic?
19:28:30 oh hi
19:28:36 incoming prepared text dump:
19:28:47 first i want to share a little update:
19:28:52 i spent some time today talking with folks from the ansible, fedora, and gnome communities, all of which have serious plans to adopt matrix (they either have a homeserver or have plans to).
19:28:59 I was there too
19:29:05 #link gnome matrix spaces https://discourse.gnome.org/t/experimenting-with-matrix-spaces/6571
19:29:06 #link gnome matrix sovereignty https://blog.ergaster.org/post/20210610-sovereignty-federated-system-gnome/
19:29:06 #link fedora irc/matrix https://communityblog.fedoraproject.org/irc-announcement/
19:29:06 #link fedora matrix plan https://discussion.fedoraproject.org/t/matrix-server-channel-setup/29844/7
19:29:07 #link ansible matrix plan https://hackmd.io/FnpIUIrrRuec-gT3lrv-rQ?view#Current-plan-as-of-2021-06-14
19:29:13 so we've got some really good company here, and people to collaborate with as we figure stuff out.
19:29:21 just today i've learned way too much to share here in full, but in short: there are even more options for managing transitions from irc to matrix (including ways to take full admin control of the existing matrix.org portal rooms, rename those to :opendev.org rooms, and either maintain or retire the bridge at any point). all of that to say that we'll make some choices that are appropriate for zuul, but there are other choices that may be more appropriate for other opendev projects which are equally valid.
19:29:32 no matter what we do, the next step is for opendev to host a homeserver, so, on to the topic at hand, i uploaded a spec: https://review.opendev.org/796156
19:29:38 and there are 2 big questions from my pov:
19:29:46 1) does opendev want a homeserver?
19:29:47 2) if so, does opendev want to run one itself or pay EMS to run it?
19:29:51 and so, what order do we want to answer these questions, and how do we want to decide the second one?
19:29:57 (fwiw, i advocate for OIF paying EMS to host the homeserver)
19:30:02 [eof]
19:30:16 in the realm of "learned too much"...
19:30:32 I recommend very strongly reading the matrix sovereignty post above
19:30:42 for 2) I'm pretty strongly in the have-someone-else-run-it-if-at-all-possible camp
19:30:43 any feel for what the recurring opex is on paying ems to host a homeserver?
19:31:10 it made me pretty well convinced that there are way more sharp edges around having a homeserver that also has user accounts
19:31:13 than value
19:31:20 just wondering what additional research we need to do there before appealing to have funds applied
19:31:21 fungi: could be as little as $10/mo. i think it's wise for someone from the oif to talk to them and determine if that's appropriate.
19:31:31 and so I think a homeserver that just hosts rooms and not users is what we'd be wanting
19:32:01 mordred: does that article cover the advantages of running a homeserver in that case without user accounts? eg why not just use matrix.org in that case?
19:32:11 the $10/mo actually technically could do that - but we might be weasel-word reading the intent of that price tier - so we should likely talk to them
19:32:21 clarkb: it does ... but I can summarize real quick
19:32:40 if we have an opendev.org homeserver then we have control over things that brand themselves as being opendev channels
19:33:06 got it, it's about channel management then. Makes sense
19:33:19 so someone can be sure that #zuul:opendev.org is the zuul channel hosted by opendev - whereas #zuul:matrix.org might or might not have any relationship to us
19:33:28 yeah
19:33:29 we would also have a few more integration possibilities
19:33:41 allowing us to think about things like logging bots slightly differently - or not, we could still use bots
19:33:56 ya and integration with other chat systems
19:33:57 what is the process for moving herds of users from one matrix channel to a replacement channel? like could we use a matrix.org channel and later "forward" that to an opendev.org channel?
19:34:14 (or slack bridges....)
19:34:36 fungi: one of the options is to actually just rename the channel :)
19:34:49 so "renames" are a thing then
19:35:02 and I guess the background on this topic is that Zuul has decided they would like to use matrix for primary synchronous comms rather than irc
19:35:07 i just learned (moments ago) that's actually a possibility for the oftc portal rooms!
19:35:11 and #zuul:matrix.org could be "renamed" to #zuul:opendev.org?
19:35:51 well, there's no #zuul:matrix.org to my knowledge; i have no intention of creating any rooms :matrix.org
19:35:57 Looking at Element, pricing is done per user. The $10 option is for 5 users. I suspect we'd end up with ~500 users at any one time?
19:36:07 clarkb: nope.
19:36:11 clarkb: we'd just have rooms
19:36:16 users would not log in to our homeserver
19:36:17 oh I see
19:36:41 fungi: but if there were, we could rename that room. more to the point, we can rename the `#_oftc_#zuul:matrix.org` portal room, with some help from the matrix.org admins.
19:36:44 there would be ~500 users interacting with the channels on that homeserver but none of them would be opendev.org users
19:36:45 (email winds up being an excellent analogy fwiw)
19:36:52 yah
19:36:58 yeah, if we need to have foundation staff talking to matrix about pricing, we probably should be clear on what wording is relevant
19:37:11 we may want a handful of admin/bot accounts, that's it. 5 accounts is the right number to be thinking of.
19:37:19 corvus: got it
19:37:23 exactly - that's where I'd want the EMS folks to be cool with our intended use
19:37:38 but also - it's other homeservers that would be federating with it
19:38:04 so my user might be fungi:yuggoth.org if i decide to run my own homeserver
19:38:09 yah
19:38:20 which wouldn't count against the opendev.org user count
19:38:21 fungi: yup. I'm currently Clark:matrix.org or something
19:38:37 otoh, it's like "hey we're going to increase your load on matrix.org by X hundred/thousand users". they may be ":(" or they may hear "we're going to expose X hundred/thousand more people to matrix technology" and be ":)". i dunno.
19:38:44 right, i have fungicide:matrix.org because fungi was taken by someone else cool enough to like the name
19:39:40 if I understand correctly what we would want to ask about is whether or not the $10/month (or maybe even the $75/month) options fit our use case of running a homeserver where the vast majority of users are authenticating with their own homeservers or matrix.org
19:40:02 The hosted homeserver would primarily be used to set channel ownership and manage those channels
19:40:42 ++
19:40:52 yup
19:40:53 fungi and I should probably go read the spec and bring that up with OIF then
19:41:22 and then based on what we learn we can bring that feedback to the spec
19:41:27 that sounds like a great next step -- you can do that, and then we can revise the spec to only include one hosting option
19:41:35 clarkb: that would be great, happy to help in that discussion, and we can certainly involve anyone else who wants to be in on those conversations too
19:42:21 I think corvus and I would be happy to chat with our OIF friends if that would be helpful
19:42:43 you could tell sparky that he's welcome to come to Pal's and talk with me about it there
19:42:46 yes, i am fully prepared to be a resource as needed :)
19:42:57 sounds good. I'll try to get started on that when I get back on Thursday
19:43:32 Anything else to bring up on the subject of Matrix? Or should we see where we end up after talking to OIF?
19:43:36 is Pal's a bar?
19:43:48 i think that's good for me
19:44:03 #topic arm64 cloud status
19:44:13 This wasn't on the agenda but it should've been so I'm adding it :{
19:44:15 er :P
19:44:23 chef's choice
19:45:01 When I rebooted servers the osuosl mirror node did not come back with working openafs. Googling found that ianw had run into this in the past but I couldn't find out how we got past it previously. For this reason we ended up disabling osuosl in nodepool
19:45:22 fungi: yes
19:45:24 more specifically, it's throwing a kernel oops in cache setup
19:45:25 since then we've discovered that linaro has a bunch of leaked nodes limiting our total capacity there. That cloud is functioning just not at full capacity. I have emailed kevinz with those details
19:45:49 sorry i must have missed this
19:46:04 I expect kevinz will be able to clean up the nodes i listed as leaked and we'll be back to happy again in linaro. But I'm not sure what the next steps for us in osuosl are
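[editor's note: "disabling osuosl in nodepool" above would normally mean pausing the provider in the launcher configuration rather than deleting it. A minimal sketch of that kind of change, assuming an OpenStack-style provider; the pool name and the rest of the (trimmed) stanza are illustrative, not OpenDev's actual nodepool.yaml:

    providers:
      - name: osuosl
        pools:
          - name: main
            # max-servers: 0 stops new node launches from this cloud while
            # keeping the provider definition in place for later re-enablement.
            max-servers: 0
]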
19:46:04 ianw: it's partly my fault for being so scattered i forgot to mention it
19:46:31 the usual case i've found is that the /var/cache/openafs is corrupt, and removing it helps
19:46:40 ianw: no worries. I think I remember from your initial query to the openafs list that this is focal specific. I suppose one option is to downgrade to bionic on the mirror
19:46:57 we've tried a few things there, clearing the old cache files, reformatting and even recreating the cache volume in case it was a block level problem, manually loading the lkm before starting afsd...
19:46:59 ianw: we've cleared out the cache multiple times without it helping unfortunately. fungi even completely replaced the cinder volume that backed it
19:47:24 yeah, still the same oops every time we try to start up afsd
19:47:57 do you have a link to the oops, i can't even remember sorry
19:48:02 openafs upstream mentioned that 1.8.7 should include the expected fix
19:48:07 let me see if I can find it in scrollback
19:48:11 i can scrape it from dmesg on the server, sure
19:48:23 anyway, we can debug this today
19:48:26 ianw: https://www.mail-archive.com/openafs-info@openafs.org/msg41186.html should match the dmesg if I got the right thing
19:48:49 ianw: that would be great. Thanks!
19:49:13 #link http://paste.openstack.org/show/806651 openafs kernel oops
19:49:18 #topic Gerrit Project Renames
19:49:31 fungi: do we have a change to update the playbook for this yet?
19:49:37 i have not, no
19:49:43 ok, let's skip it for now then
19:49:45 meant to do it late last week
19:49:49 sorry!
19:49:57 no worries. It has been a fun few weeks
19:49:59 #topic Open Discussion
19:50:07 Is there anything else to talk about?
19:52:29 Sounds like that may be it. Thank you everyone!
19:52:57 #endmeeting
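[editor's note: for reference, the cache-clearing recovery ianw describes in the arm64 discussion above usually amounts to stopping the AFS client, removing the on-disk cache, and starting the client again. A generic sketch assuming the standard Debian/Ubuntu openafs-client packaging; per the discussion these steps were already tried on the osuosl mirror and did not resolve this particular oops:

    # Stop the AFS client, clear the suspect disk cache, then restart.
    systemctl stop openafs-client
    rm -rf /var/cache/openafs/*
    systemctl start openafs-client
]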