19:01:16 #startmeeting infra
19:01:17 Meeting started Tue Jun 8 19:01:16 2021 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:18 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:21 The meeting name has been set to 'infra'
19:01:25 #link http://lists.opendev.org/pipermail/service-discuss/2021-June/000253.html Our Agenda
19:01:39 The agenda isn't very big today. Hopefully I'll be able to give you some time back
19:01:47 #topic Announcements
19:02:23 This didn't make the agenda, but it occurred to me while prepping for this meeting: I've got family in town starting tonight and I may not be able to run the next two meetings. I think I'll be around next week but definitely not the week after
19:02:49 I'll try to figure that out more accurately once i can discuss their plans tonight/tomorrow
19:03:22 #topic Actions from last meeting
19:03:28 #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-06-01-19.01.txt minutes from last meeting
19:03:34 There were no recorded actions
19:04:04 #topic Specs
19:04:14 #link https://review.opendev.org/c/zuul/zuul/+/793669 Zuul spec to use matrix instead of irc
19:04:34 This isn't our spec but it is worth keeping an eye on and commenting on if you have input
19:04:56 i think corvus is hoping to make the decision on it later this week
19:05:07 so feedback sooner would be better
19:05:35 ++
19:05:46 #topic Topics
19:05:52 #topic Switch to OFTC
19:06:16 ianw has been working to redo our irc bot deployment stuff, including spinning up a new server and running bots out of containers there
19:06:28 and a new eavesdrop server
19:06:30 meetbot and ptgbot are the bots left out in that process as they aren't currently containerized
19:06:35 er, you did say new server, yeah
19:06:37 #link https://review.opendev.org/q/topic:%22limnoria%22+status:open Limnoria bot rewrite
19:07:00 This topic is a set of changes that converts meetbot from supybot to limnoria and containerizes that bot so it can run on the new server too
19:07:10 statusbot, btw, is still running in the foreground in a screen session, hasn't crashed again :/
19:07:14 I think I've gotten through that topic for reviews but should double check
19:07:21 #link https://review.opendev.org/c/openstack/ptgbot/+/793792 Update PTGbot for OFTC
19:07:56 This change updates ptgbot to make it possible for it to connect to oftc, but we need to double check whether it needs python2 or if it can run under python3. If it can run under python3 then a good followup would be containerizing it. If not, then converting it to python3 and then containerizing it would be good
19:08:05 oh also, i think the openstack tc has decided friday of this week is when they'll change channel topics on freenode, not sure if anyone else is following suit or if we want to do something similar for our handful of opendev channels there
19:08:36 so the bits that want review now are the deployment
19:08:39 #link https://review.opendev.org/q/topic:%22meetbot-container%22+(status:open%20OR%20status:merged)
19:09:06 ah ok I'll have to review those then
19:09:10 i am still working on statusbot in
19:09:11 #link https://review.opendev.org/c/opendev/system-config/+/795213
19:09:35 fungi: I'm happy to update our topic for #opendev and #opendev-meeting though I dropped off of freenode today. Neither channel had any real traffic since the switch though
19:09:58 that's the only bit i would consider hacky. i'm installing it in the limnoria container, even though it's not a plug-in, on the theory that it *should* be
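For context on the "should be a plugin" remark: limnoria still ships the historical supybot module namespace, so a plugin is just a small Python module. A minimal sketch of that structure follows; the plugin name and command are placeholders, not actual statusbot or ptgbot code.

```python
# Minimal limnoria plugin skeleton (illustrative only; limnoria keeps the
# historical "supybot" module names for plugin code).
from supybot import callbacks
from supybot.commands import wrap


class Status(callbacks.Plugin):
    """Placeholder plugin showing the structure, not the real statusbot."""

    def ping(self, irc, msg, args):
        """takes no arguments -- replies with pong as a structural example"""
        irc.reply('pong')
    ping = wrap(ping)


Class = Status
```

Rewriting one of the bots this way would mostly mean moving its existing event handling onto a class like the one above.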
19:10:05 yeah, i can do topic updates, i'm still joined there
19:10:30 but realistically, i doubt anyone is that interested in rewriting it; if they did, that's where it should go
19:10:52 ianw: you mean rather than using a separate statusbot container?
19:11:15 ianw: is there a reason to combine them rather than do process per container? I think we have tried to stick to process per container as much as possible
19:13:06 I'll try to dig into that when I get to reviewing the changes
19:13:30 well, to avoid having to make another container and build process for it all, for something that we don't really care that much about
19:13:43 I see
19:14:00 Another thing I wanted to point out is that zuul is using a matrix element oftc connection url instead of oftc's webchat as its easy connect mode. If we've got similar links that need updating I think that worked reasonably well
19:14:08 might also suggest it to other users like openstack
19:15:30 anything else on the subject of IRC?
19:16:13 i guess ptgbot is what i wanted to call out
19:16:40 i've -1'd
19:17:05 #link https://review.opendev.org/c/opendev/puppet-ptgbot/+/793794
19:17:23 we have time to solve ptgbot, there's no looming ptg for months thankfully
19:17:24 explaining that the puppet no longer runs on eavesdrop01.openstack.org (because we're worried pip will explode on itself)
19:17:47 ianw: oh I didn't realize there was a puppet change already too. I agree we should convert it over. Particularly since we have time for that bot
19:17:52 so if it's offline for a while to get containerified, it's not a huge deal
19:18:02 so the future for this, if there is any, is for someone to make a container and migrate it to ansible, and deploy on eavesdrop01
19:18:12 ++
19:18:21 at one time, per the irc spec, the future was to rewrite it as a limnoria plugin
19:18:39 but i honestly don't think anyone is going to do that now
19:19:28 i have no idea how much of a forklift that would actually be, i expect the internal logic is mostly isolated and would probably not be that hard to translate to a limnoria plugin, but i can't say i've looked at it with that in mind
19:19:30 and i would not be surprised if it prompts a think about its future anyway
19:20:04 fungi: on the flip side I think installing it in a container and bind mounting the dir where it stores info is probably super trivial
19:20:11 yep
19:20:21 we already just pip install it anyway
19:21:15 o/
19:21:34 we can bring it up with diablo_rojo_phone and ttx to see what they think and take it from there
19:21:50 I agree with ianw though that we shouldn't try to keep the puppetry going given what happened with pip on the old server when we switched
19:22:01 i doubt containerising it would be too difficult. i'd just like someone else to own it :)
19:22:08 ++
19:22:23 I suspect diablo_rojo_phone and/or ttx would be happy to help with that
19:22:42 Let's move on
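On the open python2-vs-python3 question for ptgbot above, a quick first pass is simply byte-compiling the source tree under python3; this only catches syntax-level problems (print statements, old except syntax), not runtime str/bytes issues. The "ptgbot" path below is an assumption about where the source is checked out.

```python
# Syntax-level python3 compatibility check; a clean compile does not prove
# the bot runs under python3, only that nothing fails to parse.
import compileall
import sys

ok = compileall.compile_dir('ptgbot', quiet=1)
sys.exit(0 if ok else 1)
```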
19:22:46 #topic Gerrit Updates
19:22:50 i'm happy to help point people in the right direction to get it all going
19:23:02 ianw: I can let them know. I can probably help too
19:23:16 #link https://review.opendev.org/c/opendev/system-config/+/775961 MariaDB container for review02 accountPatchReviewDB
19:23:30 ianw has been working on ^ as one of the last pieces before we can schedule a cut over to review02
19:23:47 I found one more thing reviewing it this morning that would be good to double check
19:24:06 ok i'll loop back.
19:24:10 But otherwise I think it is just about there. It would be good if other infra-root can review it too since it has the potential to make review01 unhappy (though the change attempts to avoid changing anything on that server)
19:24:28 I also remembered that I had a change up to unfork gerrit
19:24:39 #link https://review.opendev.org/c/opendev/system-config/+/791995 Stop adding special x/ path handling to gerrit
19:25:03 yeah, that's fixed and backported to the minor series we're on
19:25:06 It passes testing, though we may want to double check we don't need to update plugin versions on the image build?
19:25:26 Applying that will require a gerrit restart, but I think getting off of our fork is a good thing overall and something we haven't done for years :)
19:25:34 Please carefully review that one too.
19:26:07 it would be good to get the latest zuul summary plugin in with better timestamps
19:26:12 i have
19:26:14 #link https://etherpad.opendev.org/p/gerrit-upgrade-2021
19:26:24 tracking general upgrade notes
19:26:42 ianw: landing 791995 should rebuild with the latest zuul summary plugin too.
19:27:01 when we have the accountPatchReviewDb situation sorted, i think we can more or less migrate at will
19:27:24 ianw: was reverse dns set up? Seeing that on your etherpad and remembering that is important for email
19:27:48 ahh, that's a good one, i should ping mnaser on that one
19:28:20 ++
19:28:24 anything else on gerrit updates?
19:28:27 i think review02 will be the host now, especially with the address pinning we've done
19:28:39 i suppose the same mariadb container can be useful for a future zuul scheduler, and storyboard containerization as well
19:29:00 fungi: yup, we also already do similar for gitea and maybe something else too
19:29:17 yeah, it's just the upstream container, but the deployment bits may be helpful to copy
19:29:33 oh, right, forgot we already had that implemented for gitea since ages
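On the reverse DNS note above (it matters for outbound email deliverability), a forward-confirmed check needs only the stdlib; the address and expected name below are placeholders rather than the real review02 details.

```python
# Forward-confirmed reverse DNS sketch: look up the PTR for the address, then
# resolve that name forward and confirm the address is among the results.
import socket

address = '203.0.113.10'           # placeholder server address
expected = 'review02.opendev.org'  # placeholder for the name the PTR should match

ptr_name, _aliases, _addrs = socket.gethostbyaddr(address)
forward = {info[4][0] for info in socket.getaddrinfo(ptr_name, None)}

print('PTR record points at:', ptr_name)
print('forward-confirmed:', ptr_name == expected and address in forward)
```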
19:30:16 #topic Server Upgrades
19:30:28 We have already discussed the eavesdrop upgrade and tasks for the gerrit upgrade
19:30:56 I'm still slowly poking at the listserv upgrades. I'm hoping I'll have time to boot a test node and upgrade it in place during the second half of this week
19:31:13 I'll be using the snapshot that fungi took about a month ago to avoid needing to wait for and slow down the server again
19:31:24 #link https://review.opendev.org/c/opendev/system-config/+/795207
19:31:36 ^ that removes ask.o.o which is causing deployment issues as it's now shut down
19:31:55 ianw: fungi: should we stash a db dump from the ask db as part of that?
19:32:13 we do already have that on backup hosts
19:32:17 oh excellent
19:32:28 the redirect with LE cert seems to be working well too
19:32:36 it's a local redis, right?
19:32:48 no, askbot is postgres or mysql. I forget which
19:33:01 ahh, okay, but regardless it's not in a trove instance
19:33:10 iirc it was mysql and dumped out with mysqldump as a stream
19:33:11 for that I am not sure
19:33:21 it might be trove if it was mysql
19:33:31 i'll check, if it's trove i'll make an archival snapshot and then delete the instance
19:33:45 thanks!
19:33:53 Any other server upgrades or cleanups to call out?
19:34:17 we must be getting pretty thin on xenial now?
19:34:19 Oh it would be good if infra-root can check the test node listservs that I upgraded to see if anything stands out as broken
19:34:26 ianw: slowly but surely :)
19:34:36 post listserv, eavesdrop, and review, what is left?
19:34:41 paste i guess
19:34:42 one thing I notice about the in-place upgrades of the listservs is we'll end up with sysv init scripts and unit files for mailman
19:34:51 not wiki, it's still trusty! ;)
19:35:18 ianw: translate (which we need to figure out because of zanata), health, elasticsearch, logstash
19:36:03 we said we wouldn't just turn off the ELK things, to give people time to try and figure that out. We should probably periodically check in on that and work with openstack to draw a line in the sand so that we aren't just keeping it out there forever
19:36:59 no trove instance for askbot, btw
19:37:19 #action clarkb followup with openstack on ELK retirement planning
19:37:23 I can do that
19:37:35 zanata i forget, was that a dead project?
19:37:38 (we have trove instances for one or more of cacti, grafana, paste, refstack, review, subunit2sql, translate, wiki, zuul)
19:37:38 yup
19:38:01 one or more each i mean
19:38:09 ianw: and there was talk about transitioning to a hosted open source alternative, but it kept getting deprioritized on my end because other things kept popping up
19:38:17 (e.g., review/review-dev/review-test)
19:38:34 ianw: transitioning that would actually be a good project for someone in the community if we can find someone interested in it since all the data is exportable as a user aiui
19:38:39 oh, cacti must be another one that needs updating
19:39:07 corvus had talked about switching to some prometheus + influxdb + grafana type setup
19:39:11 yeah, i remember discussion on translate that seemed to stall
19:39:56 if memory serves, the proposal was to have prometheus query snmpd and then store the data in its own timeseries database?
19:40:02 i recommend skipping influxdb
19:40:03 ya
19:40:31 The upside to that is we can plug services like gitea into it
19:40:32 prometheus + snmp_exporter + grafana
19:40:44 i was reading last night about https://grafana.com/docs/grafana/latest/whatsnew/whats-new-in-v8-0/#grafana-v80-alerts
19:40:44 and get service level as well as host level metrics
19:40:49 and i guess if we want history, we can use rrdtool to dump the existing graphs and ingest them into prom
19:41:00 i haven't run that yet; but i have run influxdb and don't think it's a good fit right now
19:42:49 maybe we should write up a small spec for that? I do think replacing cacti with prometheus gets us more functionality, like the gitea service metrics, which will be nice to have
19:43:06 and I suspect that starting fresh might be simpler than uplifting cacti (but I'm not super familiar with either tool so could be wrong about that)
19:43:22 also i guess it allows us to have one grafana interface into things
19:43:43 so we can mix-n-match stuff from graphite and prometheus into the same dashboards
19:44:11 we could include zuul system metrics alongside the service metrics :)
19:44:19 precisely
19:44:28 yep. the trick will be getting actually correct memory graphs :)
19:44:40 #action Someone write up a spec to replace cacti with prometheus
19:44:48 This way I won't forget it's a thing
19:45:04 I can probably give that a go too, but not promising that at the moment
19:45:14 Any other upgrade items before we move on?
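As a sketch of the "service level as well as host level metrics" idea above: exposing an application metric for Prometheus to scrape is only a few lines with the prometheus_client library. The metric name, port, and value here are illustrative, not anything gitea or zuul actually exports.

```python
# Minimal Prometheus exporter sketch: serve a gauge on an HTTP endpoint that
# a Prometheus scrape job could be pointed at.
import time
from prometheus_client import Gauge, start_http_server

open_changes = Gauge('example_open_changes', 'Hypothetical count of open changes')

if __name__ == '__main__':
    start_http_server(8000)   # metrics exposed at http://localhost:8000/metrics
    while True:
        open_changes.set(42)  # a real exporter would query the service here
        time.sleep(30)
```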
19:46:20 #topic Removing PPC packages from CentOS 8 mirrors
19:46:38 Yesterday we discovered that our centos mirrors had stopped updating because we ran out of afs quota room
19:46:55 it was discovered that we are mirroring ppc packages but we can't find any users of that (not surprising as we don't have ppc hardware)
19:47:14 apparently we intentionally included ppc64le packages in our centos-7 mirroring, and it's lumped in with 8 and 8-stream by default
19:47:14 did blame show me/tonyb adding it explicitly at some point?
19:47:31 ianw: if you did, it was many years back
19:47:58 I wanted to bring this up here as a last call for objections if there is some reason to not make that change
19:48:05 looks like https://review.opendev.org/c/opendev/system-config/+/662727
19:48:06 otherwise I think we should update all our mirrors to drop ppc
19:48:18 for 8 and 8-stream we just mirror all the official architectures, which are presently x86-64, aarch64 and ppc64le
19:48:22 Then update afs quotas as appropriate
19:48:28 hrw just posted a note not long ago about dropping ppc from kolla iirc
19:48:44 ya, yoctozepto confirmed nothing ever used ppc kolla in our CI systems
19:48:53 they had some ppc support but it was never exercised by us
19:49:05 i have a feeling we'd cut nearly a third of our content from that volume if we dropped ppc64le packages/images
19:49:45 has anyone written a change to do that cleanup yet? Does anyone want to write it?
19:50:09 i can do that
19:50:20 i expect it to be a fair number of exclusions, since there are more than a few subtrees for it
19:50:46 fungi: I think we can just remove ppc64le from altarchs and add it to altarches_ignored?
19:51:05 ianw: thanks
19:51:16 for centos-7 yes, but it's not in the altarches tree in 8 and 8-stream
19:51:20 fungi: aha
19:51:30 it's an official architecture so it's in-tree along with x86-64 and aarch64
19:51:33 #action ianw Push change to clean up ppc packages in our CentOS mirrors
19:52:08 so, yeah, i welcome someone more familiar with rhel/centos repository structure tackling that
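To sanity-check the "nearly a third of our content" estimate before writing the exclusions, a rough walk over a mirror tree totalling the bytes under ppc64le paths would be enough; the mirror path below is a placeholder.

```python
# Rough size estimate: walk a mirror tree and sum file sizes whose path
# mentions ppc64le, then report the share of the total.
import os

MIRROR_ROOT = '/path/to/mirror/centos'  # placeholder path

total = ppc = 0
for dirpath, _dirnames, filenames in os.walk(MIRROR_ROOT):
    for name in filenames:
        path = os.path.join(dirpath, name)
        try:
            size = os.path.getsize(path)
        except OSError:
            continue
        total += size
        if 'ppc64le' in path:
            ppc += size

print(f'ppc64le: {ppc / 2**30:.1f} GiB of {total / 2**30:.1f} GiB '
      f'({100 * ppc / max(total, 1):.0f}%)')
```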
19:52:15 #topic Scheduling Project Renames
19:52:38 fungi: you had volunteered to update the rename playbook(s), do we have any change(s) for that yet?
19:52:53 i can't remember, did we determine this will be our first attempt at renaming a project since the 3.2 upgrade?
19:53:07 this is our first attempt, yes, other than what we did by hand on the test node to confirm it is possible
19:53:19 also not surprisingly i forgot i volunteered for that, will need to revisit what we determined needed to be done for it
19:53:47 just remove the mysql bits i suppose?
19:53:50 fungi: we need to drop all the db stuff from the playbook and only do the on-disk renames. Otherwise I think it is very similar
19:53:52 yup
19:54:07 and we can possibly test it using our gerrit functional test jobs
19:54:11 okay, i'll try to leave myself a less forgettable reminder this time
19:54:22 by adding that playbook to the list of test playbooks and confirming a project has moved within gerrit
19:54:29 but we can do that as a follow-on
19:54:34 thanks!
19:54:50 once we have those updates up then I think we can think about scheduling the renames as we'll have the tools to do it
19:54:54 #topic Open Discussion
19:54:58 i'm happy to include the functional testing in my change, but will appreciate some guidance on the validating bits
19:55:06 I can help with that
19:55:08 (when i reach that point)
19:55:17 I've poked at something similar for other changes recently
19:55:39 We did end up using our time after all. Was there anything else to bring up in the last 5 minutes here?
19:55:46 i have nothing
19:55:58 need to shift to prepping dinner in a few minutes
19:56:41 Why don't we all return to eating $meal then :)
19:56:45 thanks everyone!
19:56:47 #endmeeting