19:01:16 <clarkb> #startmeeting infra
19:01:17 <opendevmeet> Meeting started Tue Jun  8 19:01:16 2021 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:18 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:21 <opendevmeet> The meeting name has been set to 'infra'
19:01:25 <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2021-June/000253.html Our Agenda
19:01:39 <clarkb> The agenda isn't very big today. Hopefully I'll be able to give you some time back
19:01:47 <clarkb> #topic Announcements
19:02:23 <clarkb> This didn't make the agenda, but it occurred to me while prepping for this meeting: I've got family in town starting tonight and I may not be able to run the next two meetings. I think I'll be around next week but definitely not the week after
19:02:49 <clarkb> I'll try to figure that out more accurately once I can discuss their plans tonight/tomorrow
19:03:22 <clarkb> #topic Actions from last meeting
19:03:28 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-06-01-19.01.txt minutes from last meeting
19:03:34 <clarkb> There were no recorded actions
19:04:04 <clarkb> #topic Specs
19:04:14 <clarkb> #link https://review.opendev.org/c/zuul/zuul/+/793669 Zuul spec to use matrix instead of irc
19:04:34 <clarkb> This isn't our spec but it is worth keeping an eye on and commenting on if you have input
19:04:56 <fungi> i think corvus is hoping to make the decision on it later this week
19:05:07 <fungi> so feedback sooner would be better
19:05:35 <clarkb> ++
19:05:46 <clarkb> #topic Topics
19:05:52 <clarkb> #topic Switch to OFTC
19:06:16 <clarkb> ianw has been working to redo our irc bot deployment stuff, including spinning up a new server and running bots out of containers there
19:06:28 <fungi> and a new eavesdrop server
19:06:30 <clarkb> meetbot and ptgbot are the bots left out in that process as they aren't currently containerized
19:06:35 <fungi> er, you did say new server, yeah
19:06:37 <clarkb> #link https://review.opendev.org/q/topic:%22limnoria%22+status:open Limnoria bot rewrite
19:07:00 <clarkb> This topic is a set of changes that converts meetbot from supybot to limnoria and containerizes that bot so it can run on the new server too
19:07:10 <fungi> statusbot, btw, is still running foreground in a screen session, hasn't crashed again :/
19:07:14 <clarkb> I think I've gotten through that topic for reviews but should double check
19:07:21 <clarkb> #link https://review.opendev.org/c/openstack/ptgbot/+/793792 Update PTGbot for OFTC
19:07:56 <clarkb> This change updates ptgbot to make it possible for it to connect to oftc, but we need to double check whether it needs python2 or if it can run under python3. If it can run under python3 then a good followup would be containerizing it. If not, then converting it to python3 and then containerizing it would be good
19:08:05 <fungi> oh also, i think the openstack tc has decided friday of this week is when they'll change channel topics on freenode, not sure if anyone else is following suit or if we want to do something similar for our handful of opendev channels there
19:08:36 <ianw> so the bits that want review now are the deployment
19:08:39 <ianw> #link https://review.opendev.org/q/topic:%22meetbot-container%22+(status:open%20OR%20status:merged)
19:09:06 <clarkb> ah ok I'll have to review those then
19:09:10 <ianw> i am still working on statusbot in
19:09:11 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/795213
19:09:35 <clarkb> fungi: I'm happy to update our topic for #opendev and #opendev-meeting though I dropped off of freenode today. Neither channel has had any real traffic since the switch though
19:09:58 <ianw> that's the only bit i would consider hacky.  i'm installing it in the limnoria container, even though it's not a plug-in, on the theory that it *should* be
19:10:05 <fungi> yeah, i can do topic updates, i'm still joined there
19:10:30 <ianw> but realistically, i doubt anyone is that interested in rewriting it; but if they did, that's where it should go
19:10:52 <clarkb> ianw: you mean rather than using a separate statusbot container?
19:11:15 <clarkb> ianw: is there a reason to combine them rather than do process per container? I think we have tried to stick to process per container as much as possible
19:13:06 <clarkb> I'll try to dig into that when I get to reviewing the changes
19:13:30 <ianw> well, to avoid having to make another container and build process for it all, for something that we don't really care that much about
19:13:43 <clarkb> I see
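[For reference, a minimal docker-compose sketch of the compromise ianw describes: statusbot installed into the limnoria image, but still one process per container. Image names, paths, and commands are illustrative assumptions, not the actual system-config contents.]

    # one shared image, two containers: process-per-container is preserved
    services:
      limnoria:
        image: opendevorg/limnoria            # assumed image name
        volumes:
          - /var/lib/limnoria:/var/lib/limnoria
      statusbot:
        image: opendevorg/limnoria            # same image; statusbot ships inside it
        command: statusbot -c /etc/statusbot/statusbot.config
        volumes:
          - /etc/statusbot:/etc/statusbot:ro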
19:14:00 <clarkb> Another thing I wanted to point out is that zuul is using a matrix element connection url for oftc instead of oftc's webchat as its easy connect mode. If we've got similar links that need updating I think that worked reasonably well
19:14:08 <clarkb> might also suggest it to other users like openstack
19:15:30 <clarkb> anything else on the subject of IRC?
19:16:13 <ianw> i guess ptgbot is what i wanted to call out
19:16:40 <ianw> i've -1'd
19:17:05 <ianw> #link https://review.opendev.org/c/opendev/puppet-ptgbot/+/793794
19:17:23 <fungi> we have time to solve ptgbot, there's no looming ptg for months thankfully
19:17:24 <ianw> explaining that the puppet no longer runs on eavesdrop01.openstack.org (because we're worried pip will explode on itself)
19:17:47 <clarkb> ianw: oh I didn't realize there was a puppet change already too. I agree we should convert it over. Particularly since we have time for that bot
19:17:52 <fungi> so if it's offline for a while to get containerified, it's not a huge deal
19:18:02 <ianw> so the future for this, if there is any, is for someone to make a container and migrate it to ansible, and deploy on eavesdrop01
19:18:12 <clarkb> ++
19:18:21 <ianw> at one time, per the irc spec, the future was to rewrite it as a limnoria plugin
19:18:39 <ianw> but i honestly don't think anyone is going to do that now
19:19:28 <fungi> i have no idea how much of a forklift that would actually be, i expect the internal logic is mostly isolated and would probably not be that hard to translate to a limnoria plugin, but i can't say i've looked at it with that in mind
19:19:30 <ianw> and i would not be surprised if it prompts a rethink about its future anyway
19:20:04 <clarkb> fungi: on the flip side I think installing it to a container and bind mounting the dir where it stores info is probably super trivial
19:20:11 <fungi> yep
19:20:21 <fungi> we already just pip install it anyway
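[A sketch of the "super trivial" containerization clarkb and fungi describe: pip install the bot into an image and bind mount its state directory. The image name, config path, and data directory are assumptions for illustration.]

    services:
      ptgbot:
        image: opendevorg/ptgbot              # hypothetical image that just pip installs ptgbot
        command: ptgbot /etc/ptgbot/ptgbot.config
        volumes:
          - /etc/ptgbot:/etc/ptgbot:ro
          - /var/lib/ptgbot:/var/lib/ptgbot   # bind-mounted state survives container restarts
        restart: always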
19:21:15 <corvus> o/
19:21:34 <clarkb> we can bring it up with diablo_rojo_phone and ttx to see what they think and take it from there
19:21:50 <clarkb> I agree with ianw though that we shouldn't try to keep the puppetry going given what happened with pip on the old server when we switched
19:22:01 <ianw> i doubt containerising it would be too difficult.  i'd just like someone else to own it :)
19:22:08 <clarkb> ++
19:22:23 <clarkb> I suspect diablo_rojo_phone and/or ttx would be happy to help with that
19:22:42 <clarkb> Lets move on
19:22:46 <clarkb> #topic Gerrit Updates
19:22:50 <ianw> i'm happy to help point people in the right direction to get it all going
19:23:02 <clarkb> ianw: I can let them know. I can probably help too
19:23:16 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/775961 MariaDB container for review02 accountPatchReviewDB
19:23:30 <clarkb> ianw has been working on ^ as one of the last pieces before we can schedule a cut over to review02
19:23:47 <clarkb> I found one more thing reviewing it this morning that would be good to double check
19:24:06 <ianw> ok i'll loop back.
19:24:10 <clarkb> But otherwise I think it is just about there. It would be good if other infra-root can review it too since it has the potential to make review01 unhappy (though the change attempts to avoid changing anything on that)
19:24:28 <clarkb> I also remembered that I had a change up to unfork gerrit
19:24:39 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/791995 Stop adding special x/ path handling to gerrit
19:25:03 <fungi> yeah, that's fixed and backported to the minor series we're on
19:25:06 <clarkb> It passes testing, though we may want to double check we don't need to update plugin versions on the image build?
19:25:26 <clarkb> Applying that will require a gerrit restart, but I think getting off of our fork is a good thing overall and something we haven't done for years :)
19:25:34 <clarkb> Please carefully review that one too.
19:26:07 <ianw> it would be good to get the latest zuul summary plugin in with better timestamps
19:26:12 <ianw> i have
19:26:14 <ianw> #link https://etherpad.opendev.org/p/gerrit-upgrade-2021
19:26:24 <ianw> tracking general upgrade notes
19:26:42 <clarkb> ianw: landing 791995 should rebuild with the latest zuul summary plugin too.
19:27:01 <ianw> when we have the accountPatchReviewDb situation sorted, i think we can more or less migrate at will
19:27:24 <clarkb> ianw: was reverse dns set up? Seeing that on your etherpad and remembering that is important for email
19:27:48 <ianw> ahh, that's a good one, i should ping mnaser on that one
19:28:20 <clarkb> ++
19:28:24 <clarkb> anything else on gerrit updates?
19:28:27 <ianw> i think review02 will be the host now, especially with the address pinning we've done
19:28:39 <fungi> i suppose the same mariadb container can be useful for a future zuul scheduler, and storyboard containerization as well
19:29:00 <clarkb> fungi: yup, we also already do similar for gitea and maybe something else too
19:29:17 <ianw> yeah, it's just the upstream container, but the deployment bits may be helpful to copy
19:29:33 <fungi> oh, right, forgot we already had that implemented for gitea since ages
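[For context, the gitea-style pattern being copied amounts to running the upstream mariadb image with its data directory bind-mounted on the host. The sketch below uses illustrative values, not the actual contents of change 775961.]

    services:
      mariadb:
        image: mariadb:10.4                   # upstream image; the actual pinned version may differ
        environment:
          MYSQL_ROOT_PASSWORD: "{{ mariadb_root_password }}"   # supplied from host vars
          MYSQL_DATABASE: accountPatchReviewDb
        volumes:
          - /var/mariadb/db:/var/lib/mysql    # keep the database files on the host
        restart: always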
19:30:16 <clarkb> #topic Server Upgrades
19:30:28 <clarkb> We have already discussed the eavesdrop upgrade and tasks for the gerrit upgrade
19:30:56 <clarkb> I'm still slowly poking at the listserv upgrades. I'm hoping I'll have time to boot a test node and upgrade it in place during the second half of this week
19:31:13 <clarkb> I'll be using the snapshot that fungi took about a month ago to avoid needing to wait for and slow down the server again
19:31:24 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/795207
19:31:36 <ianw> ^ that removes ask.o.o which is causing deployment issues as it's now shut down
19:31:55 <clarkb> ianw: fungi: should we stash a db dump from the ask db as part of that?
19:32:13 <ianw> we do already have that on backup hosts
19:32:17 <clarkb> oh excellent
19:32:28 <clarkb> the redirect with LE cert seems to be working well too
19:32:36 <fungi> it's a local redis, right?
19:32:48 <clarkb> no, askbot uses postgres or mysql. I forget which
19:33:01 <fungi> ahh, okay, but regardless it's not in a trove instance
19:33:10 <ianw> iirc it was a mysql and dumped out with mysqldump as a stream
19:33:11 <clarkb> for that I am not sure
19:33:21 <clarkb> it might be trove if it was mysql
19:33:31 <fungi> i'll check, if it's trove i'll make an archival snapshot and then delete the instance
19:33:45 <clarkb> thanks!
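[The streamed dump ianw mentions might look something like the following if expressed as an ansible task; the database name, backup host, and paths here are assumptions for illustration, not the actual backup configuration.]

    - name: Dump the askbot database and stream it to a backup host
      shell: >
        mysqldump --single-transaction askbot
        | gzip
        | ssh backup01.opendev.org 'cat > /opt/backups/askbot-$(date +%F).sql.gz'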
19:33:53 <clarkb> Any other server upgrades or cleanups to call out?
19:34:17 <ianw> we must be getting pretty thin on xenial now?
19:34:19 <clarkb> Oh it would be good if infra-root can check the test node listservs that I upgraded to see if anything stands out as broken
19:34:26 <clarkb> ianw: slowly but surely :)
19:34:36 <ianw> post listserv, eavesdrop, review what is left?
19:34:41 <ianw> paste i guess
19:34:42 <clarkb> one thing I notice about the inplace upgrades of the listservs is we'll end up with sysv init scripts and unit files for mailman
19:34:51 <fungi> not wiki, it's still trusty! ;)
19:35:18 <clarkb> ianw: translate, which we need to figure out because of zanata; also health, elasticsearch, and logstash
19:36:03 <clarkb> we said we wouldn't just turn off ELK things, to give people time to try and figure that out. We should probably periodically check in on that and work with openstack to draw a line in the sand so that we aren't just keeping it out there forever
19:36:59 <fungi> no trove instance for askbot, btw
19:37:19 <clarkb> #action clarkb followup with openstack on ELK retirement planning
19:37:23 <clarkb> I can do that
19:37:35 <ianw> zanata i forget, was that a dead project?
19:37:38 <fungi> (we have trove instances for one or more of cacti, grafana, paste, refstack, review, subunit2sql, translate, wiki, zuul)
19:37:38 <clarkb> yup
19:38:01 <fungi> one or more each i mean
19:38:09 <clarkb> ianw: and there was talk about transitioning to a hosted open source alternative, but it kept getting deprioritized on my end because other things kept popping up
19:38:17 <fungi> (e.g., review/review-dev/review-test)
19:38:34 <clarkb> ianw: transitioning that would actually be a good project for someone in the community if we can find someone interested in it, since all the data is exportable as a user aiui
19:38:39 <ianw> oh, cacti must be another one that needs updating
19:39:07 <clarkb> corvus had talked about switching to some prometheus + influxdb + grafana type setup
19:39:11 <ianw> yeah, i remember discussion on translate that seemed to stall
19:39:56 <fungi> if memory serves, the proposal was to have prometheus query snmpd and then store the data in its own timeseries database?
19:40:02 <corvus> i recommend skipping influxdb
19:40:03 <corvus> ya
19:40:31 <clarkb> The upside to that is we can plug services like gitea into it
19:40:32 <corvus> prometheus + snmp_exporter + grafana
19:40:44 <ianw> i was reading last night about https://grafana.com/docs/grafana/latest/whatsnew/whats-new-in-v8-0/#grafana-v80-alerts
19:40:44 <clarkb> and get service level as well as host level metrics
19:40:49 <fungi> and i guess if we want history, we can use rrdtool to dump the existing graphs and ingest them into prom
19:41:00 <corvus> i haven't run that yet; but i have run influxdb and don't think it's a good fit right now
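[A minimal prometheus scrape config for the stack corvus describes, where prometheus polls snmp_exporter and the exporter in turn queries each host's snmpd. The module, target hosts, and exporter address are illustrative assumptions.]

    scrape_configs:
      - job_name: snmp
        metrics_path: /snmp
        params:
          module: [if_mib]                    # a standard snmp_exporter module
        static_configs:
          - targets:
              - review02.opendev.org          # hosts running snmpd (examples)
              - gitea01.opendev.org
        relabel_configs:
          - source_labels: [__address__]      # pass the snmp target to the exporter
            target_label: __param_target
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: localhost:9116       # snmp_exporter's default listen port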
19:42:49 <clarkb> maybe we should write up a small spec for that? As I do think replacing cacti with prometheus gets us more functionality like the gitea service metrics which will be nice to have
19:43:06 <clarkb> and I suspect that starting fresh might be simpler than uplifting cacti (but I'm not super familiar with either tool so could be wrong about that)
19:43:22 <fungi> also i guess it allows us to have one grafana interface into things
19:43:43 <fungi> so we can mix-n-match stuff from graphite and prometheus into the same dashboards
19:44:11 <clarkb> we could include zuul system metrics alongside the service metrics :)
19:44:19 <fungi> precisely
19:44:28 <corvus> yep.  the trick will be getting actually correct memory graphs :)
19:44:40 <clarkb> #action Someone Write up a spec to replace cacti with prometheus
19:44:48 <clarkb> This way I won't forget it's a thing
19:45:04 <clarkb> I can probably give that a go too, but not promising that at the moment
19:45:14 <clarkb> Any other upgrade items before we move on?
19:46:20 <clarkb> #topic Removing PPC packages from CentOS 8 mirrors
19:46:38 <clarkb> Yesterday we discovered that our centos mirrors had stopped updating because we ran out of afs quota room
19:46:55 <clarkb> it was discovered that we are mirroring ppc packages but we can't find any users of that (not surprising as we don't have ppc hardware)
19:47:14 <fungi> apparently we intentionally included ppc64le packages in our centos-7 mirroring, and it's lumped in with 8 and 8-stream by default
19:47:14 <ianw> did blame show me/tonyb adding it explicitly at some point?
19:47:31 <fungi> ianw: if you did, it was many years back
19:47:58 <clarkb> I wanted to bring this up here as a last call for objections if there is some reason to not make that change
19:48:05 <ianw> looks like https://review.opendev.org/c/opendev/system-config/+/662727
19:48:06 <clarkb> otherwise I think we should update all our mirrors to drop ppc
19:48:18 <fungi> for 8 and 8-stream we just mirror all the official architectures, which are presently x86-64, aarch64 and ppc64le
19:48:22 <clarkb> Then update afs quotas as appropriate
19:48:28 <ianw> hrw just posted a note not long ago about dropping ppc from kolla iirc
19:48:44 <clarkb> ya yoctozepto confirmed nothing ever used ppc kolla in our CI systems
19:48:53 <clarkb> they had some ppc support but never exercised by us
19:49:05 <fungi> i have a feeling we'd cut nearly a third of our content from that volume if we dropped ppc64le packages/images
19:49:45 <clarkb> has anyone written a change to do that cleanup yet? Does anyone want to write it?
19:50:09 <ianw> i can do that
19:50:20 <fungi> i expect it to be a fair number of exclusions, since there are more than a few subtrees for it
19:50:46 <clarkb> fungi: I think we can just remove ppc64le from altarchs and add it to altarches_ignored?
19:51:05 <clarkb> ianw: thanks
19:51:16 <fungi> for centos-7 yes, but it's not in the altarches tree in 8 and 8-stream
19:51:20 <clarkb> fungi: aha
19:51:30 <fungi> it's an official architecture so it's in-tree along with x86-64 and aarch64
19:51:33 <clarkb> #action ianw Push change to cleanup ppc packages in our CentOS mirrors
19:52:08 <fungi> so, yeah, i welcome someone more familiar with rhel/centos repository structure tackling that
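[Because ppc64le is in-tree for 8 and 8-stream rather than under altarch/, the cleanup change likely reduces to a set of per-subtree rsync exclusions along these lines; the upstream mirror URL and exact patterns are assumptions, not the real mirror script.]

    - name: Mirror centos 8-stream without ppc64le content
      command: >
        rsync -rltvz --delete
        --exclude="*/ppc64le/*"
        --exclude="*.ppc64le.rpm"
        rsync://mirror.example.org/centos/8-stream/
        /afs/.openstack.org/mirror/centos/8-stream/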
19:52:15 <clarkb> #topic Scheduling Project Renames
19:52:38 <clarkb> fungi: you had volunteered to update the rename playbook(s), do we have any change(s) for that yet?
19:52:53 <fungi> i can't remember, did we determine this will be our first attempt at renaming a project since the 3.2 upgrade?
19:53:07 <clarkb> this is our first attempt yes, other than what we did by hand on the test node to confirm it is possible
19:53:19 <fungi> also not surprisingly i forgot i volunteered for that, will need to revisit what we determined needed to be done for it
19:53:47 <fungi> just remove the mysql bits i suppose?
19:53:50 <clarkb> fungi: we need to drop all the db stuff from the playbook and only do the on-disk renames. Otherwise I think it is very similar
19:53:52 <clarkb> yup
19:54:07 <clarkb> and we can possibly test it using our gerrit functional test jobs
19:54:11 <fungi> okay, i'll try to leave myself a less forgettable reminder this time
19:54:22 <clarkb> by adding that playbook to the list of test playbooks and confirming a project has moved within gerrit
19:54:29 <clarkb> but we can do that as a followon
19:54:34 <clarkb> thanks!
19:54:50 <clarkb> once we have those updates up then I think we can think about scheduling the renames as we'll have the tools to do it
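[Once the mysql bits are dropped, the remaining core of the rename playbook is roughly an on-disk move of the git repository, sketched below as an ansible play; the repo path and variable names are assumptions for illustration.]

    - name: Move a renamed repository on the gerrit server
      hosts: review
      tasks:
        - name: Relocate the git repo to its new namespace
          command: >
            mv /home/gerrit2/review_site/git/{{ old_name }}.git
               /home/gerrit2/review_site/git/{{ new_name }}.git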
19:54:54 <clarkb> #topic Open Discussion
19:54:58 <fungi> i'm happy to include the functional testing in my change, but will appreciate some guidance on the validating bits
19:55:06 <clarkb> I can help with that
19:55:08 <fungi> (when i reach that point)
19:55:17 <clarkb> I've poked at similar for other changes recently
19:55:39 <clarkb> We did end up using our time after all. Was there anything else to bring up in the last 5 minutes here?
19:55:46 <fungi> i have nothing
19:55:58 <fungi> need to shift to prepping dinner in a few minutes
19:56:41 <clarkb> Why don't we all return to eating $meal then :)
19:56:45 <clarkb> thanks everyone!
19:56:47 <clarkb> #endmeeting