19:01:20 <clarkb> #startmeeting infra
19:01:20 <opendevmeet> Meeting started Tue Jul 27 19:01:20 2021 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:20 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:20 <opendevmeet> The meeting name has been set to 'infra'
19:01:26 <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2021-July/000270.html Our Agenda
19:01:37 <clarkb> #topic Announcements
19:01:43 <ianw> o/
19:01:48 <clarkb> I have none (though the first real agenda item is likely to produce one)
19:02:02 <clarkb> #topic Specs Approval
19:02:15 <clarkb> Just a note that I did approve the matrix spec last week
19:02:41 <clarkb> There is a series of changes that can be found at topic:matrix to start running an eavesdrop and gerritbot equivalent against our test matrix channel
19:02:58 <clarkb> I think I'm mostly up to date on reviews for those but reviews from others are always appreciated as well
19:03:30 <fungi> wake up, neo!
19:03:46 <clarkb> #topic Actions from last meeting
19:03:53 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-07-20-19.01.txt minutes from last meeting
19:03:57 <clarkb> #action someone write spec to replace Cacti with Prometheus
19:04:15 <clarkb> Keeping that there as a reminder that we intend on trying to do that. I'm hopeful that maybe next week I'll be able to start poking at that though
19:04:25 <clarkb> as well as maybe start swinging back around on the list server upgrades
19:04:32 <fungi> promethean cactus
19:04:49 <clarkb> #topic Topics
19:04:51 <clarkb> Time to dive in
19:05:01 <clarkb> #topic Service Coordinator Election
19:05:06 * fungi loves the topics topic
19:05:29 <fungi> is it that time of half-year again?
19:05:42 <clarkb> It occurred to me that we must be getting close to when I said we should do our next service coordinator election and sure enough I had missed when I said we should open nominations by a couple weeks :/
19:05:56 <clarkb> the original plan was to open nominations about 2 weeks ago then have an election next week if necessary
19:05:57 <fungi> shucks
19:06:09 <corvus> i guess you're it for another 6 months then
19:06:38 <fungi> stick a crash-test dummy in a chair and nominate it
19:06:44 <clarkb> I'm proposing that we open nominations now for two weeks then have an election from August 10-17 if necessary
19:07:08 <clarkb> If that seems reasonable I'll go ahead and send email to service-discuss telling people to get on the nominations
19:07:20 <clarkb> And figure out some maths for 6 months from now and try harder to not miss it again
19:07:34 <fungi> wfm, this is one of those "hot potato" elections where we all try to figure out how to be "not it"
19:07:41 <clarkb> (I think in my head elections happening August 1 meant that nothing needed to be done until August)
19:07:58 <clarkb> fungi: ianw corvus  ya one of yall should take the hot potato :)
19:08:44 <clarkb> I don't hear any complaints with that plan. Feel free to bring them up over the next hour or two as I need to finish this meeting and eat lunch before I send that email
19:08:55 <clarkb> But if I don't hear anything until then I'll proceed with the proposal above
19:09:12 <fungi> yeah, i'll have to decide between getting nothing done for the next six months or getting even less done ;)
19:10:06 <clarkb> #topic Review Upgrade
19:10:32 <clarkb> I mentioned this in #opendev earlier today, but I had privs bumped up temporarily to do some account cleanups and took advantage of that to check melody
19:10:48 <clarkb> melody reports the new server has peaked at 84GB of memory use. I believe that is below the upper limit we have given it.
19:11:23 <clarkb> I think this gives us a few opportunities. First we can increase the size of our caches to fill more space. We can also reduce the heap limit provided to the jvm and give more memory to apache and the kernel caches
19:11:41 <clarkb> Neither seems super urgent right now but as we gather more of this real world data we can start to think about how to better take advantage of the new server
19:11:43 <ianw> yeah iirc we set the java heap limit at 96gb
19:11:56 <fungi> might want to check again just before we restart for the renames just to see if the needle has moved
19:12:04 <clarkb> fungi: ++
19:12:19 <fungi> (because we'll be resetting it at that point)
19:12:56 <clarkb> It is also a good double check on the assertion we needed more memory
19:12:58 <fungi> is there a formal plan somewhere yet for the rename maintenance i can add that?
19:13:05 <clarkb> This is about double what we were able to provide the server before so it definitely wanted more
19:13:19 <clarkb> fungi: not yet, that is on the agenda for later in the meeting to start getting all that together
19:13:25 <fungi> no sweat
19:13:50 <clarkb> I think the other big thing related to the review upgrade is cleanup of the old review01 server and review-test server
19:14:08 <fungi> we also probably have some clearance between the 96gib allocated to the jvm and the 125gib allocated to the vm but should see if we can spot how that's being spent
19:14:32 <clarkb> fungi: yup cacti should hopefully give us that info?
19:14:35 <clarkb> ianw: mordred and myself have confirmed that we believe we have all the info we want from review-test now. I think we are waiting on fungi to confirm as well.
19:14:53 <fungi> oh, jeez, responsibilities
19:15:02 <clarkb> ianw: I bring up review-test because I suspect cleaning it up may conflict with cleaning up review01. Do we want to work together on that?
19:15:34 <clarkb> I know you mentioned waiting for frickler_pto to return from pto before doing those cleanups as well so this probably isn't urgent. Just let me know how you want to proceed on that and I'll try to avoid stepping on toes
19:16:22 <fungi> there's nothing but dotfiles/dotdirs in my homedir now, so nothing i care to keep
19:16:24 <ianw> ok, i think i proposed a change to remove review-test bits?
19:16:33 <clarkb> ianw: oh maybe you did and I should go review it
19:16:44 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/801556
19:17:01 <clarkb> great sounds like we're ready to land ^ if it passes review. I've added it to the afternoon review list
19:17:14 <fungi> er, i mean nothing but dotfiles/dotdirs in my homedir on the old review server, but same goes for review-test too
19:17:38 <ianw> yeah, i assume everyone but frickler_pto has cleaned up on review01 too?
19:17:40 <clarkb> fungi: I called you and mordred out on review-test because you were both involved in that server (along with myself) for various tasks
19:17:58 <clarkb> ianw: I got what I know is important on review01 from emory. But should do a skim to double check I'm not forgetting anything
19:18:03 <fungi> also getting rid of review-test will finally silence the cronjob errors i keep getting e-mailed about, so thumbs-up
19:18:21 <fungi> memory?
19:18:30 <clarkb> ya sorry memory. I knew I wanted the gerrit user cleanup records
19:18:48 <clarkb> as it will help us go back and do surgery if we need to for any of these accounts in the future
19:18:50 <fungi> just making sure there wasn't someone new named emory i was unaware of
19:19:01 <fungi> i did take a week off, after all
19:19:51 <clarkb> #topic Project Renames
19:20:07 <fungi> there's just the one so far, right?
19:20:19 <clarkb> We've said we'll do these July 30 at 15:00UTC. I sent email to service-announce announcing this
19:20:23 <fungi> and we're aiming to start at 15z friday?
19:20:26 <fungi> cool
19:20:27 <clarkb> and yup there is a single rename.
19:20:43 <fungi> having just one is probably better for testing this out anyway
19:20:44 <clarkb> As part of prep work I've been working on a change to test our rename playbook
19:20:46 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/802112 Test the rename playbook
19:20:54 <fungi> since it'll be the first rename after our major 3.x upgrade
19:20:57 <clarkb> fungi: ++
19:21:14 <clarkb> I don't think this testing change needs to land before we do the renames since it seems to show that the playbook is working as is
19:21:41 <fungi> but it's helpful to have it as a demonstration we expect this to work
19:21:43 <clarkb> There are some weird test framework integration things to sort out. I do intend on making that cleaner and merging the change. Just pointing out it isn't on our critical path here
19:21:51 <clarkb> yup
19:22:51 <clarkb> Additional prep work that needs doing: 1) review the rename change and ensure it is mergable 2) push a change to opendev/project-config to record the rename 3) write up a plan in an etherpad
19:23:22 <clarkb> for 1) having a couple people do that is good and I intend on being one of them. Does anyone want to do 2) and 3)
19:24:07 <clarkb> I can work on 2) and 3) tomorrow.
19:24:18 <clarkb> if someone else wants to do it let me know :)
19:24:29 <fungi> out of curiosity, does 802112 actually confirm the group rename worked?
19:24:49 <clarkb> fungi: it confirms that the group rename doesn't explode, but doesn't actively check the new group is what is in gerrit
19:24:57 <fungi> it just dawned on me to look and while you include a group rename there i don't see it assert the new name exists
19:25:02 <clarkb> fungi: we can pretty easily add a testinfra test to check that though
19:25:12 <fungi> yeah, okay. not super important for now
19:25:26 <fungi> just making sure i wasn't misreading
19:25:48 <fungi> fine as a todo for us down the line
19:26:04 <clarkb> The other thing to be aware of is that corvus has some changes up to zuul to change how zuul project secrets are managed
19:26:18 <clarkb> If those changes land before friday we'll need to update our process for renaming a project I think
19:26:33 <clarkb> I'm hoping that don't land this week :) but be aware of that if they do
19:26:42 <clarkb> s/that/they/
19:26:50 <fungi> i can't recall, do we actually restart zuul during the renames maintenance?
19:26:56 <clarkb> fungi: we do not
19:27:00 <corvus> i'm not planning to push for those this week
19:27:08 <fungi> then it's really if those land *and* we've restarted the scheduler?
19:27:19 <clarkb> fungi: yes
19:27:22 <fungi> k
19:27:40 <corvus> i'd like to restart soon and make a release and land those afterwards.  but even if they do land, it's not a huge deal
19:27:58 <corvus> also, if the projects being renamed don't have secrets, it doesn't matter at all
19:28:05 <clarkb> ya it's not a huge deal just need to maybe comment out the zuul stuff in the playbook when we run it and manually execute the tools to do it this time around (or update the playbook)
19:28:06 <fungi> and the alternative would be to script changes via zkshell?
19:28:22 <clarkb> fungi: the changes from corvus include new tools to use instead of zkshell
19:28:35 <fungi> oh, right i sort of remember that now ;)
19:28:46 <clarkb> zkshell is an option but probably one to avoid
19:28:52 <clarkb> corvus: oh that is a good point
19:28:59 <fungi> so we'd dump, rename, then import?
19:29:01 <clarkb> I'm not sure if this project has secrets but I can double check
19:29:17 <clarkb> fungi: or if the copy and delete commands land too then we can copy then delete
19:29:17 <fungi> (and cleanup the old copies in zk afterward)
19:29:29 <fungi> yeah, that
19:29:34 <fungi> okay
19:29:39 <clarkb> I really don't expect it to be a major issue, but did want to call it out as I'm not sure how closely others are following that in zuul
19:30:02 <fungi> so  anyway, the lingering question is who will sign up for the rename prep 3 tasks you outlined
19:30:09 <clarkb> Other than all that I think we check in Thursday and make sure we're ready to go as far as changes being pushed and mergable and a documented plan goes
19:30:27 <clarkb> fungi: ya I think I have time for them tomorrow if no one else gets started earlier
19:30:42 <fungi> i'll definitely do #1, i guess i can also volunteer for #2 if someone else will take #3 (or vice versa if there's a preference)
19:30:45 <clarkb> and we can check in thursday to make sure we aren't missing anything
19:31:04 <clarkb> fungi: ok I'll coordinate with you tomorrow ?
19:31:14 <fungi> wfm, i'm around all day
19:31:24 <fungi> minus a brief errand at some point
19:31:46 <clarkb> #topic gitea01 backups
19:32:20 <clarkb> ianw: any word from this on the vexxhost side of things? I had issues copying files from review-test to review02 to record gerrit user cleanups over ipv6 the other day too
19:32:24 <clarkb> I had to use a -4 flag
19:33:34 <ianw> no, apparently an issue was opened but i haven't received any response to a ping about it last week
19:34:15 <ianw> i haven't seen mnaser around much even in the vexxhost channel, so it probably is not an effective communication mechanism
19:34:29 <ianw> we also rebooted without that helping
19:34:31 <clarkb> alright, not much we can do other than dropping AAAA records in dns and I'm not sure we're there yet. Maybe we should do that for the backup servers though to make this particular issue go away?
19:34:52 <clarkb> I guess it would be server not servers
19:35:25 <ianw> yeah it works to rax
19:35:44 <fungi> so far all we know is it's broken between different vexxhost regions, right?
19:35:56 <clarkb> fungi: and review-test to review02 in similar fashion
19:36:03 <ianw> so we're just getting 24 hour dumps, instead of 12 hour dumps, so probably not super critical
19:36:05 <fungi> that lends further credence to the stale route entry in one of the core routers
19:36:13 <fungi> theory
19:37:43 <clarkb> anyway this continues to be non critical, but I don't want to forget about it. Hopefully vexxhost is able to dig into it
19:37:56 <clarkb> #topic PTG Participation
19:38:09 <clarkb> I submitted PTG participation for us
19:38:18 <fungi> thanks!
19:38:20 <clarkb> Signed up for 14:00 - 16:00 UTC Wednesday October 20, 2021
19:38:36 <clarkb> That block seemed to be popular last time and the others were not.
19:38:55 <clarkb> #link https://openinfra-ptg.eventbrite.com/ Register if you plan to attend
19:39:29 <clarkb> If you can make that time block to help answer questions or drive discussion that would be great. But I know it is pretty terrible for ianw in particular. But also fairly early for west coast usa
19:39:46 <fungi> it's no sweat for me at least
19:39:51 <clarkb> I plan to be there. If you can't make it that isn't a big deal as the idea is to help others with opendev questions more so than for us to use the time to collaborate directly
19:39:58 <fungi> now i just need to remember to register
19:41:05 <clarkb> #topic Open Discussion
19:41:20 <clarkb> Over the last day and a half I've done a big push on gerrit account email conflict cleanups
19:41:31 <clarkb> I think those went well and we are now down to 103 remaining conflicts
19:42:01 <fungi> that's amazing
19:42:09 <fungi> and soon 102 right?
19:42:14 <clarkb> I've put an audit result yaml file and a set of proposed next cleanups on review02 in ~/clarkb/gerrit_user_cleanups/notes if anyone else is able to take a look at those.
19:42:30 <clarkb> fungi: it was 105 yesterday and I got it to 103 today after cleaning up dpawlik's account and a straggler from yesterday
19:42:34 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/801667
19:42:44 <ianw> that's a review to update restart flags to be more consistent
19:42:44 <fungi> i haven't looked yet, do you recall how many the new batch will knock out?
19:42:50 <clarkb> fungi: its about 70
19:42:59 <fungi> that's a substantial chunk
19:43:00 <clarkb> fungi: I think we've whittled it down to about 30 where direct reach out is a good idea
19:43:43 <corvus> is anyone else planning on reviewing the matrix changes?  i believe we're about ready to run 2 bots, and with those landed, we could start prepping to move the zuul project over, but the changes aren't getting much in the way of votes.
19:44:08 <clarkb> corvus: I mentioned them at the beginning of the meeting and asked for reviews. I think I've reviewed them, but would be good for someone other than myself to review them if possible
19:44:35 <clarkb> For the remaining ~30 accounts I'd like to push a change that fixes them and gets verified by gerrit directly
19:44:44 <clarkb> we'll see what that looks like when I get to ~30
19:44:47 <corvus> yeah, i'm looking for a review volunteer :)
19:46:03 <clarkb> ++ would be good to have another reviewer
19:46:15 <fungi> i intend to try but am cautious not to overcommit and don't want the changes to be held up waiting for my feedback
19:46:22 <clarkb> #link https://review.opendev.org/q/topic:matrix matrix bot changes
19:47:13 <clarkb> Oh and yuriys jumped on OFTC today and started talking about reviving the inmotion cloud. Sounds like there was a network card problem that caused an interface to drop which cascaded to sad rabbitmq
19:47:41 <corvus> fungi: does that mean we should waive the 2x+2 requirement?
19:47:46 <clarkb> restarting services seems to make things happier, but we're going to take the opportunity here to upgrade the operating system for kernel patches and update to newer openstack kolla docker images as well
19:48:08 <corvus> just so we're clear, these changes have been sitting out there for about 3 weeks, so at this point, i don't think i'm pushing an overly aggressive timeline
19:48:09 <fungi> corvus: it means i don't want to say i'm going to review it and then have people hold the changes open waiting for me even if they have enough reviews to merge
19:48:50 <fungi> but i'd also be fine merging them with a core reviewer proposing and another core reviewer who isn't me approving
19:49:21 <clarkb> in that case maybe we can proceed with the eavesdrop bot landing. Then corvus and I can review tristanC's gerritbot change
19:49:30 <fungi> yeah, no objection
19:49:42 <fungi> i'm still happy to help supporting it even though i haven't had time to review
19:49:46 <clarkb> ianw: would you like to review any of those or are you good with proceeding?
19:50:23 <corvus> there was a previous +2 from mordred on those too; i imagine it's not easy for mordred to keep up with updates since then.
19:50:55 <fungi> but very much appreciated!
19:53:01 <clarkb> I think that may be it. ianw  let us know if you want to review topic:matrix but we'll probably proceed later today/tomorrow if we don't hear otherwise
19:53:08 <clarkb> Thank you everyone!
19:53:20 <corvus> clarkb: thanks!
19:53:25 <clarkb> I'm about to grab lunch but then will be back to send that service coordinator email and review all the things
19:53:40 <clarkb> #endmeeting