19:01:20 #startmeeting infra
19:01:20 Meeting started Tue Jul 27 19:01:20 2021 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:20 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:20 The meeting name has been set to 'infra'
19:01:26 #link http://lists.opendev.org/pipermail/service-discuss/2021-July/000270.html Our Agenda
19:01:37 #topic Announcements
19:01:43 o/
19:01:48 I have none (though the first real agenda item is likely to produce one)
19:02:02 #topic Specs Approval
19:02:15 Just a note that I did approve the matrix spec last week
19:02:41 There is a series of changes that can be found at topic:matrix to start running an eavesdrop and gerritbot equivalent against our test matrix channel
19:02:58 I think I'm mostly up to date on reviews for those but reviews from others are always appreciated as well
19:03:30 wake up, neo!
19:03:46 #topic Actions from last meeting
19:03:53 #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-07-20-19.01.txt minutes from last meeting
19:03:57 #action someone write spec to replace Cacti with Prometheus
19:04:15 Keeping that there as a reminder that we intend on trying to do that. I'm hopeful that maybe next week I'll be able to start poking at that though
19:04:25 as well as maybe start swinging back around on the list server upgrades
19:04:32 promethean cactus
19:04:49 #topic Topics
19:04:51 Time to dive in
19:05:01 #topic Service Coordinator Election
19:05:06 * fungi loves the topics topic
19:05:29 is it that time of half-year again?
19:05:42 It occurred to me that we must be getting close to when I said we should do our next service coordinator election and sure enough I had missed when I said we should open nominations by a couple weeks :/
19:05:56 the original plan was to open nominations about 2 weeks ago then have an election next week if necessary
19:05:57 shucks
19:06:09 i guess you're it for another 6 months then
19:06:38 stick a crash-test dummy in a chair and nominate it
19:06:44 I'm proposing that we open nominations now for two weeks then have an election from August 10-17 if necessary
19:07:08 If that seems reasonable I'll go ahead and send email to service-discuss telling people to get on the nominations
19:07:20 And figure out some maths for 6 months from now and try harder to not miss it again
19:07:34 wfm, this is one of those "hot potato" elections where we all try to figure out how to be "not it"
19:07:41 (I think in my head elections happening August 1 meant that nothing needed to be done until August)
19:07:58 fungi: ianw corvus ya one of yall should take the hot potato :)
19:08:44 I don't hear any complaints with that plan. Feel free to bring them up over the next hour or two as I need to finish this meeting and eat lunch before I send that email
19:08:55 But if I don't hear anything until then I'll proceed with the proposal above
19:09:12 yeah, i'll have to decide between getting nothing done for the next six months or getting even less done ;)
19:10:06 #topic Review Upgrade
19:10:32 I mentioned this in #opendev earlier today, but I had privs bumped up temporarily to do some account cleanups and took advantage of that to check melody
19:10:48 melody reports the new server has peaked at 84GB of memory use. I believe that is below the upper limit we have given it.
19:11:23 I think this gives us a few opportunities. First we can increase the size of our caches to fill more space.
We can also reduce the heap limit provided to the jvm and give more memory to apache and the kernel caches
19:11:41 Neither seems super urgent right now but as we gather more of this real world data we can start to think about how to better take advantage of the new server
19:11:43 yeah iirc we set the java heap limit at 96gb
19:11:56 might want to check again just before we restart for the renames just to see if the needle has moved
19:12:04 fungi: ++
19:12:19 (because we'll be resetting it at that point)
19:12:56 It is also a good double check on the assertion we needed more memory
19:12:58 is there a formal plan somewhere yet for the rename maintenance i can add that?
19:13:05 This is about double what we were able to provide the server before so it definitely wanted more
19:13:19 fungi: not yet, that is on the agenda for later in the meeting to start getting all that together
19:13:25 no sweat
19:13:50 I think the other big thing related to the review upgrade is cleanup of the old review01 server and review-test server
19:14:08 we also probably have some clearance between the 96gib allocated to the jvm and the 125gib allocated to the vm but should see if we can spot how that's being spent
19:14:32 fungi: yup cacti should hopefully give us that info?
19:14:35 ianw: mordred and myself have confirmed that we believe we have all the info we want from review-test now. I think we are waiting on fungi to confirm as well.
19:14:53 oh, jeez, responsibilities
19:15:02 ianw: I bring up review-test because I suspect cleaning it up may conflict with cleaning up review01. Do we want to work together on that?
19:15:34 I know you mentioned waiting for frickler_pto to return from pto before doing those cleanups as well so this probably isn't urgent.
Just let me know how you want to proceed on that and I'll try to avoid stepping on toes
19:16:22 there's nothing but dotfiles/dotdirs in my homedir now, so nothing i care to keep
19:16:24 ok, i think i proposed a change to remove review-test bits?
19:16:33 ianw: oh maybe you did and I should go review it
19:16:44 #link https://review.opendev.org/c/opendev/system-config/+/801556
19:17:01 great sounds like we're ready to land ^ if it passes review. I've added it to the afternoon review list
19:17:14 er, i mean nothing but dotfiles/dotdirs in my homedir on the old review server, but same goes for review-test too
19:17:38 yeah, i assume everyone but frickler_pto has cleaned up on review01 too?
19:17:40 fungi: I called you and mordred out on review-test because you were both involved in that server (along with myself) for various tasks
19:17:58 ianw: I got what I know is important on review01 from emory. But should do a skim to double check I'm not forgetting anything
19:18:03 also getting rid of review-test will finally silence the cronjob errors i keep getting e-mailed about, so thumbs-up
19:18:21 memory?
19:18:30 ya sorry memory. I knew I wanted the gerrit user cleanup records
19:18:48 as it will help us go back and do surgery if we need to for any of these accounts in the future
19:18:50 just making sure there wasn't someone new named emory i was unaware of
19:19:01 i did take a week off, after all
19:19:51 #topic Project Renames
19:20:07 there's just the one so far, right?
19:20:19 We've said we'll do these July 30 at 15:00UTC. I sent email to service-announce announcing this
19:20:23 and we're aiming to start at 15z friday?
19:20:26 cool
19:20:27 and yup there is a single rename.
19:20:43 having just one is probably better for testing this out anyway
19:20:44 As part of prep work I've been working on a change to test our rename playbook
19:20:46 #link https://review.opendev.org/c/opendev/system-config/+/802112 Test the rename playbook
19:20:54 since it'll be the first rename after our major 3.x upgrade
19:20:57 fungi: ++
19:21:14 I don't think this testing change needs to land before we do the renames since it seems to show that the playbook is working as is
19:21:41 but it's helpful to have it as a demonstration we expect this to work
19:21:43 There are some weird test framework integration things to sort out. I do intend on making that cleaner and merging the change. Just pointing out it isn't on our critical path here
19:21:51 yup
19:22:51 Additional prep work that needs doing: 1) review the rename change and ensure it is mergeable 2) push a change to opendev/project-config to record the rename 3) write up a plan in an etherpad
19:23:22 for 1) having a couple people do that is good and I intend on being one of them. Does anyone want to do 2) and 3)?
19:24:07 I can work on 2) and 3) tomorrow.
19:24:18 if someone else wants to do it let me know :)
19:24:29 out of curiosity, does 802112 actually confirm the group rename worked?
19:24:49 fungi: it confirms that the group rename doesn't explode, but doesn't actively check the new group is what is in gerrit
19:24:57 it just dawned on me to look and while you include a group rename there i don't see it assert the new name exists
19:25:02 fungi: we can pretty easily add a testinfra test to check that though
19:25:12 yeah, okay.
not super important for now
19:25:26 just making sure i wasn't misreading
19:25:48 fine as a todo for us down the line
19:26:04 The other thing to be aware of is that corvus has some changes up to zuul to change how zuul project secrets are managed
19:26:18 If those changes land before friday we'll need to update our process for renaming a project I think
19:26:33 I'm hoping that don't land this week :) but be aware of that if they do
19:26:42 s/that/they/
19:26:50 i can't recall, do we actually restart zuul during the renames maintenance?
19:26:56 fungi: we do not
19:27:00 i'm not planning to push for those this week
19:27:08 then it's really if those land *and* we've restarted the scheduler?
19:27:19 fungi: yes
19:27:22 k
19:27:40 i'd like to restart soon and make a release and land those afterwards. but even if they do land, it's not a huge deal
19:27:58 also, if the projects being renamed don't have secrets, it doesn't matter at all
19:28:05 ya it's not a huge deal just need to maybe comment out the zuul stuff in the playbook when we run it and manually execute the tools to do it this time around (or update the playbook)
19:28:06 and the alternative would be to script changes via zkshell?
19:28:22 fungi: the changes from corvus include new tools to use instead of zkshell
19:28:35 oh, right i sort of remember that now ;)
19:28:46 zkshell is an option but probably one to avoid
19:28:52 corvus: oh that is a good point
19:28:59 so we'd dump, rename, then import?
19:29:01 I'm not sure if this project has secrets but I can double check
19:29:17 fungi: or if the copy and delete commands land too then we can copy then delete
19:29:17 (and cleanup the old copies in zk afterward)
19:29:29 yeah, that
19:29:34 okay
19:29:39 I really don't expect it to be a major issue, but did want to call it out as I'm not sure how closely others are following that in zuul
19:30:02 so anyway, the lingering question is who will sign up for the rename prep 3 tasks you outlined
19:30:09 Other than all that I think we check in Thursday and make sure we're ready to go as far as changes being pushed and mergeable and a documented plan goes
19:30:27 fungi: ya I think I have time for them tomorrow if no one else gets started earlier
19:30:42 i'll definitely do #1, i guess i can also volunteer for #2 if someone else will take #3 (or vice versa if there's a preference)
19:30:45 and we can check in thursday to make sure we aren't missing anything
19:31:04 fungi: ok I'll coordinate with you tomorrow?
19:31:14 wfm, i'm around all day
19:31:24 minus a brief errand at some point
19:31:46 #topic gitea01 backups
19:32:20 ianw: any word from this on the vexxhost side of things? I had issues copying files from review-test to review02 to record gerrit user cleanups over ipv6 the other day too
19:32:24 I had to use a -4 flag
19:33:34 no, apparently an issue was opened but i haven't received any response to a ping about it last week
19:34:15 i haven't seen mnaser around much even in the vexxhost channel, so it probably is not an effective communication mechanism
19:34:29 we also rebooted without that helping
19:34:31 alright, not much we can do other than dropping AAAA records in dns and I'm not sure we're there yet. Maybe we should do that for the backup servers though to make this particular issue go away?
19:34:52 I guess it would be server not servers
19:35:25 yeah it works to rax
19:35:44 so far all we know is it's broken between different vexxhost regions, right?
19:35:56 fungi: and review-test to review02 in similar fashion
19:36:03 so we're just getting 24 hour dumps, instead of 12 hour dumps, so probably not super critical
19:36:05 that lends further credence to the stale route entry in one of the core routers
19:36:13 theory
19:37:43 anyway this continues to be non critical, but I don't want to forget about it. Hopefully vexxhost is able to dig into it
19:37:56 #topic PTG Participation
19:38:09 I submitted PTG participation for us
19:38:18 thanks!
19:38:20 Signed up for 14:00 - 16:00 UTC Wednesday October 20, 2021
19:38:36 That block seemed to be popular last time and the others were not.
19:38:55 #link https://openinfra-ptg.eventbrite.com/ Register if you plan to attend
19:39:29 If you can make that time block to help answer questions or drive discussion that would be great. But I know it is pretty terrible for ianw in particular. But also fairly early for west coast usa
19:39:46 it's no sweat for me at least
19:39:51 I plan to be there. If you can't make it that isn't a big deal as the idea is to help others with opendev questions more so than for us to use the time to collaborate directly
19:39:58 now i just need to remember to register
19:41:05 #topic Open Discussion
19:41:20 Over the last day and a half I've done a big push on gerrit account email conflict cleanups
19:41:31 I think those went well and we are now down to 103 remaining conflicts
19:42:01 that's amazing
19:42:09 and soon 102 right?
19:42:14 I've put an audit result yaml file and a set of proposed next cleanups on review02 in ~/clarkb/gerrit_user_cleanups/notes if anyone else is able to take a look at those.
19:42:30 fungi: it was 105 yesterday and I got it to 103 today after cleaning up dpawlik's account and a straggler from yesterday
19:42:34 #link https://review.opendev.org/c/opendev/system-config/+/801667
19:42:44 that's a review to update restart flags to be more consistent
19:42:44 i haven't looked yet, do you recall how many the new batch will knock out?
19:42:50 fungi: it's about 70
19:42:59 that's a substantial chunk
19:43:00 fungi: I think we've whittled it down to about 30 where direct reach out is a good idea
19:43:43 is anyone else planning on reviewing the matrix changes? i believe we're about ready to run 2 bots, and with those landed, we could start prepping to move the zuul project over, but the changes aren't getting much in the way of votes.
19:44:08 corvus: I mentioned them at the beginning of the meeting and asked for reviews. I think I've reviewed them, but would be good for someone other than myself to review them if possible
19:44:35 For the remaining ~30 accounts I'd like to push a change that fixes them and gets verified by gerrit directly
19:44:44 we'll see what that looks like when I get to ~30
19:44:47 yeah, i'm looking for a review volunteer :)
19:46:03 ++ would be good to have another reviewer
19:46:15 i intend to try but am cautious not to overcommit and don't want the changes to be held up waiting for my feedback
19:46:22 #link https://review.opendev.org/q/topic:matrix matrix bot changes
19:47:13 Oh and yuriys jumped on OFTC today and started talking about reviving the inmotion cloud. Sounds like there was a network card problem that caused an interface to drop which cascaded to sad rabbitmq
19:47:41 fungi: does that mean we should waive the 2x+2 requirement?
19:47:46 restarting services seems to make things happier, but we're going to take the opportunity here to upgrade the operating system for kernel patches and update to newer openstack kolla docker images as well
19:48:08 just so we're clear, these changes have been sitting out there for about 3 weeks, so at this point, i don't think i'm pushing an overly aggressive timeline
19:48:09 corvus: it means i don't want to say i'm going to review it and then have people hold the changes open waiting for me even if they have enough reviews to merge
19:48:50 but i'd also be fine merging them with a core reviewer proposing and another core reviewer who isn't me approving
19:49:21 in that case maybe we can proceed with the eavesdrop bot landing. Then corvus and I can review tristanC's gerritbot change
19:49:30 yeah, no objection
19:49:42 i'm still happy to help supporting it even though i haven't had time to review
19:49:46 ianw: would you like to review any of those or are you good with proceeding?
19:50:23 there was a previous +2 from mordred on those too; i imagine it's not easy for mordred to keep up with updates since then.
19:50:55 but very much appreciated!
19:53:01 I think that may be it. ianw let us know if you want to review topic:matrix but we'll probably proceed later today/tomorrow if we don't hear otherwise
19:53:08 Thank you everyone!
19:53:20 clarkb: thanks!
19:53:25 I'm about to grab lunch but then will be back to send that service coordinator email and review all the things
19:53:40 #endmeeting
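
Editor's note on the service coordinator election timeline: the proposal in the meeting was to open nominations now (July 27, 2021) for two weeks, hold an election from August 10-17 if necessary, and "figure out some maths for 6 months from now". The sketch below is only an illustration of that date arithmetic. The function names `election_schedule` and `add_months` are hypothetical, and counting six months from the day nominations open is an assumption; the meeting did not fix a date for the next cycle.

```python
import calendar
from datetime import date, timedelta


def election_schedule(nominations_open, nomination_days=14, election_days=7):
    """Return key dates for one cycle: nominations, then an election window."""
    nominations_close = nominations_open + timedelta(days=nomination_days)
    election_end = nominations_close + timedelta(days=election_days)
    return {
        "nominations_open": nominations_open,
        "nominations_close": nominations_close,
        # The election starts the day nominations close (August 10 here).
        "election_start": nominations_close,
        "election_end": election_end,
    }


def add_months(day, months):
    """Shift a date by whole months, clamping to the last day of the month."""
    month_index = day.month - 1 + months
    year = day.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


if __name__ == "__main__":
    schedule = election_schedule(date(2021, 7, 27))
    for name, value in schedule.items():
        print(f"{name}: {value.isoformat()}")
    # Assumption: the next cycle opens six months after this one opened.
    print("next_nominations_open:", add_months(date(2021, 7, 27), 6).isoformat())
```

With the dates from the meeting this gives nominations closing on 2021-08-10 and the election ending on 2021-08-17, which matches the proposed August 10-17 window. A calendar reminder built from the `add_months` result would be one way to avoid missing the next nomination period.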