19:01:12 #startmeeting infra
19:01:12 Meeting started Tue Jul 6 19:01:12 2021 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:12 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:12 The meeting name has been set to 'infra'
19:01:18 #link http://lists.opendev.org/pipermail/service-discuss/2021-July/000264.html Our Agenda
19:01:30 The agenda went out a bit late due to yesterday's holiday observance but we do have an agenda :)
19:01:35 #topic Announcements
19:02:06 July 18 the Gerrit server will be upgraded. Update your firewall rules now if you need to do that (details are on the service-discuss mailing list)
19:02:26 I've also got this info going out on the foundation newsletter this week to try and spread the word
19:02:47 #topic Actions from last meeting
19:02:53 #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-06-29-19.01.txt minutes from last meeting
19:03:26 I had an action to talk to the openstack tc about the next steps for the ELK stack. I have done this and been asked to bring it up at the TC meeting on Thursday. I updated their agenda and plan to be there to discuss the subject
19:03:41 It sounds like the board has asked them for timelines and a few details which I think we can help with too
19:03:58 #action someone write spec to replace Cacti with Prometheus
19:04:13 I don't think this has happened yet. But it's possible I may have time for that this week looking at my current todo list. We'll see
19:04:19 someone is falling down on the job
19:04:46 it's times like this i'm glad to be no one
19:04:59 careful we can make you a somebody
19:05:03 oof
19:05:25 #topic Topics
19:05:32 #topic Gerrit Account Cleanup
19:05:48 This morning I retired 176 accounts that we identified as unused or unlikely to be used
19:06:14 yes!
19:06:19 * fungi throws a quick party
19:06:20 This was based on account activity, age, and the situation the conflicting account is in. In many cases we could see the accounts haven't been used in almost a decade or one account was used then another took over
19:06:46 That should leave us with about 80 accounts where the situation is more complicated and we'll try to reach out to users for those.
19:07:10 The next step for these 176 is to wait 2 or 3 weeks then once we've given it time for people to raise any alarms we can remove the conflicting external ids from the retired accounts
19:07:23 as usual, if folks complain their account has broken, we should probably start by grepping the id from the logs you saved, yeah?
19:07:24 I'd like to start reaching out to individuals in the ~80 remaining while we wait on that too
19:07:47 fungi: yes exactly. Then you should be able to revert the commit I pushed to refs/users/xy/abxy to set them back to the way they were
19:08:02 awesome. thanks for confirming
19:08:14 I'm 99% sure I tested this and it is only the external ids refs where gerrit will reject the changes. reverts to the refs/users/* refs are fine
19:08:51 I did have a small network hiccup when retiring users so I had to rerun my retirement script for one user. That is the only oddity in the logs
19:09:03 but the revert process doesn't change for that user
19:10:14 #topic Review Upgrade
19:10:25 This has been announced for the end of day July 18 UTC time
19:10:42 as mentioned before I've got this info going out on the foundation newsletter to help advertise it.
19:10:56 ianw: are there changes we should be reviewing now (like the SSHFP cleanups?)
19:10:56 this is the point in the meeting where i realize i'm actually away from home visiting extended family when the upgrade is scheduled to happen
19:11:15 but i'll try to be online anyway, their internet connectivity willing
19:11:17 yeah, if i could get eyes on
19:11:19 fungi: I expect we'll be fine.
ianw and I can be around
19:11:27 but the help is appreciated
19:11:35 #link https://etherpad.opendev.org/p/gerrit-upgrade-2021
19:11:48 and particularly the two pre-merge things
19:12:06 i've left a few notes on there, but will go over it more closely and check out the linked changes
19:12:37 ahh, thank you, i will go through comments again today
19:12:37 ya I've gone over it too, but should go over it again and make sure I've reviewed the changes and followed up on any comments to the process
19:13:43 One thought I had was that we should maybe land https://review.opendev.org/c/opendev/system-config/+/799225 to update our gerrit image and fix the lp blueprints integration then when we do the upgrade we'll know we have the same happy image
19:13:59 er I mean land that then do a quick restart on prod before the upgrade (like this week?)
19:14:08 #link https://review.opendev.org/799124 Good riddance to track-upstream and its cronjob
19:14:24 that's related, since the new server seems to be opaquely failing to actually run its cronjob
19:14:48 ianw: ^ if you don't think those two changes are a problem for pre merge activities maybe go ahead and add them to the etherpad? Or let me know and I can add them
19:15:19 ok, will look
19:15:51 i think maybe the switch of zuul to review01.opendev.org will require a zuul restart?
19:16:04 so it might be a good time to pull in the updated image too
19:16:43 yes updates to that portion of the zuul config will require a zuul restart
19:16:47 i can do that on my monday when it's quiet
19:16:52 ok
19:17:20 that gives it a week to bake in
19:17:36 might need to coordinate with corvus on zuul restarts as there is a lot of churn on the zuul side and we may have to do a full restart to safely update depending on the changes that land
19:17:46 (zuulv5 development is full steam ahead)
19:17:55 indeed it is!
19:18:41 Anything else on this topic?
19:19:05 not from me
19:19:28 #topic Draft matrix spec
19:19:58 A few of us had a call with Element Matrix Services (EMS) last week to discuss the possibility of using their hosted EMS platform
19:20:31 From what I understood they didn't have any problems with us doing the slightly hacky setup to only maintain admins and bot users on our server then have users use matrix.org or their own homeservers
19:21:04 i have a feeling they're happy to see open source communities making use of it
19:21:21 Their suggestion to us for next steps and getting started is to spin up a trial instance on the lowest tier of their service. Get things set up and start interacting with it. Then if we want to use the slightly more expensive silver version (they think this will be our best choice) we can upgrade to that painlessly
19:21:28 fungi: yup they were super helpful in talking us through this
19:22:02 I then talked to jbryce at the foundation about this and he didn't think this would be a problem. I need to coordinate with him to set up the account and configure the payment details. Hoping to do that this week
19:22:09 what does silver get us over the entry level?
19:22:40 or is silver the lowest tier paid option?
19:22:54 fungi: Nickel is the lowest option. Silver is second lowest
19:22:54 i see you said "trial" so i suppose that's time-limited?
19:22:59 https://element.io/pricing
19:23:11 aww, missed opportunity there was no "dime" to go with "nickel"
19:23:12 ya nickel is free for the first month as a trial then you start paying for it
19:23:28 my understanding over why silver would probably be best was simply a matter of scale
19:23:55 we would probably be ok with nickel for zuul but as we grow that can change
19:24:00 it is something to figure out as we go and interact with it
19:24:27 They also noted that if we don't want our instance hosted in sweden you have to pay for Gold or higher but we don't think that is a problem
19:24:31 oh hai
19:24:41 On our end the spec got updated to reflect the plan to try EMS
19:24:43 #link https://review.opendev.org/796156
19:24:44 i like sweden
19:24:52 corvus: hello, I was just recounting what we learned from our call with EMS
19:25:14 and noted that I talked to jbryce about it and he seemed happy with it. Now I need to coordinate with him to get an account created with appropriate payment details
19:25:16 when it says "active user" that means people with a @user:opendev.org address?
19:25:19 lgtm
19:25:37 ianw: yes, so basically, our bots/admin accounts
19:25:46 ianw: yes, and you can actually have a bunch of inactive versions of that in the system. They differentiate between actually active and you can log in and cold storage
19:26:31 (not sure the active vs inactive designation will end up being useful for us but it means you can create accounts pretty safely and then just be careful about what you activate)
19:26:58 i guess if we needed occasional-use admin accounts that could come up
19:27:09 clarkb: i think i have 2 questions: 1) how to proceed with spec approval? 2) do you want to wait for that before setting up the account/server, or go ahead and get started on that and we can start working on bots, etc?
19:27:35 corvus: I'd like to go ahead and get started with the account/server creation since we may learn something important doing that.
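Sketch (not from the meeting): the exchange above about "active user" comes down to which homeserver an ID lives on. A Matrix user ID has the form @localpart:server, so only IDs ending in our own server name would sit on the hosted instance, while matrix.org and other homeserver users would not. The helper names and the sample IDs below are illustrative assumptions, not part of any EMS or Matrix API.

```python
def homeserver_of(user_id):
    """Return the server-name part of a Matrix user ID such as '@bot:opendev.org'."""
    if not user_id.startswith("@") or ":" not in user_id:
        raise ValueError(f"not a Matrix user ID: {user_id!r}")
    # Split on the first colon only; the server name may itself carry a port.
    _localpart, server = user_id[1:].split(":", 1)
    return server


def local_accounts(user_ids, server_name="opendev.org"):
    """Keep only the IDs hosted on server_name (hypothetical helper)."""
    return [uid for uid in user_ids if homeserver_of(uid) == server_name]


# Hypothetical room membership: one bot on our server, two users elsewhere.
members = [
    "@gerritbot:opendev.org",
    "@alice:matrix.org",
    "@bob:example.org",
]
print(local_accounts(members))  # ['@gerritbot:opendev.org']
```

Under the setup discussed here, a list like the one printed would be expected to stay short: bots plus one or two admin accounts.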
19:27:42 fungi: yeah, i sort of feel like a single admin account that's either used by a bot or one of us manually when required will probably be fine for things like setting up rooms, etc.
19:28:13 agreed, from what little i know so far
19:28:15 corvus: but then once the server is up and spec updated to accommodate any new info and no major issues pop up I think we can land the spec?
19:28:23 clarkb: cool; everything about this can be reversed easily right up until we ask people in #zuul to move, so we have a lot of leeway if we're okay being casual about it.
19:28:24 corvus: maybe give the trial a couple of weeks and then land the spec?
19:28:32 corvus: good to know
19:28:44 also mordred moved homeservers recently and that seems to have gone reasonably well
19:29:06 yeah, mordred also did a test of moving a room from his old homeserver to the new; even that worked without a hitch
19:29:12 I moved a channel I'd created ... yeah that ^^
19:29:29 now - moving homeservers was a bit more involved and didn't really transition state
19:29:42 I think I'm mostly worried about finding something about EMS that is a deal breaker for us and deciding we need to run it ourselves which will have a big impact on the spec
19:29:50 but once we are past that step I think we should land the spec
19:29:56 s/step/concern/
19:29:58 i keep meaning to set up one since i have a private inspircd with some semi-used channels i'd like to add a bridge for eventually, and could have a vanity username that way too
19:30:02 from a user account pov - basically I had @mordred:waterwanders.com and I created @mordred:inaugust.com then invited @mordred:inaugust.com to anything that @mordred:waterwanders.com was in
19:30:36 but moving the room from waterwanders homeserver to inaugust homeserver went amazing
19:30:57 It does seem like the plan is congealing which is nice.
I'll have to review the spec properly to indicate that
19:31:01 I believe if we have a homeserver with EMS and we decide at a point in the future we need to run it ourselves they can work with us to export the data and do an actual move
19:31:06 i was literally like "what's mordred talking about, this room is on inaugust" oh yeah, it didn't use to be. ;)
19:31:12 i guess the irc bridges work by emulating an irc server and networking with existing servers on the same irc network?
19:31:39 that seemed like a lot to ask them for just my little homeserver, so I didn't do it :)
19:32:19 fungi: i'm not 100% sure; but https://github.com/matrix-org/matrix-appservice-irc is apparently the software
19:32:33 fungi: I suspect that the integration is less coupled than that. They probably get connection limit exceptions for their bridge in the network then just emulate being a bunch of clients
19:32:37 ahh, thanks! i'll give that a thorough look
19:33:06 Alright anything else to talk about on this subject?
19:33:46 oh 1 thing
19:33:59 feel free to sign up to make a replacement irc bot (see the spec)
19:34:14 eavesdrop/statusbot are available (tristanC wrote a gerritbot)
19:34:26 meetbot too, but is not needed for zuul
19:34:37 clarkb: otherwise, next steps seem clear to me
19:34:41 and for eavesdrop we may not even need channel logging if we can just grab those directly from the matrix server
19:34:48 is there still any benefit to merging the bots into a single codebase?
19:35:12 fungi: I'm not sure I know enough to say at this point :)
19:35:29 at a minimum we ought to at least shoot for significant code reuse
19:35:37 clarkb: probably the easiest way to do that though is to have a bot account join a room
19:35:47 corvus: ah makes sense since that is the "api"
19:35:58 ultimately, i think having something writing an html file to disk is still useful for search engine indexing
19:36:20 (so even if an individual user can use the in-client search feature, if we value the indexing, we should have an eavesdrop bot)
19:36:31 (if we don't value search engine indexing, then, er, maybe we drop that :)
19:36:50 yeah. history is great - but having browsable html archives has been nice so far
19:36:53 it's always handy to be able to link to a prior conversation
19:37:15 yeah. I mean - you can do web links to matrix history - but they take you to a matrix client
19:37:33 oh in that case ++ to having separate html
19:37:35 so yeah, that's a neat feature, but separately the plain html archive is still nice
19:37:35 and that's maybe a bit heavy for wanting to reference in some other context
19:37:43 ++
19:38:37 We have a few more topics to get to so let's move on
19:38:41 #topic Gitea01 Backups
19:38:46 https://matrix.to/#/!eitSLAJcQKeehLruKf:matrix.org/$AB4j0_Z_HOn_pOAWfXdHSI-eHvDE_MMlJD2za1-kspU?via=matrix.org
19:39:14 seems the network connectivity between vexxhost regions is still broken
19:39:16 These continue to not be urgent to fix because we haven't done any recent project renames. However, ianw was looking at the issue and we plan to do renames in a few weeks
19:39:31 fungi: ianw: want to fill us in on what you found?
19:39:58 short story is that some ipv6 addresses in sjc1 can't communicate with some ip addresses in ca-ymq-1 over some protocols
19:40:10 yeah, there was that
19:40:17 ipv4 is fine though?
19:40:22 but the backup does seem to be daily running. i guess it's falling back to ipv4?
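Sketch (not from the meeting): the IPv4 fallback guessed at above can be illustrated as a client that tries each resolved address in turn, IPv6 first, and moves on when one fails. This is a simplified illustration using only the Python standard library; it is not a description of how OpenSSH or the backup tooling actually behaves, and the function names are made up for the example.

```python
import socket


def tcp_connect(family, sockaddr, timeout=5.0):
    """Open a TCP connection for one address; raises OSError on failure."""
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(sockaddr)
    except OSError:
        sock.close()
        raise
    return sock


def connect_with_fallback(candidates, connect=tcp_connect):
    """Try each (family, sockaddr) in order and return the first success."""
    errors = []
    for family, sockaddr in candidates:
        try:
            return connect(family, sockaddr)
        except OSError as exc:
            # e.g. "No route to host" from an unreachable IPv6 path
            errors.append(f"{sockaddr[0]}: {exc}")
    raise OSError("all addresses failed: " + "; ".join(errors))


def candidates_for(host, port):
    """Resolve host and list IPv6 addresses ahead of IPv4 ones."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # False sorts before True, so AF_INET6 entries come first.
    ordered = sorted(infos, key=lambda ai: ai[0] != socket.AF_INET6)
    return [(family, sockaddr) for family, _t, _p, _c, sockaddr in ordered]
```

A caller would use `connect_with_fallback(candidates_for(host, 22))`. A sequential loop like this only helps when the IPv6 attempt fails quickly (as with a host-unreachable reply); if packets were silently dropped, each attempt would wait for the full timeout first.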
19:40:30 as if flows are being load balanced between routers at layer 4 and at least one router has a stray route with a greedy/short prefix
19:40:42 ianw: well it is emailing us about the failures as recently as yesterday
19:40:49 the file-system backup. but then the db part fails
19:40:53 oh got it
19:41:08 I think well behaved applications are expected to fall back to ipv4 if v6 doesn't work
19:41:12 i found some dump options that looked promising, but i have to admit i got sidetracked on the ipv6
19:41:13 openssh is probably well behaved in this way
19:42:03 fungi: you double checked the ipv6 thing right?
19:42:26 yeah, if memory serves, the !h is being returned by the first hop in ca-ymq-1
19:43:00 which leads me to suspect the core routing in that region, but it's all a black box to me. mnaser was looking into it
19:43:39 i'll ping again, i mentioned something at the end of last week
19:43:51 sounds good.
19:44:04 #topic Gerrit project renames
19:44:12 #link https://review.opendev.org/797990 rename playbook updates
19:44:32 That is a change we'll need to have in before we do renames. We're still a few weeks away from our week after the gerrit server upgrade so not in a rush but wanted to point it out
19:44:45 If we can get that reviewed that would be great
19:45:02 And if you know of any additional renames that should be considered now is the time to get them on the list
19:45:40 all of osf/* should probably be renamed to openinfra/* but that's not urgent and i still need to talk to folks at the foundation about it
19:45:47 fungi: ok
19:45:58 I'll mention the rename scheduling to the TC on Thursday too
19:46:06 good call
19:46:38 #topic Should we change our meeting time?
19:47:03 It occurred to me that this meeting time is not very good for ianw (at least I don't think it is) and ianw is one of the primary participants.
19:47:19 frickler mentions that they lurk the meetings to follow along even though they don't actively participate
19:47:39 I wanted to put it out there that I'm ok with trying to find a time that works better for others if that would be helpful
19:47:44 i'm happy to do whatever time others would prefer
19:47:51 * fungi has no life
19:48:27 though also i expect to miss the next meeting, possibly the next two
19:49:10 frickler will miss the next three as well and mentioned that may be a good time to experiment with non EU timezone friendly meetings
19:49:22 i don't mind, in (my) summer the meeting moves 6-7am which is perfectly fine. 7-8am usually involves family, and at the tail end of 8-9 i have to do school run
19:50:14 so from my POV earlier is better
19:50:57 ok I wanted to double check. Since we have had changes to who participates and could move things around if it helps
19:51:07 Sounds like the current time slot is fine though and we can keep it as is
19:51:41 That was all I had
19:51:43 #topic Open Discussion
19:51:48 Anything else before our hour is up?
19:52:05 if i could get a couple of eyes on
19:52:06 #link https://review.opendev.org/c/opendev/system-config/+/798400
19:52:16 that adds paste to ansible
19:52:36 ++ I'll add that to my afternoon todo list.
19:52:37 what i'm really interested in is the mariadb container behind that, same bits as for gerrit
19:53:08 would be nice to have a bit more experience with it
19:54:43 Last call :) Otherwise I'll let everyone go find breakfast/lunch/dinner
19:55:12 #endmeeting
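Sketch (not from the meeting): the refs/users/xy/abxy shorthand used under the Gerrit Account Cleanup topic matches Gerrit's sharded layout for per-account refs in the All-Users repository, where the shard directory is the last two digits of the numeric account ID. The helper below only builds that ref name; it is not the retirement script mentioned in the log, and the account IDs are made up.

```python
def user_ref(account_id):
    """Return the All-Users ref name for a numeric Gerrit account ID."""
    if account_id < 0:
        raise ValueError("account IDs are non-negative integers")
    shard = account_id % 100  # last two digits of the ID, zero-padded below
    return f"refs/users/{shard:02d}/{account_id}"


print(user_ref(1000096))  # refs/users/96/1000096
print(user_ref(7))        # refs/users/07/7
```

Knowing the ref name is what makes the "grep the id from the logs, then revert the commit on that ref" recovery path described in the meeting straightforward to follow.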