19:01:09 #startmeeting infra
19:01:10 Meeting started Tue Jan 19 19:01:09 2021 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:11 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:14 The meeting name has been set to 'infra'
19:01:17 #link http://lists.opendev.org/pipermail/service-discuss/2021-January/000166.html Our Agenda
19:01:36 #topic Announcements
19:01:50 o/
19:01:59 I didn't have any announcements. Continuing on
19:02:05 #topic Actions from last meeting
19:02:11 #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-01-12-19.01.txt minutes from last meeting
19:02:21 iane had an action to check on wiki backups after the borg migration
19:02:25 ianw ^
19:02:28 typing is hard sometimes
19:03:00 ianw: has that happened or should we tack it back on (I know last week was crazy)
19:03:19 i don't believe it is backing up to borg, but i haven't fixed it, so yeah, leave it on
19:03:36 #action ianw confirm wiki is still backed up after bup to borg migration
19:04:01 and now we'll jump right into mnaser's topic to avoid a conflict
19:04:03 #topic General topics
19:04:09 #topic Discuss infra-core (on behalf of OpenStack)
19:04:21 we'll resume regularly scheduled agenda planning after this item
19:04:31 thanks. so it was recently brought up that andreas who's done a ton of infra reviews will not be able to help out as much
19:05:11 and the biggest load of project-config changes right now happen to openstack/project-config (i assume) and the opendev team is already a small one so .. wondering if it's an issue right now? the reviews that come in seems to be very little right now
19:05:27 mostly: how do we make sure we stay on top of this, i guess
19:05:35 active infra-core is roughly myself, fungi, ianw, corvus, and mnaser I think
19:05:52 and by active I mean active in the community but not necessarily with reviews on that particular project
19:05:53 er, project-config-core?
19:06:01 fungi: ya sorry
19:06:23 config-core is the general term we use for highlights iirc
19:06:34 andreas did a lot of things, but in addition to reviewing project-config changes that added or updated projects, he also reviewed zuul-jobs changes, especially ones that affected openstack's use of zuul.
19:06:59 and translations and docs tooling
19:07:14 so there's likely to be some "opendev maintenance" work shortfall as well as openstack tact work shortfall
19:07:30 yep those too
19:07:30 my concern on reviews was around small projects which were nurtured by infra historically, getting a review on these requires a lot of effort, as infra-cores do always have more urgent priorities. what can we do to improve?
19:07:31 but yeah, one thing which came up during the openstack tc meeting was that if we have some volunteers from the openstack community with bandwidth to review project-config changes and who can show some understanding of the material in there, we could likely expand the core review team for that fairly quickly
19:07:52 zbr: I think that is a separate but related concern (it has its own agenda item)
19:08:08 not sure if we want to combine them due to the bit of overlap (basically lack of resources overall being the overlap)
19:08:08 fungi: ++
19:08:24 clarkb: separate i think
19:08:43 mnaser: also it's not just openstack/project-config, as corvus points out there's zuul/zuul-jobs and also openstack/openstack-zuul-jobs (and to a lesser extent, opendev/base-jobs)
19:09:22 yes, i believe the concern here isn't around tooling but more around the project config reviews (tooling is another valid, but separate issue)
19:10:02 well, the bulk of the reviewing is job content, job definitions, job roles... and some project creation changes
19:10:04 I agree we could probably bootstrap a few interested individuals quickly. Maybe start by identifying who they are and ask them to do reviews with +1/-1 and they can reach out to us with questions or concerns with changes?
19:10:23 basically get them involved then once we've established comms and some trust we can bump up to +2?
19:10:26 mnaser: i also think the reviews he did on project-config are a little unique -- a *lot* of the technical stuff of that is actually automated or rigidly documented. most of what he did there was, i feel, a form of hand-holding folks through the process.
19:10:48 corvus: yes that was definitely a big part of what ajaeger was doing
19:11:02 that is in contrast to all the other repos which involve things like "knowing what tox is"
19:11:16 helping folks who have submitted config changes understand the job failures which arise from those
19:11:19 corvus: i agree, it was a lot of "enforcing" the process
19:11:51 and being aware of openstack policy (pti, guidelines in the project teams guide, release policies, et cetera)
19:12:11 and processes outlined in the infra manual
19:12:19 clarkb: i also agree that it's the easiest team to expand
19:13:27 and that the knowledge needed is a mix of community and tech.
19:13:49 i've tried to step up my reviewing of project-config and related repos in recent weeks, but that's also eating into my available time for other opendev work
19:14:08 maybe the thing to do then is write down these things that we've identified ajaeger was doing in an email and request volunteers from the openstack community. Then once we've identified people interested do our best to help them do reviews and gain confidence in the processes and tech?
19:14:51 sounds good to me clarkb.
19:15:07 that's also not a bad idea to try and find things we can reconsider/rework in 2020 that we haven't looked at in a long time because ajaeger was doing such a good job keeping things humming along
19:15:15 2021 even
19:15:38 i'm not so keen to go back and try 2020 again
19:15:49 ianw: ++ perspective from some new volunteers might help us identify those areas too
19:16:16 anyone want to volunteer to write that email?
I can probably do it though unlikely today
19:16:24 but probably this week
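
(For context, the "job content, job definitions, job roles" reviewing mentioned above is mostly Zuul YAML along the lines of the sketch below. The job and project names are hypothetical examples; the tox parent job and tox_envlist variable come from zuul/zuul-jobs, and real openstack/project-config changes are usually more involved than this.)

    # Hypothetical example of the kind of change project-config reviewers see:
    # a job definition reusing the zuul-jobs "tox" job, plus a project stanza
    # attaching it to the check and gate pipelines.
    - job:
        name: example-tox-docs
        parent: tox
        description: Run the docs tox environment for the example project.
        vars:
          tox_envlist: docs

    - project:
        name: example/example-project
        check:
          jobs:
            - example-tox-docs
        gate:
          jobs:
            - example-tox-docs
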
19:16:55 if one of us drafts it in an etherpad i suppose others can help flesh it out too
19:17:03 ++ to an etherpad
19:17:15 I'll go ahead and put myself down for it and if someone beats me to it they can just let the group know
19:17:25 as tact sig chair, i'll consider it my duty
19:17:36 you can look it over once i have a go at it
19:17:40 k
19:17:41 i'll bring up this discussion in the tc meeting on thursday too
19:18:14 mnaser: thanks for being on top of this :)
19:18:16 #action fungi Draft email describing ajaeger's tasks and asking for new volunteers to help with those tasks. Also open possibility for improving processes and tools
19:18:31 Anything else on this subject or should we move on?
19:18:32 no problem :) i'll try to look at project-config changes too on my side as well
19:18:38 that's it from me for this
19:18:42 #link https://etherpad.opendev.org/p/tact-sig-2021-rfh OpenStack TaCT SIG 2021 Request for Help
19:18:46 o/
19:18:48 i'll write something there
19:18:51 fungi: thank you!
19:18:56 #topic Priority Efforts
19:19:03 #topic OpenDev
19:19:23 A reminder that nominations for the Service Coordinator position are open.
19:19:28 #link http://lists.opendev.org/pipermail/service-discuss/2021-January/000161.html
19:19:33 Please let us know if you are interested
19:19:53 I was going to bring up the zuul status change list, but it looks like ianw got that landed and applied
19:20:11 ianw: anything else to bring up on the subject of the zuul status work in gerrit?
19:20:27 no, good team effort with clarkb getting it across the line, thanks :)
19:20:39 it looks great in gerrit too, thank you for writing that plugin
19:20:49 it looks great, thanks!
19:21:07 Zuul's gerrit WIP change support should be deployed as of this last weekend
19:21:12 discussion on the openstack-discuss ml thread about it suggested reviewing the zuul status plugin as a potential next candidate
19:21:28 fungi: another candidate to deploy in our installation?
19:21:50 right, the live test progress display in the gerrit webui change view
19:21:57 gotcha
19:22:01 fungi: that one might be a bit harder to get screen shots of in testing, but i think the framework for including it is fairly straight forward now
19:22:34 for WIP support has anyone tested that yet?
19:22:47 I believe we want to confirm that zuul will not enqueue a wip change to the gate
19:23:00 ianw: is non-voting handled?
19:23:08 (by handled i mean displayed)
19:23:54 i did just approve zbr's gerritbot change today to start ignoring wip changes, and start announcing them if they transition out of wip (there's a separate event for that now)
19:23:58 corvus: in the in-progress plugin, or our summary-status?
19:24:10 ianw: summary -- and yes, i see it now
19:24:21 shows up on the right with the rest of the time comment
19:24:32 fungi: oh um
19:24:35 fungi: can we not do that?
19:24:40 i really want to see wip changes announced
19:24:42 we can revert it
19:24:58 it was up for review for a while
19:25:00 corvus: yeah, i did think looking at that yesterday on a massive cinder change it could a) sort the NV together and b) show them somehow differently
19:25:20 fungi: well, i may be getting ahead of things here, but i don't want anything about gerritbot to change :)
19:25:41 sure, i'm pushing up a revert now and then we can discuss further
19:26:09 fungi: I think you tested if zuul ignored WIP changes in the gate previously. do you know if there is a good change to test that on?
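
(As background for the WIP question above, gate enqueueing is driven by pipeline requirements roughly like the simplified sketch below; this is an illustration, not the actual opendev pipeline definition. The behavior being tested is that Zuul's new WIP support should keep a work-in-progress change out of the gate even when these requirements are otherwise met.)

    # Simplified sketch of a Gerrit-driven gate pipeline; reporters omitted.
    - pipeline:
        name: gate
        manager: dependent
        require:
          gerrit:
            open: True
            current-patchset: True
            approval:
              - Workflow: 1
        trigger:
          gerrit:
            - event: comment-added
              approval:
                - Workflow: 1
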
19:26:16 and i realize we may be getting into personal preferences, but i have found it really useful when, say, we're having some convo in irc, and someone pushes up a wip patch, to go look at it right then.
19:26:53 I guess we can maybe WIP that revert and approve it and see what zuul does then unwip it and merge it?
19:27:15 that should sufficiently exercise the new zuul behavior
19:27:40 i've pushed up a revert and a revert of the revert, approving the former now and we can discuss it in the latter
19:27:51 ok
19:28:30 The last major opendev item I wanted to bring up is the gitea 1.13.1 upgrade. I've put this off as fires have crept up and needed attention. I think we're largely stabilized and I hope to approve the upgrade tomorrow (when I'll be able to monitor it)
19:28:37 #link https://review.opendev.org/c/opendev/system-config/+/769226
19:28:41 wip change on gerritbot was open for 2 months
19:28:59 the upgrade is reasonably well tested so I don't expect issues but the upgrade is fairly large in terms of new features and stuff in gitea
19:30:06 Any other opendev items or should we continue on?
19:30:44 clarkb: i think zbr may have had a wip change which was approved, and we saw zuul attempt to merge it and fail leaving a verified +2 behind... i don't recall which change it was off the top of my head though
19:30:49 corvus: in fact regarding WIP on gerritbot, i think that revert is not the best idea as it would be easy to enable announcement with two lines of code.
19:31:29 yep, we experimented with it and found that zuul was trying to merge a wip change and gerrit was not letting it do it.
19:32:03 fungi: to be clear, are you saying that happened *after* we restarted zuul with the wip change support, or before?
19:32:22 corvus: before
19:32:29 before the wip support change was written
19:32:35 okay, so that's a description of the behavior to be on the lookout for
19:32:41 right
19:33:11 zbr: on the gerritbot change, we can edit the revert of the revert to perhaps make it configurable by channel
19:33:23 zbr: i certainly don't oppose the option for gerritbot users to disable wip notices
19:34:00 do we really need it configurable?
19:34:16 if some channels want to see wip changes announced and others don't
19:34:26 i'm just volunteering that i'm happy with the current workflow, don't think it's spammy, and would miss functionality if it changed. i would probably run my own gerritbot or something like it to make up for the loss of functionality.
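
(To illustrate the per-channel idea, gerritbot's channel configuration is keyed by channel name with events, projects, and branches lists. The sketch below is hypothetical: the channel entry is made up and the wip-related event name is an assumption showing what opting a channel into WIP announcements might look like, not an existing option.)

    example-channel:                  # hypothetical channel entry
      events:
        - patchset-created
        - change-merged
        - wip-state-changed           # assumed event name: opt this channel into WIP notices
      projects:
        - opendev/gerritbot
      branches:
        - master
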
19:35:21 let's move on to ensure we get to the other items, but can get back to this discussion if we have time at the end of the meeting
19:35:25 #topic Update Config Management
19:35:30 current behavior is that it even misses announcing a change getting out of WIP, which is quite important IMHO
19:35:50 ianw has started work to ansible our afs servers
19:35:51 because that is the moment that marks "that is ready for review"
19:35:57 #link https://review.opendev.org/c/opendev/system-config/+/771268
19:36:09 this is related to the next topic which is openafs cluster updates
19:36:21 why don't we just jump ahead there so that meeting notes are a bit cleaner
19:36:26 there's been a flurry of openafs configuration management improvement yeah
19:36:30 #topic General Topics
19:36:40 #topic OpenAFS Cluster Status
19:36:53 i can give a quick summary
19:36:56 ianw: please do
19:37:06 probably better not to burn meeting time getting into the history of what's transpired, but current status would be great i think
19:37:14 yes, current summary sorry :)
19:37:30 so afs01/02.dfw and afs01.ord are now all running afs 1.8 from our ppa
19:37:46 we should have the changes in system-config to manage them via ansible, but when i left yesterday the CD jobs were blocked
19:37:52 (think that's fixed now)
19:37:55 so i will chase up on that
19:37:57 as are all clients (at least installed, many will run it on next reboot)
19:38:13 today, i'd like to upgrade afsdb01/02 in turn to 1.8, then we'll be 100% 1.8
19:38:24 i have the system-config change to manage them with ansible too, and remove the puppet bits
19:38:31 ianw: that would be great
19:38:39 this way we'll be able to respond to future bugs more directly
19:38:44 and jobs are using either patched 1.8.6 or 1.8.7 which has the patches officially applied
19:38:49 and as you mentioned it likely simplifies upgrading the servers
19:38:54 then i'd like to upgrade the hosts one-by-one to focal, in place. i don't think it's worth making new servers, especially for the db which are annoying to change ip addresses
19:39:16 ianw: yup we upgraded them in place previously so I expect that is still our best option
19:39:37 i had planned to start that with afs01.ord and see how it goes. this should be zero downtime as we can now do it one-by-one
19:39:57 that's about it
19:40:10 we're holding off the focal upgrades though until we have fully recovered from the storage outage on afs02.dfw
19:40:20 which is still in progress
19:40:21 there were two things I wanted to call out. the first is debian buster test nodes will use our bionic ppa package until debian fixes their distro packages
19:40:24 oh yes, sorry, after we're back in a regular release schedule
19:40:49 second is the openafs-client role seems to not try to install our ppa packages on xenial
19:40:53 by my estimate we've got full releases of roughly half the mirror volumes left to complete, everything else is back on track
19:41:12 evidence from the zuul executors which are our only xenial openafs-clients shows that we seem to be using the ppa there anyway
19:41:17 so it's progressing faster than i'd expected anyway
19:41:34 maybe someone else can look at that and we can decide if we should remove the xenial exclusion condition from the ansible role?
19:41:43 clarkb: huh, wasn't aware of that.
we can chase up after meeting, definitely sounds like a bug
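
(A minimal sketch of what installing the client from the PPA on all Ubuntu releases, xenial included, might look like in an Ansible role; this is not the actual system-config role and the PPA name is an assumption.)

    - name: Add the OpenAFS PPA           # PPA name assumed for illustration
      become: yes
      apt_repository:
        repo: ppa:openstack-ci-core/openafs
        state: present

    - name: Install the OpenAFS client packages
      become: yes
      apt:
        name:
          - openafs-client
          - openafs-krb5
        state: present
        update_cache: yes
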
19:41:48 one theory on the executors is that puppet installed the ppa config, and so it's lingered there after we switched them to ansible
19:41:54 ianw: cool I'll get links together for that after the meeting
19:42:04 fungi: is that true for ze12?
19:42:32 corvus: was it built after the ansibilification? if so then i guess that theory can be discounted
19:42:51 but even ze12 has a lot of ppa config crufy in place
19:42:52 i don't know, but worth looking into if we need to verify/falsify that
19:42:54 it does have the ppa installed and did update the package at least
19:42:58 er, cruft
19:43:13 but also maybe we don't care about the past and only the future in this case :)
19:43:17 on ze12 there are three different files all adding that same ppa
19:44:06 all of the afs servers also had some old cruft from when we tried something that added all the ubuntu repos twice
19:44:25 so maybe that's a larger general cleanup (the extra ppa configs)
19:44:55 we can follow up on that after the meeting just wanted to make people aware
19:45:00 #topic Picking up steam on Puppet -> Ansible rewrites
19:45:06 the ppa addition files on ze12 all date from may 2020
19:45:16 oops, sorry, will discuss after the meeting
19:45:28 no worries just want to get through the agenda :)
19:45:53 one thing I noticed when we looked at the openafs stuff last week is we have a number of servers still running xenial and for many of them not called zuul or review they are still running puppet
19:46:20 we did not solve puppet on bionic or newer which means that one of the things we need to keep in mind is xenial -> newer upgrades will want ansible (and maybe docker) on the config management side
19:46:48 This is an early call out that we need to start looking at this (work like ianw's openafs role is great!) so that we can also do server upgrades
19:47:06 there was renewed interest expressed in last week's storyboard meeting about finishing the docker deployment automation for sb at least
19:47:14 I'm hoping that if things remain calm I can start putting together an audit so that we have a todo list we can pick things off of and annotated with interest like ^
19:47:23 there is also interest from refstack folks
19:47:23 as that's blocking us from deploying a number of other fixes to production at this point
19:47:37 ++; things like docker for storyboard seem like something that is good for us, but good for everyone else too
19:47:51 #action clarkb put together audit/todo list for ansiblification and server upgrades
19:48:04 in the case of storyboard, we already have docker images publishing on each new change merged (thanks mordred!)
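
(As a rough illustration of the docker deployment automation being discussed for storyboard, a docker-compose service definition could look like the sketch below; the image name, port, and paths are assumptions rather than the actual published images or production layout.)

    version: '2'
    services:
      storyboard-api:
        image: opendevorg/storyboard:latest   # image name assumed for illustration
        restart: always
        ports:
          - "127.0.0.1:8080:80"
        volumes:
          - /etc/storyboard:/etc/storyboard:ro
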
19:48:12 I've got a few things I need to get done in my backlog but that's near the top of the next items to pull off
19:48:47 #topic two-review rule impact on low-activity projects
19:49:10 zbr added this to the agenda and this is the topic I was referring to during mnaser's topic as being related but separate
19:49:34 we've got a number of low activity projects like git review and bindep and gerritbot etc that don't get regular reviews
19:49:45 fwiw, we've already said it's okay to single-core approve stuff in these as long as we observe a quick-revert policy should concerns be expressed by another core reviewer later (as we've just done with gerritbot, for example)
19:49:50 a consequence of that is that it is hard to get two +2s to land changes
19:50:10 fungi: yes, I think the bigger issue is more a lack of time for any reviews on those changes due to higher priority issues/fires
19:50:33 at least for me it is rare that I'm able to make a chunk of time for reviews in those projects
19:50:55 yep, i've been trying to strafe some of them as i get time. i should probably convert a number of my solo +2's to approvals
19:51:10 i think we've always been flexible with this. i think we can continue to do so. i think that procedural/minor technical changes are especially good candidates for single-core reviews. i think substantial changes benefit from more reviews. again, using the gerritbot change as an example: more reviewers might have surfaced an alternate view.
19:51:54 for example, someone e-mailed an archlinux fix for bindep to openstack-discuss, i proposed it to gerrit for them in https://review.opendev.org/771108 and waffled on whether i should just approve it too
19:52:09 some things are pretty tightly coupled to our service (git review and gerritbot for example) so I'd be wary of putting them up to adoption.
19:52:16 but i'd also say that some of the low activity projects are low activity for a reason: in my view some of them are pretty close to "done" and i view them as being in maintenance mode.
19:52:24 projects like bindep are much less so (and we really only got bindep because no one else wanted it, it seems like)
19:52:32 i don't want git-review, gerritbot, and gear to change.
19:53:02 gerritlib is also in that space a bit
19:53:12 since jeepyb depends on it fairly heavily
19:53:25 (as does gerritbot)
19:53:49 I know some projects have relied heavily on priority lists (either generated by gerrit and a review priority category) or manually curated in things like an etherpad
19:54:07 we do get some good bug fixes for those from time to time, but also they tend to attract feature additions which are harder for me to justify as they almost always imply some scope creep
19:54:21 we could try something like that to not just surface changes but more publicly communicate what we think is important right now and maybe people will understand why other changes are being passed over
19:54:35 someone said software is done only when it is dead (nobody is using it) :p
19:54:49 I think the big win with something like ^ would be more to expose "this is important and keeping us from other work" more so than helping us get the other work done quicker
19:55:00 zbr: i recognize that's a common view of folks who get paid to write software, but i don't entirely agree with it
19:55:32 i use a ton of 'nix utilities day in and day out which haven't added a new feature in countless years
19:55:36 doesn't make them less useful
19:55:47 be aware that not reviewing patches from others does not help us get more people to join our effort
19:56:20 i think if people want to join our effort, then focusing work on where our priorities are would be best
19:56:30 and we have quite a few ways of conveying that
19:56:30 in fact it is quite an effective deterrent, and creates a vicious cycle
19:56:39 corvus: ya that's why I'm wondering if pointing at the priorities more clearly will help
19:56:48 "this is where you can help us" also "this is why we are busy"
19:57:04 zbr: yep, in my case i think i struggle to break the bad news to people that i'm not interested in approving their pet feature addition. if i harden my heart a bit maybe i can find more time to approve bug fixes in those projects and not worry about spending as much time explaining to contributors why their proposal doesn't fit with the project
19:57:54 clarkb: yeah, it looks like we need a link to infra-specs in https://docs.opendev.org/opendev/system-config/latest/project.html#priority-efforts
19:57:58 sometimes it's easier to just put off leaving a review on something i know i'm not going to approve, because i don't have time to write an essay explaining why i'm rejecting it
19:58:21 corvus: ya I think specs as well as current fires
19:58:29 fungi: yep, that is tricky, but it is still better in the long term to cut it short (but politely)
19:58:36 at least the last few months have felt more like fire fighting than proper maintenance work.
I'll think on how to expose that
19:58:46 maybe just start with a simple etherpad then look into automating it somehow
19:58:49 https://docs.opendev.org/opendev/infra-specs/latest/#priority-efforts
19:58:50 zbr: i totally agree
19:58:58 something i need to improve at personally
19:59:08 clarkb: i'd rather just see that kept up to date
19:59:21 i don't think we need more places to share our priorities
19:59:34 corvus: I agree keeping it up to date is a good thing, but I'm not sure it captures "here are the five things we're dealing with to unbreak afs"
19:59:45 which lately seems to be the type of thing consuming my time
19:59:45 i would be very upset if I spend 10 days on making something work and fail to deliver it, rather than just a couple of hours and being notified politely that that feature is outside the scope of the project.
20:00:05 i also do not mind if people have a change that seems stuck calling it out as a meeting topic. i have found that very useful over the years, and i think it's reasonable to expect some outcome from calling a specific change out in a topic
20:00:23 (as a time check we are at the end of our scheduled hour)
20:00:59 clarkb: ianw has tried to keep storyboard up to date for some efforts like that
20:01:17 corvus: good point, so maybe it's tagging in storyboard and making a dashboard out of that
20:01:28 and then linking to that in the docs next to the long term maintenance priority efforts
20:01:38 and being better about filing bugs and tagging them appropriately
20:01:44 (I know I'm really bad at that bit myself)
20:01:47 yeah, if you want to go down that road, i think that may fit the bill
20:02:14 I like that idea, I'll see if I can make that work reasonably well
20:02:48 another thing to keep in mind on this topic is that we are all human and trying our best to keep up with what at times is a flood of inputs. I doubt anything we do will ever make this problem go away
20:03:20 which means we probably need to both set expectations appropriately but also figure out methods for doing better
20:03:29 the only thing that has even remotely made a dent is having the ability to hire more people to do more work.
20:03:50 yup
20:04:15 and that was just less of a backlog than now.
20:05:27 I'll leave this open for another minute or two if there are any last thoughts
20:05:39 But feel free to continue any conversations had in this meeting in #opdnev
20:05:43 * #opdnev
20:05:50 ugh I cannot type today. #opendev
20:05:51 something like that
20:05:58 thanks clarkb!
20:06:32 thank!
20:07:45 thanks everyone!
20:07:47 #endmeeting