19:01:03 <clarkb> #startmeeting infra
19:01:04 <openstack> Meeting started Tue May 18 19:01:03 2021 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:06 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:09 <openstack> The meeting name has been set to 'infra'
19:01:10 <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2021-May/000234.html Our Agenda
19:01:17 <clarkb> #topic Announcements
19:01:37 <clarkb> This didn't make it onto the agenda, but I'm planning to take a day off on the 20th (Thursday)
19:01:45 <clarkb> shouldn't really impact anything, just a heads up
19:01:58 <clarkb> #topic Actions from last meeting
19:02:05 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-05-04-19.01.txt minutes from last meeting
19:02:21 <clarkb> ianw: you had an action for pbx cleanup. I believe this has happened. Anything else to say about that one?
19:02:36 <ianw> nope, all gone
19:02:57 <clarkb> thank you for working on that. I think meetpad/jitsi meet is the thing people seem to want anyway
19:03:13 <clarkb> #topic Priority Efforts
19:03:17 <clarkb> #topic OpenDev
19:03:25 <clarkb> #link https://review.opendev.org/789098 Update base job nodeset to focal
19:03:39 <clarkb> This change was merged a bit early today because devstack went ahead and made the swap on their side and things started to fail
19:04:01 <clarkb> we have noticed at least one bit of fallout where our gerrit image builds were failing due to a lack of a `python` executable
19:04:23 <clarkb> keep your eyes open for failures that could be related to those changes where nodeset isn't fixed by a job
19:04:29 <clarkb> fungi: ^ anything else to say about that?
19:05:38 <clarkb> fungi may not have made it to this meeting yet after all the previous meetings (so many meetings today)
19:05:44 <fungi> nah, minimal disruption so far
19:05:53 <ianw> i just self-approved https://review.opendev.org/c/zuul/nodepool/+/790004 which is a trivial ffi bindep update after it was mentioned this was now borken
19:05:59 <ianw> broken even
19:06:26 <fungi> i ended up doing the nodeset change earlier in the day than expected, because devstack merged a change to stop working on bionic
19:06:32 <clarkb> ya I expect that is the sort of thing we'll be looking at addressing over the next little bit
19:07:03 <clarkb> On the gerrit account side of things I haven't made any new progress since we last spoke. Been distracted by other things. I'm hoping that maybe next week I can do another pass of cleanups though if others are able to also double check the list I stashed on review
19:07:37 <clarkb> #topic General Topics
19:07:46 <clarkb> #topic Server Upgrades
19:08:00 <clarkb> The entire Zuul + nodepool + zk cluster has now been upgraded
19:08:10 <clarkb> thank you to everyone that helped with that.
19:08:20 <clarkb> The next thing on my todo list for this is mailman
19:08:31 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/789622 Mailman ansiblification
19:08:46 <clarkb> If others are ok with landing that tomorrow I think I would like to give that a go
19:09:04 <fungi> wfm
19:09:07 <ianw> ++
19:09:13 <fungi> i'll be around most of the day
19:09:15 <clarkb> probably put both list servers in the emergency file, land the change, remove lists.kata from emergency, manually run there, then if that looks happy do the same for lists.o.o
19:09:32 <clarkb> cool sounds like a plan then
19:09:49 <clarkb> ianw: for review02 what do we need to do next to keep things moving on that system?
19:10:46 <ianw> i need to get back to the database setup which you've commented on
19:11:03 <ianw> i've been a little bit worried about frickler's ongoing issues with ipv6 to opendev.org
19:11:27 <clarkb> ianw: if that is related to the issues that rdo saw I think mnaser indicated there were fixups for that?
19:11:33 <clarkb> but maybe there are multiple issues?
19:11:45 <clarkb> I agree though that it would be good to not have to remove aaaa records
19:11:53 <fungi> it has to do with how his routes are being announced into ebgp
19:12:01 <clarkb> (which is probably what we would be left with if problems persist or get worse)
19:12:12 <fungi> s/his/vexxhost's/
19:12:18 <ianw> yeah, i haven't heard a clear "that is fixed" but i also might not have been listening in the right place :)
19:12:52 <clarkb> ianw: maybe we can have frickler double check and then bring it up again in #vexxhost if necessary
19:13:23 <clarkb> anything else on the subject of upgrades?
19:13:48 <fungi> the v6 allocation for vexxhost is subnetted so those subnets can be announced out of different locations, but the allocation is from a range which the rir indicates should not be globally subnetted, so some providers are filtering those prefixes
19:14:30 <fungi> i don't see that getting fixed unless providers relax the routes they're willing to receive or vexxhost starts announcing aggregates
19:15:31 <clarkb> fungi: I see, vexxhost would need additional allocations for additional locations or to route internally?
19:15:44 <clarkb> so that a single location can advertise the entire allocation then route behind that
19:16:00 <fungi> clarkb: not even to route internally, but they'd need to rely on the backbone to still carry the longer prefixes and reroute packets accordingly
19:16:16 <clarkb> gotcha
19:17:05 <fungi> it's really the smaller isps who seem to be filtering the tables in that way, so it would in theory "just work"
19:17:09 <clarkb> maybe when frickler is around (so early morning my time) we can have a discussion including frickler and the impact of that problem and whether or not we want to proceed with a new review server in vexxhost using ipv6? our other options are no ipv6 or deployed elsewhere
19:17:28 <clarkb> I'll see if I can facilitate that
19:17:39 <clarkb> it's already an issue for opendev.org I guess
19:17:43 <ianw> yeah, this is rather hard to explain to someone who turns up saying "i can't talk to review.opendev.org" :)
19:18:25 <clarkb> #topic Refreshing non LE certs
19:18:25 <ianw> that's true, i don't think we have too many people reporting issues on opendev.org
19:18:41 <clarkb> oh sorry I was thinking we could move on, should I undo?
19:18:58 <ianw> no
19:19:15 <clarkb> Alright we have a smallish number of non LE certs that are about to expire in ~3 weeks.
19:19:40 <clarkb> they are for ask, ethercalc, wiki, translate, storyboard, openstackid and openstackid-dev
19:20:04 <clarkb> we've already deprecated ask and made it read only. I think we can probably just let that one die on the vine.
19:20:23 <fungi> more like rot on the ground ;)
19:20:27 <clarkb> I want to say there was some discussion about people using wayback machine to access old Q&A there. Are we happy with that plan and if so do we need to write it down somewhere?
19:20:39 <ianw> we could redirect it to a static page
19:20:51 <clarkb> ianw: and that page could point to wayback machine?
19:21:01 <fungi> redirect it to the lists.openstack.org page for the openstack-discuss ml
19:21:10 <ianw> yeah, basically the banner at the top
19:21:16 <fungi> ahh, or that
19:21:30 <clarkb> that seems like a reasonable idea. We would host that on static and use typical LE setup for that then?
19:21:47 <ianw> i think so; i can take that on, it should be a quick one
19:21:51 <clarkb> ianw: thank you
19:22:14 <clarkb> for openstackid and openstackid-dev I'm meeting with the foundation web admins after this meeting to discuss how we want to do hosting for those services going forward
19:22:51 <clarkb> we're in a weird spot where we can't actually redeploy it as is without their involvement today. Want to figure out if having it hosted on opendev/openstack infra is still valuable or if it makes sense to have them take it on more fully
19:23:10 <clarkb> That leaves us with ethercalc, wiki, translate, and storyboard
19:23:38 <clarkb> That number is small enough that I can go buy 4 new annual certs to keep us limping along while we continue to improve the config management for them
19:23:42 <fungi> wiki will be a manual process on the server for now, the others can be distributed via puppet i guess
19:23:49 <fungi> from hiera
19:23:52 <clarkb> yup
19:24:05 <clarkb> does anyone feel strongly against that? it's like $32 which isn't a major concern on my end
19:24:33 <ianw> i think we could get everything but wiki on LE if we want
19:24:56 <clarkb> ianw: within the ~25 days we've got?
19:25:04 <clarkb> if so then I'd be happy to help do that instead
19:25:07 <fungi> oh, like have ansible install the certs but leave puppet pointing apache at the same path?
19:25:28 <ianw> yeah, basically install the certs and then comment out the puppet bits looking to install certs
19:25:30 <clarkb> ya we have done it for a few services before, it's not terrible, just takes time to get everything set up issuing the certs then update the vhost templates
19:26:01 <clarkb> ya I guess we should give that a go first. I can probably give that a go next week
19:26:08 <clarkb> feel free to look at doing it sooner if you want :)
19:26:33 <ianw> yeah i can have a look.  if we hit issues, i guess buying new certs isn't a problem
19:26:54 <clarkb> cool sounds like a plan, thanks
19:27:00 <fungi> awesome
19:27:17 <clarkb> #topic Too small swap devices
19:27:40 <clarkb> At this point this is mostly a heads up that we had some problems with make_swap.sh that resulted in a small number of servers having 7MB swap devices
19:27:49 <clarkb> We have since corrected all of the servers that had this problem
19:28:13 <clarkb> When I did my audit to check for them I discovered that a non zero set of servers have no swap at all (different problem than the one we fixed)
19:28:40 <clarkb> Considering all of those servers have been running without swap and without interruption since, I don't think it is a high priority to change them. But if we did want to we could easily add swapfiles to them
19:29:15 <ianw> hrm, what was that problem
19:29:26 <ianw> i'm guessing something to do with mounted /opt
19:29:34 <clarkb> ya I'm not sure
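The kind of audit described above could be sketched roughly as follows. This is a minimal illustration, not the check that was actually run: it reads /proc/meminfo-style text and sorts a host into "no swap", "too small" (such as the 7MB devices) or "ok". The function names and the 1 GiB threshold are assumptions made for this example.

```python
# Sketch only: the 1 GiB threshold is an assumed value, not one from the meeting.
MIN_SWAP_KIB = 1024 * 1024


def swap_total_kib(meminfo_text):
    """Return the SwapTotal value (in KiB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # The line looks like: "SwapTotal:        7164 kB"
            return int(line.split()[1])
    raise ValueError("no SwapTotal line found")


def classify_swap(meminfo_text, min_kib=MIN_SWAP_KIB):
    """Label a host as 'none', 'too-small' or 'ok' based on its swap size."""
    total = swap_total_kib(meminfo_text)
    if total == 0:
        return "none"
    if total < min_kib:
        return "too-small"
    return "ok"
```

On a Linux host, passing the contents of /proc/meminfo to classify_swap() would give the label for that server.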
19:30:25 <clarkb> #topic Remove registration requirement for IRC channels
19:30:33 <clarkb> I pushed up a change to do this
19:30:38 <clarkb> #link https://review.opendev.org/c/openstack/project-config/+/791818 Remove channel forwarding and +r requirement
19:30:51 <clarkb> but then as if on cue we started getting spam in the unregistered channel again
19:31:07 <clarkb> "Non Terrestrial Or Terrestrial Beings which can help me with Trans Universal Transportation (Please PM Me)099"
19:31:26 <clarkb> I think I'll WIP the change for now and see if that persists
19:32:19 <clarkb> if that ends up stopping maybe try it next week otherwise probably best to keep it as is
19:32:40 <ianw> i haven't really noticed spam in too many of the other channels i'm in
19:33:15 <fungi> i'm surprised that stranded alien intelligence can't work out how to register an account on freenode
19:33:23 <clarkb> that is a good sign. I'll pick this up again next week when we have a bit more data on the latest spam
19:33:42 <fungi> then again, i guess they ended up stranded for a reason
19:33:46 <clarkb> indeed
19:34:03 <clarkb> #topic Toggle CI button is no longer on Gerrit
19:34:10 <clarkb> rosmaita this is your topic
19:34:21 <rosmaita> thanks, i saw your response in the agenda
19:34:32 <rosmaita> looks like we have the Full Name correct
19:34:52 <rosmaita> but what is the tag that the CIs need to set on their comments?
19:35:00 <clarkb> for those who haven't read the agenda: rosmaita and the cinder project are wondering how users can better manage CI comments on gerrit changes and what third party ci systems can do to be filterable
19:35:17 <rosmaita> yeah, thanks for summarizing
19:35:34 <clarkb> Newer gerrit has a "Only Comments" toggle which becomes "Show All comments" in even newer gerrit
19:35:46 <rosmaita> here's an example: https://review.opendev.org/c/openstack/cinder/+/790796/
19:35:58 <ianw> we had a bit of discussion on this @ http://eavesdrop.openstack.org/irclogs/%23zuul/%23zuul.2021-05-17.log.html#t2021-05-17T19:34:31
19:36:24 <clarkb> rosmaita: autogenerated:yourcisystemhere is the tag convention that seems to be used
19:36:37 <clarkb> rosmaita: zuul does this for you automatically if you set it up to talk to gerrit via http(s)
19:37:19 <rosmaita> oh, i see the tag is 'autogenerated'
19:37:35 <ianw> yeah, i would say that those CI's that still show up with "only comments" flicked on are not setting tags
19:37:35 <rosmaita> i thought you meant zuul autogenerated a tag
19:37:48 <ianw> that makes them look like a human comment to the gerrit display logic
19:37:56 <clarkb> right
19:38:14 <ianw> the summary plugin has stuff in it to regex match comments that don't have a tag
19:38:36 <ianw> one option *might* be to disable that -- only show in the summary results from comments with a tag
19:38:53 <ianw> carrot and stick -- if you want to be in the summary, your comment must have a tag
19:39:00 <clarkb> ianw: not a bad idea
19:39:09 <fungi> however it would break looking at old results
19:39:09 <clarkb> may also reduce confusion over why some bits work and others don't
19:39:25 <clarkb> fungi: they would still be in the comments though, but ya
19:39:34 <fungi> like, comments from zuul 2 years ago don't have any tagging
19:39:45 <fungi> (even ours)
19:39:50 <rosmaita> but our 3rd party CIs *are* showing up in the Zuul Summary, so looks like you don't need a tag for that
19:39:57 <clarkb> rosmaita: yes that is what ianw is saying
19:40:00 <ianw> #link https://gerrit.googlesource.com/plugins/zuul-results-summary/+/refs/heads/main/zuul-results-summary/zuul-results-summary.js#284
19:40:14 <clarkb> rosmaita: we could update the summary to enforce the tag which may reduce confusion and also provide a carrot for people to set the tag
19:40:18 <ianw> basically get rid of "_match_message_via_regex" there
19:40:47 <clarkb> https://gerrit.googlesource.com/plugins/zuul-results-summary/+/refs/heads/main/zuul-results-summary/zuul-results-summary.js#210 only matches zuul or zuul like taggers
19:41:05 <clarkb> maybe that is good enough. the format of the comment that is parsed is assumed to be zuul's format too iirc
19:42:47 <ianw> fungi: it's horrible, but we could conceivably have a config option giving a change number at or below which we look for comments via regex
19:44:37 <clarkb> rosmaita: are you using zuul or some other ci system?
19:44:49 <rosmaita> mostly other
19:45:07 <rosmaita> we are trying to get people to move to zuul v3
19:45:27 <fungi> v4 now. soon v5. maybe better to just say "modern"
19:45:41 <rosmaita> ok
19:45:59 <clarkb> ya probably the biggest hurdle is that it relies on others to do the right thing. but we're really trying to avoid adding in unnecessary tech debt like we had with the old tools
19:46:13 <clarkb> instead we're relying on existing features and writing plugins where necessary
19:46:20 <fungi> v2->v3 was a big jump because the job runner changed, but now zuul increments the major version component any time there's a non-backward-compatible change to deployment
19:46:26 <clarkb> in this particular case I think we should give relying on the built in feature an honest effort
19:46:56 <ianw> yeah, i think after we pulled it apart, tagged comments as implemented by gerrit are what we want
19:47:18 <ianw> so if there's things we can do to help encourage CI systems to leave such comments, i think we're all ears
19:47:24 <fungi> sad that the checks api hit a wall
19:47:26 <clarkb> rosmaita: https://review.opendev.org/Documentation/cmd-review.html has a --tag flag, that is effectively what zuul does though it doesn't do it via ssh reviews, only http
19:47:56 <clarkb> rosmaita: you should be able to instruct your third party CI systems to set autogenerated:zuul if they are reporting zuul format comments or autogenerated:somethingelse if not using the zuul format
19:48:28 <rosmaita> thanks for that link, i can get the news out
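For a CI system that reports over ssh, the --tag suggestion above might look something like the sketch below. It only builds the ssh argument list and sends nothing. The host, user name, patchset number and the "myci" tag suffix are placeholders chosen for this example, and it assumes Gerrit's ssh command parser accepts single-quoted arguments.

```python
import shlex


def gerrit_review_argv(host, user, change, patchset, message, tag, port=29418):
    """Build an ssh argv that posts a tagged comment via `gerrit review`."""
    # ssh hands the remote command over as one string, so each value is
    # quoted individually before joining.
    remote = " ".join([
        "gerrit", "review",
        "--tag", shlex.quote(tag),
        "--message", shlex.quote(message),
        f"{change},{patchset}",
    ])
    return ["ssh", "-p", str(port), f"{user}@{host}", remote]


# Hypothetical values throughout; only the "autogenerated:" prefix comes
# from the discussion above.
argv = gerrit_review_argv(
    "review.example.org", "myci", 12345, 1,
    "Build succeeded", "autogenerated:myci",
)
```

The resulting list could be passed to subprocess.run() by a CI system that already holds an ssh key for its Gerrit account.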
19:49:10 <rosmaita> how would you do this for http reviews?
19:49:29 <rosmaita> i don't know how most of the CIs connect to gerrit, tbh, but i think a lot of them use ssh
19:49:57 <clarkb> I was trying to find similar docs for the rest api but not finding them
19:50:06 <clarkb> the rest api definitely supports it though as that is what zuul uses
19:50:20 <rosmaita> ok, we can do some digging
19:50:59 <clarkb> https://review.opendev.org/Documentation/rest-api-changes.html#set-review maybe and then https://review.opendev.org/Documentation/rest-api-changes.html#review-input that object's tag field
19:51:11 <clarkb> let's move on, we have one more subject to cover before we run out of time
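For the http case, a rough sketch of a set-review call carrying a tag in its ReviewInput body is shown below. It only constructs the request object and does not send it. The base URL, account name, password and change number are placeholders, and HTTP basic auth against the /a/ prefix is an assumption about how a given CI account authenticates.

```python
import base64
import json
import urllib.request


def build_tagged_review(base_url, change, user, http_password, message, tag):
    """Build (but do not send) a set-review request with a tagged comment."""
    # ReviewInput: the "tag" field is what lets the UI filter the comment.
    payload = {"message": message, "tag": tag}
    url = f"{base_url}/a/changes/{change}/revisions/current/review"
    credentials = base64.b64encode(f"{user}:{http_password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json; charset=UTF-8",
            "Authorization": f"Basic {credentials}",
        },
    )


# Hypothetical values; only the "autogenerated:" prefix comes from the log.
req = build_tagged_review(
    "https://review.example.org", 12345, "myci", "secret",
    "Build succeeded", "autogenerated:myci",
)
```

Sending it would be a matter of calling urllib.request.urlopen(req) with real credentials.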
19:51:17 <clarkb> #topic Scheduling project renames
19:51:20 <ianw> https://gerrit-review.googlesource.com/Documentation/rest-api-changes.html#set-review is the api call
19:51:28 <ianw> anyway, it's just a "tag" in the json
19:51:30 <clarkb> ianw: cool that confirms what I linked
19:51:39 <clarkb> we have at least one project rename request
19:51:51 <rosmaita> ianw: clarkb: thanks
19:52:19 <clarkb> When fungi and I were testing project renames it seemed to be as simple as stop gerrit, move repo to new name location, start gerrit, trigger online reindex
19:52:40 <clarkb> This didn't update individual user account project watches but that is a lot more work and potentially runs into the same problems we have with user email conflict cleanup
19:52:46 <fungi> yeah, i think we assume we lose watches and such
19:52:56 <clarkb> I think I'm ok without updating project watches. Users can be instructed to update them themselves
19:53:10 <clarkb> The other thing we need to do is update our project rename playbook(s)
19:53:27 <clarkb> I'm fairly certain they still try to modify sql things
19:53:58 <clarkb> I'm thinking that a good next step here is to update our playbook(s) and exercise them in our gerrit functional testing. Then when we are happy with those results we can schedule a day for the gerrit downtime
19:53:59 <fungi> i can work on trimming that out
19:54:21 <clarkb> fungi: that would be great and you should be able to do the testing ^ I describe too since the gerrit functional testing is fairly robust at this point
19:54:21 <fungi> but yeah, adding testing for renames is a bigger task
19:54:37 <fungi> i'll see if i can also find time for that
19:54:38 <clarkb> ya it's a bigger task but I don't think it's much bigger. I could be wrong though
19:54:59 <clarkb> alright, we can regroup and try to nail down an actual time for the rename once we've at least gotten an updated playbook
19:55:06 <clarkb> #topic Open Discussion
19:55:30 <clarkb> We have 5 minutes for any other discussions that may have been skipped or need to be brought up again
19:55:35 <clarkb> but then I have another meeting to run to
19:56:37 <fungi> it would probably be good to talk about https://review.opendev.org/785769 but that's likely to be a longer discussion and not urgent, i can add it to next week's agenda
19:57:04 <clarkb> fungi: ++
19:57:29 <fungi> similarly https://review.opendev.org/774300
19:58:25 <clarkb> ya those would both be good discussions to have but probably also should just land them once we have ensured we're all aware of the delta
19:59:43 <fungi> or at least have reached consensus
19:59:47 <clarkb> ++
19:59:59 <clarkb> and we are at time. Thank you everyone
20:00:03 <clarkb> we'll see you here next week
20:00:05 <clarkb> #endmeeting