19:01:03 #startmeeting infra
19:01:04 Meeting started Tue May 18 19:01:03 2021 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:06 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:09 The meeting name has been set to 'infra'
19:01:10 #link http://lists.opendev.org/pipermail/service-discuss/2021-May/000234.html Our Agenda
19:01:17 #topic Announcements
19:01:37 This didn't make it onto the agenda, but I'm planning to take a day off on the 20th (Thursday)
19:01:45 shouldn't really impact anything, just a heads up
19:01:58 #topic Actions from last meeting
19:02:05 #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-05-04-19.01.txt minutes from last meeting
19:02:21 ianw: you had an action for pbx cleanup. I believe this has happened. Anything else to say about that one?
19:02:36 nope, all gone
19:02:57 thank you for working on that. I think meetpad/jitsi meet is the thing people seem to want anyway
19:03:13 #topic Priority Efforts
19:03:17 #topic OpenDev
19:03:25 #link https://review.opendev.org/789098 Update base job nodeset to focal
19:03:39 This change was merged a bit early today because devstack went ahead and made the swap on their side and things started to fail
19:04:01 we have noticed at least one bit of fallout where our gerrit image builds were failing due to a lack of a `python` executable
19:04:23 keep your eyes open for failures that could be related to those changes where the nodeset isn't fixed by a job
19:04:29 fungi: ^ anything else to say about that?
19:05:38 fungi may not have made it to this meeting yet after all the previous meetings (so many meetings today)
19:05:44 nah, minimal disruption so far
19:05:53 i just self-approved https://review.opendev.org/c/zuul/nodepool/+/790004 which is a trivial ffi bindep update after it was mentioned this was now borken
19:05:59 broken even
19:06:26 i ended up doing the nodeset change earlier in the day than expected, because devstack merged a change to stop working on bionic
19:06:32 ya I expect that is the sort of thing we'll be looking at addressing over the next little bit
19:07:03 On the gerrit account side of things I haven't made any new progress since we last spoke. Been distracted by other things. I'm hoping that maybe next week I can do another pass of cleanups though if others are able to also double check the list I stashed on review
19:07:37 #topic General Topics
19:07:46 #topic Server Upgrades
19:08:00 The entire Zuul + nodepool + zk cluster has now been upgraded
19:08:10 thank you to everyone that helped with that.
19:08:20 The next thing on my todo list for this is mailman
19:08:31 #link https://review.opendev.org/c/opendev/system-config/+/789622 Mailman ansiblification
19:08:46 If others are ok with landing that tomorrow I think I would like to give that a go
19:09:04 wfm
19:09:07 ++
19:09:13 i'll be around most of the day
19:09:15 probably put both list servers in the emergency file, land the change, remove lists.kata from emergency, manually run there, then if that looks happy do the same for lists.o.o
19:09:32 cool sounds like a plan then
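(For reference, that rollout plan translates to roughly the following on the bastion host. This is a sketch only: the emergency file path and playbook name are assumptions for illustration, not verbatim.)

```shell
# Sketch of the staged mailman rollout described above; the emergency
# file path and playbook name are illustrative assumptions.

# 1. Put both list servers in the emergency file to pause automation
echo "lists.katacontainers.io" | sudo tee -a /etc/ansible/hosts/emergency
echo "lists.openstack.org" | sudo tee -a /etc/ansible/hosts/emergency

# 2. Land the ansiblification change, then apply it to lists.kata first
sudo sed -i '/lists.katacontainers.io/d' /etc/ansible/hosts/emergency
sudo ansible-playbook playbooks/service-lists.yaml --limit lists.katacontainers.io

# 3. If lists.kata looks happy, repeat for lists.openstack.org
sudo sed -i '/lists.openstack.org/d' /etc/ansible/hosts/emergency
sudo ansible-playbook playbooks/service-lists.yaml --limit lists.openstack.org
```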
19:09:49 ianw: for review02 what do we need to do next to keep things moving on that system?
19:10:46 i need to get back to the database setup which you've commented on
19:11:03 i've been a little bit worried about frickler's ongoing issues with ipv6 to opendev.org
19:11:27 ianw: if that is related to the issues that rdo saw I think mnaser indicated there were fixups for that?
19:11:33 but maybe there are multiple issues?
19:11:45 I agree though that it would be good to not have to remove aaaa records
19:11:53 it has to do with how his routes are being announced into ebgp
19:12:01 (which is probably what we would be left with if problems persist or get worse)
19:12:12 s/his/vexxhost's/
19:12:18 yeah, i haven't heard a clear "that is fixed" but i also might not have been listening in the right place :)
19:12:52 ianw: maybe we can have frickler double check and then bring it up again in #vexxhost if necessary
19:13:23 anything else on the subject of upgrades?
19:13:48 the v6 allocation for vexxhost is subnetted so those subnets can be announced out of different locations, but the allocation is from a range which the rir indicates should not be globally subnetted, so some providers are filtering those prefixes
19:14:30 i don't see that getting fixed unless providers relax the routes they're willing to receive or vexxhost starts announcing aggregates
19:15:31 fungi: I see, vexxhost would need additional allocations for additional locations or to route internally?
19:15:44 so that a single location can advertise the entire allocation then route behind that
19:16:00 clarkb: not even to route internally, but they'd need to rely on the backbone to still carry the longer prefixes and reroute packets accordingly
19:16:16 gotcha
19:17:05 it's really the smaller isps who seem to be filtering the tables in that way, so it would in theory "just work"
19:17:09 maybe when frickler is around (so early morning my time) we can have a discussion including frickler and the impact of that problem and whether or not we want to proceed with a new review server in vexxhost using ipv6? our other options are no ipv6 or deployed elsewhere
19:17:28 I'll see if I can facilitate that
19:17:39 it's already an issue for opendev.org I guess
19:17:43 yeah, this is rather hard to explain to someone who turns up saying "i can't talk to review.opendev.org" :)
19:18:25 #topic Refreshing non LE certs
19:18:25 that's true, i don't think we have too many people reporting issues on opendev.org
19:18:41 oh sorry I was thinking we could move on, should I undo?
19:18:58 no
19:19:15 Alright we have a smallish number of non LE certs that are about to expire in ~3 weeks.
19:19:40 they are for ask, ethercalc, wiki, translate, storyboard, openstackid and openstackid-dev
19:20:04 we've already deprecated ask and made it read only. I think we can probably just let that one die on the vine.
19:20:23 more like rot on the ground ;)
19:20:27 I want to say there was some discussion about people using wayback machine to access old Q&A there. Are we happy with that plan and if so do we need to write it down somewhere?
19:20:39 we could redirect it to a static page
19:20:51 ianw: and that page could point to wayback machine?
19:21:01 redirect it to the lists.openstack.org page for the openstack-discuss ml
19:21:10 yeah, basically the banner at the top
19:21:16 ahh, or that
19:21:30 that seems like a reasonable idea. We would host that on static and use typical LE setup for that then?
19:21:47 i think so; i can take that on, it should be a quick one
19:21:51 ianw: thank you
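(The redirect ianw signed up for could be as small as a single vhost on static with the usual LE setup; a sketch, where the config file name and certificate paths are assumptions:)

```shell
# Minimal sketch of the ask.openstack.org retirement redirect discussed
# above; the config file name and certificate paths are assumptions.
sudo tee /etc/apache2/sites-available/ask-redirect.conf <<'EOF'
<VirtualHost *:443>
    ServerName ask.openstack.org
    SSLEngine on
    SSLCertificateFile /etc/letsencrypt-certs/ask.openstack.org/fullchain.pem
    SSLCertificateKeyFile /etc/letsencrypt-certs/ask.openstack.org/privkey.pem
    # Send visitors to the openstack-discuss list info page
    Redirect permanent / http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-discuss
</VirtualHost>
EOF
sudo a2ensite ask-redirect && sudo systemctl reload apache2
```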
19:22:14 for openstackid and openstackid-dev I'm meeting with the foundation web admins after this meeting to discuss how we want to do hosting for those services going forward
19:22:51 we're in a weird spot where we can't actually redeploy it as is without their involvement today. Want to figure out if having it hosted on opendev/openstack infra is still valuable or if it makes sense to have them take it on more fully
19:23:10 That leaves us with ethercalc, wiki, translate, and storyboard
19:23:38 That number is small enough that I can go buy 4 new annual certs to keep us limping along while we continue to improve the config management for them
19:23:42 wiki will be a manual process on the server for now, the others can be distributed via puppet i guess
19:23:49 from hiera
19:23:52 yup
19:24:05 does anyone feel strongly against that? it's like $32 which isn't a major concern on my end
19:24:33 i think we could get everything but wiki on LE if we want
19:24:56 ianw: within the ~25 days we've got?
19:25:04 if so then I'd be happy to help do that instead
19:25:07 oh, like have ansible install the certs but leave puppet pointing apache at the same path?
19:25:28 yeah, basically install the certs and then comment out the puppet bits looking to install certs
19:25:30 ya we have done it for a few services before, it's not terrible, just takes time to get everything set up issuing the certs then update the vhost templates
19:26:01 ya I guess we should give that a go first. I can probably start on it next week
19:26:08 feel free to look at doing it sooner if you want :)
19:26:33 yeah i can have a look. if we hit issues, i guess buying new certs isn't a problem
19:26:54 cool sounds like a plan, thanks
19:27:00 awesome
19:27:17 #topic Too small swap devices
19:27:40 At this point this is mostly a heads up that we had some problems with make_swap.sh that resulted in a small number of servers having 7MB swap devices
19:27:49 We have since corrected all of the servers that had this problem
19:28:13 When I did my audit to check for them I discovered that a non zero set of servers have no swap at all (different problem than the one we fixed)
19:28:40 Considering all of those servers have been running without interruption since without swap I don't think it is a high priority to change them. But if we did want to we could easily add swapfiles to them
19:29:15 hrm, what was that problem
19:29:26 i'm guessing something to do with mounted /opt
19:29:34 ya I'm not sure
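(Adding swap to the servers that have none would only take a few commands; a sketch with an illustrative 1G size and path:)

```shell
# Create and enable a 1G swapfile, then persist it across reboots
sudo dd if=/dev/zero of=/swapfile bs=1M count=1024
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
```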
19:30:25 #topic Remove registration requirement for IRC channels
19:30:33 I pushed up a change to do this
19:30:38 #link https://review.opendev.org/c/openstack/project-config/+/791818 Remove channel forwarding and +r requirement
19:30:51 but then as if on cue we started getting spam in the unregistered channel again
19:31:07 "Non Terrestrial Or Terrestrial Beings which can help me with Trans Universal Transportation (Please PM Me)099"
19:31:26 I think I'll WIP the change for now and see if that persists
19:32:19 if that ends up stopping maybe try it next week otherwise probably best to keep it as is
19:32:40 i haven't really noticed spam in too many of the other channels i'm in
19:33:15 i'm surprised that stranded alien intelligence can't work out how to register an account on freenode
19:33:23 that is a good sign. I'll pick this up again next week when we have a bit more data on the latest spam
19:33:42 then again, i guess they ended up stranded for a reason
19:33:46 indeed
19:34:03 #topic Toggle CI button is no longer on Gerrit
19:34:10 rosmaita this is your topic
19:34:21 thanks, i saw your response in the agenda
19:34:32 looks like we have the Full Name correct
19:34:52 but what is the tag that the CIs need to set on their comments?
19:35:00 for those who haven't read the agenda: rosmaita and the cinder project are wondering how users can better manage CI comments on gerrit changes and what third party ci systems can do to be filterable
19:35:17 yeah, thanks for summarizing
19:35:34 Newer gerrit has an "Only Comments" toggle which becomes "Show All comments" in even newer gerrit
19:35:46 here's an example: https://review.opendev.org/c/openstack/cinder/+/790796/
19:35:58 we had a bit of discussion on this @ http://eavesdrop.openstack.org/irclogs/%23zuul/%23zuul.2021-05-17.log.html#t2021-05-17T19:34:31
19:36:24 rosmaita: autogenerated:yourcisystemhere is the tag convention that seems to be used
19:36:37 rosmaita: zuul does this for you automatically if you set it up to talk to gerrit via http(s)
19:37:19 oh, i see, the tag is 'autogenerated'
19:37:35 yeah, i would say that those CIs that still show up with "only comments" flicked on are not setting tags
19:37:35 i thought you meant zuul autogenerated a tag
19:37:48 that makes them look like a human comment to the gerrit display logic
19:37:56 right
19:38:14 the summary plugin has stuff in it to regex match comments that don't have a tag
19:38:36 one option *might* be to disable that -- only show in the summary results from comments with a tag
19:38:53 carrot and stick -- if you want to be in the summary, your comment must have a tag
19:39:00 ianw: not a bad idea
19:39:09 however it would break looking at old results
19:39:09 may also reduce confusion over why some bits work and others don't
19:39:25 fungi: they would still be in the comments though, but ya
19:39:34 like, comments from zuul 2 years ago don't have any tagging
19:39:45 (even ours)
19:39:50 but our 3rd party CIs *are* showing up in the Zuul Summary, so looks like you don't need a tag for that
19:39:57 rosmaita: yes that is what ianw is saying
19:40:00 #link https://gerrit.googlesource.com/plugins/zuul-results-summary/+/refs/heads/main/zuul-results-summary/zuul-results-summary.js#284
19:40:14 rosmaita: we could update the summary to enforce the tag which may reduce confusion and also provide a carrot for people to set the tag
19:40:18 basically get rid of "_match_message_via_regex" there
19:40:47 https://gerrit.googlesource.com/plugins/zuul-results-summary/+/refs/heads/main/zuul-results-summary/zuul-results-summary.js#210 only matches zuul or zuul like taggers
19:41:05 maybe that is good enough. the format of the comment that is parsed is assumed to be zuul's format too iirc
19:42:47 fungi: it's horrible, but we could conceivably have a config option which is a change number <= to look for comments via regex
19:44:37 rosmaita: are you using zuul or some other ci system?
19:44:49 mostly other
19:45:07 we are trying to get people to move to zuul v3
19:45:27 v4 now. soon v5. maybe better to just say "modern"
19:45:41 ok
19:45:59 ya probably the biggest hurdle is that it relies on others to do the right thing. but we're really trying to avoid adding in unnecessary tech debt like we had with the old tools
19:46:13 instead we're relying on existing features and writing plugins where necessary
19:46:20 v2->v3 was a big jump because the job runner changed, but now zuul increments the major version component any time there's a non-backward-compatible change to deployment
19:46:26 in this particular case I think we should give relying on the built in feature an honest effort
19:46:56 yeah, i think after we pulled it apart, tagged comments as implemented by gerrit are what we want
19:47:18 so if there's things we can do to help encourage CI systems to leave such comments, i think we're all ears
19:47:24 sad that the checks api hit a wall
19:47:26 rosmaita: https://review.opendev.org/Documentation/cmd-review.html has a --tag flag, that is effectively what zuul does though it doesn't do it via ssh reviews, only http
19:47:56 rosmaita: you should be able to instruct your third party CI systems to set autogenerated:zuul if they are reporting zuul format comments or autogenerated:somethingelse if not using the zuul format
19:48:28 thanks for that link, i can get the news out
19:49:10 how would you do this for http reviews?
19:49:29 i don't know how most of the CIs connect to gerrit, tbh, but i think a lot of them use ssh
19:49:57 I was trying to find similar docs for the rest api but not finding them
19:50:06 the rest api definitely supports it though as that is what zuul uses
19:50:20 ok, we can do some digging
19:50:59 https://review.opendev.org/Documentation/rest-api-changes.html#set-review maybe and then https://review.opendev.org/Documentation/rest-api-changes.html#review-input that object's tag field
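(Pulling those docs together: over SSH the tag goes on the review command's --tag flag, and over REST it is the "tag" field of the ReviewInput JSON. A sketch of both, with a hypothetical account name, credentials, and change numbers:)

```shell
# SSH reporting: attach the tag with --tag (account, change,patchset
# and message are placeholders)
ssh -p 29418 myci@review.opendev.org gerrit review \
    --tag autogenerated:myci \
    --message "'Build succeeded.'" 12345,1

# REST reporting: POST a ReviewInput with a "tag" field (this is what
# zuul does when talking to gerrit over http(s))
curl --user 'myci:http-password' \
    -H 'Content-Type: application/json' \
    -d '{"message": "Build succeeded.", "tag": "autogenerated:myci"}' \
    https://review.opendev.org/a/changes/12345/revisions/current/review
```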
19:51:11 let's move on, we have one more subject to cover before we run out of time
19:51:17 #topic Scheduling project renames
19:51:20 https://gerrit-review.googlesource.com/Documentation/rest-api-changes.html#set-review is the api call
19:51:28 anyway, it's just a "tag" in the json
19:51:30 ianw: cool that confirms what I linked
19:51:39 we have at least one project rename request
19:51:51 ianw: clarkb: thanks
19:52:19 When fungi and I were testing project renames it seemed to be as simple as stop gerrit, move repo to new name location, start gerrit, trigger online reindex
19:52:40 This didn't update individual user account project watches but that is a lot more work and potentially runs into the same problems we have with user email conflict cleanup
19:52:46 yeah, i think we assume we lose watches and such
19:52:56 I think I'm ok without updating project watches. Users can be instructed to update them themselves
19:53:10 The other thing we need to do is update our project rename playbook(s)
19:53:27 I'm fairly certain they still try to modify sql things
19:53:58 I'm thinking that a good next step here is to update our playbook(s) and exercise them in our gerrit functional testing. Then when we are happy with those results we can schedule a day for the gerrit downtime
19:53:59 i can work on trimming that out
19:54:21 fungi: that would be great and you should be able to do the testing ^ I describe too since the gerrit functional testing is fairly robust at this point
19:54:21 but yeah, adding testing for renames is a bigger task
19:54:37 i'll see if i can also find time for that
19:54:38 ya it's a bigger task but I don't think it's much bigger. I could be wrong though
19:54:59 alright, we can regroup and try to nail down an actual time for the rename once we've at least gotten an updated playbook
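(For the record, the rename sequence described above amounts to something like the following; the compose file location, repo paths, and exact reindex invocation are assumptions to double check against the running gerrit version:)

```shell
# Sketch of the tested rename procedure; all paths and names here are
# illustrative assumptions.
cd /etc/gerrit-compose && sudo docker-compose down      # stop gerrit
sudo mv /home/gerrit2/review_site/git/oldns/project.git \
        /home/gerrit2/review_site/git/newns/project.git # move the repo
sudo docker-compose up -d                               # start gerrit
# trigger an online reindex of the renamed project's changes (verify
# the command name against current gerrit docs)
ssh -p 29418 admin@localhost gerrit index changes-in-project newns/project
```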
19:55:06 #topic Open Discussion
19:55:30 We have 5 minutes for any other discussions that may have been skipped or need to be brought up again
19:55:35 but then I have another meeting to run to
19:56:37 it would probably be good to talk about https://review.opendev.org/785769 but that's likely to be a longer discussion and not urgent, i can add it to next week's agenda
19:57:04 fungi: ++
19:57:29 similarly https://review.opendev.org/774300
19:58:25 ya those would both be good discussions to have but probably also should just land them once we have ensured we're all aware of the delta
19:59:43 or at least have reached consensus
19:59:47 ++
19:59:59 and we are at time. Thank you everyone
20:00:03 we'll see you here next week
20:00:05 #endmeeting