19:01:10 #startmeeting infra
19:01:11 Meeting started Tue Jun 23 19:01:10 2020 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:12 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:14 The meeting name has been set to 'infra'
19:01:23 #link http://lists.opendev.org/pipermail/service-discuss/2020-June/000042.html Our Agenda
19:01:35 #topic Announcements
19:02:11 o/
19:02:13 On Thursday the OpenStack Foundation is doing two rounds of community updates: one for more Europe-friendly timezones and the other for Asia-Pacific timezones. The Americas are sort of stuck in the middle
19:02:19 feel free to join those if interested
19:02:38 (though really the audience is people not involved in daily happenings so maybe boring for you all :) )
19:03:09 also that is relative to my Thursday. Local day of week may differ
19:04:03 #topic Actions from last meeting
19:04:06 #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-06-16-19.01.txt minutes from last meeting
19:04:11 We didn't record any actions
19:04:17 #topic Specs approval
19:04:23 #link https://review.opendev.org/#/c/731838/ Authentication broker service
19:04:31 Still not ready for approval but worth calling out
19:04:32 i've started to pick it back up, haven't pushed a new patchset though
19:05:00 fungi: thanks. Probably worth a call out when a new patchset does show up so we can take a look quickly
19:05:15 still digesting recent comments, but more comments don't need to wait for me to update the spec either
19:05:33 this is true
19:06:12 #topic Priority Efforts
19:06:23 #topic Update Config Management
19:06:50 haven't seen a whole lot on this front, but I've also been distracted with image updates and openstack testing stuff
19:07:11 Is there anything worth calling out on this subject? Maybe the plan for reducing "run all the jobs when inventory updates" problems?
19:07:35 mordred: ^ that caught corvus again on Friday when trying to land the zk config updates (which eventually applied and everything was fine, but the number of jobs was unexpected)
19:08:34 ianw has started work on containerizing our grafana deployment
19:08:57 ah cool. I ninja-updated my local copy of the agenda to talk about grafana's nodepool dashboards in a bit too
19:09:18 can update in there, it is coming together i think
19:09:24 i also have a half-baked change underway to move our reprepro mirroring from puppet to ansible, i need to pick that back up and hack on it some more (unless anyone's just dying to take it over)
19:09:24 mordred: is step0 there getting the split-up puppet-else changes landed?
19:09:35 mordred: and if so are those ready for review?
19:10:50 not even
19:11:16 I think step0 can actually be just copying over the inventory/ file matchers from the system-config-run jobs
19:11:24 we have smaller file matchers for them already
19:12:00 #info Copy file matchers from system-config-run jobs to infra-prod jobs to reduce number of jobs that run when inventory/ is updated
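For context on the #info above: Zuul decides whether a job runs by comparing a change's touched files against the job's "files" matchers, so copying the narrower matchers onto the infra-prod jobs means fewer jobs fire when only inventory/ is touched. The toy Python sketch below illustrates that matching idea only; the patterns and changed-file list are invented examples, not the real system-config job definitions or Zuul's actual implementation.

    import re

    # Hypothetical narrow matcher set for one infra-prod job (invented paths).
    infra_prod_service_gitea_files = [
        r"^inventory/service/host_vars/gitea.*",
        r"^playbooks/service-gitea\.yaml$",
        r"^playbooks/roles/gitea/",
    ]

    # Hypothetical broad matcher that makes every job fire on any inventory update.
    broad_files = [r"^inventory/"]

    def job_should_run(matchers, changed_files):
        # A job runs when any changed path matches any of its file matchers.
        return any(re.search(m, path) for m in matchers for path in changed_files)

    changed = ["inventory/base/hosts.yaml"]  # an inventory-only change (made up)
    print(job_should_run(broad_files, changed))                     # True
    print(job_should_run(infra_prod_service_gitea_files, changed))  # False

With the broad matcher every service job runs for an inventory-only change; with per-service matchers only the jobs whose files actually changed would run.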
19:12:19 another good one to tackle soon might be storyboard... mordred's already added image building jobs for it, so assuming those are functional the actual deployment ansible for them might not be too hard (though there are rather a lot of different containers associated with it)
19:13:14 fungi: that will also convert us to python3 for that service, which would be nice
19:13:32 Alright, last call on config management changes
19:13:39 * diablo_rojo sneaks in the back and sits down
19:14:06 wait, we can sit during this meeting?
19:14:15 I'm sitting
19:14:16 i thought it was a standing meeting
19:14:24 sometimes I stand
19:14:24 but clarkb is a chair
19:14:35 let's table this
19:14:44 #topic OpenDev
19:14:56 First up, we upgraded Gitea to version 1.12.0
19:15:22 This includes caching of info that Gitea uses to render repo UI pages, which should speed those page loads up quite a bit. But it's still a cache, so the first load is still slow
19:15:34 we've seen this help quite a bit for repos like nova already though
19:16:04 I had some small concern that the caching would increase memory usage on those servers, but I've not seen that being a problem yet according to cacti
19:16:25 #link http://lists.opendev.org/pipermail/service-discuss/2020-May/000026.html Advisory Board thread.
19:16:44 I've bumped this thread and am setting a soft deadline of July 2, at which point we'll proceed with the volunteers we have
19:17:05 soft deadline because we can always add new members later, but I didn't want to sit in this holding pattern forever
19:18:33 The last OpenDev item I had was related to git branches. The topic of potentially renaming branches has come up in a couple of contexts, and I wrote an etherpad with details on what that would mean for us as a hosting platform and for our hosted repos
19:18:42 https://etherpad.opendev.org/p/opendev-git-branches
19:19:14 There are some potentially painful technical challenges which that tries to call out so that we can provide reasonable guidance to people that may consider such changes
19:20:10 Anything else to talk about on the subject of OpenDev? Or does anyone want to dig into anything above?
19:21:43 #topic General topics
19:21:50 #topic Grafana Nodepool Graphs
19:22:20 We've discovered that we're no longer producing templated nodepool provider graphs that function
19:22:47 looking at the dashboard json it appears that we aren't setting a data source properly. I wrote a change to the nodepool-ovh dashboard to force the data source and that doesn't seem to have helped
19:23:14 The change I wrote did end up switching the data source value from null to OpenStack though
19:23:29 which maybe means the problem is unrelated to data sources
19:23:53 ianw: ^ maybe it makes more sense to debug once we've got the deployment tooling switched out?
19:24:00 ossh grafana
19:24:11 yeah, so i saw that and started to debug too and i guess yak shaving took over
19:24:20 heh, this is not my shell terminal ;)
19:24:41 however, i think it probably makes more sense to start debugging latest grafana via containers
19:25:32 i have it mostly working, building a container based on upstream grafana and including grafyaml in it
19:26:06 it's easy to deploy locally and should work the same in production
19:26:14 that is likely to aid in debugging
19:26:26 graphite is publicly accessible so we can point at production data easily
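Because graphite is public, the provider list that the templated dashboards are supposed to enumerate can be inspected directly with Graphite's metrics-find API. The Python sketch below shows one way to do that while debugging; the hostname and especially the metric path are assumptions for illustration, not the dashboards' actual template query.

    import json
    import urllib.request

    # Graphite-web's /metrics/find endpoint lists the children of a metric path.
    # The path below is only a guess at where nodepool's per-provider stats
    # live; swap in the real namespace when actually debugging.
    GRAPHITE = "https://graphite.opendev.org"
    QUERY = "stats.timers.nodepool.launch.provider.*"

    url = GRAPHITE + "/metrics/find?query=" + QUERY + "&format=json"
    with urllib.request.urlopen(url) as resp:
        nodes = json.load(resp)

    # Each node's "text" field is one value the dashboard's template variable
    # should expand to; an empty list here would explain empty provider graphs.
    providers = [node["text"] for node in nodes]
    print(providers)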
19:26:42 is the suspicion at this point that our regression is due to the recent grafana 7.x releases?
19:26:55 i think so
19:27:02 fungi: ya, I'm assuming that grafyaml isn't supplying the necessary info to make the templating work
19:27:16 #link https://review.opendev.org/737397
19:27:19 that's the base container
19:27:29 what our grafyaml does is tell grafana to query graphite for a list of nodepool provider regions. Then using that list it produces graphs for each entry in the list
19:27:35 #link https://review.opendev.org/737406
19:27:42 from what I can tell we aren't producing a valid input list to the graphs
19:27:50 that will test it. both are really active wip
19:28:25 ok, I'll try not to worry about it too much until we've got the deployment stuff redone, as that will simplify debugging
19:28:48 i agree, let's work from that common base
19:30:04 #topic Etherpad Upgrade to 1.8.4 or 1.8.5
19:30:30 Bringing this up to say I'm planning on holding off until the end of next week since the opendev event next week will use the etherpad
19:30:43 once that is done it should be very safe to use a hacky 1.8.4 :)
19:31:00 the change to do the upgrade is WIP'd with a similar message and shouldn't land early
19:31:04 1.8.4 is still the current release at this time
19:31:07 yes
19:31:54 i suppose "1.8.5" here could be a stand-in for tip of develop branch
19:32:11 ya, 1.next might be most accurate
19:32:44 if something after 1.8.4 arrives before end of next week I'll respin to test and deploy that. Otherwise I'll land our change as-is on 1.8.4 with the local css fix
19:33:07 ++
19:34:13 #topic DNS cleanup
19:34:19 #link https://etherpad.opendev.org/p/rax-dns-openstack-org First pass of record removals has been done. Could use a second pass.
19:34:40 ianw: thank you for putting this together. From what I can see it's all working as expected. I did a first pass of removals based on what was noted on the etherpad too.
19:34:58 Since then fungi has annotated more things on the etherpad and I think we need a second pass at cleanup. I was planning to do that todayish
19:36:04 Calling this out so that if anyone else wants to do a pass they do it soon and catch it in my next set of removals
19:36:31 I should also share it with the foundation sysadmins again and see if they want to remove anything
19:37:11 #topic Getting more stuff off of python2
19:37:21 #link https://etherpad.opendev.org/p/opendev-tools-still-running-python2
19:37:54 I ran out of steam on this last week, but wanted to call it out again in case others have notes to add about things they know need python2 attention
19:38:11 Not incredibly urgent, but crowd sourcing tends to help with this sort of problem space
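One cheap way to seed that etherpad is to scan a tree for scripts whose shebangs still point at python2 (or at an unversioned python, which usually still means python2 on older distros). The sketch below is just such a heuristic scan, not the tooling actually used to build the list above.

    import os
    import sys

    def python2_suspects(root):
        # Walk a tree and flag files whose shebang names python2 or bare python.
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    with open(path, "rb") as handle:
                        first = handle.readline(200)
                except OSError:
                    continue
                if not first.startswith(b"#!"):
                    continue
                shebang = first.decode("ascii", "replace").strip()
                if "python2" in shebang or shebang.endswith("python"):
                    yield path, shebang

    if __name__ == "__main__":
        root = sys.argv[1] if len(sys.argv) > 1 else "."
        for path, shebang in python2_suspects(root):
            print(path + ": " + shebang)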
19:38:33 #topic Wiki Upgrade
19:38:41 fungi: anything new on the wiki upgrade?
19:38:47 nope!
19:39:05 #topic Open Discussion
19:39:47 please to review https://review.opendev.org/#/c/733967/ and https://review.opendev.org/#/c/737023/
19:40:17 As a general heads up, I'm going to be helping with the opendev event next week and will be distracted during those hours. I'm also looking at taking July 6-10 as a staycation and will need someone else to chair the meeting on the 7th if we have a meeting
19:41:40 in case folks hadn't seen, a while back i audited our listserv mta logs and determined that the long-running qq.com spam flood to the -owner aliases has *finally* abated. the starlingx list owners have requested we start allowing messages to the -owner alias for their list again, so i proposed https://review.opendev.org/729649 but more generally we could think about lifting all of the blackhole
19:41:42 aliases i think
19:44:28 they do still receive some random spam on a daily or weekly basis (roughly proportional to their list activity volume it seems), so just turning it back on for all lists without warning could catch some folks by surprise
19:45:25 we've also had issues in the past with people setting list owner addresses to e-mail service providers who happily report those moderation request messages as spam to blacklisting services, which creates unfortunate deliverability issues for us
19:45:57 neat
19:46:06 anyway, just crossed my mind again, not sure if anybody has suggestions for how to go about that
19:46:37 fungi: maybe we should reach out to list moderators and ask them if they'd like it to be toggled?
19:46:45 we know who those people are and can ask directly I think
19:47:00 yeah, i thought about that. there are a lot, but the deduplicated list of them might not be so many
19:47:59 Anything else? I'll give it another minute or two then call it
19:48:04 thank you everyone for your time
19:48:16 oh, also it looks like a stale apache worker on static.opendev.org caused sslcheck to fire a warning about the cert for zuul-ci.org expiring in a month
19:48:41 can move discussion of what to do for that and when to #opendev though
19:49:09 ++
19:49:14 #endmeeting