19:00:33 #startmeeting infra
19:00:33 Meeting started Tue Jul 7 19:00:33 2020 UTC and is due to finish in 60 minutes. The chair is ianw. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:34 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:37 The meeting name has been set to 'infra'
19:00:48 ahoy, mateys
19:01:01 ahoy
19:01:01 #topic Announcements
19:01:08 aloha
19:01:37 no clarkb today, so i am copy/pasting the agenda #topics for a change
19:02:03 no other announcements of note
19:02:08 #topic Actions from last meeting
19:02:19 #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-06-30-19.01.html minutes from last meeting
19:02:36 there were no action items, so we can keep moving on that
19:02:48 #topic Specs approval
19:02:58 #link https://review.opendev.org/#/c/731838/ Authentication broker service
19:03:15 this doesn't appear to have changed since last week, but seems to have a fair bit of comment
19:03:28 fungi: any further thoughts?
19:03:39 it was revised last week-ish
19:03:44 ready for more comments
19:04:00 i think all the comments thus far have been addressed
19:04:49 o/
19:05:01 last week was a bit of a black hole for me what with the opendev large scale deployments conference, so it was probably actually week before last i revised it
19:05:31 anyway, bring on the comments
19:05:42 cool, so anyone who has an interest in, or knows anything about, authentication should attach themselves to that
19:05:58 #topic Priority Efforts
19:06:14 #topic Opendev
19:06:44 firstly looping back on the gitea pagination issues, it seems clarkb's patch was accepted
19:07:08 so maybe we can re-simplify manage-projects at next upgrade?
19:07:35 yeah, that was my question, are we tracking that somehow
19:07:58 #link https://github.com/go-gitea/gitea/pull/12057 gitea pagination fixes
19:08:21 I've been tracking it via the "clarkb mentions its progress" method
19:08:46 ok, i'll put next week's date on the topic and we can loop back on that
19:08:49 since they accepted it upstream, we could cherry-pick it into our image builds and re-simplify - or just wait
19:08:50 at next meeting
19:09:00 sounds good, it may be time to act on it by then given their release cadence
19:09:17 i don't think the simplification is urgent
19:09:33 (iirc)
19:09:53 the ddos i have down as a separate topic, so we can discuss that there
19:10:05 i seem to have picked the wrong afternoon to attend a meeting from my patio, there are children having a birthday party at the rental house next door. there's now a bouncy castle, and i'm coveting it (maybe after a heavy round of disinfecting)
19:10:23 so the only other bit was governance changes here, of which i don't think there's new news since last week
19:11:24 i think it was any time after the first of this month we could start publishing a list of representatives for the advisory board, we just haven't gotten to it yet
19:11:37 people are still welcome to volunteer at any time anyway
19:11:49 ++
19:12:04 #topic Update Config Management
19:12:21 couple of things to get updates on
19:12:35 zuul-executors as containers: that's complete? i noticed they're still in emergency yesterday
19:13:50 last I knew there was another missing package in the container for afs - but I think that got fixed, yes?
19:14:21 i remember seeing something go by about the client package being a suggests
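On the gitea pagination point above: the reason the upstream fix lets us re-simplify manage-projects is that the tooling has to page through gitea's REST listing endpoints. A minimal sketch of that kind of loop, assuming an illustrative backend URL, token, and page size; this is not the actual jeepyb/manage-projects code:

```python
# Hedged sketch of paging through gitea's repo listing API, the sort of call
# manage-projects has to make; host, token and page size are placeholders.
import requests

GITEA_URL = 'https://gitea01.example.org:3000'   # illustrative backend
API_TOKEN = 'changeme'                           # illustrative token

def list_all_repos():
    repos = []
    page = 1
    while True:
        resp = requests.get(
            GITEA_URL + '/api/v1/repos/search',
            params={'page': page, 'limit': 50},
            headers={'Authorization': 'token ' + API_TOKEN})
        resp.raise_for_status()
        batch = resp.json().get('data', [])
        if not batch:
            break
        repos.extend(batch)
        page += 1
    return repos
```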
19:14:38 there's one more thing
19:14:39 we need gear
19:14:50 i think (haven't double checked) that ze01 is still disabled due to that
19:15:01 this one is trickier
19:15:13 yeah - ze01 is disabled
19:15:23 i feel less enthusiastic about asking the zuul community to add gear to the zuul-exec images since nothing in zuul-jobs uses gear
19:15:36 (i recognize this is weird, since zuul itself uses gear)
19:15:59 the idea that has bubbled to the top of my list is actually to just vendor gear into our opendev base jobs
19:16:09 (this is for the logstash submit thing)
19:16:36 that seems like a fine compromise to me, at least
19:16:50 i also recognize that's icky. but it lets us avoid the weird conversation about including opendev-specific stuff in zuul images, and avoid building our own downstream images
19:16:53 and gear is small
19:17:02 ++
19:17:14 we really need to reengineer the logstash stuff anyway if we're going to continue it in the long term
19:17:29 so if no one is violently objecting to that, i'll get something together this week
19:17:32 i too don't wish to heap that maintenance burden on the zuul community
19:17:41 corvus: clarkb was also questioning the value of our elk cluster during the ptg
19:17:42 corvus: ++
19:18:12 well, several of us were questioning it, yes. the resources it consumes are likely disproportionate to the value it provides
19:18:18 yeah, all the more reason to avoid over-engineering it then
19:19:00 do you want an action item so we come back to it next week? or just let it happen?
19:20:08 i'd like to revisit it next week, but that doesn't need an action item necessarily, we can just keep it on the agenda (it's related to our config management priority effort, so already the case)
19:20:52 ok, the second thing i had, also probably corvus/mordred, was an update on the cross-arch container work
19:21:28 are we at the point we have stable arm64/amd64 builds popping out?
19:21:35 i think that i think and i think that mordred thinks that it should all be working maybe now?
19:21:57 yeah - and I think it's time to land https://review.opendev.org/#/c/726263/
19:22:24 but - it is of course possible that there is something we don't know that we don't know
19:22:32 but also, we sure did just take a bunch of stuff apart and put it back together, so i think we're at "let's try to use it and see if we really fixed it"
19:23:09 it conflicts with 726458, is that still needed?
19:23:16 yeah
19:23:29 seems to be based on an outdated patchset unless gertty is lagging for me
19:24:41 no - we can toss that one
19:24:45 i can redo https://review.opendev.org/#/c/726037/ to test it native for nodepool-builder
19:24:54 I went ahead and did it in the main one
19:26:20 okay, fine next step as far as i'm concerned then. +2 but ianw has already approved
19:26:20 ok, so i think wheels are in motion for that
19:26:33 woot
19:26:47 --progress--
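For reference on the gear/logstash item earlier in this topic: gear is a small, pure-python gearman client, and the opendev base job only needs enough of it to submit log-indexing jobs to the gearman server, which is what makes vendoring it palatable. A rough sketch of that kind of submission, with an illustrative gearman host, job name, and payload rather than the actual base-job code:

```python
# Minimal sketch of submitting a log-indexing job with the gear library;
# the server name, job name and payload fields here are illustrative only.
import json
import gear

client = gear.Client()
client.addServer('logstash.example.org', 4730)   # illustrative gearman server
client.waitForServer()

payload = {
    'source_url': 'https://example.org/logs/job-output.txt',  # illustrative
    'fields': {'build_uuid': '0123456789abcdef'},
}
# Submit as a background job so the build does not wait on indexing.
job = gear.Job(b'push-log', json.dumps(payload).encode('utf8'))
client.submitJob(job, background=True)
client.shutdown()
```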
19:26:59 last one was grafana/graphite update
19:27:05 #link https://grafana.opendev.org/ grafana from container
19:27:18 that's up and production ready
19:28:18 graphite.opendev.org is also up, however i need to copy the data over, but i also noticed a few settings we've tweaked relating to null value storage and the mulder-scully-ish xFilesFactor that i need to port
19:28:20 i'll try to look at that this evening. i gather something we merged recently broke some of it
19:29:01 or just held up the patches to implement it?
19:29:49 fungi: ohh, there were deployment issues, i'd forgotten to add the promote job so it wasn't tagging the latest, but that's fixed
19:30:06 ahh, okay
19:30:19 and then yeah there were the holdups from review-test still getting deployed
19:30:36 with its out of control track-upstream logs
19:30:47 then there have been a few side issues with hosts dying and leaving hung processes, yeah the 200+ containers somehow on review-dev, and then the openedge mirror disappearing
19:30:59 saw a few of the cron errors about the full fs
19:31:47 ianw: were they all manage-projects containers?
19:31:54 actually, it's review-test.openstack.org -- again it seems to be doing the same thing
19:32:21 review-test.opendev.org
19:32:42 I think we need to look into our retry behavior in jeepyb - I think we currently retry indefinitely when we can't do an initial connection
19:32:52 although speaking of review-test ...
19:32:55 yeah, every hour that has launched a container
19:33:24 all of them looping now in " raise NoValidConnectionsError(errors)"
19:33:43 so, that host is on a downward spiral again
19:33:52 yeah - I bet each one is spinning unable to connect to something
19:33:55 yup
19:34:18 that isn't really the behavior we want anymore
19:34:27 * mordred will poke at jeepyb
19:34:59 also - didn't we skip review-test in manage-projects?
19:35:03 what is starting those?
19:35:10 but while we're talking about review-test - could I get some reviews on https://review.opendev.org/#/c/737023/ ?
19:35:32 mordred: there's a cron entry
19:35:35 I'd like to sync the data from review to review-test
19:35:48 perhaps the cron job wasn't "absented"?
19:35:55 it has been noted in the past we should carefully review that - to ensure that it's not going to delete our production data. I'm pretty sure it isn't
19:36:00 ianw: oh - this is track-upstream isn't it?
19:36:21 yup. track-upstream. let's not run that on review-test :)
19:37:30 ok, well i also managed to kill the main container on it, so it's currently doing nothing
19:37:53 mordred: do you have time to subdue it back into shape, or do you want me to look into it?
19:38:10 ianw: I'll work on it
19:38:27 cool, let's move on
19:38:35 #topic General Topics
19:38:40 related, the pull-request closer is still running out of cron
19:38:45 d'oh, seconds too late
19:38:55 no, we can go back if you like
19:38:59 nah
19:39:05 let's keep moving
19:39:14 i have corn and sausages on the grill
19:39:34 i think the only active trusty host transition is wiki, and i don't think there's too much happening with that ATM?
19:40:26 there is not, no
19:40:34 ok
19:40:38 it welcomes friends however
19:40:44 #topic Time to retire the openstack-infra ML yet?
19:40:54 #link http://lists.openstack.org/pipermail/openstack-infra/2020-July/006632.html
19:41:13 ianw: remote: https://review.opendev.org/739840 Don't install the track-upstream cron on review-test
19:42:47 fungi: i don't think there's been any disagreement on the list ... if anyone has an objection to this merge going ahead they should speak now or forever hold their peace :)
19:43:27 indeed
19:43:35 otherwise, i guess we declare the lists joined in holy matrimony
19:43:41 its retirement is already on my personal schedule
19:43:46 no objections
19:43:57 mazel tov
19:44:10 i look forward to having one fewer list to moderate
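On the jeepyb retry behavior discussed above (hourly containers on review-test each stuck looping in paramiko's NoValidConnectionsError): the fix being floated is to bound the initial-connection retries instead of retrying indefinitely. A hedged sketch of that shape, with illustrative names rather than the actual jeepyb code:

```python
# Illustrative bounded-retry helper; not the actual jeepyb implementation.
import time
import paramiko

def connect_with_retries(host, username, key_filename, attempts=5, delay=10):
    last_error = None
    for attempt in range(1, attempts + 1):
        client = paramiko.SSHClient()
        client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        try:
            client.connect(host, username=username, key_filename=key_filename)
            return client
        except (paramiko.SSHException, OSError) as exc:
            # NoValidConnectionsError is an OSError subclass, so it lands here.
            last_error = exc
            time.sleep(delay * attempt)  # simple linear backoff between tries
    # Give up instead of looping forever and leaving a stuck container behind.
    raise RuntimeError('could not connect to %s after %d attempts'
                       % (host, attempts)) from last_error
```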
19:44:21 #topic China telecom blocks
19:44:31 so we currently still have these in place with iptables
19:44:49 i think fungi has been most active in monitoring, what's the latest?
19:45:06 and taking the open connections graph in cacti as an indicator, the activity is ongoing from chinanet and probably elsewhere
19:45:33 #link http://lists.opendev.org/pipermail/service-discuss/2020-July/000053.html
19:45:39 that's the discussion for posterity
19:45:42 it stopped a lot over the weekend, but resumed late sunday utc and has been constant
19:45:54 #link https://review.opendev.org/738721 gitea reverse proxy
19:46:01 #link https://review.opendev.org/738725 crawler reject rules
19:46:27 that is, i think, the current alternative solution, based on the fact that the UAs making these requests appear to be from a very specific scraping script
19:46:43 basically i think we need to choose between layer 3 filtering with a lot of collateral damage, or layer 7 filtering with a lot of added complexity. rock, meet hard place
19:47:30 i reckon we should try the layer 7 then
19:47:35 yeah, last week we put up the robots.txt -- but this script does not obey that
19:47:41 i'm personally at peace with the layer 7 solution, though it would be nice to not have to keep it in place indefinitely
19:48:10 btw...
19:48:29 how certain are we that these UA strings are non-current?
19:48:52 that was what i was about to say ... someone should really double-check my work there
19:49:38 some of that depends on how you define "current", as there is some software in use (albeit probably of questionable origin) which has decided this is the ua to report
19:50:14 for anyone following
19:50:16 #link https://review.opendev.org/#/c/738725/6/playbooks/roles/gitea/templates/gitea.vhost.j2
19:50:20 has the UAs in question
19:50:31 if mnaser is around, i'd love to get his input. he's the first member of our community to inquire about the status, so may have additional information about the current adverse impact
19:51:59 the change also doesn't deploy into production, it just sets up the proxy and we still have to point haproxy at it
19:52:31 other than that, maybe we just land them and see if we end up getting user reports?
19:52:36 yeah, i'm cool approving that nowish, since it's not (yet) in a production code path
19:53:07 but i also second corvus's suggestion
19:53:23 i +3d
19:54:11 yeah, i guess the requests are still coming over vexxhost, but rejecting them at layer 3 vs accepting them and 403'ing them probably isn't a significant networking-back-end issue?
19:55:05 agreed, the impact is when gitea has to process the git requests
19:55:22 anything which stops them before that point is probably plenty sufficient
19:56:01 ok, so wheels in motion there
19:56:10 #topic Project Renames
19:56:16 the traffic volume and request rates are not absurd on their own (though a bit aggressive and clearly ignoring any modicum of reasonable crawler behavior)
19:56:50 fungi: yes ... that script that appears to be hitting us does not meet the standard of "reasonable crawler" in any way :)
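To make the layer 3 vs layer 7 trade-off above concrete: the changes under review are an apache reverse-proxy vhost template in front of gitea (gitea.vhost.j2), but the underlying idea is simply to inspect the User-Agent and return 403 for the known scraper signatures before the request reaches gitea, instead of dropping whole provider netblocks with iptables. A toy python illustration of that idea, with a placeholder UA string; the real list lives in change 738725 and this is not the actual vhost configuration:

```python
# Toy WSGI app showing the layer-7 idea: reject requests whose User-Agent
# matches known scraper signatures with a 403 before they reach gitea.
from wsgiref.simple_server import make_server

BLOCKED_UA_SUBSTRINGS = (
    'Example-Scraper/1.0',   # placeholder; the real UA list is in 738725
)

def app(environ, start_response):
    ua = environ.get('HTTP_USER_AGENT', '')
    if any(marker in ua for marker in BLOCKED_UA_SUBSTRINGS):
        start_response('403 Forbidden', [('Content-Type', 'text/plain')])
        return [b'Forbidden\n']
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'request would be proxied to gitea here\n']  # stand-in for proxying

if __name__ == '__main__':
    make_server('', 8080, app).serve_forever()
```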
19:57:02 there are a couple of renames, including a misspelling
19:57:25 not sure how urgent it is
19:57:58 and openstack/transparency-policy -> osf/transparency-policy
19:58:12 i'm cool with handling a rename maintenance for this week if it's warranted. i'm around and will take a closer look at the commands i cut and paste from our docs in comparison to last time
19:58:37 i'm out thu/fri .au time so no help
19:58:50 the transparency-policy move is definitely non-urgent. that repo has been basically dead for years
19:59:16 perhaps we wait then
19:59:21 the topiko->tobiko fix is the only one i expect has much urgency behind it
20:00:09 but... folks involved signed off on the rename patch which incorporated that typo, so i'm not in any hurry either
20:00:14 they seem to be patient
20:00:36 ok, we can come back to it next week
20:00:43 #topic Open Discussion
20:01:05 we are at time, but if anyone would like to flag something i guess we can go a little over
20:01:28 i'm good. stuff's about ready to come off the grill anyway
20:01:50 and i need to stop obsessing over the bouncy castle next door
20:02:23 we've unfortunately gone back into lockdown here :(
20:02:48 so my daughter will have to cancel her bday party since we can't have anyone over
20:03:17 it is certainly a year to remember
20:03:28 with that, i think we're done, thanks everyone
20:03:31 #endmeeting