19:00:33 <ianw> #startmeeting infra
19:00:33 <openstack> Meeting started Tue Jul  7 19:00:33 2020 UTC and is due to finish in 60 minutes.  The chair is ianw. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:34 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:37 <openstack> The meeting name has been set to 'infra'
19:00:48 <fungi> ahoy, mateys
19:01:01 <zbr> ahoy
19:01:01 <ianw> #topic Announcements
19:01:08 <corvus> aloha
19:01:37 <ianw> no clarkb today, so i am copy/pasting the agenda #topics for a change
19:02:03 <ianw> no other announcements of note
19:02:08 <ianw> #topic Actions from last meeting
19:02:19 <ianw> #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-06-30-19.01.html minutes from last meeting
19:02:36 <ianw> there were no action items, so we can keep moving on that
19:02:48 <ianw> #topic Specs approval
19:02:58 <ianw> #link https://review.opendev.org/#/c/731838/ Authentication broker service
19:03:15 <ianw> this doesn't appear to have changed since last week, but seems to have a fair bit of comment
19:03:28 <ianw> fungi: any further thoughts?
19:03:39 <fungi> it was revised last week-ish
19:03:44 <fungi> ready for more comments
19:04:00 <fungi> i think all the comments thus far have been addressed
19:04:49 <mordred> o/
19:05:01 <fungi> last week was a bit of a black hole for me what with the opendev large scale deployments conference, so it was probably actually week before last i revised it
19:05:31 <fungi> anyway, bring on the comments
19:05:42 <ianw> cool, so anyone who has an interest in, or knows anything about authentication should attach themselves to that
19:05:58 <ianw> #topic Priority Efforts
19:06:14 <ianw> #topic Opendev
19:06:44 <ianw> firstly looping back on the gitea pagination issues, it seems clarkb's patch was accepted
19:07:08 <fungi> so maybe we can re-simplify manage-projects at next upgrade?
19:07:35 <ianw> yeah, that was my question, are we tracking that somehow
19:07:58 <ianw> #link https://github.com/go-gitea/gitea/pull/12057 gitea pagination fixes
19:08:21 <mordred> I've been tracking it via the "clarkb mentions its progress" method
19:08:46 <ianw> ok, i'll put next week's date on the topic and we can loop back on that
19:08:49 <mordred> since they accepted it upstream, we could cherry-pick it into our image builds and re-simplify - or just wait
19:08:50 <ianw> at next meeting
19:09:00 <fungi> sounds good, it may be time to act on it by then given their release cadence
19:09:17 <corvus> i don't think the simplification is urgent
19:09:33 <corvus> (iirc)
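A rough sketch of the paginated repo listing that a re-simplified manage-projects could lean on once the upstream fix is in a gitea release; the endpoint and parameters assume gitea's v1 API, and the backend URL and org name are placeholders:

    import requests

    GITEA_URL = "https://gitea01.opendev.org:3000"  # placeholder backend URL

    def list_all_repos(org, per_page=50):
        """Walk gitea's paginated repo listing until an empty page comes back."""
        repos = []
        page = 1
        while True:
            resp = requests.get(
                f"{GITEA_URL}/api/v1/orgs/{org}/repos",
                params={"page": page, "limit": per_page},
            )
            resp.raise_for_status()
            batch = resp.json()
            if not batch:
                break
            repos.extend(batch)
            page += 1
        return repos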
19:09:53 <ianw> the ddos i have down as a separate topic, so we can discuss that there
19:10:05 <fungi> i seem to have picked the wrong afternoon to attend a meeting from my patio, there are children having a birthday party at the rental house next door. there's now a bouncy castle, and i'm coveting it (maybe after a heavy round of disinfecting)
19:10:23 <ianw> so the only other bit was governance changes here, of which i don't think there's new news since last week
19:11:24 <fungi> i think it was any time after the first of this month we could start publishing a list of representatives for the advisory board, we just haven't gotten to it yet
19:11:37 <fungi> people are still welcome to volunteer at any time anyway
19:11:49 <ianw> ++
19:12:04 <ianw> #topic Update Config Management
19:12:21 <ianw> couple of things to get updates on
19:12:35 <ianw> zuul-executors as containers: is that complete? i noticed they were still in emergency yesterday
19:13:50 <mordred> last I knew there was another missing package in the container for afs - but I think that got fixed, yes?
19:14:21 <ianw> i remember seeing something go by about the client package being a suggests
19:14:38 <corvus> there's one more thing
19:14:39 <corvus> we need gear
19:14:50 <corvus> i think (haven't double checked) that ze01 is still disabled due to that
19:15:01 <corvus> this one is trickier
19:15:13 <mordred> yeah - ze01 is disabled
19:15:23 <corvus> i feel less enthusiastic about asking the zuul community to add gear to the zuul-exec images since nothing in zuul-jobs uses gear
19:15:36 <corvus> (i recognize this is weird, since zuul itself uses gear)
19:15:59 <corvus> the idea that has bubbled to the top of my list is actually to just vendor gear into our opendev base jobs
19:16:09 <corvus> (this is for the logstash submit thing)
19:16:36 <fungi> that seems like a fine compromise to me, at least
19:16:50 <corvus> i also recognize that's icky.  but it lets us avoid the weird conversation about including opendev-specific stuff in zuul images, and avoid building our own downstream images
19:16:53 <corvus> and gear is small
19:17:02 <mordred> ++
19:17:14 <fungi> we really need to reengineer the logstash stuff anyway if we're going to continue it in the long term
19:17:29 <corvus> so if no one is violently objecting to that, i'll get something together this week
19:17:32 <fungi> i too don't wish to heap that maintenance burden on the zuul community
19:17:41 <mordred> corvus: clarkb was also questioning the value of our elk cluster during the ptg
19:17:42 <mordred> corvus: ++
19:18:12 <fungi> well, several of us were questioning it, yes. the resources it consumes are likely disproportionate to the value it provides
19:18:18 <corvus> yeah, all the more reason to avoid over-engineering it then
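For reference, a minimal sketch of the logstash submission the vendored gear copy would drive, using gear's client API (gear.Client, addServer, waitForServer, submitJob); the gearman host, port, job name, and payload are placeholders rather than the real logstash-worker settings:

    import json
    import gear

    client = gear.Client()
    client.addServer('logstash.example.org', 4730)  # placeholder gearman server
    client.waitForServer()

    # Hand the log URL off to the workers as a fire-and-forget background job.
    payload = json.dumps({'source_url': 'https://example.org/job-output.txt'})
    client.submitJob(gear.Job(b'push-log', payload.encode('utf-8')), background=True)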
19:19:00 <ianw> do you want an action item so we come back to it next week?  or just let it happen?
19:20:08 <fungi> i'd like to revisit it next week, but that doesn't need an action item necessarily, we can just keep it on the agenda (it's relative to our config management priority effort, so already the case)
19:20:52 <ianw> ok, the second thing i had, also probably corvus/mordred was an update on the cross-arch container work
19:21:28 <ianw> are we at the point we have stable arm64/amd64 builds popping out?
19:21:35 <corvus> i think that i think and i think that mordred thinks that it should all be working maybe now?
19:21:57 <mordred> yeah - and I think it's time to land https://review.opendev.org/#/c/726263/
19:22:24 <mordred> but - it is of course possible that there is something we don't know that we don't know
19:22:32 <corvus> but also, we sure did just take a bunch of stuff apart and put it back together, so i think we're at "lets try to use it and see if we really fixed it"
19:23:09 <fungi> it conflicts with 726458, is that still needed?
19:23:16 <mordred> yeah
19:23:29 <fungi> seems to be based on an outdated patchset unless gertty is lagging for me
19:24:41 <mordred> no - we can toss that one
19:24:45 <ianw> i can redo https://review.opendev.org/#/c/726037/ to test it native for nodepool-builder
19:24:54 <mordred> I went ahead and did it in the main one
19:26:20 <fungi> okay, a fine next step as far as i'm concerned then. +2, but ianw has already approved
19:26:20 <ianw> ok, so i think wheels are in motion for that
19:26:33 <mordred> woot
19:26:47 <fungi> --progress--
19:26:59 <ianw> last one was grafana/graphite update
19:27:05 <ianw> #link https://grafana.opendev.org/ grafana from container
19:27:18 <ianw> that's up and production ready
19:28:18 <ianw> graphite.opendev.org is also up, however i need to copy the data over, but i also noticed a few settings we've tweaked relating to null value storage and the mulder-scully-ish xFilesFactor that i need to port
19:28:20 <fungi> i'll try to look at that this evening. i gather something we merged recently broke some of it
19:29:01 <fungi> or just held up the patches to implement it?
19:29:49 <ianw> fungi: ohh, there were deployment issues, i'd forgotten to add the promote job so it wasn't tagging the latest, but that's fixed
19:30:06 <fungi> ahh, okay
19:30:19 <fungi> and then yeah there were the holdups from review-test still getting deployed
19:30:36 <fungi> with its out of control track-upstream logs
19:30:47 <ianw> then there have been a few side issues: hosts dying and leaving hung processes, the 200+ containers that somehow appeared on review-dev, and the openedge mirror disappearing
19:30:59 <fungi> saw a few of the cron errors about the full fs
19:31:47 <mordred> ianw: were they all manage-projects containers?
19:31:54 <ianw> actually, it's review-test.opendev.org -- again it seems to be doing the same thing
19:32:42 <mordred> I think we need to look in to our retry behavior in jeepyb - I think we currently retry indefinitely when we can't do an initial connection
19:32:52 <mordred> although speaking of review-test ...
19:32:55 <ianw> yeah, every hour that has launched a container
19:33:24 <ianw> all of them looping now in "    raise NoValidConnectionsError(errors)"
19:33:43 <ianw> so, that host is on a downwards spiral again
19:33:52 <mordred> yeah - I bet each one is spinning, unable to connect to something
19:33:55 <mordred> yup
19:34:18 <mordred> that isn't really the behavior we want anymore
19:34:27 * mordred will poke at jeepyb
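A sketch of the bounded-retry behavior being suggested for jeepyb, rather than its actual code; paramiko raises NoValidConnectionsError when the initial connection fails, and the retry count, delay, and port here are illustrative:

    import time
    import paramiko
    from paramiko.ssh_exception import NoValidConnectionsError

    def connect_with_retries(host, username, max_attempts=5, delay=30):
        """Try to open the SSH connection a few times, then give up."""
        for attempt in range(1, max_attempts + 1):
            client = paramiko.SSHClient()
            client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
            try:
                client.connect(host, port=29418, username=username)
                return client
            except NoValidConnectionsError:
                if attempt == max_attempts:
                    raise  # stop spinning instead of retrying forever
                time.sleep(delay * attempt)  # simple linear backoff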
19:34:59 <mordred> also - didn't we skip review-test in manage-projects?
19:35:03 <mordred> what is starting those?
19:35:10 <mordred> but while we're talking about review-test - could I get some reviews on https://review.opendev.org/#/c/737023/ ?
19:35:32 <ianw> mordred: there's a cron entry
19:35:35 <mordred> I'd like to sync the data from review to review-test
19:35:48 <ianw> perhaps the cron job wasn't "absented"?
19:35:55 <mordred> it has been noted in the past we should carefully review that - to ensure that it's not going to delete our production data. I'm pretty sure it isn't
19:36:00 <mordred> ianw: oh - this is track-upstream isn't it?
19:36:21 <mordred> yup. track-upstream. let's not run that on review-test :)
19:37:30 <ianw> ok, well i also managed to kill the main container on it, so it's currently doing nothing
19:37:53 <ianw> mordred: do you have time to subdue it back into shape, or do you want me to look into it?
19:38:10 <mordred> ianw: I'll work on it
19:38:27 <ianw> cool, let's move on
19:38:35 <ianw> #topic General Topics
19:38:40 <fungi> related, the pull-request closer is still running out of cron
19:38:45 <fungi> d'oh, seconds too late
19:38:55 <ianw> no we can go back if you like
19:38:59 <fungi> nah
19:39:05 <fungi> let's keep moving
19:39:14 <fungi> i have corn and sausages on the grill
19:39:34 <ianw> i think the only active trusty host transition is wiki, and i don't think there's too much happening with that ATM?
19:40:26 <fungi> there is not, no
19:40:34 <ianw> ok
19:40:38 <fungi> it welcomes friends however
19:40:44 <ianw> #topic Time to retire the openstack-infra ML yet?
19:40:54 <ianw> #link http://lists.openstack.org/pipermail/openstack-infra/2020-July/006632.html
19:41:13 <mordred> ianw: remote:   https://review.opendev.org/739840 Don't install the track-upstream cron on review-test
19:42:47 <ianw> fungi: i don't think there's been any disagreement on the list ... if anyone has an objection to this merge going ahead, speak now or forever hold your peace :)
19:43:27 <fungi> indeed
19:43:35 <ianw> otherwise, i guess we declare the lists joined in holy matrimony
19:43:41 <fungi> its retirement is already on my personal schedule
19:43:46 <corvus> no objections
19:43:57 <ianw> mazel tov
19:44:10 <fungi> i look forward to having one fewer list to moderate
19:44:21 <ianw> #topic China telecom blocks
19:44:31 <ianw> so we currently still have these in place with iptables
19:44:49 <ianw> i think fungi has been most active in monitoring, what's the latest?
19:45:06 <fungi> and taking the open connections graph in cacti as an indicator, the activity is ongoing from chinanet and probably elsewhere
19:45:33 <ianw> #link http://lists.opendev.org/pipermail/service-discuss/2020-July/000053.html
19:45:39 <ianw> that's the discussion for posterity
19:45:42 <fungi> it mostly stopped over the weekend, but resumed late sunday utc and has been constant since
19:45:54 <ianw> #link https://review.opendev.org/738721 gitea reverse proxy
19:46:01 <ianw> #link https://review.opendev.org/738725 crawler reject rules
19:46:27 <ianw> that is, i think, the current alternative solution, based on the fact that the UAs making these requests appear to come from a very specific scraping script
19:46:43 <fungi> basically i think we need to choose between layer 3 filtering with a lot of collateral damage, or layer 7 filtering with a lot of added complexity. rock meet hard place
19:47:30 <corvus> i reckon we should try the layer 7 then
19:47:35 <ianw> yeah, last week we put up the robots.txt -- but this script does not obey that
19:47:41 <fungi> i'm personally at peace with the layer 7 solution, though it would be nice to not have to keep it in place indefinitely
19:48:10 <corvus> btw...
19:48:29 <corvus> how certain are we that these UA strings are non-current?
19:48:52 <ianw> that was what i was about to say ... someone should really double check on my work there
19:49:38 <fungi> some of that depends on how you define "current" as there is some software in use (albeit probably of questionable origin) which has decided this is the ua to report
19:50:14 <ianw> for anyone following
19:50:16 <ianw> #link https://review.opendev.org/#/c/738725/6/playbooks/roles/gitea/templates/gitea.vhost.j2
19:50:20 <ianw> shows the UAs in question
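The real reject rules live in the apache vhost template linked above; as an illustration of the layer 7 idea, the proxy just matches the request's user-agent against a small blocklist, roughly like this (the pattern below is a placeholder, not one of the actual strings):

    import re

    # Placeholder pattern; the real list is maintained in gitea.vhost.j2.
    BLOCKED_UA_PATTERNS = [
        re.compile(r'Mozilla/5\.0 \(Windows NT 6\.1; WOW64\).*AppleWebKit/537\.36'),
    ]

    def should_reject(user_agent):
        """True if the reverse proxy should answer this request with a 403."""
        return any(p.search(user_agent or '') for p in BLOCKED_UA_PATTERNS)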
19:50:31 <fungi> if mnaser is around, i'd love to get his input. he's the first member of our community to inquire about the status, so may have additional information about the current adverse impact
19:51:59 <ianw> the change also doesn't deploy into production, it just sets up the proxy and we still have to point haproxy at it
19:52:31 <corvus> other than that, maybe we just land them and see if we end up getting user reports?
19:52:36 <fungi> yeah, i'm cool approving that nowish, since it's not (yet) in a production code path
19:53:07 <fungi> but i also second corvus's suggestion
19:53:23 <corvus> i +3d
19:54:11 <ianw> yeah, i guess the requests are still coming over vexxhost, but rejecting them at layer 3 vs accepting them and 403'ing them probably isn't a significant networking-back-end difference?
19:55:05 <fungi> agreed, the impact is when gitea has to process the git requests
19:55:22 <fungi> anything which stops them before that point is probably plenty sufficient
19:56:01 <ianw> ok, so wheels in motion there
19:56:10 <ianw> #topic Project Renames
19:56:16 <fungi> the traffic volume and request rates are not absurd on their own (though a bit aggressive and clearly ignoring even a modicum of reasonable crawler behavior)
19:56:50 <ianw> fungi: yes ... that script that appears to be hitting us does not meet the standard of "reasonable crawler" in any way :)
19:57:02 <ianw> there are a couple of renames, including a mis-spelling
19:57:25 <ianw> not sure how urgent it is
19:57:58 <ianw> and openstack/transparency-policy -> osf/transparency-policy
19:58:12 <fungi> i'm cool with handling a rename maintenance for this week if it's warranted. i'm around and will take a closer look at the commands i cut and paste from our docs in comparison to last time
19:58:37 <ianw> i'm out thu/fri .au time so no help
19:58:50 <fungi> the transparency-policy move is definitely non-urgent. that repo has been basically dead for years
19:59:16 <ianw> perhaps we wait then
19:59:21 <fungi> the topiko->tobiko fix is the only one i expect has much urgency behind it
20:00:09 <fungi> but... folks involved signed off on the rename patch which incorporated that typo, so i'm not in any hurry either
20:00:14 <fungi> they seem to be patient
20:00:36 <ianw> ok, we can come back to it next week
20:00:43 <ianw> #topic Open Discussion
20:01:05 <ianw> we are at time, but if anyone would like to flag something i guess we can go a little over
20:01:28 <fungi> i'm good. stuff's about ready to come off the grill anyway
20:01:50 <fungi> and i need to stop obsessing over the bouncy castle next door
20:02:23 <ianw> we've unfortunately gone back into lockdown here :(
20:02:48 <ianw> so my daughter will have to cancel her bday party since we can't have anyone over
20:03:17 <ianw> it is certainly a year to remember
20:03:28 <ianw> with that, i think we're done, thanks everyone
20:03:31 <ianw> #endmeeting