Thursday, 2021-12-09

fungipython 3.10.1 yesterday, 3.11.0a3 today00:00
fungiseems like i'm always compiling a new python00:00
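
For reference, building a prerelease CPython from source is usually just the standard autotools sequence; a minimal sketch, assuming an unpacked source tree (the prefix and flags here are illustrative, not what fungi actually ran):

    # from inside the unpacked Python-3.11.0a3 source directory
    ./configure --prefix="$HOME/.pythons/3.11.0a3" --enable-optimizations
    make -j"$(nproc)"
    make install
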
clarkbRelated to new releases, this is a fun one. Gerrit's 3.5.0.1 release broke a bunch of plugins because they pulled out elasticsearch support (since it went non-open-source) and elasticsearch was pulling in a dep for a number of plugins and it isn't there anymore00:01
fungioh, yeah, transitive deps silently satisfying direct deps is a major risk. it's bitten openstack projects before as well00:02
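
As an illustration of that risk in Python terms (package names here are invented, not from the Gerrit case):

    # our setup.cfg declares only:  install_requires = somelib   (hypothetical)
    # somelib happens to depend on requests today, so this works:
    import requests  # undeclared direct use of a transitive dependency

    # the day somelib drops requests from its own deps, this module fails
    # at import time even though our declared dependencies never changed
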
corvusianw: interested in reviewing https://review.opendev.org/820954 ?  it's the other half of a change you +2d00:05
corvusre keycloak00:05
ianwlgtm00:06
ianwsorry i think i meant to +2 that when i looked at the other bit00:06
corvus\o/ thx00:13
opendevreviewAde Lee proposed zuul/zuul-jobs master: DNM enable_fips role for zuul jobs  https://review.opendev.org/c/zuul/zuul-jobs/+/80703100:29
opendevreviewMerged opendev/system-config master: Add keycloak auth config to Zuul  https://review.opendev.org/c/opendev/system-config/+/82095400:51
fungiyay! https://zuul.opendev.org/t/openstack/build/20fccb043c35459194b1094b28586055/log/lists.openstack.org/exim4/mainlog#4701:13
fungimailman tried to notify me, exim got the notification and attempted delivery, then got its outbound smtp socket reset01:14
clarkbsuccessful failure01:14
clarkbthe best kind of failure01:14
fungii'll integrate the firewall fix, though the question remains whether we should start the mailman services in testinfra01:15
clarkbif it isn't necessary to test this properly I don't know that we need to. Though not starting them probably covered up that python path issue01:15
fungiyes01:16
clarkbit shouldn't hurt to start them if we've blocked smtp outbound. People can send mail in if they really like and it won't go anywhere01:16
fungiwell, sort of covered it up, actually the initscript does an exit 0 when python isn't found so systemd wouldn't have known the difference01:16
clarkbah01:17
clarkbanother successful failure :)01:17
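
The firewall fix fungi goes on to integrate amounts to rejecting outbound SMTP from the test node; a minimal sketch of such a rule, assuming iptables and not necessarily matching the real change, though a tcp-reset reject would explain the "socket reset" exim saw:

    # reject outbound SMTP so test-created mail can never leave the node
    iptables -A OUTPUT -p tcp --dport 25 -j REJECT --reject-with tcp-reset
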
opendevreviewJeremy Stanley proposed opendev/system-config master: Block outbound SMTP connections from test jobs  https://review.opendev.org/c/opendev/system-config/+/82090002:05
opendevreviewJeremy Stanley proposed opendev/system-config master: Copy Exim logs in system-config-run jobs  https://review.opendev.org/c/opendev/system-config/+/82089902:05
opendevreviewJeremy Stanley proposed opendev/system-config master: Collect mailman logs in deployment testing  https://review.opendev.org/c/opendev/system-config/+/82111202:05
opendevreviewJeremy Stanley proposed opendev/system-config master: Make sure /usr/bin/python is present for mailman  https://review.opendev.org/c/opendev/system-config/+/82109502:05
opendevreviewJeremy Stanley proposed opendev/system-config master: Use newlist's automate option  https://review.opendev.org/c/opendev/system-config/+/82039702:05
opendevreviewJeremy Stanley proposed opendev/system-config master: Restart mailman services when testing  https://review.opendev.org/c/opendev/system-config/+/82114402:05
*** rlandy|ruck|bbl is now known as rlandy|ruck02:19
*** rlandy|ruck is now known as rlandy|out02:23
ianwi'm finding it quite hard to get the zuul-client docker image to generate a secret02:43
ianw--infile doesn't help02:44
ianwso far i haven't figured out how to pipe input into it either02:47
ianwok, running with "-i", but not "-t", makes "cat file | docker run ... zuul-client encrypt ..." work02:49
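
In other words, -t allocates a pseudo-tty (which breaks piped stdin) while -i alone keeps stdin attached. A sketch of the working shape ianw describes; the url, tenant, and project arguments are illustrative, not his actual invocation:

    # -i keeps stdin attached; adding -t would allocate a tty and break the pipe
    cat plaintext-secret | docker run --rm -i zuul/zuul-client \
        --zuul-url https://zuul.opendev.org \
        encrypt --tenant opendev --project opendev/system-config
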
*** bhagyashris_ is now known as bhagyashris03:02
Clark[m]ianw fwiw I think there is a python script in the tools dir of zuul to do it as well03:07
Clark[m]You don't need auth for it as it grabs a pubkey to do the encryption03:07
ianwyeah, that is now giving a deprecated warning03:07
*** pojadhav|out is now known as pojadhav|rover03:18
opendevreviewJeremy Stanley proposed opendev/system-config master: Restart mailman services when testing  https://review.opendev.org/c/opendev/system-config/+/82114403:50
opendevreviewJeremy Stanley proposed opendev/system-config master: Use newlist's automate option  https://review.opendev.org/c/opendev/system-config/+/82039703:50
fungiokay, i think topic:mailman-lists is ready to go, finally05:00
opendevreviewIan Wienand proposed opendev/system-config master: infra-prod: write a secret to the bastion host  https://review.opendev.org/c/opendev/system-config/+/82115505:25
*** marios is now known as marios|ruck06:12
*** gibi_ is now known as gibi07:52
*** ysandeep is now known as ysandeep|lunch08:08
*** ysandeep|lunch is now known as ysandeep08:38
opendevreviewMerged openstack/project-config master: Add NVidia vGPU plugin charm to OpenStack charms  https://review.opendev.org/c/openstack/project-config/+/81981809:02
*** pojadhav|rover is now known as pojadhav|lunch09:07
*** pojadhav|lunch is now known as pojadhav|rover10:03
*** ysandeep is now known as ysandeep|afk10:21
*** redrobot6 is now known as redrobot10:23
*** jpena|off is now known as jpena10:35
*** ysandeep|afk is now known as ysandeep10:56
*** rlandy|out is now known as rlandy|ruck11:10
*** pojadhav|rover is now known as pojadhav|rover|brb11:42
*** pojadhav|rover|brb is now known as pojadhav|rover11:51
*** pojadhav|rover is now known as pojadhav|rover|brb12:02
*** pojadhav|rover|brb is now known as pojadhav|rover12:22
*** ykarel is now known as ykarel|away13:21
*** pojadhav|rover is now known as pojadhav|rover|brb14:18
*** pojadhav|rover|brb is now known as pojadhav|rover15:04
slittle1_having intermittent issues with 'git review -s'15:37
slittle1_trying to run a script that sets up the gerrit remote on all starlingx repos15:38
slittle1_seems like every second or third try hangs15:39
slittle1_I'm working around it with a 'timeout' and a retry15:39
slittle1_cat .gitreview15:41
slittle1_[gerrit]15:41
slittle1_host=review.opendev.org15:41
slittle1_port=2941815:41
slittle1_project=starlingx/distcloud-client.git15:41
slittle1_defaultbranch=master15:41
slittle1_as an example15:41
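
The timeout-and-retry workaround mentioned above would look roughly like this; the timeout value and retry count are guesses, not what the actual script uses:

    # kill any hung attempt after 30s, retry a couple of times
    for attempt in 1 2 3; do
        timeout 30 git review -s && break
        sleep 5
    done
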
corvusi'm going to restart zuul-web with the new auth config; expect a several-minute outage (of web only; schedulers will continue)15:42
*** ysandeep is now known as ysandeep|out15:45
fungislittle1_: going over ipv4 or ipv6? sounds like there could be some intermittent network problems... are you seeing the same behavior from multiple locations?15:47
slittle1_ipv415:55
slittle1_single location15:56
slittle1_don't have the means to test from multiple locations at the moment15:56
slittle1_Problem running 'git remote update gerrit'15:57
slittle1_Fetching gerrit15:57
slittle1_ssh_exchange_identification: read: Connection reset by peer15:57
slittle1_fatal: Could not read from remote repository.15:57
slittle1_Please make sure you have the correct access rights15:57
slittle1_and the repository exists.15:57
slittle1_error: Could not fetch gerrit15:57
fungislittle1_: i'll see if i can reproduce from other places on the internet15:59
fungirunning `git remote update gerrit` in starlingx/distcloud-client in a loop isn't producing errors from my house but i'll try from some virtual machines in various cloud providers as well16:01
Clark[m]We limit connections per account. If this is happening concurrently, or quickly enough that tcp hasn't closed completely, that may be the cause16:01
Clark[m]We also limit by IP and if you go through NAT similar problem16:02
slittle1_ok, so I should try adding a delay between requests?  What delay do you recommend ?16:02
Clark[m]Well I'm suggesting this could be related but I don't know enough about your situation to be confident it is the cause.16:03
slittle1_How is the connect limit enforced?   how many connects over what time period ?16:04
Clark[m]There are two methods. The first is by iptables limiting to 100 connections per source IP. The other is Gerrit limiting to 96 per Gerrit account iirc16:06
Clark[m]If it were me I'd git review -s on demand and not try to do them in bulk16:06
opendevreviewMerged openstack/project-config master: Allow Zuul API access from keycloak server  https://review.opendev.org/c/openstack/project-config/+/82095616:08
slittle1_The 'git review -s' requests are serial, not parallel.  16:09
fungiyeah, unlikely to be either of the concurrent connection count limits in that case16:09
fungi(the limit of 96 concurrent ssh connections per account is enforced by the gerrit service, the limit of 100 concurrent ssh connections per source ip address is enforced by iptables/conntrack on the server, for future reference)16:10
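
For reference, the per-source-ip cap fungi describes is the kind of thing iptables' connlimit match expresses; a sketch that may differ from the real rule, though it uses the icmp-port-unreachable reject fungi mentions below:

    # cap concurrent Gerrit ssh connections per source address at 100;
    # overflow gets icmp-port-unreachable rather than a tcp reset
    iptables -A INPUT -p tcp --syn --dport 29418 \
        -m connlimit --connlimit-above 100 --connlimit-mask 32 \
        -j REJECT --reject-with icmp-port-unreachable
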
Clark[m]The Gerrit ssh log may have hints. But I'm finishing a school run16:11
Clark[m]Similarly trying to reproduce with only ssh client on the client side with -vvv may be helpful16:12
fungii can't reproduce the same problem running `git remote update gerrit` in a tight loop from various places on the internet so far16:13
slittle1_any other anti spam/DOS measure I might be getting caught in?16:15
slittle1_I'd estimate ~200 of those requests over 2-3 minutes16:16
clarkbslittle1_: that may be enough that tcp isn't fully closing16:16
clarkband you're hitting the tcp limit16:16
fungii doubt it's the conntrack overflow, since it's set to send icmp-port-unreachable not tcp reset (git's claiming to see the latter)16:18
*** pojadhav is now known as pojadhav|rover16:19
clarkbslittle1_: do you know approximately what time the last error occured? I can look at the gerrit sshd log16:21
slittle1_within the last 5 min16:23
clarkbok the sshd log doesn't seem to show any errors in that timeframe, implying it is probably something before gerrit is involved16:25
clarkbperhaps a firewall on your end or some sort of asymmetric route causing routers/firewalls to get angry16:26
slittle1_I'll try again now16:27
clarkbI've approved https://review.opendev.org/c/opendev/system-config/+/818606 as I indicated I would yesterday (this is the lodgeit user update)16:28
clarkbif it has a sad I can manually revert on the host, then push a revert if the fix isn't straightforward16:28
noonedeadpunkwas that discussion about connection issues to opendev infrastructure?:)16:29
clarkbnoonedeadpunk: specifically to review.opendev.org over port 29418 over with ipv4 yes16:29
noonedeadpunkwell just for me right now git clone https://opendev.org/openstack/requirements /tmp/req ends with `GnuTLS recv error (-9): Error decoding the received TLS packet.`16:30
slittle1_got a bit further 16:31
clarkbnoonedeadpunk: that is a different system hosted in another part of the world. I doubt they are related, but I suppose it is possible16:31
slittle1_ssh://slittle1@review.opendev.org:29418/starlingx/portieris-armada-app.git did not work. Description: ssh_exchange_identification: read: Connection reset by peer16:31
slittle1_fatal: Could not read from remote repository.16:31
slittle1_Please make sure you have the correct access rights16:31
slittle1_and the repository exists.16:31
slittle1_Could not connect to gerrit.16:31
slittle1_Enter your gerrit username:16:31
noonedeadpunkcurl actually works, but you know - it's quite a different proto used16:31
noonedeadpunkclarkb: do we have actually some rate limiting there?16:31
noonedeadpunkAs I was cloning quite a lot of repos at a time....16:32
clarkbnoonedeadpunk: we have "if you overload the system you'll break it and cause a fail over to another backend" rate limiting :)16:32
clarkbnoonedeadpunk: were you running OSA updates in a datacenter? we know that causes it to happen and had to ask osa to not ddos us16:32
noonedeadpunkmmm, I see )16:32
clarkbunfortunately git clones are not cheap and need significant amounts of memory. Eventually we run out.16:33
clarkbslittle1_: looks like the same error but in a different part of the process?16:33
noonedeadpunkWhile I'm aware about osa issue and we got exact reason why it's happening, and I really do some osa related stuff, it's not related :)16:33
clarkbslittle1_: the specific repo there gives me something new to look at in the logs16:33
noonedeadpunkI was retrieving HEAD SHAs for openstack services so that shouldn't cause too much load16:34
clarkbnoonedeadpunk: it's actually the same16:35
clarkbgit has to load all the data into memory for most operations aiui16:35
clarkbthe resulting IO can differ but the IO and cpu impact to initiate operations doesn't differ by much16:35
slittle1_clarkb: it's just iterating through our starling git repos.  It got a bit further this time.16:36
noonedeadpunkum, so the issue when osa was ddosing was when it did quite the same but from each compute in the deployment16:36
noonedeadpunkah16:36
clarkber the memory, io and cpu to initiate don't differ much. The delta is the io afterwards16:36
noonedeadpunkI see16:36
clarkbslittle1_: right this is why I suggested doing it on demand earlier. Fwiw I don't see that request at all here16:36
noonedeadpunkbut well... we need to update versions and do releases... I'm not sure I know another way to grab the top of stable/xena for example and make it persistent over time16:37
noonedeadpunkwe can do this slower though....16:37
clarkbnoonedeadpunk: well it should be fine if you do them sequentially16:37
noonedeadpunkyep, I did one by one16:38
slittle1_ultimately the goal is to create a branch on each repo, and to modify the defaultbranch of the .gitreview files in each repo16:38
noonedeadpunkand then the process just got stuck and it's been like 15 minutes already that I can't clone :(16:38
clarkbnoonedeadpunk: well are you cloning or checking the HEAD?16:38
noonedeadpunkso was wondering if there's some automated thing like fail2ban or dunno16:38
clarkbbecause I mentioned cloning and you said you weren't doing that. And no there is no fail2ban but we must load balance by source IP because git and if you overload your backend this can happen16:39
clarkbunfortunately I'm also trying to debug a separate connectivity issue to a separate service in another datacenter in a different country so juggling isn't easy16:39
noonedeadpunkclarkb: what the script was doing exactly - `git ls-remote <repo> stable/xena`16:41
noonedeadpunkok, sorry, scrap that16:41
noonedeadpunkthis can wait16:41
clarkbnoonedeadpunk: ok are you cloning then?16:41
clarkbnone of the backends indicate memory or system load pressure so likely not that16:41
noonedeadpunkas for me - git connection just hangs whatever I do16:42
noonedeadpunkoh, well, no16:42
noonedeadpunkgit ls-remote just worked16:42
noonedeadpunkclone not16:43
clarkbnoonedeadpunk: can you see which backend you are talking to by inspecting the ssl cert (we put the name of the backend in there too)16:44
noonedeadpunkI will probably just try to reboot....16:44
clarkbslittle1_: best I can tell based on lack of info in the logs on our end this is likely to be happening somewhere between you and us. Are you able to try ssh -vvv -p 29418 slittle1@review.opendev.org gerrit ls-projects and see if you can reproduce. Then maybe that gives us a bit more info16:45
noonedeadpunk CN=gitea01.opendev.org16:45
noonedeadpunkbut things just went back to normal16:46
noonedeadpunkso I guess I had some stuck connection that wasn't closed properly...16:46
noonedeadpunkas I saw like 15% packet loss close to loadbalancer16:47
clarkbis it possible that vexxhost is having a widespread ipv4 routing problem?16:47
clarkb(thats just a long shot given what slittle1 observes in another datacenter but both are in vexxhost)16:47
clarkbto confirm gitea01 seems healthy. The gitea processes have been running for a couple days. Current free memory is good and there are no recent OOMKiller events16:48
clarkbslittle1_: are your connections running in parallel? I see 13 connections from your source currently 11 of which are established16:50
clarkbthat is still only 13% of our limit though so shouldn't be in danger of that. Mostly just curious16:51
*** pojadhav|rover is now known as pojadhav|out16:51
noonedeadpunkwell, connection to vexxhost never was reliable for me at least because of zayo being in the middle.... But packet loss was somewhere on the core router...16:51
clarkbnoonedeadpunk: also if it wasn't clear running an ls-remote sequentially the way you are doing is the correct method I think. I would expect that to work16:52
clarkbnoonedeadpunk: doing 200 at the same time might not :)16:52
noonedeadpunkit always worked at least before16:53
noonedeadpunkand that was exactly problem with osa upgrades16:53
clarkbslittle1_: now down to 6. So ya I don't think we're hitting that 100 limit unless it happens very quickly and everything backs off16:54
noonedeadpunkwe were too tolerant for failovers if things are broken on the deployer side (or they execute upgrade in wrong order)16:54
funginoonedeadpunk: ooh, so the cause of osa upgrades overwhelming us was finally identified? that's great news16:55
noonedeadpunkbut to get this fixed ppl would need to pull in fresh code...16:56
noonedeadpunkor follow docs while upgrading16:56
jrosserI think that of the people who were causing this we reached out to them all and no-one was able to help reproduce it16:56
noonedeadpunkboth are kind of unlikely in short term16:56
jrosserI would be in favour of adding an assert: to the code to make it just fail when this happens16:57
jrosserthough it technically is a valid configuration to use no local caching at all16:57
*** jpena is now known as jpena|off16:57
opendevreviewMerged opendev/system-config master: Switch lodgeit to run under a dedicated user  https://review.opendev.org/c/opendev/system-config/+/81860616:58
jrosseranyway - what noonedeadpunk is doing is trying to run a script to retrieve the SHA of stable/xena for all the OSA repos, nothing to do with a deployment16:59
jrosserit's needed for our release process16:59
clarkbyup, and from what I see things are fine on our side. noonedeadpunk indicated packetloss though16:59
*** marios|ruck is now known as marios|out16:59
clarkbI'm beginning to suspect there may be some "the Internet is having a fit near vexxhost right now" issues16:59
slittle1_I suspect the extra connections relate to my use of 'timeout' to kill hung sessions. 17:00
clarkbbut those are always difficult to debug if you aren't on a client end with the problem, and the server side doesn't see the issue because packets don't reach it17:00
slittle1_The hung sessions are probably cases where ssh key exchange failed and it's prompting for user/pass17:01
slittle1_the script doesn't know how to respond to that, and I don't see it as the prompt is routed to /dev/null. It's in a sub function, and the only thing I want coming out of stdout is the string I'm expecting to parse. I'll route stdout to stderr if I need another run17:04
slittle1_Ha, it finally passed17:05
slittle1_I'm afraid the 'git reviews' will hit the same issue17:06
clarkbslittle1_: it probably will17:06
clarkbnoonedeadpunk: fwiw I was just able to clone requirements at least 10 times (it ran in a while loop and I didn't count the exact number) via opendev.org to gitea01 (I am balanced to the same backend) over ipv4 successfully.17:09
clarkbfungi: ^ any better ideas on debugging slittle1_'s problem if the connections don't seem to show up in our logs. I suspect something external to us17:10
clarkbI guess run an mtr from slittle1_'s IP to review.opendev.org and see if there is packet loss. But it might be port specific etc17:10
fungigit review may prompt for account details if an ssh connection attempt fails at the wrong places in the script17:12
fungiit's likely just another manifestation of a connection issue17:12
clarkbfungi: ya I'm wondering if it is general internet unhappyness, maybe an assymetric route? Or slittle1_'s local firewall limiting connections to a single endpoint or a firewall cluster not allowing port 29418 out on a specific node etc17:13
clarkbsomething like that would explain why we never see the issue in our logs17:13
noonedeadpunkclarkb: it works nicely now as well17:13
fungiif it can be minimally reproduced with specific ssh commands, then we may be able to narrow it down with added verbosity to something like problematic route distribution, a pmtud blackhole, et cetera17:14
clarkbinfra-root: http://lists.openstack.org/pipermail/openstack-discuss/2021-December/026250.html that is probably a meeting we should try and attend. I'll mark it on my todo list but calling it out if others want to attend17:14
fungidepending on at what point the connection breaks17:14
clarkbfungi: slittle1_: ya so something like ssh -vvv -p 29418 slittle1@review.opendev.org gerrit ls-projects17:14
clarkband see if you can make that fail17:14
fungiwe've even seen examples of environments doing specific qos/dscp marking on ssh connections, causing them to get treated differently (in bad ways) from other tcp sessions, or particular firewalls with ssh-specific connection tracking features introducing nuanced inconsistencies17:16
clarkbpaste has been updated and I was able to make this test paste just now: https://paste.opendev.org/show/bE7I0dBfkoDBsGSDZYNT/ I think that is happy17:18
opendevreviewSorin Sbârnea proposed zuul/zuul-jobs master: Add tox-py310 job  https://review.opendev.org/c/zuul/zuul-jobs/+/82124717:20
clarkbfungi: can you check my comments on https://review.opendev.org/c/opendev/system-config/+/820900 ? I +2'd as nothing there seemed critical but didn't want to approve in case it was worth updating17:22
fungithanks, replied to them17:26
clarkbfungi: I think I have a slight preference to aggregate by chain since each chain's rule behaviors are specific to that chain17:28
clarkbmaybe in a followup?17:28
opendevreviewSorin Sbârnea proposed zuul/zuul-jobs master: Add tox-py310 job  https://review.opendev.org/c/zuul/zuul-jobs/+/82124717:29
opendevreviewSorin Sbârnea proposed zuul/zuul-jobs master: Add tox-py310 job  https://review.opendev.org/c/zuul/zuul-jobs/+/82124717:30
slittle1_ran 'ssh -vvv -p 29418 slittle1@review.opendev.org gerrit ls-projects' ten times in rapid succession.  No issues17:31
clarkbfungi: also left a thought on https://review.opendev.org/c/opendev/system-config/+/821144 to make the test a bit more robust17:32
fungithanks17:33
slittle1_ran it in a tighter loop. failed on the 19th iteration....17:35
opendevreviewSorin Sbârnea proposed zuul/zuul-jobs master: Add tox-py310 job  https://review.opendev.org/c/zuul/zuul-jobs/+/82124717:35
slittle1_debug1: Connecting to review.opendev.org [199.204.45.33] port 29418.17:36
slittle1_debug1: Connection established.17:36
slittle1_debug1: identity file /folk/slittle1/.ssh/openstack type 117:36
slittle1_debug1: key_load_public: No such file or directory17:36
slittle1_debug1: identity file /folk/slittle1/.ssh/openstack-cert type -117:36
slittle1_debug1: Enabling compatibility mode for protocol 2.017:36
slittle1_debug1: Local version string SSH-2.0-OpenSSH_7.417:36
slittle1_ssh_exchange_identification: read: Connection reset by peer17:36
clarkbok that indicates it is being killed very early in the protocol establishment. It gets far enough to create the tcp connection but then almost as soon as it starts to negotiate ssh on top of that a peer resets it (which can be a router or firewall in between)17:37
clarkbour firewall rules don't do resets17:37
clarkbslittle1_: did you just do that in a while loop? I'll run similar locally if so just to see if I can reproduce from here17:41
slittle1_yes17:42
slittle1_i=0; while [ $i -le 100 ]; do echo $i; i=$((i + 1)); ssh -vvv -p 29418 slittle1@review.opendev.org gerrit ls-projects; if [ $? -ne 0 ]; then break; fi; done17:42
clarkbok I just did similar with 30 iterations and had no problems.17:43
clarkband reran again just to be double sure. Definitely seems like something to do with your network connectivity. Whether local or upstream of you17:45
*** weechat1 is now known as amorin17:54
*** weechat1 is now known as amorin18:00
clarkbif you want to debug further the next step is probably a tcpdump to catch the reset and see where it originates from? fungi might have better ideas. That will likely produce a large amount of data though18:10
fungithat *might* help narrow it down, but these days most middleboxes "spoof" tc resets on behalf of the remote address18:13
fungier, tcp resets18:13
fungiso all tcpdump will probably show you is that the server sent a tcp/rst packet, and a corresponding tcpdump on the server will show no such packet emitted18:13
fungibut it is unlikely to help in narrowing down which system between the client and server actually originated the reset18:14
fungii would say, the majority of the time i've seen those symptoms, it's either because of an overloaded state tracking/address translation table on a router selectively closing connections to keep under its limit, or a cascade effect failure due to running out of bridge table space on an ethernet switch somewhere18:16
fungithe intermittency can be further stretched by flow distribution across parallel devices, where one device is struggling but only a random sample of flows are sent through it18:18
clarkbyup tl;dr Internet18:19
fungigetting your isp to talk to vexxhost and/or their backbone providers might help get eyes on a problem, but usually the network providers are actually aware and are sitting on degraded states awaiting a maintenance window to replace/service something18:20
fungii'm just glad to no longer be one of the people making those decisions ;)18:21
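
For anyone who does want to attempt the capture clarkb suggested, a sketch that records only the resets rather than all traffic (the interface name is an assumption):

    # capture just RST packets on the Gerrit ssh port, from both directions
    sudo tcpdump -ni eth0 -w resets.pcap \
        'tcp port 29418 and tcp[tcpflags] & tcp-rst != 0'
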
fungipossibly of interest to some here, a summary of the recent pypi user feedback survey: https://pyfound.blogspot.com/2021/12/pypi-user-feedback-summary.html18:24
fungisurveys18:24
fungidecisions include adding paid organization accounts on pypi (free for community projects), and further requirements gathering on package namespacing18:27
clarkbfungi: for the lists ansible stuff. Did you want to push up a followup to do the chain move or just update the existing change? I'm thinking we should probably land the iptables update change first before anything else just to be sure it doesn't impact prod (it shouldn't as only the test all group gets rules)18:27
clarkbAnd then we should be able to land the set of lists specific changes in one block pretty safely18:28
fungiyeah, i'll revise the iptables change, i'd rather not merge too many different updates to our firewall handling, as each is a separate opportunity for breakage18:31
clarkb++18:32
fungiclarkb: for the debugging, would you prefer to record the ip(6)tables-save output some other way?18:38
fungii stuck the print statement where i did mainly so that it would be logged in close proximity to the assertion failures, but no idea if you had a chance to check whether that seemed too verbose to you18:38
opendevreviewSorin Sbârnea proposed zuul/zuul-jobs master: Add tox-py310 job  https://review.opendev.org/c/zuul/zuul-jobs/+/82124718:39
clarkbfungi: let me go look at the test logs18:39
clarkbfungi: oh huh it looks like pytest captures stdout and doesn't show it unless you fail? In that case I think it is fine as is18:42
clarkbI was worried a bunch of tests would be dumping iptables rules to the console log and making that noisy, but that doesn't seem to be the case. And if that check fails you want to see the rules18:42
fungiyeah, the output format itself also isn't awesome, it's a one-line list representation of all the lines output by the save command, but it was sufficient for me to finally find the normalized for for the rule i was trying to match in my test addition18:45
fungier, normalized form for18:46
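
The behavior clarkb noticed is stock pytest: per-test stdout is captured and replayed only on failure. A minimal sketch of the pattern in a testinfra-style test; the rule text matched here is illustrative, not the real assertion:

    def test_smtp_blocked(host):
        # testinfra host fixture; assumes the test runs privileged
        rules = host.check_output("iptables-save").splitlines()
        # pytest captures this print and shows it only if the assert fails
        print(rules)
        assert any("--dport 25 -j REJECT" in rule for rule in rules)
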
opendevreviewJeremy Stanley proposed opendev/system-config master: Block outbound SMTP connections from test jobs  https://review.opendev.org/c/opendev/system-config/+/82090018:47
opendevreviewJeremy Stanley proposed opendev/system-config master: Copy Exim logs in system-config-run jobs  https://review.opendev.org/c/opendev/system-config/+/82089918:47
opendevreviewJeremy Stanley proposed opendev/system-config master: Collect mailman logs in deployment testing  https://review.opendev.org/c/opendev/system-config/+/82111218:47
opendevreviewJeremy Stanley proposed opendev/system-config master: Make sure /usr/bin/python is present for mailman  https://review.opendev.org/c/opendev/system-config/+/82109518:47
opendevreviewJeremy Stanley proposed opendev/system-config master: Restart mailman services when testing  https://review.opendev.org/c/opendev/system-config/+/82114418:47
opendevreviewJeremy Stanley proposed opendev/system-config master: Use newlist's automate option  https://review.opendev.org/c/opendev/system-config/+/82039718:47
*** sshnaidm is now known as sshnaidm|afk19:05
clarkbthat stack lgtm now. Thanks19:13
fungimuch obliged19:17
clarkbfungi: do you have time for https://review.opendev.org/c/opendev/gerritbot/+/818494 and parent?19:22
clarkbI should do an audit of the buster images that need bullseye updates and we can start doing them all19:22
clarkbI'll work on putting together this todo list as well as one for the user stuff this afternoon. Then we can work through it and know when we are done19:25
fungireviewed both of those, and thanks19:27
fungiinteresting run timeout on the mailman log collection change, i wonder if i've added too much to the job: https://zuul.opendev.org/t/openstack/build/d5aab74b18f348f0939f62c6bb116bb619:30
clarkbor maybe the node was really slow creating lists?19:40
fungimaybe19:42
fungithat change is earlier in the stack than the one which alters the newlist command invocation19:43
clarkbhttps://etherpad.opendev.org/p/opendev-container-maintenance starting to put the information together there19:56
clarkbNeed to take a break for lunch, but I'll try to get that etherpad as complete as possible. Then we can start pushing changes in a more organized manner to get through this. Previously it was pretty ad hoc (we've made decent progress though)20:22
slittle1_oops ... I think we missed something in the config of one of our starlingx repos21:08
slittle1_remote: error: branch refs/tags/vr/stx.6.0:21:08
slittle1_remote: You need 'Create Signed Tag' rights to push a signed tag.21:08
slittle1_remote: User: slittle121:08
slittle1_remote: Contact an administrator to fix the permissions21:08
slittle1_remote: Processing changes: refs: 1, done    21:08
slittle1_To ssh://review.opendev.org:29418/starlingx/metrics-server-armada-app.git21:08
slittle1_ ! [remote rejected] vr/stx.6.0 -> vr/stx.6.0 (prohibited by Gerrit: not permitted: create signed tag)21:08
slittle1_error: failed to push some refs to 'ssh://review.opendev.org:29418/starlingx/metrics-server-armada-app.git'21:08
clarkbslittle1_: you'll need to push a change to update your acls allowing you to push the signed tags21:09
clarkbif the acl is already there then you'll need to be added to the appropriate group21:09
clarkbslittle1_: https://opendev.org/openstack/project-config/src/branch/master/gerrit/acls/starlingx/metrics-server-armada-app.config#L1121:10
clarkbhttps://review.opendev.org/admin/groups/3086a3152fc635addcd00cd4823a1be0352fac1f,members21:11
slittle1_yah, should have included 'starlingx-release' 21:16
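
In Gerrit ACL terms the missing piece is a createSignedTag (and likely createTag) grant on the tag namespace; a sketch of the stanza, using the group slittle1_ names:

    [access "refs/tags/*"]
        createTag = group starlingx-release
        createSignedTag = group starlingx-release
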
opendevreviewScott Little proposed openstack/project-config master: give starlingx-release branch and tag powers in metrics-server-armada-app  https://review.opendev.org/c/openstack/project-config/+/82132121:25
slittle1_https://review.opendev.org/c/openstack/project-config/+/821321 21:25
clarkbslittle1_: I'm not sure if you can double up the groups on one line like that21:27
clarkbbut also should you replace the core group with the release group anyway?21:27
slittle1_yes, that would be mor consisten with our norm21:27
opendevreviewScott Little proposed openstack/project-config master: give starlingx-release branch and tag powers in metrics-server-armada-app  https://review.opendev.org/c/openstack/project-config/+/82132121:29
slittle1_gotta get me a new keyboard21:30
clarkbianw: ok left comments on https://review.opendev.org/c/opendev/system-config/+/821155 tl;dr I think it does what it describes and that it is safe and unintrusive but also think we should have a discussion as a group about further plans before we get too far ahead. Happy to dedicate the majority of our next meeting to that if it would be helpful (or use email or do an ad hoc meeting etc)22:28
ianwthank you!  yes i agree on discussion22:32
ianwas far as i would want to go is having zuul write things in plain text on the bastion.  i could write a spec to that, if we like, or just an email22:33
clarkbI think either way works. I might have a slight preference for a spec as it helps outline everything in the code where we do that sort of thing22:35
ianwi wouldn't mind applying 821155 (not now, when it's quiet and i'm watching) and reverting after a successful run, just to confirm it works as intended22:37
ianwi think it does, but i've thought a lot of things about this changeset that haven't quite been true :)22:37
clarkbheh ya. I think it's a good way to test the waters as the scope is quite small and we can clean up after it easily when done22:38
clarkbok I think that etherpad is fairly complete and I've sorted the lists by done, not applicable for one reason or another, and needs work22:40
clarkbI'm going to start pushing more changes up to bump to bullseye next22:41
fungiwe seem to have very few builds in progress for the openstack tenant at the moment, most builds seem to be queued22:43
fungithinking this may be all the branch creation events for starlingx repos, we saw something similar when the release team merged a change to add branches to all of the openstackansible repos earlier in the week22:45
clarkbexciting22:45
clarkbthere are a lot of events22:45
clarkbI guess we watch that and see if they move?22:45
fungicorvus suggested that the scheduler should be collapsing all the reconfigure events for those together, i think?22:45
fungiwe ended up getting out of the similar pileup from osa by doing a full scheduler restart and zk clear22:46
clarkbya might be worth double checking zuul isn't doing something wrong here too22:46
fungithe event queues should burn down on their own, but i don't know how rapidly. https://grafana.opendev.org/d/5Imot6EMk/zuul-status says some events are taking 15-30 minutes to process22:49
clarkbya I think the restart made things go faster because zuul would check all branches at the startup time and somehow that makes it go quicker?22:50
clarkbbut I hesitate to proceed with a restart because 1) zuul should be able to handle this and 2) I thought we thought zuul would handle this? Probably a good idea to see if corvus has opinions22:51
fungiwell, it would only read them all once, rather than once for every new branch creation in one of the repos, i guess?22:51
opendevreviewClark Boylan proposed opendev/system-config master: Update the accessbot image to bullseye  https://review.opendev.org/c/opendev/system-config/+/82132822:52
fungikevinz: if you're around yet (i'm sure it's still early) we seem to have 19 server instances stuck in a "deleting" state (one i looked at is saying the task_state is deleting but the vm_state is building, with a creation date of 2021-11-19, i expect the others are similar but haven't confirmed)22:52
fungias a result we're not booting any new instances there until they're cleaned up22:52
opendevreviewClark Boylan proposed opendev/system-config master: Update the hound image to bullseye  https://review.opendev.org/c/opendev/system-config/+/82132922:55
clarkbthe queue sizes appear to be getting smaller23:00
opendevreviewClark Boylan proposed opendev/system-config master: Update limboria ircbot to bullseye  https://review.opendev.org/c/opendev/system-config/+/82133023:08
opendevreviewClark Boylan proposed opendev/system-config master: Install Limnoria from upstream  https://review.opendev.org/c/opendev/system-config/+/82133123:08
opendevreviewClark Boylan proposed opendev/system-config master: Update matrix-eavesdrop image to bullseye  https://review.opendev.org/c/opendev/system-config/+/82133223:11
opendevreviewClark Boylan proposed opendev/system-config master: Update refstack image to bullseye  https://review.opendev.org/c/opendev/system-config/+/82133523:26
ianwclarkb: for 821331 did they make it to the master branch yet?23:27
*** rlandy|ruck is now known as rlandy|out23:30
clarkbianw: they appear to have. I cloned and git log showed them in history23:30
clarkbianw: but you should definitely double check23:30
clarkbI sort of figured we could get the changes up and then testing will tell us where bullseye is different and stuff will break23:31
clarkbbut better to get this out there as a list of things we can take action on than a secret todo list :)23:31
clarkbuwsgi-base is going to be the complicated one that needs thinking since it is a base image with other consumers. We want to do what we did with python-base and python-builder so I'll have to look at it a bit more closely once the others are moving along23:34
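
The python-base/python-builder pattern referred to here is a parameterized FROM, so consumers pick up a release bump on rebuild; a rough sketch, where the image name and tag scheme are assumptions rather than opendev's actual published tags:

    # consumer image: the base tag encodes both python and debian release
    ARG PYTHON_VERSION=3.9
    FROM docker.io/opendevorg/uwsgi-base:${PYTHON_VERSION}-bullseye
    COPY . /app
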
*** artom__ is now known as artom23:39
opendevreviewClark Boylan proposed opendev/system-config master: Properly build bullseye uwsgi-base docker images  https://review.opendev.org/c/opendev/system-config/+/82133923:47
clarkbok the uwsgi situation is a bit fun. I tried to cover it all in the commit message for ^. Lodgeit isn't actually done and will need an image rebuild once ^ lands23:48
opendevreviewClark Boylan proposed opendev/lodgeit master: Rebuild the lodgeit docker image  https://review.opendev.org/c/opendev/lodgeit/+/82134023:50
clarkbok I think that is a fairly complete list of changes needed to bump our images up a debian release. Note I don't think we should approve them all at once and instead take a little time to make sure debian userland updates don't cause unexpected changes23:50
clarkbbut the vast majority of them should be fine as they don't rely on the userland for much23:50
fungimanagement events list for the openstack tenant is down to 4 now23:51
clarkbI guess tomorrow I'll look for any failures and maybe we can land a subset. Then we can also start looking at the uid updates. Hopefully that etherpad lays out the todos around this pretty clearly. I added a few others as well for mariadb and zookeeper that i noticed23:53
ianwclarkb: thanks, will double check.  i'll review the other bits this afternoon23:56
