Monday, 2022-03-28

*** pojadhav- is now known as pojadhav04:55
*** marios is now known as marios|ruck04:57
*** soniya29 is now known as soniya29|ruck05:35
*** soniya29|ruck is now known as soniya29|rover05:37
*** ysandeep|out is now known as ysandeep06:23
fricklerinfra-root: zuul seems to be in a bad state, no check jobs running, trying to find out more details07:00
fricklerhttps://zuul.opendev.org/t/openstack/builds shows only periodic jobs for some hours. I do see events submitted from gerrit, but they seem to disappear somewhere07:09
ysandeepmarios|ruck, soniya29|rover ^^ fyi.. 07:12
soniya29|roverysandeep, ack07:12
fricklerthis looks like it could be related https://paste.opendev.org/show/b2jHWxtJG1ljGC3ePUsh/07:15
marios|ruckthanks ysandeep 07:16
frickler#status notice zuul isn't executing check jobs at the moment, investigation is ongoing, please be patient07:17
opendevstatusfrickler: sending notice07:17
-opendevstatus- NOTICE: zuul isn't executing check jobs at the moment, investigation is ongoing, please be patient07:17
opendevreviewArnaud Morin proposed openstack/project-config master: [OVH/GRA1] Disable nodepool temporarily  https://review.opendev.org/c/openstack/project-config/+/83541507:26
swestfrickler: re. the exception: this shouldn't block anything when it happens during tenant trigger processing. the exception is the result of a data race when reading the state while it is currently updated07:38
*** jpena|off is now known as jpena07:39
*** soniya29|rover is now known as soniya29|rover|lunch07:41
*** ysandeep is now known as ysandeep|lunch07:49
fricklerswest: right, it just matched the time when I thought the issue might have started, but looking further, this seems to happen pretty often, so likely unrelated08:14
fricklertoo bad we have so many errors occurring regularly08:15
fricklergerrit is even worse, the log is a continuous stream of error messages08:15
fricklerswest: do you think I should try restarting a scheduler? or better wait for further investigation?08:20
fricklerbtw I tested that gate jobs are not being triggered as well, so only periodic jobs working. not sure if manually enqueueing something might be interesting?08:22
swestfrickler: I think you could try restarting a scheduler. As periodic jobs are starting it seems to be more of a problem related to gerrit events and not a general issue.08:23
swestdo you see any exceptions related to Gerrit event processing?08:25
fricklerrestarting zuul01 scheduler now08:31
fricklerswest: no exceptions, just something like this and then nothing matching that change afterwards 2022-03-28 06:37:04,258 DEBUG zuul.zk.event_queues.ConnectionEventQueue: Submitting connection event to queue /zuul/events/connection/gerrit/events/queue:08:33
fricklerrestart has helped. check 50, gate 2208:34
fricklerseems zuul is now processing all the previous events, like from https://review.opendev.org/83541308:35
swestfrickler: sounds like the gerrit event pre processing was stuck somehow08:35
fricklerah, I probably should've collected stacktraces08:36
fricklertrying that for zuul02, maybe it still has the broken state08:36
swestonly one of the schedulers is elected for processing the Gerrit events. but the stacktrace might still be helpful08:38
*** ysandeep|lunch is now known as ysandeep08:45
IanIDENTIFY08:45
fricklerlikely too late, seems zuul02 started working as soon as zuul01 did and was elected then08:45
fricklerIan: nope08:45
Iansorry.08:46
fricklerIan: np, just kidding ;)08:47
fricklerthis has a gap between 01:12:49,653 and 08:25:42,586: grep Forwarding /var/log/zuul/debug.log|grep -v " timer "|grep -v " pull_request "08:47
Ianalways knew IRC existed. never messed with it08:47
fricklerswest: let's hope that if it is a regression in recent zuul code, it will be triggered again soonish08:48
opendevreviewMerged openstack/project-config master: [OVH/GRA1] Disable nodepool temporarily  https://review.opendev.org/c/openstack/project-config/+/83541509:04
*** soniya29|rover|lunch is now known as soniya29|rover09:18
opendevreviewArnaud Morin proposed openstack/project-config master: Revert "[OVH/GRA1] Disable nodepool temporarily"  https://review.opendev.org/c/openstack/project-config/+/83542209:23
opendevreviewChris MacNaughton proposed openstack/project-config master: Add Ganesha based Ceph NFS Charm  https://review.opendev.org/c/openstack/project-config/+/83543010:17
dpawlikIan, fungi, clarkb: after merging that change https://review.opendev.org/c/openstack/ci-log-processing/+/833011 logsender will take more time to parse the log files, so can we resize once again logscraper01.openstack.org to have 4vcpus? Right now it is 2.10:32
dpawliklogsender seems to be better than logstash, so I will ask Reed to remove that stack related to logstash.10:32
dpawliknot you reed :P10:32
opendevreviewAndy Ladjadj proposed zuul/zuul-jobs master: [upload-logs-base] add public url attribute  https://review.opendev.org/c/zuul/zuul-jobs/+/83404311:34
opendevreviewAndy Ladjadj proposed zuul/zuul-jobs master: [upload-logs-base] add public url attribute  https://review.opendev.org/c/zuul/zuul-jobs/+/83404311:35
opendevreviewAndy Ladjadj proposed zuul/zuul-jobs master: [upload-logs-base] add public url attribute  https://review.opendev.org/c/zuul/zuul-jobs/+/83404311:36
*** dviroel|pto is now known as dviroel11:52
fungidpawlik: i can try to find time in a little while, seems like there's a bunch to catch up on first this morning11:58
*** pojadhav is now known as pojadhav|afk11:59
dpawlikfungi: ok12:02
ironfootHi there, I'm somehow struggling to configure my environment to send a patch to gerrit.12:17
ironfootI've done this in the past, and I can't figure out what's wrong. The main issue is that my configured ssh keys don't seem to work12:18
fungiironfoot: you're following the setup guide in https://docs.opendev.org/opendev/infra-manual or some other instructions?12:18
fricklerironfoot: are you using fedora35 or some other distro with recent openssh?12:18
ironfootdidn't start reading that from scratch, as I managed to send some patches in the past12:18
ironfootfrickler: oh no, is that an issue?12:19
ironfoot(yes, fedora 35)12:19
fungiahh, yep12:19
fricklerironfoot: yes, gerrit has some issue with rsa keys and new openssh. workaround is to use other key type like elliptic curve12:19
frickleror some special ssh options that I can remember12:20
fricklercan't12:20
ironfootright, will create a new key :)12:20
fungifor some reason, new versions of openssh have elected to fall back to the ssh_rsa host protocol when an rsa key is used, but also blocked use of ssh_rsa at the same time. there's finally support for negotiating rsa key exchanges with a stronger hash algorithm in gerrit's development branch, so hopefully this will cease to be a problem soon12:22
fungi(once we upgrade to gerrit 3.6.x i think?)12:22
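A hedged sketch of the two workarounds discussed above (the exact option name varies across OpenSSH releases; older clients spell `PubkeyAcceptedAlgorithms` as `PubkeyAcceptedKeyTypes`):

```shell
# Workaround 1: generate an elliptic-curve key and add it to Gerrit
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_gerrit

# Workaround 2: re-enable the legacy ssh-rsa signature scheme for the
# review host only, via ~/.ssh/config (until the server speaks rsa-sha2):
#   Host review.opendev.org
#       HostKeyAlgorithms +ssh-rsa
#       PubkeyAcceptedAlgorithms +ssh-rsa
```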
ironfootcan confirm, that fixed the issue12:23
ironfootor worked around it12:23
ironfootthanks frickler and fungi :D12:24
fungiyw12:24
*** ysandeep is now known as ysandeep|afk12:25
*** dviroel is now known as dviroel|brb12:43
tobias-urdinany known issues cloning from gerrit? i can access gerrit but cloning just hangs with no progress for the last 5-10 minutes now at least https://paste.opendev.org/show/b4wkca8DepfpnRnMHMJS/12:53
fricklertobias-urdin: I don't know of any particular current issue, but the general reply likely is that you shouldn't do this for performance reasons. clone from opendev.org, which is our scalable gitea farm, instead. you can then add gerrit as a second remote if needed after that13:00
fungialso the gitea servers have copies of all the gerrit change refs, git notes, and so on. there's nothing you can pull from gerrit via git which isn't also in the gitea servers13:04
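The recommended flow, sketched with openstack/project-config as an example repo and a placeholder Gerrit username:

```shell
# Clone from the scalable gitea farm rather than from Gerrit directly
git clone https://opendev.org/openstack/project-config
cd project-config

# Optionally add Gerrit as a second remote for pushing reviews
# ("myuser" is a placeholder for your Gerrit username)
git remote add gerrit ssh://myuser@review.opendev.org:29418/openstack/project-config
git remote -v
```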
*** ysandeep|afk is now known as ysandeep13:14
tobias-urdinack, it went away for a coffee and 15 minutes later it was cloned, sorry for the noise :)13:21
tobias-urdinI went away :p13:21
*** dviroel|brb is now known as dviroel13:27
*** pojadhav|afk is now known as pojadhav14:02
fungidpawlik: okay, i'm going to try to resize it from v3-starter-2 which has 4gb ram and 2 vcpus to v2-highcpu-4 which has the same 4gb ram but 4 vcpus14:17
fungiin progress now14:18
fungionce it's up and you have a chance to double-check it looks okay, let me know and i'll confirm the resize through the api14:18
fungilooks like it's booted14:21
dpawlikfungi: yep, it works. Thank you14:27
fungicool, i've marked it confirmed through the api now as well14:28
*** artom__ is now known as artom15:03
*** marios|ruck is now known as marios|ruck|call15:23
clarkbfungi: yup gerrit 3.6 should fix it15:24
clarkbfrickler: swest: to clarify a single scheduler handles a single pipeline at a time. But each pass through can be handled by the other scheduler. They elect each other but not necessarily for the long term if that makes sense15:27
swestclarkb: I was talking about the Gerrit event preprocessing 15:34
clarkbah yup15:35
clarkbsorry I saw check queue wasn't processing so thought that was the focus. Didn't realize that event processing beyond that was also impacted15:35
jrosserdoes anyone know how the version of setuptools is decided here (its an openstack-tox-docs job) https://zuul.opendev.org/t/openstack/build/5027f55aed1444ec96ac10ce7df6ecb8/log/job-output.txt#64515:49
clarkbjrosser: they are bundled by virtualenv15:50
jrosserah ok - so to get a later setuptools would require a later release of virtualenv15:51
clarkbyes, or you need to manually update setuptools afterwards15:51
fungiright. the virtualenv 20.14.0 release on friday updated setuptools from 60.10.0 to 61.1.016:01
fungiaccording to its changelog16:01
*** marios|ruck|call is now known as marios|ruck16:03
jrosserfungi: i've just done some experiments here and it looks like 61.1.0 is good enough to fox what failed in that job16:04
jrosserso hopefully this turns into a no-op16:04
jrosserfix not fox :)16:05
fungiyou'll outfox the bug16:06
jrosseroh wait - that job already installs virtualenv-20.14.0-py2.py3-none-any.whl16:06
fungiwhat's the error you're encountering?16:06
fungiis it the multiple top-level packages error?16:06
jrosseryes it is16:06
jrosserhttps://github.com/pypa/setuptools/issues/319716:07
fungiokay, then that's the same problem everyone else ran into with setuptools 6116:07
fungisome projects (e.g. tripleo) merged workarounds on friday when it first hit16:07
jrosserindeed - i was looking to avoid having to do that in ~50 repos if there's already a fix in setuptools16:08
fungionly repos with multiple top-level packages in them run into this, as the error indicates, so thankfully it's a fairly small number of projects impacted16:08
clarkbwhat version of setuptools has the fix?16:08
fungibut i guess that's a design pattern adopted across a large swath of openstack-ansible?16:08
jrosseralternatively, we have some other mistake in our config that triggers this and we should instead fix that16:08
jrosseryes, it's pretty much copy/paste everywhere16:09
clarkbfungi: the issue is when you have playbooks/ and roles/ and so on they all have python stuff in them and setuptools has a sad. system-config has the same problem16:09
fungiunless i misread the setuptools discussion, this is working as designed16:09
fungianything using pbr is not impacted, right?16:09
clarkbfungi: no16:10
jrosserclarkb: i just tried a venv here with 61.0.0 and it broke like my zuul job, 61.1.0 seemed to work OK16:10
clarkbsystem-config uses pbr and is affected16:10
fungii thought pbr assembles the manifest.in rather than relying on setuptools' discovery16:10
clarkbfungi: it does, but setuptools still runs the discovery and fails stuff16:10
fungiahh, so still have to tell setuptools not to try to discover things16:10
clarkbfungi: this is why the tripleo workaround works16:10
clarkbthey told setuptools there are no package modules don't discover things. Then pbr did its discovery and populated the package content16:11
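A minimal sketch of that workaround pattern (not the actual tripleo patch; a pbr-based setup.py is assumed):

```python
# setup.py -- declare an empty module list so setuptools >= 61 skips its
# flat-layout auto-discovery; pbr then populates the real package metadata.
import setuptools

setuptools.setup(
    setup_requires=['pbr'],
    pbr=True,
    py_modules=[],
)
```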
fungiyeah, looks like system-config does indeed rely on pbr16:11
clarkbif 61.1.0 fixes things then we are good I think16:11
jrosseronce there is a release of virtualenv?16:13
*** dviroel is now known as dviroel|lunch16:13
fungior forcing a setuptools upgrade in the env16:13
fungi"#3202: Changed behaviour of auto-discovery to not explicitly expand package_dir for flat-layouts and to not use relative paths starting with ./."16:13
fungii guess that's the one?16:13
clarkbjrosser: oh yup its still 61.0.0 in virtualenv got it. Need an update there too16:13
jrosseris there a way to upgrade setuptools on a job which i'm a sort of drive-by consumer of, like openstack-tox-docs?16:14
fungiahh, no, it was this:16:14
fungi"#3211: Disabled auto-discovery when distribution class has a configuration attribute (e.g. when the setup.py script contains setup(..., configuration=...)). This is done to ensure extension-only packages created with numpy.distutils.misc_util.Configuration are not broken by the safe guard behaviour to avoid accidental multiple top-level packages in a flat-layout."16:14
clarkbIf tox has a flag to tell virtualenv to update setuptools then we could set that flag in our base tox job for a while maybe16:14
fungiit does, yes16:14
*** marios|ruck is now known as marios|out16:15
fungiwell, maybe not a cli option16:15
fungiit has an option you can set in tox.ini16:15
fungimaybe there's a corresponding envvar, checking16:15
clarkbfungi: does pbr set configuration=?16:15
clarkbI don't understand why 3211 would fix it16:15
fungiclarkb: yeah, i don't understand why 3211 would fix it either, but issue 3197 claims to be "fixed" by it16:16
clarkbah16:16
fungiso maybe it was fixed by a combination of 3211 and 320216:17
fungihttps://tox.wiki/en/latest/config.html#conf-requires doesn't mention any envvar you can do that with16:20
*** ysandeep is now known as ysandeep|out16:20
jrosseroh i see why our other jobs don't fail with this, we don't usually use the bundled setuptools16:22
jrossercurrently the only thing i can find to do in an OSA job is put setuptools==60.9.3 in the tox.ini `deps =`, but thats super-fragile as it'll blow up as soon as the setuptools in u-c moves 16:44
*** jpena is now known as jpena|off16:52
*** dviroel|lunch is now known as dviroel16:55
fungijrosser: setting requires setuptools>=61.1 in tox.ini won't solve it?16:59
fungiis the problem that older 61.0.0 is getting called into and choking before tox gets a chance to update the virtualenv?17:00
jrosseroh maybe you are right there17:00
fungi[tox] requires = setuptools>=61.117:02
fungiis what i'm talking about, to be clear17:02
fungi(the config option i linked above has an example too)17:02
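Concretely, the suggestion amounts to a two-line addition to tox.ini (a sketch of the `[tox] requires` option):

```ini
[tox]
requires = setuptools>=61.1
```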
jrosserfungi: in a simple environment i think that would work, however https://paste.opendev.org/show/bF5ZroGXqFcoh1lWJnZn/17:07
jrosserit's a bit catch-22, as `pip install -chttps://releases.openstack.org/constraints/upper/master 'setuptools>=61.1'` is never going to work17:08
clarkbsetuptools shouldn't be in constraints17:09
jrosserunfortunately it is17:09
clarkb(I've said this over and over again and I guess I don't get listened to)17:09
jrosserin other OSA repos we download a copy of u-c and sed out all the nonsense like this17:09
fungiyeah, i think the reason people stuffed it into upper-constraints.txt is because that's what devstack uses to install setuptools17:10
jrosserhaving said that, the proposal bot removes setuptools here https://review.opendev.org/c/openstack/requirements/+/83532917:11
fungioh good17:21
jrosseri don't understand why it does that though :) it might come back!17:21
fricklerit does that because pip freeze without --all skips setuptools and others. afaict prometheanfire keeps readding it manually17:25
fungiahh, that's too bad17:25
fungisince the result is that we're now pinning to one which is breaking some projects and preventing them from updating to a newer setuptools17:25
clarkbyes this is why you aren't supposed to pin setuptools that way17:26
clarkbits a chicken and egg problem that needs to be solved outside anyway so you should just avoid introducing errors17:26
clarkbbasically if you have functional problems constraints won't help you. All constraints can do is over constrain you, making things worse when it comes to setuptools17:27
jrosserit would be very nice if it was removed from u-c and didnt come back17:30
jrosserif it's true that it's there to determine the version for devstack, thats not a great reason imho17:30
fungiyes, devstack could still do that in its own repository17:31
*** lajoskatona_ is now known as lajoskatona18:27
prometheanfirefrickler: yep :|19:27
clarkbI've updated https://github.com/go-gitea/gitea/issues/19118 with more verbose client logs as requested though I'm not sure they are helpful20:18
fungii saw, thanks!20:20
clarkbI spent far too much time using a variety of GIT_TRACE_* flags trying to find those with enough verbosity to likely be helpful and remove those that add just noise20:21
clarkbUnfortunately the trace on hirsute looks completely different so hard to compare them directly :/20:21
corvusi'm looking into the potential zuul gerrit event processing bug; i suspect we are stuck right now, but still collecting events.  i'd like to leave it as is while i research.  but if necessary, we can restart a scheduler to unstick it temporarily.20:23
fungithanks for the heads up20:23
*** dviroel is now known as dviroel|out20:30
opendevreviewmelanie witt proposed openstack/project-config master: grafana ceph: add nova stable/(xena|yoga) branches  https://review.opendev.org/c/openstack/project-config/+/83551420:31
clarkbI'm going to start putting the meeting agenda together. I think I've been a bit distracted and not sure if things are missing. Anything I should add?21:11
corvusclarkb: fungi i would like to monkey-patch https://review.opendev.org/835518 into production; objections?21:12
corvus(i have run the zuul test suite on that locally with no errors)21:12
clarkbcorvus: are we hitting this now due to the updates for circular dependency stuff?21:14
clarkbbut ya that patch looks small enough that it should be ok to patch in21:14
corvusclarkb: yes.  it is unclear to me at this point whether we are looping, or if we are just still in an exponential tree.  i think the submitted together change can cause an exponential explosion in queries in the best case, and i think looping is theoretically possible in the worst case.  i've only confirmed exponential behavior in the tests since they produce well-behaved responses.21:19
corvusthe assert in that test for 8 queries is something like 2748 without the fix21:20
clarkboh wow21:21
fungisounds good, thanks for looking into it21:21
corvusi think the submitted-together contribution is mostly just adding an extra path; we were likely doing far too many queries already21:21
corvusbut exponents being what they are, we notice now :)21:21
clarkbah there is the recursion. _getChange calls _updateChange which calls _updateChangeDependencies again21:25
clarkband so the issue is we aren't accumulating the complete set of changes we can short circuit on as quickly as possible21:25
clarkbThe branch that generates log.debug("Change %s is in history", change) isn't hit when it could be21:25
clarkbI guess the alternative would be to return the history and update it, but this seems like it should work just fine21:26
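A minimal sketch (not Zuul's actual code) of the pattern being described: recursive dependency resolution where a shared history set short-circuits revisits, so each change is visited once per parent rather than once per path through the graph:

```python
def resolve(change, deps, history=None, queries=None):
    """Return the set of change ids reachable from `change`.

    `deps` maps a change id to the ids it depends on; `history` is the
    shared short-circuit set; `queries` records simulated remote lookups.
    """
    if history is None:
        history = set()
    if queries is not None:
        queries.append(change)  # one simulated Gerrit query per visit
    if change in history:
        return history          # already processed: stop recursing
    history.add(change)
    for dep in deps.get(change, ()):
        resolve(dep, deps, history, queries)
    return history

# Diamond graph: A depends on B and C, both of which depend on D.
deps = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D']}
queries = []
resolve('A', deps, queries=queries)
# D is queried once per parent and short-circuited the second time;
# without a shared history, deeper diamonds multiply the query count
# exponentially, which is the kind of blow-up described above.
```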
clarkbfungi: is the openstack release Wednesday or Thursday?21:53
rlandyhello ... there is only one check job in https://zuul.opendev.org/t/openstack/status21:56
rlandyand https://review.opendev.org/c/openstack/tripleo-ci/+/835101 has not shown up21:56
clarkbrlandy: yes see scrollback. corvus thinks the issue has been identified and we're working on patching a fix in21:57
rlandyk - reading back ...21:57
rlandythank you21:57
corvusclarkb: remote:   https://review.opendev.org/c/zuul/zuul/+/835522 Add more submitted-together tests [NEW]        21:58
corvusi wanted to confirm a few more things in unit tests before executing the patch21:58
corvusi'm happy with that, so i will proceed now21:58
clarkbsounds good. I'll work on reviewing that followup testing change now21:58
clarkbcorvus: I think that second change will fail pep8 due to line lengths22:03
clarkblgtm otherwise22:05
corvusclarkb: thx; i'm going to add a reno to  the first one (so we can do a 5.2.1) ill fix it then22:06
corvus#status log monkeypatched https://review.opendev.org/835518 into running zuul schedulers22:06
opendevstatuscorvus: finished logging22:06
fungiclarkb: release activities are starting around 11:30 utc on wednesday according to my notes. at least that's when i set my reminder to be on hand. maybe it's starting at noon utc22:09
fungicorvus: i see backlogged changes starting to be enqueued, thanks22:09
clarkbI've made note of it on the agenda for our meeting tomorrow. I'll get that sent out in a bit once I'm sure there isn't anything else I want to add to it first22:10
clarkbalso I don't know I'll be up that early, but can do my best to check wednesday morning once I am up22:10
fungithanks22:10
fungiand yeah, the assumption is that release work will be concluded by 15:00z22:11
fungiso around 8am your time22:11
fungii have an appointment i need to get to about an hour after22:11
clarkbI should be around so no problem22:12
fungiclarkb: also you got a response to your gitea debug info22:12
fungithat was quick22:12
clarkbyup I think the shanghai day must be starting22:13
*** rlandy is now known as rlandy|out22:55

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!