Thursday, 2022-05-19

00:45 *** rlandy|bbl is now known as rlandy|out
04:51 *** ykarel_ is now known as ykarel
04:52 *** ysandeep|out is now known as ysandeep|rover
05:10 *** pojadhav|afk is now known as pojadhav
05:51 *** ysandeep|rover is now known as ysandeep|rover|brb
05:56 *** ysandeep|rover|brb is now known as ysandeep|rover
07:36 *** jpena|off is now known as jpena
09:31 *** ysandeep|rover is now known as ysandeep|rover|lunch
10:09 *** ysandeep|rover|lunch is now known as ysandeep|rover
10:27 *** rlandy|out is now known as rlandy
11:26 *** dviroel|out is now known as dviroel
11:26 *** rlandy is now known as rlandy|mtg
11:33 *** ysandeep|rover is now known as ysandeep|rover|afk
12:02 *** rlandy|mtg is now known as rlandy
12:26 *** ysandeep|rover|afk is now known as ysandeep|rover
12:27 <frickler> infra-root: cf. https://review.opendev.org/c/zuul/zuul/+/837852/5/doc/source/client.rst how would you currently for example enqueue a patch into gate? the zuul-client on zuul02 doesn't know the --trigger option, it does work without it, though. do we need to do something before zuul moves on?
12:30 <frickler> also for reference I enqueued 842532,1 into gate to speed up unblocking devstack, is that worth a status log?
12:36 <fungi> frickler: i do often #status log manual actions like that, just to serve as a clear record
12:41 <frickler> #status log enqueued 842532,1 into gate to speed up unblocking devstack
12:44 <fungi> frickler: i have this in my command history on zuul02:
12:44 <fungi> sudo zuul-client enqueue --tenant=openstack --pipeline=check --project=openstack/placement --change=825849,1
12:47 <frickler> fungi: yes, that is what I used, but it seems from the above patch that that is deprecated and zuul-admin should be used now? or maybe I read that wrong
12:49 <fungi> oh, i thought it was `zuul` being deprecated in favor of `zuul-admin`
12:50 <frickler> I just noticed a regression in gerrit: if I hit rebase and start typing a patch ID or text, it shows patches from all projects, not just from the project the patch is against.
12:51 <frickler> fungi: there's a note added that says: For operations related to normal workflow like enqueue, dequeue, autohold and promote, the `zuul-client` CLI should be used instead.
12:51 <frickler> but later on there are still examples with zuul-admin for enqueue etc.
12:52 <frickler> but maybe we should discuss on that patch rather than here
12:52 <fungi> that seems like it might be a typo
12:52 <fungi> yeah
12:55 <fungi> frickler: oh!
12:55 <fungi> i get it now
12:56 <fungi> the enqueue, dequeue, autohold and promote subcommands are being retained for now for backward compatibility, so the zuul (not zuul-client) documentation about them is being updated to indicate that you need to run zuul-admin instead of just zuul
12:57 <fungi> it's not saying "use `zuul-admin enqueue` instead of `zuul-client enqueue`" but rather "...instead of `zuul enqueue`"
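[A minimal sketch of the two invocations being compared here: the zuul-client line mirrors fungi's history entry above, while the zuul-admin form and the devstack change/project used are illustrative assumptions rather than commands taken from this log.

    # newer client CLI, which talks to Zuul's REST API
    sudo zuul-client enqueue --tenant=openstack --pipeline=gate \
        --project=openstack/devstack --change=842532,1

    # older admin CLI retained for backward compatibility (formerly invoked as plain `zuul enqueue`);
    # the option names are assumed to match the client CLI
    sudo zuul-admin enqueue --tenant=openstack --pipeline=gate \
        --project=openstack/devstack --change=842532,1
]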
13:02 <frickler> fungi: ah, o.k., then I really read this the wrong way around and we should be fine with what we are doing
13:03 <fungi> i added a recommendation in a review comment to hopefully reduce that point of confusion
13:05 <fungi> if you look at the change, it will print a deprecation note "Warning: this command is deprecated with zuul-admin, please use `zuul-client` instead"
13:06 <frickler> yes, maybe then the docs should also be updated to not show deprecated examples, I'll add a comment with that on the patch
13:06 <fungi> that's basically what my comment suggested
13:07 <fungi> er, well i suggested each subcommand's documentation entry mention it's deprecated, but you're right we probably should also drop the examples from those entries
13:08 <fungi> i could go either way on keeping examples for deprecated options until they're actually removed
13:08 <fungi> as long as we make it clear they're deprecated
13:10 <frickler> I suggested grouping them into a "Deprecated" section to make it more obvious
13:10 <fungi> oh, yep that could also work
13:17 <fungi> looks like the ovh-bhs1 mirror has broken package updates, which has in turn broken our base deploy job, which is the reason ssl certs aren't getting updated
13:17 <fungi> i'll get it squared up
13:19 <fungi> huh, it wants openafs-build-deps
13:19 <fungi> which is going to drag in a slew of other packages
13:21 <fungi> specifically for openafs-build-deps 1.8.8.1-2~ppa0~bionic
13:22 <fungi> the gra1 mirror is also running bionic and has that version installed with no problem
13:23 <fungi> oh, no it doesn't have openafs-build-deps installed, just the dkms package for the openafs lkm
13:24 <frickler> fungi: openafs-build-deps shouldn't be installed on mirrors, should they?
13:24 <fungi> nope!
13:25 <fungi> not sure when/why it was installed there
13:25 <fungi> #status log Purged the unneeded openafs-build-deps package from mirror01.bhs1.ovh.opendev.org in order to unblock our base deploy job
13:25 <fungi> cleaning it up seems to have solved the problem
13:27 <frickler> hmm, doesn't show up in any of the apt logs, so must have been in place for a very long time. or installed manually with dpkg?
13:27 <fungi> probably not via dpkg -i since that's a virtual package
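[Roughly the kind of check-and-purge sequence described here, assuming the build-deps metapackage shows up as installed on the mirror; the exact cleanup commands fungi ran are not captured in the log.

    # see whether the metapackage is installed and where it came from
    dpkg -l openafs-build-deps
    apt-cache policy openafs-build-deps

    # drop it, along with anything only it pulled in
    sudo apt-get purge --autoremove openafs-build-deps

    # confirm the normal upgrade path is unblocked again (simulate only)
    sudo apt-get update && sudo apt-get -s dist-upgrade
]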
13:45 <TheJulia> I just observed a bunch of jobs hit this within the last few minutes: E: Failed to fetch https://mirror.bhs1.ovh.opendev.org/ubuntu/dists/focal/universe/binary-amd64/Packages  403  Forbidden [IP: 158.69.73.218 443]
13:46 <TheJulia> I don't know if I should expect things to be working or not at the moment
13:56 <abhishekk> clarkb, hi, around?
13:57 <abhishekk> can anyone help me to get this issue resolved, https://review.opendev.org/c/openstack/glance/+/842400
13:57 <abhishekk> We are stuck and not able to merge anything due to this error
13:57 <abhishekk> 2022-05-19 08:31:55.100787 | ubuntu-bionic | The conflict is caused by:
13:57 <abhishekk> 2022-05-19 08:31:55.100796 | ubuntu-bionic |     The user requested glance-store>=2.3.0
13:57 <abhishekk> 2022-05-19 08:31:55.100804 | ubuntu-bionic |     The user requested (constraint) glance-store===4.0.0
13:58 <rosmaita> if we hack upper-constraints to glance-store===3.0.0, we can build the tox py36 environment locally
13:59 <rosmaita> and we tried as a short term thing to put glance-store>=2.3.0,<4.0.0 in requirements.txt, but that just gives us
13:59 <rosmaita> ERROR: Cannot install glance-store<4.0.0 and >=2.3.0 because these package versions have conflicting dependencies.
13:59 <rosmaita> The conflict is caused by:
13:59 <rosmaita>     The user requested glance-store<4.0.0 and >=2.3.0
13:59 <rosmaita>     The user requested (constraint) glance-store===4.0.0
14:00 <Clark[m]> fungi: is TheJulia's error related to your mirror surgery?
14:01 <Clark[m]> abhishekk: rosmaita: I'm not sure how we would help with that. Seems like constraints and your requirements are in conflict so you need to change one or the other
14:01 <rosmaita> Clark[m]: i misread that as "minor surgery" and was wondering what happened to fungi
14:02 <Clark[m]> It is the server you need to worry about :)
14:02 <Clark[m]> Anyway, I'm not really here yet and need to do a school run but can help in an hour or so
14:03 <abhishekk> Clark[m], if we change the requirement in glance to >=4.0.0 then it also fails with the same error
14:03 <abhishekk> The conflict is caused by:
14:03 <abhishekk>     The user requested glance-store>=4.0.0
14:03 <abhishekk>     The user requested (constraint) glance-store===4.0.0
14:05 <Clark[m]> abhishekk: because 4.0.0 requires python3.8 or newer?
14:06 <abhishekk> yes, it does not support py36 and py37
14:06 <Clark[m]> That is still a requirements and constraints conflict. You'll likely need to use different constraints under python 3.6
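[One way to express "different constraints under python 3.6", as a rough sketch: pin different glance-store versions behind standard pip environment markers. The versions and file name here are illustrative, not taken from the openstack/requirements repo.

    # upper-constraints.txt style entries, split by interpreter version
    glance-store===4.0.0;python_version>='3.8'
    glance-store===3.0.0;python_version<'3.8'

    # installation then combines requirements with whichever constraint applies
    pip install -c upper-constraints.txt -r requirements.txt
]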
14:08 <abhishekk> ack
14:08 <abhishekk> any example for the same?
14:09 <rosmaita> i think the setup.cfg in the 4.0.0 tag still says 3.6
14:10 <rosmaita> nope, was looking at the wrong branch
14:10 <abhishekk> https://pypi.org/project/glance-store/ different here
14:16 <fungi> Clark[m]: TheJulia: oh, yes i didn't think about it but the openafs lkm may have been unloaded on the ovh-bhs1 mirror while it was being upgraded
14:17 <Clark[m]> fungi: it looks to still be broken if you load the root http dir
14:17 <fungi> looks like it may still be that way. rebooting the mirror server now
14:18 <fungi> lsmod didn't show the module resident at all
14:18 <TheJulia> sweet
14:21 <fungi> it also must have decided to run a filesystem check, or is otherwise timing out trying to load openafs now
14:23 <fungi> A start job is running for OpenAFS client (5min 16s / 8min 3s)
14:23 <fungi> i want to say we've seen this before with the ovh mirrors
14:24 <fungi> i can't remember if forcing the dkms rebuild was necessary, or if it was just a boot-time race and rebooting usually solved it
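[For reference, the sort of manual recovery being weighed here, assuming the Debian/Ubuntu packaging where the module is built via dkms and the client unit is openafs-client; a sketch, not the exact commands run on the mirror.

    # is the kernel module loaded, and is it built for the running kernel?
    lsmod | grep openafs
    dkms status | grep openafs

    # force a rebuild for the current kernel if dkms shows it missing, then restart the client
    sudo dkms autoinstall
    sudo systemctl restart openafs-client
]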
14:26 <fungi> it did eventually boot and seems afs is working now
14:27 <fungi> #status log Distribution package mirrors on mirror01.bhs1.ovh.opendev.org were unavailable 13:25-14:25 UTC due to a package upgrade unloading and not reloading the openafs kernel module; related job errors can be safely rechecked
14:27 <fungi> TheJulia: https://mirror.bhs1.ovh.opendev.org/ubuntu/dists/focal/universe/binary-amd64/Packages is returning content again. thanks for the heads up!
15:03 *** pdeore is now known as pdeore|afk
15:04 <TheJulia> fungi: thanks!
15:15 <clarkb> fungi: I think it is just slow with iops? it is the arm64 mirrors that have similar trouble
15:16 <fungi> ahh, maybe related to the afs cache volume then
15:16 <clarkb> abhishekk: rosmaita: ok I'm at a proper keyboard now. The first thing is to determine why you are testing with python3.6 in the first place. Is this on master? if so hasn't openstack dropped master python3.6 support?
15:16 <clarkb> fungi: ya I think it prunes it or verifies it or something and that takes time
15:16 <fungi> oh, right. we've blown away the cache before rebooting in the past when needed to avoid that
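[A sketch of that pre-reboot cache cleanup, assuming the default Debian/Ubuntu cache location; check /etc/openafs/cacheinfo for the real cache directory before deleting anything.

    # stop the client so the cache is quiescent, clear it, then reboot
    sudo systemctl stop openafs-client
    cat /etc/openafs/cacheinfo          # mountpoint:cachedir:size-in-kb
    sudo rm -rf /var/cache/openafs/*
    sudo reboot
]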
15:17 <rosmaita> clarkb: thanks, i think we have it sorted
15:17 <clarkb> abhishekk: rosmaita: if you are still testing with 3.6 because master glance-store is expected to work with stable releases of openstack then you need to do something like carry constraint override files or convince requirements to carry special 3.6 rules
15:17 <rosmaita> apparently the zed template change got stuck in the gate
15:17 <clarkb> PBR has examples of the special constraint override files iirc because it installs stuff back to python2.7
15:17 <abhishekk> yep :/
15:19 <rosmaita> clarkb: there is some kind of weird dependency on a job not defined in the glance repo that was breaking things, i don't understand it really, but abhishekk has a handle on it
15:19 <rosmaita> "things" being preventing the zed template merge
15:20 <abhishekk> ++
15:26 *** dviroel is now known as dviroel|lunch
15:29 *** ysandeep|rover is now known as ysandeep|out
15:50 <clarkb> johnsom: I can see an argument to put it in the zuul job. Devstack probably doesn't need to know when its system is appropriately booted; all it cares about is that the user has triggered it, and the user should be sure that the system is ready
15:50 <clarkb> johnsom: that might still be a change to the devstack repo, but in the zuul playbooks or roles that trigger stack.sh
15:51 <fungi> yeah, conversely, the fips setup role shouldn't need to care that dns resolution works
15:52 <fungi> and it's unclear what or when something in zuul-jobs can generically assure the system is "fully booted"
15:52 <johnsom> I am flexible. I just liked the fact that devstack would stop and give a direct error when DNS was broken instead of the current behavior where it runs a while and complains of missing packages.
15:52 <clarkb> right, I'm beginning to think the best place for the check is in the devstack ansible stuff that runs stack.sh. Before running stack.sh you can do whatever system readiness checks are appropriate for devstack
15:52 <johnsom> Yeah, I looked briefly at systemd to get status, all of my systems reported "degraded" even though they are booted ok.
15:53 <clarkb> well I think you can do the same wait-until-nslookup-returns-a-result check
15:53 <clarkb> but move it into the ansible running stack.sh, and devstack proper can assume the user is triggering it on a ready system
15:54 <fungi> though if systemd deferred allowing logins until unbound was actually started up, this would likely be a non-issue
15:55 <fungi> to ianw's point
15:55 <clarkb> ya but then you have to modify every third party ci setup's images
15:56 <clarkb> and systemd is designed to have these problems somewhat intentionally aiui
15:56 <clarkb> it wants to give you access to the system as early as possible so that you can decide if you are ready to do additional work (to reduce total system startup cost)
15:56 <fungi> something else i liked about sysvinit
15:57 <clarkb> ya I think for a laptop it makes a lot of sense. For servers, consistency and stability are desirable and worth a few seconds of startup cost
15:59 <fungi> agreed. it seems like systemd optimized for portable devices, at the expense of adding instability and a vague lack of startup assurances for servers
16:00 <frickler> fungi: ade_lee: taking the reboot issue here, because there is a deeper question hidden I think: should a consumer be able to expect a CI node to be working properly after a reboot? if the answer is "yes, we as opendev want to support this", then we likely need to set up things like unbound in place and have tests for the images we build that ensure that this works
16:00 <frickler> if not, maybe the fips job setup needs a different approach
16:00 <fungi> granted, the behavior with afs on the mirror servers is a clear indication that it does block on some things. openssh wouldn't allow me to log into the mirror until afs had started
16:00 <clarkb> my perspective on that is if you reboot in your job then you assume all responsibility for making the node happy
16:00 <johnsom> There is a nss-lookup.target that in theory should mean the system can accept queries, but I don't know how reliable that is
16:00 <clarkb> we already do ensure the node is ready for you when we hand you the node
16:01 <fungi> should we rerun the validate-host role after reboots?
16:01 <clarkb> rebooting throws a huge wrench in things. It's powerful that you are able to reboot at all (jenkins couldn't do it), but it's also something that jobs need to accept can be problematic and deal with
16:01 <clarkb> fungi: that may be an option
16:02 <fungi> granted, validate-host doesn't wait for the server to be booted, it just discards the build if the server isn't ready for the things it checks
16:03 <clarkb> a "wait for system to settle after reboot" role may be reasonable to add to zuul-jobs
16:03 <clarkb> then add that to the fips jobs after the reboot
16:04 <clarkb> a wait for network to be up (checked via ssh connectivity?), restart zuul console logger, and validate dns resolution are the three steps I can think of off the top of my head
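[As a rough illustration of the dns-validation step, the kind of retry loop such a role could run on the rebooted node; the hostname and timeout are placeholders, since the role being discussed did not exist yet at this point.

    #!/bin/bash
    # wait up to ~2 minutes for name resolution to come back after the reboot
    lookup_target="${LOOKUP_TARGET:-opendev.org}"   # e.g. the per-region mirror FQDN
    for _ in $(seq 1 24); do
        if nslookup "$lookup_target" >/dev/null 2>&1; then
            echo "DNS resolution is working"
            exit 0
        fi
        sleep 5
    done
    echo "DNS still not resolving ${lookup_target} after reboot" >&2
    exit 1
]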
16:06 <johnsom> The bonus with the switch to ansible is you would have access to the mirror FQDN
16:08 <ade_lee> sounds like a reasonable plan
16:09 <ade_lee> where would the ssh check connect to?
16:10 <ade_lee> and what's the ansible parameter that specifies the mirror FQDN?
16:10 <clarkb> ade_lee: the zuul-executor ansible being able to connect to the node that rebooted, post reboot
16:11 <clarkb> I'm sure you're already doing that in a wait_for or something. I'm just suggesting we can collect these common post reboot actions into a single role
16:14 *** marios is now known as marios|out
16:18 <ade_lee> clarkb, ack.
16:18 <ade_lee> johnsom, clarkb, fungi, frickler -- If we agree on this approach, I can start putting such a role together.
16:19 <ade_lee> something that does a wait_for, restarts the zuul console, and checks dns by resolving opendev.org
16:20 <johnsom> ade_lee: zuul_site_mirror_fqdn
16:20 <ade_lee> johnsom, sorry yes ^^ that one :)
16:23 <fungi> to make the role generic, opendev.org should be at most a default for some rolevar so other sites can supply a record they expect to be resolvable by their nodes
16:24 <clarkb> or make it a required value
16:25 <fungi> yeah, ideally we'd ask it to resolve $zuul_site_mirror_fqdn in our deployment, but other sites may want to supply a different record to test
16:25 *** dviroel|lunch is now known as dviroel
16:26 <ade_lee> ok - so resolve $zuul_site_mirror_fqdn if set, else some rolevar which defaults to opendev.org?
16:27 <clarkb> ade_lee: I think I would make it a required var input. Maybe suggest that you can use $zuul_site_mirror_fqdn if using mirrors
16:27 <clarkb> then people using the role can decide if google.com is more appropriate
16:28 <ade_lee> ok
16:31 *** jpena is now known as jpena|off
16:34 <opendevreview> James E. Blair proposed zuul/zuul-jobs master: Re-sync test-mirror-workspace-git-repos  https://review.opendev.org/c/zuul/zuul-jobs/+/842572
16:34 <opendevreview> James E. Blair proposed zuul/zuul-jobs master: Correct git config item name in mirror-workspace-git-repos  https://review.opendev.org/c/zuul/zuul-jobs/+/842573
16:34 <clarkb> corvus (and the rest of infra-root): only our openstack tenant uses the deprecated zuul queue syntax according to that script as of the last 10 minutes or so
16:35 <clarkb> I'm working on an email to the openstack-discuss list now. There are a number of projects so I'm going to do my best to catch the attention of those that need it
16:35 <fungi> thanks for checking it!
16:35 <johnsom> I just posted patches for octavia and designate repos.
16:36 <fungi> awesome!
16:36 <fungi> don't forget your stable branches, if you also set it there
16:36 <johnsom> Yeah, that will be some work/time to get through.
17:00 <corvus> clarkb: re projects that are unmaintained -- maybe consider dropping them from the zuul config if they don't clean up errors after a certain time?
17:00 <corvus> they can always be added back later easily enough
17:00 <clarkb> ya I think that is a reasonable approach to take
17:07 <fungi> i concur
17:10 <clarkb> ok email sent
17:11 <clarkb> fungi: it is just over the size limit if you can moderate it through
17:12 <fungi> gladly
17:12 <clarkb> (I attached a file with all the branch and file info for each project which did that)
17:12 <fungi> i discarded your message and approved the ones for hydraulics investment opportunities and shipping notices
17:13 <fungi> (just kidding, it was the other way around)
17:14 <clarkb> I'm always open to good investment opportunities
17:28 *** timburke__ is now known as timburke
18:01 <opendevreview> Merged zuul/zuul-jobs master: Re-sync test-mirror-workspace-git-repos  https://review.opendev.org/c/zuul/zuul-jobs/+/842572
18:03 <opendevreview> Merged zuul/zuul-jobs master: Correct git config item name in mirror-workspace-git-repos  https://review.opendev.org/c/zuul/zuul-jobs/+/842573
18:11 <opendevreview> James E. Blair proposed zuul/zuul-jobs master: Make test-prepare-workspace-git role  https://review.opendev.org/c/zuul/zuul-jobs/+/842598
18:40 <opendevreview> James E. Blair proposed zuul/zuul-jobs master: Make test-prepare-workspace-git role  https://review.opendev.org/c/zuul/zuul-jobs/+/842598
19:29 <opendevreview> Merged zuul/zuul-jobs master: Make test-prepare-workspace-git role  https://review.opendev.org/c/zuul/zuul-jobs/+/842598
20:12 <opendevreview> James E. Blair proposed opendev/base-jobs master: Switch base-test to test-prepare-workspace-git  https://review.opendev.org/c/opendev/base-jobs/+/842615
20:12 <corvus> infra-root: ^ a base-test change to prepare us for ansible 5
20:45 *** dviroel is now known as dviroel|out
20:59 <clarkb> I'm trying to push a few of these queue update changes to projects that are likely abandoned (but figure I'll make it easy for them to address, and then remove them from the zuul projects list if they don't) and am discovering they don't even have valid .gitreview configs on their branches, ugh
21:00 <clarkb> I tried to push to stable/xena and it pushed a second patchset to my master change. Trying to push to ussuri-test made a second stable/ussuri change, and overriding the branch doesn't seem to do anything due to some .gitreview config they have
21:11 <clarkb> https://review.opendev.org/q/topic:fix-queue-config that was fun
21:58 <fungi> yeah, i just make a point of always telling git-review what branch to target when doing that sort of thing, for exactly that reason
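[In practice that means passing the branch as a positional argument to git-review; the branch name below is just an example.

    # explicitly target the branch instead of relying on .gitreview defaults
    git review stable/xena
    # or be explicit about the gerrit remote as well
    git review -r gerrit stable/xena
]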
22:22 <clarkb> learned my lesson
22:24 <fungi> make no assumptions
22:24 <corvus> fungi: got a sec for https://review.opendev.org/842615 ?
22:24 <fungi> lookin'
22:24 <corvus> would like to keep the base-test cycle moving
22:25 <fungi> me too, thanks!
22:29 <opendevreview> Merged opendev/base-jobs master: Switch base-test to test-prepare-workspace-git  https://review.opendev.org/c/opendev/base-jobs/+/842615
22:35 <opendevreview> Merged openstack/project-config master: update generate constraints to py38,39  https://review.opendev.org/c/openstack/project-config/+/837815
23:43 <ianw> ok, sorry i got distracted yesterday but i've parsed https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2f46993d83ff4abb310ef7b4beced56ba96f0d9d now
23:43 <clarkb> I was distracted yesterday too :)
23:44 <ianw> spec_store_bypass_disable and spectre_v2_user can both be set to "seccomp" or "prctl"
23:45 <ianw> if it's seccomp, every seccomp() enabled thing will try to enable these flags for the process. if it's prctl, it becomes an opt-in thing userland needs to set explicitly
23:46 <ianw> 0x4 is ssbd from previous investigation, so it is presumably spec_store_bypass_disable that is causing the problems
23:48 <ianw> i just need to rejig my test machine back to the standard kernel, but i'll try booting that with spec_store_bypass_disable=prctl and i expect the flood of messages will go away
23:49 <ianw> oh, and that change modified the kernel's default to prctl because, as the changelog goes into, the seccomp-forced mitigations are basically unhelpful
23:50 <opendevreview> James E. Blair proposed zuul/zuul-jobs master: Test base-test with Ansible 2.8  https://review.opendev.org/c/zuul/zuul-jobs/+/842647
23:50 <opendevreview> James E. Blair proposed zuul/zuul-jobs master: Test base-test with Ansible 2.9  https://review.opendev.org/c/zuul/zuul-jobs/+/842648
23:50 <opendevreview> James E. Blair proposed zuul/zuul-jobs master: Test base-test with Ansible 5  https://review.opendev.org/c/zuul/zuul-jobs/+/842649
23:51 <ianw> so hopefully we can make an argument to someone upstream that jammy kernels should do the same and it can be backported. however that doesn't solve the immediate issue of jammy nodes having hundreds of megabytes of logs on OVH
23:51 <clarkb> is that something that can be set via sysfs on boot?
23:52 <clarkb> or maybe via a kernel flag?
23:52 <clarkb> if so we could make a dib element modify that?
23:54 <ianw> hrm, yes i wonder if sysctl works dynamically. i'm just reinstalling kernels and can test
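[For context on those last questions: spec_store_bypass_disable is a boot-time kernel command line parameter rather than a writable sysctl, so a dib element would most likely have to edit the bootloader configuration; a sketch assuming the usual Ubuntu grub layout.

    # the current mitigation mode is visible (read-only) in sysfs after boot
    cat /sys/devices/system/cpu/vulnerabilities/spec_store_bypass

    # switching the default means adding the parameter to the kernel command line,
    # e.g. via a grub snippet baked into the image or applied to a booted node
    echo 'GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT spec_store_bypass_disable=prctl"' | \
        sudo tee /etc/default/grub.d/99-ssbd-prctl.cfg
    sudo update-grub && sudo reboot
]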
