Friday, 2022-04-01

clarkbI assume then that centos 8 and other distros that work in rax where static networking is required are converting the move event to an add then?00:00
clarkbI think that is related to my remaining concern with triggering on a move. If we do that will we run twice? I guess that is ok because glean will see the config already exists and noop exit00:00
ianwi would say that's right.  systemd/udev sends a netlink message IFLA_IFNAME to trigger the rename.  the kernel handles that, and eventually sends the udev event via device_rename->kobject_rename, where systemd-udevd gets that event00:00
ianw... while interesting, i don't think we're really any closer to understanding what events should be triggered here00:05
clarkbwell I think that says the move event is expected00:07
clarkbwhat we don't know is if we should get an add after the move00:07
ianwi wonder if we should do something with DEVPATH_OLD in glean-udev.rules00:07
ianwclarkb: yes, agree on that00:07
clarkbconsidering that the comments seem to indicate that moves are largely just for renaming network interfaces I think we can fairly safely match on move too?00:10
clarkbthen as long as matching move and add (say if some distros proxy move to an add) is safe then we're ok00:10
ianwclarkb: yeah, i think there's enough due-diligence that we don't think this is an upstream issue but something we're missing, and so adding |move (and maybe |change?) is where we're at00:16
clarkbwell I think upstream may need to add the add event as there is enough implication that that was existing behavior and is behavior in other distros?00:23
clarkbI mean I guess it isn't strictly required that they do that since we can match on move00:24
clarkbbut it is annoying when behaviors like this randomly change00:24
clarkbudev should be fairly stable imo00:24
ianwwhat i might try is bring up a plain/from upstream qcow centos 8-stream and 9-stream, run a udev trace and rename the devices via "ip" commands?00:25
ianwthat should give us a trace of the events?  if that differs, we know something's up00:25
clarkbianw: you might also just compare our centos 8 stream boot to centos 9? Also `udevadm info /sys/class/net/ens3` is useful00:26
clarkbthere should be a SYSTEMD_WANTS value in there if it is working00:27
clarkbianw: any reason for me to keep my first test node up in bhs1 or should I delete it now?00:27
ianwumm, that is the node with the "bad" environment, right?00:27
clarkbianw: it's the one that I modified to have ACTION=="add|move" to test that that fixed things00:28
clarkbso it is happy now I think00:28
clarkbbut you can revert that rules file and reboot to generate current behavior from the image00:28
ianwahh maybe keep it, just until we either decide to file a bug report or do that00:28
ianwi'm just thinking get glean out of the picture for the rename trace; to avoid any confusion for a potential bug report00:28
clarkbah yup that makes sense00:29
clarkbok I'll keep the instance up. I can delete it tomorrow if this is concluded00:29
ianwi'll try and get two traces of a manual rename now.  i feel like that's the best way forward to decide where things might have changed00:29
clarkbbut now I need to figure out what our dinner plan is00:30
ianwok, renaming only issues a MOVE udev event on centos-802:25
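Editor's note: the exchange above turns on whether a rule keyed on ACTION=="add" can fire when a rename only produces a "move" event. The following is a minimal Python sketch of that matching logic, assuming udev's documented behaviour that "|" separates alternative shell-glob patterns in a match value. The function name `udev_value_matches` is hypothetical and is not part of glean or udev.

```python
from fnmatch import fnmatchcase


def udev_value_matches(pattern, value):
    """Approximate how a udev match value such as "add|move" is tested.

    Each "|"-separated alternative is treated as a shell glob and the
    value matches if any alternative matches. This only illustrates the
    rule semantics; it is not udev's implementation.
    """
    return any(fnmatchcase(value, alt) for alt in pattern.split("|"))


# A rename that only emits "move" is missed by a rule keyed on "add".
print(udev_value_matches("add", "move"))
# Widening the match to "add|move" catches both events.
print(udev_value_matches("add|move", "move"))
print(udev_value_matches("add|move", "add"))
```

Because both "add" and "move" would match the widened pattern, a host that emits both events could trigger the rule twice, which is the double-run concern raised at the start of the log.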
mgagneFYI I restarted some services on iweb cloud. that could explain why some instances were stuck in deleting state.03:18
ianwclarkb: i have a theory -> https://etherpad.opendev.org/p/centos-9-glean-renaming03:58
ianwit seems that on RAX centos-8 we are *not* renaming the devices.  they are eth0/eth1 -- and glean is working03:58
ianwmy theory is that there is a driver issue there with /sys/class/net/eth0/name_assign_type returning an invalid value.  this is what systemd/udev uses to decide if the device should be renamed.04:00
ianwit's probing that value, getting -EINVAL or whatever and failing out without renaming04:00
ianwi'm assuming that is fixed on 9-stream either in the kernel or systemd/udev and now it *is* deciding to rename04:01
ianwand then we fall back to what you've already discovered; that we are not matching the "move" event and thus not setting up glean correctly04:02
ianwhowever, on other clouds, we haven't noticed because, as you note, they fall back to dhcp.  the only xen+!dhcp combo we have is RAX, and because the interface wasn't being renamed from eth0 we just happened to work04:03
ianwto add extra confusion to all this, the upstream .qcow2 images set net.ifnames=0 on their default command line.  so they don't rename interfaces, by design.  we do not set that anywhere04:04
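Editor's note: the theory above depends on what reading /sys/class/net/eth0/name_assign_type returns. The sketch below is one way to inspect that attribute from Python. It assumes the value meanings listed in the kernel's sysfs-class-net ABI notes (1 to 4), and that an "unknown" assignment type surfaces as a read error rather than a value. The helper names and the `sysfs_root` parameter are hypothetical and exist only to make the sketch testable.

```python
import errno
from pathlib import Path

# Assumed meanings of the sysfs attribute values (kernel ABI notes).
NAME_ASSIGN_TYPES = {
    "1": "enumerated by the kernel",
    "2": "predictably named by the kernel",
    "3": "named by userspace",
    "4": "renamed",
}


def describe_name_assign_type(raw):
    """Translate the raw attribute text into a readable description."""
    value = raw.strip()
    return NAME_ASSIGN_TYPES.get(value, f"unrecognised value {value!r}")


def read_name_assign_type(iface, sysfs_root="/sys/class/net"):
    """Read name_assign_type for iface, reporting read errors by errno name."""
    path = Path(sysfs_root) / iface / "name_assign_type"
    try:
        return describe_name_assign_type(path.read_text())
    except OSError as exc:
        # A read failing with EINVAL is the case described in the log.
        name = errno.errorcode.get(exc.errno, "unknown errno")
        return f"unreadable ({name})"


if __name__ == "__main__":
    for iface in ("eth0", "ens3"):
        print(iface, "->", read_name_assign_type(iface))
```

Running this on both a centos-8 and a 9-stream node would show whether the attribute reads differently between them, which is the comparison the theory calls for.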
*** Guest318 is now known as diablo_rojo_phone04:06
*** diablo_rojo_phone is now known as diablo_rojo04:10
*** ysandeep|out is now known as ysandeep05:09
*** akahat is now known as akahat|rover06:35
*** jpena|off is now known as jpena07:11
*** pojadhav- is now known as pojadhav08:13
*** ysandeep is now known as ysandeep|lunch08:31
*** ysandeep|lunch is now known as ysandeep09:15
*** hjensas is now known as hjensas|out-sick09:52
*** marios is now known as marios|food|biab09:54
*** marios|food|biab is now known as marios10:26
*** prometheanfire is now known as Guest92810:40
*** dviroel|afk is now known as dviroel11:22
*** ysandeep is now known as ysandeep|afk11:51
*** soniya29 is now known as soniya29|afk12:04
*** pojadhav- is now known as pojadhav12:18
opendevreviewJeremy Stanley proposed opendev/base-jobs master: No longer special-case CentOS Stream  https://review.opendev.org/c/opendev/base-jobs/+/83618112:47
fungiinfra-root: ^ semi-urgent fix, looks like the centos-8 wheel cache removal has broken things for centos-8-stream nodes12:48
*** artom_ is now known as artom13:17
*** ysandeep|afk is now known as ysandeep13:28
Clark[m]fungi: I'm not sure that change is correct. That is for the actual distro mirroring and CentOS-8 stream is adjacent to CentOS 8 iirc. Then with CentOS 9 stream they changed roots and our mirrors accommodated that and that is what those conditions reflect13:56
Clark[m]http://mirror.dfw.rax.opendev.org/centos/8-stream/ is one root and http://mirror.dfw.rax.opendev.org/centos-stream/9-stream/ the other13:57
Clark[m]fungi: I think https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/configure-mirrors/defaults/main.yaml#L12 sets the wheel mirror path14:08
Clark[m]Looks like the two CentOS files in https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/configure-mirrors/vars override the default value though14:12
Clark[m]I can take a closer look in about an hour14:13
fungioh, for some reason i thought the wheel mirror urls were under our control14:42
fungii'm back now and can work on an alternative patch14:42
fungithat's going to get weird though since it's not just configuring our system, so need to make sure i keep it backward-compatible for other users14:43
fungithe wheel_mirror override in vars/CentOS.yaml is identical to the one in vars/CentOS-9.yaml14:48
fungii wonder if this means we're actually putting the centos 8 wheels in the wrong place and we need to move it?14:49
*** ysandeep is now known as ysandeep|out14:53
fungioh, we don't even have a centos stream 9 wheel cache14:54
fungiso we can't reason about what is or isn't working for 914:54
fungishould we try to override zj's configure-mirrors vars for centos from where we include the role in our base job?14:56
Clark[m]Have we verified the Ansible vars we use lack -stream in them?14:57
Clark[m]I think we might be able to include that if we can check a var for the info14:57
fungithe jobs running on centos-8-stream nodes are looking for wheels in the url without -stream in it14:57
fungii'll see if centos-9-stream is trying similar14:58
Clark[m]Those vars are recorded by zuul in the host info file14:58
fungiunfortunately i'll have to find an actual example because the logs provided in #openstack-infra were via paste not zuul build results14:59
fungiClark[m]: which host info file? zuul-info/inventory.yaml or something else?15:05
fungioh, i'm blind. zuul-info/host-info.centos-9-stream.yaml et cetera15:06
Clark[m]Ya those files have all the Ansible facts and that is what the role I linked uses to construct that var15:07
fungiconfirmed, ansible_distribution_major_version is just '8' or '9' with no -stream on the end15:07
fungihttps://zuul.opendev.org/t/openstack/build/3777ce57d6024b668499d433a3ebd93a/log/zuul-info/host-info.centos-8-stream.yaml#16215:07
fungihttps://zuul.opendev.org/t/openstack/build/a3674b7b2fe24f7895ed25851b5f9b8d/log/zuul-info/host-info.centos-9-stream.yaml#13015:07
fungiso we need to override wheel_mirror to "{{ http_or_https }}://{{ mirror_fqdn }}/wheel/{{ ansible_distribution | lower }}-{{ ansible_distribution_major_version }}-stream-{{ ansible_architecture | lower }}" for all centos versions15:08
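Editor's note: to make the proposed override concrete, the sketch below renders the same URL shape in Python from the Ansible facts named above. The mirror hostname `mirror.example.org` and the fact values are placeholders, and the non-stream form is assumed to be the same template with "-stream" dropped. The function `wheel_mirror_url` is hypothetical and is not part of zuul-jobs.

```python
def wheel_mirror_url(facts, mirror_fqdn, scheme="https", stream=True):
    """Build a wheel mirror URL following the template quoted in the log.

    facts is a dict of Ansible facts; only the three keys used in the
    template are read. stream toggles the "-stream" suffix after the
    major version.
    """
    suffix = "-stream" if stream else ""
    return (
        f"{scheme}://{mirror_fqdn}/wheel/"
        f"{facts['ansible_distribution'].lower()}-"
        f"{facts['ansible_distribution_major_version']}{suffix}-"
        f"{facts['ansible_architecture'].lower()}"
    )


# Placeholder facts; the major version carries no "-stream" marker,
# matching what the host-info files in the log show.
facts = {
    "ansible_distribution": "CentOS",
    "ansible_distribution_major_version": "8",
    "ansible_architecture": "x86_64",
}
print(wheel_mirror_url(facts, "mirror.example.org"))
print(wheel_mirror_url(facts, "mirror.example.org", stream=False))
```

The two printed URLs differ only in the "-stream" segment, which is the mismatch between where the wheels are published and where the role looks for them.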
*** dviroel is now known as dviroel|lunch15:09
Clark[m]Or modify the role to add -stream if that is detectable15:10
fungiin zuul-jobs?15:10
fungithe problem is that what wheel_mirror should be is determined by where you're publishing your wheels15:11
fungiand there's nowhere to "detect" that choice15:11
fungiso we can either change it for everyone who's using configure-mirrors and possibly break their systems if they were already putting wheels where that role expects them, or we can override our use of the role to point to where we publish our wheels, or we can move our wheels to the location the role expects to find them15:12
Clark[m]Right but we might be able to detect if the distro is a stream release which is distinct to the old rhel clone releases15:12
Clark[m]8 != 8-stream I think it is fair to update zuul-jobs here if we can tell it is 8-stream and not 815:13
fungithe role expects centos stream 8 wheels to be in centos-8 and centos stream 9 wheels to be in centos-915:13
Clark[m]Yes but that is wrong because wheels for stream don't always work on not stream15:13
Clark[m]They need to be distinct to be correct15:13
fungianybody who's been using it for centos stream 9 (we haven't) may be publishing them to centos-9 as the url15:13
fungichanging it in the role in zuul-jobs could break any existing consumers who are publishing their wheels where the role currently expects to find them15:14
fungidistinguishing between 8 and stream 8 may be a thing, but there is and was no 9 separate from stream 915:15
Clark[m]Yes, that is a risk. And I agree 8 vs 9 is a different situation. I'd personally be inclined to say meh here. And tell the job to install libvirt-dev. It never worked for stream anyway except by random chance for a while15:16
fungialso since centos linux 8 (non-stream) is no longer a thing, it would make just as much sense to fix the wheel cache urls/paths we're publishing15:16
clarkbya we could just put a symlink in place maybe15:21
fungii don't think changing the centos default in zuul-jobs can be safely done without notifying potential users in advance at the very least, which leaves either reorganizing our mirrors (volume/mount rename, symlink, whatever), or overriding the var in our base job15:22
clarkbor accepting it never worked anyway and jobs need to adjust.15:22
clarkbBut since a symlink is simple enough I could see just doing that and maybe make a note of the weirdness somewhere15:23
fungiyeah, i don't see any reason to assume it never worked for other users of that role, it just didn't work for us because we put our mirror at a different place than the role expects to find it15:24
clarkbya I mean tell opendev users there is no centos wheel mirrors anymore15:24
clarkbwhich means installing libvirt-dev etc (and maybe they can pressure their colleagues to finally publish to pypi)15:25
fungioh, that15:25
fungiand yes, we have no centos stream 9 wheel cache anyway since it's never been added15:25
clarkbianw: that all makes a ton of sense. I'll try to look into addressing the python27 testing for glean15:30
fungiokay, after a bit of hunting, this is where we decide on what path to publish the centos wheels under: https://opendev.org/openstack/project-config/src/branch/master/playbooks/publish/wheel-mirror.yaml#L14-L2315:38
opendevreviewClark Boylan proposed opendev/glean master: Handle udev move events in addition to add events  https://review.opendev.org/c/opendev/glean/+/83610315:39
opendevreviewClark Boylan proposed opendev/glean master: Install older voluptuous on py27 due to import error  https://review.opendev.org/c/opendev/glean/+/83619415:39
clarkbfungi: I think our two options if we wish to keep the mirror are either to put a symlink in place or in our base jobs set the wheel_mirror var to include -stream on our centos machines15:41
*** marios is now known as marios|out15:42
fungior rename the two volumes to what the configure-mirrors role expects and change the publish playbook to match15:44
fungiwhich would basically just be a simple revert of https://review.opendev.org/80341115:45
clarkbwell no because centos 8 wheels were published to centos-8 and now we need centos-8-stream to be published there15:47
clarkbthat does give a hint at how we can override wheel_mirror though using ansible_lsb.id to check if we are on stream15:47
fungicentos 8 wheels were published to centos-8 but we deleted them15:49
fungibackwards compatibility there is unnecessary15:49
clarkbcorrect but if you just revert that change it will try to publish centos-8 wheels to centos-8?15:49
fungiyes, which is exactly where the configure-mirrors role expects to find them15:50
clarkbI think we have to keep the new jobs but change the slug definition. So not a proper revert15:50
clarkbrevert https://review.opendev.org/c/openstack/project-config/+/803411/4/playbooks/publish/wheel-mirror.yaml but not the rest of the change15:50
fungioh, right i meant revert 803411's change of the playbooks/publish/wheel-mirror.yaml file only15:50
fungii'll push that up for discussion, and if others agree i can work on the corresponding volume moves15:54
*** dviroel|lunch is now known as dviroel16:00
opendevreviewJeremy Stanley proposed openstack/project-config master: Match configure-mirrors for CentOS wheel URLs  https://review.opendev.org/c/openstack/project-config/+/83620016:14
fungiclarkb: ^ something like that16:15
clarkbya I think that will do it if we also update afs16:15
clarkbmight be good to double check with ianw since he had pushed on it but that would wait for ~sunday our time16:16
fungidpawlik: that region has no instance flavors with 8 vcpus and only 4gb ram, so i'm resizing to v2-highcpu-8 which has 8 vcpus with 8 gb ram instead16:43
fungiare you ready for me to initiate the resize?16:43
*** rlandy is now known as rlandy|biab16:55
dpawlikfungi: hey, gimme 2 min17:02
fungisure thing, there's no hurry17:03
fungijust want to make sure you're not in the middle of something when it reboots into the new flavor17:03
dpawlikfungi: exactly, I need to stop logscraper 17:03
dpawlikfew logs left17:04
dpawlikwe can go17:04
*** jpena is now known as jpena|off17:04
dpawlikfungi: ping me after, I will check how it works17:09
fungiresizing now17:20
fungisorry for the delay, stepped away for a few17:20
fungidpawlik: console log says it's booted. once you give me the thumbs-up i'll mark it confirmed17:21
dpawlikfungi: works. Thank you17:26
fungiawesome, the latest resize is marked as confirmed now17:26
dpawlikhope it will be the last time17:29
dpawlikif not, need to think about some other language for parsing time and message17:30
*** rlandy|biab is now known as rlandy17:45
fungiwell, we had 20 servers running 4 concurrent preprocessors each for the old solution17:46
dpawlikfungi: there are few data nodes and master nodes  on Opensearch side, but just one host that is getting the logs, parse and send...18:44
dpawliktoday I see that there are a lot of devstack jobs that require more time than usual to parse one build...18:45
dpawlikHave a good weekend18:47
fungithanks, you too!18:58
fungiand yes, now that the release is done, development activity on openstack is picking back up. after next week (ptg) i expect it to accelerate further18:58
clarkbhttps://review.opendev.org/c/opendev/glean/+/836103/2 the glean stack to try and address the centos 9 udev stuff has +1's from zuul now. I believe we consume that from releases? Maybe if we can land those changes today then early next week we can tag glean 1.20.1 ?19:31
fungiyeah, i think it needs a release before dib will pick it up, but anyway i approved the change just now19:35
clarkbfungi: the parent will need review too19:36
fungioh19:36
clarkbI suspect the parent is safe to land with single reviewer approval since it is just a dep cap19:36
fungisingle-core approved it just now, yep19:36
clarkbgreat, hopefully that addresses the node failure problems by giving us more clouds to boot in. fwiw I think the fix in bhs1 no valid hosts and adding gra1 back again helped a lot too19:37
*** dviroel is now known as dviroel|brb20:05
opendevreviewMerged opendev/glean master: Install older voluptuous on py27 due to import error  https://review.opendev.org/c/opendev/glean/+/83619420:53
opendevreviewMerged opendev/glean master: Handle udev move events in addition to add events  https://review.opendev.org/c/opendev/glean/+/83610320:53
*** dviroel|brb is now known as dviroel21:16
fungi#status log Restarted Zuul executors for kernel and docker updates, now running on Zuul 5.2.2.dev2 (08348143)22:38
opendevstatusfungi: finished logging22:38
