Tuesday, 2023-06-06

clarkbfungi: huh apparently https://github.com/python/cpython/blob/main/Modules/_io/_iomodule.c#L197 is what gdb says I'm in which is a C implementation of that python function00:12
clarkbfungi: I wonder are they code gening from python to C?00:12
clarkbhttps://github.com/python/cpython/blob/v3.10.11/Modules/_io/bufferedio.c#L753 is what ultimately calls the seek since tell depends on seek00:16
clarkbsidenote https://debuginfod.opensuse.org/ is super duper cool and making my life easier00:18
clarkbit writes it to your local user cache dir as it needs contents00:18
clarkbwhere I've ended up is that tell call has been there for 14 years (and maybe longer I think the function got renamed at that point but was there prior too)00:20
clarkbso the issue really is in openafs I guess not handling a seek on that file gracefully00:20
clarkband now we know who to file a bug report with00:20
clarkbbut I need to eat dinner so maybe tomorrow I'll figure out how to do that00:20
fungiclarkb: there are parallel c and native python implementations of a lot of the stdlib, where the latter tend to be used for testing and extreme portability situations02:03
fungiduring cpython compilation it will pick which one to include02:03
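A small sketch of the two implementations fungi mentions, assuming a CPython interpreter where `io` is backed by the C `_io` module and the pure-Python `_pyio` module ships alongside it. The temporary file and variable names are only for illustration.

```python
import io
import tempfile

import _pyio  # pure-Python implementation of the io module (CPython stdlib)

with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"hello")
    tmp.flush()
    # Open the same file through both implementations with the same arguments.
    with io.open(tmp.name, mode="rb", buffering=4096) as c_reader, \
            _pyio.open(tmp.name, mode="rb", buffering=4096) as py_reader:
        # Record which module provides each buffered reader class.
        c_module = type(c_reader).__module__
        py_module = type(py_reader).__module__
        same_data = c_reader.read() == py_reader.read()

print(c_module, py_module, same_data)
```

Both readers behave the same from the caller's side; the difference discussed in the channel is in which system calls each implementation issues underneath.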
*** amoralej is now known as amoralej|lunch12:10
*** amoralej|lunch is now known as amoralej13:11
clarkbfungi: I think what confused me is that the python version is a 1:1 copy except for this lseek14:37
clarkbthe seek only exists in the C version14:37
clarkbopenafs' rt server appears to be non responsive. But they have a libera irc channel? Maybe I'll start there14:48
fungican also try reaching out to auristor or rra maybe14:50
fungior i think there's an ml14:50
clarkbfungi: ya the bug reporting mechanism is via email to rt so I can just send it there and hope the instance is processing email and only http is sad14:50
clarkbbut I figure if they have an irc channel I can ask if I found the right links in the first place14:51
clarkb(I wanted to check if this was a known issue before reporting it)14:51
*** amoralej is now known as amoralej|off15:37
fricklermeh, seems changing topics for merged patches isn't allowed? my plan was to update this stack once I'm done with all of them, but now I'm getting an error from gerrit16:34
fricklerRamereth: https://review.opendev.org/885336 doesn't look much better yet, you updated it when I was about to merge it. do you still plan to fix those CI issues for rocky? I'm only interested in getting rid of the zuul config errors16:39
Ramerethfrickler: sorry for pushing that right at the same time. I was trying to see if I could fix it, but I assumed it would just error out17:24
RamerethI'm not going to worry about it and will just approve both of those17:27
fricklerRamereth: thx, merged17:58
clarkbI apparently failed to send a meeting agenda yesterday after I got nerd sniped with python, openafs, and lseek()18:44
clarkbI'm going to send one nowish. Sorry about that18:44
fungithanks!18:44
clarkband my browser is broken18:45
corvusIt's a sign 18:55
clarkbkill took care of it, but it was weird: when changing workspaces to it, it refused to render18:56
fungitook it out behind the toolshed and gave it a talking-to?18:56
ianwok here's my confusion19:31
ianwthe change was proposed on Mar 2, 2022 and merged in January 202319:32
ianwso it hasn't made it into any release!19:32
ianwbut it looks old19:32
clarkbaha19:32
fungiyeah, i saw a ton of that in the git history for openafs19:32
clarkbif they haven't released it yet I'm guessing it isn't a high priority for making things work generally19:33
clarkband it is more of a corner case19:33
fungiseemed like they had changes which took years to merge in some cases19:33
fungipossibly related to the loose interface between openafs and the mainline kernel codebase too19:33
ianwalthough the corner case is "python opens a file on afs" isn't it?19:38
clarkbianw: no only the ioctl I think19:39
clarkbwhich is a special file/device, not a general file19:39
clarkbbut we can test that19:39
clarkbianw: `python3 -c 'open("/proc/fs/openafs/afs_ioctl", mode="rb", buffering=4096)'` this crashes. `python3 -c 'open("/afs/openstack.org/project/opendev.org/docs/opendev/gear/latest/index.html", mode="rb", buffering=4096)'` does not19:55
clarkbusing the .openstack.org path does not crash either19:55
clarkbso ya it's related to that specific device because it isn't a seekable device. I expect that they kept seekable things working like regular files but missed this one19:56
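For comparison, a minimal sketch of how a well-behaved non-seekable file responds to the same kind of seek. It assumes a POSIX system and uses a pipe as a stand-in for a non-seekable device; it does not touch `/proc/fs/openafs/afs_ioctl`, and `probe_lseek` is a hypothetical helper name.

```python
import errno
import os
import tempfile


def probe_lseek(fd):
    """Return the current offset, or the errno name if lseek() is refused."""
    try:
        return os.lseek(fd, 0, os.SEEK_CUR)
    except OSError as exc:
        return errno.errorcode[exc.errno]


# A regular file supports lseek(); a fresh file starts at offset 0.
with tempfile.TemporaryFile() as regular:
    regular_result = probe_lseek(regular.fileno())

# A pipe is not seekable, so the kernel refuses the lseek() with an error.
read_fd, write_fd = os.pipe()
pipe_result = probe_lseek(read_fd)

# A buffered open() on the pipe still succeeds and reports it as non-seekable.
with open(read_fd, mode="rb", buffering=4096) as reader:
    pipe_seekable = reader.seekable()
os.close(write_fd)

print(regular_result, pipe_result, pipe_seekable)
```

The point of the sketch is that a refused seek is an ordinary error that Python handles; the crash discussed above comes from the device not handling the seek gracefully at all.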
ianw++ i can file a bug for an ubuntu backport.  and if we have an urgent need we know what to do now :)19:57
ianw^ https://bugs.launchpad.net/ubuntu/+source/openafs/+bug/202310720:06
fungi$ host 2604:e100:1:0:f816:3eff:feff:bd1c20:30
fungic.1.d.b.f.f.e.f.f.f.e.3.6.1.8.f.0.0.0.0.1.0.0.0.0.0.1.e.4.0.6.2.ip6.arpa domain name pointer ns04.opendev.org.20:30
fungiyay!20:30
fungithanks guilhermesp_____ !20:30
guilhermesp_____nice! 20:43
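The reverse-lookup name in fungi's `host` output can be derived from the address with Python's standard `ipaddress` module. This sketch only computes the `ip6.arpa` name; it performs no DNS query, and the variable names are illustrative.

```python
import ipaddress

# The IPv6 address fungi looked up with `host`.
address = ipaddress.ip_address("2604:e100:1:0:f816:3eff:feff:bd1c")

# reverse_pointer expands the address to 32 hex nibbles, reverses them,
# and appends the ip6.arpa suffix used for PTR records.
ptr_name = address.reverse_pointer

print(ptr_name)
```

The resulting name is the one that needs a PTR record pointing at ns04.opendev.org for the lookup above to succeed.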
clarkbcorvus: on nl02 (and I suspect the other launchers) we appear to create lock nodes and delete lock nodes frequently against the same instances. This creates a lot of cache watcher log spam. I think this may be due to trying to grab locks to see if things are locked? Maybe we should look at reducing the log level of those events? (though they are already debug)21:54
corvusclarkb: a few thoughts on that22:01
corvusclarkb: 1) those entries are specifically in a logger dedicated to verbose cache log entries, so if we or anyone else ever wants to silence them, it's super easy.  i would be okay with a change to make that the default too.  but i don't think i'd want to remove them.22:02
corvusclarkb: 2) i'm not 100% confident in that code yet and i anticipate a non-zero probability of needing to use those in opendev in the near future, so i don't think we should do that on opendev now22:03
corvusclarkb: 3) those specific entries may represent opportunities to further optimize -- a lot of the cache work is actually reducing lock attempts22:04
clarkbcorvus: ah ok. Ya it seems like we get an ever incrementing counter in the lock path for a lock on the same set of instances. Doing an instance list shows the instance is locked so I think it must be something trying to get the lock and failing?22:04
corvusclarkb: got a good number and host i should look at?22:06
clarkbcorvus: nl02 node 0034237848 was the one I spot checked to make sure that behavior wasn't a problem22:07
clarkb(I'm fairly certain it is fine just verbose)22:07
corvusunderstood; just figured since this is an ongoing area of work looking at the behavior you're seeing (with specific nodes) would be good22:07
clarkback22:08
corvus(just a casual reference to node number 34 million)22:08
clarkbside note looks like infra-prod-service-zuul has been failing. Is that due to the executor situation?22:17
corvusyes but that's not expected22:19
corvusit looks like it's talking to the new hosts, and also it's not treating them as jammy nodes22:21
clarkbdid we recache the old facts at some point?22:22
corvusoh wait, no it is talking to the correct (old) hosts22:23
corvusand it has focal cached...22:23
corvusbut for some reason it's trying to install the xenial openafs package22:23
clarkbcorvus: fungi noticed that we had two apt sources.lists files for openafs on the new nodes22:24
clarkbmaybe it is related to that?22:24
corvusoooh22:24
corvusyep22:25
corvuson the old hosts too, we now have 2 files; one is focal and one is xenial22:25
corvusopenafs.list is wrong, and i don't find that in a grep in system-config22:26
clarkbI think fungi did debug it a bit but not sure where it ended up22:26
corvusi think the openafs file is from the zuul-executor role22:28
corvuswe must have run the role on the old hosts with cached facts from the new hosts during a time where we didn't have the focal config in place22:29
corvusi'm sort of inclined to delete both ppa files on every ze host, delete the cache from every ze host, and see if it recovers22:30
corvusclarkb: sound good ^?22:30
clarkbcorvus: delete the cache on bridge you mean for the hosts? I think that works22:31
corvusyep22:31
clarkbI think the only risk there is if we somehow install openafs from the distro because we don't have the ppa config in place22:32
clarkbbut that doesn't seem likely?22:32
corvusi hope the ppa is installed before package installation; i suspect we're failing at an earlier apt-get update22:33
clarkbah22:33
corvusthat's done, so let's see what the next run does22:34
corvusthose locks seem to happen every 5 seconds... i'm not sure what's doing that22:35
corvusdelete or stats22:37
corvusclarkb: i suspect you may have found a bug; i'm pretty sure that's the deleted node worker, and it's supposed to know that that node is locked and not try it.  i think i have enough clues to track it down.22:44
clarkbcool22:46
clarkbanyone have a quick moment to check that meetpad is happy via https://meetpad.opendev.org/isitbroken ?22:48
clarkbI just rebooted the two associated servers. It seems up but without audio/video between multiple users hard to say for sure.22:49
clarkbI guess I can test via my phone too22:49
clarkbusing my phone as a second device worked well and all seems happy22:50
tonybI get the expected welcome screen22:52
clarkbyup I think it is happy22:52
fungiyes, sorry, i forgot to put the extra lists file back to its original name on ze01 last night, i got spacey22:56
fungiit's apparently low-impact because we don't need that ppa on jammy anyway, but want to look into why those servers have two roles that apply duplicate entries for that ppa (one for afs-client and one for zuul-executor i think it was?)22:58
fungiseems like maybe one should imply the other or something, and then we can remove the duplication22:59
clarkbfungi: the old focal servers have it too and it has broken the infra-prod-service-zuul job22:59
fungilooking at my comments from yesterday, it was because ansible is applying both the openafs-server-config and zuul-executor roles, and they add the /etc/apt/sources.list.d/openafs.list and /etc/apt/sources.list.d/ppa_openstack_ci_core_openafs_jammy.list files with identical ppa entries23:07
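A rough sketch of how the duplication fungi describes could be spotted: scan a sources directory for the same entry appearing in more than one `.list` file. The two file names come from the discussion; the `deb` line, the temporary directory, and the `find_duplicate_entries` helper are hypothetical placeholders, not the real PPA configuration.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def find_duplicate_entries(sources_dir):
    """Map each apt source entry to the files containing it (2+ files only)."""
    files_by_entry = defaultdict(list)
    for path in sorted(Path(sources_dir).glob("*.list")):
        # Normalize whitespace and skip blank lines and comments.
        entries = {
            " ".join(line.split())
            for line in path.read_text().splitlines()
            if line.strip() and not line.strip().startswith("#")
        }
        for entry in entries:
            files_by_entry[entry].append(path.name)
    return {e: names for e, names in files_by_entry.items() if len(names) > 1}


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical entry standing in for the identical PPA line in both files.
    entry = "deb https://ppa.example.invalid/openafs/ubuntu jammy main\n"
    for name in ("openafs.list", "ppa_openstack_ci_core_openafs_jammy.list"):
        Path(tmp, name).write_text(entry)
    duplicates = find_duplicate_entries(tmp)

print(duplicates)
```

On a real host the function would be pointed at `/etc/apt/sources.list.d`, the directory named in the message above.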
fungimaybe if they both agreed on the filename it would be a non-problem? that could be a simple solution, though maybe somewhat dirty23:08
clarkbfungi: or just drop the special zuul-executor content?23:09
clarkbsince we have a general role for it already23:09
fungisure, i can push that up real quick if nobody else is already writing it23:09
Clark[m]I'm not. Transitioning to figuring out dinner23:12
fungion it23:15
opendevreviewJeremy Stanley proposed opendev/system-config master: Stop adding duplicate OpenAFS PPA on executors  https://review.opendev.org/c/opendev/system-config/+/88541923:20
funginot sure if we actually need to add cleanup code to remove that... opinions?23:21
Clark[m]I didn't approve it since we should make sure corvus is ok with it since he debugged it. But I don't think we need to encode cleanup since it's only 12 nodes, and we can pretty easily rm the file from them manually or with an ansible command23:33
corvusClark: fungi it seems wrong to have openafs-server-config on there... that looks like it has stuff for ... like afs file/pts/db servers?23:35
corvushrm, also /etc/openafs/server is not currently present on the executors... is that the right role?23:36
opendevreviewIan Wienand proposed opendev/system-config master: install-docker: replace deprecated include: calls  https://review.opendev.org/c/opendev/system-config/+/88542023:39
fungithe other possibility is that it's coming from roles/openafs-client/tasks/openafs-client/Debian.yaml23:43
fungimaybe that's something the launch script uses?23:44
Clark[m]https://opendev.org/opendev/system-config/src/branch/master/playbooks/service-zuul.yaml#L27 is where it comes from which is openafs-client23:44
fungiyeah, okay, so it's that it includes the openafs-client role. are we good with the change so long as i adjust the commit message?23:45
opendevreviewJeremy Stanley proposed opendev/system-config master: Stop adding duplicate OpenAFS PPA on executors  https://review.opendev.org/c/opendev/system-config/+/88541923:46
opendevreviewNeil Hanlon proposed openstack/diskimage-builder master: Add support to build 64k-page-table images for Rocky 9  https://review.opendev.org/c/openstack/diskimage-builder/+/88445223:46
fungiedited the commit message23:46
corvusfungi: yes, thanks, that was very confusing; i agree it does look like it's openafs-client and i think this fix makes sense.23:46
fungifor some reason i thought we weren't pulling in things from the top-level roles directory any longer, but i guess we mix that and the playbooks/roles directory23:47
corvusand that role doesn't need the special per-release files (focal jammy) that are in the executor role?23:48
corvusi'm not sure why that role did something different23:50
Clark[m]The only weird thing is the xenial hwe override/difference. But we haven't done xenial in forever so I think we can leave that behind23:52
fungiif i had the desire to go excavating git history we'd probably find that at one time we included more variation between ubuntu versions23:52
corvusk23:54
fungisince they were identical anyway, i didn't question it23:56
