Thursday, 2022-01-06

00:58 *** rlandy|ruck is now known as rlandy|out
05:02 *** pojadhav- is now known as pojadhav
05:45 *** ysandeep|out is now known as ysandeep
06:09 *** cloudnull5 is now known as cloudnull
07:30 *** ysandeep is now known as ysandeep|afk
07:44 *** ysandeep|afk is now known as ysandeep
08:27 *** akahat|out is now known as akahat|ruck
09:07 *** ysandeep is now known as ysandeep|lunch
09:28 *** ysandeep|lunch is now known as ysandeep
09:52 <opendevreview> Merged openstack/project-config master: Revert "Disable nodepool temporarily"  https://review.opendev.org/c/openstack/project-config/+/823357
11:06 *** rlandy_ is now known as rlandy|ruck
11:16 *** dviroel|rover|afk is now known as dviroel|rover
11:36 *** pojadhav is now known as pojadhav|afk
13:05 *** pojadhav|afk is now known as pojadhav
13:05 *** ysandeep is now known as ysandeep|dinner
14:54 *** ysandeep|dinner is now known as ysandeep
15:21 *** arxcruz is now known as arxcruz|ruck
15:22 *** akahat|ruck is now known as akahat
15:26 *** dviroel|rover is now known as dviroel
16:46 *** ysandeep is now known as ysandeep|out
16:58 *** rlandy|ruck is now known as rlandy|ruck|lunch
17:27 *** rlandy|ruck|lunch is now known as rlandy|ruck
17:37 <clarkb> I note that we've got a few hosts that have LE certs expiring in 29 days or less. Not likely to be able to dig into that today or this week, but calling it out as noticed and should be able to inspect next week
17:39 <fungi> i must have missed the notifications about those
17:40 <fungi> i saw some for openstackid, but hopefully we can move translate.o.o to the new openinfra idp before that becomes critical
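For readers following along, a quick way to spot-check how close a host's TLS certificate is to expiring is a few lines of standard-library Python. This is only a minimal sketch; the host listed below is an illustrative example, not OpenDev's actual monitoring:

    # Sketch: report how many days remain on a host's TLS certificate.
    # The host used here is only an example.
    import socket
    import ssl
    import time

    def days_until_expiry(host, port=443):
        ctx = ssl.create_default_context()
        with socket.create_connection((host, port), timeout=10) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
        expires = ssl.cert_time_to_seconds(cert['notAfter'])
        return (expires - time.time()) / 86400

    for host in ('translate.openstack.org',):
        print(host, round(days_until_expiry(host), 1), 'days left')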
17:54 <fungi> infra-root: how do folks feel about putting copies of the release sdists/wheels for tools like bindep and git-review on tarballs.opendev.org?
17:55 <fungi> i feel like that was our intent all along, but we never got around to implementing it after moving the projects into the new namespace
17:55 <fungi> i'm happy to work on solving that in the near future unless there were obvious reasons we avoided that and i've merely forgotten them
17:58 <clarkb> no objections from me
18:00 <fungi> it also came to my attention that when we moved tarballs subdirs for other projects to the new namespace, we missed moving git-review (possibly others, i haven't looked closely, we did move bindep though)
18:00 <fungi> that's probably on me, planning to fix that as well
18:29 *** dviroel is now known as dviroel|afk
18:54 <clarkb> jpic: hrw has a similar problem in reverse. I'm under a mountain of paperwork the next few days but can hopefully look at it next week.
18:54 <clarkb> jpic: I think if you can log in with your old ubuntu one openid and use the old account that would work without any intervention from us. However, if that isn't possible we can retire the old account and remove the conflicting attributes, allowing a new account to be created
18:54 <fungi> i am in the same boat, but happy to help once no longer swamped
18:55 <clarkb> jpic: Another option is for you to create a new account with a different email today, then in the future (early next week?) we can retire the old account and you can add the preferred email to the new account
18:55 *** artom__ is now known as artom
20:29 *** rlandy is now known as rlandy|ruck
21:10 *** rlandy|ruck is now known as rlandy|ruck|biab
21:18 <timburke__> i've been seeing a fair number of timeouts lately; seems like it's usually around https://github.com/openstack/swift/blob/master/test/unit/common/test_utils.py#L5565-L5615 -- how easy would it be for me to run something like https://github.com/tipabu/python-stack-xray/blob/master/python-stack-xray while the test is hung, so i can mock out whatever part of the socket library seems to be blocking?
21:18 <timburke__> fwiw, https://zuul.openstack.org/stream/64cb0246c8554f05a8559db731b78563?logfile=console.log seems to be currently in that problematic state
21:19 <clarkb> timburke__: give me a few minutes and I can take a look at it
21:19 <timburke__> thanks!
21:19 *** timburke__ is now known as timburke
21:19 <fungi> is the hang reproducible? i'm guessing it's unpredictable based on your description
21:21 <fungi> one tactic we've used is to propose a dnm change which adds a bunch of copies of the problem job (removing all the other jobs), and instrument the data collection tool in that change or a parent change
21:21 <fungi> that way you don't spend all day rechecking in hopes of tripping the right conditions
21:21 <timburke> unpredictable -- eventually, a recheck will get me going, but it's a pain on both sides to keep doing that
21:21 <clarkb> timburke: it looks like that tool expects to be run from outside of the process
21:22 <clarkb> which makes this more complicated since you really only get the one process via the ansible exec
21:22 <clarkb> what we can do is hold the node and you can get on it though
21:22 <clarkb> let me work on doing that
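One in-process alternative worth noting here (a sketch only, not what was actually done in this case): the standard library's faulthandler module can make a hung test process dump its own thread stacks, which sidesteps the need to attach from outside. The deadline and signal choice below are illustrative assumptions:

    # Sketch: have a possibly-hung test process dump its own stacks.
    import faulthandler
    import signal
    import sys

    # Dump every thread's stack to stderr if we are still running after
    # 15 minutes (an arbitrary illustrative deadline), without exiting.
    faulthandler.dump_traceback_later(15 * 60, exit=False, file=sys.stderr)

    # Also allow an on-demand dump from a shell on the held node,
    # e.g. kill -USR1 <pid>.
    faulthandler.register(signal.SIGUSR1, all_threads=True)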
21:23 <fungi> does it hang ~forever? if so then, yeah, an autohold would probably get you what you need
21:23 <timburke> clarkb, yes -- it's something i've used before when debugging customer issues. ssh onto a box, copy in the script, find the process you want to trace, and run against it. if anybody has better ideas, i'm all ears :-)
21:23 <fungi> job fails with a timeout, zuul holds the node, you log in and inspect the hung process
21:24 <timburke> fungi, yes, afaict -- the job either completes in <20min, or times out at 90min
21:24 <fungi> if it just runs too long and tanks the job but vanishes before you can log in to take a look then that gets a lot harder
21:24 <clarkb> timburke: do you have your ssh pubkey? I've put a hold in and if I get that now you can ssh in now and run your tool
21:25 <clarkb> you may want to try and get in now while it is running in case the job cancellation kills the running process. If that happens you might have to rerun tox and try to reproduce on the host
21:25 <timburke> https://paste.opendev.org/raw/811956/
21:25 <clarkb> but if we move quickly maybe you can just get what you need right now :)
21:25 <clarkb> timburke: ssh root@198.72.124.64
21:25 <fungi> worst case do the many-jobs dnm change and bump the timeout to the maximum, then find one which exhibits the issue and that may buy longer to troubleshoot
21:26 <clarkb> the node is held too so when the job ends you can keep poking at it. Let us know when you are done and we'll delete it
21:26 <clarkb> (done might mean "this host is no longer useful for debugging but we haven't figured it out yet" too)
21:31 <timburke> got it! thanks. seems like it may actually be something down in logging :-/ but i've got enough to go digging some more
21:31 <timburke> (https://paste.opendev.org/show/811957/ for the curious)
21:34 <fungi> that's a very neat tool, didn't know about it before now
21:34 <timburke> torgomatic (who doesn't work on swift anymore, unfortunately for us) wrote it a while ago and i've found it *incredibly* useful
21:35 <fungi> we've implemented https://pypi.org/project/yappi in zuul and nodepool to similar ends, and it's been indispensable
21:35 <clarkb> timburke: zuul has similar tooling built into it. It accepts sigusr2 to toggle on a profiler and print stack traces if you've got the right stuff installed.
21:36 <clarkb> then you let it run in slow mode with the profiler on for a few minutes, hit it with sigusr2 again and you get the stack traces and profiling data
21:36 <fungi> yeah, the signal handler does thread dumps, and we've got it set to also toggle yappi's profiler
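The pattern being described is roughly the following (a simplified sketch of the idea, not Zuul's actual handler code):

    # Sketch: a SIGUSR2 handler that dumps thread stacks and toggles yappi.
    import signal
    import sys
    import traceback

    import yappi

    def toggle_debug(signum, frame):
        # Dump a stack trace for every thread on each signal.
        for thread_id, stack in sys._current_frames().items():
            print('Thread %s:' % thread_id, file=sys.stderr)
            traceback.print_stack(stack, file=sys.stderr)
        # First signal starts the profiler; the next one dumps its stats.
        if yappi.is_running():
            yappi.stop()
            yappi.get_func_stats().print_all(out=sys.stderr)
            yappi.clear_stats()
        else:
            yappi.start()

    signal.signal(signal.SIGUSR2, toggle_debug)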
21:36 <timburke> i'm sure other people have written similar things. his trick to writing both bash and python in the same file is pretty clever -- if it were for anything more substantial than a debugging tool like this, it'd never pass review ;-)
21:38 <fungi> that's definitely an amusing hack for abusing string quoting semantics in the two languages
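For the curious, one common shape of that shell/Python polyglot (not necessarily the exact form python-stack-xray uses) is a file whose opening lines are valid in both languages: sh sees an exec line, while Python sees a harmless string expression:

    #!/bin/sh
    # sh: quote removal makes the next line `exec python3 <this file> <args>`.
    # Python: the same line is adjacent string literals, i.e. a no-op.
    "exec" "python3" "$0" "$@"

    # From here on, ordinary Python.
    import sys
    print("re-executed under", sys.executable)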
22:22 *** rlandy|ruck|biab is now known as rlandy|ruck
