Wednesday, 2022-08-31

04:00 *** dasm|off is now known as Guest1645
07:33 *** jpena|off is now known as jpena
11:23 *** dviroel|out is now known as dviroel
13:50 *** Guest1645 is now known as dasm
14:41 <slaweq> hi infra team, I have a question about POST_FAILURES in jobs
14:41 <slaweq> we noticed that in some neutron jobs, like functional or fullstack, we get POST_FAILURE results at least a few times a week
14:42 <slaweq> but when I look into the job-output.txt file, I don't really see any errors
14:42 <slaweq> like e.g. https://c228ca193be60f87086b-704d2e5cde896695f5c12544f01f1d12.ssl.cf5.rackcdn.com/840420/30/check/neutron-functional-with-uwsgi/3638734/job-output.txt
14:42 <slaweq> so I'm not sure what is causing the POST_FAILURE there
14:42 <slaweq> all of the failures I checked were in OVH GRA1
14:43 <slaweq> do you know what the error was there?
14:43 <slaweq> and do you already know about an issue like that?
15:02 <frickler> slaweq: that looks like a failure during log upload, although most of the logs seem to be in place
15:02 <clarkb> slaweq: I find it helps to start from the zuul build page rather than the logs; there is much more info that way. https://zuul.opendev.org/t/openstack/build/363873433e4d4f1ab66f6b2e97fb9429 for your example
15:03 <frickler> this directory seems to be very large, not sure if that's always the case: https://c228ca193be60f87086b-704d2e5cde896695f5c12544f01f1d12.ssl.cf5.rackcdn.com/840420/30/check/neutron-functional-with-uwsgi/3638734/controller/logs/dsvm-functional-logs/index.html
15:03 <clarkb> frickler: slaweq: note that large numbers of files also seem to slow down swift uploads
15:03 <frickler> not necessarily related to the failure, but maybe worth improving, like just uploading a tgz of it?
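A minimal sketch of frickler's tgz suggestion, written as an Ansible post-run task on the test node. The path and the devstack_base_dir variable are assumptions inferred from the log URL above, not the actual neutron job configuration:

    # Hypothetical post-run task: collapse the large dsvm-functional-logs
    # tree into a single archive before the log upload runs.
    - name: Archive functional test logs into one tarball
      become: true
      community.general.archive:
        path: "{{ devstack_base_dir }}/logs/dsvm-functional-logs"
        dest: "{{ devstack_base_dir }}/logs/dsvm-functional-logs.tar.gz"
        format: gz
        remove: true  # drop the source tree so thousands of small files are not uploaded

This trades per-file browsable logs for a single download, which is the tradeoff slaweq accepts later in the discussion.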
15:04 <clarkb> that dir could be problematic for multiple reasons
15:05 <clarkb> anyway, I like the zuul build page because you can check https://zuul.opendev.org/t/openstack/build/363873433e4d4f1ab66f6b2e97fb9429/console, but I agree with frickler that this must be an issue with the log upload itself, which doesn't show up there (because it is a chicken-and-egg problem with the uploads)
15:05 <clarkb> we can check ze03's executor log though
15:08 <clarkb> slaweq: [build: 363873433e4d4f1ab66f6b2e97fb9429] Ansible complete, result RESULT_TIMED_OUT code None
15:08 <clarkb> that was for the playbook that fetches the devstack logs
15:09 <clarkb> this is the step that copies from the test node to the executor, not the step that copies from the executor to swift
15:12 <clarkb> https://zuul.opendev.org/t/openstack/build/363873433e4d4f1ab66f6b2e97fb9429/log/job-output.txt#18905 the logs actually capture that. I'm surprised the console log doesn't show it, but I guess since the timeout operates above ansible, the ansible console log has a hard time showing it
15:13 <clarkb> https://zuul.opendev.org/t/openstack/build/363873433e4d4f1ab66f6b2e97fb9429/log/job-output.txt#15596 that task seems to be consuming a significant amount of time
15:14 <clarkb> https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/stage-output/tasks/main.yaml#L103 that is doing a mv of filenames that should be on the same fs based on the path (it only changes the file suffix, not the directory)
15:15 <clarkb> but it takes about 3 seconds per file
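For context, the rename step clarkb points at has roughly this shape (a paraphrase of the zuul-jobs stage-output role, not the exact code). Each file is a separate Ansible task execution, so every rename pays the full per-task round trip; logs_to_rename stands in for whatever variable the real role loops over:

    # Paraphrased: one task execution per discovered file, changing only
    # the extension within the same directory (hence a cheap same-fs mv).
    - name: Rename log files to their staged extension
      command: "mv {{ item.path }} {{ (item.path | splitext)[0] }}.txt"
      loop: "{{ logs_to_rename.files }}"  # assumed result of an earlier find task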
15:15 <clarkb> I wonder if the slowness is ansible or the host
15:21 <clarkb> if you look at that log, the task that checks sudo https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/stage-output/tasks/main.yaml#L13 also takes about 3 seconds
15:21 <clarkb> I think either ansible is being slow or the host is being slow, but it doesn't seem specifically related to disk write performance
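If per-task overhead is the real cost, one possible mitigation (a sketch, not a proposed zuul-jobs change) is to batch all the renames into a single shell task, paying the roughly 3 second overhead once rather than once per file; stage_dir is an assumed variable naming the staged directory:

    # Sketch: one task execution total, with the per-file loop in shell.
    - name: Rename all staged logs in a single task
      shell: |
        set -e
        for f in *.log; do
          [ -e "$f" ] || continue  # nothing matched the glob
          mv -- "$f" "${f%.log}.txt"
        done
      args:
        chdir: "{{ stage_dir }}"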
15:23 <clarkb> looking at the dstat log for that job, I don't see anything that stands out to me (cpu utilization seems fine, plenty of memory, not swapping a ton, etc)
15:25 *** dviroel is now known as dviroel|lunch
15:26 <clarkb> comparing to another job whose logs I've been looking at, https://zuul.opendev.org/t/openstack/build/c9b77addea87426e995d9a0ba0b1784f/log/job-output.txt#21348, the sudo check takes almost a second and a half there. Not fast either, but twice as fast as in this example
16:25 *** dviroel|lunch is now known as dviroel
16:26 <slaweq> clarkb: frickler: thx. I will try to upload those logs as a tar.gz archive, I hope that helps
16:37 <fungi> or, if there are a lot of files being logged unnecessarily, finding ways to not copy them at all could help
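fungi's alternative could look something like the following sketch; the path and the *.debug.log pattern are purely illustrative assumptions about which files might be droppable:

    - name: Find per-test debug logs that are never read post-run
      ansible.builtin.find:
        paths: "{{ devstack_base_dir }}/logs/dsvm-functional-logs"
        patterns: "*.debug.log"  # assumed pattern for droppable files
      register: droppable_logs

    - name: Delete them before the logs are staged and uploaded
      ansible.builtin.file:
        path: "{{ item.path }}"
        state: absent
      loop: "{{ droppable_logs.files }}"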
16:43 *** jpena is now known as jpena|off
19:47 *** dviroel is now known as dviroel|afk
21:40 *** dasm is now known as dasm|off
22:25 *** rlandy is now known as rlandy|bbl
23:12 *** dviroel|afk is now known as dviroel
