Thursday, 2022-09-22

00:03 <corvus> fungi: are you around to approve https://review.opendev.org/858812 and https://review.opendev.org/858813 ?
00:06 <fungi> sure, just a sec
00:08 <fungi> corvus: both lgtm, i've approved them
00:09 <opendevreview> Merged opendev/zone-opendev.org master: Add tracing server to DNS  https://review.opendev.org/c/opendev/zone-opendev.org/+/858812
00:22 <opendevreview> Merged opendev/system-config master: Add tracing server to inventory  https://review.opendev.org/c/opendev/system-config/+/858813
00:23 <corvus> thanks!
01:31 *** ysandeep|out is now known as ysandeep
01:59 *** rlandy|bbl is now known as rlandy|out
04:31 *** ysandeep is now known as ysandeep|afk
04:51 *** dasm is now known as dasm|off
07:19 *** jpena|off is now known as jpena
07:28 <opendevreview> Rafal Lewandowski proposed openstack/diskimage-builder master: Added cloud-init growpart element  https://review.opendev.org/c/openstack/diskimage-builder/+/855856
07:52 <opendevreview> Rafal Lewandowski proposed openstack/diskimage-builder master: Added cloud-init growpart element  https://review.opendev.org/c/openstack/diskimage-builder/+/855856
08:02 <dpawlik> is everything fine with the centos 9 stream image: https://nb01.opendev.org/images/ ?
08:15 *** ysandeep|afk is now known as ysandeep
08:39 *** akahat|ruck is now known as akahat|ruck|lunch
08:52 *** ysandeep is now known as ysandeep|lunch
09:31 *** akahat|ruck|lunch is now known as akahat|ruck
10:01 *** ysandeep|lunch is now known as ysandeep
10:29 *** rlandy|out is now known as rlandy|rover
10:38 <hrw> morning
10:38 <hrw> can someone tell me which repo keeps the https://zuul.openstack.org/status code? I would like to move 'check-arm64' to be right after 'check'
10:38 <hrw> (now I do that with a tampermonkey script)
10:47 <frickler> hrw: that should be part of zuul code, opendev.org/zuul/zuul. but IIUC the ordering is dynamic anyway, trying to make efficient use of the three columns
10:48 *** ysandeep is now known as ysandeep|afk
10:50 <frickler> https://opendev.org/zuul/zuul/src/branch/master/web/src/pages/Status.jsx likely
11:01 <hrw> thanks
11:01 *** ysandeep|afk is now known as ysandeep
11:11 *** frenzyfriday is now known as frenzyfriday|food
11:36 *** ysandeep is now known as ysandeep|afk
12:00 *** frenzyfriday|food is now known as frenzyfriday
12:11 <opendevreview> Jeremy Stanley proposed opendev/system-config master: Re-expose our Mailman archives.yaml and robots.txt  https://review.opendev.org/c/opendev/system-config/+/858913
12:22 <opendevreview> Jeremy Stanley proposed opendev/system-config master: Re-expose our Mailman archives.yaml and robots.txt  https://review.opendev.org/c/opendev/system-config/+/858913
12:35 *** ysandeep|afk is now known as ysandeep
12:51 *** Tengu is now known as Guest1145
12:51 *** Tengu_ is now known as Tengu
13:05 <Clark[m]> dpawlik: as far as I know everything is fine. If you are wondering why no images are listed for stream 9 on nb01, that is because nb02 has built both of the images for that distro. If you change the URL to nb02 you will see them there.
13:31 <dpawlik> Clark[m]: hey, thanks. That's what I was looking for
13:37 *** ysandeep is now known as ysandeep|out
14:17 *** dasm|off is now known as dasm
15:14 <clarkb> noonedeadpunk: /usr is still mounted read-only in the zuul test envs and the zuul ansible venvs live in /usr
15:15 <clarkb> anyway I don't think the openstack collection or sdk or ansible version conflicts are a thing there. Some of those things are installed together but if you need a specific version you're using nested ansible
15:24 <noonedeadpunk> well yes, I do agree that in the jobs we have it's unlikely that any real tests are run with native ansible
15:24 <noonedeadpunk> It's indeed more used to launch tox
15:24 <clarkb> right, I just want to be clear that I don't think any of those concerns apply
15:25 <clarkb> this is purely for executing the job and it isn't a broad enough use case to worry about those extra details
15:25 <clarkb> I'm trying to explain this to tripleo right now in the change I pushed to them so that it can land before zuul is upgraded
15:27 <noonedeadpunk> Well, I personally find native ansible quite good for performing some simple jobs/checks without asking for a node and wasting extra resources. But I do agree that in the current use case it's likely not applicable indeed
15:28 <noonedeadpunk> Though I don't have a good overview of the whole zoo of jobs we might have
15:28 <clarkb> right, I'm asking that we stop complicating the situation so that we can address the immediate problem
15:28 <clarkb> if you want to optimize jobs that is an entirely separate discussion
15:28 <noonedeadpunk> +1
15:29 <clarkb> (and one worth having, I've been optimizing some job tasks recently and have managed to save minutes in each devstack job and more in every multinode job)
15:29 *** marios is now known as marios|out
15:42 <corvus> a jaeger has appeared overnight: https://tracing.opendev.org/
15:44 <clarkb> corvus: we'll need to configure zuul to point at that server once tracing support in zuul lands, right?
15:45 <corvus> clarkb: yep... we might be able to configure zuul for that before tracing lands actually...
15:45 <clarkb> noonedeadpunk: side note, part of the optimization I've done is to stop doing work in ansible native tasks as they are horribly slow, particularly in large loops
15:45 <corvus> (since zuul ignores unknown config sections)
15:47 <noonedeadpunk> clarkb: well, I think the biggest loop is preparing related projects? At least that's true for OSA
15:48 <clarkb> noonedeadpunk: one of the biggest issues was normalizing devstack log files. I fixed that via an ansible module. Other cases are multinode ssh host key and /etc/hosts setup.
15:48 <clarkb> I'm sure there are more we can find if we go looking.
15:51 <fungi> preparing related projects could certainly fall into that category if a job has tons of them (which i expect is often true for deployment projects like osa)
15:51 <noonedeadpunk> clarkb: for us it takes like 12 mins per job https://zuul.opendev.org/t/openstack/build/602ef6a3218f474b846f9a57d97efa8d/log/job-output.txt#207-5482
15:51 <fungi> i guess the question is whether that's ansible overhead or git overhead. we can probably do more about the former than the latter
15:52 <noonedeadpunk> We have an ansible module for parallel git clone which is quite fast, but it would need to be adapted for this specific task
15:52 <clarkb> if each task is taking between 1-3 seconds that seems to be the minimum for ansible tasks
15:52 <noonedeadpunk> https://opendev.org/openstack/openstack-ansible-plugins/src/branch/master/plugins/modules/git_requirements
15:52 <clarkb> if they are taking longer than that then we aren't hitting the ansible lower bound on task runtime
15:54 <clarkb> looks like the initial find loop is probably hitting the limit, and then some of the repo setup tasks are too (that probably depends on the size of the repo)
15:54 <noonedeadpunk> well, using a module instead should save quite some time?
15:54 <noonedeadpunk> As you don't loop ansible tasks, but do this in threads in the module
15:54 <clarkb> yes, rewriting things into modules saves time because you can iterate your loops within a single ansible task
15:55 <clarkb> even if you don't parallelize things, just removing the ansible task startup overhead can be a huge win
15:55 <clarkb> (assuming again that your tasks are limited by that and not something else, these tasks seem to be a mix)
15:55 <noonedeadpunk> Just on git clone from a remote, using this module vs ansible's native git gives like a 7-10x speedup
15:56 <clarkb> unfortunately this is a core bit of zuul-jobs that most jobs in the wild will be using, so modifying it is a lot more difficult than, say, updating devstack log file normalization
15:56 <clarkb> but still doable if people are interested
15:58 <clarkb> (I'm impressed that our local caches seem to work well enough that we hit the task startup bounds here)
15:58 <fungi> it's not necessarily the modifying it that's difficult, but making sure it has solid test coverage against regressions in behavior and coordinating/announcing the replacement
15:58 <fungi> more that it's a sensitive piece to modify
15:59 <fungi> implying a lot of potential risk for much of zuul's user base
15:59 <fungi> so needs a lot of care
15:59 <clarkb> yup, and we can probably do it bit by bit in that role too
16:00 <clarkb> there are a number of steps that run loops and we don't have to replace everything all at once to see improvement
16:13 *** jpena is now known as jpena|off
16:29 <clarkb> oh, another side note. Ansible 5 broke pipelining when configured for specific connection types. In theory this means that when we move to ansible 6 we'll get a speedup in jobs
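[For context, the pipelining clarkb refers to is the standard Ansible SSH setting that sends modules over the existing SSH session instead of copying them to the host first. A minimal sketch of how it is typically enabled in ansible.cfg; this is the generic Ansible knob, not necessarily Zuul's exact configuration:]

```ini
# ansible.cfg -- pipelining cuts one SSH round-trip per task, which adds
# up quickly in loop-heavy playbooks; it requires that requiretty is not
# enforced in sudoers on the managed hosts
[ssh_connection]
pipelining = True
```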
16:30 <clarkb> https://review.opendev.org/c/openstack/devstack/+/858436 is my ansible 6 devstack test change, let me see if the data there shows any improvement
16:32 <clarkb> 4.5 minutes faster between https://zuul.opendev.org/t/openstack/build/e6b8dded3c7f4ebf98f871acba2a3bab/log/job-output.txt and https://zuul.opendev.org/t/openstack/build/5b72dcf1c6ac4064945fe43389940f0a/log/job-output.txt on the same cloud provider in the tempest-full-py3 job. Only a single comparison but at least we haven't disproven the idea yet
16:36 <clarkb> https://zuul.opendev.org/t/openstack/build/35e5242ca27d4593b63e9ceebd276d63/log/job-output.txt vs https://zuul.opendev.org/t/openstack/build/d28f3d215537449d913fcde67f6d133c/log/job-output.txt shows us slower under ansible 6 for that job. I think if this helps, the improvement is small enough to be subject to the variance within a single cloud provider
16:42 <johnsom> Hi everyone, it appears the promote pipeline failed on a releasenotes job in the upload-afs-roots task. Is there a way to re-run this without making a silly patch?  https://review.opendev.org/c/openstack/designate/+/858022
16:42 <johnsom> Our Zed release notes page is 404 and causing the links to not be in the releases patch.
16:47 <clarkb> just as a quick sanity check, https://grafana.opendev.org/d/9871b26303/afs?orgId=1 shows we've got disk space and quota available for both the afs disk and volumes
16:47 <clarkb> https://zuul.opendev.org/t/openstack/build/a3ce29b5c9294c3e98f368cdd9d3e4ec/log/job-output.txt#152 is where the job failed
16:48 <clarkb> johnsom: we can reenqueue the buildset, which will also repromote the docs. Is that an issue?
16:48 <johnsom> Should not be a problem
16:49 <opendevreview> Jeremy Stanley proposed opendev/system-config master: Re-expose our Mailman archives.yaml and robots.txt  https://review.opendev.org/c/opendev/system-config/+/858913
16:50 <fungi> hopefully that's correct now. yay for tests!
16:52 <clarkb> fungi: do we not need a directory with require all granted for those?
16:52 <clarkb> or maybe a file?
16:53 <clarkb> your tests seem to indicate this is not necessary, which surprises me. Maybe alias implies that?
16:53 <fungi> apparently not (i tried it briefly on lists.o.o), but the tests should confirm
16:54 <fungi> and yes, i too was surprised
16:55 <clarkb> noonedeadpunk: looking at that workspace setup role, replacing the first loop should be straightforward with a module. I thought maybe the find module could do it instead, but I think it will be too greedy and won't be accurate enough. But a simple module that checks if the path exists and returns true or false for that is an easy improvement to shave off some time
16:55 <clarkb> then the next bit can be tackled and so on
16:56 <clarkb> johnsom: have any changes landed since that one?
16:56 <clarkb> johnsom: that change landed about a week ago and I think redoing docs promotion will overwrite anything more recent
16:56 <fungi> unlikely, or else the promote re-run would be unneeded
16:56 <clarkb> fungi: well, unless the docs promote can run and the releasenotes job doesn't
16:56 <clarkb> (I'm not sure about that)
16:57 <noonedeadpunk> clark b: well, that's already somewhere on my to-do list, which is going quite slowly to be honest :(
16:57 <johnsom> clarkb: Stuff has landed, but nothing with release notes or docs
16:57 <clarkb> ok, I'll see about reenqueuing that now
16:57 <clarkb> noonedeadpunk: ya, I might take a look at that later today
16:58 <clarkb> one idea I had for ansible loops was a meta module that would call another module but keep the looping internal. But it seems you can't really do that without creating new tasks, and that would negate any benefit
17:02 <clarkb> fungi: does `zuul-client enqueue --tenant openstack --pipeline promote --project opendev.org/openstack/designate --change 858022,1` look correct to you?
17:10 <clarkb> noonedeadpunk: actually I think a better approach is to do everything in a single shell script as the next incremental step. We're trying to encode logic in ansible to avoid ansible errors, but we can do all that in a shell script and then loop once
17:11 <clarkb> Then a better approach to that would be to rewrite what that single shell script does into an ansible module that can do the looping internally, to cut out the loop overhead entirely
17:11 <clarkb> but I think doing it step by step is good, as reviewing the condensing to a single shell script should be easier and gives us a fallback point if we get the module wrong and need to revert
17:12 <clarkb> johnsom: I went ahead and ran that command
17:13 <johnsom> Thanks!
17:13 <clarkb> johnsom: both jobs are running now
17:13 <corvus> keep in mind that doing things in ansible gets us structured result data that's easier to display in the web ui -- so it's a balance! :)
17:13 <noonedeadpunk> But you can always create block/rescue in case the module fails and place the old code in rescue?
17:14 <noonedeadpunk> eventually the good thing about a module is that you can leverage multiprocessing there compared to a bash script. I'm not sure how wise it is to run that for native ansible though, load-wise
17:14 <johnsom> clarkb: Fixed. Thanks again
17:14 <corvus> (i know clarkb knows this, i mostly just wanted to caution anyone reading "do everything in a single shell script" that we don't really mean *everything* :)
17:15 <corvus> (more like everything in a certain logical task)
17:17 <clarkb> right, in this case "check if the git repo cache dir exists, if so clone it to the workspace else git init, then remove the origin remote" is a logical task of preparing the workspace that was ~4 loops. It can be one
17:17 <clarkb> then a further improvement would be to move all of that into a module, to remove the ansible task loop entirely and put that looping into the module
17:17 <clarkb> but I think iterative improvements here are a good thing
17:17 <corvus> ++
17:17 <fungi> clarkb: yeah, that invocation looks right to me
17:18 <fungi> sorry, was distracted briefly by my weapons of grass destruction
17:20 <fungi> and now i'm headed back into the fray
17:25 <opendevreview> Clark Boylan proposed zuul/zuul-jobs master: Reduce the number of loops in prepare-workspace-git  https://review.opendev.org/c/zuul/zuul-jobs/+/858961
17:25 <clarkb> corvus: noonedeadpunk: ^ something like that. I think that is a good simple step
17:27 <noonedeadpunk> oh, for some reason I thought that the ansible git module was being used there, hehe
17:27 <noonedeadpunk> yeah, it makes way more sense this way
17:32 <noonedeadpunk> yeah, it's really way better this way
17:37 <opendevreview> Clark Boylan proposed zuul/zuul-jobs master: Reduce the number of loops in prepare-workspace-git  https://review.opendev.org/c/zuul/zuul-jobs/+/858961
17:48 <opendevreview> Clark Boylan proposed zuul/zuul-jobs master: Reduce the number of loops in prepare-workspace-git  https://review.opendev.org/c/zuul/zuul-jobs/+/858961
17:48 <opendevreview> Clark Boylan proposed zuul/zuul-jobs master: Test speedup change to prepare-workspace-git  https://review.opendev.org/c/zuul/zuul-jobs/+/858963
17:49 <opendevreview> Clark Boylan proposed opendev/base-jobs master: Use test-prepare-workspace-git in base-test  https://review.opendev.org/c/opendev/base-jobs/+/858964
17:50 <fungi> https://zuul.opendev.org/t/openstack/build/7fdb7f42f96f4ec6b91737d01c721524/log/lists.openstack.org/apache2/lists.opendev.org-ssl-access.log#2
17:50 <fungi> i think the job actually needs to fire the script the cronjob would normally run, or at least stick a replacement file there
17:51 <clarkb> you can trigger the script with a creates directive on the file paths it creates
17:51 <clarkb> then it would fire in CI but not run in prod
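[The `creates` pattern clarkb mentions is standard Ansible: a command task is skipped whenever the named file already exists, so a fresh CI node generates the file once while production, where the daily cron keeps it current, becomes a no-op. A hypothetical sketch; the script and output paths are illustrative, not the real system-config ones:]

```yaml
- name: Generate archives.yaml if it does not exist yet
  command: /usr/local/bin/generate-archives  # illustrative path
  args:
    # task is skipped when this file is already present
    creates: /var/www/lists/archives.yaml
```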
17:52 <fungi> probably easiest if we just create a temporary file there when deploying the server, and check with testinfra that it gets served?
17:52 <clarkb> ya, that should work too
17:52 <fungi> that way we don't have to block for the script to complete (though i suppose ansible would do that too)
17:52 <clarkb> you'll want to make that test specific then (the idea I had was so that we didn't need a separate playbook, but we may already have one)
17:56 <opendevreview> Clark Boylan proposed zuul/zuul-jobs master: Test speedup change to prepare-workspace-git  https://review.opendev.org/c/zuul/zuul-jobs/+/858963
17:56 <opendevreview> Clark Boylan proposed zuul/zuul-jobs master: Reduce the number of loops in prepare-workspace-git  https://review.opendev.org/c/zuul/zuul-jobs/+/858961
18:02 <opendevreview> Jeremy Stanley proposed opendev/system-config master: Re-expose our Mailman archives.yaml and robots.txt  https://review.opendev.org/c/opendev/system-config/+/858913
18:02 <fungi> clarkb: how about that ^ approach?
18:02 <fungi> that also ensures the list is created on new servers in advance of the daily cron firing
18:03 <fungi> oh, i guess i need become: true
18:04 <fungi> or do i? we run those playbooks as root already, right?
18:04 <clarkb> yes, they are already root
18:04 <clarkb> I think that looks correct
18:04 <fungi> cool, then i guess we'll see what folks (including zuul) think about that
18:10 <opendevreview> Clark Boylan proposed zuul/zuul-jobs master: Test speedup change to prepare-workspace-git  https://review.opendev.org/c/zuul/zuul-jobs/+/858963
18:10 <opendevreview> Clark Boylan proposed zuul/zuul-jobs master: Reduce the number of loops in prepare-workspace-git  https://review.opendev.org/c/zuul/zuul-jobs/+/858961
20:10 <opendevreview> Jeremy Stanley proposed opendev/system-config master: Re-expose our Mailman archives.yaml and robots.txt  https://review.opendev.org/c/opendev/system-config/+/858913
20:10 * fungi sighs at his inability to indent correctly
21:00 <opendevreview> Merged zuul/zuul-jobs master: Test speedup change to prepare-workspace-git  https://review.opendev.org/c/zuul/zuul-jobs/+/858963
21:12 <clarkb> fungi: do you have time to double check that I've modified the test playbook in https://review.opendev.org/c/opendev/base-jobs/+/858964 and won't be affecting production?
21:33 *** rlandy|rover is now known as rlandy|rover|biab
21:37 <opendevreview> James E. Blair proposed opendev/system-config master: Make zk-ca role more generic  https://review.opendev.org/c/opendev/system-config/+/858988
21:48 <opendevreview> James E. Blair proposed opendev/system-config master: Export Zuul traces to Jaeger  https://review.opendev.org/c/opendev/system-config/+/858989
21:49 <corvus> clarkb: fungi: ^ as i was writing the second change, I realized the flaw in using zk-ca for jaeger, so the first change above makes a new ca, and the second change sets up tracing for zuul.
21:55 <clarkb> corvus: yup, dropped some comments. Seems like a good improvement
21:56 <fungi> clarkb: sorry, got sidetracked by dinner, but looking now
21:56 <clarkb> I find dinner distracting too :)
21:57 <fungi> it was bibimbap, which is a lot of work, so extra distracting
21:59 <clarkb> once the base-test job updates I'm going to see if I can reparent an OSA job to it, since they were the original example that kicked this off
22:03 <corvus> clarkb: yeah, i left the filename alone because it's the "zk-ca.sh" script from zuul...  i guess we could change it in that copy?  would you prefer that?
22:04 <clarkb> corvus: I'm on the fence. I do think it may create confusion later as we use it for other things
22:04 <clarkb> corvus: maybe leave a note that it originated from there and update the filename?
22:04 <corvus> ok
22:06 <opendevreview> Merged opendev/base-jobs master: Use test-prepare-workspace-git in base-test  https://review.opendev.org/c/opendev/base-jobs/+/858964
22:06 <opendevreview> James E. Blair proposed opendev/system-config master: Make zk-ca role more generic  https://review.opendev.org/c/opendev/system-config/+/858988
22:06 <opendevreview> James E. Blair proposed opendev/system-config master: Export Zuul traces to Jaeger  https://review.opendev.org/c/opendev/system-config/+/858989
22:10 <clarkb> remote:   https://review.opendev.org/c/openstack/openstack-ansible/+/858992 is testing the prepare-workspace-git changes
22:20 <clarkb> it took just under 3 minutes to run through that loop on the openstack-ansible-deploy-aio_metal-debian-bullseye build for ^ and it took 6.5-ish minutes in yoctozepto's example
22:20 <clarkb> assuming it actually works, that is a decent time saving
22:26 <clarkb> ya, I think it got it down to 8 minutes or so. A definite improvement, but could still be a lot better
22:33 *** dasm is now known as dasm|off
22:58 <clarkb> yoctozepto is probably sleeping, but when your Friday starts can you check the error at https://zuul.opendev.org/t/openstack/build/c27b758d63224e1bbe5a8c43bce19802/log/job-output.txt#10583 I want to make sure that isn't fallout from my change to prepare-workspace-git. It doesn't look like openstacksdk is one of the repos managed by the zuul job and we don't set it up. So that must be coming from somewhere else
23:00 <clarkb> Looks like https://review.opendev.org/c/openstack/openstack-ansible/+/858981 may be suffering a similar fate.
23:00 <clarkb> I'm going to run devstack under base-test too to up my coverage
23:01 <clarkb> oh hrm, devstack parents to multinode so that is tricky
23:03 <opendevreview> Clark Boylan proposed zuul/zuul-jobs master: DNM reparenting multinode to base-test to test devstack  https://review.opendev.org/c/zuul/zuul-jobs/+/858995
23:05 <clarkb> https://review.opendev.org/c/openstack/devstack/+/858996 depends on ^ which should give us that coverage
23:38 <corvus> fungi: if you have a minute for https://review.opendev.org/858988 and https://review.opendev.org/858989 that should let us start using the new tracing server
23:42 <fungi> i may even have two minutes
23:44 <clarkb> ok, the devstack jobs are passing. I'm pretty sure the osa issue is independent of my change. The last thing to check is probably that we still set the zuul git state properly, but that is actually done by the other role. I think this is probably safe to merge into production if other people want to review it (and probably better to land tomorrow and not today as I'm about to have dinner)
23:46 <fungi> corvus: those lgtm. trade you for 858913?
23:46 * fungi brokers deals for code review
23:48 <clarkb> also it doesn't improve the speed of devstack tremendously, because the repos devstack deals with are much larger and the cloning does dominate
23:48 <clarkb> but in osa it is a definite improvement
23:49 <corvus> fungi: lgtm thanks!
23:51 <fungi> thanks!
23:57 <opendevreview> James E. Blair proposed zuul/zuul-jobs master: WIP: Fix CORS and endpoint in AWS log upload  https://review.opendev.org/c/zuul/zuul-jobs/+/858726

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!