Wednesday, 2021-03-24

00:09 *** tosky has quit IRC
01:02 *** piotrowskim has quit IRC
01:45 *** holser has quit IRC
03:33 *** evrardjp has quit IRC
03:33 *** evrardjp has joined #zuul
04:05 *** ykarel has joined #zuul
04:15 *** ykarel has quit IRC
04:15 *** ykarel has joined #zuul
04:42 *** vishalmanchanda has joined #zuul
05:24 *** jfoufas1 has joined #zuul
05:26 *** ajitha has joined #zuul
05:36 *** wuchunyang has joined #zuul
05:56 *** saneax has joined #zuul
06:08 *** zbr|rover has quit IRC
06:10 *** zbr|rover has joined #zuul
07:29 *** parallax has quit IRC
07:46 *** jcapitao has joined #zuul
08:11 <openstackgerrit> Tobias Henkel proposed zuul/zuul master: Route streams to different zones via finger gateway  https://review.opendev.org/c/zuul/zuul/+/664965
08:11 <openstackgerrit> Tobias Henkel proposed zuul/zuul master: Support ssl encrypted fingergw  https://review.opendev.org/c/zuul/zuul/+/664950
08:17 *** ykarel is now known as ykarel|lunch
08:19 *** rpittau|afk is now known as rpittau
08:28 *** hashar has joined #zuul
08:31 <openstackgerrit> Simon Westphahl proposed zuul/zuul master: Add UUID for queue items  https://review.opendev.org/c/zuul/zuul/+/772512
08:31 <openstackgerrit> Simon Westphahl proposed zuul/zuul master: Store semaphore state in Zookeeper  https://review.opendev.org/c/zuul/zuul/+/772513
08:32 *** ricolin has joined #zuul
08:41 *** tosky has joined #zuul
08:46 <openstackgerrit> Simon Westphahl proposed zuul/zuul master: Unify handling of dequeue and enqueue events  https://review.opendev.org/c/zuul/zuul/+/781099
08:46 <openstackgerrit> Simon Westphahl proposed zuul/zuul master: Improve test output by using named queues  https://review.opendev.org/c/zuul/zuul/+/775620
08:46 <openstackgerrit> Simon Westphahl proposed zuul/zuul master: Avoid race when task from queue is in progress  https://review.opendev.org/c/zuul/zuul/+/775621
08:46 <openstackgerrit> Simon Westphahl proposed zuul/zuul master: Implement Zookeeper backed connection event queues  https://review.opendev.org/c/zuul/zuul/+/775622
08:46 <openstackgerrit> Simon Westphahl proposed zuul/zuul master: Dispatch Github webhook events via Zookeeper  https://review.opendev.org/c/zuul/zuul/+/775624
08:46 <openstackgerrit> Simon Westphahl proposed zuul/zuul master: Dispatch Pagure webhook events via Zookeeper  https://review.opendev.org/c/zuul/zuul/+/775623
08:46 <openstackgerrit> Simon Westphahl proposed zuul/zuul master: Dispatch Gitlab webhook events via Zookeeper  https://review.opendev.org/c/zuul/zuul/+/775625
09:00 <openstackgerrit> Sorin Sbârnea proposed zuul/zuul master: Document tox environments  https://review.opendev.org/c/zuul/zuul/+/766460
09:01 *** vishalmanchanda has quit IRC
09:09 *** jpena|off is now known as jpena
09:16 *** holser has joined #zuul
09:16 *** ykarel|lunch is now known as ykarel
09:37 *** nils has joined #zuul
09:42 *** vishalmanchanda has joined #zuul
09:54 *** parallax has joined #zuul
10:03 <openstackgerrit> Tobias Henkel proposed zuul/zuul master: Route streams to different zones via finger gateway  https://review.opendev.org/c/zuul/zuul/+/664965
10:04 <openstackgerrit> Tobias Henkel proposed zuul/zuul master: Support ssl encrypted fingergw  https://review.opendev.org/c/zuul/zuul/+/664950
10:24 *** jangutter_ has joined #zuul
10:27 *** jangutter has quit IRC
10:59 *** wuchunyang has quit IRC
11:06 *** jcapitao is now known as jcapitao_lunch
11:07 *** hashar has quit IRC
11:09 *** tobias-urdin has joined #zuul
11:09 <tobias-urdin> is there any way to make nodepool (or zuul?) wait for cloud-init to complete on a node before using it
11:10 <tobias-urdin> for example, with an image that does not have python installed, and cloud-init (passed through nodepool) that installs it, there is sometimes a race where zuul tries to execute ansible before python is installed on the node
11:11 <tobias-urdin> i'm thinking a pre-run task using the "raw" module and pausing until the python binary is found, but that's kind of hacky
11:11 <tobias-urdin> but maybe it breaks even earlier than that
11:17 <avass> tobias-urdin: you're installing python with cloud init?
11:18 <avass> I think corvus had similar problems where there was a race with host-keys generated by cloud-init
11:19 <avass> tobias-urdin: if you can somehow install python before enabling sshd that would do it I think.
11:34 <tobias-urdin> avass: yeah, nodepool passes userdata that installs python, but sometimes python is not installed in time when zuul starts using the node
11:35 *** rlandy has joined #zuul
11:36 <tobias-urdin> for now i'm just pausing the execution in pre-run, which seems to work; i just need to make sure no tasks that use modules run before that, i guess
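
(For reference, the nodepool setup being described might look roughly like this; a sketch assuming the OpenStack driver, with the label and image names invented:)

    labels:
      - name: minimal-centos            # hypothetical label name
        cloud-image: stock-centos       # hypothetical unmodified cloud image
        python-path: /usr/bin/python3   # interpreter zuul's ansible will use
        userdata: |
          #cloud-config
          packages:
            - python3                   # installed by cloud-init, hence the race
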
11:55 <tristanC> tobias-urdin: maybe you could use a `raw` task to wait for python installation, e.g. https://docs.ansible.com/ansible/latest/collections/ansible/builtin/raw_module.html#examples
12:07 <avass> tristanC, tobias-urdin: doesn't the initial ansible-setup need python too?
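
(Such a raw-based wait might look roughly like this; a sketch in which the interpreter path, retry count, and delay are all assumptions:)

    # must run in a play with gather_facts: false, since fact gathering needs python
    - name: wait for python to appear (sketch; path and retries are assumptions)
      raw: test -e /usr/bin/python3
      register: python_check
      until: python_check.rc == 0
      retries: 30
      delay: 10
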
12:13 *** sshnaidm|off is now known as sshnaidm
12:32 *** jpena is now known as jpena|lunch
12:42 *** jcapitao_lunch is now known as jcapitao
13:04 *** ykarel has quit IRC
13:07 *** ykarel has joined #zuul
13:13 *** ykarel_ has joined #zuul
13:14 *** ykarel has quit IRC
13:50 <mordred> tobias-urdin: I don't suppose you can tell nodepool to build you an image that has python already?
13:51 <mordred> but in general - "cloud-init is complete" is, AIUI, a hard condition to generalize. I'm sure some people would like that if there is a good way to express it and it can be determined via the cloud's api
13:52 <mordred> if we're talking about doing it in nodepool that is. for zuul - yeah - a very early raw task in your base pre-run playbook that waits for python to exist might be a good idea
13:54 <swest> tobias-urdin: would executing 'cloud-init status --wait' work for you?
13:54 <tobias-urdin> mordred: unfortunately the reason it does not have python is that we want to use a non-customized image as much as possible, but i'm sticking to the pre-run trick for now
13:55 <tobias-urdin> swest: didn't know about that, perhaps that is better as a raw task than simply pausing, thanks!
13:55 <mordred> swest: ooh that's a potentially neat trick
13:56 <corvus> ++ sounds like a good role for zuul-jobs if it works :)
13:56 <mordred> ++
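
(The pre-run playbook under discussion might look roughly like this; a sketch assuming cloud-init exists on the image:)

    - hosts: all
      gather_facts: false        # fact gathering would need python on the node
      tasks:
        - name: wait for cloud-init to finish
          raw: cloud-init status --wait
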
13:57 <fungi> tobias-urdin: i'm slightly confused... how does cloud-init run without python?
13:57 <fungi> or has it been rewritten in something other than python?
13:59 <tobias-urdin> fungi: on centos (and similar) it uses platform-python (which is python3 but is /usr/libexec/platform-python or something like that) and not /usr/bin/python3
13:59 <tobias-urdin> pretty much all system tooling uses that, but python is not "installed"
13:59 <fungi> tobias-urdin: aha, got it. so your nodes have python, just not the python you want to use to run your tests
13:59 <tobias-urdin> yeah, so i can use that to run stuff but i can't use that for my applications
14:01 <avass> I thought the "ansible '*' -m setup" needed python, maybe it doesn't
14:01 <mordred> tobias-urdin: you know - for the ansible case, trying /usr/libexec/platform-python might be an interesting experiment.  it would potentially help ansible avoid polluting the images you are otherwise trying to keep as pristine test environments
14:01 <mordred> that way you could actually have jobs that install python as part of the workload - which would be neat
14:03 <tobias-urdin> mordred: yeah :)
14:05 <corvus> avass: i'm curious about that too
14:08 <mordred> avass, corvus: since that's for fact caching, do we just catch the exception and not cache facts if it doesn't work?
14:08 <corvus> mordred: it's actually mostly for ssh connection testing; so we want it to fail if it doesn't work
14:09 <mordred> oh - right
14:09 <mordred> we do in fact want that to work
14:09 <mordred> I wonder if the setup module finds /usr/libexec/platform-python
14:10 <mordred> system_interpreters = ['/usr/libexec/platform-python', '/usr/bin/python3', '/usr/bin/python']
14:10 <mordred> also: https://github.com/ansible/ansible/blob/4c5ce5a1a9e79a845aff4978cfeb72a0d4ecf7d6/lib/ansible/modules/package_facts.py#L242
14:10 *** ykarel_ is now known as ykarel
14:10 <mordred> it looks like ansible is aware of and will attempt to find platform-python
14:11 <corvus> ah nice; that would probably be used for all ansible tasks then, right? (once facts are gathered?)
14:12 <mordred> yeah - except I think we set ansible_python_interpreter somewhere, no?
14:12 <mordred> which I think might stem from older days when ansible was worse at finding the right interpreter?
14:13 <mordred> (like, I'm wondering if setup works because ansible finds platform-python, but then we configure ansible more explicitly which then breaks things)
14:13 <corvus> mordred: i think we set it to auto which is the default now?
14:13 <mordred> I think you're right?
14:13 <mordred> tobias-urdin: ^^ are you setting ansible_python_interpreter somewhere?
14:14 * mordred is learning fascinating things today
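
(The interpreter setting being discussed is an ordinary inventory variable; roughly, as a sketch:)

    # hypothetical inventory/group_vars snippet
    ansible_python_interpreter: auto           # let ansible discover platform-python etc.
    # pinning it instead disables that fallback discovery:
    # ansible_python_interpreter: /usr/bin/python3
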
14:15 <corvus> tobias-urdin: and is your race condition with running ansible or running your code which requires python?  (ie, is ansible breaking, or is a shell task that runs "python something.py" breaking)?
14:24 <tobias-urdin> it's actually pretty messy, i'm setting python-path to /usr/bin/python3 in nodepool, passing userdata in nodepool to install python3
14:25 <tobias-urdin> then the application i'm running is python3, which is running inside a venv, that in itself runs ansible, that sets ansible_python_interpreter to bootstrap the node, which in turn installs python3 (it's already installed), which then runs another playbook with the interpreter set to python3
14:26 <tobias-urdin> i'm basically testing bootstrapping code, that is a python app, that runs ansible, inside zuul that runs it with ansible :p
14:26 <avass> heh :)
14:26 <corvus> hrm, then i wonder how ansible -m setup works
14:27 <avass> corvus: maybe that ignores ansible_python_interpreter or fails and tries to find another interpreter from the defaults
14:27 <corvus> maybe
14:28 *** jpena|lunch is now known as jpena
14:29 <avass> actually I think ansible_python_interpreter might just be missing since that step uses a separate inventory file
14:30 <corvus> avass: oh, that would be interesting
14:30 <avass> I checked and that command fails if ansible_python_interpreter is set to a bad path
14:30 <tobias-urdin> anyway, i added a pre-run playbook to the base job which just does cloud-init status --wait, which seems like the best solution; we always want to wait for cloud-init so that's fine for us (thanks swest!)
14:31 <corvus> tobias-urdin: did you use raw for that?
14:32 <tobias-urdin> yes, straight up "raw: cloud-init status --wait" with gather_facts set to false for the playbook
14:36 <corvus> okay, that's not inconsistent with setup doing something special
14:40 <corvus> i'm doing some tests :)  with absolutely no python on the system, setup module fails
14:45 <corvus> ansible will use platform-python.  setup honors ansible_python_interpreter.  and if it's set, it won't fall back on platform-python.
14:45 <corvus> and from what i can tell, zuul includes ansible_python_interpreter in the setup inventory.
14:45 <corvus> avass, tobias-urdin, mordred: so i don't understand how tobias-urdin's system gets past zuul's invocation of ansible -m setup.
14:46 <corvus> the ansible_python_interpreter that's set via nodepool should cause that to fail before even running the pre-playbook.
14:48 <corvus> (i just confirmed on opendev that our setup-inventory files have ansible_python_interpreter in them)
14:50 <corvus> tobias-urdin: are there any messages in your executor log related to zuul's ansible setup phase?
14:51 <corvus> tobias-urdin: it'll be an ansible command with "-m setup" in the args
14:53 <corvus> tobias-urdin: because i'm stumped as to how this is working for you; i would expect you to need to use the default of 'auto' for python-path for ansible to work reliably, then in a regular pre-run task (no "raw" or anything) either install python, or wait for cloud-init to finish doing it for you.  then proceed with running your app.
14:54 <corvus> mordred, avass, swest: ^
14:54 <avass> yep I agree
14:57 <avass> corvus: oh actually I think I got it
14:59 <avass> corvus, mordred: ansible setup exits with 127 and that's not handled: https://opendev.org/zuul/zuul/src/branch/master/zuul/executor/server.py#L2314
15:00 <avass> oh nvm, I might have gotten the actual exit code and the "rc" response it returns mixed up
15:00 <mordred> wow
15:01 <corvus> exit code is 2 if it's not found
15:01 <corvus> so that's line 2361?
15:01 <avass> yeah I read "rc": 127 as what the exit code would be without actually checking the exit code
15:01 <corvus> avass: that would make sense :)
15:02 <corvus> there's a whole bunch of processing there... i think we may not actually hit any of the cases for exit code 2 that cause it to return
15:03 <corvus> here's my test run: http://paste.openstack.org/show/803869/
15:03 <avass> corvus: but that path only returns a non-RESULT_NORMAL status if this is in the log: https://opendev.org/zuul/zuul/src/branch/master/zuul/executor/server.py#L2370
15:03 <corvus> avass: right, that's what i'm saying
15:03 <avass> corvus: ah I was busy writing :)
15:04 <corvus> avass: so i think you solved it :) apparently we *only* care about network issues at that stage, so we bypass the interpreter error
15:04 <corvus> mordred: ^
15:05 <corvus> it's like, it worked well enough to bomb out, carry on!
15:06 <corvus> tobias-urdin: okay, i think we understand why your sequence works.  :)  the other one might work too if you run into problems.
15:12 <openstackgerrit> James E. Blair proposed zuul/nodepool master: Azure: replace driver with state machine driver  https://review.opendev.org/c/zuul/nodepool/+/781925
15:12 <openstackgerrit> James E. Blair proposed zuul/nodepool master: Azure: update documentation  https://review.opendev.org/c/zuul/nodepool/+/781926
15:20 <avass> corvus: I also found this comment which might be worth taking a look at heh: https://opendev.org/zuul/zuul/src/branch/master/zuul/executor/server.py#L2416 :)
15:21 <corvus> avass: yep, that could probably be handled now
15:21 <mordred> nice
15:22 <corvus> that would let us increase the persistent ssh session timeout, which would improve efficiency on heavily loaded systems, while at the same time closing the persistent ssh connections immediately at the end of jobs
15:36 <tobias-urdin> nice :D
15:43 <corvus> swest, tobiash: i've started a "punch list" for v5: https://etherpad.opendev.org/p/zuulv5
15:44 <corvus> i think so far we've got 2 things we should remember to do; ideally before our need for them is critical :)
15:55 *** jfoufas1 has quit IRC
15:59 <avass> does zookeeper have something like leases in etcd? just wondering if keys could have a lease attached, so that if a client doesn't renew its lease (e.g. because it crashed) the keys would just be dropped
15:59 <avass> or if there's another reason why that's not used
16:00 *** ykarel is now known as ykarel|away
16:00 <corvus> avass: there are ephemeral nodes
16:01 <corvus> avass: and we use them where appropriate; but the 2 items on that list aren't a good match for ephemeral nodes (in the first case, the wrong client would own the node)
16:01 <corvus> and in the second, we don't want the node to disappear even if the client does
16:02 <corvus> we're actually probably going to be using ephemeral nodes a lot less in the future, as we decouple persistent "system state" from clients
16:05 <corvus> we might be able to use ephemeral nodes for item 1 if we swap things around so the originator of the request creates the result node
16:07 <avass> corvus: checking the second case, the jobs holding the semaphores could be stored as znodes below (sub-znodes?) the semaphore itself so they're dropped. but maybe that's much less efficient
16:09 * avass will now read up on the sos spec
16:09 <corvus> yeah, i don't think that improves efficiency, and it doesn't address the "leak due to bug" case
16:10 <corvus> i think you might be on to something with #1 though, i'll give that a look later :)
16:21 <openstackgerrit> James E. Blair proposed zuul/zuul master: Don't refresh change when enqueuing an dequeue event  https://review.opendev.org/c/zuul/zuul/+/782812
16:22 *** ykarel|away has quit IRC
16:32 *** hamalq has joined #zuul
16:36 <openstackgerrit> Jeremy Stanley proposed zuul/zuul-jobs master: WIP: Set Gentoo profile in configure-mirrors  https://review.opendev.org/c/zuul/zuul-jobs/+/782339
16:36 <openstackgerrit> Jeremy Stanley proposed zuul/zuul-jobs master: Revert "Temporarily stop running Gentoo base role tests"  https://review.opendev.org/c/zuul/zuul-jobs/+/771106
16:37 <avass> corvus: no, but it should at least be possible to repair the system with a restart that way
16:47 <corvus> avass: which thing are you talking about, 1 or 2?
16:48 *** masterpe has quit IRC
16:48 *** Eighth_Doctor has quit IRC
16:48 *** mordred has quit IRC
16:48 <corvus> if 2, then we don't want the semaphore to disappear when a scheduler restarts
16:51 <avass> corvus: 2, but in my head the semaphore and the executors holding part of the semaphore are different nodes, so if an executor crashes it drops the node it was holding
16:51 *** mordred has joined #zuul
16:52 *** masterpe has joined #zuul
16:52 <avass> so instead of the semaphore being one znode containing how many are currently being used, it just contains a max number, with sub-nodes held by executors being references to what jobs are currently using them
16:53 <avass> (but maybe there's a good reason why it shouldn't work like that)
16:54 <corvus> avass: because the scheduler is responsible for acquiring the semaphore before scheduling the job for an executor
16:55 <avass> then that makes more sense
17:04 *** Eighth_Doctor has joined #zuul
17:19 *** y2kenny has joined #zuul
17:24 *** rpittau is now known as rpittau|afk
17:47 *** jcapitao has quit IRC
17:56 <openstackgerrit> James E. Blair proposed zuul/zuul master: Use ephemeral nodes for management result events  https://review.opendev.org/c/zuul/zuul/+/782834
17:56 <corvus> avass: ^ inspired by your comment on #1;  swest, tobiash: ^
18:02 <corvus> i think that also fixes a race
18:04 *** jpena is now known as jpena|off
18:08 <avass> wait, what's the difference between a gerrit hashtag and a gerrit topic?
18:09 <corvus> avass: you can only have one topic, and can have many hashtags
18:09 <avass> oh cool
18:10 <corvus> (also, gerrit uses topics for things like submitting groups of changes together, which we don't enable in opendev)
18:11 <corvus> i'm currently using the "sos" hashtag to identify a working set of changes -- like, let's review this group of changes as a unit, get them merged, then restart.  otherwise the entire topic is too big to deal with.
18:11 <corvus> (so that's why only some "topic:sos" have "hashtag:sos")
18:13 <corvus> (could also do something like sos-1 sos-2 etc, but i haven't found the need for that yet)
18:54 *** hashar has joined #zuul
19:12 *** sshnaidm is now known as sshnaidm|afk
19:15 *** GomathiselviS has joined #zuul
19:58 *** hashar is now known as hasharAway
20:24 *** hasharAway is now known as hashar
20:28 *** GomathiselviS has quit IRC
20:41 *** zettabyte has joined #zuul
20:48 *** zettabyte has quit IRC
20:58 <tobiash> corvus: so an ephemeral node stays ephemeral also when updated by a different session?
20:59 *** hamalq has quit IRC
21:00 *** hamalq has joined #zuul
21:01 <corvus> tobiash: yep; i believe we use that with node requests
21:01 <tobiash> Cool :)
21:10 <openstackgerrit> Albin Vass proposed zuul/nodepool master: Document ImagePullPolicy for kubernetes driver.  https://review.opendev.org/c/zuul/nodepool/+/764463
21:14 *** y2kenny has quit IRC
21:18 *** y2kenny has joined #zuul
21:18 <y2kenny> corvus: your suggestion from yesterday worked.
21:18 <tobiash> corvus: regarding 781099, does it make sense to do the same with the promote event later as well?
21:19 <y2kenny> corvus: thanks
21:22 *** jangutter has joined #zuul
21:25 *** jangutter_ has quit IRC
21:26 *** zettabyte has joined #zuul
21:33 *** hashar has quit IRC
21:33 *** zettabyte has quit IRC
21:34 *** zettabyte has joined #zuul
21:34 <corvus> y2kenny: \o/
21:36 <openstackgerrit> Merged zuul/zuul master: Add UUID for queue items  https://review.opendev.org/c/zuul/zuul/+/772512
21:36 <zettabyte> We're trying to speed up one of our zuul builds by moving some steps into disk image builder. One of our steps is to pull a few docker images, so I want to put these in disk image builder so that they are available before the zuul job starts.
21:36 <zettabyte> Does anyone know if this can be done and if there are perhaps some examples to look at?
21:36 <corvus> tobiash: maybe?  i think that one was mostly aimed at reducing the code in process_global_management_queue (lines 1232 through 1262 on the old side), so i think it's mission accomplished there; but it seems likely that there's some more consolidation we can do on the rpc side
21:37 <tobiash> ++
21:37 <tobiash> corvus, swest: commented on 775622
21:39 <corvus> zettabyte: that's a good question; we haven't done that in opendev so i don't have an example to point at.  the main thing i'd be concerned about is starting/stopping the docker daemon.  it's worth a try.  it might be simpler with podman.  and finally, if worse comes to worst, you may be able to download the images as files and then import them into docker on the node.
21:39 <fungi> zettabyte: i don't know about the part where you tell your docker client where to find the images/cache, but we do something similar in opendev to pre-clone all our git repositories and download a number of files a lot of our builds rely on
21:40 <tobiash> zettabyte: if you're using podman look at https://review.opendev.org/c/openstack/diskimage-builder/+/767706
21:40 <fungi> we run a lot of jobs which start nested virtual machines, and so pre-download iso images those nested vm instances would boot and store them in known paths in our nodepool diskimages
21:40 <zettabyte> corvus: Yeah, that's exactly what I'm struggling with. Starting the docker daemon. I don't think you can do that in a chroot
21:41 <tobiash> zettabyte: I meant if you're using docker
21:41 <tobiash> with podman it's likely easier
21:41 <corvus> tobiash: good catch re iter
21:41 <fungi> zettabyte: you might be able to run the docker bits outside the chroot and then copy the results into it?
21:42 <tobiash> fungi, zettabyte: we're running docker within bwrap when pulling the images
21:43 <corvus> tobiash: oh nice, that looks like just what zettabyte needs?
21:43 <zettabyte> fungi: Yeah, that was my next thought. I'm trying post-root.d, but does that mean I need docker installed on the nodepool-builder host? I'm getting a bit confused there
21:43 <tobiash> yes, I've spent quite some time back then to get this working ;)
21:43 <zettabyte> tobiash: Yeah, I'll look up podman, thanks. I don't know it
21:43 <corvus> tobiash: is it only not merged because of pep8?
21:44 <corvus> zettabyte: to be clear, there was a mistake in an earlier comment, you should look at https://review.opendev.org/767706 with docker as it may do what you are talking about today
21:45 <zettabyte> corvus: Thanks!
21:45 <corvus> tobiash: and, you know, the "POC" in the title? :)
21:45 <tobiash> corvus: I don't know, I wasn't sure if this is interesting for folks. I uploaded it some time ago because someone asked a similar question
21:45 *** ajitha has quit IRC
21:45 <tobiash> since the way it works is a bit hacky ;)
21:45 <corvus> tobiash: sounds like there are at least 3 people in the world interested :)
21:45 <tobiash> I can easily un-poc it :)
21:46 <corvus> tobiash: might be worth doing and see if dib wants it; could always add it with a warning it may muck up your networking or something
21:46 <avass> zettabyte, tobiash: we're eventually going to have the same problem with pre-pulling images but haven't had much time to take a look at it yet. if you happen to find a better solution (or get docker to stop forcing everything to go through its daemon) we'd be very interested :)
21:47 <tobiash> at least we've been using that in production for years, even within nodepool-builder running containerized within openshift
21:47 <tobiash> so it even works with docker-in-docker
21:47 <avass> it's a bit sad that even pulling images requires the daemon to be running
21:48 <corvus> skopeo/podman are great for that; i'd be really tempted to use those to write a file then import it into docker on boot, just to keep from pulling hair out.
21:48 <corvus> (assuming i wanted to use docker in the actual tests)
21:49 <avass> yeah that's an alternative. I tried getting podman to work by symlinking docker->podman but the tools rely on it being docker a lot
21:50 <corvus> while using podman for everything would be great, to be clear, i'm suggesting using skopeo/podman in dib to make the image, then on the actual booted node, importing the image into docker
21:51 <corvus> a little extra time at boot, but shouldn't be much
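
(The write-a-file-then-import approach corvus describes might look roughly like this; a sketch in which the image name and paths are invented. At image build time, a dib element would run something like "skopeo copy docker://docker.io/library/registry:2 docker-archive:/opt/prepulled/registry.tar", and then a base pre-run playbook on the booted node could do:)

    - hosts: all
      tasks:
        - name: import the pre-fetched image archive into docker
          command: docker load -i /opt/prepulled/registry.tar
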
21:51 <tobiash> that works fine for small images probably, but not for our 15gb+ images unfortunately
21:51 <zettabyte> tobiash: https://review.opendev.org/c/openstack/diskimage-builder/+/767706/ looks great. Would have taken a week of pain to figure something like that out
21:51 <corvus> tobiash: point
21:51 <avass> same for our 6gb+ images :)
21:51 <tobiash> I've removed the poc, but I guess before being accepted the docs need to be added
21:52 <corvus> it's also probably possible to write directly to docker's image cache.  can't be too hard, right? :)
21:52 <tobiash> however I don't have much time right now to do that, so if anyone wants to take it over feel free
21:52 <tobiash> corvus: I tried that hard back then since that's the preferable solution, but skopeo at least at that time also relied on a docker daemon
21:53 <avass> corvus: that can actually be very hard :)
21:53 <tobiash> at least two years ago docker had no library for that
21:53 <tobiash> no idea if that has changed since then
21:54 <avass> tobiash: I think the structure skopeo stores the images in is different to how docker does it
21:54 <tobiash> yes, that was the problem
21:54 <avass> and I didn't find any tool to do that last time I checked ~3 months ago
21:56 <avass> I suppose that would be a good candidate for a side project
21:56 <fungi> one which you get to revise constantly each time docker inc decide to restructure their cache
21:58 <avass> fungi: the overlay2 structure doesn't seem to have changed in at least 3 months so it can't be too bad, can it? ;)
22:00 *** zettabyte has quit IRC
22:01 *** zettabyte has joined #zuul
22:02 <fungi> by modern standards that's positively fossilized
22:03 <fungi> i don't suppose serving the images from a local registry on the node would perform any better than importing them from a filesystem path
22:04 <avass> probably not
22:07 *** zettabyte has quit IRC
22:08 *** zettabyte has joined #zuul
22:13 *** zettabyte has quit IRC
22:15 *** zettabyte has joined #zuul
22:24 *** zettabyte has quit IRC
22:25 *** zettabyte has joined #zuul
22:30 <openstackgerrit> Merged zuul/zuul master: Unify handling of dequeue and enqueue events  https://review.opendev.org/c/zuul/zuul/+/781099
22:31 *** zettabyte has quit IRC
22:32 *** zettabyte has joined #zuul
22:38 *** zettabyte has quit IRC
22:38 *** zettabyte has joined #zuul
22:41 *** vishalmanchanda has quit IRC
22:44 <openstackgerrit> Merged zuul/nodepool master: Document ImagePullPolicy for kubernetes driver.  https://review.opendev.org/c/zuul/nodepool/+/764463
22:45 *** zettabyte has quit IRC
22:46 *** zettabyte has joined #zuul
22:46 <openstackgerrit> Merged zuul/zuul master: Improve test output by using named queues  https://review.opendev.org/c/zuul/zuul/+/775620
22:46 <openstackgerrit> Merged zuul/zuul master: Avoid race when task from queue is in progress  https://review.opendev.org/c/zuul/zuul/+/775621
22:52 *** zettabyte has quit IRC
22:53 *** zettabyte has joined #zuul
22:57 *** nils has quit IRC
23:00 *** zettabyte has quit IRC
23:01 *** zettabyte has joined #zuul
23:01 <y2kenny> If I have ProjectA (config project) and ProjectB (untrusted project), and I pushed a job in ProjectB pre-submit that inherits from a job in ProjectA that in turn calls a role in ProjectB that is not yet submitted, is that supposed to work?
23:01 <y2kenny> (I am currently getting role not found.)
23:05 *** rlandy has quit IRC
23:09 *** zettabyte has quit IRC
23:09 *** zettabyte has joined #zuul
23:12 <openstackgerrit> Merged zuul/nodepool master: Mention node id when unlock failed  https://review.opendev.org/c/zuul/nodepool/+/777678
23:15 *** zettabyte has quit IRC
23:16 *** zettabyte has joined #zuul
23:17 <fungi> y2kenny: https://zuul-ci.org/docs/zuul/reference/job_def.html#attr-job.roles states "Zuul roles are able to benefit from speculative merging and cross-project dependencies when used by playbooks in untrusted projects."
23:17 <fungi> so it has to do with where the playbook resides
23:18 <mordred> yah - otherwise you could add a role to an untrusted project that overrides a role in a trusted project and then execute code speculatively in a trusted context - which would be bad
23:21 <fungi> if the playbook is in a config project and references a role from an untrusted project, it needs that role to be present on the appropriate branch of the untrusted project
23:26 *** zettabyte has quit IRC
23:27 *** zettabyte has joined #zuul
23:30 <y2kenny> fungi, mordred: yeah, I was reading that and I want to make sure I understood it correctly.  I am doing something funky to get around some of my logging setup.  It's not so much the role in the untrusted project overriding the trusted one... I am actually passing the role name from the untrusted project into the trusted project to execute.  But I
23:30 <y2kenny> get why that is a security issue (so the security model worked :))
23:31 <mordred> \o/
23:32 <y2kenny> I know this looks like I am replicating the whole pre/run/post structure, but I kind of have to because of various permission/security issues.
23:32 <y2kenny> basically I needed to start something (a logging process) that spans the entire duration of the job on the executor.
23:33 <y2kenny> which has to be in a trusted project but can't be just in the pre because the pre playbook will exit.
23:34 <y2kenny> the workaround is fine if I just pass the commands as a variable, but I was hoping to do something more advanced by passing a role to be executed
23:34 <fungi> could it technically be a separate build/job which runs on the executor, in parallel to the main job, just paused until the main job ends?
23:35 <y2kenny> um... separate as in started by the same trigger but not in a parent-child relationship?
23:36 <fungi> we already have a model for concurrent interdependent builds
23:37 <fungi> for example, our container testing workflow starts a job which runs an image registry on a node, starts another job when that first job "pauses", and the second job can add images to or retrieve images from the registry being served by the node for the paused job; then once the second job completes, the first is unpaused and cleans up
23:38 <y2kenny> I think I saw that example but maybe I misunderstood the implementation
23:38 <y2kenny> I thought the registry job is parent to the second job
23:39 <fungi> so i don't know the details of your logger, but you could in theory run the "logger job" on the executor, "pause" it (the logger started by the job keeps running), then you start your second job you want logged on an ephemeral node or whatever; once the logged job is done, the logger job wakes back up, shuts down the logger process, and archives the logs or whatever
23:39 <corvus> fungi, y2kenny: i think a paused job still ends the main run playbook, so any running processes will be terminated -- it just waits for children to finish before starting the post-run playbook.
23:40 <fungi> ahh, okay, so you'd have to have some way of leaving it running outside the playbook regardless
23:40 <corvus> ya
23:40 <fungi> for our image registry example, i suppose the registry is a background process disassociated from the ansible which started it
23:40 <corvus> fungi: it's on a worker node
23:41 <fungi> right, of course
23:41 <corvus> the trick here is y2kenny doesn't have a convenient worker node to run the ipmitool on and would like to use the executor
23:41 <fungi> was just going to say that gets harder if you try to do it on the executor through the bubblewrap layer
23:42 <corvus> (partly due to nodepool's inability to handle cross-provider requests -- can't request a static node and a vm at the same time)
23:42 <y2kenny> I am actually using the registry example to start the baremetal node (since the baremetal will stay powered on after the playbook quits)
23:42 <fungi> i suppose it would be possible, but would require a separate supervisor to handle the logger process
23:42 <y2kenny> I can potentially have two separate jobs off the same trigger
23:42 <corvus> y2kenny, fungi: but tie these 2 things together and we may have another option:
23:42 <fungi> i.e. something else running independently on the executor, which the playbooks talk to
23:42 <y2kenny> so no parent/child relationship
23:43 <corvus> outer job runs on vm, starts ipmitool, pauses; inner job starts on baremetal, completes; outer job on vm resumes
23:43 <corvus> it's 2 separate jobs, so it gets around the nodepool cross-provider issue
23:44 <corvus> oh, but the outer job would need to know the baremetal node...
23:44 <corvus> nevermind
23:44 <fungi> i thought zuul wanted dependent jobs to be in the same provider too
23:45 <corvus> i'm unsure if that's a preference or a hard requirement
23:45 <fungi> or it could be i imagined it
23:45 <corvus> i'd look it up but i guess it doesn't matter
23:45 <corvus> fungi: no you're right, i'm just not sure if it will entertain nodes from another provider if the current provider can't actually supply them
23:45 <corvus> anyway, moot point
23:46 <y2kenny> wait... so is this the dependency between separate jobs or the dependency between parent and child?
23:46 <y2kenny> for separate jobs with specified dependencies, I can certainly use different nodesets
23:46 <fungi> job dependencies, not inheritance of job definitions
23:47 <fungi> but yeah, as corvus points out, the ipmitool job won't know where to find the corresponding baremetal ipmi interface
23:48 <y2kenny> oh right... because the inner job gets the node allocated later
23:48 <corvus> we can pass info from the outer job to the inner job, but not the other way around
23:48 <y2kenny> right
23:49 <fungi> if there were some separate ipmitool-as-a-service with an api the executor could talk to, then you could theoretically communicate with that to start/stop logging of a specified node and retrieve the log data
23:49 <fungi> but that's a lot of additional bespoke engineering
23:50 <y2kenny> yeah... the alternative I was going to do is feed the dmesg to a server somewhere else via netconsole
23:50 *** zettabyte has quit IRC
23:50 <y2kenny> but that's a few more things to set up
23:51 <y2kenny> and netconsole is still not as complete as a BMC serial-over-LAN capture.
23:51 <fungi> i suppose you're not running ironic; you could probably have the executor talk to it to collect console logs otherwise
23:51 <corvus> actually....
23:52 <y2kenny> but I think what I have currently should be sufficient for now.  I am just passing the test command to the baremetal via a variable.
23:52 <corvus> fungi: i think the provider preference is a preference -- other providers will handle the request if the requested provider has declined it (which it would do if it can't satisfy it because it doesn't have that type)
23:52 *** zettabyte has joined #zuul
23:53 <corvus> fungi, y2kenny: so if you wanted to write a little bit of code, you could probably start a daemon in the outer job, return the network address of the daemon to the inner job via zuul, pause, then have the inner job connect to the daemon and tell it which ipmi host to connect to; then the daemon can start logging and the inner job can proceed
23:54 <corvus> whether *that* rube-goldberg machine is preferable to any of the others, i can't say :)
23:54 <corvus> but at least everything is ephemeral
23:54 <y2kenny> haha... yeah... I will need to think about that.
23:55 <corvus> and to be clear, since i'm making up the outer/inner job terminology here, the inner job is just a job that has "job.dependencies: outer-job"
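
(In zuul config terms that pairing is roughly the following; a sketch with invented job names. The outer job would hand its daemon's address to the inner job via the data returned by the same zuul_return call that pauses it:)

    - job:
        name: ipmi-logger        # "outer" job: starts the daemon, returns its address, pauses
    - job:
        name: baremetal-test     # "inner" job: runs once ipmi-logger has paused
        dependencies:
          - ipmi-logger
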
23:55 <y2kenny> right.
23:58 <y2kenny> anyway, thank fungi and corvus for brain storming.
23:59 <y2kenny> thank you*
23:59 <fungi> in a single-job model, could each playbook on the executor start up an ipmitool background process streaming to a file, and then in post just concatenate them?
23:59 <fungi> there could be gaps, of course
23:59 <fungi> but in theory the gaps at least wouldn't be while playbooks were running
