Friday, 2022-02-25

fungithere's some odd errors in the log00:00
fungiahh, looks like it may be due to repos with no branch named "master"00:01
clarkboh right because hound only looks at master00:01
ianwoh, they merged something about that00:01
clarkbthat makes sense00:01
fungicouldn't find remote ref master00:01
fungiambiguous argument 'origin/master': unknown revision or path not in the working tree.00:01
ianwit looks like it has a head branch detector00:03
ianwit calls00:07
ianwgit remote show origin00:07
ianwand00:07
ianwvar headBranchRegexp = regexp.MustCompile(`HEAD branch: (?P<branch>.+)`)00:07
ianw /var/lib/hound/data/vcs-1904f5a1b65975a88e16b96f9ef2c83aa8cbb0c0# git remote show origin00:07
ianw  HEAD branch: main00:07
ianw... i.e. i should be finding it00:07
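For reference, the head-branch detection described above can be sketched in Python: the same named-group pattern is applied to the text that `git remote show origin` prints. This is only an illustration; hound itself is written in Go, and the function name `detect_head_branch` and the sample text are assumptions made for this sketch.

```python
import re

# Same pattern as the Go regexp quoted above; Python accepts the same
# (?P<name>...) named-group syntax.
HEAD_BRANCH_RE = re.compile(r"HEAD branch: (?P<branch>.+)")


def detect_head_branch(remote_show_output):
    """Return the branch named on the 'HEAD branch:' line, or None if absent."""
    match = HEAD_BRANCH_RE.search(remote_show_output)
    if match is None:
        return None
    return match.group("branch").strip()


# Hypothetical sample shaped like the output pasted above (trimmed).
sample = "* remote origin\n  HEAD branch: main\n"
print(detect_head_branch(sample))
```

A caller would fall back to a configured default when the function returns None, which is roughly the situation the "couldn't find remote ref master" errors point at.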
clarkbhuh00:07
ianwthat repo is cirros00:08
clarkbya frickler added it as a main branch test case. Everything worked well until now :)00:08
ianwhang on, we may need to set it00:08
ianwoh, it's jeepyb that outputs the config iirc00:10
ianwyeah we need to set "detect-ref" in the config00:11
clarkbah00:12
opendevreviewIan Wienand proposed opendev/jeepyb master: hound: add detect-ref config option  https://review.opendev.org/c/opendev/jeepyb/+/83091900:14
ianw// Open an index at the given path. If the idxDir is already present, it will00:18
ianw// simply open and use that index. If, however, the idxDir does not exist a new00:18
ianw// one will be built.00:18
ianwthis suggests to me we could remove idx-* files, which might at least stop it having to reclone everything00:18
fungiand just keep the vcs-* files, yeah00:20
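A cautious way to try the idea above is a dry run that only lists which entries would be removed and which would be kept. The sketch below assumes the data directory contains entries prefixed `idx-` and `vcs-`, as the messages suggest. The helper name `split_hound_data` is hypothetical, nothing is deleted, and whether hound rebuilds its indexes cleanly afterwards is not verified here.

```python
import os
import tempfile


def split_hound_data(data_dir):
    """Return (index_entries, checkout_entries) found directly under data_dir."""
    names = sorted(os.listdir(data_dir))
    idx = [name for name in names if name.startswith("idx-")]
    vcs = [name for name in names if name.startswith("vcs-")]
    return idx, vcs


# Dry run against a throwaway directory; nothing is removed.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("idx-aaa", "vcs-aaa", "vcs-bbb"):
        os.mkdir(os.path.join(tmp, name))
    to_remove, to_keep = split_hound_data(tmp)
    print("would remove:", to_remove)
    print("would keep:", to_keep)
```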
ianwit looks like we install from jeepyb master (git+https://opendev.org/opendev/jeepyb#egg=jeepyb)00:20
ianwso if https://review.opendev.org/c/opendev/jeepyb/+/830919 is ok we just need to trigger a image rebuild00:21
Clark[m]Looks fine to me. But I'm off the computer to start some early dinner prep00:23
opendevreviewSteve Baker proposed openstack/diskimage-builder master: Revert "Revert "Detect boot and EFI partitions in extract-image""  https://review.opendev.org/c/openstack/diskimage-builder/+/83090001:52
opendevreviewIan Wienand proposed opendev/jeepyb master: hound: add detect-ref config option  https://review.opendev.org/c/opendev/jeepyb/+/83091901:54
opendevreviewIan Wienand proposed opendev/system-config master: hound: enable detect-ref  https://review.opendev.org/c/opendev/system-config/+/83092602:57
opendevreviewIan Wienand proposed opendev/system-config master: hound: enable detect-ref  https://review.opendev.org/c/opendev/system-config/+/83092604:06
*** bhagyashris is now known as bhagyashris|ruck04:38
*** pojadhav|out is now known as pojadhav05:26
opendevreviewIan Wienand proposed opendev/system-config master: hound: enable detect-ref  https://review.opendev.org/c/opendev/system-config/+/83092606:27
*** ysandeep|out is now known as ysandeep06:28
*** amoralej|off is now known as amoralej07:11
*** luigi is now known as luigi-training07:24
*** luigi-training is now known as luigi07:24
*** jpena|off is now known as jpena08:05
*** bhagyashris_ is now known as bhagyashris|ruck09:55
*** ysandeep is now known as ysandeep|afk10:02
*** bhagyashris_ is now known as bhagyashris|ruck10:44
*** rlandy|out is now known as rlandy|ruck10:48
*** ysandeep|afk is now known as ysandeep12:00
*** frenzyfriday|rover is now known as frenzyfriday12:49
*** ysandeep is now known as ysandeep|away12:58
*** amoralej is now known as amoralej|lunch13:21
*** ysandeep|away is now known as ysandeep13:42
*** Guest229 is now known as diablo_rojo_phone14:01
*** amoralej|lunch is now known as amoralej14:03
opendevreviewFlorian Haas proposed zuul/zuul-jobs master: tox: Include ansible_env in tox_environment  https://review.opendev.org/c/zuul/zuul-jobs/+/83099214:15
corvusi'd like to do a rolling restart of zuul14:28
corvusstarting that now14:30
fungithanks corvus, sounds fine to me14:37
opendevreviewFlorian Haas proposed zuul/zuul-jobs master: tox: Include ansible_env in tox_environment  https://review.opendev.org/c/zuul/zuul-jobs/+/83099214:46
*** ysandeep is now known as ysandeep|out14:52
*** dviroel is now known as dviroel|lunch14:57
*** amoralej is now known as amoralej|off15:00
*** ykarel is now known as ykarel|away15:22
opendevreviewFlorian Haas proposed zuul/zuul-jobs master: tox: Include ansible_env in tox_environment  https://review.opendev.org/c/zuul/zuul-jobs/+/83099215:41
opendevreviewFlorian Haas proposed zuul/zuul-jobs master: tox: Include ansible_env in tox_environment  https://review.opendev.org/c/zuul/zuul-jobs/+/83099215:43
*** dviroel|lunch is now known as dviroel16:04
BlaisePabon[m]I'm new here and will lurk before making a fool of myself.16:18
BlaisePabon[m](I'm also using my "work" id, so I may return with my personal account...)16:19
clarkbBlaisePabon[m]: welcome. And feel free to ask questions if there is anything we can help with16:19
clarkbinfra-root any reason to not approve https://review.opendev.org/c/opendev/system-config/+/830874 now? I'm not sure if we'd prefer to wait on the zuul rolling restart to complete? though that may take hours?16:20
corvusclarkb: i wouldn't wait for the restart; it will take ... all day? :)16:20
BlaisePabon[m]Thank you Clark. TL;DR: I'm rebuilding my home lab and I want to look into collaborating with open Dev (I have a lot of compute in my garage).16:21
clarkbcorvus: cool I'll approve it then and if something comes up before it merges we can remove the +A16:21
corvusis https://grafana.opendev.org/d/6c807ed8fd/nodepool?orgId=1 the best view of the nodepool status these days?  i think that shows all our node use and the clouds we're using right now?16:25
corvusBlaise Pabon: ^ might be interesting to see what's in use16:25
corvusand i think https://docs.opendev.org/opendev/system-config/latest/contribute-cloud.html has an overview of contributing resources16:26
clarkbcorvus: I also like the nodepool views on the zuul dashboard in grafana16:27
corvusthis one: https://grafana.opendev.org/d/21a6e53ea4/zuul-status?orgId=116:28
corvuswhich i'm watching right now since ze01 is the straggler in the first batch of executor restarts16:28
corvusthere's always that one job...16:29
*** marios is now known as marios|out16:42
*** jpena is now known as jpena|off16:56
clarkbonce etherpad is done I think I may approve the zuul-registry fix and then recheck fungi's change16:59
fungithanks, i'm still in my morning meeting tunnel for the next hour (even though it's no longer morning here), but i can help monitor it once i'm done17:00
corvuswe're onto the second batch of executor restarts now17:08
corvusthis'll be the longer one (since all new jobs have started here)17:09
opendevreviewMerged opendev/system-config master: Update Etherpad to 1.8.17  https://review.opendev.org/c/opendev/system-config/+/83087417:09
corvusseems to be running jobs :)17:11
clarkbetherpad service is restarting now17:28
clarkbhttps://etherpad.opendev.org/p/isitbroken loads for me17:29
clarkblgtm17:30
corvusi'm going to afk until later this afternoon.  i think since the first batch of executors looks good, the ongoing rolling restart will be fine.  it will just restart the second batch of executors and the mergers.  it will not do scheduler or web; i will do that after i get back.17:36
clarkbcorvus: ok, any concerns with me approving that zuul-registry change while you are gone?17:37
clarkboh looks like you said my plan sounds good in zuul room. I'll proceed soon then17:37
corvusclarkb: ++17:37
corvusthe rolling restart is just an ansible playbook i'm running on bridge... so if some emergency comes up, that's the thing to kill17:38
clarkbnoted17:38
corvusroot     20070 11.6  0.7 326572 59016 pts/4    Rl+  14:30  21:54 /usr/bin/python3 /usr/local/bin/ansible-playbook -f 20 playbooks/zuul_rolling_restart.yaml17:39
corvusfor the record17:39
clarkbzuul-registry update has been approved. I'll monitor it after it lands. Likely by rechecking funig's change18:05
fungilooks like that merged, thanks!18:09
corvusah, evil count funig18:09
corvusand his dastardly change18:10
fungiclarkb: i don't know if deploy jobs are succeeding yet. did ianw's default branch fixes for jeepyb/hound merge yet?18:10
clarkbfungi: the deploys should be fine without ianw's fix I think18:11
fungilooks like no, they haven't been approved yet18:11
clarkbbut no the jeepyb update is failing testing18:11
fungioh! right, the deploy jobs will work because i cleaned up the rootfs18:11
fungisorry :/18:11
fungii'm still trying to context-shift out of my day-o-meetins18:12
fungiclarkb: the jeepyb change is passing18:12
clarkbhrm not the one I had in my browser18:13
clarkbmaybe there is a different one18:13
fungithere was an egregious abomination before the whitespace gods in patchset 118:13
fungi83091918:13
clarkbhttps://review.opendev.org/c/opendev/system-config/+/830916 that one is the one I was looking at18:13
fungioh, aha18:13
clarkbwhich if anything was going to affect deploy it would be ^ but I'm pretty sure deploy is fine18:14
fungithanks, i missed that one. i was looking at the two default branch fixes18:14
clarkbsince this was just silently not doing anything before and would only fail in the codesearch playbook which nothing should depend on18:14
fungiright18:16
clarkbfungi: have you checked if your apache config update has applied yet?18:16
clarkbI approved the jeepyb change but I don't think it shares a queue with system-config so approval on the next change isn't helpful just yet18:18
fungiclarkb: if you meant 829975 no it hasn't merged yet18:19
clarkbI'm confused. What looked like it merged then?18:19
clarkbI agree the change I was thinking of has not merged so can be rechecked soon once the zuul-registry change merges18:20
fungii was talking about the buildset registry fix for the thing 829975 was failing on, but misread some notifications from gerrit i think18:20
fungias that doesn't seem to have merged either yet18:20
clarkbah got it.18:20
clarkbya its in the gate and should land shortly. Then once the image for it publishes we can recheck 82997518:20
fungiyeah, sorry, i'm still a little scattered and trying to catch back up18:20
clarkbI also want to get an update to the gitea 1.16.1 change to point it at 1.16.2 now18:21
fungioh, that got tagged? excellent18:21
clarkbyup I dont think it has the fix we wanted but many other fixes seem to be included :)18:22
clarkbhttps://github.com/go-gitea/gitea/blob/v1.16.2/CHANGELOG.md18:23
clarkbI like to take my time with those updates and cross check file diffs and so on. So I'18:23
clarkb*I'll wait until I'm happy with zuul-registry and your change before looking at that18:23
clarkbok https://hub.docker.com/layers/zuul/zuul-registry/latest/images/sha256-f2a9be9b41c0dd2713ccbdc095d9215eaf32a0a4236684a49c5e19285cb9a34d?context=explore seems updated18:27
clarkband change has been rechecked18:27
opendevreviewClark Boylan proposed opendev/system-config master: Update Gitea to 1.16.2  https://review.opendev.org/c/opendev/system-config/+/82818418:46
opendevreviewClark Boylan proposed opendev/system-config master: DNM change to test and hold gitea 1.16.2  https://review.opendev.org/c/opendev/system-config/+/82858618:46
clarkbzuul's lack of executors is leading to slow job enqueues so I went ahead and did ^18:47
clarkbthere is a new hold in place for that and I deleted the old hold as well as the hold for etherpad and gitea link verification18:47
opendevreviewMerged opendev/jeepyb master: hound: add detect-ref config option  https://review.opendev.org/c/opendev/jeepyb/+/83091919:03
clarkbheh ^ rebuilds the gerrit image to include the jeepyb update, which isn't necessary in this case. But we should think about a gerrit restart on up to date images soon (maybe do the mergeability checks and remove jvm gc logs too at the same time early next week?)19:06
fungiyeah, all those together would be good for a restart19:17
fungior if we want to prioritize reviewing those today i can plan to do a quick restart over the weekend19:18
fungiworst case, if something doesn't work, i'll roll back to a specific image and restart again with deployment to review disabled19:19
clarkbhttps://review.opendev.org/c/opendev/system-config/+/830912 is the jvm gc log removal change. Note you have to remove the files (or move them aside) from the log dir too otherwise the issue seems to persist19:20
fungihuh, interesting, so remove them while the container is down i guess19:20
clarkbhttps://review.opendev.org/c/opendev/system-config/+/829882 is the mergeable check and has reviews. I would say prioritize this one anyway and the jvm gc log thing is more of a minor annoyance19:20
clarkbfungi: ya the commit message indicates the process. You stop gerrit, move the files aside, remove the entry from docker-compose.yaml (though ansible probably already did this which is fine) then start gerrit again19:21
fungiread it, very clear now, thanks19:26
fungilgtm, happy to approve it or wait for a second reviewer19:26
clarkbI'm happy to do it on monday but if others want to do it sooner than that no objections from me either19:27
clarkbzuul is pretty backed up right now so no rush due to that either :)19:27
clarkbIt should catch up once it starts the second half of executors that were stopped19:27
fungias i said, i'm available to do a restart over the weekend, i'm not too concerned about any of these additions making trouble, and if we get the gc logging change in then i'll make sure to do the dance recommended in its commit message as part of the restart19:28
clarkbsounds good, thanks19:28
clarkbthe gitea image build for 1.16.2 jumped ahead of the gerrit change and is likely to exercise the buildset registry first19:30
clarkb(it has a node and the gerrit jobs don't yet, but neither has an executor)19:30
fungiexecutor roulette19:37
clarkbhttps://zuul.opendev.org/t/openstack/stream/b3783abcb11749a98a772aaf18c69fa8?logfile=console.log it is running now19:41
clarkbI think it is the end of that job where it writes to the buildset registry that we care about19:41
clarkbthen the system-config-run-gitea job will fetch that image and test it19:41
clarkbI think it pushed to the buildset registry happily. This is a smaller image though so really just checks that we didn't introduce a fatal regression and not necessarily that the issue is resolved19:48
clarkbwe are waiting on one more executor to finish up then we'll restart all 6 and be back to a full set19:51
clarkbze0819:56
clarkblunch now but I think it is at least as functional as before19:58
*** dviroel is now known as dviroel|afk20:21
opendevreviewMerged opendev/system-config master: Restore is:mergeable predicate in Gerrit  https://review.opendev.org/c/opendev/system-config/+/82988220:42
clarkbI wonder if zuul could "reparent" paused jobs to speed up these restarts. There is no active ssh connection at that point. You'd basically stick the job into a "please deal with this when completed" queue?20:53
clarkbtracking state for that might be complicated though since we attach builds firmly to a specific executor right now iirc20:54
clarkbthe gitea 1.16.2 builds were all successful. I expect zuul-registry is fine but still waiting on the gerrit image builds to hit it with the large layers21:08
spencerharmonHello! Just wanted to introduce myself. I'm new to opendev/gerrit. I just submitted a review for the first time. Hopefully I did everything correctly. I see there's another channel for octavia. In case I have questions about feedback or the review process, would those be more appropriate in this channel or in #octavia? Also, review is https://review.opendev.org/c/openstack/octavia/+/831051 21:16
johnsomspencerharmon Hi and welcome!21:19
fungispencerharmon: yep, in here is where we run the systems which make code review and other sorts of collaboration possible, but it'll be the octavia contributors reviewing your change so that channel is probably more appropriate if you have questions for the people who are going to be looking at it21:19
fungialso, welcome! and have a great time21:19
spencerharmonAwesome, thanks! :) 21:20
johnsomspencerharmon The Octavia team hangs out in #openstack-lbaas. We are more than happy to help you out with gerrit/patches/etc.21:20
spencerharmonOh, good to know. I'll join that one.21:20
spencerharmonOne thing I haven't come across in the documentation so far is the backport process. I'm running this commit against victoria in the lab for compatibility with a third party driver. If this is accepted, I'd like to apply it to victoria as well. Do any of y'all know where that process is documented or have any related advice?21:26
clarkbspencerharmon: https://docs.openstack.org/project-team-guide/stable-branches.html#proposing-fixes is openstack backport procedure (with lots of other backport related info in that doc)21:28
spencerharmonThank you so much! 21:28
clarkbspencerharmon: its basically cherry-pick -x from the master branch in order going backwards until things are not supported or no longer needed anymore21:29
clarkbfungi: arg https://zuul.opendev.org/t/openstack/build/5f67ee2a2fcb4eec8529a80e726319e3/log/job-output.txt#7434 My hunch is that we converted the short size read to a 404 because the blob isn't fully on disk yet and docker doesn't retry21:31
clarkbneither situation is correct and I think the failures would happen in either case. Basically if we failed before we'd fail now and vice versa so I don't think we need a quick revert. But we need to get the buildset registry logs and for that have to wait for the job to complete21:32
spencerharmonAh, gotcha. Looks like `git review <branch>` is the command I'll need, after cherry picking. Looks pretty easy. Thanks!21:32
clarkbcorvus: ^ fyi. Basically I think the error changed and is maybe easier to track down now, but we didn't fix the problem21:32
clarkbhrm however those failures happened in iweb not ovh like before. So maybe this is worse than before (less sensitive to disk speed and more just fail like?)21:34
ianwinfra-root: if you get a chance to look at https://review.opendev.org/c/opendev/system-config/+/830784/1 and https://review.opendev.org/c/opendev/system-config/+/830785 that enables the log export for all jobs, and updates the documentation respectively 21:35
ianwthe doc update is bigger, once i started it seemed to fall out better into a separate section.  thoughts welcome of course :)21:35
ianwon my monday i'll push through the glean/dib stuff21:36
clarkbianw: thanks21:37
clarkbI'll take a look once I've gotten zuul-registry mostly sorted21:37
ianwthanks for the reviews on the hound stuff,  i think we can check back on that in a week and see if we can pinpoint what's growing21:38
clarkbzuul's executors are all back to full strength now too21:40
ianwclarkb: do you think pushing a big image in the test job might help?21:41
ianwi started to look at putting mitmproxy in that job too, from my rough notes on how i set it up when i was debugging21:42
clarkbianw: I don't think there is a test job currently, but yes the problem seems specifically related to large layers like the one we get with gerrit21:42
clarkboh there is a system-config-run-registry job21:42
ianwif you think that would help i can try to get that going21:42
clarkbthat doesn't run against zuul-registry changes, but I see what you mean. Ya that might help, maybe via depends on between zuul-registry and system-config to try and trip it21:42
ianwyeah the build-image does do a functional test; https://opendev.org/zuul/zuul-registry/src/branch/master/.zuul.yaml#L2421:43
clarkboh interesting. But ya I think the key to tripping this is a large layer. can probably do that via RUN dd if=/dev/urandom of=/foo count=SOMENUMBER21:45
clarkband then try and push the result of that into the registry21:45
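The large-layer idea above boils down to producing a big file of random bytes, which compresses poorly and so keeps the pushed layer large. A minimal Python sketch of that step follows. The helper name `write_random_file` and the sizes are assumptions for illustration, and it stands in for the `dd` invocation rather than reproducing its block-count semantics.

```python
import os
import tempfile


def write_random_file(path, size_bytes, chunk_size=1024 * 1024):
    """Write size_bytes of random data to path in chunks; return bytes written."""
    written = 0
    with open(path, "wb") as out:
        while written < size_bytes:
            chunk = os.urandom(min(chunk_size, size_bytes - written))
            out.write(chunk)
            written += len(chunk)
    return written


# Small demonstration size; a test layer meant to trip the bug would need to
# be far larger than this.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "foo")
    print(write_random_file(target, 3 * 1024 * 1024 + 17))
```

Writing in chunks keeps memory use flat regardless of the target size, which matters once the file is hundreds of megabytes.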
ianwyeah not 100% sure where the test image comes from that it pushes there21:45
ianwhttps://opendev.org/zuul/zuul-registry/src/commit/8f1f0705f79f9ac13c768aad5a604a51152b5b41/playbooks/functional-test/setup.yaml#L6321:47
ianwseems to call "buildah from scratch", so that seems like it would be a very empty base container21:48
clarkbI do wonder if some other action we're doing is indicating to the client that it can do the HEAD to read the size back21:48
clarkbbut I read through the api spec a bit yesterday and what that may be isn't clear to me21:49
clarkbbut it is possible the real underlying bug here is that we're taking some action too early which causes the client to get ahead of itself (since it should know the put request isn't complete yet why is it doing a head?)21:49
ianwyeah honestly the only way i figured things out was the mitmdump and correlating that against the api spec; even then i didn't figure it out, but it gave upstream enough info to help21:49
ianwlooks like you could maybe update that to "buildah from <something big>" just as a first step in pushing more data in the test?21:51
clarkbianw: ++21:51
clarkbcould use our gerrit image even21:51
clarkbI've just realized that the big ~200MB blob giving us trouble is pushed by both jobs at around the same time. I'm curating a log paste, but I think the next thing to check is a run that only does gerrit 3.523:32
opendevreviewClark Boylan proposed opendev/system-config master: DNM testing zuul-registry via gerrit builds  https://review.opendev.org/c/opendev/system-config/+/83106423:54
clarkbhttps://paste.opendev.org/show/bS1Udj5RpRm8dkcFaVac/ are the annotated logs23:56
clarkbI've written down what I think are some potential problems but I'm not sure the logs show those specific issues happening in these failures. It is enough to make me suspicious though23:56
opendevreviewClark Boylan proposed opendev/system-config master: DNM testing zuul-registry via gerrit builds  https://review.opendev.org/c/opendev/system-config/+/83106423:57
