Wednesday, 2022-09-28

[00:52] <corvus> clarkb: sorry was deep into other stuff; glad you worked it out. :)
[00:53] <corvus> fungi clarkb i accidentally dropped a vote when rechecking, so the zuul tracing changes haven't landed yet.  hopefully tomorrow.  :)
[01:45] *** ysandeep|out is now known as ysandeep
[03:42] *** ysandeep is now known as ysandeep|afk
[05:14] *** ysandeep|afk is now known as ysandeep
[07:36] *** jpena|off is now known as jpena
[08:27] *** ysandeep is now known as ysandeep|sick
[11:22] *** dviroel|afk is now known as dviroel
[13:18] *** dasm|off is now known as dasm
[13:20] <opendevreview> Rafal Lewandowski proposed openstack/diskimage-builder master: Add cloud-init growpart element  https://review.opendev.org/c/openstack/diskimage-builder/+/855856
[13:29] <dmendiza[m]> Hi friends!
[13:29] <dmendiza[m]> I'm not seeing zuul runs on openstack/barbican for the new stable/zed branch: https://review.opendev.org/q/project:openstack%252Fbarbican+branch:stable%252Fzed
[13:51] <fungi> dmendiza[m]: have you checked to see if you have a queue parameter in one of your project pipelines on that branch?
[13:52] <fungi> the last several times someone has expressed the same symptom recently, that's been the cause
[13:52] <fungi> dmendiza[m]: https://zuul.opendev.org/t/openstack/config-errors
[13:53] <fungi> openstack/barbican - .zuul.yaml (stable/zed)
[13:53] <fungi> extra keys not allowed @ data['gate']['queue']
[13:53] <fungi> that would be why zuul is ignoring the configuration for that branch
[13:54] <fungi> probably something you've already fixed in master and just haven't backported yet?
[13:55] <dmendiza[m]> fungi: ah yes, that sounds familiar, thanks!
[13:55] <fungi> any time
[13:58] <fungi> dmendiza[m]: yeah, you'll want to backport the queue change from https://review.opendev.org/858205
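For reference, here is a minimal sketch of the kind of change being backported. It assumes the fix is the one fungi describes: Zuul rejects `queue` inside a project's pipeline stanza (hence `extra keys not allowed @ data['gate']['queue']`) and expects the setting at the project level instead. The helper `hoist_pipeline_queue` is hypothetical. It works on an already-parsed project stanza held as a Python dict, not on the real .zuul.yaml, and the template, job and queue names are made up.

```python
from copy import deepcopy
from typing import Any, Dict


def hoist_pipeline_queue(project: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a parsed project stanza with any pipeline-level
    'queue' key moved up to the project level (hypothetical helper)."""
    fixed = deepcopy(project)
    found = set()
    for key, value in fixed.items():
        # Pipeline stanzas such as 'check' or 'gate' are dicts; 'vars' is also
        # a dict but holds job variables, so it is left alone.
        if key == "vars" or not isinstance(value, dict):
            continue
        if "queue" in value:
            found.add(value.pop("queue"))
    existing = fixed.get("queue")
    if existing is not None:
        found.add(existing)
    if len(found) > 1:
        raise ValueError(f"conflicting queue names: {sorted(found)}")
    if found:
        fixed["queue"] = found.pop()
    return fixed


before = {
    "templates": ["example-template"],
    "check": {"jobs": ["example-job"]},
    "gate": {"queue": "example-queue", "jobs": ["example-job"]},
}
print(hoist_pipeline_queue(before))
```

The helper refuses to guess when the pipelines disagree about the queue name. Two different names point to a real configuration question, not a mechanical rename.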
[14:15] <opendevreview> Merged opendev/statusbot master: Handle exception for unprivileged commands  https://review.opendev.org/c/opendev/statusbot/+/807948
[14:32] *** marios is now known as marios|out
[15:14] <clarkb> fungi: are you aware of any reason not to land that zuul jobs update for the git mirroring loop condensing?
[15:14] <clarkb> I'll go ahead and do that first thing today if there aren't any issues that need attention first
[15:15] <fungi> nope, i'll be around in case anything goes haywire
[15:15] <clarkb> great
[15:15] <fungi> fire away!
[15:15] <clarkb> https://review.opendev.org/c/zuul/zuul-jobs/+/858961 has been approved. I'll keep an eye on it too. Once I'm happy that change hasn't broken anything I'll approve the arm64 rocky 9 image update. I realized that we should manually restart launchers once that is in to be extra sure we got it correct
[15:19] <fungi> also we're still hoping to get a rolling zuul scheduler restart in once 858372 finally merges
[15:29] *** dviroel is now known as dviroel|lunch
[15:34] <opendevreview> Merged zuul/zuul-jobs master: Reduce the number of loops in prepare-workspace-git  https://review.opendev.org/c/zuul/zuul-jobs/+/858961
[15:36] <clarkb> Now to look for jobs that have started since that merged
[15:37] <clarkb> https://zuul.opendev.org/t/openstack/stream/94a597154bc842e79975e27e9c26f77a?logfile=console.log maybe
[15:38] <clarkb> nope that job used the old version of the role
[15:38] <clarkb> https://zuul.opendev.org/t/openstack/stream/d8b8e6cf6c0d41b7ae21643ee22dcbfb?logfile=console.log that one used the new version
[15:40] <clarkb> I see two successful jobs so far that I believe ran with the new version of the role
[15:41] <clarkb> I'll give it a bit longer before I move on to the arm64 image change
[15:58] <clarkb> the job failures I'm seeing don't appear related to the git repos.
[15:59] <clarkb> https://review.opendev.org/c/openstack/project-config/+/858554 has been approved
[16:06] <opendevreview> Merged openstack/project-config master: Add rockylinux-9-arm64  https://review.opendev.org/c/openstack/project-config/+/858554
[16:23] <corvus> fungi: clarkb apparently the first change that emits spans did get deployed this weekend, so we should theoretically already be getting them.  i'll look into why they aren't showing up.
[16:23] <corvus> "http2Server.HandleStreams received bogus greeting from client:" in the jaeger log looks suspicious
[16:24] <fungi> indeed, maybe creds aren't quite right?
[16:25] <corvus> yeah, or wrong protocol or something?
[16:25] <clarkb> used EHLO instead of Hello
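One common way to trigger grpc-go's "received bogus greeting from client" is a TLS/plaintext mismatch. A gRPC server expects the HTTP/2 client connection preface (`PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n`, from RFC 7540) as the first bytes and gets something else, such as a TLS ClientHello. That this was the cause here is an assumption, although the fix that followed is titled "Correct OTLP TLS configuration in jaeger". Below is a heuristic sketch that classifies what a client sent first. The function name is hypothetical.

```python
HTTP2_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"


def classify_first_bytes(data: bytes) -> str:
    """Guess what a client sent first on a connection (heuristic sketch)."""
    if data.startswith(HTTP2_PREFACE):
        return "http2-preface"  # what a plaintext gRPC server expects
    # TLS record header: content type 0x16 (handshake), major version 0x03.
    if len(data) >= 3 and data[0] == 0x16 and data[1] == 0x03:
        return "tls-handshake"
    # A plain HTTP/1.x request line, e.g. an HTTP client hitting the gRPC port.
    if data.split(b" ", 1)[0] in (b"GET", b"POST", b"PUT", b"HEAD"):
        return "http1-request"
    return "unknown"
```

The bytes to feed it can come from the kind of packet capture corvus takes next with tcpdump. The classification only holds while the traffic is still unencrypted.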
[16:26] <corvus> hah
[16:31] <corvus> tcpdump sees traffic from the schedulers -> jaeger
[16:34] *** jpena is now known as jpena|off
[16:35] *** dviroel|lunch is now known as dviroel
[16:37] <corvus> oh i think i see the problem.  will prepare a change
[16:41] <opendevreview> James E. Blair proposed opendev/system-config master: Correct OTLP TLS configuration in jaeger  https://review.opendev.org/c/opendev/system-config/+/859650
[16:41] <corvus> clarkb: fungi ^ what do you think about me manually applying that fix real quick and restarting?
[16:42] <corvus> (also gee willikers this thing has a lot of servers and ports)
[16:43] <clarkb> corvus: I think that's fine. I guess jaeger has multiple ways of collecting traces?
[16:43] <corvus> yes so many ways
[16:43] <corvus> it's a merger of several projects
[16:43] <corvus> zuul exports using opentelemetry (otlp) which is the new hot universal standard
[16:44] <corvus> okay, i manually applied/restarted with that.  now just waiting for a new buildset to start.
[16:48] <corvus> "cool".  that stopped the errors in the log.  but i still don't see any data. (and i do still see tcp traffic)
[16:48] <clarkb> the arm64 rocky 9 image update is deploying now
[16:50] <corvus> there seem to be some broken pipe tcp close errors in the log.  not sure what that signifies.
[16:50] <corvus> (ie, not sure if it's important)
[16:52] <corvus> i'm going to write a simple python script to emit a dummy trace and run it from a zuul server
[17:02] <corvus> ah, the next problem is that the internal zk-ca cert that i made was for tracing01.opendev.org but i told zuul to connect to tracing.opendev.org
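The problem corvus describes is ordinary TLS hostname verification. The client compares the name it dialed against the DNS names in the server certificate, so a certificate issued only for tracing01.opendev.org is rejected when the client connects to tracing.opendev.org. Below is a simplified, hypothetical sketch of that check: exact match plus a single full-label wildcard in the leftmost position. It ignores IP addresses and internationalized names and is no substitute for the ssl library's own verification. The channel doesn't say whether 859654 added a name or replaced the old one.

```python
from typing import Iterable


def name_matches(hostname: str, cert_names: Iterable[str]) -> bool:
    """Return True if hostname matches any certificate DNS name (simplified)."""
    host = hostname.lower().rstrip(".")
    host_labels = host.split(".")
    for pattern in cert_names:
        pat = pattern.lower().rstrip(".")
        if pat == host:
            return True
        pat_labels = pat.split(".")
        # Only "*.rest.of.name" is allowed; the wildcard covers exactly one
        # label and is not permitted directly under a single-label suffix.
        if (
            pat_labels[0] == "*"
            and len(pat_labels) > 2
            and len(pat_labels) == len(host_labels)
            and pat_labels[1:] == host_labels[1:]
        ):
            return True
    return False


print(name_matches("tracing.opendev.org", ["tracing01.opendev.org"]))
```

Covering both names in the certificate, or pointing the client at the name the certificate already has, would each make the check pass.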
[17:05] <opendevreview> James E. Blair proposed opendev/system-config master: Correct internal tracing server cert name  https://review.opendev.org/c/opendev/system-config/+/859654
[17:06] <corvus> clarkb: fungi ^ can you review that and parent?  i think we'll want to let the deployment job take care of that before we try again.
[17:10] <clarkb> done
[17:11] <clarkb> I've been tailing the launcher log on nl04 since the config update and there are no tracebacks. I think that side of things is happy.
[17:12] <clarkb> the new image has been building for about 20 minutes, still too early to say if it is happy
[17:38] <fungi> sorry, had to pop out for an errand, back and reviewing
[17:39] <opendevreview> James E. Blair proposed opendev/system-config master: Correct internal tracing server cert name  https://review.opendev.org/c/opendev/system-config/+/859654
[17:39] <corvus> testing caught an issue in the role ^
[17:39] <fungi> lgtm
[17:40] <fungi> both lgtm
[18:13] <opendevreview> Merged opendev/system-config master: Correct OTLP TLS configuration in jaeger  https://review.opendev.org/c/opendev/system-config/+/859650
[18:29] <opendevreview> Merged opendev/system-config master: Correct internal tracing server cert name  https://review.opendev.org/c/opendev/system-config/+/859654
[19:37] <corvus> i restarted jaeger
[19:47] *** rlandy is now known as rlandy|biab
[19:54] *** dviroel is now known as dviroel|walk
[19:58] <fungi> 858372 merged for zuul and the promote succeeded, so if we want to pull new images on the schedulers and restart them, we could do that now
[19:59] <corvus> nah, not worth it yet; we should be getting at least one trace now, but it's still not working
[19:59] <fungi> ah, the comment about span emitting merging before the last restart was you saying that we didn't need to bother
[19:59] <fungi> got it. thanks!
[20:25] <corvus> i turned on debug logging in the collector, and it says it's writing spans to storage...
[20:25] <corvus> so apparently the error at this point is internal to jaeger?
[20:25] <corvus> {"level":"debug","ts":1664396701.7566938,"caller":"app/span_processor.go:164","msg":"Span written to the storage by the collector","trace-id":"214600fcab74aecbda1d019c4f0a4c69","span-id":"0a82c64ff47f95c8"}
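The collector's debug output is one JSON object per line, and `ts` is a Unix timestamp in seconds. Below is a small sketch that pulls out the IDs and converts the timestamp to UTC, which makes it easy to line a span up with the channel. This line converts to 20:25:01 UTC, matching the time it was pasted. The field names come from the line itself; nothing else about jaeger's log schema is assumed, and `summarize` is a hypothetical name.

```python
import json
from datetime import datetime, timezone

LINE = '{"level":"debug","ts":1664396701.7566938,"caller":"app/span_processor.go:164","msg":"Span written to the storage by the collector","trace-id":"214600fcab74aecbda1d019c4f0a4c69","span-id":"0a82c64ff47f95c8"}'


def summarize(line: str) -> tuple:
    """Return (UTC datetime, trace id, span id) from one collector log line."""
    record = json.loads(line)
    when = datetime.fromtimestamp(record["ts"], tz=timezone.utc)
    return when, record["trace-id"], record["span-id"]


when, trace_id, span_id = summarize(LINE)
print(when.strftime("%Y-%m-%d %H:%M:%S"), trace_id, span_id)
```

The trace ID can then be looked up under the `https://tracing.opendev.org/trace/<trace-id>` form of URL that shows up once traces are visible.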
[20:40] *** rlandy|biab is now known as rlandy
[20:40] <corvus> there's a metrics endpoint on the admin server: jaeger_spans_received_total{debug="false",format="proto",svc="zuul",transport="grpc"} 108
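The admin server exposes metrics in the Prometheus text exposition format, so a counter like this can be checked from a script. A nonzero value confirms spans are reaching the collector. Below is a deliberately simplified parser sketch: one sample per line, no trailing timestamp, and no escaped characters in label values. `parse_sample` is a hypothetical helper; a real tool should use a proper Prometheus client parser.

```python
import re
from typing import Dict, Tuple

# metric_name, optional {label="value",...} block, whitespace, then the value.
SAMPLE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$")
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_sample(line: str) -> Tuple[str, Dict[str, str], float]:
    """Parse one 'name{labels} value' line (simplified: no escapes, no timestamp)."""
    m = SAMPLE.match(line.strip())
    if m is None:
        raise ValueError(f"not a sample line: {line!r}")
    name, raw_labels, value = m.groups()
    labels = dict(LABEL.findall(raw_labels or ""))
    return name, labels, float(value)


print(parse_sample('jaeger_spans_received_total{debug="false",format="proto",svc="zuul",transport="grpc"} 108'))
```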
[20:44] *** dviroel|walk is now known as dviroel
[20:45] <clarkb> fungi: re mm3 https://etherpad.opendev.org/p/O6e6Quoe_jKivcj-vEXN I could send something like that. Does it make sense to send it to the mailman users list, or perhaps directly to maxking?
[20:46] <clarkb> corvus: could it be a permissions thing to view spans? basically they are all there but we can't see them because we don't have sufficient access?
[20:49] <corvus> i don't think there's any internal access controls; it's sort of like it's not reading them from the storage backend
[20:51] <corvus> i do see traces in strings on the on-disk storage
[21:18] <corvus> okay i just restarted it using the in-memory db and it's working, so there's definitely something about the badger storage that's wrong
[21:19] <clarkb> interesting
[21:19] <corvus> aha, it's the ttl i set
[21:19] <corvus> apparently 30d breaks it
[21:20] <corvus> i'm trying it with 720h now
[21:20] <corvus> that looks like it might be working
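A likely explanation for 30d versus 720h, stated as an assumption: jaeger is written in Go, and if the TTL is read with Go duration syntax (`time.ParseDuration`), only the units ns, us/µs, ms, s, m and h are valid. There is no day unit, while 720h expresses the same 30 days in hours. The channel doesn't show how jaeger reacted to the unparseable value. The sketch below reimplements those Go unit rules in Python purely for illustration; `parse_go_duration` is a hypothetical name, not a jaeger or Go API.

```python
import re

# Unit table following Go's time.ParseDuration; note there is no "d" unit.
_UNITS = {
    "ns": 1e-9, "us": 1e-6, "\u00b5s": 1e-6, "\u03bcs": 1e-6,
    "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0,
}
# Longest units first so that "ms" is not read as "m" followed by "s".
_UNIT_ALT = "|".join(sorted(_UNITS, key=len, reverse=True))
_COMPONENT = re.compile(r"(\d+(?:\.\d*)?|\.\d+)(" + _UNIT_ALT + ")")


def parse_go_duration(text: str) -> float:
    """Return seconds for a Go-style duration string; raise ValueError otherwise."""
    s = text
    sign = 1.0
    if s and s[0] in "+-":
        sign = -1.0 if s[0] == "-" else 1.0
        s = s[1:]
    if s == "0":
        return 0.0
    if not s:
        raise ValueError(f"invalid duration {text!r}")
    total = 0.0
    pos = 0
    while pos < len(s):
        m = _COMPONENT.match(s, pos)
        if m is None:
            raise ValueError(f"invalid duration or unknown unit in {text!r}")
        total += float(m.group(1)) * _UNITS[m.group(2)]
        pos = m.end()
    return sign * total


print(parse_go_duration("720h") / 86400)  # 720 hours expressed in days
```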
[21:23] <opendevreview> James E. Blair proposed opendev/system-config master: Fix jaeger badger config and uid  https://review.opendev.org/c/opendev/system-config/+/859729
[21:24] <corvus> clarkb: fungi ^ that syncs the repo with what i have done on disk
[21:26] <corvus> https://tracing.opendev.org/trace/73f394c92d9fbe1943178eabfa6da6b1 exists
[21:26] <corvus> lol a project name would be a good tag to add :)
[21:29] <clarkb> corvus: and maybe the event id so that we can cross correlate with logs if necessary
[21:29] <clarkb> oh wait it is there nevermind
[21:29] <clarkb> :)
[21:30] <clarkb> I have to expand the listing of tags to see it
[21:31] <corvus> yeah; and you can search for it like: https://tracing.opendev.org/search?end=1664400672875000&limit=20&lookback=1h&maxDuration&minDuration&service=zuul&start=1664397072875000&tags=%7B%22zuul_event_id%22%3A%2226e28d1c1d234ffd887d95c86d47457d%22%7D
[21:31] *** dasm is now known as dasm|off
[21:32] <corvus> (by using the "tags" field in the search box)
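The search link above is just query parameters. `start` and `end` are Unix timestamps in microseconds, exactly one hour apart to match `lookback=1h`, and `tags` is URL-encoded compact JSON. Below is a minimal sketch that rebuilds such a link for a given `zuul_event_id`. The parameter names are copied from corvus's URL. Dropping the empty `maxDuration`/`minDuration` parameters assumes they are optional, and `event_search_url` is a hypothetical helper.

```python
import json
from urllib.parse import urlencode

BASE = "https://tracing.opendev.org/search"


def event_search_url(event_id: str, end_us: int, lookback_s: int = 3600, limit: int = 20) -> str:
    """Build a jaeger UI search link filtered on the zuul_event_id tag."""
    if lookback_s <= 0 or lookback_s % 3600:
        raise ValueError("this sketch only renders whole-hour lookbacks")
    params = {
        "end": end_us,
        "limit": limit,
        "lookback": f"{lookback_s // 3600}h",
        "service": "zuul",
        "start": end_us - lookback_s * 1_000_000,  # microseconds
        # Compact separators reproduce the encoding seen in the original link.
        "tags": json.dumps({"zuul_event_id": event_id}, separators=(",", ":")),
    }
    return f"{BASE}?{urlencode(params)}"


print(event_search_url("26e28d1c1d234ffd887d95c86d47457d", 1664400672875000))
```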
[21:37] *** dviroel is now known as dviroel|out
[21:41] <fungi> clarkb: very minor typo corrected on your draft e-mail to the mailman list, but lgtm. thanks!
[21:42] <clarkb> fungi: do you think it is better to use the public mailing list or try to find email for the maintainer (I'm assuming it's in git repos)
[21:45] <fungi> i don't see why the public ml would be a problem
[21:45] <fungi> after all, it's a mailing list for a project making mailing list software. you'd think if anyone would be comfortable discussing things on a public ml, it's them
[21:58] <clarkb> I signed up and got moderated anyway
[21:58] <clarkb> But email is sent, hopefully it will get forwarded to subscribers soon enough
[22:00] <fungi> there's probably a feature to moderate first-time posters or a waiting period after account creation
[22:16] <clarkb> https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/thread/FVZW5DQJ7C3TW4LPIIU7ARI7XMVJYYWX/
[22:38] <opendevreview> Merged opendev/system-config master: Fix jaeger badger config and uid  https://review.opendev.org/c/opendev/system-config/+/859729
[22:46] <clarkb> the rocky 9 arm64 image did end up going ready eventually
[22:46] <clarkb> so I think that is done until mnasiadka gives it a go
[22:57] <fungi> excellent
