Friday, 2023-04-28

opendevreviewMerged opendev/system-config master: launch: fix RAX rdns command-line tool  https://review.opendev.org/c/opendev/system-config/+/88078500:00
opendevreviewMerged opendev/system-config master: reprepro doc: mention contents.cache.db  https://review.opendev.org/c/opendev/system-config/+/85753000:32
opendevreviewMerged opendev/system-config master: doc/nodepool: update vhd-util docs  https://review.opendev.org/c/opendev/system-config/+/86962300:33
opendevreviewMerged opendev/system-config master: grafana: pull the grafyaml image before running  https://review.opendev.org/c/opendev/system-config/+/85206700:57
opendevreviewMerged opendev/system-config master: logrotate: don't use filename to generate config file  https://review.opendev.org/c/opendev/system-config/+/87348100:57
ianwhttps://static.opendev.org/project/tarballs.opendev.org/openstack/openstack-zuul-jobs/openafs/centos9-stream/RPMS/x86_64/ still doesn't have the 1.8.9 rpm files i expected it to have01:48
ianwls -c1 /afs/openstack.org/project/tarballs.opendev.org/openstack/openstack-zuul-jobs/openafs/centos9-stream/RPMS/x86_64  | wc -l01:52
ianw4301:52
ianwls -c1 /afs/.openstack.org/project/tarballs.opendev.org/openstack/openstack-zuul-jobs/openafs/centos9-stream/RPMS/x86_64  | wc -l01:52
ianw6301:52
opendevreviewMerged opendev/zone-opendev.org master: Remove old nameservers  https://review.opendev.org/c/opendev/zone-opendev.org/+/88070901:53
opendevreviewMerged opendev/zone-zuul-ci.org master: Remove old nameservers  https://review.opendev.org/c/opendev/zone-zuul-ci.org/+/88091001:53
ianwUpdating existing ro volume 536871090 on afs02.dfw.openstack.org ...01:56
ianwStarting ForwardMulti from 536871090 to 536871090 on afs02.dfw.openstack.org (as of Sun Apr 23 02:22:36 2023).01:56
ianwi'm running it again.  maybe it only synced to when the volume was locked previously?01:56
opendevreviewIan Wienand proposed opendev/system-config master: Remove old DNS servers  https://review.opendev.org/c/opendev/system-config/+/88071002:10
fungiianw: yes, normally the first new run redoes the previous failed serial, then the next catches it up02:26
ianwi don't think i realised that tidbit.  anyway, it's released (again) and looks in sync now05:06
ianwi'll drop the root shell and emergency etc.05:06
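The two `ls -c1 ... | wc -l` counts earlier in the log differ (43 under /afs/openstack.org, 63 under /afs/.openstack.org). By the usual AFS convention the dotted path reads the read-write volume, while the plain path reads the read-only replica, which only catches up once a release completes. A minimal Python sketch of the same check follows; it lists the entries the read-only side is still missing. The function name `missing_from_readonly` is hypothetical and is not part of any OpenDev tooling.

```python
import os


def missing_from_readonly(ro_dir, rw_dir):
    """Return sorted names present in rw_dir but absent from ro_dir.

    ro_dir and rw_dir are assumed to be the read-only and read-write
    views of the same directory (for example the /afs/openstack.org/...
    and /afs/.openstack.org/... paths from the log).
    """
    ro_names = set(os.listdir(ro_dir))
    rw_names = set(os.listdir(rw_dir))
    return sorted(rw_names - ro_names)
```

An empty result would suggest the replica is in sync at the directory level; it does not compare file contents.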
opendevreviewIan Wienand proposed opendev/system-config master: haproxy-statsd: add something at startup  https://review.opendev.org/c/opendev/system-config/+/88179406:08
opendevreviewMerged opendev/system-config master: openafs-client: get logs better  https://review.opendev.org/c/opendev/system-config/+/88152806:20
*** gthiemon1e is now known as gthiemonge07:12
opendevreviewLuis Tomas Bolivar proposed openstack/project-config master: Enable pypi and github clone jobs for ovn-bgp-agent  https://review.opendev.org/c/openstack/project-config/+/88180007:35
opendevreviewyatin proposed openstack/project-config master: Temporary disable vexxhost-ca-ymq-1 provider  https://review.opendev.org/c/openstack/project-config/+/88181010:48
opendevreviewIan Wienand proposed opendev/system-config master: haproxy-statsd: add something at startup  https://review.opendev.org/c/opendev/system-config/+/88179410:54
ianw^^ some discussion in #openstack-tc about this, but it looks quite weird11:03
ianwthe mirror host is up and nothing seems out of order in terms of processes, etc.11:03
ianwhttps://4c92e32258abec426e9c-a1b3a735e9c0af824100c02f6885a5ce.ssl.cf1.rackcdn.com/881798/1/check/neutron-tempest-plugin-ovn/266a77a/job-output.txt11:03
ianwis a log.  it seems like the mirror goes between being completely accessible to not, then back again11:04
ianwipv4 & ipv6 addresses appear, so it doesn't seem to be one or the other11:05
ianwnot seeing anything in vexxhost status11:06
ianwhttps://04313aea8028a6223239-76770e8376bdcb5a12e0ef605f8b8d22.ssl.cf5.rackcdn.com/881742/3/check/neutron-tempest-plugin-designate-scenario/6f701ec/job-output.txt is another one11:14
ianwalmost the same11:15
ianw2023-04-28 10:58:19.131915 | controller | Hit:8 https://mirror.ca-ymq-1.vexxhost.opendev.org/ubuntu focal-security Release11:16
ianw2023-04-28 10:58:51.169261 | controller |   Could not connect to mirror.ca-ymq-1.vexxhost.opendev.org:443 (2604:e100:1:0:f816:3eff:fe0c:e2c0). - connect (113: No route to host) Could not connect to mirror.ca-ymq-1.vexxhost.opendev.org:443 (199.204.45.149), connection timed out11:17
ianwit gets a package list, then seconds later can't connect11:17
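To make the "then seconds later can't connect" observation measurable, here is a rough Python sketch that scans job-output lines for the last successful apt "Hit:" against a mirror and the first "Could not connect" that follows, and reports the gap in seconds. The line layout ("timestamp | node | message") is taken from the two log lines quoted above; the helper names `parse_line` and `seconds_from_last_hit_to_first_failure` are hypothetical.

```python
import re
from datetime import datetime

# Layout assumed from the quoted lines: "<timestamp> | <node> | <message>"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \| (\S+) \| (.*)$"
)


def parse_line(line):
    """Split one job-output line into (timestamp, node, message), or None."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
    return stamp, match.group(2), match.group(3)


def seconds_from_last_hit_to_first_failure(lines, mirror):
    """Seconds between the last apt 'Hit:' for mirror and the first
    'Could not connect' after it, or None if no such pair is found."""
    last_hit = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        stamp, _node, message = parsed
        if mirror not in message:
            continue
        if message.lstrip().startswith("Hit:"):
            last_hit = stamp
        elif "Could not connect" in message and last_hit is not None:
            return (stamp - last_hit).total_seconds()
    return None
```

Run over several failing job logs, a consistently short gap would support the reading that connectivity drops mid-job rather than being absent from the start.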
ralonsohianw, hi, thanks for checking that. guilhermesp_____ (not in this channel) was checking that11:19
ralonsohbut we didn't receive any feedback yet11:19
ianwi would suspect the job, but nothing really happens between the "apt-get update" and the install of the packages; and the node doesn't drop off the network11:20
ianwmnaser: ^ any thoughts?!11:21
gthiemongeianw: ralonsoh: the Octavia gates are also blocked by this issue11:21
ralonsohright11:22
ianwhttps://zuul.openstack.org/build/db2df31a57594568a23aab551586023f is an octavia job doing the same11:23
ianwyeah, was just looking through the failure list for something ! neutron11:24
ralonsohyes, octavia is using the same virt nested nodes11:24
ykarelianw, from quite sometime compute nodes are being upgraded in vexxhost-ca-ymq-1 to fix a nested-virt issue https://bugs.launchpad.net/neutron/+bug/1999249/comments/311:28
ykareland the nodes that are impacted by mirror issue also matches those to-be-upgraded(Bad Nodes ^) host list11:29
ykarelso seems some fix(which is available in other nodes) is missed during the upgrade11:29
ykareljust assuming based on the available data, infra guys may have more details about the history here11:31
ianw... that bug seems to refer to things randomly hanging, which sounds more plausible for nested virt issues than networking somehow stopping at a first glance11:34
ianwbut i don't know.  i tried logging into a running ca-ymq-1 nested virt node and it was pinging the mirror fine11:35
ianwand as noted the mirror node itself doesn't seem unhappy with anything11:36
ianwi think we probably need vexxhost to look behind what we can see11:36
ykarelianw, the node you logged in was booted on one of the compute node listed above?11:38
ianwyes, just a random one11:40
ykareldo you have the host_id for that node?11:42
ianwi've just checked a few running ones.  some of them are clearly doing devstack and are past the point of installing things11:42
ykarelit's possible they are good ones then, as bad ones fails during devstack setup11:43
ianwone is installing stuff now11:45
ianwthough it has no ipv611:45
opendevreviewMerged openstack/project-config master: Enable pypi and github clone jobs for ovn-bgp-agent  https://review.opendev.org/c/openstack/project-config/+/88180012:02
opendevreviewMerged openstack/project-config master: Indent Gerrit ACL options  https://review.opendev.org/c/openstack/project-config/+/87990612:02
opendevreviewMerged openstack/project-config master: tools/normalize_acl.py: Add some human readable output  https://review.opendev.org/c/openstack/project-config/+/88089812:02
opendevreviewyatin proposed openstack/project-config master: Temporary disable nested-virt labels in vexxhost-ca-ymq-1  https://review.opendev.org/c/openstack/project-config/+/88181012:12
dpawlikhello folks o/  dansmith, fungi: I did not catch that you reply me on 17.04.2023 for the question related to the performance.json file12:22
dpawlikI can skip that file, when the value is wrong but...12:22
dpawlik as I mentioned, this problem is just in project: x/networking-opencontrail12:23
dpawlikthere is some periodic pipeline related to that project12:23
dpawlikis it still used or it can be disabled?12:23
dpawlikhttps://review.opendev.org/c/x/networking-opencontrail/+/88182012:27
dpawlikah, this project got some cores. Ignore review for that PS.12:29
fungiianw: when looking into this previously, i also confirmed logs on both the mirror and collected from the test node don't indicate stray routes being temporarily added or removed either12:31
fungiand someone suggested this coincided with the switch from focal to jammy? so could be kernel-related i guess12:32
*** amoralej is now known as amoralej|lunch12:47
opendevreviewMerged openstack/project-config master: Temporary disable nested-virt labels in vexxhost-ca-ymq-1  https://review.opendev.org/c/openstack/project-config/+/88181013:05
dpawlikfungi: do you know if the core reviewers can push the code directly to the repo?  Disabling the periodic jobs https://review.opendev.org/c/x/networking-opencontrail/+/881820 requires to fix many things...13:05
dpawlikor set voting: false to all jobs..13:06
fungidpawlik: disabling broken jobs seems reasonable to me, but that's probably up to the maintainers of that project if they're still active. if they're not, the opendev sysadmins might consider removing their job configuration entirely in order to stop them wasting resources13:12
fungiit's not something we've done in the past, so we'd need to talk through what sort of precedent that sets and what we want our policy for that sort of thing to be going forward13:13
fungiinfra-root: ^ opinions on that are welcome13:13
funginote that merging a change which replaces all their project-pipelines with just check and gate runs of the built-in noop job should be mergeable normally by zuul, if it's not then it's because we've also got jobs added via the project-config repo which will need to be removed first13:16
Clark[m]I think there are two separate concerns here. The first is dpawlik's where the project generates files that can't be indexed properly. The indexer should either learn to ignore them or figure out how to index the files in some way. Then on the OpenDev side it is what do we do about dead projects running jobs/having broken jobs. In the case of x/windmill repos we simply removed them from the zuul projects list. I don't think we need to do13:20
Clark[m]anything more in depth than that.13:20
dpawlik"The indexer should either learn to ignore them or figure out how to index the files in some way" - yeah, will do a patch for that to avoid such issues, but from the other side, if project is dead or no activity for one year, such periodic jobs that for sure would fail does not make sense.13:32
dpawlikthank you folks for reply, will do a patch13:32
Clark[m]dpawlik in the old system we had a way to exclude specific jobs. Usually this was necessary because the jobs would create massive log files that we couldn't index in a reasonable amount of time13:41
Clark[m]Sometimes we did it because the log format was broken13:41
*** amoralej|lunch is now known as amoralej13:44
fungiyeah, removal from tenants seems like the best approach if there's an abandoned project still running jobs and wasting resources13:54
fungii agree that's even simpler than the noop job change approach13:54
fungiand avoids tampering with the content of the repository itself13:55
*** dviroel_ is now known as dviroel14:30
clarkband it is easy to add it back in should people show up looking to make things work15:06
clarkbslaweq: ykarel fungi ianw I feel like the vexxhost nested virt stuff is almost certainly going to be a cloud issue because no other regions experience this and the mirror node reports all is well15:07
clarkband there is a high probability that it is a neutron issue in the cloud :) now we just need to make dogfooding work15:08
fungiyes15:09
clarkbfungi: have time for https://review.opendev.org/c/opendev/system-config/+/881682 ? to fix gerrit theme on 3.8?15:11
fungiguilhermesp: mnaser: the neutron folks have been observing random internal routing issues between some test nodes in ca-ymq-1 and our mirror server in the same network there, like packets intermittently not making it between hosts there (and often very early in jobs when there's not much besides package downloads going on). they put together a list of the host_id hashes seen for nodes that15:11
fungiexhibited the problem: https://paste.opendev.org/show/bCbxIrXR1P01q4JYUh3i/15:11
clarkbfungi: did we end up deciding whether or not we want ipv6 glue records for opendev nameservers?15:15
fungii don't think they're critical, do you know if we had them before we switched?15:16
clarkbI don't know15:16
fungibut also we didn't ever get reverse dns working for the ns04 v6 addy15:17
clarkbah15:17
clarkbfungi: re gite links in gerrit 3.8 I think the thing that changed was an internal interface that updated affecting plugins used to set web links. We don't use a plugin for that and they would have updated the internal interfaces for gitweb stuff we do use15:18
clarkbWe should still test it but I think this one is a noop for us15:18
fungioh, that helps15:18
fungiit wasn't clear to me that what we're doing doesn't use a polygerrit plugin15:19
clarkbbasically the gerrit official stuff appears to have been updated even if it was in a plugin. But then there may be third party plugins that do similar which they warn about15:19
fungigot it15:20
fungiso shouldn't impact our configuration15:20
clarkbya I don't think so. But we should definitely check it since something may have been missed15:21
*** amoralej is now known as amoralej|off15:48
opendevreviewMerged opendev/system-config master: gerrit: update OpenDev theme CSS installation  https://review.opendev.org/c/opendev/system-config/+/88168216:42
opendevreviewMerged opendev/base-jobs master: Run ensure-quay-repo in our base container jobs  https://review.opendev.org/c/opendev/base-jobs/+/88152216:45
dansmithis something up with the gate or is it just busy? >100 things in the queue is a lot for a friday16:57
clarkbI see 10 in the gate17:02
dansmithsorry I don't mean the gate queue specifically17:02
clarkbpicking random jobs that are in progress their console logs seem to show they are progressing with recent timestamps17:03
dansmiththere are some things that have been in there for >4hr and have "paused" jobs.. I dunno what that means17:03
clarkblooks like starlingx also just pushed a bunch of changes17:03
dansmithI just noticed because I submitted a few things and even after 45m no jobs have even started17:03
clarkbI think this is mostly demand and a large number of changes from starlingx showing up all at once17:03
clarkbZuul allows you to pause execution of a job so that it can provide resources to other jobs. This is commonly used to host container images (which is what the tripleo examples you see are doing)17:04
clarkbthe content-provider jobs build and host container images then a bunch of the other jobs fetch and run those containers17:04
dansmithokay I just hadn't seen that before17:05
fungiyou can see the node requests are climbing too https://grafana.opendev.org/d/21a6e53ea4/zuul-status?orgId=117:06
clarkbrax-iad has a backlog of deleting nodes, but that is something we've been fighting for a while now so not new17:06
fungiwe've been maxed out on available capacity for the past ~4 hours17:06
dansmithyeah, just very different suddenly from yesterday and somewhat unusual for a friday so I thought maybe we had another thing that was making everything fail17:07
dansmithokay the top nova job just started running tests, 88 minutes after enqueue.. whew.17:29
*** JayF is now known as Guest1244418:27
*** JasonF is now known as JayF18:27
opendevreviewMarcos Paulo Oliveira Silva proposed openstack/project-config master: Add Node Feature Discovery FluxCD app to StarlingX  https://review.opendev.org/c/openstack/project-config/+/88188318:41
opendevreviewMarcos Paulo Oliveira Silva proposed openstack/project-config master: Add Node Feature Discovery FluxCD app to StarlingX  https://review.opendev.org/c/openstack/project-config/+/88188318:43
opendevreviewMarcos Paulo Oliveira Silva proposed openstack/project-config master: Add Node Feature Discovery FluxCD app to StarlingX  https://review.opendev.org/c/openstack/project-config/+/88188319:40
opendevreviewMarcos Paulo Oliveira Silva proposed openstack/project-config master: Add Node Feature Discovery FluxCD app to StarlingX  https://review.opendev.org/c/openstack/project-config/+/88188319:47
*** dmellado4 is now known as dmellado20:22
fungiwe caught back up on node requests a few hours ago, btw21:21
fungiaround 18z21:22
opendevreviewMarcos Paulo Oliveira Silva proposed openstack/project-config master: Add Node Feature Discovery FluxCD app to StarlingX  https://review.opendev.org/c/openstack/project-config/+/88188321:48
opendevreviewClark Boylan proposed zuul/zuul-jobs master: Skip quay repo creation if necessary info is missing  https://review.opendev.org/c/zuul/zuul-jobs/+/88189322:09
opendevreviewMerged zuul/zuul-jobs master: Skip quay repo creation if necessary info is missing  https://review.opendev.org/c/zuul/zuul-jobs/+/88189322:22
