Wednesday, 2023-09-06

mnasiadkamorning07:47
mnasiadkahttps://meetings.opendev.org/#Containers_Team_Meeting - where can I change the url it's pointing to? Magnum team has moved to using "magnum" as the meeting name, so that link is pointing to very old archives07:47
mnasiadka(I mean url for meeting logs)07:47
fricklermnasiadka: https://opendev.org/opendev/irc-meetings/src/branch/master/meetings/containers-team-meeting.yaml07:50
mnasiadkafrickler: thanks :)07:53
jrossercould i get a hold on job openstack-ansible-deploy-aio_capi-ubuntu-jammy for change 89324009:49
fricklerjrosser: I can do that in a moment10:09
fricklercorvus: can you have a look at https://zuul.opendev.org/t/openstack/build/52f01074b2eb487993ede049d858a660 please? nova certainly does have a master branch, doesn't it?10:10
fricklernote that this is for stable/pike and I'm deleting the whole .zuul.yaml there now anyway, I'm just not sure how to interpret that error10:11
fricklerjrosser: done, recheck triggered10:14
fricklerelodilles: infra-root: I've done https://review.opendev.org/c/openstack/blazar/+/893846 now, but (of course) now with no job running at all it also cannot be merged. would you prefer adding a noop job like we do for newly created projects or just force merge?10:37
elodillesfrickler: if noop does the trick, then it's OK to me10:51
jrosserfrickler: thankyou10:54
opendevreviewAlex Kavanagh proposed openstack/project-config master: Add charms-purestorage group for purestorage charms  https://review.opendev.org/c/openstack/project-config/+/89337711:36
fungiwe'll probably be able to take advantage of this by the time we get mailman auth wired up to our keycloak server: https://github.com/pennersr/django-allauth/commit/ab70468 (that's just merged to the auth lib we're using)11:51
*** dviroel_ is now known as dviroel12:27
*** dhill is now known as Guest202512:32
*** d34dh0r5- is now known as d34dh0r5312:33
opendevreviewAlex Kavanagh proposed openstack/project-config master: Add the Pure Storage Flashblade charm-manila subordinate  https://review.opendev.org/c/openstack/project-config/+/89391013:08
opendevreviewAlex Kavanagh proposed openstack/project-config master: Add the Pure Storage Flashblade charm-manila subordinate  https://review.opendev.org/c/openstack/project-config/+/89391013:13
opendevreviewAlex Kavanagh proposed openstack/project-config master: Switch charm-cinder-purestorage acl  https://review.opendev.org/c/openstack/project-config/+/89391213:16
opendevreviewAlex Kavanagh proposed openstack/project-config master: Add the Pure Storage Flashblade charm-manila subordinate  https://review.opendev.org/c/openstack/project-config/+/89391013:22
opendevreviewHarry Kominos proposed openstack/diskimage-builder master: feat: Add new fail2ban elemenent  https://review.opendev.org/c/openstack/diskimage-builder/+/89254113:25
fungicorvus: whenever you have a moment, no urgency at all, but this is an example of a leaked image in iad from yesterday. oddly, i can't seem to find a mention of the image name in either of our builders: https://paste.opendev.org/show/bIzGJD8QKpW13Dqg4LpH/13:27
fungithere's a list of 24 leaked image uuids in ~fungi/iad.leakedimages on bridge13:29
fungiin case you need more examples13:29
fungioh, right, because the image names the builders log are in the new build id suffix format rather than the serial suffix that gets uploaded13:35
fungii'm not sure how to reverse the serial suffix name to a build id in order to find the exact upload attempts13:35
fungisince the upload gives up before it gets a uuid for the image13:37
fungiand the builder no longer mentions the serial at all13:37
corvusfrickler: bug in zuul.  fix in  https://review.opendev.org/c/zuul/zuul/+/89392514:02
fricklercorvus: ah, cool, thx14:05
fricklermeh, I fixed blazar, now zuul complains about blazar-dashboard https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/891628 . corvus is there a chance zuul could report all possibly affected projects at once?14:06
opendevreviewMerged openstack/project-config master: Reduce frequency of image rebuilds  https://review.opendev.org/c/openstack/project-config/+/89358814:09
opendevreviewBernhard Berg proposed zuul/zuul-jobs master: prepare-workspace-git: Add ability to define synced pojects  https://review.opendev.org/c/zuul/zuul-jobs/+/88791714:10
corvusfrickler: that could get very large; it only reports the first to avoid leaving multi-GB messages.  normally i'd suggest looking at codesearch, but i guess these are on unindexed branches?14:11
opendevreviewMerged openstack/project-config master: Add charms-purestorage group for purestorage charms  https://review.opendev.org/c/openstack/project-config/+/89337714:17
corvusfungi: well, that sure doesn't look like it has nodepool metadata attached to it.  also, neither does any other image in rackspace.14:31
corvusoh wait i take that back14:31
corvusi see  nodepool_build_id='5fefa50bae64483b9e6f8b6a21664758', nodepool_provider_name='rax-dfw', nodepool_upload_id='0000000001' in one of the other images14:32
corvusbut i don't see those in the paste14:32
corvusso it does look like they were not added to the leaked image14:32
fungii expect that metadata gets attached to images after they're imported, so if the sdk gives up waiting for the image to show up then it never gets around to attaching the metadata to it? but i'm really not familiar enough with the business logic in the sdk to know for sure. it could also just be some sort of internal timeout between services inside rackspace responsible for handing off the14:36
fungimetadata, i suppose14:36
fungithanks for looking, corvus! i wonder if we can more conveniently script something that just checks our tenant for private images lacking nodepool metadata and clean those up directly, rather than comparing uuids between lists from zk and glance14:38
fungilooks like `openstack image list` allows filtering by property (key=value), so might be able to fetch a list of them that way14:40
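(A rough sketch of that cleanup idea, assuming openstacksdk surfaces the nodepool_* upload keys in image.properties; the cloud name, the dry-run handling, and the property lookup are illustrative, not the eventual script:)

    import openstack

    # Dry-run pass over the tenant's private images; deletes nothing.
    conn = openstack.connect(cloud='rax')  # hypothetical clouds.yaml entry name

    leaked = []
    for image in conn.image.images(visibility='private'):
        # Assumption: the SDK exposes the nodepool_* metadata in
        # image.properties; successful nodepool uploads always carry
        # nodepool_build_id (see the example metadata quoted below).
        props = dict(image.properties or {})
        if 'nodepool_build_id' not in props:
            leaked.append(image)

    for image in leaked:
        print('candidate leak:', image.id, image.name)
        # conn.image.delete_image(image)  # uncomment only after manual review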
corvusfungi: https://opendev.org/openstack/openstacksdk/src/branch/master/openstack/image/v2/_proxy.py#L75514:42
corvusyes, it looks like when using the v2 task api, the sdk attaches image metadata after the image is successfully imported.14:42
corvusi think that's the api that is relevant here?14:42
fungii agree14:43
fungii guess a longer timeout in the image create call would counter that to some extent14:43
corvusdo you know where the task api is documented?14:43
fungihah14:43
fungisorry, it's a running joke14:43
fungithe task api hails from the bad old days of "vendor extensible apis" so it can be/do just about anything the provider wants14:44
corvusso i guess there would be "rackspace task api documentation" then?14:44
fungias i understand it, yes14:45
corvusi'm wondering if it would be valid to include the metadata in https://opendev.org/openstack/openstacksdk/src/branch/master/openstack/image/v2/_proxy.py#L73214:45
corvusalternatively, i wonder if glance_task.result['image_id'] is available and valid in the case of an exception, and could be used to attach metadata in the exception handler14:46
fungii think this might be the current api doc for tasks, but of course rackspace is also running something ancient which may not be entirely glance either: https://docs.openstack.org/api-ref/image/v2/index.html#tasks14:47
fungii've observed test uploads to their glance, and the basic process is that the image create call returns an "import" task id (uuid) then the task is inspected and, once it completes, its data includes the uuid of the image that was created14:48
corvushttps://docs-ospc.rackspace.com/cloud-images/v2/api-reference/image-task-operations#task-to-import-image14:49
corvuswarning:: Name is the only property that can be included in image-properties. Including any other property will cause the operation to fail.14:49
corvusso that's a no on idea #1 14:49
corvusfungi: yeah, but since it's leaving an image object around, perhaps it includes the image id in the failure case too14:50
corvusso idea #2 might work if that's the case14:50
fungiwell, we're not getting task failures. the sdk is just giving up waiting for the image to be ready14:50
fungionce the image is ready, the task object does include the uuid of the image14:50
fungiregardless of whether the sdk is still looking for it at that point14:51
fungiit's just that it might be 6 hours after the image was uploaded14:51
corvusoh14:51
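(For reference, a minimal sketch of what "idea #2" could look like, assuming the glance import task body exposes result['image_id'] once the import finally completes; the function name, polling bounds, and cloud name are made up, and the metadata keys are the ones seen on the healthy upload quoted earlier:)

    import time

    import openstack

    conn = openstack.connect(cloud='rax')  # hypothetical clouds.yaml entry name


    def attach_metadata_late(task_id, metadata, poll=300, attempts=72):
        """Keep polling a glance import task after the SDK has given up and,
        if the image eventually appears, attach the nodepool metadata so the
        image is not left looking leaked."""
        for _ in range(attempts):
            task = conn.image.get_task(task_id)
            result = task.result or {}
            image_id = result.get('image_id')
            if task.status == 'success' and image_id:
                image = conn.image.get_image(image_id)
                # Assumption: glance v2 image PATCH accepts arbitrary
                # properties, and update_image() passes unknown keys through.
                conn.image.update_image(image, **metadata)
                return image_id
            if task.status == 'failure':
                return None
            time.sleep(poll)
        return None

    # Example use, with the keys a healthy upload carries:
    # attach_metadata_late('<task uuid>', {
    #     'nodepool_build_id': '5fefa50bae64483b9e6f8b6a21664758',
    #     'nodepool_provider_name': 'rax-dfw',
    #     'nodepool_upload_id': '0000000001',
    # })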
corvusit looks like the timeout comes from the create_image call, so i think we can pass that in from nodepool14:52
fungiyes, we can. frickler made a temporary patch that hard-coded a nondefault timeout value in the builder's invocation of the sdk method14:52
fungiand afterward you indicated that adding a config option in nodepool for that would be acceptable if we wanted it14:53
fungiwe were just pursuing other avenues first before determining whether it was necessary14:53
corvuswe used to have a 6 hour timeout, but apparently that disappeared14:54
corvusprobably lost in one of the sdk refactors14:55
corvushttps://opendev.org/zuul/nodepool/src/branch/master/nodepool/builder.py#L4314:55
corvusbut there's the constant; unused.14:55
fungioh wow14:56
fungilooks like it was last used in shapshot image updates, removed by https://review.openstack.org/396719 in 2016?14:58
fungiand removed from upload waiting by https://review.openstack.org/226751 (presumably similarly set in shade back then)15:00
fungithough i'm not immediately finding it in shade's history if so15:01
fungiso anyway, i take this to mean we haven't had that 6-hour timeout in production for about 7 years15:02
fungiwasn't a recent change15:02
corvusremote:   https://review.opendev.org/c/zuul/nodepool/+/893933 Add an image upload timeout to the openstack driver [NEW]        15:05
fungithanks!15:05
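(Not the actual patch in 893933, just a sketch of the shape being discussed: threading a six-hour upload timeout from the builder into openstacksdk's cloud-layer create_image(), whose wait= and timeout= parameters exist for this; the constant name and the provider attribute are illustrative:)

    IMAGE_TIMEOUT = 6 * 60 * 60  # stand-in for the unused constant in builder.py

    def upload_image(client, provider_config, image_name, filename,
                     image_format, meta):
        # Let the provider config override the timeout, falling back to the
        # old six-hour limit instead of the SDK's much shorter default.
        timeout = getattr(provider_config, 'image_upload_timeout', IMAGE_TIMEOUT)
        return client.create_image(
            name=image_name,
            filename=filename,
            disk_format=image_format,
            container_format='bare',
            wait=True,
            timeout=timeout,
            meta=meta,  # the nodepool_* properties for successful uploads
        )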
fungiclarkb: does https://zuul.opendev.org/t/openstack/build/cfe503710fa543618da6bf513c7576ba mean that opensuse dropped the libvirt-python package? i see a python3-libvirt-python in /afs/openstack.org/mirror/opensuse/distribution/leap/15.2/repo/oss/INDEX.gz but i have no idea if i'm looking in the right place15:07
clarkbfungi: https://software.opensuse.org/package/libvirt-python "there is no official package for ALL distributions"15:14
clarkblooking at https://software.opensuse.org/search?baseproject=ALL&q=libvirt-python I think you may need different package names for different suse releases15:15
clarkbI would drop it from the bindep test15:15
opendevreviewJeremy Stanley proposed openstack/project-config master: Drop libvirt-python from suse in bindep fallback  https://review.opendev.org/c/openstack/project-config/+/89393515:19
fungiclarkb: ^ thanks!15:19
fungiapparently, git-review takes roughly 5 minutes to give up trying to reach gerrit over ipv615:20
clarkbI think that will be based on your system tcp settings?15:21
clarkbzanata didn't rotate mysql sessions overnight and the error still occurs. I see there is a wildfly systemd unit running on the server which I will restart now to see if that changes things15:22
clarkbianychoi[m]: the DB sql_mode update to the 5.6 default plus a restart of the zanata service appears to have your api request working15:24
corvusi'm going to restart the schedulers now to get the default branch bugfix15:28
fungithanks corvus!15:28
fungiclarkb: yeah, i know i can tweak the protocol fallback in the kernel, just hoping whatever the v6 routing issue is between here and there clears up soon15:29
corvusi'm restarting web too; not necessary, but just to keep schedulers ~= web15:30
fungimy outbound traceroute to the server transits eqix-ash.bdr01.12100sunrisevalleydr01.iad.beanfield.com and gets as far as 2607:f0c8:2:1::e (no reverse dns but whois puts that somewhere in beanfield as well), but the return route to me from the gerrit server seems to never escape the vexxhost border (bouncing back and forth between 2604:e100:1::1 and 2604:e100:1::2)15:32
fungiguilhermesp_____: mnaser: ^ any idea if there's a routing table issue in ca-ymq-1 which could explain that? can you see where packets for 2600:6c60:5300:375:96de:80ff:feec:f9e7 are trying to go?15:34
fungiit's been like this for at least a week, so doesn't seem to be clearing up on its own15:34
fungimaybe my isp's announcements are being filtered at the edge or by a peer15:35
fungii haven't had trouble reaching other things over ipv6 though, i can get to servers in vexxhost's sjc1 region over ipv6 with no problem, for example15:36
corvus#status log restarted zuul schedulers/web to pick up default branch bugfix15:37
opendevstatuscorvus: finished logging15:37
corvusfungi: who needs to review https://review.opendev.org/893792 (openstack-zuul-jobs) ?15:38
fungicorvus: i can, meant to look earlier, thanks!15:38
corvusoh cool, thanks :)15:38
clarkbhttps://23.253.22.116/ <- is a held gerrit running in a bookworm container with java 1715:40
corvuswe might still see some default branch errors since those values are cached.  they should clear as config changes are merged and the cache is updated; but we may still want to do a zuul-admin delete-state this weekend just to make sure all traces are gone.  or if it becomes a real problem, we can do that earlier, but that's an outage, so i'd like to avoid it.15:40
clarkbGerrit + bookworm + java 17 seems to generally work. I think we can plan for a short gerrit outage to restart on that container if others agree15:42
fungiseems fine, i suppose we could time that to coincide with the zuul restart?15:42
opendevreviewMerged openstack/project-config master: Drop libvirt-python from suse in bindep fallback  https://review.opendev.org/c/openstack/project-config/+/89393515:44
clarkbthe zuul restart is done?15:45
fungiclarkb: the full down delete-state restart15:48
funginot the rolling restart15:48
clarkbah, I think that restart depends on gerrit being up, but yes we could do gerrit first and then zuul15:49
fricklercorvus: those issues are all on stable/pike it seems, and I can understand the concern about error size, ack. so I'll work my way through them one by one and hope that'll be a finite task15:52
clarkbfrickler: might be able to do a git grep for the negative lookahead regex and catch a lot of them upfront?15:54
fricklerclarkb: that's not about regexes, but about removing some old project templates https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/89162815:56
fricklerand I don't have most of those repos locally15:57
clarkbah15:57
fricklerwell, until now, when I need to create a patch on them15:57
corvusanother option to consider would be to force-merge the removal, then cleanup based on the new static config errors.  you'd have to be pretty sure that isn't going to break branches you care about, because it will definitely break project configs on affected branches.  not advocating it, just brainstorming.  :)15:59
fricklerafaict all of those branches are only waiting to be eoled, because they predate release automation. so that's a good idea. let me try to do two more projects manually and if more show up then, we can take that path16:03
fricklerlikely no one would even care if they stay broken until they're eoled16:03
fricklerI have a quiet fear that blazar might not be the only odd project, but rather just early in the alphabetical list of more to come16:04
opendevreviewClark Boylan proposed openstack/project-config master: Set fedora labels min-ready to 0  https://review.opendev.org/c/openstack/project-config/+/89396116:08
clarkbinfra-root ^ that one should be safe to merge now16:08
fricklerthat sounds like a good first step, ack16:09
opendevreviewClark Boylan proposed openstack/project-config master: Remove fedora-35 and fedora-36 from nodepool providers  https://review.opendev.org/c/openstack/project-config/+/89396316:17
opendevreviewClark Boylan proposed openstack/project-config master: Remove fedora image builds  https://review.opendev.org/c/openstack/project-config/+/89396416:17
clarkbThese two should land on monday with some oversight to ensure things clean up sufficiently16:17
opendevreviewMerged openstack/project-config master: Set fedora labels min-ready to 0  https://review.opendev.org/c/openstack/project-config/+/89396116:24
corvusthe next nodepool restarts should happen on bookworm-based images16:28
corvusso heads up for any unexpected behavior changes there16:29
fricklercorvus: do you want to restart now/soon or wait for the usual cycle?16:33
corvusi was assuming we just wanted the usual cycle; but i don't have a strong opinion16:42
fricklerif we see the probability of issues as low I'm fine with waiting, otherwise a restart where we can more closely watch might be better17:03
*** ykarel is now known as ykarel|away17:29
clarkbthe nodepool services have updated fwiw17:29
clarkblive disk images are huge these days.17:37
fungigoing to pop out for a late lunch, bbiab17:50
fricklerah, I was confused earlier, thinking that nodepool would be updated together with zuul on the weekly schedule. sorry if what I was saying didn't make much sense17:58
clarkbfrickler: ah ya nodepool updates ~hourly18:00
clarkbmy laptop exhibits the same behavior under ubuntu jammy's desktop livecd (linux 6.2), which definitely worked back when I ran that kernel version. Unfortunately that makes me pretty confident I have a hardware issue18:00
guilhermesp_____fungi: hmm, i think to be able to filter i would need to check which hv the vm lives on... we did have a minor issue with core switches this week but it was in the amsterdam region... nothing suspicious in mtl 18:19
*** ralonsoh is now known as ralonsoh_away18:55
fricklercorvus: zuul seems to be reporting the errors twice on https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/891628 each time; I didn't notice that earlier, but it seems to have been happening since the first result19:09
frickleralso now some charm repo is broken.19:09
fricklerwill continue looking at that tomorrow19:09
corvusfrickler: likely due to being in two tenants19:13
corvus(or could just be 2 pipelines)19:14
fungiguilhermesp_____: i see the same problem to both 16acb0cb-ead1-43b2-8be7-ab4a310b4e0a (review02.opendev.org) and a064b2e3-f47c-4e70-911d-363c81ccff5e (mirror01.ca-ymq-1.vexxhost.opendev.org)19:15
fungiguilhermesp_____: from the cloud side it looks like a routing loop, but it's hard to tell with the limited amount of visibility i see19:16
corvusfrickler: it's because it's in check and check-arm6419:17
corvusclarkb: fungi maybe gerrit + zuul reboot saturday morning pst?20:04
fungii should be around, sgtm20:05
fungihappy to do some/all of it20:05
clarkbI can be around for that too20:14
corvusi wonder how long it's been since we actually had zuul offline.  it's been a pretty good run.  :)20:16
clarkbwe should probably merge the gerrit change on friday then since we won't auto restart on the new image20:17
opendevreviewMerged zuul/zuul-jobs master: Remove the nox-py27 job  https://review.opendev.org/c/zuul/zuul-jobs/+/89240820:46
fungithis discussion seems hauntingly familiar: https://discuss.python.org/t/3312221:50
guilhermesp_____fungi: hum ok... im going out for holidays tomorrow. Maybe if you could just open up a ticket for us to keep track and we can investigate that ( or someone else from my team tomorrow when im back ) -- just send an email to support@vexxhost.com21:58
fungiguilhermesp_____: sure, will do. thanks!22:32
