Friday, 2024-03-08

[06:21] <tkajinam> o/ I wonder if we can move this forward? https://review.opendev.org/c/openstack/project-config/+/907954
[06:21] <tkajinam> a few of the remaining patches to drop puppet-qdr
[08:47] <amoralej> clarkb, yep, 18:43 run fixed centos8, thanks for your help!
[08:57] <mnasiadka> Started seeing timeouts on the caching proxy mainly in rax - 500 Server Error for http+docker://localhost/v1.44/images/create?tag=master-ubuntu-jammy&fromImage=mirror-int.iad.rax.opendev.org%3A4447%2Fopenstack.kolla%2Ffluentd: Internal Server Error (\"Get \"https://mirror-int.iad.rax.opendev.org:4447/v2/\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)\")\\n'"
[08:57] <mnasiadka> infra-root: can you have a look?
[10:23] *** tobias-urdin4 is now known as tobias-urdin
[12:25] *** ralonsoh__ is now known as ralonsoh
[13:15] *** carloss_ is now known as carloss
[13:44] <fungi> [2024-03-08 01:00:24.337] "GET /v2/openstack.kolla/rabbitmq/blobs/sha256:26862c3518e8de33fa5ada8cdc1795d1a60326464c12c899537b07d064e6bc40 HTTP/1.1" 500 141 - "-" "docker/25.0.4 go/go1.21.8 git-commit/061aa95 kernel/5.14.0-362.18.1.el9_3.0.1.x86_64 os/linux arch/amd64 UpstreamClient(docker-sdk-python/6.1.3)"
[13:44] <fungi> that's the most recent 500 response from mirror(-int).iad.rax port 4447
[13:44] <fungi> mnasiadka: what time was your example?
[13:47] <opendevreview> Merged openstack/project-config master: Retire puppet-qdr: Remove Project from Infrastructure System  https://review.opendev.org/c/openstack/project-config/+/907954
[14:03] <mnasiadka> fungi: https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_f08/912030/5/check/kolla-ansible-ubuntu-upgrade/f08c3bd/primary/logs/ansible/genconfig
[14:04] <mnasiadka> fungi: we’ve seen that in kayobe as well I think, now on a mobile - but will find occurrences in an hour if needed
[14:07] <fungi> mnasiadka: it looks like there is some sort of proxy webserver on localhost based on that log. could the 500 be coming from there rather than from the regional cache (mirror) server?
[14:09] <fungi> do you have access or proxy logs for whatever that local webserver is?
[14:21] <mnasiadka> fungi: we use docker and set registry mirrors - no other webservers in place
[14:21] <mnasiadka> See https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_f08/912030/5/check/kolla-ansible-ubuntu-upgrade/f08c3bd/primary/logs/system_configs/docker/daemon.json
[14:22] <mnasiadka> Wonder if that’s some quay.io timeout issue or something in rax
[14:23] <fungi> mnasiadka: what's the "localhost" in the url then?
[14:24] <mnasiadka> Docker API :-)
[14:25] <fungi> can the "docker api" return a 500 error?
[14:25] <mnasiadka> Ansible module using docker-py to talk to Docker API
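For readers untangling the "localhost" question: docker-py addresses the local Docker Engine API with an `http+docker://localhost` base URL and carries the request over the daemon's Unix socket, so the 500 above is the daemon reporting that its own request to the mirror failed. A minimal standard-library sketch that decodes the request URL from mnasiadka's error shows which image and tag the daemon was asked to pull through the Engine API's `/images/create` endpoint:

```python
from urllib.parse import urlsplit, parse_qs

# Request URL copied verbatim from the error message in this log.
# "http+docker://localhost" is docker-py's label for the local Docker
# Engine API socket, so "localhost" here is the Docker daemon itself.
url = (
    "http+docker://localhost/v1.44/images/create"
    "?tag=master-ubuntu-jammy"
    "&fromImage=mirror-int.iad.rax.opendev.org%3A4447%2Fopenstack.kolla%2Ffluentd"
)

parts = urlsplit(url)
params = parse_qs(parts.query)  # percent-decodes %3A -> ":" and %2F -> "/"

api_version = parts.path.split("/")[1]  # Engine API version segment
endpoint = parts.path                   # the image pull ("create") endpoint
image = params["fromImage"][0]          # full image reference, mirror included
tag = params["tag"][0]

print(api_version, endpoint)
print(f"{image}:{tag}")
```

Running it prints `v1.44 /v1.44/images/create` followed by `mirror-int.iad.rax.opendev.org:4447/openstack.kolla/fluentd:master-ubuntu-jammy`, i.e. the image is pulled from the opendev mirror directly rather than through any other local webserver.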
[14:25] <mnasiadka> On pull timeout it seems, but maybe it’s some new Docker bug
[14:28] <mnasiadka> https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_f08/912030/5/check/kolla-ansible-ubuntu-upgrade/f08c3bd/primary/logs/system_logs/docker.txt - this is the docker log
[14:28] <fungi> looks like https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_f08/912030/5/check/kolla-ansible-ubuntu-upgrade/f08c3bd/primary/logs/system_logs/docker.txt might be a clearer source of info
[14:28] <fungi> hah, you beat me to it by moments ;)
[14:28] <mnasiadka> Well, I'd know the log locations in Kolla even if woken in the middle of the night ;)
[14:29] <fungi> i'm sort of slow looking at logs because this is also openstack weekly release meeting time
[14:54] <opendevreview> Takashi Kajinami proposed openstack/project-config master: Retire puppet-sahara: Remove Project from Infrastructure System  https://review.opendev.org/c/openstack/project-config/+/910455
[14:57] <fungi> mnasiadka: i guess this is about the point in the docker log that corresponds to the genconfig log error, though i don't immediately see that it complains about anything:
[14:57] <fungi> Mar 08 08:49:47 primary dockerd[9535]: time="2024-03-08T08:49:47.389984156Z" level=debug msg="Trying to pull mirror-int.iad.rax.opendev.org:4447/openstack.kolla/fluentd from https://mirror-int.iad.rax.opendev.org:4447"
[14:57] <fungi> am i reading that right
[14:58] <fungi> ?
[15:00] <fungi> aha, i had to scroll down more:
[15:00] <fungi> Mar 08 08:50:02 primary dockerd[9535]: time="2024-03-08T08:50:02.391974305Z" level=warning msg="Error getting v2 registry: Get \"https://mirror-int.iad.rax.opendev.org:4447/v2/\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
[15:01] <fungi> so looks like a 15-second timeout probably
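fungi's 15-second estimate is the gap between the two dockerd timestamps quoted above. A small sketch of that arithmetic follows; it assumes both lines were stamped by the same clock and says nothing about which setting actually imposes the limit.

```python
from datetime import datetime

# dockerd timestamps from the two log lines quoted above.
started = "2024-03-08T08:49:47.389984156Z"  # level=debug "Trying to pull ..."
failed = "2024-03-08T08:50:02.391974305Z"   # level=warning "Error getting v2 registry ..."

def parse_dockerd_ts(ts: str) -> datetime:
    """Parse a dockerd RFC 3339 timestamp.

    Python's datetime stores microseconds only, so the nanosecond
    fraction is truncated to its first six digits.
    """
    date_part, frac = ts.rstrip("Z").split(".")
    return datetime.strptime(f"{date_part}.{frac[:6]}", "%Y-%m-%dT%H:%M:%S.%f")

elapsed = (parse_dockerd_ts(failed) - parse_dockerd_ts(started)).total_seconds()
print(f"{elapsed:.3f} s")
```

The result is just over 15 seconds, which fits a client-side timeout of that length on the `GET /v2/` request.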
[15:05] <fungi> mnasiadka: correlating to logs on the mirror server, the last time i see that node's ip address (10.209.0.58) request anything with "fluentd" in the name was half an hour earlier:
[15:05] <fungi> 10.209.0.58 - - [2024-03-08 08:17:29.054] "GET /v2/openstack.kolla/fluentd/blobs/sha256:9514c7d8dbfd8d9cd5508701931a94b2e3196186868f1a445a4a6d2fa232592d HTTP/1.1" 302 1915 - "-" "docker/25.0.4 go/go1.21.8 git-commit/061aa95 kernel/5.15.0-97-generic os/linux arch/amd64 UpstreamClient(docker-sdk-python/6.1.3)"
[15:06] <Clark[m]> Note the timeout you posted was just for the API root. No specific image in that path
[15:06] <fungi> good point
[15:07] <fungi> 10.209.0.58 - - [2024-03-08 08:49:47.404] "GET /v2/ HTTP/1.1" 401 4 Response status 401 "-" "docker/25.0.4 go/go1.21.8 git-commit/061aa95 kernel/5.15.0-97-generic os/linux arch/amd64 UpstreamClient(docker-sdk-python/6.1.3)"
[15:07] <fungi> that seems to correspond
[15:07] <fungi> and was the last request the server saw from that client ip address
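The correlation done here (find one client's requests in the mirror's access log, optionally narrow by path, take the most recent) can be sketched as below. The regular expression is an assumption fitted to the access-log lines pasted in this channel, not a documented format for the mirror; the sample lines are copied from this log with the user-agent strings shortened, and `requests_from` is a hypothetical helper name.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, fitted to the lines pasted in channel:
#   [client-ip - - ][timestamp] "METHOD path PROTO" status ...
LINE_RE = re.compile(
    r'^(?:(?P<ip>\S+) - - )?\[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)

class Entry(NamedTuple):
    ip: Optional[str]  # None when the pasted line had no client address
    ts: str
    method: str
    path: str
    status: int

def parse(line: str) -> Optional[Entry]:
    m = LINE_RE.match(line)
    if m is None:
        return None
    return Entry(m["ip"], m["ts"], m["method"], m["path"], int(m["status"]))

def requests_from(lines, ip: str, path_substring: str = "") -> list:
    """Entries from one client address, in log (chronological) order."""
    entries = (parse(line) for line in lines)
    return [e for e in entries if e and e.ip == ip and path_substring in e.path]

# Lines copied from this log; user-agent strings shortened with "...".
SAMPLE_LOG = [
    '[2024-03-08 01:00:24.337] "GET /v2/openstack.kolla/rabbitmq/blobs/sha256:26862c3518e8de33fa5ada8cdc1795d1a60326464c12c899537b07d064e6bc40 HTTP/1.1" 500 141 - "-" "docker/25.0.4 ..."',
    '10.209.0.58 - - [2024-03-08 08:17:29.054] "GET /v2/openstack.kolla/fluentd/blobs/sha256:9514c7d8dbfd8d9cd5508701931a94b2e3196186868f1a445a4a6d2fa232592d HTTP/1.1" 302 1915 - "-" "docker/25.0.4 ..."',
    '10.209.0.58 - - [2024-03-08 08:49:47.404] "GET /v2/ HTTP/1.1" 401 4 Response status 401 "-" "docker/25.0.4 ..."',
]

# Because the log is chronological, the last match is the most recent one.
errors = [e for e in map(parse, SAMPLE_LOG) if e and e.status == 500]
last_any = requests_from(SAMPLE_LOG, "10.209.0.58")[-1]
last_fluentd = requests_from(SAMPLE_LOG, "10.209.0.58", "fluentd")[-1]
print(errors[-1].ts, errors[-1].status)
print(last_any.ts, last_any.path, last_any.status)
print(last_fluentd.ts, last_fluentd.path)
```

On the sample this reproduces what fungi found by hand: the client's last request was the 401 on `/v2/`, and its last fluentd request was about half an hour earlier.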
[15:13] <Clark[m]> I wonder if the token is expiring and docker client tooling isn't noticing so it sends the old token
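Clark's hypothesis can be made concrete. In the Docker Registry token authentication specification, a token response may carry `expires_in` (seconds, defaulting to 60 when omitted) and `issued_at`; a well-behaved client compares those against the clock before reusing a cached token. A minimal sketch of that check, using a hypothetical `needs_refresh` helper and an assumed 5-second safety margin; it is not how the docker client actually implements it:

```python
from datetime import datetime, timedelta, timezone

# Default lifetime from the registry token spec when "expires_in" is omitted.
DEFAULT_EXPIRES_IN = 60  # seconds

def token_expiry(resp: dict, received_at: datetime) -> datetime:
    """Moment a registry bearer token stops being valid."""
    issued = resp.get("issued_at")
    if issued:
        # fromisoformat() accepts a trailing "Z" (and arbitrary fractional
        # digits) only on Python 3.11+, so normalise the UTC suffix first.
        start = datetime.fromisoformat(issued.replace("Z", "+00:00"))
    else:
        start = received_at  # fall back to when the token was received
    return start + timedelta(seconds=resp.get("expires_in", DEFAULT_EXPIRES_IN))

def needs_refresh(resp: dict, received_at: datetime, now: datetime,
                  margin: timedelta = timedelta(seconds=5)) -> bool:
    """Hypothetical helper: True if the cached token should be re-fetched."""
    return now >= token_expiry(resp, received_at) - margin

# Hypothetical token response without "expires_in", so the 60 s default applies.
resp = {"token": "opaque", "issued_at": "2024-03-08T08:49:00Z"}
got = datetime(2024, 3, 8, 8, 49, 0, tzinfo=timezone.utc)
print(needs_refresh(resp, got, got + timedelta(seconds=30)))  # False
print(needs_refresh(resp, got, got + timedelta(seconds=58)))  # True
```

A client that skipped this check would keep presenting a stale token, which is the failure mode being speculated about here.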
[15:41] <mnasiadka> token? that repo is public
[15:41] <fungi> mnasiadka: yes, but the docker protocol requires "authentication" for everything and then makes up tokens on the fly
[15:42] <fungi> which is why it's so painful to proxy
[15:43] <fungi> every request is treated as authenticated even if you're not actually using login credentials because the repo is public
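The "authentication for public repos" fungi describes is the registry v2 bearer-token handshake: the client's `GET /v2/` gets a 401 (like the one in the mirror log above) with a `WWW-Authenticate: Bearer realm=...,service=...` challenge, the client asks the realm for a token scoped to `repository:<name>:pull` (anonymously, for a public repository), and it then retries with `Authorization: Bearer <token>`. A network-free sketch of the first two steps follows, with a hypothetical `example.org` realm and service rather than quay.io's real values; the challenge parser is deliberately simplified and ignores escaped quotes.

```python
import re
from urllib.parse import urlencode

def parse_bearer_challenge(header: str) -> dict:
    """Split 'Bearer k="v",k2="v2"' into a dict (simplified: no escaped quotes)."""
    scheme, _, params = header.partition(" ")
    if scheme.lower() != "bearer":
        raise ValueError(f"not a bearer challenge: {header!r}")
    return dict(re.findall(r'(\w+)="([^"]*)"', params))

def token_request_url(challenge: dict, repository: str) -> str:
    """URL an anonymous client would GET to obtain a pull-only token."""
    query = {}
    if "service" in challenge:
        query["service"] = challenge["service"]
    query["scope"] = f"repository:{repository}:pull"
    return f"{challenge['realm']}?{urlencode(query)}"

# Hypothetical challenge header; real registries send their own realm/service.
header = 'Bearer realm="https://auth.example.org/token",service="registry.example.org"'
challenge = parse_bearer_challenge(header)
print(token_request_url(challenge, "openstack.kolla/fluentd"))
```

Every pull therefore involves an extra round trip to the token realm even for public images, and a caching proxy has to pass those challenges and tokens through correctly.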
[15:43] <mnasiadka> oh boy
[15:44] <fungi> it is one of the worst designs for a file transfer protocol i've ever seen
[15:45] <mnasiadka> might be that a proper pull-through registry would be better, if that keeps reappearing - we (Kolla) would need to think about something
[15:55] <Clark[m]> I'm not sure we can say either way yet. If the problem is/was on the quay side a different tool is unlikely to help
[15:55] <Clark[m]> Quay's status page indicates no problems at least
[15:56] <fungi> but even just intermittent connectivity problems between rackspace and red hat could account for the observed behaviors
[16:00] <Clark[m]> Have any zuul jobs hit issues? Zuul and nodepool are hosted in quay
[16:02] <fungi> not that i've noticed, but i haven't been watching any very closely today
[16:23] <opendevreview> Clark Boylan proposed zuul/zuul-jobs master: Drop CentOS 7 test jobs  https://review.opendev.org/c/zuul/zuul-jobs/+/912280
[16:31] <opendevreview> Clark Boylan proposed openstack/project-config master: Drop CentOS 7 wheel builds and bindep fallback check job  https://review.opendev.org/c/openstack/project-config/+/912283
[16:37] <clarkb> I've just pushed three new topic:drop-centos-7 changes which do supporting infrastructure cleanup. I believe all three should be mergeable at this point in time
[16:38] <clarkb> though maybe I got the order between project-config and ozj wrong /me tries to parse the error
[16:38] <fungi> i think the project-config change just has to merge before the ozj change will pass
[16:38] <fungi> due to being a trusted config repo
[16:39] <clarkb> oh yup, the order I've got is correct but the depends-on won't speculatively test it
[16:39] <clarkb> I got confused because I initially read the error as saying openstack/requirements was the problem but that was me reading the message incorrectly
[16:40] <fungi> i still need to put together the keystone et al changes to stop using the devstack centos7 nodeset
[18:09] *** blarnath is now known as d34dh0r53
[19:09] <clarkb> fungi: any objections to proceeding with https://review.opendev.org/c/openstack/project-config/+/912283 ?
[19:09] <clarkb> I'd like to get that in to make sure the ozj change is mergeable and merge that too if possible
[19:16] <fungi> clarkb: none, i went ahead and approved it
[19:23] <opendevreview> Merged openstack/project-config master: Drop CentOS 7 wheel builds and bindep fallback check job  https://review.opendev.org/c/openstack/project-config/+/912283
[19:23] <fungi> hopefully the other change will pass now
[19:24] <fungi> i've rechecked it
[19:38] <Clark[m]> Thanks, having some early lunch
[19:40] <fungi> we may want to remove x/monitorstack from the tenant config, it's got references to ansible-role-functional-centos-7
[19:41] <Clark[m]> ++
[19:41] <fungi> last commit was a homepage update 5 years ago
[19:41] <fungi> i'll push up that change
[19:42] <Clark[m]> Thanks
[19:47] <opendevreview> Jeremy Stanley proposed openstack/project-config master: Drop gating for x/monitorstack  https://review.opendev.org/c/openstack/project-config/+/912304
[19:47] <fungi> looks like it's just the tenant config entry that needs to go
[20:05] <opendevreview> Merged openstack/project-config master: Drop gating for x/monitorstack  https://review.opendev.org/c/openstack/project-config/+/912304
[20:10] <clarkb> heh I rechecked the ozj change before ^ deployed. I should know better /me tries to practice patience
[20:10] <fungi> yeah, saw that
[20:14] <clarkb> once these jobs are cleaned up I think we're ready to remove the nodeset from base-jobs and the images from nodepool on the announced date. And just hope that as much cleanup as possible happens before then
[20:19] <clarkb> and then xenial, that will be a fun one
[20:20] <clarkb> one upside that I didn't anticipate is it is giving us good reason to pare down the zuul project list
[20:20] <clarkb> which should in theory reduce the number of errors and other unwanted interactions from cleanups
[20:29] <clarkb> for xenial one of the fun cleanups is going to be our puppet stuff. But we did retire a lot of them so maybe the web won't be too large
[20:34] <clarkb> ok now the ozj job is failing because the aarch centos 8 openafs build fails due to the kernel being too old again. I'll restore the content of the playbook that builds that stuff to include centos 7 even though it isn't required any longer to make this mergeable
[20:35] <clarkb> we will have to clean that up later though as it looks for artifacts and those will no longer be built
[20:35] <fungi> noted, thanks
[20:38] <clarkb> we've got plenty of disk on the arm nodepool builder, which rules out disk being the problem
[20:39] <clarkb> the last build log indicates success. I suspect that the big centos 8 stream updates that tristanC[m] and amoralej noticed have resulted in new kernels and we just need to rebuild that image which should happen nowish
[20:39] <clarkb> we appear to do weekly builds and that image is just about a week old. So I think we can probably try to clean up that job on Monday and see if the openafs packages build then
[20:48] <fungi> wfm
[21:07] <opendevreview> Clark Boylan proposed openstack/project-config master: Remove old infra team puppet testing  https://review.opendev.org/c/openstack/project-config/+/912309
[21:08] <opendevreview> Clark Boylan proposed opendev/system-config master: Remove old infra team puppet testing  https://review.opendev.org/c/opendev/system-config/+/912311
[21:08] <clarkb> I'm mostly pushing these changes up for early comment. I don't think we are in a rush yet since centos 7 is still in progress and I expect a lot of other stuff to have random ties to xenial
[21:09] <clarkb> but I think that is roughly what it will look like for us to stop testing puppet on xenial (and largely at all) in our world
[21:10] <clarkb> I think we can use this as a stepping stone to retiring more puppet-* repos too
[21:11] <clarkb> like puppet-redis? I have no idea what would be using that at this point. Maybe it was for openstackid which we excised
[21:12] <clarkb> oh and ethercalc etc
[21:53] <opendevreview> Amy Marrich proposed opendev/irc-meetings master: Moving the meeting up  https://review.opendev.org/c/opendev/irc-meetings/+/912314
[22:13] <opendevreview> Merged opendev/irc-meetings master: Moving the meeting up  https://review.opendev.org/c/opendev/irc-meetings/+/912314
[22:26] <clarkb> that reminds me we lose an hour of sleep tomorrow night
[22:46] <fungi> yes we do

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!