Thursday, 2022-05-26

01:15 <opendevreview> Ian Wienand proposed opendev/glean master: Revert "Add option to ignore config drive interfaces info" https://review.opendev.org/c/opendev/glean/+/843225
01:15 <opendevreview> Ian Wienand proposed opendev/glean master: write_redhat_interfaces: refactor to walk interfaces first https://review.opendev.org/c/opendev/glean/+/843241
01:15 <opendevreview> Ian Wienand proposed opendev/glean master: write_redhat_interfaces: pass multiple networks to output functions https://review.opendev.org/c/opendev/glean/+/843242
01:15 <opendevreview> Ian Wienand proposed opendev/glean master: [wip] write out ipv6 https://review.opendev.org/c/opendev/glean/+/843243
01:15 <opendevreview> Ian Wienand proposed opendev/glean master: Clean up TYPE=Ethernet handling https://review.opendev.org/c/opendev/glean/+/843353
01:15 <ianw> oh i think i might have uploaded over fixups made :/
01:19 <opendevreview> Ian Wienand proposed opendev/glean master: write_redhat_interfaces: pass multiple networks to output functions https://review.opendev.org/c/opendev/glean/+/843242
01:19 <opendevreview> Ian Wienand proposed opendev/glean master: Clean up TYPE=Ethernet handling https://review.opendev.org/c/opendev/glean/+/843353
01:19 <opendevreview> Ian Wienand proposed opendev/glean master: [wip] write out ipv6 https://review.opendev.org/c/opendev/glean/+/843243
01:21 <ianw> i think that restored fungi's fix. i'm not sure why it decided to upload everything :/
01:23 *** rlandy|bbl is now known as rlandy
01:27 *** ysandeep|out is now known as ysandeep
01:31 *** rlandy is now known as rlandy|out
02:45 <opendevreview> Ian Wienand proposed opendev/glean master: _network_info: Clean up TYPE=Ethernet handling https://review.opendev.org/c/opendev/glean/+/843353
02:45 <opendevreview> Ian Wienand proposed opendev/glean master: [wip] write out ipv6 https://review.opendev.org/c/opendev/glean/+/843243
02:45 <opendevreview> Ian Wienand proposed opendev/glean master: _network_info: refactor to add ipv4 info at the end https://review.opendev.org/c/opendev/glean/+/843367
02:50 *** ysandeep is now known as ysandeep|afk
03:55 <opendevreview> Merged openstack/diskimage-builder master: Check and mount boot volume for data extraction with nouuid https://review.opendev.org/c/openstack/diskimage-builder/+/843297
05:08 *** diablo_rojo_phone is now known as Guest360
05:08 *** mnasiadka_ is now known as mnasiadka
05:11 *** tkajinam_ is now known as tkajinam
05:58 *** TheMaster is now known as Unit193
07:22 <opendevreview> Ian Wienand proposed opendev/glean master: [wip] write out ipv6 https://review.opendev.org/c/opendev/glean/+/843243
07:22 <opendevreview> Ian Wienand proposed opendev/glean master: _network_info: simplify to single string https://review.opendev.org/c/opendev/glean/+/843411
07:23 <ianw> clarkb/fungi: ^ 843243 is writing out something like what i think the config files need to look like. probably the next step is to migrate that manually onto some nodes and see what happens on a real system
07:33 *** ykarel_ is now known as ykarel
07:35 *** ysandeep|rover is now known as ysandeep|rover|lunch
09:23 *** ysandeep|rover|lunch is now known as ysandeep|rover
10:27 *** rlandy|out is now known as rlandy
11:17 *** dviroel_ is now known as dviroel
11:33 <fungi> infra-root: i'm initiating the zuul_reboot playbook in a root screen session on bridge.o.o now. expect it to take 6+ hours to complete
11:33 <fungi> i'll keep an eye on it and make sure it doesn't go completely crazy
11:34 <fungi> image pulls are in progress now
11:34 <fungi> and now it's starting in on the executor stops
11:35 <fungi> oh, right, this isn't the 6+6 batching, it's one-by-one so will be waaaay longer than 6 hours
11:36 <fungi> possibly several days
11:44 *** ysandeep|rover is now known as ysandeep|rover|break
12:20 <opendevreview> sean mooney proposed opendev/bindep master: Add support for popos https://review.opendev.org/c/opendev/bindep/+/843444
12:25 <opendevreview> sean mooney proposed opendev/bindep master: Add support for popos https://review.opendev.org/c/opendev/bindep/+/843444
12:36 *** ysandeep|rover|break is now known as ysandeep|rover
12:41 *** mnaser_ is now known as mnaser
13:14 <mgariepy> hello, is it possible to hold `openstack-ansible-upgrade-infra_lxc-ubuntu-focal` from `837588,28` so i can check it?
13:15 <mgariepy> Hostname: ubuntu-focal-ovh-gra1-0029790883
13:38 <fungi> mgariepy: i can set an autohold for a combination of project+job+change, yes. if that's still running then it will be held once it fails
13:39 <fungi> what problem are you investigating with that job?
13:39 <fungi> i'll add it to the hold comment
13:43 <fungi> failed: [aio1_keystone_container-c516e6a5 -> aio1_utility_container-4f947b4e(172.29.236.230)] (item=None) => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
13:43 <fungi> i guess it's that you want to know why that task is failing, but with no_log set on it you don't have any stdout/stderr to provide further clues?
13:44 <fungi> in theory you could make a separate do-not-merge change which removes the no_log from that task and make it depends-on 837588
13:45 <fungi> (for openstack.osa.db_setup "Create database for service")
13:46 <fungi> anyway, i've set an autohold in case that helps:
13:46 <mgariepy> something is holding the socket in a container.
13:46 <fungi> zuul-client autohold --tenant=openstack --project=opendev.org/openstack/openstack-ansible-repo_server --job=openstack-ansible-upgrade-infra_lxc-ubuntu-focal --ref=refs/changes/88/837588/28 --reason="mgariepy investigating investigating database setup task failure masked by no_log" --count=1
13:46 <mgariepy> thanks
13:46 <jrosser_> fungi: it's suspected to be a race condition in swapping the xinetd galera loadbalancer check to one using a systemd socket activated service
13:46 <mgariepy> can you grab my key from gerrit?
13:47 <fungi> not easily, but you can stick a copy on paste.opendev.org
13:47 <jrosser_> the db task failure is just a side effect of the db being down from the perspective of the loadbalancer
13:47 <fungi> (a copy of the public key obviously, not the private key!)
13:49 <mgariepy> https://paste.openstack.org/show/bmEEcIcyQre3D8rn76hz/
13:49 <mgariepy> fungi, yep i know how ssh works :) haha
13:50 <fungi> thanks!
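The `--ref` value in the autohold command above follows Gerrit's change-ref layout, `refs/changes/<last two digits of the change number>/<change>/<patchset>`, which is how `837588,28` became `refs/changes/88/837588/28`. A minimal Python sketch that derives the ref and assembles the same command line; `gerrit_change_ref` and `autohold_command` are hypothetical helper names, and only the `zuul-client autohold` flags already shown in the log are used.

```python
import shlex


def gerrit_change_ref(change: int, patchset: int) -> str:
    """Return the Gerrit ref for a change/patchset pair."""
    # Gerrit shards change refs by the last two digits of the change number,
    # zero-padded, so change 837588 patchset 28 lives at refs/changes/88/837588/28.
    return f"refs/changes/{change % 100:02d}/{change}/{patchset}"


def autohold_command(tenant, project, job, change, patchset, reason, count=1):
    """Build a zuul-client autohold command line using the flags seen in the log."""
    argv = [
        "zuul-client", "autohold",
        f"--tenant={tenant}",
        f"--project={project}",
        f"--job={job}",
        f"--ref={gerrit_change_ref(change, patchset)}",
        f"--reason={reason}",
        f"--count={count}",
    ]
    # Quote each argument so a reason containing spaces survives the shell.
    return " ".join(shlex.quote(arg) for arg in argv)
```

For example, `gerrit_change_ref(5, 1)` gives `refs/changes/05/5/1`, showing why the shard is zero-padded.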
14:00 <fungi> ze01 reboot appears to have happened successfully ~1.5 hours ago, and https://zuul.opendev.org/components reports it's running 6.0.1.dev34 b1311a590 while ze02 (5.2.6.dev33 a89ce345) is now paused for graceful stop
14:00 <fungi> clarkb: ^
14:02 <fungi> so that was only an hour to restart ze01, not as bad as i expected
14:03 <fungi> 11:34z is when i saw the stopping begin for ze01, and last reports system boot at 12:34z
14:03 <fungi> ze02 seems to be taking longer to stop though
14:04 *** artom_ is now known as artom
14:22 *** bhagyashris is now known as bhagyashris|ruck
14:27 *** ysandeep|rover is now known as ysandeep|dinner
14:50 *** ysandeep|dinner is now known as ysandeep
14:53 <fungi> i've initiated a borg prune on backup02.ca-ymq-1.vexxhost.o.o since it's warning us about being at 90% capacity
15:00 *** Guest360 is now known as diablo_rojo_phone
15:06 <fungi> ze02 has been down to 1 build running for the past 40 minutes, so hopefully it will pop at any time
15:08 <fungi> and there it goes!
15:09 <fungi> so that was roughly 2.5 hours for ze02
15:12 <mgariepy> the job has failed. what user should i use to log in?
15:15 <fungi> i need to add your key to the node, just a moment
15:15 <mgariepy> cool
15:15 <mgariepy> :)
15:18 <fungi> mgariepy: ssh root@149.202.178.249
15:18 <fungi> and let us know when you're done with the node so we can release the hold for it
15:18 <mgariepy> ok thanks
15:19 <clarkb> fungi: thank you for getting that started. And ya the idea is we could potentially run this continuously in the background which is why it is serial: 1. Thinking about that though, we might be able to increase that to serial: 2 for the executors and serial: 4 for the mergers? Starting conservative seemed safer though
15:19 <fungi> agreed, slow for now allows us to spot problems before they get out of hand
15:28 <fungi> the "running builds" graph in the zuul-status dashboard on grafana provides a great view of the per-executor burndown progress
15:29 <fungi> as well as fairly accurate timing of each one at a glance
15:30 <fungi> given the starting point for build count on ze03 was slightly higher than for ze02, i'm guessing it will take roughly the same time or a little longer. hopefully no more than 3 hours
15:31 <clarkb> the wildcard in that is long paused jobs
15:31 <clarkb> confusingly they don't even show up as running ansible processes on the host if you do direct inspection as they are merely zuul state. It's possible that paused job state could be moved into the db instead and we could have executors shut down more quickly?
15:33 <opendevreview> Merged openstack/project-config master: elastic-recheck: allow releasers to merge/delete https://review.opendev.org/c/openstack/project-config/+/840455
15:33 <opendevreview> Shnaidman Sagi (Sergey) proposed openstack/project-config master: Add ops to openstack-ansible-sig channel https://review.opendev.org/c/openstack/project-config/+/843492
15:34 <fungi> oh, that's an interesting idea. i guess the down side is that if you have a paused job which is ready to be unpaused just after the graceful signal is set, it will have to wait for all the other active builds for other buildsets to complete
15:40 <fungi> #status log Pruned backups on backup02.ca-ymq-1.vexxhost.opendev.org bringing filesystem utilization down from 90% to 54%
15:40 <opendevstatus> fungi: finished logging
15:43 <clarkb> re the gerrit 3.6 upgrade oddity I mentioned yesterday: you can apparently run the command online after upgrading to 3.5.2 or newer 3.5. Then only upgrade to 3.6 once that is complete. 3.5 will write new content in the format 3.6 wants. It is only the old content that needs to be forward ported
15:44 <clarkb> that shouldn't be too bad
15:44 *** marios is now known as marios|out
15:45 <fungi> yeah, i wondered if the migration path was something like that. very cool
15:45 <mgariepy> fungi, thanks, you can remove the hold :)
15:45 <mgariepy> it did fail at some other tasks this time.
15:45 <mgariepy> i'll try to reproduce in a vm here instead.
15:45 <fungi> mgariepy: appreciated. do you need the autohold reset for another recheck of that patchset?
15:46 <mgariepy> hmm it would be easier.
15:46 <mgariepy> i'll recheck and it will auto-hold?
15:46 <fungi> yes, i've set a new autohold now for that same change
15:47 <mgariepy> i just did order the recheck.
15:47 <fungi> if you keep rechecking it, then the next time it fails zuul will hold the node for that same job
15:47 <mgariepy> that's handy :D
15:47 <fungi> so if it succeeds, recheck again until it fails. we don't have to redo the autohold after successful runs
15:48 <fungi> only if it catches a build failure
15:48 <mgariepy> when it holds, if i do recheck will it release the old instance?
15:48 <fungi> no, i manually released the old one by deleting the original autohold for it
15:49 <mgariepy> ok
15:49 <mgariepy> perfect.
15:49 <fungi> nodes are held indefinitely once caught by an autohold, until an admin deletes the autohold for it
15:49 <mgariepy> i'll check the status after lunch; if it has failed i'll poke you to add my key.
15:49 <fungi> or until something happens to the server instance (they're in clouds, after all)
15:49 <fungi> perfect. i'll be around
15:49 <mgariepy> thanks
15:49 <fungi> yw
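The autohold rules described in this exchange (match on job and change, catch only failed builds regardless of pipeline, keep the node until an admin deletes the request) can be summarized as a small state model. This is a hypothetical Python sketch of those rules as stated in the conversation, not Zuul's implementation; the class and method names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AutoholdRequest:
    """Hypothetical, simplified model of the autohold behaviour described above.

    Matching is by job and change ref only (no pipeline check), only failed
    builds are caught, and held nodes stay until the request is deleted.
    """

    job: str
    ref: str
    count: int = 1                      # how many failed builds to catch
    held_nodes: list = field(default_factory=list)

    def on_build_finished(self, job: str, ref: str, failed: bool, node: str) -> bool:
        """Return True if this build's node should be held."""
        if (job, ref) != (self.job, self.ref):
            return False  # a different job or change: not covered
        if not failed:
            return False  # successful runs do not use up the hold, so just recheck
        if len(self.held_nodes) >= self.count:
            return False  # already caught as many failures as requested
        self.held_nodes.append(node)
        return True

    def delete(self) -> list:
        """Deleting the request releases every node it was holding."""
        released, self.held_nodes = self.held_nodes, []
        return released
```

Under this model, "recheck until it fails, then delete the request and set a new one" is the same workflow fungi walks through above.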
15:52 <clarkb> re setting serial to 2 for executors we're already running low on available executor job slots with serial 1 so that may not be a good idea
15:53 <fungi> yeah, agreed. this way at least it's not heavy impact even under load
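The `serial` trade-off can be put in numbers: with `serial: N`, N hosts are out of service at once, so the capacity left during each batch is (total - N) / total. A minimal Python sketch of the simple integer form of Ansible's `serial` batching (percentage and list forms are not modelled); the twelve-executor list used in the note below is an assumption for illustration, not the actual inventory.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batches(hosts: Iterable[str], serial: int) -> Iterator[List[str]]:
    """Yield hosts in consecutive batches of at most `serial` hosts."""
    if serial < 1:
        raise ValueError("serial must be at least 1")
    it = iter(hosts)
    while batch := list(islice(it, serial)):
        yield batch


def capacity_during_batch(total: int, serial: int) -> float:
    """Fraction of hosts still accepting work while one batch is out of service."""
    return (total - min(serial, total)) / total


# Hypothetical inventory of twelve executors, ze01..ze12.
executors = [f"ze{n:02d}" for n in range(1, 13)]
```

With that hypothetical inventory, `serial: 1` keeps 11/12 of executor capacity online during each batch and `serial: 2` keeps 10/12, which is why a bigger batch hurts when job slots are already scarce.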
15:54 *** frenzyfriday|ruck is now known as frenzy_friday
15:54 <fungi> looks like we had a bump in activity ~15 minutes ago and are now effectively maxed out on node quota too
15:54 <fungi> the executors will presumably recover a bit once that spike settles out
15:56 <fungi> the executors can only start new builds when there are nodes available for them anyway, so if it's the builds-starting governor kicking in to take them out of accepting status, then that shouldn't last too long
16:06 *** ysandeep is now known as ysandeep|out
16:09 <fungi> clarkb: ianw (if you're not out friday): just a heads up that odds are the executor restarts will accelerate during our lower activity period around utc midnight, so there's a good chance the merger and scheduler/web restarts will happen late in my day or once i'm asleep
16:10 <clarkb> ya I'll try to continue to keep an eye on it
16:10 <fungi> thanks!
16:10 <clarkb> once the executors are done mergers should happen very quickly. Then we'll slow down with the schedulers again
16:21 <clarkb> I removed frickler's acl package install from the ansible 5 devstack test change and it seems to be happy so far implying our image updates did the trick.
16:22 <fungi> perfect
16:22 <clarkb> Are there other big classes of test job that we want to try and check before we start talking more broadly about updating our default? The tripleo jobs come to mind but I can't keep their setup straight enough to know what job to modify to get broad testing done and they are pretty in touch with ansible upstream so shouldn't be too much effort for them to figure out once we tell
16:22 <clarkb> everyone it is coming
16:23 <clarkb> But I'm thinking maybe email on ~Monday saying expect default ansible version in opendev zuul to be 5 across the board at the end of June
16:23 <fungi> that sounds like good timing. gives folks a chance to knock it out after post-summit downtime
16:25 <clarkb> yup and there is an easy way to override to 2.9 if necessary
16:30 <frickler> clarkb: thanks for the devstack test, that's good news. I was planning to test kolla jobs, too
16:31 <clarkb> ++
17:06 <fungi> clarkb: it came up a little while back as an open question, whether setuptools's experimental pyproject.toml metadata support works with pbr-using packages. i managed to get it going on a personal project: https://mudpy.org/gitweb?p=mudpy.git;a=commitdiff;h=e6f6c65d
17:07 <fungi> only problem was that i couldn't completely delete setup.cfg because pbr still consults it for the package name, apparently
17:07 <fungi> and setup.py of course, since we still have to set pbr=true there
17:07 <clarkb> neat
17:08 <clarkb> so I guess we're most of the way there for when/if that transition is forced on us
17:08 <fungi> yeah
17:08 <clarkb> though some repos may consider just dropping pbr since the tools do a lot of what it did
17:08 <fungi> well, they'd need to replace it with other similar setuptools plugins
17:09 <fungi> and for example if they use the semver handling in pbr that's not an available feature in the setuptools-scm plugin
17:11 <clarkb> right, though does anyone use that?
17:11 <clarkb> I do think a transition to upstream tools if they support the bulk of what is needed is a good sunset for pbr
17:12 <clarkb> pbr exists because the upstream tools didn't do a bunch of straightforward stuff they should do
17:13 <clarkb> ze03 has a very stubborn last job :)
17:23 <clarkb> kolla-ansible-centos8s-source-upgrade-ceph-ansible retries a lot and is non voting in check and does more than three attempts
17:24 * clarkb is going to push a change to make that an experimental job for them instead
17:25 <fungi> clarkb: the openstack release automation uses semver headers in all the service projects in order to force the master branch to start reporting development versions for the next semver major release immediately after each coordinated release
17:25 <clarkb> oh it's on stable victoria too
17:25 <clarkb> fungi: oh neat
17:26 <clarkb> fungi: setuptools_scm does do versioning from git (and hg); I wonder if it can be convinced to report the new dev versions in a similar way
17:27 <clarkb> maybe by doing a second tag on a new commit that sets the ver as a dev ver or something
17:29 <opendevreview> Dr. Jens Harbott proposed openstack/project-config master: Add a repository for the Large Scale SIG https://review.opendev.org/c/openstack/project-config/+/843534
17:31 <opendevreview> Dr. Jens Harbott proposed openstack/project-config master: Add a repository for the Large Scale SIG https://review.opendev.org/c/openstack/project-config/+/843534
17:34 <fungi> clarkb: https://review.opendev.org/c/openstack/nova/+/833243 is an example release bot change
17:35 <fungi> i was wrong, it sets Sem-Ver: feature so that dev versions will appear to belong to the next minor (feature instead of api-break)
17:36 <clarkb> remote: https://review.opendev.org/c/openstack/kolla-ansible/+/843536 Cleanup zuul jobs a bit
17:36 <fungi> but still, same end result. it allows the project to differentiate dev commits on master from dev commits on the most recent stable branch (where dev versions will appear to belong to the next patch level instead)
17:36 <clarkb> fwiw I think we should encourage overrides for things like timeouts and attempts to happen more to the leaf end of the job trees than the trunk
17:37 <fungi> and it's not just openstack service projects, looks like all the libs do it too
17:37 <clarkb> it's far too easy to forget things have been overridden then run a job 5 times before reporting success or failure
17:37 <clarkb> fungi: ya I'm wondering if setuptools scm's version management supports something similar
17:38 <clarkb> https://github.com/pypa/setuptools_scm/blob/main/src/setuptools_scm/version.py#L201-L203 it has stuff to do something along those lines
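The Sem-Ver footer behaviour fungi describes can be illustrated with a simplified model: the strongest footer since the last tag picks the target release (api-break bumps the major, feature bumps the minor, anything else gives the next patch), and a `.devN` suffix counts the commits since the tag. This is an illustrative Python sketch under those assumptions, not pbr's code; pbr handles more footer values and tag formats than shown, and the function name and sample commits are hypothetical.

```python
import re

# Ranking used by this sketch; unknown values are treated as a patch bump.
_BUMP_RANK = {"bugfix": 0, "feature": 1, "api-break": 2}
_SEMVER_FOOTER = re.compile(r"^Sem-Ver:\s*(\S+)\s*$", re.MULTILINE | re.IGNORECASE)


def next_dev_version(last_tag: str, commit_messages: list) -> str:
    """Simplified model of pbr-style dev versions for commits after `last_tag`."""
    major, minor, patch = (int(part) for part in last_tag.split("."))
    rank = 0  # no footer: assume the next patch release
    for message in commit_messages:
        for value in _SEMVER_FOOTER.findall(message):
            rank = max(rank, _BUMP_RANK.get(value.lower(), 0))
    if rank == 2:
        target = (major + 1, 0, 0)
    elif rank == 1:
        target = (major, minor + 1, 0)
    else:
        target = (major, minor, patch + 1)
    # The dev suffix counts commits since the tag.
    return "{}.{}.{}.dev{}".format(*target, len(commit_messages))
```

For a last tag of `1.4.0`, a master branch whose release-bot commit carries `Sem-Ver: feature` reports `1.5.0.devN`, while a stable branch with only bug fixes reports `1.4.1.devN`, so dev versions on the two branches never collide.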
17:39 <clarkb> ze04 is paused now
17:39 <fungi> yep, looks like ze03 just restarted in the last few minutes
17:41 <clarkb> wow my change to kolla-ansible updates the attempts value in the base job which causes all the jobs that inherit from it to be run
17:42 <clarkb> there are a lot of those jobs and the vast majority are non voting.
17:44 <fungi> mgoddard: mnasiadka: ^ if you're around, any history on that? are those just recently broken?
17:45 <frickler> I was following the Project Creators Guide, adding the needed-by comment to the project-config patch and that triggered zuul to vote -1 on https://review.opendev.org/c/openstack/governance/+/843535 . can we come up with a better procedure or shall we just add a warning to the guide?
17:46 <clarkb> frickler: I don't think the needed-by comment is at fault. Zuul doesn't do anything with that information
17:47 <frickler> clarkb: it seems it does, because it is a new ps which causes the gov change testing to be canceled
17:47 <clarkb> the issue is that the governance change was in the zuul check queue when you pushed a new patchset to its depends-on
17:47 <frickler> clarkb: yes, but that is exactly what our docs say one should do
17:48 <clarkb> ya it's the generic behavior that pushing a new patchset to a depends-on causes the child to be kicked out since it was running against code that can no longer be merged
17:48 <clarkb> frickler: sure I'm just trying to clarify that the text needed-by doesn't do that. it's any new patchset
17:48 <clarkb> (depends-on affects how zuul runs, needed-by does not)
17:49 <frickler> yes, I know that, sorry if I worded things wrong. it is submitting the new ps with the needed-by comment that triggers this, not the content of the comment
17:49 <clarkb> I think we can update the docs to suggest adding a retry comment after the -1
17:50 <clarkb> since it should reenqueue just fine and report normally as long as no new patchsets are pushed in that time
17:50 <fungi> or wait for the governance change to complete testing before revising the commit message on the project-config change
17:51 <fungi> or add the needed-by first (and use the change-id in it rather than a gerrit url)
17:51 <frickler> isn't using change-ids deprecated?
17:52 <fungi> for depends-on, yes. because depends-on is something zuul uses
17:52 <fungi> needed-by is just a hint to reviewers
17:52 <clarkb> another approach would be to use a consistent topic
17:52 <clarkb> rather than updating the commit message. Or leaving a comment on the changes to tie them together
17:52 <frickler> the guide also says to use "new-project" as topic
17:52 <clarkb> I think I like ^ best but would require the openstack tc to operate differently
17:53 <clarkb> ya so maybe if we set the same topic on both changes that is good enough for reviewers to find them
17:54 <fungi> the new-project topic was something we came up with ages ago to help us streamline reviewing project additions, but is probably not that relevant any longer
17:54 <clarkb> unless we wanted to use it now to tie those two sets of changes together
17:54 <clarkb> rather than having the order of operations problem
17:54 <fungi> ooh, or a new use for gerrit's "hashtags" ;)
17:55 * fungi is still looking for a nail to beat with that hammer
17:55 <clarkb> because as you say it is just a hint to reviewers so let's use gerrit's ability for that rather than artificially making new commits and making zuul unhappy
17:56 <frickler> o.k., so https://review.opendev.org/q/hashtag:new-project+(status:open%20OR%20status:merged) then
17:57 <frickler> and https://review.opendev.org/q/topic:large-scale-sig+status:open for the other conjunction
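The two search URLs above are ordinary Gerrit queries appended to `/q/`, with spaces percent-encoded. A minimal Python sketch of building such URLs from a query string; `gerrit_query_url` is a hypothetical helper, and leaving `:`, `(`, `)` and `+` unescaped is a choice made here only so the output matches the URLs in the log.

```python
from urllib.parse import quote


def gerrit_query_url(base: str, query: str) -> str:
    """Return a Gerrit web UI search URL for `query`."""
    # Percent-encode spaces and other unsafe characters, but keep the
    # operator syntax (':', '(', ')', '+') readable, as in the URLs above.
    return f"{base.rstrip('/')}/q/{quote(query, safe=':()+')}"
```

Calling it with `"hashtag:new-project+(status:open OR status:merged)"` reproduces frickler's first URL, with each space turned into `%20`.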
17:58 <fungi> i was thinking more like adding a hashtag to the project-config change which says where to find the corresponding governance change
17:58 <fungi> but also, could just use a basic review comment
17:58 <clarkb> ya maybe asking people to leave a comment with a link to the governance change is simplest
17:58 <clarkb> I'd be happy with that
17:58 <fungi> same. i always look at existing comments when i'm reviewing changes anyway
18:02 <clarkb> "Leave a comment with a link to the openstack governance change once that is pushed"
18:02 <clarkb> something simple like that
18:03 <opendevreview> Merged openstack/diskimage-builder master: Ensure passwd is installed on RH and derivatives https://review.opendev.org/c/openstack/diskimage-builder/+/840352
18:03 <frickler> "Leave a review comment ..." otherwise I fear it still might be read as adding a comment to the commit message
18:03 <clarkb> ++
18:04 <frickler> I can do a patch for that tomorrow unless one of you wants to do it right away
18:04 <clarkb> I think it is fine to do tomorrow. New projects don't happen every day
18:06 <fungi> yeah, that sounds like a great improvement, thanks frickler!
18:27 <clarkb> oh hey we have jammy wheels now
18:57 <mgariepy> fungi, is the hold good also for a gate check?
18:58 <fungi> mgariepy: yes, it's independent of pipeline
18:58 <mgariepy> ok
18:59 <mgariepy> i'll see if it merges now; if not, i'll debug it there.
18:59 <fungi> good luck!
19:22 <mgariepy> fungi, can you give me access to the vm :D
19:24 <fungi> on it
19:26 <fungi> mgariepy: ssh root@23.253.56.198
19:27 <mgariepy> i'm in it :D
19:46 <fungi> ze04 rebooted so we're on to ze05 now
20:05 <clarkb> it is interesting to see the job counts fall off on the grafana graphs as the reboots happen
20:06 <corvus> it's the zuul wave
20:31 <BlaisePabon[m]> Late breaking personal news... I talked my management into transferring me from Product Management to become the first DevOps SRE.
20:31 <BlaisePabon[m]> My first project is to configure zuul-ci and Gerrit for our developers.
20:34 <fungi> congrats!
20:35 <BlaisePabon[m]> I'm both elated and terrified. I have set up CI toolchains with Jenkins and maven, but zero `zuul-ci` experience.
20:35 <fungi> if you're curious how we do it, you can find our deployment playbooks and docker-compose files in https://opendev.org/system-config
20:35 <fungi> er, https://opendev.org/opendev/system-config
20:36 <BlaisePabon[m]> Thank you fungi, that means more to me than you can imagine!!!
20:36 <fungi> also https://zuul.opendev.org/ is a real production deployment you can browse around and check out
20:37 <fungi> and we have a zuul-status dashboard at https://grafana.opendev.org/ with lots of stats trended for the deployment
20:37 <BlaisePabon[m]> I'll look later ... my opendev account is my personal account.
20:38 <fungi> well, all of that is anonymously accessible too, no need to log into anything just to look around
20:38 <BlaisePabon[m]> Oh! Never mind, I can read this from here.
20:40 <fungi> https://zuul.opendev.org/components gives a nice overview of all the services in our zuul deployment. you can see there that we're in the middle of a rolling update now
20:44 <mgariepy> fungi i think you can release the hold now, thanks for your hlep
20:44 <mgariepy> help**
20:49 <fungi> mgariepy: done, and you're welcome!
20:51 <fungi> did you manage to work out what was conflicting on the socket?
20:55 <mgariepy> we do restart the container and the socket needs an After=network.target.
20:56 <mgariepy> https://review.opendev.org/c/openstack/openstack-ansible-galera_server/+/843547
21:00 *** dviroel is now known as dviroel|out
21:06 <opendevreview> Clark Boylan proposed opendev/system-config master: Perform package upgrades prior to zuul cluster node reboots https://review.opendev.org/c/opendev/system-config/+/843549
21:06 <opendevreview> Dr. Jens Harbott proposed openstack/project-config master: Fix propose-updates job for requirements (3rd attempt) https://review.opendev.org/c/openstack/project-config/+/843550
21:08 <frickler> clarkb: fungi: ^^ another round, if you could approve (feel free to modify if needed) before the next periodic trigger kicks off, so we get further testing, that'd be great
21:08 <frickler> I didn't make progress yet on the manual trigger task
21:09 <clarkb> looking
21:23 *** rlandy is now known as rlandy|biab
21:40 <fungi> mgariepy: ah, so it wasn't a socket conflict, it was just starting too early?
21:48 <opendevreview> Merged openstack/project-config master: Fix propose-updates job for requirements (3rd attempt) https://review.opendev.org/c/openstack/project-config/+/843550
21:59 <mgariepy> yep indeed,
21:59 <mgariepy> sometimes it was ok tho ..
22:21 <fungi> ze05 has rebooted and ze06 is paused now
22:32 <clarkb> ianw: not sure if you caught it but gerrit 3.4.5 and 3.5.2 exist now. https://review.opendev.org/c/opendev/system-config/+/843298 will bump us up to those versions in our images. The 3.5.2 or newer release is important before we upgrade to 3.6 in order to be able to run a new notedb curation command prior to the 3.6 upgrade happening.
23:46 <fungi> now ze07 is in progress
23:49 <Clark[m]> I suspect it may accelerate now as there are fewer jobs which means fewer opportunities for a 3 hour long job to hold us up
23:52 <corvus> i think there may be a bug in the graceful shutdown of mergers; so watch out for that and if we get stuck on mergers, just manually 'docker-compose down' them to get it moving again
23:52 <corvus> i'll try to look at the merger graceful bug soon
23:56 <Clark[m]> Specifically the issue should cause this playbook to stall but not exit/fail. It's the merger process not exiting cleanly right? So ya a docker-compose down out of band would force that to happen allowing the playbook to continue
