Friday, 2024-03-15

08:45 <tkajinam> I think we can merge https://review.opendev.org/c/openstack/project-config/+/905820 now, because the governance change has already been merged
08:45 <tkajinam> and it'd be nice if https://review.opendev.org/c/openstack/project-config/+/912710 can be merged, too, to move the retirement process forward
09:11 <opendevreview> Takashi Kajinami proposed openstack/project-config master: Retire puppet-murano: End Project Gating https://review.opendev.org/c/openstack/project-config/+/913292
09:12 <opendevreview> Merged openstack/project-config master: Retire puppet-ec2api: End Project Gating https://review.opendev.org/c/openstack/project-config/+/912710
09:18 <opendevreview> Takashi Kajinami proposed openstack/project-config master: Retire puppet-murano: Remove Project from Infrastructure System https://review.opendev.org/c/openstack/project-config/+/913296
12:47 <fungi> tkajinam: see comments on 905820, but i'm happy to merge once that's cleaned up
12:49 <opendevreview> Takashi Kajinami proposed openstack/project-config master: Retire heat-cfnclient: Remove Project from Infrastructure System https://review.opendev.org/c/openstack/project-config/+/905820
12:49 <tkajinam> fungi, thanks ! (and thank you, frickler, too)
12:50 <tkajinam> fungi, I guess I can remove the groups key assuming it's for storyboard but lmk in case I have to restore it
12:52 <fungi> tkajinam: yeah, remove both. it was there for adding the repo to the corresponding project group in sb
12:54 <tkajinam> ok !
13:02 <opendevreview> Merged openstack/project-config master: Retire puppet-murano: End Project Gating https://review.opendev.org/c/openstack/project-config/+/913292
13:34 <opendevreview> Merged openstack/project-config master: Retire heat-cfnclient: Remove Project from Infrastructure System https://review.opendev.org/c/openstack/project-config/+/905820
14:52 <Clark[m]> fungi: frickler: thoughts on removing centos 7 from base jobs and nodepool as planned today? I think fungi's efforts put us in a good spot to proceed with minimal impact
14:52 <fungi> i think we should do it today as we announced, yes
14:53 <Clark[m]> https://review.opendev.org/c/opendev/base-jobs/+/912786 is the next step then. If you want to review that I can remove my wip and approve once reviews are done
14:53 <fungi> and i liked the idea of capturing the config error json from tenants (or maybe just the openstack tenant? the others are small enough to work out by skimming) before we do
14:54 <Clark[m]> That way we can do a diff? I like that idea too
14:54 <fungi> yeah. it seems like zuul builds that json in a deterministic order, so diffing a yaml conversion of it is quite trivial
14:55 <fungi> when i merged the final devstack cleanup change, a diff of the error data cleanly pointed out three backports i'd accidentally not pushed to some old keystone branches, and once i did and merged those the diff was empty again
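A minimal sketch of the before/after diff described above, assuming each snapshot is the JSON list returned by Zuul's config-errors endpoint. To stay within the standard library it swaps the YAML conversion for pretty-printed JSON with sorted keys, which likewise puts one field per line; the function name `diff_config_errors` is hypothetical. It relies on Zuul emitting the list itself in a stable order, as noted in the log, so the list is not re-sorted.

```python
import difflib
import json


def diff_config_errors(before, after):
    """Return a unified diff between two config-error snapshots.

    Each snapshot is assumed to be a list of dicts from the config-errors
    API. Sorted keys and fixed indentation give a stable line-per-field
    layout, so entries that did not change produce no diff output.
    """
    before_lines = json.dumps(before, indent=2, sort_keys=True).splitlines()
    after_lines = json.dumps(after, indent=2, sort_keys=True).splitlines()
    return "\n".join(
        difflib.unified_diff(
            before_lines,
            after_lines,
            fromfile="before.json",
            tofile="after.json",
            lineterm="",
        )
    )
```

An empty string back means the two snapshots match, which is the "diff was empty again" check mentioned above.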
14:56 <fungi> just a sec and i'll review, then i'll grab a config error snapshot when we're approving
15:01 <Clark[m]> Thanks!
15:11 <clarkb> WIP is removed. I think you can go ahead and approve it when you review it (assuming you have no complaints with the change)
15:14 <clarkb> https://review.opendev.org/c/openstack/project-config/+/912787 and then that can go in once we're satisfied we don't have any crazy new config errors that need fixing first
15:37 <fungi> clarkb: okay, i've approved 912786 now
15:39 <fungi> snapshot of https://zuul.opendev.org/api/tenant/openstack/config-errors?limit=1000 grabbed (514 entries)
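One way to grab such a snapshot, sketched with only the standard library. The base URL and `limit=1000` query are copied from the link above. The helper names are hypothetical, and the code assumes the endpoint returns a JSON list. If a tenant ever had more errors than `limit`, the snapshot would presumably be cut short, so the returned count is worth comparing against the limit.

```python
import json
import urllib.parse
import urllib.request

ZUUL_API = "https://zuul.opendev.org/api"


def config_errors_url(tenant, limit=1000):
    """Build the config-errors URL for a tenant (same shape as the link above)."""
    query = urllib.parse.urlencode({"limit": limit})
    return f"{ZUUL_API}/tenant/{urllib.parse.quote(tenant)}/config-errors?{query}"


def save_snapshot(tenant, path, limit=1000):
    """Fetch a tenant's config errors, write them to `path`, return the count."""
    with urllib.request.urlopen(config_errors_url(tenant, limit), timeout=60) as resp:
        errors = json.load(resp)  # assumed to be a JSON list of error entries
    with open(path, "w", encoding="utf-8") as f:
        json.dump(errors, f)
    return len(errors)
```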
15:39 <clarkb> fungi: where is that snapshot?
15:42 <fungi> on my desktop
15:44 <fungi> it's nearly half a megabyte uncompressed so no way i can put it in a paste, but i can copy it to a server or something if needed
15:44 <clarkb> impressive :)
15:44 <fungi> i was just planning to put a yaml diff on paste (if it's compact enough)
15:46 <fungi> all errors mentioning centos-7 are either broken tripleo references, or various branches of openstack-ansible-functional-centos-7 which is already broken for other reasons
15:50 <clarkb> I wonder if these phishes from "ovh" that hit the mailing list are a common attack sent to all mailing lists they can find or if we're targeted because we have some dealings with ovh already
15:59 <opendevreview> Merged opendev/base-jobs master: Remove centos-7 nodeset https://review.opendev.org/c/opendev/base-jobs/+/912786
16:02 <clarkb> there are 637 errors now
16:03 <fungi> every single mailing list is getting those, yes
16:04 <clarkb> fungi: looks like openstack/freezer is angry
16:04 <fungi> i'll work up the diff
16:04 <clarkb> but it was already sad about opensuse removal so not really a regression from that
16:05 <fungi> the diff is still ~95k
16:05 <clarkb> swift, freezer, and openstack-ansible-ops
16:06 <clarkb> fungi: I think there may be duplicates too
16:06 <clarkb> So maybe do the diff then sort and uniq the entries?
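The "sort and uniq" suggestion can be done on whole entries rather than diff lines. This is a sketch that assumes entries are plain JSON-compatible dicts. Each one is reduced to a canonical JSON string so that identical content compares equal, and the result behaves like `sort | uniq -c`. The function name is hypothetical.

```python
import json
from collections import Counter


def dedupe_errors(errors):
    """Collapse identical config-error entries, like `sort | uniq -c`.

    Each entry is serialized with sorted keys so dicts with the same
    content count as the same entry regardless of key order. Returns
    (entry, count) pairs, most frequent first.
    """
    counts = Counter(json.dumps(e, sort_keys=True) for e in errors)
    return [(json.loads(key), n) for key, n in counts.most_common()]
```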
16:08 <fungi> basically looking through first to see if there's any impact to master or stable/2024.1 branches of active openstack deliverable repositories
16:08 <clarkb> fungi: I think freezer but the other two were only on older branches
16:08 <fungi> not found any yet but will take a bit
16:09 <clarkb> but not sure if freezer is part of active openstack release
16:09 <fungi> freezer is not active
16:09 <clarkb> ack
16:09 <fungi> it was officially declared inactive by the tc
16:10 <fungi> in large part because its zuul configuration was in a broken state for a prolonged period of time
16:10 <fungi> starlingx/zuul-jobs master branch is impacted, btw
16:11 <clarkb> yes I pushed a change to them when I did opensuse and it has been completely ignored
16:11 <clarkb> https://review.opendev.org/c/starlingx/zuul-jobs/+/909766
16:12 <clarkb> I feel like I was helpful there and it is up to them to decide to accept the help
16:12 <fungi> and yeah, no references to 'branch: master' outside those two, no references at all to 'branch: stable\/2024\.1'
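The grep for `branch: master` and `branch: stable/2024.1` can also be run on the parsed data instead of the text diff. This sketch assumes each entry has a `source_context` dict with `project` and `branch` keys. The grep above implies a `branch` field, but its exact nesting is an assumption here. The function name is hypothetical.

```python
def affected_projects(errors, branches=("master", "stable/2024.1")):
    """Return sorted (project, branch) pairs with errors on the given branches.

    Assumes each entry looks like
    {"source_context": {"project": ..., "branch": ...}, ...}.
    """
    wanted = set(branches)
    hits = set()
    for e in errors:
        ctx = e.get("source_context") or {}
        if ctx.get("branch") in wanted:
            hits.add((ctx.get("project", ""), ctx["branch"]))
    return sorted(hits)
```

Run against only the new entries, an empty result would correspond to "no impact on master or stable/2024.1".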
16:13 <clarkb> fungi: probably the biggest risk to the release then is swift, but we're reasonably confident that not having errors on the master and stable/2024.1 branches isolates us from problems? cc timburke
16:13 <clarkb> since openstack ansible ops isn't part of coordinated releasing as it's a deployment trailing release repo I think
16:14 <fungi> i believe so, yes
16:15 <clarkb> in that case I think we can offer our help to timburke to bypass ci to merge cleanups if that is necessary and otherwise we can proceed with nodepool cleanup? or do we want to revert and clean up swift first?
16:16 * clarkb will go do morning things while awaiting feedback on that
16:18 <fungi> this should only at worst prevent testing changes on the affected older stable branches, yes
16:18 <fungi> i think we can proceed with the nodepool removal
16:20 <fungi> i've rechecked it
16:22 <fungi> the varied diff context and interspersed escaped embedded newlines make analyzing the diff tough. if we really want a uniq'd breakdown of the 123 new errors we'll likely need to produce it by hand
16:24 <fungi> the actual errors are encoded as visually-formatted multi-line message strings rather than as structured data
16:26 <fungi> i could try to stream edit out the error messages with a smart enough pattern match, but that will take work too
16:36 <clarkb> fwiw I skimmed only the centos-7 nodeset not found errors. I guess there could be newer transitive job not found errors
16:36 <clarkb> maybe if you can extract any errors that are not nodeset not found that list will be smaller and potentially actionable?
16:37 <clarkb> but you said the master and stable/2024.1 branches don't show up so we're probably fine either way
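A rough way to pull out everything that is not a "nodeset not found" error, as suggested above, is a pattern match on the message text. This is a sketch under two assumptions. First, that `short_error` (or `error`) holds the message. Second, that the nodeset message mentions "nodeset" followed later by "not found". The regex is a guess at that wording and should be checked against real entries. The names are hypothetical.

```python
import re

# Hypothetical pattern: assumes the missing-nodeset message contains
# "nodeset" and, later on, "not found". Verify against actual errors.
NODESET_NOT_FOUND = re.compile(r"nodeset\b.*\bnot found", re.IGNORECASE | re.DOTALL)


def split_nodeset_errors(errors):
    """Partition entries into (nodeset-not-found, everything else)."""
    nodeset, other = [], []
    for e in errors:
        text = e.get("short_error") or e.get("error") or ""
        (nodeset if NODESET_NOT_FOUND.search(text) else other).append(e)
    return nodeset, other
```

The `other` list is the smaller, potentially actionable set.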
16:41 *** dmellado74522 is now known as dmellado
16:41 <fungi> yeah, i need to step away for a few minutes, but can try to put together some sort of machine-assisted analysis of the diff shortly
16:42 <clarkb> there is one centos-7 node locked for deleting in nodepool. I suspect the problem is in the cloud though and deletion is failing
16:42 <clarkb> corvus: ^ do you think that will present a problem if we remove the centos-7 configuration from nodepool (no more labels, images, etc)?
16:43 <clarkb> I think we can manually clear out the record from zk and follow up with the cloud later
16:43 <clarkb> I seem to recall node processing (even deletions) failing if the node and pool info is removed
16:44 <clarkb> the server was booted in january
16:44 <clarkb> fault | {'message': 'MessagingTimeout', 'code': 500, 'created': '2024-01-21T03:26:57Z'} and is in an error state
16:44 <clarkb> I'm going to try and manually issue a delete request (not that I expect a different result)
16:46 <clarkb> I think if we manually delete the record from zk then nodepool should create a leaked node record for it with less info that may avoid issues. However, I think I'm also happy to do the nodepool cleanup config and see if this creates any problems in the first place. But will defer to corvus on whether or not that is sane
16:55 <clarkb> My manual delete attempt has not created any change in the situation (as expected)
16:55 <corvus> clarkb: i think removing the image config should be fine as long as the provider still exists. i think as long as the zk record still exists, it will proceed like a normal node deletion, and if it doesn't exist it should proceed like a leaked instance delete
16:55 <clarkb> corvus: the provider will remain but not the pool config for that image/label in the provider
16:56 <corvus> clarkb: i think we don't actually create stub zk nodes anymore either btw (that doesn't change analysis of the situation -- just indicating that it means we fall into the leaked path in that case).
16:56 <clarkb> but sounds like we can proceed and see if nodepool has any problems and deal with them at that point if they occur
16:57 <corvus> clarkb: so i'm like 85% sure you should be able to proceed as you describe without problem, and if i'm wrong, i agree that a manual zk deletion is probably a good way out of that (or maybe if we see an error, something else will suggest itself)
16:57 <corvus> tldr: +1 delete label/image/etc
16:57 <clarkb> sounds like a plan
17:10 <corvus> fungi: remote: https://review.opendev.org/c/zuul/zuul/+/913434 Use NodesetNotFoundError class [NEW]
17:11 <corvus> i think that should make those errors a bit more filterable (not by nodeset name, but at least you should be able to filter by the class of error)
17:25 <corvus> also remote: https://review.opendev.org/c/zuul/zuul/+/913435 Use ProjectNotFoundError [NEW]
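Once errors carry a distinct class, counting new errors per class becomes a simple tally. This sketch assumes each entry exposes the class under some field and uses `name` as a default. That key is an assumption, not confirmed by the log, so it is left as a parameter. `Counter` subtraction keeps only positive counts, so the result shows which classes grew between snapshots. The function names are hypothetical.

```python
from collections import Counter


def count_by_field(errors, field="name"):
    """Tally entries by a classification field (default key is an assumption)."""
    return Counter(e.get(field, "<unknown>") for e in errors)


def new_errors_by_field(before, after, field="name"):
    """Per-class increase from `before` to `after`; classes that shrank are dropped."""
    return count_by_field(after, field) - count_by_field(before, field)
```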
17:41 <fungi> okay, back and catching up
17:43 <fungi> corvus: ah, neat! yes those would help tremendously. as for filtering out the error strings, i'll probably resort to a python script to drop keys from the json rather than trying to do it all with simple command-line utilities
17:46 <clarkb> fungi: for https://review.opendev.org/c/openstack/project-config/+/912787 I'll let you approve when you're satisfied with your error deep dive since corvus seems to think the stuck deleting node shouldn't be a problem
17:46 <clarkb> I do have lunch plans with some friends today, but should be back at a reasonable time to debug any nodepool issues should those occur
17:46 <fungi> yeah, will do
17:47 <clarkb> corvus: that config error refactor change has import errors
17:48 <corvus> guess i should have run more than just flake8 on it :)
17:48 <corvus> it usually catches those
17:55 <fungi> okay, filtering out the error and short_error keys, the before/after diff is reduced to ~18k in size, probably still too big for paste.o.o
17:56 <fungi> ah, no, it fits! https://paste.opendev.org/show/bfRpuFOiI2OM6O77uPuV/
17:58 <fungi> i suppose i could squeeze it down a bit more by dropping the source_context.path keys
18:00 <fungi> https://paste.opendev.org/show/b52SraP5g8re4jEB8IaC/ is without the source_context.path
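The key-dropping step can be sketched in a few lines of Python. This is not fungi's actual script. It removes the keys named above (`error`, `short_error`, and `source_context.path`) from copies of each entry and leaves everything else alone. The function name is hypothetical.

```python
import copy

# Keys named in the log: the free-form message strings at the top level,
# plus the file path nested inside source_context.
TOP_LEVEL_DROP = ("error", "short_error")
NESTED_DROP = {"source_context": ("path",)}


def strip_noise(errors):
    """Return copies of the entries without the noisy keys (input is untouched)."""
    cleaned = []
    for entry in errors:
        entry = copy.deepcopy(entry)
        for key in TOP_LEVEL_DROP:
            entry.pop(key, None)
        for parent, keys in NESTED_DROP.items():
            sub = entry.get(parent)
            if isinstance(sub, dict):
                for key in keys:
                    sub.pop(key, None)
        cleaned.append(entry)
    return cleaned
```

Both snapshots would go through this before being serialized and diffed the same way as the full ones, which is what shrank the diff from ~95k to ~18k.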
18:00 <frickler> I guess I should run the eom branch cleanup, lest someone tries to fix stable/vwx
18:01 <fungi> i already merged a bunch of backports to those earlier this week
18:01 <fungi> and yes, it would have been easier if they hadn't existed
18:01 <frickler> yes, but these should go into unmaintained/vwx now where those exist
18:01 <clarkb> fungi: looks like placement and tenks were things I missed when doing a manual scan. But neither are on master or stable/2024.1 so that still isn't an issue
18:17 <opendevreview> Merged openstack/project-config master: Remove centos-7 image uploads from Nodepool https://review.opendev.org/c/openstack/project-config/+/912787
18:24 <clarkb> that config update is deploying nowish
18:24 <clarkb> the stuck node is in rax-dfw so if there are issues with deleting it we will see that on nl01
18:32 <clarkb> deploy appears done
18:35 <clarkb> `grep 0036475040 /var/log/nodepool/nodepool-launcher.log` shows a new behavior: basically we hit https://opendev.org/zuul/nodepool/src/branch/master/nodepool/driver/utils.py#L360-L366 which has to do with quota calculation, then it says deletion should clean it up. Otherwise the exceptions deleting the node have continued the same before and after this update
18:35 <clarkb> all that to say I don't think this is going to be a problem
18:35 <clarkb> at least not any more than it would be if we kept centos-7 in the provider config
18:37 <clarkb> I'm going to pop out for that lunch momentarily. But will check in after to make sure all is still well. If so I think we can probably proceed with removing the disk image builds today as well?
18:46 <frickler> +1
18:46 <fungi> wow, working on final contributor stats for openstack 2024.1 cycle and all those centos-7 removal patches shot me up to #13 by change count
18:47 <fungi> didn't realize there had been quite that many
18:51 <fungi> no, i was looking at the wrong file. #35 does seem more likely ;)
19:40 <frickler> 236 branches deleted, seems like ~100 fewer config errors https://paste.opendev.org/show/b8EXZ6co1dP8WrMl3J5I/
19:45 <frickler> and with that I'm out for today and mostly for the weekend
19:45 <fungi> have a good weekend, thanks for the help!
20:21 <Clark[m]> fungi, if you are still around can you recheck https://review.opendev.org/c/openstack/project-config/+/912788/
20:21 <Clark[m]> Lunch should be winding down soon but realized that still has a config error -1
20:25 <fungi> yep!
20:55 <fungi> it's passing now
20:55 <fungi> we can approve it when you get back
21:21 <clarkb> I'm back, sorry that took entirely too long but the car said it was 70F, something we haven't had in months
21:22 <clarkb> I have approved it
21:25 <clarkb> I even turned off the hvac system. it's too bad it will only last a few days
21:34 <opendevreview> Merged openstack/project-config master: Remove centos-7 nodepool image builds https://review.opendev.org/c/openstack/project-config/+/912788
21:46 <fungi> looks like it deployed
21:46 <fungi> no need to apologize! there are days when i've accidentally gone out for a 3-hour lunch
21:46 <clarkb> cool, at this point we shouldn't have any new issues. We would've seen problems with the earlier cleanups
22:04 <clarkb> nb02 did leak an intermediate vhd conversion file for centos-7 from august. I've deleted it. That was the only file in /opt/nodepool_dib for centos-7 on either nb01 or nb02
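A small sketch for spotting leftovers like that leaked conversion file. It lists regular files in an image directory (the log names /opt/nodepool_dib) whose names contain the image name. Matching on a `centos-7` substring in the filename is an assumption about how the builders name their output, and the function name is hypothetical.

```python
from pathlib import Path


def leftover_image_files(image_dir, image_name="centos-7"):
    """List regular files directly in image_dir whose names contain image_name.

    The substring match is an assumption about image file naming; a
    missing directory simply yields an empty list.
    """
    root = Path(image_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if p.is_file() and image_name in p.name)
```

On a builder this would be called as `leftover_image_files("/opt/nodepool_dib")`, and anything it returns still needs a human check before deletion.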
22:47 <opendevreview> Clark Boylan proposed opendev/system-config master: Stop mirroring CentOS 7 packages https://review.opendev.org/c/opendev/system-config/+/913453
22:47 <opendevreview> Clark Boylan proposed opendev/system-config master: Cleanup opensuse mirroring configs entirely https://review.opendev.org/c/opendev/system-config/+/913454
22:47 <clarkb> given the way buster went I think we wait on approving those until next week. But I wanted to get testing done for that (particularly the second change since it is a bit more involved)
22:56 <fungi> a reasonable precaution
22:57 <fungi> also i want to remember to tag a git-review release on monday

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!