Friday, 2021-11-05

opendevreviewClark Boylan proposed opendev/system-config master: Cleanup users launch-node.py might have used  https://review.opendev.org/c/opendev/system-config/+/81677100:02
opendevreviewClark Boylan proposed opendev/system-config master: Remove python3-nova-agent package from our servers  https://review.opendev.org/c/opendev/system-config/+/81677200:02
opendevreviewMerged opendev/system-config master: Don't set lodgeit db dir perms  https://review.opendev.org/c/opendev/system-config/+/81675400:04
ianwexcellent, the 9-stream build fails with some sort of yaml library exception : TypeError: load() missing 1 required positional argument: 'Loader'00:19
ianwthis must be pyyaml 600:21
opendevreviewIan Wienand proposed openstack/project-config master: nodepool elements: use yaml.safe_load  https://review.opendev.org/c/openstack/project-config/+/81677400:26
ianw^ this explains why it passed dib/nodepool gate00:26
Clark[m]Ya you need to explicitly opt into unsafe now00:26
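For readers following along, the change discussed above can be sketched as below. This is only an illustration, not the actual nodepool element code: the `ELEMENT_CONFIG` document is made up, and the only PyYAML calls used are `yaml.safe_load` and `yaml.load` with an explicit `Loader`.

```python
import yaml

# Hypothetical YAML document, used purely for illustration.
ELEMENT_CONFIG = """
packages:
  - haveged
  - git
"""

# safe_load only builds plain Python types (dict, list, str, ...).
data = yaml.safe_load(ELEMENT_CONFIG)

# Equivalent spelling: with PyYAML 6 the Loader argument is required,
# so the loader has to be named explicitly when calling yaml.load().
same = yaml.load(ELEMENT_CONFIG, Loader=yaml.SafeLoader)

print(data)
```

Calling `yaml.load(ELEMENT_CONFIG)` with no `Loader` is what produces the `TypeError` quoted above on PyYAML 6.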
*** odyssey4me is now known as Guest495100:56
opendevreviewMerged openstack/project-config master: nodepool elements: use yaml.safe_load  https://review.opendev.org/c/openstack/project-config/+/81677401:08
ianwsigh, now another problem02:13
ianw2021-11-05 01:51:07.189 | Updating cache of https://opendev.org/openstack/openstack.git in /opt/dib_cache/source-repositories/openstack_179b61797588a5983c2f97c6533dca570c8f887d with ref *02:13
ianw2021-11-05 01:51:13.993 | Could not access submodule 'adjutant'02:13
ianw2021-11-05 01:51:13.993 | Could not access submodule 'ansible-hardening'02:13
ianw2021-11-05 01:51:13.993 | Could not access submodule 'ansible-role-collect-logs'02:13
ianw... and so on ...02:13
ianw... is dib broken, something about a new git, gitea, or this repo ...02:13
*** sshnaidm is now known as sshnaidm|off03:06
Clark[m]Is new git trying to auto update submodules?03:19
ianwhttps://nb01.opendev.org/ubuntu-bionic-0000221037.log failed with this error03:22
ianwhttps://nb01.opendev.org/ubuntu-focal-0000119628.log was the next build and passed03:23
Clark[m]The urls are relative so those updates should work however I wouldn't expect the caching step to actually need to fetch the submodule content03:25
Clark[m]Re new git I wonder if that is a bullseye git behavior update where it actually fetches those?03:26
ianwhttps://nb02.opendev.org/debian-stretch-0000051795.log also failed03:26
Clark[m]Since source-repositories runs outside the chroot it would be the bullseye git?03:26
Clark[m]Honestly that repo isn't used for anything that I know of and is incomplt these days. We might get away with removing it from caching temporarily if necessary03:27
ianwhttps://nb02.opendev.org/debian-buster-0000061362.log was the build that followed that, and that passed03:27
Clark[m]*it is incomplete03:28
ianwit seems like possibly running the new git failed once -- may have done something?! -- and further runs are working03:28
Clark[m]ianw: maybe we need to see if it affects those older systems specifically? Then my hunch about the chroot is likely wrong03:28
Clark[m]Oh I see. Ya maybe something about git repo state on the first pass then git doesn't try again?03:29
ianwthis would be running with the build-system git, not the guest git03:29
ianwi can't seem to replicate it -- but also the git command-lines run by the caching are ... let's say not a priori obvious03:30
ianwin short, it happened, twice, on different servers, i don't know why and it doesn't seem to be happening now03:31
opendevreviewIan Wienand proposed opendev/system-config master: gerrit: mark file reviewed during testing  https://review.opendev.org/c/opendev/system-config/+/81676603:44
*** frenzyfriday|sick is now known as frenzy_friday04:25
ianwit just happened again with https://nb01.opendev.org/centos-7-0000237899.log04:55
ianwthe list of submodules is different now04:55
ianw2021-11-05 04:50:44.107 | Updating cache of https://opendev.org/openstack/openstack.git in /opt/dib_cache/source-repositories/openstack_179b61797588a5983c2f97c6533dca570c8f887d with ref *04:55
ianw2021-11-05 04:50:48.309 | Could not access submodule 'freezer-tempest-plugin'04:55
ianw2021-11-05 04:50:48.309 | Could not access submodule 'python-ironicclient'04:55
ianwi'm just about out of time for today, i'm not going to get to dig on this much further04:56
opendevreviewIan Wienand proposed opendev/system-config master: gerrit: mark file reviewed during testing  https://review.opendev.org/c/opendev/system-config/+/81676605:08
opendevreviewIan Wienand proposed openstack/project-config master: infra-package-needs: skip haveged start on 9-stream  https://review.opendev.org/c/openstack/project-config/+/81678206:41
opendevreviewMerged openstack/project-config master: infra-package-needs: skip haveged start on 9-stream  https://review.opendev.org/c/openstack/project-config/+/81678207:00
opendevreviewMerged opendev/system-config master: gerrit: don't chown mariadb container directory  https://review.opendev.org/c/opendev/system-config/+/81675009:25
opendevreviewAlfredo Moralejo proposed openstack/project-config master: Fix haveged installation in CentOS7  https://review.opendev.org/c/openstack/project-config/+/81681310:07
*** jpena|off is now known as jpena10:36
*** dviroel|out is now known as dviroel|rover10:38
*** jssfr is now known as foorl11:33
*** mazzy50981 is now known as mazzy509812:00
fungiClark[m]: ianw: i agree, the openstack/openstack repo is unnecessary to cache, i'd be in favor of filtering it out of the repos list explicitly13:45
opendevreviewAndre Aranha proposed zuul/zuul-jobs master: Add fips version of jobs needed for OpenStack  https://review.opendev.org/c/zuul/zuul-jobs/+/81638514:18
clarkbfungi: re the user changes in https://review.opendev.org/c/opendev/system-config/+/816769/1/playbooks/roles/gerritbot/tasks/main.yaml we are defaulting to uid 11000 with the idea that other bot users could be 11001 etc or 12000 and so on. What I'm not sure about and questioning now is if we install a distro package that creates a new system user will it create that as uid 11001 after14:23
clarkbwe create that user?14:23
fungiit will if 11000 is in the range adduser is willing to use14:24
fungion review.o.o, adduser.conf lists LAST_UID=59999 and LAST_GID=59999 so, yes it will assume no new users should be created lower than 11000 if there's a 11000 in passwd/groups14:25
clarkbfungi: a better example might be zk04.opendev.org. Since that has zk running as 10001 (I used it as an example here)14:26
fungiat one point we were setting FIRST_UID and FIRST_GID higher than 1000 i thought, but doesn't look like it now14:26
clarkbso I guess the next question I have is: is this a problem, and do we need to fix the existing zuul/nodepool/zk uid assignments?14:27
clarkbfungi: I suppose as an alternative I can create it as a non system user?14:28
fungiit can become a problem if we rsync files or detach/attach a cinder volume during server replacements14:28
clarkbthen system packages will continue to use their normal range and we already manage our actual users with specific uids so that won't conflict?14:28
funginot sure what you mean by non system user14:29
clarkbfungi: a regular user eg system: false/no in that ansible task14:30
fungiif you're talking about adduser --system that normally causes it to pick a uid/gid from the "system" range rather than the normal user range14:30
clarkbfungi: yes I know. We have created the zuul/nodepool/zookeeper users with a high (10001) uid and set it as a system user in ansible14:31
clarkbI'm wondering if it would be better to not create these users as system users so that system packages can continue to pick from the normal range14:31
clarkbnormal range for system users I mean14:31
clarkbthen our non system users can have hardcoded uids and since we manage those directly we can keep them from getting too out of sync?14:32
fungii still don't understand. if we explicitly picked a uid/gid outside the "system" range then it's not really a system user anyway and "system" user uids/gids picked by the package maintscripts will be unaffected anyway14:33
clarkbfungi: ok I don't know what the system: yes in the ansible that I cargo culted from the zookeeper ansible will do then14:35
clarkbfungi: is there no other flag for system vs not?14:35
funginot sure what you mean by flag14:35
clarkbhttps://review.opendev.org/c/opendev/system-config/+/816769/1/playbooks/roles/gerritbot/tasks/main.yaml line 1414:36
fungithe adduser manpage explains what adduser --system does, there are behaviors it switches besides just which uid/gid range is used14:36
clarkbmaybe that is a noop if we set our own uids and gids14:36
clarkbfungi: basically what I'm saying is we already do this for zuul/nodepool/zk. If this is a problem we not only need to rethink my gerritbot change but those systems as well potentially14:38
clarkbbut I'm not yet sure if there is a problem with what I have proposed14:39
clarkbit just occurred to me overnight that there may be and it was worth considering14:39
fungiwe went through this some years ago when we were still using puppet, and concluded that the sanest option was to create a gap between LAST_SYSTEM_UID and FIRST_UID where we could create our static uids14:40
fungianother alternative is to pick uids/gids strictly greater than LAST_UID and LAST_GID (59999 looks like)14:41
opendevreviewAlfredo Moralejo proposed opendev/system-config master: Add CentOS Stream 9 to AFS mirrors sync  https://review.opendev.org/c/opendev/system-config/+/81685214:42
clarkbfungi: we do set UID_MIN etc in opendev/system-config/playbooks/roles/base/users/files/Debian/login.defs15:15
clarkbI think that was the conclusion of the puppet stuff and it got ported to ansible15:15
fungioh, it's just not in adduser.conf15:16
clarkblooking at that I guess I should set this new user to 60001?15:16
clarkb(and then maybe one day we do the same with zuul/nodepool/zk?)15:16
clarkbfungi: maybe we should hold a node for my proposed change then install a package that adds a user and confirm the behavior?15:19
fungiso maybe adduser is obeying the values in login.defs rather than adduser.conf?15:20
clarkbya or not. That is why I'm wondering if we should test it. Seems like having a good answer to this is important to making whatever approach we take reliable as we apply it to other setups15:20
clarkbbasically invest in getting it right the first time then we can reapply that over and over again15:20
fungilooks like useradd goes by what's in login.defs, while adduser relies on adduser.conf15:24
mordredI love adduser vs useradd15:26
clarkbI guess now we need to figure out which ansible uses?15:27
clarkbfungi: I'm going to review some zuul changes, but then I'll come back to this and probably set up a held node and we can experiment with it. I do think it is worthwhile to sort out properly before we do this to too many things15:52
fungiyeah, i agree15:52
fungialso we should probably make login.defs and adduser.conf consistent15:53
fungione option would be to drop LAST_UID and LAST_GID to 9999 (that ought to be plenty for normal users without static uids/gids anyway)15:54
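The consistency question raised above can be checked mechanically. What follows is a minimal sketch, assuming the usual file formats (whitespace-separated `KEY value` pairs in login.defs, `KEY=value` in adduser.conf); the sample contents and the numbers in them are hypothetical and are not taken from any opendev server.

```python
# Pairs of settings that describe the same boundary in the two files.
PAIRS = {
    "UID_MIN": "FIRST_UID",
    "UID_MAX": "LAST_UID",
    "GID_MIN": "FIRST_GID",
    "GID_MAX": "LAST_GID",
}


def parse_login_defs(text):
    """Parse 'KEY value' lines, ignoring comments and blank lines."""
    values = {}
    for line in text.splitlines():
        parts = line.split("#", 1)[0].split()
        if len(parts) == 2:
            values[parts[0]] = parts[1]
    return values


def parse_adduser_conf(text):
    """Parse 'KEY=value' lines, ignoring comments and blank lines."""
    values = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            values[key.strip()] = value.strip().strip('"')
    return values


def mismatches(login_defs, adduser_conf):
    """Return (login.defs key, value, adduser.conf key, value) for each disagreement."""
    found = []
    for ld_key, ac_key in PAIRS.items():
        ld_value = login_defs.get(ld_key)
        ac_value = adduser_conf.get(ac_key)
        if ld_value != ac_value:
            found.append((ld_key, ld_value, ac_key, ac_value))
    return found


# Hypothetical file contents for illustration only.
SAMPLE_LOGIN_DEFS = "UID_MIN 3000\nUID_MAX 9999\nGID_MIN 3000\nGID_MAX 9999\n"
SAMPLE_ADDUSER_CONF = "FIRST_UID=1000\nLAST_UID=59999\nFIRST_GID=3000\nLAST_GID=9999\n"

report = mismatches(
    parse_login_defs(SAMPLE_LOGIN_DEFS),
    parse_adduser_conf(SAMPLE_ADDUSER_CONF),
)
for row in report:
    print(row)
```

With the sample values only the UID boundaries disagree, so two rows are reported.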
clarkboh ya I like that. Then we're no longer in conflict with the 10001 stuff15:58
*** marios is now known as marios|out16:16
clarkbtristanC: where is the Dockerfile for matrix-gerritbot?16:22
clarkbI didn't see it in the matrix-gerritbot software repo16:22
opendevreviewClark Boylan proposed opendev/system-config master: Run gerritbot with a user that will be shared with matrix-gerritbot  https://review.opendev.org/c/opendev/system-config/+/81676916:30
opendevreviewClark Boylan proposed opendev/system-config master: Run matrix-gerritbot with gerritbot user  https://review.opendev.org/c/opendev/system-config/+/81677016:30
clarkbfungi: ^ I put a forced failure in testinfra tests and I'll make a node hold.16:31
fungiexcellent16:31
clarkbheh I was going to clean up my old holds but they aren't there due to the zuul restarts. I'll check nodepool directly after this16:31
clarkbfrickler: corvus: ianw: you've each got at least one "leaked" hold node. I'm not sure if they are still in use or not so I'll leave them as is but if you get a chance to look on the nodepool side can you do that and delete the node(s) if no longer needed?16:36
clarkbor I can clean them up if you don't need them anymore (frickler has an oom debug node, corvus a registry debug node and ianw buildkit and gerrit 3.4 nodes. of these I suspect at least the gerrit 3.4 node is still used)16:37
opendevreviewClark Boylan proposed opendev/system-config master: Run haproxy-statsd as uid 1000  https://review.opendev.org/c/opendev/system-config/+/81676416:38
clarkbapparently you can override user in a straightforward manner but group info has to be in the groups file on the image? I'll get another patchset up for zk's related change but I think that means we should prefer setting only the user in docker-compose.yaml16:39
clarkbactually no zookeeper itself sets it uid:group too so I'll leave that one alone.16:40
clarkbbut the more I read docs the less sure I am this is correct :/16:41
tristanCclarkb: there is no Dockerfile, the image is built with nix16:45
opendevreviewClark Boylan proposed opendev/system-config master: Run haproxy-statsd as uid 1000  https://review.opendev.org/c/opendev/system-config/+/81676416:45
fricklerclarkb: the oom debug node can be removed (that helped in identifying the bullseye qemu issue), but I have two questions: a) how did you link it to "oom debug" when that info is no longer in zuul? b) how to properly clean it up, just a "nodepool delete"?16:46
clarkbfrickler: if you do a nodepool list --detail | grep hold you get all of the held nodes including their detailed message from the zuul hold side. I used that to identify what it was held for. Then ya you nodepool delete $nodeid on a nodepool node16:47
fungifrickler: nodepool list --detail16:47
fungithe comment from the autohold gets copied into the node info16:47
tristanCclarkb: is there something missing from the image?16:48
clarkbtristanC: I was mostly curious as I'm looking at changing how the bot runs a little bit and thought it might be nice to look at.16:48
tristanCclarkb: the image is defined as https://github.com/softwarefactory-project/gerritbot-matrix/blob/master/flake.nix#L56-L64  , here is the documentation https://nixos.org/manual/nixpkgs/stable/#sec-pkgs-dockerTools16:51
opendevreviewClark Boylan proposed opendev/system-config master: Run gerritbot with a user that will be shared with matrix-gerritbot  https://review.opendev.org/c/opendev/system-config/+/81676916:51
opendevreviewClark Boylan proposed opendev/system-config master: Run matrix-gerritbot with gerritbot user  https://review.opendev.org/c/opendev/system-config/+/81677016:51
clarkbtristanC: ^ related to that effort16:51
fricklerah, I missed the --detail thing, thx. node deleted16:52
clarkbfrickler: thanks!16:52
clarkbtristanC: I guess the container is defined to run as root in there? Will have to see how testing goes for overriding that16:53
clarkbIt was complaining about a lack of content in the groups file. I thought that was maybe related to a yaml parsing issue, but now wonder if no groups file exists at all and that is part of the issue?16:54
corvusclarkb: deleted16:56
clarkbmy rough plan for today is to get the haproxy statsd update changes in and if that looks happy maybe go for the zookeeper statsd changes too16:58
clarkbWhile also figuring out user stuff in 81676916:58
tristanCclarkb: the user is not defined, but yes it currently expects uid 0 for the home directory needed by openssh17:01
clarkbI've approved the bullseye update for haproxy-statsd17:13
clarkbfungi: if you can rereview https://review.opendev.org/c/opendev/system-config/+/816764 that would be great. I'm not sure if the : before was causing problems or not but this makes it more consistent with other setups.17:14
*** jpena is now known as jpena|off17:22
clarkbfungi: ok 158.69.73.60 is up and running. Do you think I should run a useradd and an adduser for both system and non system users and see what the results are? then maybe install a package that comes with a system user/group (libvirt?)17:32
clarkbI'm going to start with the package install since that is easy to uninstall and purge17:35
fungiyeah, those sound like good enough tests17:36
clarkbhuh libvirt doesn't add a user or group I swear that it did /me checks devstack17:37
fungiwell, also packages which add users are going to do so in the system (100-999 by default) range17:37
fungiclarkb: try installing snmpd?17:38
fungiit creates a Debian-snmp user and group17:38
clarkbI needed libvirt-daemon-system apparently17:40
clarkblooks like libvirt-dnsmasq got uid 112 which is the next system uid in the low range17:40
clarkb(thats a good indication we aren't breaking this too much)17:40
clarkbI'll try snmpd and then adduser/useradd stuff17:40
clarkbfungi: snmp is already there17:40
clarkbfrom before we add our stuff on top17:40
fungioh, right, this is less a test node and more a deployed server in that regard17:42
fungiclarkb: looking at my workstation, try installing usbmuxd17:44
fungishould create a usbmux user17:44
fungior try tcpdump, or postgresql17:44
fungitcpdump might be the best test since it has few deps and creates both a user and a group, whereas usbmux only creates a user17:46
clarkbfungi: https://paste.opendev.org/show/bRbYPCBvUhstKOGyCnNs/17:52
clarkbit seems that adduser/addgroup do what we want but useradd/groupadd do not17:53
clarkbIt seems that login.defs are ignored by adduser hence adding the 1000 uid17:53
clarkbthen useradd respects login.defs and adds the user as biggest uid +117:54
clarkbTo make these somewhat consistent with each other I guess we reduce our max values down to say 9999 in both login.defs and adduser.conf?17:54
clarkbalso anecdotally it seems that package installs use adduser and not useradd, but they do system users and they all end up below in the range anyway17:55
clarkbheh we already include tcpdump in our test images so it is early in the list17:55
clarkbI'm going to manually edit login.defs uid max and see if it shifts down as we want17:56
fungiyeah, i say we adjust the max values in both login.defs and adduser.conf, and also adjust the regular minimums in adduser.conf to be consistent with our current login.defs17:59
fungithat gives us the option of either putting static users/groups for containerized services between 1001 and 1999, or above 999918:00
fungiwe seem to start our admin users at 200018:00
clarkbok confirmed lowering the maxes makes useradd and groupadd respect them even if there are higher values existing18:00
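The behaviour observed on the held node can be modelled roughly as follows. This is a simplified sketch of what was reported in this conversation, not the real useradd or adduser implementation; the function names, the set of existing uids, and the range boundaries are all hypothetical.

```python
def useradd_style(existing, uid_min, uid_max):
    """Model of the observed useradd behaviour: highest in-range uid + 1.

    Uids outside [uid_min, uid_max] are ignored, which is why lowering
    the maximum stops existing high uids from influencing the result.
    """
    in_range = [uid for uid in existing if uid_min <= uid <= uid_max]
    candidate = max(in_range) + 1 if in_range else uid_min
    if candidate <= uid_max:
        return candidate
    # Assumed fallback: first unused uid in the range.
    return first_free(existing, uid_min, uid_max)


def first_free(existing, first_uid, last_uid):
    """Model of the observed adduser behaviour: lowest unused uid in range."""
    for uid in range(first_uid, last_uid + 1):
        if uid not in existing:
            return uid
    raise RuntimeError("no free uid in range")


# Hypothetical host: two admin users plus two static container uids.
existing = {2000, 2001, 10001, 11000}

before = useradd_style(existing, 3000, 60000)   # high uids are in range
after = useradd_style(existing, 3000, 9999)     # high uids now ignored
adduser_pick = first_free(existing, 1000, 59999)

print(before, after, adduser_pick)
```

Under these assumptions the wide range yields 11001 (one past the static container uid), the lowered maximum yields 3000, and the adduser-style search yields 1000.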
clarkbfungi: yup I'm thinking we do higher than 9999 for containerized services will work well since we already do that for a number of them like zuul/nodepool/zk18:00
clarkbgerrit is special for raisins not worth changing but if everything else can match up that way I think that would be good18:00
fungithough we should probably avoid values over 6000018:01
clarkbfungi: ya I doubt we'd need to go that high18:01
clarkblibvirt uses a high value like that fwiw18:01
clarkbso it seems some system packages may also explicitly go high18:01
fungiright18:01
clarkbfungi: did you want to put that change together since you spec'd it out earlier (all I did was some basic testing to confirm behavior)18:02
fungiyeah, i can, just a sec18:02
clarkbAnd then ya we can do 2000-9999 for system level normal users (sorry if that statement didn't make sense). The distro continues to have 0-999 for system level system users then we can use >=10000 for our container users18:03
clarkbah actually the ranges are this: 0-999 for distro system users/groups, 1000-1999 unallocated, 2000-2999 infra-root users, 3000-9999 non system users created by config mgmt, >=10000 free for use in containers18:06
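The ranges listed above can be written down as a small lookup, which is handy when spot-checking ids on a server. This is a sketch only: the labels paraphrase the message above, and the 60000 ceiling reflects the earlier suggestion to avoid values over 60000 rather than any enforced limit.

```python
# (low, high, label) tuples transcribed from the ranges in the discussion.
UID_RANGES = [
    (0, 999, "distro system users/groups"),
    (1000, 1999, "unallocated"),
    (2000, 2999, "infra-root users"),
    (3000, 9999, "non-system users created by config management"),
]
CONTAINER_MIN = 10000
SUGGESTED_CEILING = 60000  # "avoid values over 60000"


def classify(uid):
    """Return the label of the range a uid or gid falls into."""
    if uid < 0:
        raise ValueError("ids cannot be negative")
    for low, high, label in UID_RANGES:
        if low <= uid <= high:
            return label
    if uid <= SUGGESTED_CEILING:
        return "container service users"
    return "above the suggested ceiling"


for uid in (112, 2000, 10001, 11000):
    print(uid, classify(uid))
```

For example, the libvirt-dnsmasq uid 112 seen earlier classifies as a distro system user, while 10001 and 11000 land in the container range.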
clarkbspot checking things on nb0X we have letsencrypt as gid 10002 because nodepool group is 10001. This implies ansible is using useradd/groupadd and not addgroup/adduser18:09
clarkbI think this is ok and the next redeployment of those services will end up correcting that sort of thing18:10
clarkbany objection to me approving 816764 now?18:11
opendevreviewJeremy Stanley proposed opendev/system-config master: Lower UID/GID range max to make way for containers  https://review.opendev.org/c/opendev/system-config/+/81686918:11
opendevreviewMerged opendev/system-config master: Update haproxy-statsd to bullseye and python3.9  https://review.opendev.org/c/opendev/system-config/+/81676518:12
fungiclarkb: no objection18:12
fungialso 816869 is the account range adjustments as discussed18:12
clarkbyup +2'd as noted from my spot checking letsencrypt on nodepool and zuul nodes will be weird, but that will sort of self correct over time18:13
clarkbif we really wanted to we could probably chown all the 10002 group stuff over to say 3000 or whatever the actual next value is18:14
clarkbI'm thinking things like 816869 and 816771 might be best on not a friday. I won't object if others want to push them in but I'd like to get a bike ride in today as there is no rain and generally not worry about fixing that up over the weekend if it has a sad :)18:16
clarkbtristanC: so is there no way to run this as a different group? docker says docker: Error response from daemon: unable to find group 11000-i: no matching entries in group file. I am able to override uid:group for other containers that don't set the group either. This makes me wonder if it is something about the /etc/group file not existing at all?18:20
clarkboh wait I think I see it, that is a bug in my script edit18:22
clarkbugh18:22
clarkbthe -i is important :(18:22
opendevreviewClark Boylan proposed opendev/system-config master: Run gerritbot with a user that will be shared with matrix-gerritbot  https://review.opendev.org/c/opendev/system-config/+/81676918:24
opendevreviewClark Boylan proposed opendev/system-config master: Run matrix-gerritbot with gerritbot user  https://review.opendev.org/c/opendev/system-config/+/81677018:24
clarkbfungi: ^ thats rebased on top of your change with the test force failure removed18:24
tristanCclarkb: i think you would need to keep the image default user, but you should be able to map it to an arbitrary host uid. Otherwise I can bake the uid you need in the image18:24
tristanCi mean if that is easier for you18:25
clarkbtristanC: isn't it just running a process though? so as long as we bind mount the files with the correct perms we are good?18:25
clarkbI think it would only be a problem if the executable isn't +x or readable for the different uid/gid or the bot tries to write to somewhere that needs different perms18:25
tristanCclarkb: i'm not entirely sure how docker handle the --user arg, i guess what needs to be checked is that the ~/.ssh/id_rsa key is readable18:29
tristanCclarkb: fwiw here are my notes about rootless podman with regards to sharing host uid with container: https://github.com/podenv/podenv/blob/main/docs/references/userns.md18:29
clarkbtristanC: yup the rest of that change chmods the contents of the bind mount to match18:30
clarkbtristanC: for mapping my concern with that is it seems docker uses a single mapping? which means that if you want to map uid 0 in one container to user foo and uid 0 in another container to user bar you can't? But maybe I'm missing something important there18:31
clarkbhrm https://review.opendev.org/c/opendev/system-config/+/816765 only ran the promote job for the image in deploy18:54
clarkbI guess I'll need to manually pull and restart the container when the second change lands18:54
clarkbI'll go ahead and do that now for the first change update18:54
clarkbthat also updated the haproxy image so it will restart as well. Should be quick. Any objection to me doing that now?18:56
fungino objection18:57
* clarkb goes for it18:57
clarkbthats done I can reach opendev.org18:57
clarkbhttps://grafana.opendev.org/d/ZQmopePMz/opendev-load-balancer?orgId=1&from=now-5m&to=now shows new data appears to be arriving18:58
clarkbI'll repeat this once the second change lands and that one likely won't want to restart haproxy, just the statsd process18:58
fungirackspace opened a ticket saying there's an outage or impending outage for the cacti trove instance, i haven't had a chance to look into it yet18:59
clarkbI'm going to eat lunch and when that is done plan to do the second haproxy-statsd restart19:05
opendevreviewMerged opendev/system-config master: Run haproxy-statsd as uid 1000  https://review.opendev.org/c/opendev/system-config/+/81676419:28
clarkbdeploy jobs updated ^ automatically and https://grafana.opendev.org/d/ZQmopePMz/opendev-load-balancer?orgId=1&from=now-5m&to=now still shows new data arriving and the uids lgtm19:39
clarkbhttps://review.opendev.org/c/opendev/system-config/+/816762/ is probably reasonably safe to land now as a result?19:40
clarkbfungi: ^ what do you think? I am probably going to be in and out today as I try to enjoy the lack of rain19:40
fungiyep, i've approved it now19:43
fungithanks!19:43
fungiwe're gearing up for a prolonged wind event here, so need to switch gears shortly to rearrange things on the deck before i'm stuck doing it in the middle of a tempest19:44
mordredfungi: you should rig up some lines and pulleys so that you can treat the house like a sailboat and set the deck for the prevailing wind direction20:06
clarkbFungi's Moving Castle20:07
fungiyeah, for now i'm just battening down the hatches20:19
fungialso an aggressive last-minute vegetable harvest, because whatever we don't take off the plants, the wind will20:21
clarkbcorvus: service-zuul is failing because of zuul01 not having LE configuration20:30
corvusclarkb: ah, thx.  i'll see if i can fix that20:34
clarkbthe apache startup fails as a result is the actual error if I'm reading logs correctly20:35
corvusthat makes sense; maybe we should just go ahead and add the LE stuff to zuul01 even tho we're not using it20:42
corvuseither that, or remove the apache stuff20:42
clarkbya I think adding LE to it would be fine. Theoretically we'll want that in the near future anyway?20:43
corvusunless we want to separate web from schedulers20:44
clarkbah20:44
corvusi haven't thought that much ahead, and it may  not be worth thinking ahead until we see resource usage of both of them20:44
corvusunder the new regime20:44
corvus(does a zuul-web take up a lot of ram because it's a sort-of-scheduler?)20:45
opendevreviewMerged opendev/system-config master: Run zookeeper-statsd as the zookeeper user  https://review.opendev.org/c/opendev/system-config/+/81676220:46
opendevreviewMerged opendev/system-config master: Update zookeeper-statsd to python3.9 on bullseye  https://review.opendev.org/c/opendev/system-config/+/81676320:46
clarkbI'm not sure if ^ will have the same issue as the haproxy statsd update where the bullseye update doesn't actually trigger deploy jobs. I'll manually pull and up -d if necessary20:46
opendevreviewJames E. Blair proposed opendev/system-config master: Add LE config for zuul01  https://review.opendev.org/c/opendev/system-config/+/81690320:48
clarkbhrm the promote and the service-zookeeper jobs seem to run concurrently as well20:51
clarkbthe promote is probably quick enough that this isn't a real problem but I guess something else to look at20:52
clarkbthe zookeeper image has updated as well so its going through that similar to updating the haproxy. The playbook does one zk at a time to avoid outages20:54
clarkboh wait no the zookeeper images were already up to date20:56
clarkbonly the stats restarted. The order of the docker ps output changed20:56
clarkblooking at grafana we are still getting zk data so I think the first update is happy.20:57
clarkbI'm likely to be doing a school run during the second change's pass due to the hourly jobs queueing up here in a minute or two20:59
clarkbbut I don't expect any problems now that the first one is in and happy20:59
clarkbcorvus: check the note I left on 81690321:01
clarkbcorvus: I think you may want to add in the borg backup excludes too21:01
clarkbmaybe those should go in a group var though21:01
clarkbinfra-root I've approved the mailman3 spec as minor feedback updates were made only since we discussed at our last meeting21:03
clarkbconfirmed that the deploy job is the only one queued for the bullseye update on that container image. I'll manually pull and up -d when i get back from the school run21:05
corvusoh i missed a git add sorry21:07
opendevreviewJames E. Blair proposed opendev/system-config master: Add LE config for zuul01  https://review.opendev.org/c/opendev/system-config/+/81690321:07
corvusthat's what i thought i pushed :)21:08
opendevreviewMerged opendev/infra-specs master: Add a specification for Mailman 3  https://review.opendev.org/c/opendev/infra-specs/+/81099021:08
*** jonher_ is now known as jonher21:28
*** yoctozepto8 is now known as yoctozepto21:28
ianwclarkb: if you still have that window open, you can remove the buildkit nodes, the gerrit 3.4 one i still have up for poking at21:56
ianw(otherwise i'll do it later)21:56
fungiianw: i'll clean them up21:58
fungiand thanks!21:58
fungihope your saturday isn't intolerable21:58
ianwso far so good :)21:59
clarkbfungi: thanks.21:59
clarkbI'm about to do the statsd update on the zks21:59
fungii'm trying to figure out why setuptools seems to have suddenly become uninstallable in tripleo and devstack jobs as of about an hour ago22:00
ianwlooks like we've got a 9-stream image ready in rax-ord, so that's good22:00
ianwfriday night, let's release!22:01
fungiianw: the 9-stream work broke centos-7 image builds, there's a simple fix up22:01
* fungi checks22:01
clarkbfungi: if you have a link to one of those failures I can take a look as well once zk stats are done22:01
fungihttps://review.opendev.org/81681322:01
fungiit's project-config, so shouldn't block dib releases22:02
fungiclarkb: there are several linked in #openstack-infra in the last few minutes22:02
ianwyeah, that looks fine.  we probably need to think about that matching for 10-stream, but one thing at a time22:02
fungii was worried it could be related to the pbr bump, but seems to have started up hours after22:02
clarkbah cool I haven't caught up on all irc channels yet22:03
clarkbfungi: ya the pbr bump was from wednesday22:03
clarkbor early yesterday. It doesn't pyproject.toml properly from what I've seen but existing users should be fine22:03
fungithe reqs update for the pbr bump merged around 1700 today22:03
fungibut still far earlier than these errors began, it seems22:03
clarkbpbr is setup_requires so requirements shouldn't really affect it much22:04
fungiright22:04
fungiianw: the nodes held for buildkit debugging are deleted now22:05
clarkbok I'm happy with zk stats. Updated the container on all three nodes and still getting data on grafana dashboards22:05
fungiawesome22:05
fungioh, that reminds me, i was going to check on that rax ticket about the cacti db22:05
fungii'll try to take a look at that shortly22:06
opendevreviewMerged openstack/project-config master: Fix haveged installation in CentOS7  https://review.opendev.org/c/openstack/project-config/+/81681322:12
fungiokay, so there were two tickets, one for the db behind wiki.openstack.org and one for the db behind cacti.openstack.org22:58
fungiboth services seem to be working fine now though23:01
corvusmy current plan is to restart zuul on master tomorrow morning and try a 2nd scheduler again.  i'll let it run as long as it runs without errors.  if there's an error, i'll triage it and may try to roll forward if it seems tractable.23:17
corvus(otherwise clear state and roll back to .4)23:17
fungii expect to be around, so happy to help or just keep an eye on it23:17
corvusi think it'll be interesting either way :)23:18
fungiyep!23:18
fungiseems like this is getting really close23:18
corvusyep i think so23:19
Clark[m]Sounds good. Not sure how much I'll be around though23:30
