Friday, 2023-09-01

opendevreviewMerged opendev/system-config master: Explicitly disable offline reindexing during project renames  https://review.opendev.org/c/opendev/system-config/+/880692 00:14
fricklerclarkb: fungi: I like the idea of bumping rebuild-age, start doing it for non-latest images? what value would you suggest? 2d? 7d? something in between?10:01
fungii'd be fine with 7 for older images (centos 7/8, ubuntu bionic/focal, debian buster/bullseye...)11:36
Clark[m]++13:15
fungiand maybe even scaling back to 48 or 72 hours for current versions of our non-default image as those are also not used quite as much13:16
slittleplease set me up as first core for starlingx-app-intel-device-plugins16:09
fungislittle: starlingx-app-intel-device-plugins doesn't seem to exist. let me see if that's a typo or if something went sideways configuring the project16:16
fungistarlingx-app-intel-device-plugins-core i guess you meant16:16
fungislittle: done!16:17
clarkbmy laptop has started producing display artifacts on the built in and external displays :/ sorry been poking at that a bit today16:17
clarkbBut I've given up for now and am swapped over to the desktop. I'll probably need to rma new shell16:17
fungimy isp has decided ipv6 should be a giant black hole, which has been frustrating me this morning16:17
clarkbit seems to do a fairly consistent transformation that shifts and tilts things over. I should probably try to do memcheck before starting any rma stuff. But my hunch is gpu problems16:18
fungiemergency kernel or firmware update overnight?16:19
clarkbno I rebooted after the last updates and things were working. Though I suppose it could be some latent bug too16:20
fungibut yeah, that does sound like something is degraded in video memory16:20
slittlethanks16:20
fungior maybe triggered by suspend/resume or dpms blanking and waking back up16:20
fungii've definitely seen my share of weird display driver bugs which only appeared after the display went to sleep and then turned back on16:21
clarkbya its fully powered off now to cool off. I'll see if a cold boot produces different behavior later. Where later might even be "problem for tuesday"16:21
fungioh, yep heat can do it too in some cases16:22
clarkbnone of the reported temps were above 55C at least16:22
fungiclarkb: on 893073 what do you think about putting gitea10-14 in emergency disable, taking gitea09 temporarily out of the haproxy pool, then merging the change and checking up on replication once it deploys?16:31
clarkbfungi: I think that could work16:31
fungior we could just approve it, cross our fingers and try to be on top of whatever unexpected problems might arise. it is a friday16:31
clarkbthe main thing would be rerunning the job or playbook to ensure the others get updated if all is well16:31
fungiright, which is why i'm not opposed to just merging it. the risks seem minimal and there's not a ton of activity right now anyway (it's already the weekend for half the globe, and in the usa lots of people are checking out early for the long holiday weekend)16:36
clarkbya and if we do send it in and something goes wrong with replication we're probably most likely to replace the gerrit key?16:36
fungiyes, i think so16:37
clarkbActually we can probably safely revert gitea too since the gitea version is fixed16:37
clarkbit's the underlying OS that is changing so a revert should be fine16:37
fungitrue16:37
clarkbI'm coming around to just going for it given ^16:37
fungifairly straightforward to just fip back to the prior image16:37
fungier, flip back16:37
clarkbfungi: well the prior image will be gone I think16:37
clarkbwe'd need to do an actual revert and rebuild the old state16:37
clarkbbecause we use :latest16:37
clarkbI'm coming around to just sending it16:38
fungiand we prune the old images on the servers? or you mean avoiding locally changing the compose files16:38
clarkbfungi: we prune the images `docker image prune -f --filter "until=72h"` <- I think that 72h is from when the image is built not when it was deployed16:39
fungibut yeah, i don't anticipate any serious disruption, and if there is some impact to replication then slightly stale repos while we work through that are likely to go entirely unnoticed by !us16:39
clarkbso one thing we could do is a noop rebuild of gitea on bullseye then immediately do the bookworm update so the bullseye images are less than 72 hours old16:39
clarkbI think if we want to be careful that seems like the easiest safe option16:39
fungiwfm16:40
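The pruning concern above can be sketched in a few lines. This is a minimal model, assuming (as clarkb suspects) that the `until` filter compares against the image's build time rather than its deploy time, and that `docker image prune` without `-a` only considers dangling images, i.e. ones that have lost the `:latest` tag. The `Image` class, the image names, and the timestamps are hypothetical and only illustrate the reasoning.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Image:
    image_id: str      # hypothetical name, for illustration only
    created: datetime  # build time recorded in the image metadata
    dangling: bool     # True once :latest has moved on to a newer image


def prunable(images, now, window=timedelta(hours=72)):
    """Return the IDs a prune with until=72h would be expected to remove.

    Assumption: only dangling images are candidates, and age is measured
    from the build time, not from when the image was pulled or deployed.
    """
    cutoff = now - window
    return [i.image_id for i in images if i.dangling and i.created < cutoff]


now = datetime(2023, 9, 1, 21, 30, tzinfo=timezone.utc)
images = [
    # Built weeks ago: would be pruned as soon as it loses the tag.
    Image("old-bullseye", now - timedelta(days=30), dangling=True),
    # Noop rebuild from a couple of hours ago: survives for rollback.
    Image("rebuilt-bullseye", now - timedelta(hours=2), dangling=True),
    # Currently tagged :latest, so never a prune candidate.
    Image("bookworm", now - timedelta(minutes=30), dangling=False),
]
print(prunable(images, now))
```

Under these assumptions only the old bullseye build is removed, which is why a fresh noop rebuild leaves a local image to fall back on.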
clarkbok give me a few and I can stack a new change under the bookworm change that does a noop rebuild16:40
clarkbone thing we need to be careful of is doing them in sequence such that the deploy for the first image doesn't run after the promote for the second image16:42
clarkbbecause in that case we'll just pull the latest bookworm image and not step through16:42
clarkbbasically that means approve the first one. Let it deploy then approve the second one16:43
fungiyep16:44
fungimakes sense16:44
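The ordering hazard described above can be shown with a toy replay of events. This is only a sketch under the assumption that a "promote" moves the `:latest` tag and a "deploy" pulls whatever `:latest` points at when it runs; the event names and image labels are hypothetical and not taken from the actual jobs.

```python
def deployed_images(events, initial="old-bullseye"):
    """Replay promote/deploy events and return what each deploy pulled."""
    latest = initial
    pulled = []
    for kind, image in events:
        if kind == "promote":
            latest = image          # the tag now points at the new build
        elif kind == "deploy":
            pulled.append(latest)   # deploy pulls the tag as it is right now
    return pulled


# Approve the first change, let it deploy, then approve the second.
safe = [
    ("promote", "rebuilt-bullseye"), ("deploy", None),
    ("promote", "bookworm"), ("deploy", None),
]
# Both promotes land before the first deploy runs.
racy = [
    ("promote", "rebuilt-bullseye"), ("promote", "bookworm"),
    ("deploy", None), ("deploy", None),
]

print(deployed_images(safe))
print(deployed_images(racy))
```

In the racy ordering the servers never pull the rebuilt bullseye image, so there is nothing recent to roll back to locally.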
opendevreviewClark Boylan proposed opendev/system-config master: Update Gitea images to bookworm  https://review.opendev.org/c/opendev/system-config/+/893073 16:45
opendevreviewClark Boylan proposed opendev/system-config master: Rebuild gitea on bullseye  https://review.opendev.org/c/opendev/system-config/+/893539 16:45
clarkbI think that will do it16:45
TheJuliaWho wants my pubkey for autohold request  0000000249 :)17:44
fungiTheJulia: i can take it17:45
fungiTheJulia: ssh root@104.239.174.94 17:48
TheJuliaperfect, thanks!17:48
fungifun fact, my isp's ipv6 routing actually seems fine. it's just that i can't reach review.opendev.org over ipv6. i can reach our other servers in that same network over ipv6 just fine though17:49
fungianyone else having trouble getting to 2604:e100:1:0:f816:3eff:fe52:22de by any chance?17:49
TheJuliaworks for me17:54
fungioh, or maybe i can't reach other stuff there. that's vexxhost ca-ymq-1 17:54
fungiyeah, looks like my traceroute from home is dying somewhere after equinix ashburn/iad17:55
fungiprobably asymmetric return path dying elsewhere17:56
fungiyeah, looks like something's up with the gateway in vexxhost not being able to reach my isp17:58
clarkbfungi: does the mirror in ca-ymq-1 have similar problems for you?19:23
fungiyeah, it's that whole network from what i can tell19:25
fungilooks like the routers at least don't want to forward packets back to my 2600:6c60:5300:: isp19:26
fungior 2600:6c60:7003:: 19:27
fungiso probably anything in 2600:6c60:: 19:27
opendevreviewMerged opendev/system-config master: Rebuild gitea on bullseye  https://review.opendev.org/c/opendev/system-config/+/893539 19:35
clarkbfungi: all of the giteas have restarted on the new images (b7cda0718262 and 474f230c47d5)19:50
clarkbI think we can approve the bookworm update now19:50
clarkband since those two images are only an hour or so old they shouldn't get pruned when bookworm deploys19:51
fungidone19:52
fungii concur19:52
clarkbexcellent19:52
opendevreviewMerged opendev/system-config master: Update Gitea images to bookworm  https://review.opendev.org/c/opendev/system-config/+/893073 21:29
clarkbstill waiting for gitea09 to upgrade21:31
clarkbI expect that will happen at any moment though21:31
clarkbgitea09 is running the bookworm image now21:32
clarkbI think the gerrit replication log shows replication completing to gitea09 after gitea09 was updated21:35
clarkbbut its close so would be good to get more data21:35
clarkbok just now tempest had a replication event trigger (I think due to a review being posted not a change being pushed). I could see all but 13 succeed. 13 refused the connection, I believe its sshd was down as part of the upgrade. Then after the retry backoff it tried again and succeeded to gitea13.21:39
clarkbthat is looking good, but harder to confirm it worked since it wasn't a regular change ref I can fetch but a review instead21:39
opendevreviewClark Boylan proposed opendev/system-config master: DNM Forced fail on Gerrit to check bookworm/java 17 update  https://review.opendev.org/c/opendev/system-config/+/893571 21:45
clarkbI'll put a hold in place for ^ and then we can see if that ref is fetchable from the giteas21:45
clarkbthe deployment job did end up succeeding21:46
clarkbI was able to fetch ^ using `git fetch origin refs/changes/71/893571/1` where origin is https://opendev.org/opendev/system-config (fetch) so I think this is all happy21:48
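For reference, the ref in that fetch follows Gerrit's change ref layout: `refs/changes/`, then the last two digits of the change number, then the change number, then the patchset. A small sketch of building it follows; the helper name `change_ref` is hypothetical.

```python
def change_ref(change: int, patchset: int) -> str:
    """Build the Gerrit ref for a given change number and patchset."""
    shard = f"{change % 100:02d}"  # last two digits, zero padded
    return f"refs/changes/{shard}/{change}/{patchset}"


ref = change_ref(893571, 1)
print(ref)
print(f"git fetch origin {ref}")
```

Fetching that ref from a Gitea backend, rather than from Gerrit itself, is what confirms that replication delivered the new change.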
fungiyep21:56
fungisorry, stepped away at the wrong moment to eat21:57
fungibut it's all testing clean for me21:57
fungireplication looks good21:58
fungideploy job completed successfully so all the servers are updated21:59
clarkbyup. This is what we expected but given the trouble MINA had previously it was worth being cautious22:11
clarkbit was also nice to see that the logs show replication working across a gitea restart (when done properly at least)22:14
clarkbbuilds more confidence in the system22:14
fungiagreed22:15
clarkbheads up there is a jitsi release that just got made22:16
clarkbwe'll auto upgrade during the daily runs. They've all been happy so far though since you modernized the config22:16
fungiyeah, we should probably still test next week just to be sure22:29
