Friday, 2022-08-19

*** dviroel|afk is now known as dviroel00:25
opendevreviewTony Breeds proposed openstack/project-config master: Move the requirements-constraints job to the periodic-weekly pipeline  https://review.opendev.org/c/openstack/project-config/+/85373500:51
*** dviroel is now known as dviroel|out01:08
ianwi've just put a hold on a rocky job and hopefully can poke and rebuild01:44
ianwi rebuilt it and it gets a different machine-id in /boot/loader/entries/ each time04:24
ianwwhich suggests to me it's being written out04:25
ianw><fs> cat /etc/default/grub04:29
ianwGRUB_DEVICE=LABEL=guest-rootfs04:29
ianwthe image has this correct04:29
ianw><rescue> chroot /sysroot04:39
ianwFatal glibc error: CPU does not support x86-64-v204:39
ianwexcellent, i can't inspect it04:39
ianw2022-08-19 05:02:54.974 | + cat /etc/machine-id05:03
ianw2022-08-19 05:02:54.976 | + grub2-mkconfig -o /boot/grub2/grub.cfg05:03
ianwthere is no machine-id before grub2-mkconfig05:04
ianw2022-08-19 05:08:34.477 | + cat /etc/machine-id05:09
ianw2022-08-19 05:08:34.478 | 766780bd5a4943ef8725c4ab6d7cc6db05:09
ianw2022-08-19 05:08:34.844 | + [[ -e /boot/loader/entries ]]05:10
ianw2022-08-19 05:08:34.845 | + grubby --info=ALL05:10
ianw2022-08-19 05:08:34.878 | grep: /boot/grub2/grubenv: No such file or directory05:10
ianw2022-08-19 05:08:34.895 | index=005:10
ianw2022-08-19 05:08:34.895 | kernel="/boot/vmlinuz-5.14.0-70.22.1.el9_0.x86_64"05:10
ianw2022-08-19 05:08:34.895 | args="ro console=tty0 console=ttyS0,115200 no_timer_check nofb nomodeset gfxpayload=text"05:10
ianw2022-08-19 05:08:34.895 | root="LABEL=guest-rootfs"05:10
ianwi.e. if the machine-id is there, it rewrites the bl entry05:10
*** soniya29 is now known as soniya29|ruck05:11
opendevreviewIan Wienand proposed openstack/diskimage-builder master: [dnm] rocky : create machine-id in 9  https://review.opendev.org/c/openstack/diskimage-builder/+/85357505:14
ianwthat seemed to work.  i can rework that to a mergable change06:35
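[Editor's sketch, not part of the log] ianw's finding above is that the entries under /boot/loader/entries/ carry a machine-id that changes from build to build. Assuming the usual Boot Loader Specification naming of `<machine-id>-<kernel-version>.conf`, a check for entries whose prefix does not match /etc/machine-id might look roughly like this. The function name and its arguments are hypothetical and are not taken from diskimage-builder.

```python
from pathlib import Path


def mismatched_bls_entries(entries_dir, machine_id_file):
    """Return BLS entry file names whose machine-id prefix differs.

    Assumes entries are named <machine-id>-<kernel-version>.conf.
    """
    id_path = Path(machine_id_file)
    # A missing or empty machine-id file means no entry can match it.
    machine_id = id_path.read_text().strip() if id_path.exists() else ""
    mismatched = []
    for entry in sorted(Path(entries_dir).glob("*.conf")):
        prefix = entry.name.split("-", 1)[0]
        if not machine_id or prefix != machine_id:
            mismatched.append(entry.name)
    return mismatched
```

Run against a mounted image, e.g. `mismatched_bls_entries("/boot/loader/entries", "/etc/machine-id")`, an empty list would suggest that the entries and the machine-id agree.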
*** soniya29|ruck is now known as soniya29|ruck|afk07:03
opendevreviewMerged openstack/project-config master: Move the requirements-constraints job to the periodic-weekly pipeline  https://review.opendev.org/c/openstack/project-config/+/85373507:34
*** soniya29|ruck|afk is now known as soniya29|ruck07:38
*** soniya29|ruck is now known as soniya29|ruck|lunch08:00
*** jpena|off is now known as jpena08:33
*** soniya29|ruck|lunch is now known as soniya29|ruck08:47
opendevreviewIan Wienand proposed openstack/diskimage-builder master: rocky : create machine-id in 9  https://review.opendev.org/c/openstack/diskimage-builder/+/85357508:57
*** rlandy_ is now known as rlandy10:18
*** dviroel|out is now known as dviroel11:26
*** dviroel is now known as dviroel|rover11:27
*** frenzy_friday is now known as frenzyfriday|afk11:35
*** tosky is now known as Guest53212:51
*** tosky_ is now known as tosky12:51
NeilHanlonianw: thank you for investigating and finding that for me. I'm going to do some investigation on the back here and see why this is all happening.12:58
*** frenzyfriday|afk is now known as frenzyfriday13:09
*** tbachman_ is now known as tbachman13:10
opendevreviewRafal Lewandowski proposed openstack/diskimage-builder master: - added elrepo element  https://review.opendev.org/c/openstack/diskimage-builder/+/85381713:25
*** jpena is now known as jpena|off13:45
*** dasm|off is now known as dasm13:59
*** rcastillo|rover is now known as rcastillo14:00
*** dviroel|rover is now known as dviroel|rover|lunch15:02
clarkbfungi: you reviewed 853575 but the rest of https://review.opendev.org/q/topic:testing-rootfs needs to land before that one can15:33
clarkbno rush though just wanted to call that out15:34
*** marios is now known as marios|out15:36
clarkbjohnsom: as a quick status check have you noticed any recent ssh issues?15:42
clarkbI expect that the fix has largely rolled out by this point (but it can take some time to incorporate updates like that into our disk images)15:43
johnsomWell, there was an IP swap issue: https://zuul.openstack.org/build/ddc9ebdff70748cdb6cf09fc87e05aaf/log/job-output.txt#198515:45
johnsomI wonder if that was related too15:45
clarkbthats a different issue aiui. TL;DR is openstack will sometimes reuse an IP while a host thinks it still has the IP15:46
clarkbthen you get a fight over the IP via ARP15:46
johnsomThis one still had it: https://zuul.openstack.org/build/e4bf1989542c4ca09d73b6d66181124c/log/job-output.txt15:46
johnsomhttps://zuul.openstack.org/build/e4bf1989542c4ca09d73b6d66181124c/log/job-output.txt#2547215:46
johnsomIt doesn't seem to be the 50+ occurrences in a day like it was.15:47
clarkbjohnsom: the rax-ord focal image (which that example ran on) updated about 16 hours ago. That job started about 17 hours ago. I think that was before the fix rolled out in that region15:48
clarkbok that is good news. I'll try to keep an eye out for new occurrences to see if we should tweak the ssh settings further15:49
johnsomAh, ok.15:49
johnsomYeah, I will let you know if I see a bunch again.15:49
*** dviroel|rover|lunch is now known as dviroel|rover16:09
*** rlandy is now known as rlandy|lunch16:17
*** rlandy|lunch is now known as rlandy16:50
*** rlandy is now known as rlandy|mtg17:24
clarkbfungi: thanks! I'll reapprove the child changes that zuul -1'd due to not sharing queues17:36
opendevreviewMerged openstack/diskimage-builder master: Allow setting ROOT_LABEL from environment  https://review.opendev.org/c/openstack/diskimage-builder/+/85357317:53
*** rlandy|mtg is now known as rlandy18:00
*** rlandy is now known as rlandy|biab18:05
TheJuliaSo what controls the contents of mirrors? I ask because https://mirror.mtl01.iweb.opendev.org/centos-stream/9-stream/BaseOS/x86_64/os/ has a .treeinfo file, but lacks the entire images folder which is present at http://mirror.stream.centos.org/9-stream/BaseOS/x86_64/os/18:16
clarkbTheJulia: a cronjob runs https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror-update/files/centos-stream-mirror-update every few (4?) hours18:40
clarkbwe explicitly exclude images because they are huge18:40
TheJuliabut they are static and not updated like packages18:41
TheJuliaat least... they are not from what I've been told18:41
* TheJulia looks18:42
clarkbI think that isn't true for a lot of the images. I don't know about centos 918:42
clarkbfedora for example added images all the time18:42
clarkbbut also raw disk is a concern18:42
clarkbwe also don't mirror source packages for similar reasons18:42
TheJulialast updated august 8th18:42
TheJuliaso I *believe* it is monthly18:42
clarkbwe're trying to meet the 80% need which is distro packages and things like images and source packages can be fetched from upstream when needed18:42
TheJuliathe conundrum ultimately is that I can't trust .treeinfo is valid because I can't grab the image assets it refers to because the folder is gone18:43
TheJuliaso then I have to split requests across public upstream mirror and local mirror18:43
TheJuliawhich is... not great :(18:44
clarkbI'm not an expert on rpm repos. I guess .treeinfo reflects index info that the normal http index doesn't? We could probably update that file. But no one has done it for any of the rpm mirroring and we've done this for years18:44
TheJuliawell, .treeinfo provides extra insight and some checksums for the images18:45
clarkbdoes dnf/yum/rpm even pull images natively? I always assumed those were separate requests anyway18:45
TheJuliathis is more so I can install a host directly from the repository18:45
TheJulias/repository/mirror18:45
TheJuliaso dnf/yum/rpm is after the host boots with the assets in /images/18:46
TheJuliaclarkb: https://mirror.mtl01.iweb.opendev.org/centos-stream/9-stream/BaseOS/x86_64/os/.treeinfo18:47
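[Editor's sketch, not part of the log] TheJulia's concern is that .treeinfo lists checksums for image assets the mirror does not serve. Since .treeinfo is an INI-style file, Python's `configparser` can read it. The sample below is made up: the `[checksums]` and `[stage2]` section names follow the common layout, but the digests are placeholders, and both helper functions are hypothetical.

```python
import configparser

# Made-up sample in the INI style that .treeinfo files use; values are placeholders.
SAMPLE_TREEINFO = """\
[checksums]
images/install.img = sha256:0000aaaa
images/pxeboot/vmlinuz = sha256:1111bbbb

[stage2]
mainimage = images/install.img
"""


def image_checksums(treeinfo_text):
    """Map each path in [checksums] to an (algorithm, digest) pair."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep path case as written
    parser.read_string(treeinfo_text)
    if not parser.has_section("checksums"):
        return {}
    result = {}
    for path, value in parser.items("checksums"):
        algorithm, _, digest = value.partition(":")
        result[path] = (algorithm, digest)
    return result


def missing_assets(treeinfo_text, available_paths):
    """List checksummed paths that the mirror does not actually serve."""
    return sorted(set(image_checksums(treeinfo_text)) - set(available_paths))
```

A non-empty result from `missing_assets` would correspond to the situation described here: the file refers to assets that are not in the mirror.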
clarkbLooking at that we should also exclude the .treeinfo if we exclude images/ ? seems like that is images specific and doesn't apply to the packages/ dir18:48
clarkboh the very end does refer to packages18:48
TheJuliaif you guys nuke .treeinfo, I'm going to have to toss quite a bit of work18:49
TheJuliaor18:49
TheJuliajust use public mirrors18:49
clarkbwell I'm just trying to understand I don't know how any of this works other than "mirroring all the extra large content that most of our jobs don't use isn't currently viable"18:49
TheJuliaWell, I'm trying to make a job that needs images/pxeboot/* and images/install.img18:50
clarkbright I think my assumption was that anyone needing images/ or ppc packages for a cross compile or src packages for gdb would fetch them from an upstream mirror18:50
clarkband our mirrors would be used for the vast majority of package installs that jobs perform18:50
TheJuliaokay18:51
TheJuliaI'll just... bifurcate the mirror settings for my job and hope that it... worsk18:51
TheJuliaworks18:51
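[Editor's sketch, not part of the log] Splitting the mirror settings, as TheJulia describes, could be as simple as choosing the base URL by path prefix. The two URLs are the ones quoted earlier in the conversation. The prefix list and the function name are assumptions for illustration only.

```python
LOCAL_MIRROR = "https://mirror.mtl01.iweb.opendev.org/centos-stream/9-stream/BaseOS/x86_64/os/"
UPSTREAM_MIRROR = "http://mirror.stream.centos.org/9-stream/BaseOS/x86_64/os/"

# Path prefixes the local mirror does not carry, per the discussion (assumed list).
UPSTREAM_ONLY_PREFIXES = ("images/",)


def asset_url(relative_path):
    """Return the URL to fetch relative_path from, preferring the local mirror."""
    path = relative_path.lstrip("/")
    if path.startswith(UPSTREAM_ONLY_PREFIXES):
        base = UPSTREAM_MIRROR
    else:
        base = LOCAL_MIRROR
    return base + path
```

Adding `.treeinfo` to the upstream-only prefixes would keep the checksums and the images coming from the same source, which may be safer if the two mirrors are not in sync.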
clarkbhttps://grafana.opendev.org/d/9871b26303/afs?orgId=1 shows disk usage18:52
clarkbThere are two main concerns (but one is currently a subset of the other). First is that an AFS volume can't have more than 2TB in it. We currently organize each repo into its own volume so that gives us a hard upper limit. The second is we have 4TB total disk space and are consuming 82% ish of that (and that's above our red warn threshold)18:55
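[Editor's sketch, not part of the log] The two limits clarkb quotes can be turned into a quick feasibility check. The 2 TB per-volume ceiling, the 4 TB total, and the 82% usage come from the message above. The function name and the example inputs are hypothetical.

```python
VOLUME_LIMIT_TB = 2.0   # per-AFS-volume ceiling quoted above
TOTAL_DISK_TB = 4.0     # total disk quoted above
USED_FRACTION = 0.82    # "82% ish"


def fits(extra_tb, volume_used_tb):
    """Check a proposed addition against both limits from the discussion."""
    free_total_tb = TOTAL_DISK_TB * (1 - USED_FRACTION)  # roughly 0.72 TB
    within_volume = volume_used_tb + extra_tb <= VOLUME_LIMIT_TB
    within_disk = extra_tb <= free_total_tb
    return within_volume and within_disk
```

With these figures only about 0.72 TB remains overall, so the total disk space runs out well before the 2 TB per-volume limit for any volume that is currently small.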
TheJuliaokay18:55
clarkband as a side note, we're currently not mirroring new distros we add (like rocky linux) to see how well that performs18:56
TheJuliaThat seems like... it is going to be a major constraint as time moves forward18:57
clarkbI suspect if we suddenly converted all the centos jobs to rocky that we'd mirror it though. It's a question of popularity which has an impact on reliability and load when talking to upstream mirrors18:57
clarkbTheJulia: it is a big reason why I keep arguing for everyone to stop using centos as a stable distro :)18:57
clarkbI mean other than the fact that everyone is surprised at how often it breaks. It would allow us to pivot the current tooling to a stable alternative that would be easier to manage mirrors for example18:58
TheJuliaAnd take a higher development maintenance cost on the backend when something has been broken for months if not longer18:58
clarkbwe definitely can't mirror the world and we've always said that. This is why a lot of stuff we added caching proxies for18:58
TheJuliabottom line is, there is a huge tradeoff there18:58
clarkbInstead we try to focus on the things with the most impact and currently ubuntu and centos are that so they get mirrored18:59
TheJuliaI did see the caching proxy stuff poking around, that seems... interesting :)18:59
*** rlandy|biab is now known as rlandy19:00
clarkbre development maintenance I've also said I think there is value in testing master against centos in a limited fashion to find the broken. But that doesn't need half a terabyte of mirror disk to accomplish19:00
TheJuliatruth be told, I'm *really* surprised that we're still mirroring centos719:01
clarkbA lot of jobs still use it iirc19:01
TheJuliathat is... worrying19:01
clarkbfor stable branches19:02
clarkbslowly those are getting pruned (some pruning happened recently too) and we may need to look and see if that is still needed19:02
clarkbone upside to testing broadly with stream is that it is harder to ignore when it does break. We have definitely seen a preference for punting or working around external breakages like that. Finding that balance is not easy19:05
clarkbwe ripped out tumbleweed because it was always broken and no one was caring for it.19:06
TheJuliaYeah, I remember that. :(19:09
TheJuliaThey also had some weird intentional breaks which made no sense19:10
NeilHanlonworking with stream has been, in a word, frustrating19:51
clarkbI think it is getting better. The rtt on getting bug fixes in has gone down. But it is still longer than you'd like for a CI system19:52
NeilHanloni agree. there are some rough edges but it's not unsurprising. it's one of those things that needs work and love to get better, which took me a while to realize. sorta the old 'complaining for the sake of complaining doesn't help anyone' adage19:53
NeilHanlonthere's just a duality of testing ahead and behind RHEL. it's a huge effort to constantly maintain what boils down to rolling updates, versus spending a bit of time every once in a while to resolve whatever changes might cause breakages in a minor release.. I imagine most will choose testing post-RHEL rather than pre, but my crystal ball doesn't19:55
NeilHanlonshow everything :) 19:55
TheJuliaAt least in my interactions, we see most issues that impact us hit centos far enough ahead of rhel that we can at least make progress on getting things fixed before it detonates downstream.... or that we can fix/route around well in advance of it breaking us20:18
clarkbA lot of the problems I've seen people run into really should've been an immediate revert in the stream distro. But I guess that isn't really possible with stream 8 due to the way package updates flow. and it is still difficult with 9?20:22
clarkbping was broken for over a month on stream 8 due to a package update that should've been trivially reverted immediately when it was noticed. Instead there was a fix made that didn't fix it then another fix and I think it was the third fix that actually fixed it20:22
TheJuliaugh :(20:28
TheJuliaAt least on my end, 9 so far has been okay, 8 is where things were funky.20:29
*** rlandy is now known as rlandy|biab20:51
*** rlandy|biab is now known as rlandy21:12
*** dviroel|rover is now known as dviroel|rover|biab21:14
*** dasm is now known as dasm|off21:33
opendevreviewMerged openstack/diskimage-builder master: rocky : create machine-id in 9  https://review.opendev.org/c/openstack/diskimage-builder/+/85357522:21
*** dviroel|rover|biab is now known as dviroel|rover22:28
*** dviroel|rover is now known as dviroel|out22:50
