Monday, 2023-10-09

SvenKieskemhm, so I get this weird build error, do we have infra restrictions in place on which external servers I can reach?08:53
SvenKieskeFailed to fetch https://td-agent-package-browser.herokuapp.com/lts/5/debian/bookworm/dists/bookworm/InRelease  Clearsigned file isn't valid, got 'NOSPLIT' (does the network require authentication?)08:53
SvenKieskebuild log: https://zuul.opendev.org/t/openstack/build/3fefdffe6d7641a581f862350945e796/log/kolla/build/000_FAILED_fluentd.log08:53
SvenKieskelooks like apt can't reach this network from the opendev infra to me. works reasonably well locally.08:54
SvenKieskeanother question: zigo: would it be possible to enable https for http://osbpo.debian.net ? I noticed it seems to only support http and we are installing gpg keys from it, without further verification which is kind of a problem08:57
SvenKieskezigo, see this review for further context :) https://review.opendev.org/c/openstack/ansible-collection-kolla/+/852240/comment/db0db1e9_9608917b/08:59
SvenKieskeI know it might not be possible to trivially enable https support, I just hope it is, thanks in advance :)09:00
zigoSvenKieske: Well, the question is rather: why do you wget the GPG key, instead of using extrepo which does the job with authentication ?09:00
SvenKieskezigo: good question! "I" don't do it. I was not familiar with this server to begin with. Could you kindly point me maybe to some code example where this is already done? I at least don't recall anything called "extrepo" currently, but that might be just my faulty brain.09:02
zigoapt-get install extrepo09:02
zigoextrepo enable openstack_bobcat09:02
zigoapt-get update09:02
zigoThat's all there is to it ... :)09:02
SvenKieskeah I see, it's on the debian wiki09:02
SvenKieskeinteresting09:02
SvenKieskethank you, will look into it09:03
zigoFYI, I was the one contributing the extrepo-offline-data package and the --offlinedata option ... :)09:03
zigoThough probably, at this point, we'd need an upload to bookworm-backports ...09:04
SvenKieskeI already noticed you are behind a lot of stuff when it comes to debian and openstack, so thanks for that :)09:04
zigoYeah ... :)09:04
zigoWell, packaging OpenStack in Debian since 2011 ... :)09:04
SvenKieskeI'll just need to dig into what extrepo really does and if it's suitable for our use case on the kolla-ansible side09:04
zigoAbout extrepo: you can consider it to be the Debian counter-part of Ubuntu PPAs ...09:04
SvenKieskea link to the source of extrepo on the debian wiki would be convenient, I don't currently have an account over there to edit it myself09:05
SvenKieskeah okay09:05
zigoThat's where you may contribute to extrepo:09:05
zigohttps://salsa.debian.org/extrepo-team/extrepo-data/-/tree/master/repos/debian09:05
zigoA lot of stuff is already in there.09:05
zigoI always push the unofficial debian.net stable backports of OpenStack in there, from day one of the release.09:06
zigoIMO, that's *way* safer than just using https.09:06
SvenKieskehttps would still be a nice addition, but if the gpg keys are retrieved in a secure way that's also sufficient. but it seems there are people who don't know how to install a debian third party repo securely (me included) because the tools were not known, so thanks!09:13
zigoextrepo has only existed for 3 or 4 years ...09:21
zigo:)09:21
SvenKieskeI'm still left with my problem to reach the td-agent aka fluent-package repo over at herokuapp.com, it seems there might be some redirect shenanigans going on..11:53
*** elodilles is now known as elodilles_afk12:17
fricklerto me that looks like a broken repo setup, https://td-agent-package-browser.herokuapp.com/lts/5/debian/bookworm/dists/bookworm/InRelease has "Unable to load bucket: packages.treasuredata.com" as content. not much we can do about that12:17
fungiSvenKieske: catching up, seems like a lot of this ground has been trodden in our discussion in #openstack-kolla already (apologies, i read and responded there first because there was a nick highlight)12:29
SvenKieskefungi: no problem12:31
SvenKieskeI would still be glad if someone could check out https://zuul.opendev.org/t/openstack/build/3fefdffe6d7641a581f862350945e796/log/kolla/build/000_FAILED_fluentd.log because I have never seen this apt-get error and all the internet tells me is that it's a proxy issue, but I somehow doubt that.12:32
fungiso for starters, no "we" don't restrict what networks our test nodes are allowed to reach. we purposefully install empty/allow-all security groups in the providers who donate resources for job nodes and then restrict traffic locally on each node with an iptables/nftables ruleset which allows all egress connections (statefully), but jobs can and do adjust the firewall rules on the nodes so may12:32
fungiend up restricting their own ability to reach things. also the internet is an unstable place, and sometimes "you can't get there from here" as folks in maine like to say12:32
SvenKieskeyeah sure, I wasn't certain about the sec groups, so thanks for confirming this is not a problem. it's still weird. I still need to check the dns, might be related (because the original mirror url does not point to heroku)12:33
fungialso yes we could probably mirror the osbpo repository similarly to how we mirror uca (ubuntu cloud archive), the amount of data is in the same order of magnitude and we've got some free space on our fileservers at the moment: https://grafana.opendev.org/d/9871b26303/afs (i'll put this topic on the agenda for tomorrow's sysadmins meeting)12:33
fungialso, both wget and extrepo are more fragile than just serializing the key and embedding it in the job. why have every single job waste time and resources re-verifying a key if you can verify it once yourself and stick it in an ansible role? then it doesn't need to be retrieved at all, eliminating one more network interaction that could randomly break and cause a job to fail or get rerun12:36
fungi(wasting even more resources). for opendev's own jobs we embed public keys as a matter of course12:36
SvenKieskeyeah, that would also be my preferred solution, and I _think_ kevko will go this route, I did just review the code.12:36
SvenKieskeI'll make sure to leave a comment regarding this on the changeset.12:37
SvenKieskeso I'm not sure if these requests to heroku are directly related to the package installation of fluentd-package. the mirror fqdn is packages.treasuredata.com. which resolves to many ips on cloudfront, which forward the requests who-knows-where..12:42
fungialso, if that's a cdn, it likely resolves to different ip addresses depending on who's asking and where they are on the internet12:44
SvenKieskeyeah, the joy of anycast :)12:44
fungiwell, also just using techniques like dns312:45
funginameservers returning different results based on things like geo-ip lookup for the client address12:45
SvenKieskeyeah, that's also possible12:45
SvenKieskesoo..should I just recheck the job? I hate to do that.12:46
SvenKieskethat build _did_ work, so maybe it was really just a spurious error: https://zuul.opendev.org/t/openstack/build/f18392d9f660473ea3b25d0be070b054/log/kolla/build/fluentd.log12:47
fungiwhen it looks like a network failure, i try to see if the site that the job failed to reach has anything like a service status page where they might list outages/incidents that could explain it and maybe even indicate whether it's still happening. beyond that, i recheck and include a message explaining that the previous build log indicated a network connection issue12:47
SvenKieskeseems reasonable: is there some list or something somewhere where I can correlate build machines with locations? I guess you just know them? :D12:48
fungibasically, my goal is to not waste resources rechecking something that's likely to hit the same exact network issue again, but our reality is that the internet is not reliable12:48
SvenKieskeyeah sure :) I was in an ops-centric role for a data center/webhoster for almost a decade, tell me more about internet reliability ;)12:49
fungiSvenKieske: if you mean where the job nodes are located in the world, the zuul inventory.yaml archived with the build results includes the donor provider's region name, and those are almost always based on icao or iata airport codes12:49
SvenKieskeah that's a good pointer!12:50
SvenKieskethank you12:50
fungithere's a telecommunications semi-standard about location-based device naming that some providers follow too, but i've been out of that industry for long enough i don't remember off the top of my head what it's called (also not everyone follows it, more popular for backbone providers than for isps)12:51
SvenKieskerax-dfw might be something rackspace like?12:52
SvenKieskeI just read about that some months ago, I guess also in the context of the openstack region names.. but I also don't remember the standard's name12:53
fungiyes, rackspace is the provider there, and dfw is the dallas/fort worth texas airport code so it's somewhere in that area12:53
fungiit's not really anything openstack-oriented. all sorts of utility and service providers name their facilities based on similar patterns12:54
fungiaha, clli was the semi-standard i was trying to recall (but again, not everyone uses that, it's just an example): https://en.wikipedia.org/wiki/CLLI_code12:55
SvenKieskeyeah, I never saw that in europe; reminds me of airport names in aviation language..12:56
fungireusing nearby airport codes is more common among our donors, clli has its own set of location abbreviations12:57
SvenKieskesilly question, but are all our buildlogs in UTC? I think they are?12:58
fungiso for example, the ovh regions we boot job nodes in are bhs1 (beauharnois, canada) and gra1 (gravelines, france)12:59
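The region naming convention described above (a provider prefix plus a location code that is usually borrowed from a nearby airport or place name) can be illustrated with a small lookup. This is only a sketch: the table holds just the three codes mentioned in the discussion, and the function and table names are hypothetical rather than anything taken from the zuul inventory format.

```python
import re

# Illustrative table only: these are the three location codes mentioned in
# the discussion. The name KNOWN_LOCATIONS is an assumption for this sketch.
KNOWN_LOCATIONS = {
    "dfw": "Dallas/Fort Worth, Texas",
    "bhs": "Beauharnois, Canada",
    "gra": "Gravelines, France",
}


def describe_region(region):
    """Return a location hint for a region name, or None if it is unknown."""
    # Use the last dash-separated token so "rax-dfw" and "dfw" both work.
    token = region.rsplit("-", 1)[-1]
    # Drop a trailing site number such as the "1" in "bhs1".
    code = re.sub(r"\d+$", "", token).lower()
    return KNOWN_LOCATIONS.get(code)
```

Calling `describe_region("rax-dfw")` looks up `dfw`, while a code that is not in the table simply returns `None` instead of guessing.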
fungiSvenKieske: yes, build logs are in utc12:59
SvenKieskerackspace seems to be fine, the status page is a maze though..thanks again13:00
fungiSvenKieske: well, i was talking more about status pages for the services the job was trying (and failing) to reach13:01
fungiyou mentioned packages.treasuredata.com and cloudfront, so i would generally start by trying to see if they have status pages13:01
fungifor example, when i see issues getting packages from pypi (even through our proxies), i take a quick look at https://status.python.org/13:02
fungidockerhub and quay have status pages too where they post incidents, same for github, gitlab, et cetera13:03
SvenKieskeyeah, I'm a frequent visitor of githubstatus.com ;)13:04
fungianyway, i need to knock out some other morning tasks, i'll be back in a bit13:04
SvenKieskethanks for all the hints provided so far :)13:07
fungiyw14:03
fungiconfig-core: infra-root: anyone else available to review https://review.opendev.org/896943 (the dependency has merged now)?15:06
fungionce it's in, i'll let the openstack release managers to run a release test so we can confirm the signatures look right15:06
fungier, i'll let the openstack release managers know to run release test15:06
fricklerhttps://review.opendev.org/c/openstack/project-config/+/896944 is the one to review ;) but I'm also fine to self-approve given that it will be tested later anyway15:07
fungigah, right thanks. i keep copying/pasting the depends-on field instead of the correct url15:08
clarkbfungi: I'm just sitting down can take a look15:11
clarkbah looks like it is done15:12
fungiclarkb: as frickler correctly noted, i pasted the wrong change. i meant https://review.opendev.org/89694415:13
clarkboh ack15:13
clarkbdone15:13
clarkbinfra-root https://review.opendev.org/c/opendev/system-config/+/892699 has been on the todo list as an item for after the openstack release. That has happened now and I can make time today to help restart gerrit and ensure it is happy with its new runtime if we want to land it15:14
fungisounds great, i'll be around15:16
clarkbok I can approve it in a few15:17
fungiawesome15:17
fungior i can approve it if you're ready15:18
clarkbyup that works. I'm just settling in first (loading ssh keys, opening office window, finding something to drink)15:18
fungicool, i approved it just now15:20
clarkbthanks. I now have a cup of chai and ssh keys are loaded15:21
opendevreviewMerged openstack/project-config master: Replace 2023.2/Bobcat key with 2024.1/Caracal  https://review.opendev.org/c/openstack/project-config/+/89694415:25
clarkbfungi: I see the well behaved bots doing their thing against gitea, but no flood currently at least on gitea0915:28
clarkbwhich is good, it means we didn't accidentally block the well behaved bots and maybe we made an impact on the bad ones15:29
*** elodilles_afk is now known as elodilles15:37
clarkbI'm going ahead and starting on a gitea 1.21 change. There are no release notes/changelog updates yet but the template changes are annoying enough to get out of the way on their own15:40
opendevreviewClark Boylan proposed opendev/system-config master: WIP Update gitea to 1.21  https://review.opendev.org/c/opendev/system-config/+/89767916:02
clarkbI'm trying to keep myself distracted so I don't look at java today :)16:07
fungii must say, that's a compulsion i don't believe i've ever experienced16:08
fungii got mosh set up on my linux phone and confirmed i can get to the tmux session on my shell server where my irc client, mua, calendar, et cetera run, so i'll be able to get by without my netbook while i'm travelling next week16:09
clarkbthat sounds "fun"16:09
clarkbfungi: It is mostly a compulsion to fix the problem more than a desire to java. The wall I'm up against now is I really need access to the logs but they seem to be bitbucketed :/16:10
fungiit's working really well, just thankful i've got good enough eyesight to see an 80x25 terminal on there16:10
clarkbI've been debating a cheap chromebook running the debian overlay or whatever it is called these days as a backup device16:11
fungiwell, i now have two identical dead netbooks i need to send back to hk for repairs, because i procrastinated getting my backup one fixed16:12
clarkbthe gerrit update should be landing soon16:13
clarkbthere is a config update that I think is a noop to set java home (noop because we don't use the init script for gerrit in our containers and that value is used by the init script to know which java to run)16:14
clarkbonce that is in place we should be good to stop gerrit, move the replication waiting queue aside, then start gerrit16:14
fungii also just +1'd a change in gertty over mosh from my phone to my home workstation16:16
clarkbI was a big fan of my OG Droid phone with the slide out keyboard16:18
clarkbthey don't make them like that anymore16:18
clarkbmost devices do usb on the go now though so you can hook up other keyboard devices at least16:19
opendevreviewMerged opendev/system-config master: Update gerrit image to bookworm  https://review.opendev.org/c/opendev/system-config/+/89269916:19
fungii bought a lapdock for this, which is basically a usb type-c connected streaming kvm in the form of a slim laptop with a holder for the phone next to the screen, making it essentially a dual-screen system16:22
clarkboh neat16:22
fungican drag apps from the phone screen onto the laptop-sized screen with the pointer (the screen is also a touchscreen though)16:23
clarkbgerrit's config should be getting updated as we speak16:23
clarkbjava home is updated on disk16:24
clarkband the job completed successfully. I think everything is ready if we are16:25
clarkbI think the process we want to use is stop gerrit (docker-compose down), mv the replication waiting/ tasks dir aside, docker-compose pull, then docker-compose up -d16:26
clarkbmaybe make note of the current image in use so that we can return to it if necessary16:26
clarkbshould we notify the openstack release team then go for it?16:27
clarkbfungi: ^16:27
clarkbopendevorg/gerrit   3.7       3a2e576abc9416:28
clarkbthat is the image we are currently running16:28
clarkb/home/gerrit2/review_site/data/replication/ref-updates/waiting is the dir to move aside16:30
fungistatus notice The Gerrit service on review.opendev.org will be offline momentarily while we restart it for a patch upgrade16:30
fungithat look right?16:30
clarkbfungi: maybe s/patch/runtime and platform/16:30
clarkbthough we'll also do some gerrit patch updates too due to the nature of our image rebuild process16:31
fungistatus notice The Gerrit service on review.opendev.org will be offline momentarily while we restart it for a combined runtime and platform upgrade16:31
clarkbyup lgtm16:31
fungishould i send that now? or are we not ready to start yet?16:31
clarkbI'm ready if you are. I can do the steps I outlined above16:32
fungi#status notice The Gerrit service on review.opendev.org will be offline momentarily while we restart it for a combined runtime and platform upgrade16:32
opendevstatusfungi: sending notice16:32
fungiyep, steps lgtm. thanks!16:32
-opendevstatus- NOTICE: The Gerrit service on review.opendev.org will be offline momentarily while we restart it for a combined runtime and platform upgrade16:32
clarkbok starting momentarily16:32
clarkbI'm actually going to docker-compose pull first as that should minimize the outage time16:33
fungigood idea16:33
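The restart order settled on above (pull the new image first, then stop Gerrit, move the replication waiting directory aside, and start again) can be written down as a dry-run plan. This is a minimal sketch and not the commands as actually typed: the `.old` suffix for the moved directory is a hypothetical choice, and the commands are assumed to run from the directory holding the Gerrit docker-compose file, whose path is not given in the discussion.

```python
import shlex

# Directory named in the discussion as the one to move aside.
WAITING_DIR = "/home/gerrit2/review_site/data/replication/ref-updates/waiting"


def restart_plan(waiting_dir=WAITING_DIR, aside_suffix=".old"):
    """Return the ordered shell commands for the restart without running them."""
    # Hypothetical destination name; the log does not say where it was moved.
    aside_dir = waiting_dir + aside_suffix
    steps = [
        ["docker-compose", "pull"],      # fetch the new image while still up
        ["docker-compose", "down"],      # stop gerrit
        ["mv", waiting_dir, aside_dir],  # set the replication queue aside
        ["docker-compose", "up", "-d"],  # start gerrit on the new image
    ]
    return [shlex.join(step) for step in steps]


if __name__ == "__main__":
    for command in restart_plan():
        print(command)
```

Pulling before stopping keeps the download outside the outage window, which is the reason given for reordering the steps.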
clarkbok image pulled. Stopping, moving tasks, and starting now16:34
opendevstatusfungi: finished sending notice16:35
clarkbINFO  com.google.gerrit.pgm.Daemon : Gerrit Code Review 3.7.5-45-g90bfc86419-dirty ready16:36
clarkbweb ui loads for me16:36
clarkbthe reported jvm looks correct too16:36
fungilgtm as well16:38
fungigertty has come back to online mode as well16:38
clarkbanother minor prep step towards 3.816:39
fungifor those not following #openstack-release, the updated key checks out: https://paste.opendev.org/show/bNky4Hv8qcBKcmexjB7p/17:29
opendevreviewClark Boylan proposed opendev/system-config master: WIP Update gitea to 1.21  https://review.opendev.org/c/opendev/system-config/+/89767917:31
clarkbif ^ gets gitea 1.21 further along then that will be an annoying thing to change, but probably worthwhile in the long run17:33
clarkbfungi: anything we should be doing to help prep for the mm3 migration Thursday?17:43
funginot yet, i'll put together the system-config changes in a bit for reviewing17:45
opendevreviewClark Boylan proposed openstack/project-config master: Update the jeepyb gerrit build jobs to match current base image  https://review.opendev.org/c/openstack/project-config/+/89771017:47
clarkbThis change is a followup on the gerrit change we just put into production. Keeps everything aligned17:47
clarkband https://review.opendev.org/c/opendev/gear/+/895968 is the last update for container images in opendev I think. The remainder are in zuul17:48
clarkbI think I'm going to pivot the zuul-registry update to just be a python update and stay on bullseye for now. Then we can clean up the other bullseye images at least17:48
clarkband once that is done we can add python3.12 images :)17:49
clarkboh storyboard is also lagging behind but I don't think the containerization went anywhere so would need to be updated when picked up? fungi does that make sense or should we update those images already?18:18
opendevreviewClark Boylan proposed opendev/lodgeit master: Update lodgeit container image to Python3.11 on Bookworm  https://review.opendev.org/c/opendev/lodgeit/+/89771118:21
clarkband I missed lodgeit18:21
fungiclarkb: yeah, i don't think there was any further work on the sb container configs18:34
clarkbarg changing the key type in the gitea change got the job to pass. I think I saw that gitea is requiring a minimum number of rsa bits and that key mustn't meet the criteria18:46
clarkbwe'll probably want to replace the key. In my test change I went to ed25519 to avoid needing to bump the bit count in the future again, but open to feedback on how we want to approach that18:46
clarkbthis is the key that gerrit uses to replicate to gerrit18:46
clarkb*gerrit uses to replicate to gitea18:47
fungihow long is the current key?18:47
opendevreviewClark Boylan proposed opendev/lodgeit master: Update lodgeit container image to Python3.11 on Bookworm  https://review.opendev.org/c/opendev/lodgeit/+/89771118:51
opendevreviewClark Boylan proposed opendev/lodgeit master: Put regex multiline specifier at beginning of regex string  https://review.opendev.org/c/opendev/lodgeit/+/89771318:51
clarkbfungi: I'm not sure. Can you determine that easily from the pubkey? gitea must be somehow18:53
fricklerssh-keygen with the proper option can tell you that. but I'm already too much offline to check the details19:24
clarkbthe minimum keylength is 3072 according to the git commit I found19:40
clarkblooks like we might be able to disable the min key length checks19:43
clarkbI'll update the change to do that instead of replacing the key and we can discuss if we are happy with that19:43
opendevreviewClark Boylan proposed opendev/system-config master: WIP Update gitea to 1.21  https://review.opendev.org/c/opendev/system-config/+/89767919:45
opendevreviewClark Boylan proposed opendev/lodgeit master: Update lodgeit container image to Python3.11 on Bookworm  https://review.opendev.org/c/opendev/lodgeit/+/89771119:52
opendevreviewClark Boylan proposed opendev/lodgeit master: Replace inspect.getargspec with inspect.getfullargspec  https://review.opendev.org/c/opendev/lodgeit/+/89771419:52
clarkbI'm impressed that lodgeit manages to hit so many deprecations19:52
fungithe current default rsa keylength generated by ssh-keygen is 3072 bits. a recent pubkey i have is 1.5x as long as the one in inventory/service/group_vars/gitea.yaml20:05
fungiso probably 204820:05
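The key length can also be read straight out of the public key rather than estimated from its length. `ssh-keygen -l -f <pubkey file>` prints the bit count; the sketch below does the same thing by hand for an `ssh-rsa` line, assuming the standard OpenSSH encoding (a base64 blob of length-prefixed fields: key type, public exponent, modulus). The function name is hypothetical.

```python
import base64
import struct


def rsa_pubkey_bits(pubkey_line):
    """Return the modulus size in bits of an OpenSSH 'ssh-rsa' public key line."""
    fields = pubkey_line.split()
    if len(fields) < 2 or fields[0] != "ssh-rsa":
        raise ValueError("not an ssh-rsa public key line")
    blob = base64.b64decode(fields[1])

    def read_field(offset):
        # Each field is a 4-byte big-endian length followed by that many bytes.
        (length,) = struct.unpack_from(">I", blob, offset)
        start = offset + 4
        return blob[start:start + length], start + length

    key_type, pos = read_field(0)
    if key_type != b"ssh-rsa":
        raise ValueError("unexpected key type inside the blob")
    _exponent, pos = read_field(pos)
    modulus, _ = read_field(pos)
    # The modulus may carry a leading zero byte; bit_length() ignores it.
    return int.from_bytes(modulus, "big").bit_length()
```

A 3072-bit modulus is 1.5 times as long as a 2048-bit one, which matches the length comparison made above.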
opendevreviewClark Boylan proposed opendev/lodgeit master: Replace inspect.getargspec with inspect.signature  https://review.opendev.org/c/opendev/lodgeit/+/89771420:40
opendevreviewClark Boylan proposed opendev/lodgeit master: Update lodgeit container image to Python3.11 on Bookworm  https://review.opendev.org/c/opendev/lodgeit/+/89771120:40
clarkbok that config option does fix things for gitea 1.21. We should be able to use that until/when we decide to replace the key with something newer20:48
clarkbok I think that makes lodgeit py311 compatible20:52
clarkbI've updated the meeting agenda. Please add any other items or let me know if there are things that need to go in there21:24
clarkbarg I missed that https://review.opendev.org/c/opendev/system-config/+/895522 was still outstanding. The only thing it will do is update tags for plugins which probably in all cases didn't actually change between 3.7.4 and 3.7.521:38
clarkblet me double check that. But I suspect we can land that and just not restart gerrit since it is equivalent to what we are running21:39
clarkbplugin-manager and webhooks do have differences. We don't really use either right now and the single commit updates to both of them seem minimal21:42
clarkbbut I'm happy to get that in and restart again in order to be properly in sync. I don't think it is urgent21:43
clarkbinfra-root ^ fyi a small annoying but not super urgent thing21:44
clarkbI need to pop out soon for some family errand stuff. Last call for meeting agenda updates21:59
clarkbagenda sent22:24
fungiah, i meant to add debian osbpo mirroring that got brought up in channel earlier today, but can cover it during open discussion, no worries22:37
opendevreviewSamuel Walladge proposed zuul/zuul-jobs master: Ensure the log dir exists before writing in it  https://review.opendev.org/c/zuul/zuul-jobs/+/89774323:23