Friday, 2020-05-08

00:13 <openstackgerrit> Adam Coldrick proposed opendev/storyboard master: Add an author_id parameter to the events endpoint  https://review.opendev.org/726264
00:18 <openstackgerrit> Merged opendev/system-config master: Organize zuul jobs in zuul.d/ dir  https://review.opendev.org/722394
00:22 <openstackgerrit> Tristan Cacqueray proposed zuul/zuul-jobs master: cabal-test: add build target job variable  https://review.opendev.org/726266
00:22 <openstackgerrit> Tristan Cacqueray proposed zuul/zuul-jobs master: haskell-stack-test: add build target job variable  https://review.opendev.org/726267
00:25 <ianw> the arm64v8/ubuntu image works, and has binaries in the littleaarch64 format
00:31 <ianw> i guess this must actually come from the alpine arm64 images?
00:33 <ianw> hrm, the python 3.7-slim container seems to work
00:34 <clarkb> ianw: but not 3.8?
00:34 <ianw> ahh, yeah 3.8 seems to work too ... just installing binary tools
00:36 <ianw> looks right : /bin/ls:     file format elf64-littleaarch64
00:39 <ianw> ahhhhh ... objdump is leading me astray
00:39 <ianw>   Machine:                           AMD x86-64
00:39 <ianw> elfutils sees it
00:39 <ianw> of course, the host objdump doesn't understand the e_machine type set in the elf header
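
A quick way to cross-check what objdump reports, using tools that decode the e_machine field regardless of the host architecture (illustrative commands; eu-readelf is the elfutils variant mentioned above):

    # binutils readelf and elfutils eu-readelf both print e_machine:
    readelf -h /bin/ls | grep Machine
    eu-readelf -h /bin/ls | grep Machine
    # file decodes the same header:
    file /bin/ls
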
01:14 <openstackgerrit> Ian Wienand proposed opendev/system-config master: Build multi-arch python-base/python-builder  https://review.opendev.org/726263
01:18 <ianw> clarkb: hahaha " ianw in theory zk on arm should be easy because jvm at least" ... https://github.com/31z4/zookeeper-docker/pull/90
01:42 <clarkb> hrm that kinda makes me think doing debian + zk tarballs might be easiest :/
02:06 <ianw> #11 [linux/amd64 builder 3/3] RUN assemble uWSGI
02:07 <ianw> #11 592.5   Created wheel for uWSGI: filename=uWSGI-2.0.18-cp37-cp37m-linux_aarch64.whl size=529535 sha256=5ae5fc0c691bd90c6dda8730f5a746c6ae698db0a2d21dd3da42fdb2d701ae18
02:07 <ianw> ummmm why would the amd64 build create an aarch64.whl ...
02:31 <ianw> here it swapped, the arm64 builder started using the amd64 image i think
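
One way to check which architecture an image actually is, rather than trusting its tag (illustrative image name):

    # architecture recorded in the image config:
    docker image inspect --format '{{.Architecture}}' opendevorg/python-builder:latest
    # or ask a container directly:
    docker run --rm opendevorg/python-builder:latest uname -m
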
03:13 <openstackgerrit> Merged opendev/system-config master: nodepool-builder: fix servername  https://review.opendev.org/726035
04:18 *** ykarel|away is now known as ykare
05:28 <ianw> mordred / corvus : so ... i've learnt a lot but not enough :)  i've left comments in https://review.opendev.org/#/c/726263
05:29 <ianw> in short, i think the intermediate registry is somewhat randomly returning either the amd64 or arm64 container
05:29 <ianw> https://storyboard.openstack.org/#!/story/2007642 has links to logs showing this
05:55 *** dpawlik has joined #opendev
06:12 *** ysandeep|afk is now known as ysandeep
07:01 *** DSpider has joined #opendev
07:17 *** tosky has joined #opendev
07:39 *** dtantsur|afk is now known as dtantsur
07:44 *** ralonsoh has joined #opendev
08:02 <openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role  https://review.opendev.org/725678
08:08 <openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role  https://review.opendev.org/725678
08:09 *** bhagyashris|ruck has joined #opendev
08:10 <openstackgerrit> Merged openstack/project-config master: Retire syntribos - Step 1  https://review.opendev.org/726237
08:13 *** tkajinam has quit IRC
08:14 <openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role  https://review.opendev.org/725678
08:18 <openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role  https://review.opendev.org/725678
08:28 <openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role  https://review.opendev.org/725678
08:29 <openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role  https://review.opendev.org/725678
08:36 <openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role  https://review.opendev.org/725678
08:47 *** ysandeep is now known as ysandeep|lunch
09:01 <openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role  https://review.opendev.org/725678
09:03 *** ykare is now known as ykarel
09:09 <openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: DNM: return linenumber in matchplay  https://review.opendev.org/726312
09:11 *** kevinz has joined #opendev
09:24 <openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Add upload-artifactory role  https://review.opendev.org/725678
09:51 *** ysandeep|lunch is now known as ysandeep
10:09 <openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Add ansible-lint rule to check owner and group is not preserved  https://review.opendev.org/724855
10:15 *** owalsh has quit IRC
10:22 *** owalsh has joined #opendev
10:23 *** ykarel is now known as ykarel|lunch
10:31 *** sshnaidm|afk is now known as sshnaidm|off
11:08 *** ykarel|lunch is now known as ykarel
11:50 *** Toshimichi-F82 has joined #opendev
11:50 *** Toshimichi-F82 has quit IRC
12:07 <openstackgerrit> Monty Taylor proposed opendev/system-config master: Build multi-arch python-base/python-builder  https://review.opendev.org/726263
12:27 *** tkajinam has joined #opendev
12:33 *** lpetrut has joined #opendev
12:38 <openstackgerrit> Tristan Cacqueray proposed zuul/zuul-jobs master: cabal-test: add build target job variable  https://review.opendev.org/726266
12:38 <openstackgerrit> Tristan Cacqueray proposed zuul/zuul-jobs master: haskell-stack-test: add build target job variable  https://review.opendev.org/726267
12:49 *** bolg has quit IRC
13:00 <openstackgerrit> Monty Taylor proposed opendev/system-config master: Explicitly target arm64 image on nb04  https://review.opendev.org/726376
13:02 *** ysandeep is now known as ysandeep|brb
13:10 *** ysandeep|brb is now known as ysandeep
13:15 <openstackgerrit> Monty Taylor proposed opendev/system-config master: Run accessbot less frequently  https://review.opendev.org/726379
13:23 *** ykarel is now known as ykarel|afk
13:38 <openstackgerrit> Tristan Cacqueray proposed zuul/zuul-jobs master: cabal-test: add build target job variable  https://review.opendev.org/726266
13:38 <openstackgerrit> Tristan Cacqueray proposed zuul/zuul-jobs master: haskell-stack-test: add build target job variable  https://review.opendev.org/726267
13:45 <fungi> disappearing for the weekly grocery pickup, but should hopefully be back before too long
13:49 *** ysandeep is now known as ysandeep|away
13:57 *** mlavalle has joined #opendev
14:02 <openstackgerrit> Merged opendev/system-config master: Run accessbot less frequently  https://review.opendev.org/726379
14:08 *** lpetrut has quit IRC
14:11 <openstackgerrit> Tristan Cacqueray proposed zuul/zuul-jobs master: haskell-stack-test: add build target job variable  https://review.opendev.org/726267
14:13 *** hashar has joined #opendev
14:14 *** hashar is now known as hasharAway
14:24 <dmsimard> Where would I remove myself from infra-root emails like the crons ?
14:26 <mordred> dmsimard: it's in a private ansible var on bridge ... do you also want to not be infra-root? or just want to avoid emails?
14:28 <dmsimard> mordred: I sent an email recently: http://lists.openstack.org/pipermail/openstack-infra/2020-May/006627.html
14:31 <dmsimard> I'll be around but I won't be able to contribute meaningfully
14:31 <dmsimard> need to afk, be back in a bit
14:33 *** ykarel|afk is now known as ykarel
14:36 <openstackgerrit> James E. Blair proposed zuul/zuul-jobs master: Revert "ensure-tox: use venv to install"  https://review.opendev.org/726404
14:39 <mordred> dmsimard: ah - nod, yes. I think I was too sad you were going for it to register that we'd need to remove you from things :)
14:40 <mordred> dmsimard: I have removed you from root emails - will you have time to send in a patch to remove yourself from root in the various places in system-config?
14:41 *** lpetrut has joined #opendev
14:49 <mordred> corvus: when you have a second, feel like reviewing https://review.opendev.org/#/c/726263/ ?
14:49 *** lpetrut_ has joined #opendev
14:52 *** lpetrut has quit IRC
14:58 <openstackgerrit> Merged zuul/zuul-jobs master: Revert "ensure-tox: use venv to install"  https://review.opendev.org/726404
14:58 <dmsimard> mordred: yeah, I'll send patches and hope I don't forget anything
14:59 <dmsimard> Haven't got around to it yet
14:59 <mordred> dmsimard: kk.
15:04 <avass> tobiash: could you take a look at the callback config change whenever you have time? https://review.opendev.org/#/c/717260/
15:04 <tobiash> avass: sure, I'll have a look at it later today
15:05 <avass> tobiash: I'll just continue nagging you about it if you forget it again :)
15:07 <avass> oh, oops, missed that I was in opendev and not in zuul, sorry for the noise
15:12 <openstackgerrit> Monty Taylor proposed zuul/zuul-jobs master: Revert "Revert "ensure-tox: use venv to install""  https://review.opendev.org/726413
15:14 *** lpetrut_ has quit IRC
15:14 <openstackgerrit> Monty Taylor proposed zuul/zuul-jobs master: Revert "Revert "ensure-tox: use venv to install""  https://review.opendev.org/726413
15:15 <mordred> avass: make as much noise as you want ;)
15:17 <mordred> clarkb: when you're up and moving - I have 3 patches up with an amended idea of what we can/should do with arm nodepool builders: https://review.opendev.org/#/q/topic:arm64-specific-tags
15:17 <mordred> the first system-config patch I think we should do regardless - but I think the second two will actually be a better experience end to end
15:20 *** hasharAway has quit IRC
15:21 *** hasharAway has joined #opendev
15:22 *** tkajinam has quit IRC
15:23 <clarkb> looks like the zuul.d change landed?
15:23 <clarkb> I have a few changes to rebase now then :)
15:24 <clarkb> mordred: looking at https://review.opendev.org/#/c/726372/1/.zuul.yaml is that going to break the whole manifest thing?
15:24 <clarkb> because we'll push zuul/nodepool as arm64 only?
15:25 <clarkb> related: should we fix the manifest mixups ianw was noticing before landing https://review.opendev.org/#/c/726263/3/zuul.d/docker-images/python.yaml ?
15:25 <clarkb> (a lot of how docker handles the metadata around this is still foreign to me which is why I'm asking questions)
15:30 <clarkb> ok I think I've answered my first question. We tag the arm64 specific builds as arm64. So I think that my concern is the case if you can get that image via pulling of latest
15:30 <clarkb> and I don't think you can so we should be ok
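
Checking what a tag actually points at: docker manifest inspect lists the per-arch entries behind a tag (a sketch; 2020-era docker needs the experimental CLI enabled for this subcommand):

    # a multi-arch tag prints a manifest list with one entry per
    # platform; a single-arch tag has no "manifests" array
    docker manifest inspect opendevorg/python-base:latest
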
15:31 <avass> mordred: why do we no_log looking for venv anyway?
15:31 <avass> I guess I did it again :)
15:31 <clarkb> mordred: given ^ should we maybe revert the multiarch build so that we don't mix things up there and have working x86 too? then separately build the arm64 tagged images as in https://review.opendev.org/#/c/726372/1/.zuul.yaml. Then once the mixups are fixed we can go back to multiarch for all of it?
15:32 *** dpawlik has quit IRC
15:32 *** hasharAway has quit IRC
15:33 *** roman_g has joined #opendev
15:35 *** diablo_rojo has joined #opendev
15:36 <diablo_rojo> corvus, I went to make a meetpad and it seems there is an issue? It's throwing a 404: https://meetpad.opendev.org/virtual-ussuri-celebration. Apologies if I missed something about that in the channel logs.
15:38 <clarkb> https://etherpad.opendev.org/p/virtual-ussuri-celebration is working at least
15:40 <clarkb> [error] 225#225: *2719 open() "/usr/share/jitsi-meet/virtual-ussuri-celebration" failed (2: No such file or directory)
15:41 <clarkb> diablo_rojo: corvus: that looks like webserver misconfiguration?
15:44 <clarkb> our nginx config has 'root /usr/share/jitsi-meet;' then for / we just turn on ssi
15:44 <diablo_rojo> clarkb, interesting, so relatively easy fix?
15:44 <clarkb> diablo_rojo: maybe? I don't fully understand this yet
15:45 <diablo_rojo> clarkb, that makes two of us lol
15:45 <clarkb> we also have a location match for /[:alnum:]+ rewriting to /
15:45 <clarkb> which I would've expected this to hit
15:45 <clarkb> which should force everything through index.html and the js
15:46 <clarkb> it's weird to me that nginx is looking on disk for that path given ^
15:47 <clarkb> jitsi's images were updated 18 hours ago
15:47 <clarkb> but our nginx container is 2 days old
15:47 <clarkb> maybe they got out of sync? corvus do you understand what nginx should be doing there?
15:48 <fungi> it was working after the http->https redirect config change merged, or so i thought
15:48 <fungi> i don't recall any further changes we made after that
15:50 <clarkb> fungi: did we test that a meetpad worked or just / ? it looks like / is working fine
15:50 <fungi> clarkb: diablo_rojo: it's something to do with the hyphens, i think
15:50 <clarkb> it's the next step that it tries to load padnames off of disk
15:50 <fungi> room/pad names with no hyphens work fine for me, but with hyphens i get the 404
15:51 <fungi> so maybe we're missing - in the redirect character set
15:51 <fungi> like /[:alnum:]+ probably doesn't cover -
15:51 <clarkb> fungi: ah ya the exact regex is location ~ ^/([a-zA-Z0-9=\?]+)$ {
15:51 <clarkb> which is slightly more than alnum but no -
15:51 <clarkb> I bet that is it
15:52 <fungi> so quick workaround is don't use hyphens, but we'll probably have a patch merged to add them in moments
15:52 <clarkb> fungi: we'll need to crosscheck that jitsi can actually handle hyphens on its side (we know etherpad can)
15:52 <fungi> good point
15:53 <fungi> easiest way is probably just to try. i mean, it's currently broken with hyphens anyway
15:53 <fungi> we can presumably hand-edit the config and restart the nginx container?
15:53 <clarkb> https://meet.jit.si/foo-bar-opendev seems valid
15:53 <clarkb> so ya I think we just edit that regex and it will be good
15:54 <fungi> your test there is good enough to convince me. do you have a change in progress or shall i start one?
15:54 <clarkb> I don't
15:54 <clarkb> you should go for it since you figured it out
15:54 <fungi> working on it now, in that case
15:54 <fungi> heh, that sounds like incentive for me to stop figuring things out ;)
15:54 <clarkb> fungi: it's the meet.conf file somewhere in system-config (I was looking on the prod server)
15:55 <diablo_rojo> This was an excellent exchange to follow along with :)
15:55 <fungi> we have docker/jitsi-meet/web/rootfs/defaults/meet.conf and playbooks/roles/jitsi-meet/files/meet.conf
15:55 <clarkb> fungi: the second one. (the first is image defaults which we inherited from upstream and second is our site config)
15:56 <clarkb> though if our site config update works we should update the docker image too
15:56 <fungi> aha, yep, that is the conclusion i came to as well after diffing
15:57 *** ykarel is now known as ykarel|away
15:57 <fungi> any reason not to just edit them both in the same change?
15:58 <fungi> any other characters we should add? maybe _
16:00 <fungi> etherpad.opendev.org and meet.jot.si both support _ based on my testing
16:00 <fungi> s/jot/jit/
16:01 <clarkb> fungi: only that if it doesn't work for some reason it's more untangling, but chances seem high it will work and _ seems like a good addition too
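
The fix being zeroed in on amounts to widening the character class in that location match — a sketch of the intended edit (the actual patch is change 726433 below):

    # before:
    location ~ ^/([a-zA-Z0-9=\?]+)$ {
    # after, also allowing - and _ (hyphen last so it stays literal):
    location ~ ^/([a-zA-Z0-9=\?_-]+)$ {
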
16:02 *** roman_g has quit IRC
16:03 <clarkb> fungi: I've actually got a related change I need to rebase after the reorg
16:05 <openstackgerrit> David Moreau Simard proposed opendev/system-config master: Remove dmsimard from infra-root  https://review.opendev.org/726429
16:05 <openstackgerrit> Clark Boylan proposed opendev/system-config master: Run jobs prod test jobs when docker images update  https://review.opendev.org/720030
16:05 <clarkb> fungi: ^ that change will add a bit more testing. I don't think it's critical to land your change but we should try to get that in for future stuff
16:07 <openstackgerrit> David Moreau Simard proposed openstack/project-config master: Remove dmsimard from accessbot  https://review.opendev.org/726431
16:08 <fungi> clarkb: should i stack mine on that?
16:08 <fungi> happy to review and merge that asap if it's working
16:09 <clarkb> fungi: sure
16:09 <clarkb> dmsimard: one minor thing on https://review.opendev.org/#/c/726429/1 due to how the automation works
16:10 <dmsimard> yup, looking
16:10 *** cmurphy is now known as cmorpheus
16:10 <dmsimard> makes sense
16:11 <clarkb> dmsimard: I'll put that higher on my list to do
16:11 <clarkb> (you don't need to worry about doing that rotation)
16:11 <dmsimard> I had the feeling there was something like that because of the previous key named after a date
16:11 <dmsimard> I won't do anything mean even if it takes a bit, I promise <3
16:12 <openstackgerrit> Jeremy Stanley proposed opendev/system-config master: Support hyphens and underscores for meetpad rooms  https://review.opendev.org/726433
16:13 <openstackgerrit> David Moreau Simard proposed opendev/system-config master: Remove dmsimard from infra-root  https://review.opendev.org/726429
16:22 <clarkb> dmsimard: +2 on both changes now. I'm sad to do that but understand at the same time. Hope your new endeavors go well
16:23 <fungi> clarkb: do you think it's safe to stop the dstat process on lists.o.o now? most recent oom was 2020-04-28 10:18:10
16:24 <clarkb> fungi: yes I think we can consider that fixed now
16:24 <dmsimard> clarkb: thanks!
16:24 * fungi updates the sign on the wall to "10 days since our last mailman oom"
16:26 <fungi> #status log terminated dstat process on lists.o.o after 10 days with no oom
16:26 <openstackstatus> fungi: finished logging
16:30 <fungi> corvus: just a heads up you might want to take a look at 726433 for the meetpad service... i assume those weren't omitted intentionally, but would rather be sure
16:31 <fungi> (and alternatively, if there are other characters you think we should also add, i'm happy to amend the change to do that)
16:32 <clarkb> infra-root today is the container restart day
16:32 <clarkb> I think we should start with gerrit since that should be a quick one? I'm not fully up to speed on all the changes that were made to gerrit so would be good if those that are can at least be around?
16:32 <fungi> i have some containers in the fridge with leftovers in them, happy to add them to the restarts
16:33 <clarkb> I think we want to do `docker-compose down && docker-compose up -d` as ansible should already have a pretty up to date image there?
16:33 <fungi> but yes, i think the gerrit restart will also mean we finally stop replicating to github
16:33 <clarkb> yup I believe that is the major change
16:34 <clarkb> and for zuul I think we can get away with just restarting the scheduler, but can likely do a full restart of everything if that is helpful. I guess mergers would like the jemalloc update too?
16:35 <fungi> clarkb: was patchset 3 for 720030 just a rebase?
16:35 <fungi> (the interdiff is... large)
16:35 <clarkb> fungi: yes to deal with the zuul.d reorg
16:36 <fungi> clarkb: i expect that restarting all zuul services would be appreciated by the zuul maintainers, in preparation for finally tagging?
16:36 <clarkb> I expect I'll be in a good spot to help with all that at about 1800UTC. Kids have class soon then I'll be without that distraction
16:36 <clarkb> fungi: I think tagging was dependent on the zk tls stuff? testing for that is still in progress
16:36 <fungi> oh, okay. i thought that was still waiting on us
16:37 <clarkb> fungi: well it's testing of it in opendev
16:37 <fungi> maybe it's still waiting on us to implement our side of it
16:37 <clarkb> fungi: I think all of the config management stuff is now done except for nb03 and now it's simply a matter of making tls work in opendev
16:37 <clarkb> "simply"
16:37 <fungi> in that case restarting all of zuul seems less urgent, but maybe the mergers at least
16:37 <fungi> in addition to scheduler
16:38 <fungi> we did the executors recently-ish
16:38 * fungi checks
16:38 <AJaeger> dmsimard: I suggest you remove yourself from the channels, one more change needed on https://review.opendev.org/#/c/726431/1.
16:39 <AJaeger> dmsimard: all the best and thanks!
16:39 <fungi> clarkb: oh, actually the last mass restart of executors was 2020-04-25
16:40 <fungi> so nearly two weeks ago
16:40 <dmsimard> AJaeger: good catch, will fix, thanks for your ever vigilant reviews :D
16:41 <openstackgerrit> David Moreau Simard proposed openstack/project-config master: Remove dmsimard from accessbot  https://review.opendev.org/726431
16:42 *** dtantsur is now known as dtantsur|afk
16:42 <fungi> clarkb: though the last change to merge for anything under zuul/executor/ was on 2020-04-16
16:47 <mordred> clarkb: no - we shouldn't revert the multi-arch - and the manifest mixups are a thing we should investigate but aren't immediately germane. the nodepool patch won't upload zuul/nodepool as arm64 only - it'll upload zuul/nodepool:latest as amd64+arm64 and then upload zuul/nodepool:arm64 as arm64 only. doing that will let us attempt actually getting rid of our arm64 control plane node and just using x86 for
16:47 <mordred> all of them. just chatted with corvus about an idea for a followup that I'll write up in just a sec
16:48 *** slittle1 has quit IRC
16:53 <AJaeger> clarkb: want to review https://review.opendev.org/726431 again?
16:58 *** slittle1 has joined #opendev
17:04 <openstackgerrit> Andreas Jaeger proposed zuul/zuul-jobs master: Revert "Revert "ensure-tox: use venv to install""  https://review.opendev.org/726413
17:08 <openstackgerrit> Merged openstack/project-config master: Remove dmsimard from accessbot  https://review.opendev.org/726431
17:08 *** slittle1 has quit IRC
17:14 *** slittle1 has joined #opendev
17:22 <clarkb> mordred: is there any concern that we won't be able to properly update our x86 nodepool launchers while we sort out the mixup? I'm trying to understand the breadth of this
17:22 <mordred> clarkb: nope. they should all be fine
17:22 <clarkb> mordred: eg restarting nb01.opendev.org right now might make it unhappy? which isn't the end of the world because we'll just use existing images for a while?
17:23 <clarkb> k, I need to pop out for a bit but will return to rereview those with new info and then hopefully do the planned service restarts?
17:23 <mordred> clarkb: it shouldn't - the only mixup in the mix may have something to do with copying images around between intermediate and buildset registries. or it may not - but the published images should be fine
17:23 <mordred> clarkb: yay service restarts
17:23 <clarkb> mordred: gotcha, so pulling from dockerhub in production should be fine, it's the test infrastructure that gets confused
17:24 <clarkb> (so that inhibits our ability to know if things will work for prod but we can yolo if we want)
17:24 <mordred> clarkb: yeah - and there we don't actually know what the issue is - we only saw it that one time
17:24 <clarkb> ok I was worried docker hub had the same issue, understanding it does not or shouldn't helps clarify things for me
17:24 <mordred> ++
17:25 <clarkb> alright back in about 20 minutes
17:25 <mordred> clarkb: there's also basically 2 different things in that stack
17:25 <mordred> the first is just making multi-arch python-base - because we DO install a platform-dependent thing in python-base, so we need to anyway.
17:26 <mordred> the second two are for attempting to try something new to ultimately allow us to get out of the business of running an arm64 control plane host - but we need to trial-run that still
17:29 <openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Remove install-* roles  https://review.opendev.org/719322
17:29 <openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Fail and direct user to use ensure-* version of roles  https://review.opendev.org/726448
17:31 <openstackgerrit> Monty Taylor proposed zuul/zuul-jobs master: Revert "Revert "ensure-tox: use venv to install""  https://review.opendev.org/726413
17:32 *** slittle1 has quit IRC
17:33 <openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Remove install-* roles  https://review.opendev.org/719322
17:34 <openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: DNM: return linenumber in matchplay  https://review.opendev.org/726312
17:35 <openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Fix bad path in ansible-lint test job files  https://review.opendev.org/726449
17:37 <openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: DNM: return linenumber in matchplay  https://review.opendev.org/726312
17:38 *** slittle1 has joined #opendev
17:39 <openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: ansible-lint-rules: Fix bad path and filename  https://review.opendev.org/726449
18:01 <clarkb> ok I was a bit slower than I expected but here now
18:01 <clarkb> infra-root does nowish work for gerrit restart?
18:01 <clarkb> I'm going to check that the configs updated on review for replication changes as expected
18:02 <clarkb> I see no github stuff in review.o.o:/home/gerrit2/review_site/etc/replication.conf
18:02 <clarkb> mordred: fungi corvus ^ if you think we are good to proceed I'll go notify the release team and can do the docker-compose down && docker-compose up -d as well. Mostly just hoping I'm not the only set of eyeballs if something goes sideways
18:08 *** ralonsoh has quit IRC
18:09 <mordred> clarkb: yes - I think we're good to proceed - and I'm here
18:09 <fungi> clarkb: yep, i'm around
18:09 <clarkb> mordred: cool, is my `docker-compose down && docker-compose up -d` the process you expect to be used as well? I don't think I need to pull as the latest image should already be there?
18:09 <fungi> sorry, was just rewiring the networking at my workbench for a minute
18:10 <clarkb> I've notified the release team, haven't heard any complaints, and I didn't see any release jobs when I checked 10 minutes ago either
18:10 <clarkb> I think I'm good to type the commands if those commands look good to you all
18:12 <mordred> clarkb: I also don't expect to need a pull
18:12 <mordred> clarkb: so - yes
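
The procedure being agreed on, spelled out (path and sudo usage from the surrounding lines):

    cd /etc/gerrit-compose
    # no pull needed: ansible already keeps the image on the host current
    sudo docker-compose down && sudo docker-compose up -d
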
18:12 <clarkb> ok I'm starting with those commands in review.o.o:/etc/gerrit-compose now (I'll sudo them too)
18:12 *** roman_g has joined #opendev
18:13 <clarkb> commands have been run; gerrit should be coming back up again now
18:14 * fungi reloads browser impatiently
18:14 <clarkb> logs report gerrit is ready
18:14 <fungi> there it is
18:14 <clarkb> apache seems to agree now as well
18:14 <fungi> lgtm
18:15 <clarkb> I guess the next thing to do is confirm no github replication? I don't know how to do that directly though
18:15 <clarkb> I'm mostly ok assuming the config file being up to date is sufficient :)
18:15 <fungi> event stream
18:15 <fungi> but yeah, i suspect it's fine
18:15 <clarkb> that takes us to zuul
18:16 <clarkb> mordred: earlier fungi and I were discussing if we want to do a full zuul restart or just mergers and scheduler to pick up the jemalloc removal in the containers
18:16 <clarkb> corvus: ^ you may also have thoughts on this
18:16 <mordred> clarkb: I think I defer to corvus on that one
18:17 <fungi> to reiterate, we did a mass executor restart on april 25
18:17 <fungi> there haven't been any new changes merged under zuul/executor/ since april 16
18:18 <openstackgerrit> Merged opendev/system-config master: Run jobs prod test jobs when docker images update  https://review.opendev.org/720030
18:18 <fungi> but maybe there are other reasons we might want a restart there
18:18 <clarkb> this is me thinking out loud here: maybe we should update our restart-zuul-globally playbook to handle the new container situation if not done already. Then just restart everything since we have a playbook to do it?
18:19 <clarkb> restarting the scheduler means we'll lose all running jobs anyway so we can't optimize for keeping jobs around
18:19 <clarkb> and usually that's the thing we optimize for with restarts on subsets of services iirc
18:20 <corvus> i can't think of a reason for executor restart, but a scheduler restart would be great
18:20 <corvus> (sorry, i'm not fully around atm)
18:21 <clarkb> zuul_restart.yaml has been updated for the new container situation
18:21 <mordred> clarkb: I believe it ... yes
18:22 <mordred> clarkb: I'm not 100% sure we've added queue saving and restoring there
18:22 <clarkb> mordred: ya that bit is still manual
18:22 *** roman_g has quit IRC
18:22 <clarkb> also I think there is a bug in the start.yaml playbook
18:22 <clarkb> we start the scheduler before the mergers
18:22 <clarkb> I believe mergers have to be first to enable the scheduler to load configs
18:24 <fungi> i don't see a ton of reason to spend time optimizing our restarts for queue preservation anyway, that effort's better spent getting the high-availability work finished so we never have to think about it again
18:25 <fungi> after all, we can use the periodic queue snapshots to manually reenqueue stuff after the scheduler restart occurs anyway. it's a fairly idempotent process
18:25 * clarkb is trying to find the old playbook and derping with git, but double-checking the mergers-first thought
18:26 <clarkb> the old playbook used the current order so I expect it is actually fine as is
18:26 <clarkb> considering that, this seems like a good time to exercise the new playbook and just restart everything?
18:30 <clarkb> and now I've got to learn what the proper way to run ansible on bridge is (so ya this is a good exercise)
18:31 <clarkb> fungi: mordred: I've started a root screen on bridge (sorry about the terminal size)
18:32 <clarkb> I'm in /home/zuul/src/opendev.org/openstack/system-config/playbooks and I think I want to run ansible-playbook -f 20 ./zuul_restart.yaml. And before running that we need to grab queues so they can be restored?
18:35 <clarkb> maybe someone else can do the queues and I'll run the playbook if that plan sounds good?
18:36 <fungi> i've joined
18:37 <clarkb> hrm one more thing to check: if the playbook will wait for zuul executors to properly stop before starting them again
18:37 *** avass has quit IRC
18:37 <fungi> just a sec i'll take a look at the current queue state and work out what likely needs to be grabbed
18:38 <fungi> vexxhost has a couple changes in flight which will probably clear in the next few minutes
18:38 <clarkb> fungi: k I think I found a bug so we aren't in a rush :)
18:38 <fungi> other than that, only the openstack tenant has any real activity at the moment
18:39 <clarkb> on the start side we include_role zuul-executor then limit to start.yaml but there is no start.yaml unlike the other services
18:39 <clarkb> I think we can either add a start.yaml that just ensures the service is running, or replace the include_role with a task that does that
18:40 <clarkb> I'll write a change for start.yaml so it is symmetric with the other roles
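
A minimal sketch of what that start.yaml might contain, assuming the zuul-executor role follows the same docker-compose pattern as the other zuul service roles (file path and task body are illustrative; the real change is 726453 below):

    # playbooks/roles/zuul-executor/tasks/start.yaml (hypothetical)
    - name: Ensure zuul-executor container is running
      shell:
        cmd: docker-compose up -d
        chdir: /etc/zuul-executor
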
18:41 <clarkb> then maybe we land that and do this after PDT lunch?
18:43 <openstackgerrit> Merged opendev/system-config master: Support hyphens and underscores for meetpad rooms  https://review.opendev.org/726433
18:44 <openstackgerrit> Clark Boylan proposed opendev/system-config master: Add a start.yaml to zuul-executor role  https://review.opendev.org/726453
18:45 <clarkb> infra-root ^ I'm thinking getting to a point where we can use that playbook will be a good idea now :)
18:46 <clarkb> mordred: I'm thinking about docker builds again now. Re https://review.opendev.org/#/c/726372/1/.zuul.yaml that's purely to make testing work?
18:46 <clarkb> mordred: any concern that if we land https://review.opendev.org/#/c/726263/3/zuul.d/docker-images/python.yaml before fixing the mixups we'll break a lot more testing?
18:47 <clarkb> I think ^ is my big concern with that stack now since a lot of pythony things consume those base images now and we'd be potentially breaking the testing they do?
18:48 <fungi> yeah, after pdt lunch wfm, i'm probably going to go run another quick errand around 19:15z or so
18:49 <fungi> but should be back before 20z
18:49 <mordred> clarkb: no - actually ...
18:50 <mordred> clarkb: that is for reals - as in, it occurred to me that with the docker binfmt support, we do not actually need to run arm64 nodepool-builder on arm64 - so that's what that is aiming to allow us to do
18:51 <mordred> clarkb: for the other thing - let's circle up with corvus when he's back online
18:51 <mordred> clarkb: I was thinking it's not an issue - but you make a good point that if we don't understand what the issue is that we had there, we could be breaking the gate
18:52 <clarkb> mordred: because we can't override the arch docker pulls with binfmt but could with a specific tag?
18:52 <clarkb> mordred: ya that us my concern with puthon-basr
18:52 <clarkb> bah cant type, started figuring out lunch
18:52 <mordred> clarkb: yes - that's right - you can tell docker to run a tag that's specifically a different arch
18:52 <mordred> clarkb: and if you have the binfmt stuff installed, it actually just works
18:53 <mordred> clarkb: once you're done with lunch, I'll tell you the next even crazier part :)
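
An illustration of the trick (the binfmt registration command shown is one common method, not necessarily what the jobs use; the arm64 tag is the one proposed in the nodepool patch above):

    # register qemu binfmt handlers on an x86 host:
    docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
    # an explicitly arm64-tagged image then runs under emulation:
    docker run --rm zuul/nodepool:arm64 uname -m    # prints aarch64
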
18:53 <corvus> i'm back from exercise for a few mins before lunch
18:53 <mordred> corvus: so - tl;dr ...
18:53 <mordred> corvus: the thing we were discussing this morning that we shelved for later ...
18:54 <mordred> corvus: clarkb has brought up a concern that with us not understanding the issue, and having many things consuming python-base images, we could be introducing gate breakage
18:54 <mordred> corvus: but this might be an after-lunch issue
18:54 <corvus> the "sometimes something gets the wrong image arch" issue?
18:54 <mordred> yeah
18:55 <mordred> corvus: I'm digging in to try to understand that a bit
18:55 <corvus> so the suggestion is that we need to figure that out before we publish any multi-arch python-base images?
18:55 <mordred> corvus: yeah - out of fear that, since they are base for other things, we might break ourselves
18:56 <clarkb> and it could be that adding those images is the fix to what we were seeing
18:56 <clarkb> but understanding that first would be good
18:57 <corvus> okay.  well, i was going to spend today trying to figure out why our zuul and nodepool gate tests don't do anything.
18:57 <corvus> but i could switch to this instead if folks think that's more important
18:57 <mordred> corvus: I'll dig in to this one for a bit first
18:57 <mordred> and see what I can learn while you work on the other thing
18:58 <clarkb> (I think the testing of existing stuff might be more important since that helps us with the existing things)
18:58 <corvus> okay, i'll plan on working on the gate tests after lunch then.
19:03 <openstackgerrit> Andreas Jaeger proposed zuul/zuul-jobs master: Fail and direct user to use ensure-* version of roles  https://review.opendev.org/726448
19:03 *** dpawlik has joined #opendev
19:06 <openstackgerrit> Monty Taylor proposed opendev/system-config master: Build multi-arch uwsgi images  https://review.opendev.org/726458
19:11 <fungi> i'm popping out for a quick errand, should be back in ~45 minutes
19:16 <openstackgerrit> Andreas Jaeger proposed openstack/project-config master: Retire x/pbrx - part 1  https://review.opendev.org/726461
19:21 <openstackgerrit> Andreas Jaeger proposed openstack/project-config master: Retire x/pbrx - part 1  https://review.opendev.org/726461
19:21 <openstackgerrit> Andreas Jaeger proposed openstack/project-config master: Finish retiring x/pbrx  https://review.opendev.org/726463
19:32 <openstackgerrit> Andreas Jaeger proposed openstack/project-config master: Retire x/pbrx - part 1  https://review.opendev.org/726461
19:32 <openstackgerrit> Andreas Jaeger proposed openstack/project-config master: Finish retiring x/pbrx  https://review.opendev.org/726463
19:33 <AJaeger> mordred and myself just discussed retiring this ^
19:33 <clarkb> infra-root https://review.opendev.org/#/c/726453/ is the change that would be good to review and hopefully land for the zuul restart
19:47 <clarkb> as an alternative we can just do a quick scheduler restart and land ^ later
19:53 <fungi> back from errand and looking at that change now while scarfing 'za
19:53 <clarkb> fungi: I think I'm coming around to simply restarting the scheduler
19:53 <clarkb> everything else we can restart with minimal impact
19:53 <fungi> is the system-config-run-base-arm64 failure expected?
19:54 <clarkb> and it's looking like a quiet friday afternoon
19:54 <clarkb> fungi: yes
19:54 <fungi> it's voting though, so we can't merge it normally
19:55 <clarkb> fungi: that pipeline doesn't vote
19:55 <clarkb> so it's voting +0 basically
19:55 <clarkb> (or -0 depending on how you look at it :) )
19:56 <clarkb> fungi: I think I've come around to just doing the scheduler; it's the important bit and would be good to get that behind us
19:56 <mordred> clarkb: typo
19:56 <fungi> oh... gertty's combining the pipeline reports
19:56 <fungi> ignore me
19:57 <openstackgerrit> Clark Boylan proposed opendev/system-config master: Add a start.yaml to zuul-executor role  https://review.opendev.org/726453
19:57 <clarkb> mordred: ^ thanks
19:59 *** avass has joined #opendev
20:01 <mordred> corvus: ok - I think we've repeated the issue, and confirmed that it seems to be non-deterministic
20:01 <mordred> corvus: https://zuul.opendev.org/t/openstack/build/ce8d743c1fa04841a34de543366a3bf1/log/job-output.txt#918
20:01 <corvus> mordred: i'm back
20:01 <mordred> corvus: if you look at that, you'll see it fetching the same shas for each arch
20:01 <corvus> mordred: what is that job doing specifically?
20:02 <mordred> corvus: and ... the other job for that build, https://zuul.opendev.org/t/openstack/build/65c6686dae3544d0b55f18f72199c1e3 - correctly fails due to lack of arm wheels
20:02 <mordred> corvus: using the uwsgi builds to test building something using the multi-arch python-base as a parent
20:02 *** diablo_rojo has quit IRC
20:02 <clarkb> fungi: `/opt/zuul/tools/zuul-changes.py https://zuul.opendev.org` seems to grab all the queues, which we can then filter for reenqueuing if necessary
20:02 <mordred> corvus: we should expect the arm builds to fail - because there are no wheels and the images don't install build tools
20:03 <clarkb> fungi: so I'm thinking we run that, then docker-compose down && docker-compose up -d in /etc/zuul-scheduler
20:03 <mordred> but they "succeed" sometimes - which looks like them saying they're running arm but actually running x86
20:03 <mordred> corvus: I have not yet sorted _why_ this is happening - that's next
20:03 <corvus> mordred: so that link you linked, that's buildkit fetching what should be a multi-arch python-base image from dockerhub, and it's getting the wrong arch?
20:04 <mordred> corvus: well - this is a child job of the job that built python-base multi-arch - so it actually should be fetching those images from the buildset registry
20:04 <corvus> mordred: :( that's tons of variables
20:04 <mordred> yeah. I'm not thrilled about it
20:05 <mordred> corvus: mostly just wanted to let you know I did reproduce the issue - albeit not yet in a consistent manner
20:06 <corvus> mordred: so to revise: that link is buildkit fetching what should be a multi-arch python-base image from the buildset registry, and it's getting the wrong arch.
20:06 <clarkb> infra-root: I think that is my plan now. I'm going to run `/opt/zuul/tools/zuul-changes.py https://zuul.opendev.org > queues.sh ; cd /etc/zuul-scheduler; sudo docker-compose down ; sudo docker-compose up -d`, wait for the scheduler to be up, then execute bash queues.sh. this is straightforward and gets us onto a non-jemalloc scheduler image
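
That plan as a runnable sequence, straight from the lines above:

    # save the running changes so they can be re-enqueued afterwards
    python /opt/zuul/tools/zuul-changes.py https://zuul.opendev.org > queues.sh
    cd /etc/zuul-scheduler
    sudo docker-compose down
    sudo docker-compose up -d
    # once the scheduler is back up:
    bash queues.sh
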
20:07 *** avass has quit IRC
20:07 <mordred> clarkb: yes
20:07 <mordred> gah
20:07 <mordred> corvus: yes
20:07 <mordred> clarkb: seems reasonable
20:08 <clarkb> I've notified the release team of my intent to do that real soon now and I've checked they don't have any jobs running (they don't)
20:08 <clarkb> I'll give it a couple minutes for anyone to object otherwise I'm proceeding :)
20:09 <clarkb> fungi: ^ you were helping with things earlier so want to make sure you have a chance to see that too
20:09 <fungi> clarkb: catching back up... we've been running `python /opt/zuul/tools/zuul-changes.py http://zuul.opendev.org >queue.sh`
20:10 <fungi> that seems to still work as expected
20:10 <clarkb> fungi: ya that's basically the command i've got except for the python prefix (seems to work either way)
20:10 <fungi> okay, cool
20:11 *** avass has joined #opendev
20:11 <fungi> yeah, that plan looks sane
20:11 <clarkb> ok I'm proceeding with it now then
20:11 <fungi> i'm ready and have non-pizza hands now
20:11 <corvus> mordred: at this point, we're pretty sure that we can have buildkit push multi-arch to dockerhub.  we also think we can have buildkit push multi-arch to the buildset registry, but i'm not sure we've fully tested that.  we also haven't examined skopeo copying from the buildset registry to the intermediate registry.  nor have we examined skopeo copying from the intermediate registry to the buildset registry.
20:12 <clarkb> I've saved the queues and am downing and upping zuul next
20:12 <clarkb> *zuul-scheduler
20:12 *** dpawlik has quit IRC
20:12 <corvus> mordred: so i guess we need to follow that sequence -- first make sure that the push from buildkit to the BR is okay, then see what skopeo does from BR to IR; then same from IR to BR
20:13 *** hashar has joined #opendev
20:13 <clarkb> it's doing the things I expect
20:13 <corvus> mordred: my guess though is that since we've basically not looked at skopeo at all, maybe it's not doing anything with multi-arch, and so there's an extra thing we need to do in the push-to-intermediate-registry and pull-from-intermediate-registry roles.  and maybe it's the same thing we need to do to both roles.
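
The suspected missing piece is what the change mordred proposes just below (726469) addresses: adding --all so skopeo copies every instance in a manifest list instead of picking one — roughly (registry hostnames are placeholders):

    skopeo copy --all --src-tls-verify=false \
        docker://intermediate-registry.example.org/opendevorg/python-base:latest \
        docker://buildset-registry.example.org/opendevorg/python-base:latest
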
20:13 <fungi> and now cat jobs are underway
20:15 <corvus> mordred: i'm going to get started on the zuul/nodepool gate issue now
20:16 <clarkb> it is up now
20:16 <clarkb> running the queues.sh script
20:16 <fungi> yep, i see executor interaction
20:18 <clarkb> queues.sh is done and I think that's it?
20:18 <clarkb> jobs are starting
20:20 <clarkb> #status log Restarted gerrit container on review.opendev.org to pick up new replication config (no github, replication for github runs through zuul jobs now)
20:20 <openstackstatus> clarkb: finished logging
20:21 <corvus> the reason that the nodepool job is failing is because we didn't open iptables in the test
20:21 <clarkb> #status log Restarted zuul-scheduler container on zuul01 to pick up the jemalloc removal in the containers which seems to address python memory leaks.
20:21 <corvus> we have firewall rules for the production nodepool servers, but not in the gate
20:21 <openstackstatus> clarkb: finished logging
20:21 <corvus> can anyone think of a similar thing we have a gate test for i can model after?
20:22 <corvus> (we need the gate-test-fake-nl01 to be able to talk to the gate-test-fake-zk01)
20:22 <clarkb> corvus: easy but maybe not super correct mode would be the multinode roles
20:22 <clarkb> corvus: they open all traffic between hosts
20:22 <clarkb> (but then I guess our base iptables role may overwrite?)
20:23 <corvus> yeah, would prefer to just set the iptables rule based on the ansible inventory
20:23 <openstackgerrit> Monty Taylor proposed zuul/zuul-jobs master: Add --all to skopeo copy from insecure registry  https://review.opendev.org/726469
20:23 <mordred> corvus: ^^ it might just be that simple
20:24 <clarkb> corvus: I am not aware of any examples of that. I think if we set up /etc/hosts (using multinode roles?) then when the base iptables role runs it will configure things properly because /etc/hosts wins over dns?
20:24 <mordred> corvus: dealing with manifest lists is definitely a new thing for skopeo and there is a decent chunk of discussion about it - also, some of the early discussion was of the form "if there's a list just grab the best one"
20:25 <openstackgerrit> Monty Taylor proposed zuul/zuul-jobs master: Add --all to skopeo copy from insecure registry  https://review.opendev.org/726469
20:26 <fungi> "best"
20:27 <corvus> mordred: neat, i know nalin from way back :)
20:27 <mordred> clarkb, corvus : we could also have the iptables rules derived from the ansible inventory group ips
20:27 <mordred> so more like "open port X from the hosts in group zkclient" vs "open port X on this list of IPs"
20:27 <mordred> that's -- a big bit of ansible magic though
20:27 <clarkb> mordred: oh interesting
20:28 <mordred> but if we can figure it out and get it right (I still don't 100% know when ansible decides to not load up the group membership of something) - it's likely more maintainable long term?
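
A sketch of that idea as a rules-template fragment (group name, port, and variable are illustrative; the real changes land later as 726472 and 726475):

    {# hypothetical iptables template fragment: allow a port from every
       host in an ansible group, using inventory data instead of DNS #}
    {% for host in groups['zuul'] %}
    -A openstack-INPUT -p tcp -s {{ hostvars[host]['ansible_host'] }} --dport 2181 -j ACCEPT
    {% endfor %}
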
20:29 <corvus> {% for addr in host.hostname | dns_a -%}
20:29 <corvus> that's what we currently do
20:29 <corvus> dns_a is a filter module we wrote
20:29 <corvus>             # Note we use 'host' rather than something like
20:29 <corvus>             # getaddrinfo so we actually query DNS and don't get any
20:29 <corvus>             # local-only results from /etc/hosts
20:29 <clarkb> corvus: ha
20:29 <mordred> ha indeed
20:29 <clarkb> there goes my idea :)
20:30 <mordred> and we want dns values instead of the ips we've put in the ansible inventory?
20:30 <corvus> mordred: no we are just doing a dns lookup
20:30 <corvus> i kinda think replacing all that with your idea may be the way to go
20:30 <mordred> yeah, I'm wondering if we've changed anything about how we're organized now that might make rethinking that worthwhile
20:30 <mordred> yah
20:31 <corvus> my guess is it far, far, far predates how we are doing things now
20:31 <mordred> I think before we still had dynamic openstack inventory
20:31 <clarkb> mordred: re skopeo --all. Is the --all still scoped to the urls we are passing? eg it won't try to download literally everything on the intermediate registry will it?
20:31 <mordred> clarkb: it's about the specific image
20:31 <corvus> "Add a --all/-a flag to instruct us to attempt to copy all of the instances in the source image"
20:32 <clarkb> corvus: hrm that makes me wonder if we'll get all versions of zuul-scheduler and so on
20:32 <fungi> i have a feeling that dates back to limitations we had with matching in the openstack dynamic inventory
20:32 <corvus> clarkb: if zuul-scheduler is a multi-arch image.... yes, but that's what we want
20:32 <clarkb> corvus: right but all changes in the intermediate registry for zuul-scheduler?
20:32 <clarkb> or maybe it does the right thing and I'm reading that wrong
20:33 <corvus> clarkb: we're telling it to copy an image
20:33 <mordred> yeah - and a specific tag of an image at that
20:33 <corvus> it's not going to copy other images we're not asking it to copy
20:33 <mordred> yeah
20:33 <clarkb> k
20:33 <mordred> image in this case means repository:tag
20:33 <corvus> (but if the image is a list of images [ie, multiarch], it will copy every image in the list of images)
20:33 <corvus> see https://github.com/containers/skopeo/pull/741
20:33 <clarkb> mordred: right I think I'm getting confused because "image" can mean repository or repository:tag/sha
20:34 <mordred> clarkb: yah - in this case it means the specific repository:tag
20:34 <clarkb> in this case we tell it repository:tag so we should get all instances of that (eg each arch) tag
20:34 *** avass has quit IRC
20:34 <mordred> clarkb: our more general conversational usage of calling a repository an image is wrong
20:34 <mordred> clarkb: yes
20:35 <fungi> scheduler restart is looking good... we've got jobs succeeding and publishing logs
20:36 <clarkb> fungi: over the long term we'll want to monitor memory use too but ya things are looking good so far
20:36 <clarkb> http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=64792&rra_id=all memory use graphs
20:38 <clarkb> memory use looks good so far but we tend to need at least several hours to several days of data there to say anything definitive
20:39 <fungi> agreed
20:40 <tobiash> I've also updated our zuul-web to py38 w/o jemalloc today and memory looks better so far
20:40 <mordred> tobiash: cool!
20:43 <corvus> mordred: i'm working on the iptables thing
20:44 <clarkb> tobiash: fwiw my completely uninvestigated and untested theory is that the bug is in jemalloc
20:44 <clarkb> tobiash: since glibc and jemalloc should be called the same by python in this case
20:44 <clarkb> tobiash: it's also a major version change between ubuntu xenial and buster for jemalloc
20:44 <tobiash> clarkb: probably, memory allocators are hard
20:46 <mordred> yah - especially memory allocators underneath dynamic language memory allocators :)
20:46 <clarkb> ya I think if we really wanted to dig in more we'd want to run zuul-web under valgrind for a bit and then send off that data to jemalloc
20:47 <clarkb> but that sounds like a lot of effort for minimal to no gain :)
20:47 <mordred> yah - also - python has been putting a lot of effort into the new dict impl in later pythons too
20:47 <fungi> especially if it turns out to be something they've already fixed
20:47 <mordred> so at this point just being on 3.8 is probably a big win over our initial 3.5 deployments
20:48 <mordred> I mean - remember in opendev we were running a patched python until this container rollout
20:48 <tobiash> you patched python?
20:48 <tobiash> awesome
20:48 <clarkb> I thought upstream eventually pulled the fixes in?
20:48 <clarkb> we were for a while though
20:48 <mordred> clarkb: we never switched
20:48 <clarkb> mordred: oh hah ok
20:48 <mordred> tobiash: we pulled a backport patch
20:48 <fungi> i'd rather drink than try to remember that
20:49 <mordred> tobiash: https://launchpad.net/~openstack-ci-core/+archive/ubuntu/python-bpo-27945-backport
20:49 <mordred> clarkb: to be fair - it's possible we did get newer pythons because of versioning
20:49 <tobiash> ah I think I remember segfault discussions during zuul v3 development
20:49 <mordred> clarkb: but we never stopped adding that ppa :)
20:49 <clarkb> gotcha
20:49 <mordred> tobiash: yah. they were "fun"
20:50 <clarkb> we had to debug similar python issues with 3.3 iirc
20:50 <clarkb> I feel like I did that in the HP seattle offices, been a while
20:50 <fungi> seems so very long ago now
20:50 <clarkb> segfaults in the python garbage collector
20:50 <clarkb> which were fixed in python upstream but we had to convince the distro to pull it in
20:50 <clarkb> thankfully "segfaults with no user input or interaction required" tends to be a bad enough bug that they'll patch :)
20:51 * mordred loves our new python base image overlords
20:51 <clarkb> fungi: I'm all out of alcohol
20:51 <clarkb> unless I want some more iron butt
20:52 <clarkb> ok I'm going to context switch to PTG planning stuff as zuul continues to look happy. Send up a signal flare if I can help debug or look at anything else
20:54 <fungi> clarkb: my "errand" this afternoon was to restock my home bar with aged rum before the tourists clear out the shelves in a week
20:55 <fungi> so i suppose i'm now well equipped to remember running with python patches after all
20:55 <openstackgerrit> James E. Blair proposed opendev/system-config master: Use inventory host lookup for iptables  https://review.opendev.org/726472
20:55 <clarkb> fungi: nice
20:55 <corvus> mordred: ^ that's step one that just does a 1:1 for our host based system.  i'll build on top of that and switch to using groups
20:55 <clarkb> fungi: I've been debating picking up a scotch since I tend to drink that at a reasonable pace
20:55 <corvus> so we don't have to think about it when we add a new zk client
20:56 <fungi> clarkb: to be fair, it was a stop off at the liquor store on the way to pick up decent pizza, so booze was not the entire reason for leaving the house at least
20:58 <mordred> corvus: cool
20:59 <mordred> corvus: so - biggest question - do all of the hosts in the inventory show up in hosts[] even if they aren't used?
20:59 <mordred> corvus: (I like the look of that a lot)
20:59 <corvus> mordred: this worked for me: ansible-playbook -i ~/git/opendev/system-config/inventory/openstack.yaml -i ~/git/opendev/system-config/inventory/groups.yaml /tmp/test.yaml
21:00 <corvus> mordred: where the playbook was a simple debug with what you see in the change
21:00 <corvus> mordred: does that seem like an effective test?
21:00 <corvus> mordred: (and the play was on hosts:localhost)
21:00 <mordred> corvus: yeah. cool!
21:01 <mordred> corvus: that's exactly the type of test I would think would show it
21:01 <corvus> mordred: i think the other weirdness maybe you are remembering is facts?
21:01 <corvus> but since this is inventory data....
21:01 <mordred> corvus: similar with groups - I've got that play in there somewhere to do a debug statement on hosts: zookeeper so that zookeeper shows up in groups[] - or maybe it's facts that are the issue there
21:02 <mordred> corvus: https://opendev.org/opendev/system-config/src/branch/master/playbooks/service-zuul.yaml#L1-L9 <-- but that does explicitly say "hostvars"
21:03 <mordred> corvus: in any case - it's an angle to check that I'm sure you'll check anyway - and maybe find a way to make that chunk go away
21:04 <corvus> mordred: well, i'm planning on letting the gate tests check for me
21:04 <mordred> infra-root: I'm going to be out next tuesday afternoon so will miss the meeting
21:04 <mordred> corvus: \o/
21:05 <mordred> clarkb: we had another IP conflict / host key issue in a job - are we collecting those? or just shrug?
21:06 <clarkb> mordred: usually they seem to happen in waves and the cloud cleans up after itself and we move on. If it is very persistent in a single provider we usually escalate to that provider
21:07 <mordred> nod. I'll just go with shrug for now
21:08 <corvus> what's zuul-executor vs zuul-executor-opendev?
21:09 <corvus> i don't see the zuul-executor-opendev group used anywhere
21:10 <clarkb> I think zuul-executor-opendev may have been a fork in the road for container'd executors
21:10 <clarkb> mordred: ^
21:10 <mordred> corvus: it's not - I think it can be killed
21:10 <corvus> k will do
21:10 <mordred> corvus: there's a similar one in nodepool which is used but which can go away once we sort out the final puppet host
21:11 <mordred> maybe we should rename those to nodepool-builder and nodepool-builder-legacy at this point
21:17 <openstackgerrit> James E. Blair proposed opendev/system-config master: Add iptables_extra_allowed_groups  https://review.opendev.org/726475
21:19 <openstackgerrit> James E. Blair proposed opendev/system-config master: Add iptables_extra_allowed_groups  https://review.opendev.org/726475
21:23 <openstackgerrit> Merged zuul/zuul-jobs master: Add --all to skopeo copy from insecure registry  https://review.opendev.org/726469
21:23 <corvus> mordred: ^ recheckify?
21:24 *** avass has joined #opendev
21:24 *** hashar has quit IRC
21:25 <clarkb> https://etherpad.opendev.org/p/opendev-virtual-ptg-june-2020 is up and I sent email about it to the mailing list as well
21:25 <clarkb> feel free to add your thoughts. I'm trying to add mine as well
21:27 <mordred> corvus: I rechecked both the python-base change and the uwsgi-base change - because the copy to the intermediate can also be an issue
21:31 <corvus> i'm working on the zuul job failures now (they're different than the nodepool failures)
21:32 <corvus> mordred: we should standardize our docker-compose directories
21:32 <corvus> it's a little hard to find them on the various different hosts
21:32 <corvus> (and in our situation, it's a bit like not knowing the name of the init script to start a service)
21:34 *** DSpider has quit IRC
21:36 <mordred> corvus: I completely agree
21:39 <clarkb> that will change container names so we'll have to do a bit of a dance but I agree that will be a nice thing
21:39 <clarkb> maybe even /etc/docker-compose/$service
21:39 <clarkb> then we always know to look in /etc/docker-compose like /etc/init.d/
21:46 *** hashar has joined #opendev
21:47 <fungi> corvus: mordred: agreed, even if they just all started with docker- instead of ending in -docker i could at least tab-complete something
21:47 *** avass has quit IRC
21:48 *** hashar has quit IRC
21:49 <fungi> though i suppose i could eventually get in the habit of doing ls -d /etc/*-docker
21:50 <clarkb> fungi: note that won't match the zuul services :)
21:50 <clarkb> I think that may be part of corvus' observation there
21:50 <fungi> oh, even more fun
21:51 <fungi> ahh, yep, /etc/zuul-scheduler and /etc/zuul-web there
21:51 <clarkb> fungi: I approved a service-discuss email you were cc'd on and responded to the list and directly. Just in case you were off in separate conversation land I thought I would mention it
21:52 <fungi> i think i accidentally discarded it along with some spam when going over the moderation queue first thing this morning
21:53 <fungi> so i found the sender in the mailman logs and reached out to them directly asking them to re-send
21:54 <fungi> because i remembered seeing the queue notification for it late last night but then didn't remember seeing it in the moderation queue this morning when i was processing all the lists i moderate
21:54 <clarkb> ah they did resend and I got it through and responded to their question
21:55 <fungi> excellent, thank you!
21:56 <corvus> mordred: the problem with the zuul job is that the test node has a zuul user with id=1000
21:56 <corvus> so all our careful work to make sure zuul=10001 on the real server doesn't apply in test
21:57 <mordred> corvus: yeah - the zuul user on the test nodes is really non-ideal for us, isn't it?
21:58 <mordred> corvus: I almost wonder if we should make a different production user name other than zuul
21:58 <mordred> so that we don't overlap with the zuul test user :(
21:59 <mordred> corvus: we could also see if docker will work with uidmap
21:59 <corvus> mordred: yeah; though that would make a slight discontinuity between inside (zuul) and outside (newuser) container
22:00 <corvus> mordred: docker uidmap?
mordredcorvus: https://docs.docker.com/engine/security/userns-remap/22:00
mordredcorvus: looks like docker does suppor tit22:00
corvusmordred: i think i need you to spell out your idea for me22:00
mordredcorvus: basically - we configure a mapping so we can tell docker "please make uid 1234 on the host map to 10001 in the container"22:01
corvusmordred: so would we then go back to our production hosts and set zuul to 1000 everywhere?22:01
mordredcorvus: and then whatever the uid of the zuul user is on the host translates through that mapping so that they don;t have to _match_ we just. have to know what the host uid is22:01
mordredcorvus: I think we could maybe even just have our code to set the mapping read the uid of the zuul user so that if it preexisted the mapping would say 1000:10001 and if it didn;t it would say 10001:1000122:02
corvusmordred: oh.. so maybe let prod and test be different?  and then uidmap $hostuid:10000 ?22:02
mordredOR - we could also make the zuul user on productio n100022:02
mordredyeah- we could do either thing22:02
mordredthe ansible would be the same22:02
mordredwe could even stop setting a uid in the ansible so that it will just make one and we don't have to care becasue it;ll configure a mapping22:03
mordredthe only place we'd care about the uid is in inside the container22:03
corvusthat has a certain attractiveness to it... clarkb, fungi: thoughts ^?22:03
mordredI recommend reading that docker.com link if you haven't poked at subuid and this sort of mapping yet22:04
corvusmordred: specifically what would that look like?  what files/settings would we have to put where to make that happen?22:04
fungiso, like, 1:1 nat for unix uids/gids? that's not a terrible way to keep test users from colliding with production22:04
mordredcorvus: we have to put an /etc/subuid and /etc/subgid file in place22:05
corvusmordred: i have, but the only thing i understand about it is remapping entire ranges; i don't quite grok how you say "map this uid out here to that one in there"22:05
mordredwith an entry like zuul:1000:1000122:05
mordredthen "userns-remap": "zuul" in daemon.json22:06
mordredcorvus: I might be reading this poorly22:07
corvusmordred: i read that as "docker containers started by the zuul user have the inside uid of 0 mapped to the outside uid of 1000"22:07
mordredcorvus: yeah - nevermind. my idea sucks22:08
corvusmordred: well, it was a good idea, just apparently not implemented22:08
clarkbmordred: I think uids matter for logs too?22:08
mordredclarkb: oh - they matter - i was just reading the construct differently than it actually works, thinking it would let us do a mapping like fungi described22:08
clarkbgotcha22:09
fungiso userns-remap only remaps the inside uid=0 to some outside uid?22:09
corvusfungi: it remaps a range, but that range starts at 0 on the inside22:09
mordredcorvus: oh ... yeah22:09
mordredthe range starts there22:10
fungioic... so, could we map to an unused uid range where inside 1000 happens to line up with outside 10001?22:10
mordredyeah- except we want the inverse - we want outside 1000 to map to inside 10001 :)22:10
mordredwhich I don't think we can do with this22:11
fungiohhh22:11
mordredwe could do the other thing22:11
corvusreverse the polarity?22:11
fungiwhere's a deflector grid when you need one?22:11
corvusmordred: what's the other thing?22:11
fungiwhat i was describing (lower outside uid to higher inside uid)22:12
mordredcorvus: I meant we could do what fungi suggested if we wanted the mapping to go the opposite direction22:12
mordredyeah22:12
corvusoh22:12
mordredbut alas that doesn't help :(22:12
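Spelling out the arithmetic: container uid N maps to host uid host_start + N, counting from N=0, so the host uid is always at least the container uid. A worked example with an assumed (not proposed) host range:

    # /etc/subuid format: name:host_start:count
    #
    # fungi's direction (inside 1000 -> outside 10001) works:
    #   zuul:9001:65536  =>  container uid 1000 -> host uid 9001 + 1000 = 10001
    #
    # the direction needed here (outside 1000 -> inside 10001) would need
    #   host_start = 1000 - 10001 = -9001, which no entry can express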
corvusso we're back to: (a) start the gate test job by renaming the zuul user; (b) start the gate test job by renumbering the zuul user; (c) change the username on our images; .... any other options?22:13
clarkbuse different user in gate jobs22:13
mordredcorvus: d) renumber zuul in prod to 100022:13
corvus(e) change the zuul number on our images22:13
mordredyeah22:13
corvusclarkb: how do we use a different user in gate jobs?22:14
mordred(f) run our images as uid 0 but suggest people use userns in prod22:14
corvusmordred: how does renumbering zuul in prod to 1000 help?  it's baked into the images as 10001...22:14
clarkbI guess that would involve changing the prod user too, but run zuul as the zuul-prod user22:14
mordredcorvus: d and e would have to be merged22:14
clarkbthen the zuul user on test nodes can configure zuul-prod and they can be distinct22:14
fungiyeah, i interpreted d as actually being e22:15
mordredyeah - I think that's an option - make a user not named "zuul" in prod22:15
corvusmordred: my suggestion for (e) is just change our diskimages so the zuul user we create is id=10001; nothing would use 1000 then.22:15
mordredmaybe we need to re-state these22:15
mordredcorvus: ah!22:15
mordredtoo many images22:15
corvusoh yeah, sorry, not suggesting we change zuul's docker image22:15
mordredcorvus: (e) is the easiest - but does mean we won't test creating the user because it'll pre-exist22:16
mordredthat's probably ok22:16
corvuswell, it'll take like a week to implement (e) :)22:16
mordredyeah22:16
corvusand we'll probably open a can of worms22:16
mordred(g) rename zuul user in prod to zuul-prod22:16
mordred(just capturing the thing clarkb was saying)22:16
mordredand kill (d)22:17
mordredcorvus: b won't work - you can't renumber the zuul user because ansible will be running as it - trying to do so fails pretty directly22:17
mordredunless we edit /etc/passwd and reboot the nodes22:18
corvussure you can, it's just harder22:18
mordredfair22:18
corvusyou don't need to reboot22:18
corvusyou just can't use usermod22:18
mordredah - nod22:18
corvusso edit the files, find/chown, then HUP the ssh connection22:18
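A rough shell sketch of the sequence corvus is outlining, run as root (the 1000 -> 10001 renumbering and the -xdev scope are assumptions for illustration):

    # renumber zuul without usermod, which refuses while the user has processes
    sed -i 's/^zuul:x:1000:1000:/zuul:x:10001:10001:/' /etc/passwd
    sed -i 's/^zuul:x:1000:/zuul:x:10001:/' /etc/group

    # re-own everything the old uid/gid held
    find / -xdev -uid 1000 -exec chown -h 10001 {} +
    find / -xdev -gid 1000 -exec chgrp -h 10001 {} +

    # then drop the cached ssh connection (e.g. ansible's meta: reset_connection)
    # so the next login picks up the new uid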
clarkbI like splitting zuul the test user and zuul the service user as it helps make reasoning about this stuff easier22:19
mordredcorvus: I might actually be coming around to liking that one - it's fiddly at the top of the test job, but we have to fiddle with a few things to set things up properly anyway22:19
clarkbbut its also likely somewhat involved to make that change22:19
mordredbut yeah - clarkb's point22:19
mordredand followup22:19
mordredI agree with both22:19
corvusdidn't we just *undo* the zuulcd user on bridge? :)22:20
mordredif we do (b) - we could do (g) as a followup without too much trouble most likely and then remove the (b) remediation22:20
clarkbcorvus: we did22:20
mordredyup22:20
clarkbcorvus: but that is a case of zuul the test user22:20
fungidoes having a different uid on the remote node vs on the executor create problems for log/artifact synchronization?22:20
corvusfungi: we have one now22:21
clarkbfungi: there were a bunch of recent changes around that in zuul-jobs; the answer is it did, but now shouldn't22:21
clarkbwe stopped syncing ownership iirc22:21
clarkbbecause someone other than us hit that22:21
corvusthe executors are running as 10001 today22:21
fungiahh, okay22:21
fungiyeah, sounded familiar22:21
corvusclarkb: i think that was for missing user names, so technically i don't think we hit that.22:21
clarkbah22:22
corvusclarkb: but if we renamed to zuul-prod, then we would :)22:22
corvusbecause the executors are not running in a container22:22
mordredhaha22:22
corvusi think it would be great if the names and uids inside and outside the container matched22:22
clarkbcorvus: hrm thats a good point too and I agree22:23
corvus(so at least our executors will match the rest of the system)22:23
clarkbhaving zuul-prod on the outside and zuul on the inside would be weird22:23
mordredyeah22:25
mordredso I think that gets me back around to liking b the most22:25
corvusshould we do b on all hosts in the system-config-run jobs?  or try to narrow it down to just the 'zuul' group hosts?22:26
mordredor - actually - why not b - then also do e - because shrug22:26
mordredcorvus: I'd vote for all the hosts - so that it's a consistent "this is part of setting up the world" task22:26
clarkbya b + do it in system-config-run-base makes sense to me22:27
corvusok.  i'll work on b for all hosts; then we can think about e and deprecating b22:27
clarkbsounds good22:27
mordredcorvus: ooh carnage! https://zuul.opendev.org/t/openstack/build/590dcae1db64414a938eb0682f2a623c22:28
mordredcorvus: that --all flag did _not_ work22:28
corvusmordred: maybe next time don't use an EN-DASH.22:29
corvusno... weird22:29
corvusit looks like the dash is correct in the code, it's just printing it out weird in the error message22:29
corvusmordred: sorry, i guess that was a red herring22:30
corvusmordred: my skopeo copy has a --all option22:31
corvusno idea how to find out what version it is22:31
corvusdoes it even have versions?22:31
mordredcorvus: I think our skopeo is too old22:31
corvusskopeo version 0.1.4022:31
corvusthat's what i have22:32
mordredskopeo is skopeo version 0.1.37-dev22:32
clarkbours == what we have on the executors?22:32
mordredyeah22:32
mordredI think we're still installing them from the ppa and not from kubic?22:32
mordredwe'll install skopeo from kubic for focal22:33
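For context, --all is the skopeo copy flag that copies every image referenced by a manifest list rather than only the one matching the local architecture, which is what matters when the source is a multi-arch manifest; per the log above it exists in 0.1.40 but not in the executors' 0.1.37-dev. Roughly (repository names are placeholders):

    # copy all architectures referenced by the source manifest list
    skopeo copy --all docker://example.org/image:tag docker://registry.example/image:tag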
*** mlavalle has quit IRC22:33
corvusmordred: i just apt-get installed skopeo on ze01 and it upgraded to 0.1.4022:34
corvushow about we just do that on all the executors real quick-like?22:34
mordredoh - hrm. yeah22:34
corvusmordred: mind doing that while i go back to the other thing?22:35
mordredon it22:35
mordredcorvus: done22:36
*** mlavalle has joined #opendev22:37
mordredcorvus: I have also rechecked the patches22:52
*** tosky has quit IRC23:11
mordredcorvus: first patch worked with the skopeo --all ... waiting on uwsgi now23:17
openstackgerritMerged opendev/system-config master: Remove dmsimard from infra-root  https://review.opendev.org/72642923:25
openstackgerritJames E. Blair proposed opendev/system-config master: Change the zuul user id when running the base playbook  https://review.opendev.org/72649023:26
corvusthere's option b ^23:26
mordredcorvus: yay!23:28
mordredcorvus: we don't need to do any business with zuul_console after reset_connection, do we?23:29
corvusmordred: unclear... it's still going to be running on the old uid.... i'm curious if we can just leave it alone23:29
mordredcorvus: we'll see23:32
clarkbI expect the old uid for that is fine unless we have to restart it for some reason (and none of our jobs do a restart of it currently)23:43
*** mlavalle has quit IRC23:50
