Monday, 2020-05-11

openstackgerritIan Wienand proposed zuul/zuul-jobs master: [wip] fix venv  https://review.opendev.org/72670400:24
openstackgerritIan Wienand proposed zuul/zuul-jobs master: Revert "Revert "ensure-tox: use venv to install""  https://review.opendev.org/72670500:24
openstackgerritIan Wienand proposed zuul/zuul-jobs master: [wip] fix venv  https://review.opendev.org/72670400:29
openstackgerritIan Wienand proposed zuul/zuul-jobs master: Revert "Revert "ensure-tox: use venv to install""  https://review.opendev.org/72670500:29
openstackgerritIan Wienand proposed zuul/zuul-jobs master: ensure-pip: always check for python3-venv on Debuntu  https://review.opendev.org/72670400:44
openstackgerritIan Wienand proposed zuul/zuul-jobs master: Revert "Revert "ensure-tox: use venv to install""  https://review.opendev.org/72670500:44
*** DSpider has quit IRC00:46
openstackgerritIan Wienand proposed zuul/zuul-jobs master: [wip] venv probe full path  https://review.opendev.org/72671501:00
openstackgerritIan Wienand proposed zuul/zuul-jobs master: ensure-pip: always check for python3-venv on Debuntu  https://review.opendev.org/72670401:10
openstackgerritIan Wienand proposed zuul/zuul-jobs master: Revert "Revert "ensure-tox: use venv to install""  https://review.opendev.org/72670501:10
openstackgerritIan Wienand proposed zuul/zuul-jobs master: [wip] venv probe full path  https://review.opendev.org/72671501:10
openstackgerritIan Wienand proposed zuul/zuul-jobs master: ensure-pip: use full python3 path  https://review.opendev.org/72671503:17
openstackgerritIan Wienand proposed zuul/zuul-jobs master: ensure-pip: use full python3 path  https://review.opendev.org/72671503:40
*** ykarel|away is now known as ykarel05:07
openstackgerritIan Wienand proposed openstack/project-config master: Set python-path for SUSE builds  https://review.opendev.org/72672805:35
*** ysandeep|away is now known as ysandeep05:42
*** dpawlik has joined #opendev06:10
*** dpawlik has quit IRC06:13
*** dpawlik has joined #opendev06:18
*** lpetrut has joined #opendev06:41
*** DSpider has joined #opendev07:08
*** tosky has joined #opendev07:34
*** rpittau|afk is now known as rpittau07:36
*** hashar has joined #opendev07:38
*** ralonsoh has joined #opendev07:40
*** avass has joined #opendev07:46
*** dtantsur|afk is now known as dtantsur08:04
*** panda|pto is now known as panda08:04
*** roman_g has joined #opendev08:11
*** shubjero has quit IRC08:15
*** DSpider has quit IRC08:17
*** DSpider has joined #opendev08:18
openstackgerritXinliang Liu proposed openstack/diskimage-builder master: Fix DIB_UBUNTU_KERNEL issue on arm64  https://review.opendev.org/72674508:19
*** ysandeep is now known as ysandeep|lunch08:27
*** ykarel is now known as ykarel|lunch08:27
*** ykarel|lunch is now known as ykarel09:00
*** kevinz has quit IRC09:07
*** kevinz has joined #opendev09:07
*** ysandeep|lunch is now known as ysandeep09:11
*** sshnaidm|off is now known as sshnaidm09:15
*** ykarel is now known as ykarel|mtg09:18
*** priteau has joined #opendev09:44
*** ykarel|mtg is now known as ykarel10:05
*** rpittau is now known as rpittau|bbl10:15
*** ysandeep is now known as ysandeep|brb11:21
*** iurygregory has quit IRC11:37
*** ysandeep|brb is now known as ysandeep11:40
*** rpittau|bbl is now known as rpittau11:58
*** iurygregory has joined #opendev11:58
*** priteau has quit IRC12:16
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Remove requiring tox_envlist  https://review.opendev.org/72682912:19
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Deprecate default tox_envlist: venv  https://review.opendev.org/72683012:20
*** tkajinam has quit IRC12:31
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Remove requiring tox_envlist  https://review.opendev.org/72682912:35
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Deprecate default tox_envlist: venv  https://review.opendev.org/72683012:36
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Do not fail if stestr or testr is not found  https://review.opendev.org/72683612:46
*** iurygregory has quit IRC13:01
*** iurygregory has joined #opendev13:02
*** ykarel is now known as ykarel|afk13:05
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Deprecate default tox_envlist: venv  https://review.opendev.org/72683013:10
smcginnisI'm seeing job launch failures and problems accessing apt updates.13:17
openstackgerritMerged zuul/zuul-jobs master: ensure-pip: always check for python3-venv on Debuntu  https://review.opendev.org/72670413:20
fricklersmcginnis: do you have an example?13:20
smcginnisfrickler: If you filter by "release" in the status view, there are two examples in check right now.13:21
smcginnisLooks like jobs may be launching OK now though.13:21
fricklerinfra-root: mirror.gra1.ovh.openstack.org seems unresponsive, if s/o has time to take a closer look13:24
corvusa server list says i need to authenticate; i wonder if the account is disabled?13:30
*** owalsh has quit IRC13:30
*** owalsh has joined #opendev13:31
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Add explanatory comment to tox siblings  https://review.opendev.org/72684513:32
fungicorvus: same for bhs1 region13:36
fungilooks like mirror01.bhs1.ovh.openstack.org may also be down13:36
corvusi've prepared a nodepool change; but since it looks like the entire account may be down, that may be unnecessary right now13:37
*** dpawlik has quit IRC13:38
fungicorvus: well, our openstackci account seems disabled but not our openstackjenkins account13:39
fungiso nodepool is booting nodes in the working one but the mirror is offline in the disabled one13:40
fungiso i think your prepared nodepool change is needed13:40
corvusfungi: oh, then i'll push up the change13:40
openstackgerritJames E. Blair proposed openstack/project-config master: Temporarily disable OVH  https://review.opendev.org/72684813:40
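For anyone skimming later: disabling a provider in nodepool is done by zeroing max-servers in the launcher config, so the change above boils down to something like the following sketch (provider and pool names here are illustrative, not copied from the review):

    providers:
      - name: ovh-bhs1
        pools:
          - name: main
            max-servers: 0    # stop launching new nodes in this region; existing nodes drain off naturally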
*** dpawlik has joined #opendev13:42
fungiwe seem to have lost contact with our mirror servers between 12:55 and 13:00 utc13:43
fungii've reached out to amorin in #openstack-infra13:43
fungiif we don't hear back, we can try e-mail13:43
fungialso pinged rledisez13:44
fungisince they were previously looking into the intermittent keystone errors13:44
*** diablo_rojo has joined #opendev13:56
*** ysandeep is now known as ysandeep|afk14:02
fungiper #openstack-infra, rledisez is passing it along to colleagues and i've privately supplied them with the affected project id14:06
*** dtantsur is now known as dtantsur|brb14:17
*** ykarel|afk is now known as ykarel14:19
*** lpetrut has quit IRC14:26
*** jhesketh has quit IRC14:35
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: WIP: Remove requiring tox_envlist  https://review.opendev.org/72682914:43
openstackgerritMerged openstack/project-config master: Temporarily disable OVH  https://review.opendev.org/72684814:44
fungiinfra-root: per amorin in #openstack-infra we've been using the wrong flavor for our mirror servers in ovh (hg-7), if we use ssd-osFoundation-3 there instead we won't consume credits14:45
fungialso i deleted a lingering calebb-mirror-update-test instance there, which was almost certainly not in use14:46
mordredfungi: aha14:46
fungi#status log deleted calebb-mirror-update-test server instance in ovh bhs1 region14:46
openstackstatusfungi: finished logging14:46
mordredfungi: so mirror rebuilds are a good idea then14:46
openstackgerritMerged zuul/zuul-jobs master: Add explanatory comment to tox siblings  https://review.opendev.org/72684514:46
fungiyeah, i'm also trying to work out which of the two mirror instances in gra1 is in use, if it's the newer opendev.org one then i'll go ahead and clean up the openstack.org one14:47
clarkbfungi: the way to check is via our zuul site vars14:47
clarkbas that has the opendev.org overrides14:48
fungiyeah, or cacti graphs14:49
fungiwhich indicate it's the old openstack.org instance in use there, the new opendev.org one is not being utilized14:50
fungiclarkb: where do we set those site vars?14:55
clarkbfungi: I always have a hard time finding them. Let me see14:56
fungii can't seem to find them in base-jobs, either project-config repo, or system-config14:56
*** sshnaidm is now known as sshnaidm|afk14:57
clarkbfungi: https://opendev.org/openstack/project-config/src/branch/master/zuul/site-variables.yaml#L614:57
fungiahh, this is why my git grep was failing14:57
fungithe fact that it's *not* listed there is what makes it use the openstack.org version14:58
clarkbyes14:58
fungiclearly i should have just grepped for any files not including gra1 ;)14:58
fungithanks!14:58
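The mechanism fungi describes is just a per-region lookup in that file: a region listed there gets the opendev.org mirror name, and one that is absent keeps the legacy openstack.org name. A purely illustrative sketch of the idea (variable and region names are made up, not the real contents of site-variables.yaml):

    # regions whose mirror has moved to the opendev.org host
    opendev_mirror_regions:
      - some-cloud-region-1    # listed => jobs use mirror01.<region>.<cloud>.opendev.org
    # any region not listed falls back to mirror.<region>.<cloud>.openstack.org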
clarkbhowever if we are rebuilding anyway maybe we should switch to opendev.org in that cloud now to get ssl14:58
clarkbit's a bit more work to set up the acme forwarding but should be done at some point anyway14:59
fungiso, "conveniently" we have ovh offline at the moment anyway. should we take this opportunity to just blow away the mirrors and rebuild them?14:59
*** dtantsur|brb is now known as dtantsur14:59
*** mlavalle has joined #opendev14:59
clarkbyou indicated we needed to use different flavors right?14:59
fungiyeah, even the not-yet-used opendev.org mirror in gra1 is using the "wrong" (billable) flavor15:00
clarkbnew flavors requires new instances so ya I think we should try that if possible15:00
clarkbnote we'll need volumes which the old servers dont use15:00
clarkbbut Im pretty sure ovh allows for volumes?15:00
fungiso shall i just delete the current instances, boot replacement instances and then submit changes to plumb them in the config?15:00
fungiand yes, we're already using cinder volumes15:01
clarkbah cool and ya I think that is the proper way forward15:01
fungier, at least the new opendev.org mirror server in gra1 has two cinder volumes anyway15:02
fungipresumably bhs1 supports cinder too though15:02
fungido i need to submit changes to remove the old servers from our inventory first, or is it fine having them disappear out from under ansible until the change to update inventory entries gets merged?15:03
clarkbAnsible should timeout ssh but sometimes that works less well than expected15:05
clarkbit's probably ok, but small chance it is unhappy15:06
fungi#status notice Our CI mirrors in OVH BHS1 and GRA1 regions were offline between 12:55 and 14:35 UTC, any failures there due to unreachable mirrors can safely be rechecked15:06
openstackstatusfungi: sending notice15:06
mordredyou can disable them in emergency while waiting on removing them from inventory15:06
-openstackstatus- NOTICE: Our CI mirrors in OVH BHS1 and GRA1 regions were offline between 12:55 and 14:35 UTC, any failures there due to unreachable mirrors can safely be rechecked15:07
fungimordred: oh, there's a good idea15:07
fungiwill do that now15:07
clarkb++15:07
openstackstatusfungi: finished sending notice15:10
fungi#status log all ovh mirror servers placed in emergency disable list in preparation for replacement15:10
openstackstatusfungi: finished logging15:10
mordreddo we want to boot the new mirrors on focal? if so - I can work on getting the focal image uploaded everywhere15:11
mordred(although maybe I should do that anyway)15:11
*** ykarel is now known as ykarel|away15:12
clarkbmordred: I dont think so as usually we test afs when doing that15:14
*** hashar has quit IRC15:14
clarkband right now we just need replacement working servers15:15
mordred++15:16
mordredclarkb: well - I'm going to get focal uploaded to all the places anyway just so it's there when we want it15:16
mordred(since I did it manually for rax-dfw - might as well )15:16
clarkbmordred: I think what we've done in the past is bring up a new server next to the old one, do basic sanity checking, then if that looks ok we can switch between old and new afs mirrors with dns cname15:18
clarkbthen if we notice the new one is flaky we can quickly switch back to the old one15:18
fungii'm waiting for traffic levels on the current mirrors to die off before i delete them as we still have some nodes in those regions15:19
fungibut i'll delete the unused mirror01.gra1.ovh.opendev.org and its cinder volumes now15:19
fungi#status log deleted unused mirror01.gra1.ovh.opendev.org server instance and associated main01 and tmpbuild cinder volumes15:22
openstackstatusfungi: finished logging15:22
openstackgerritAndreas Jaeger proposed zuul/zuul-jobs master: Add new non-npm specific javascript jobs  https://review.opendev.org/72654715:22
openstackgerritMerged zuul/zuul-jobs master: Revert "Revert "ensure-tox: use venv to install""  https://review.opendev.org/72670515:29
openstackgerritMerged zuul/zuul-jobs master: ensure-pip: use full python3 path  https://review.opendev.org/72671515:29
openstackgerritMonty Taylor proposed opendev/system-config master: Upload focal images to all control plane clouds  https://review.opendev.org/72688615:34
mordredclarkb: ^^15:34
*** ysandeep|afk is now known as ysandeep|away15:36
mordredclarkb: although - come to think of it - why don't I build a focal arm image on nb03 instead of downloading from canonical15:36
clarkbzuul memory use looks really good (granted it was the weekend): http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=64792&rra_id=all15:47
clarkbmordred: I think I saw out of the corner of my eye friday that the cloud launcher job was failing15:47
clarkbmordred: might want to sort that out before adding more content to that job15:48
clarkbmordred: couple of things on that change though15:50
mordredclarkb: cool and cool15:51
tobiashclarkb: does that host the scheduler and web or just the scheduler?15:52
clarkbtobiash: web + scheduler15:53
tobiashk,thx15:53
clarkbtobiash: with both running under python3.8 with no jemalloc since friday15:53
tobiashcool15:53
tobiashI also tried out zuul-web with py38 with no jemalloc and that looks much better15:54
tobiashwe'll switch to py38 with the scheduler soon15:54
openstackgerritAndreas Jaeger proposed zuul/zuul-jobs master: Add new non-npm specific javascript jobs  https://review.opendev.org/72654716:01
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Add new non-npm specific javascript jobs  https://review.opendev.org/72654716:02
fungiwe've left jemalloc on our executors, but i think we're also running under 3.5 there still16:02
clarkbfungi: correct and its an older version of jemalloc which doesn't seem to have this problem16:03
clarkbmy hunch is that its the jemalloc version more than the version of python that hurts us on the newer systems16:03
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Add new non-npm specific javascript jobs  https://review.opendev.org/72654716:03
fungiclarkb: i wonder if maybe it's trying to avoid a performance hit from collecting too often16:04
clarkbmaybe? you'd think it would balance that against available system memory? but that could explain why sigusr2 seems to force it to go do work16:06
*** rpittau is now known as rpittau|afk16:09
mordredfungi, clarkb: yeah - I think as we update the executors to be on focal we'll strip out jemalloc from the equation16:13
clarkb#status log Restarted ptgbot on eavesdrop.openstack.org as it had netsplit into some alternate reality16:14
openstackstatusclarkb: finished logging16:14
openstackgerritJames E. Blair proposed opendev/system-config master: Change the zuul user id when running the base playbook  https://review.opendev.org/72649016:17
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Add new non-npm specific javascript jobs  https://review.opendev.org/72654716:18
*** dtantsur is now known as dtantsur|afk16:19
openstackgerritMerged zuul/zuul-jobs master: cabal-test: add build target job variable  https://review.opendev.org/72626616:20
openstackgerritMerged zuul/zuul-jobs master: haskell-stack-test: add build target job variable  https://review.opendev.org/72626716:20
*** slittle1 has quit IRC16:20
openstackgerritJames E. Blair proposed opendev/system-config master: Use inventory host lookup for iptables  https://review.opendev.org/72647216:24
openstackgerritJames E. Blair proposed opendev/system-config master: Add iptables_extra_allowed_groups  https://review.opendev.org/72647516:24
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: tox: allow tox to be upgraded  https://review.opendev.org/69005716:25
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Add new non-npm specific javascript jobs  https://review.opendev.org/72654716:30
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Extract ensure-javascript-build-tool role  https://review.opendev.org/72690016:30
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: tox: allow tox to be upgraded  https://review.opendev.org/69005716:32
*** mlavalle has quit IRC16:44
*** mlavalle has joined #opendev16:46
openstackgerritJames E. Blair proposed opendev/system-config master: DNM: fail zuul tests  https://review.opendev.org/72624816:57
openstackgerritSorin Sbarnea (zbr) proposed zuul/zuul-jobs master: tox: allow tox to be upgraded  https://review.opendev.org/69005716:58
fungiokay, so nl04.openstack.org still doesn't seem to have 726848 applied16:58
fungiand it's not in the emergency disable list16:59
corvusthe deployment failed: https://zuul.opendev.org/t/openstack/build/71eedea05c2b41c9a9b1bdb803de4c2517:00
fungiyep, that's where i looked first17:00
fungiwill check /var/log/ansible/service-nodepool.yaml.log on bridge.o.o17:00
fungiTASK [nodepool-base : Get zk config] fatal: [nb04.opendev.org]: FAILED! ... No network information facts found17:01
fungithat's nice and cryptic... do we need to gather-facts in that playbook?17:02
corvusthat sounds like the thing mordred added already17:02
clarkbya I think that required us to run against the zk hosts before the nl hosts?17:03
mordredyeah - so - in the library task there we're looking at ansible_default_ipv417:03
mordredI'm thinking - perhaps we should update to what corvus is doing in the iptables stuff above17:03
mordredand just use the inventory ip addresses instead of the ansible detected ones17:03
clarkbmordred: that avoids NAT problems too17:04
mordredso - you know, ansible_host instead of ansible_default_ipv417:04
corvusyeah that sounds good17:04
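The substitution being discussed amounts to swapping a gathered fact for an inventory variable; a rough sketch of the pattern (not the actual patch, and the 'zookeeper' group name is an assumption):

    # before: needs fact gathering to have run on the zk hosts first
    zk_hosts: "{{ groups['zookeeper'] | map('extract', hostvars, ['ansible_default_ipv4', 'address']) | list }}"
    # after: reads the address straight from the inventory, no facts required
    zk_hosts: "{{ groups['zookeeper'] | map('extract', hostvars, 'ansible_host') | list }}"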
fungiso if i reenqueued that change in deploy it would probably work now? or you mean the earlier fix wasn't sufficient?17:04
corvusfungi: earlier insufficient17:04
fungithanks17:04
mordredwe might even be able to get rid of that python17:04
corvusfungi: (i think it relied on putting certain things in all the necessary playbooks)17:04
openstackgerritMonty Taylor proposed opendev/system-config master: Use ansible_host instead of ansible_default_ip* for zk  https://review.opendev.org/72690717:05
mordredcorvus: I think a followup might be able to replace that ^^ with some jinja17:06
corvusmordred: is there a reason we're using ip addrs there and not hostnames?17:07
mordredyes - the gate doesn't have hostnames17:07
mordredwe could use hostnames if we added gate hosts to /etc/hosts17:08
clarkbcorvus: wasn't there also that zk bug where explicitly listing IPs was a workaround?17:08
clarkb(I don't recall if that was related or just fixing things that occurred at similar times)17:08
corvusmordred: k; we should add a comment; cause i'm going to keep asking that :)17:08
openstackgerritMonty Taylor proposed opendev/system-config master: Use ansible_host instead of ansible_default_ip* for zk  https://review.opendev.org/72690717:09
mordredcorvus: done!17:09
corvusre +317:09
mordredcorvus: we could alternately use the multinode base job for these which would put things into /etc/hosts for us17:09
corvusmordred: yeah clarkb has suggested that; but it does lots of other stuff too and i'm not sure i'm comfortable with it as a production simulation17:10
mordrednod17:11
openstackgerritSorin Sbarnea (zbr) proposed zuul/zuul-jobs master: yamlint: EOF newlines and comments indent  https://review.opendev.org/72551617:12
corvuswe could just run multi-node-hosts-file i guess17:12
corvusoh, the 'bridge' role only runs on 'switch' and 'peers' hosts, so it won't run...17:12
corvusthat just leaves multi-node-firewall as a wildcard17:12
*** sshnaidm|afk is now known as sshnaidm17:12
mordredcorvus: and I don't think we want that - I think multi-node-hosts-file role in isolation might be a decent idea - because in production we do expect all of our hosts to dns resolve17:13
mordredso it seems like a reasonable way to simulate prod - unless we have any code that avoids /etc/hosts anywhere :)17:13
corvusyeah, i think i'd be comfortable starting down that road17:14
mordredcorvus: is it possible to dual-stack a hosts file?17:14
fungiyou can add entries for different addresses with the same name, but i don't know that they'll all be returned17:15
corvusmordred: it's ip -> name, so i don't see why not17:15
clarkbI think its all smart enough to give you the correct ip based on protocol17:16
mordredcool17:16
corvusmordred: stackoverflow says you just enter them twice, one for each protocol :)17:16
clarkb(and ipv6 should be default)17:16
fungithat's convenient17:16
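For reference, a dual-stack hosts file really is just one entry per address family for the same name, e.g. (documentation addresses, not real ones):

    203.0.113.10     zk01.opendev.org
    2001:db8::10     zk01.opendev.org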
openstackgerritMerged openstack/project-config master: Finish retiring syntribos repos  https://review.opendev.org/72650617:19
openstackgerritMonty Taylor proposed opendev/system-config master: Run multi-node-hosts-file in run-base-pre  https://review.opendev.org/72691017:20
openstackgerritSorin Sbarnea (zbr) proposed zuul/zuul-jobs master: tox: allow tox to be upgraded  https://review.opendev.org/69005717:20
mordredlet's see if that passes tests17:20
*** ralonsoh has quit IRC18:11
*** iurygregory has quit IRC18:25
*** roman_g has quit IRC18:32
*** roman_g has joined #opendev18:33
*** iurygregory has joined #opendev18:38
openstackgerritSorin Sbarnea (zbr) proposed zuul/zuul-jobs master: tox: allow tox to be upgraded  https://review.opendev.org/69005718:59
*** slittle1 has joined #opendev19:17
fungimordred: it won't, because it's stacked on 726907 which is broken on a trivial missing comma. i would have pushed an edit but didn't want to disturb the rest of your stack19:18
clarkbI'm popping out now for a bike ride and lunch. back in a bit19:23
fungimordred: would you like me to update that stack with the fix for 726907? i'm looking forward to rerunning infra-prod-service-nodepool so we can disable ovh without mucking with the emergency list and hand-edits of the config19:24
fungii'm holding off a semi-urgent rebuild of the mirror servers there until that happens19:25
openstackgerritAndreas Jaeger proposed zuul/zuul-jobs master: Add new non-npm specific javascript jobs  https://review.opendev.org/72654719:29
*** diablo_rojo has quit IRC19:30
openstackgerritAndreas Jaeger proposed zuul/zuul-jobs master: Extract ensure-javascript-build-tool role  https://review.opendev.org/72690019:33
openstackgerritAndreas Jaeger proposed zuul/zuul-jobs master: Extract ensure-javascript-build-tool role  https://review.opendev.org/72690019:35
*** factor has joined #opendev19:36
fungimordred: i'm assuming you're okay with it, so i'm pushing a revised stack and reapproving 72690719:37
openstackgerritJeremy Stanley proposed opendev/system-config master: Use ansible_host instead of ansible_default_ip* for zk  https://review.opendev.org/72690719:37
openstackgerritJeremy Stanley proposed opendev/system-config master: Run multi-node-hosts-file in run-base-pre  https://review.opendev.org/72691019:37
openstackgerritAndreas Jaeger proposed zuul/zuul-jobs master: Extract ensure-javascript-build-tool role  https://review.opendev.org/72690019:56
openstackgerritAndreas Jaeger proposed zuul/zuul-jobs master: Extract ensure-javascript-build-tool role  https://review.opendev.org/72690020:05
*** avass has quit IRC20:08
*** dpawlik has quit IRC20:46
openstackgerritMerged opendev/system-config master: Use ansible_host instead of ansible_default_ip* for zk  https://review.opendev.org/72690720:55
fungiyay!20:56
funginow i guess i can reenqueue 726848,1 into deploy20:59
fungi#status log reenqueued 726848,1 for openstack/project-config into deploy pipeline after fix 726907 merged21:03
openstackstatusfungi: finished logging21:03
corvusfungi: i noticed the deploy of 726907 failed; should we look into that?21:14
corvus    "cmd": "/usr/bin/git clone --origin origin file:///home/zuul/src/opendev.org/opendev/ansible-role-puppet /etc/ansible/roles/puppet",21:15
corvusfatal: destination path '/etc/ansible/roles/puppet' already exists and is not an empty directory.21:15
corvusthat's on nb0321:16
fungihrm, also the reenqueued 726848,1 in deploy failed21:16
corvusmordred: ^21:19
clarkbthat was changed semi recently I think21:19
corvusi'm thinking this is a flaw in install-ansible-roles21:19
mordredyeah. I think it is21:19
clarkbwe were installing via galaxy using ansible tooling but then switched to just git cloning it so that we can pull the latest changes21:20
mordredmaybe we need a force on that git:21:20
clarkbthe issue is the galaxy installs do a copy of the working dir and not a clone21:21
corvusit seems to have one21:21
corvusoh21:21
corvusso we just need a one-time upgrade to add a .git dir?21:21
corvuspossibly by just blowing away the directory? :)21:21
clarkbcorvus: ya or just move the whole thing aside and have ansible replace it the way it wants?21:21
corvusthat sounds reasonable; are all the /etc/ansible/roles dir's handled this way now?21:22
clarkbmordred: ^21:22
mordredyah21:22
mordredor - yeah - we actually should have only had 2 - puppet and cloud-launcher, right?21:22
clarkbbazelisk-build  cloud-launcher  exim  kerberos-client  openafs-client  puppet  puppet-install  set-hostname21:22
clarkbthats the list21:23
mordredoh - so - they come from 2 sources21:23
mordred2 are git repos21:23
mordredthe rest are in the roles/ dir in the root of system-config21:23
mordredthose get copy'd - so should be fine21:23
corvusso i can just mv * out of the way?21:24
mordredand I think we decided to use git: for puppet and cloud-launcher so that they'd use symlinks and not do a copy of the .git dir21:24
mordredcorvus: yeah - everything should happily get re-created21:24
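For context, the roles in question are installed with something roughly like the git task below (a sketch, not the exact system-config task); the git module's force only discards local changes in an existing clone and will not replace a plain directory that is not a repository, which is why the old galaxy-copied contents had to be moved aside first:

    - name: Install ansible-role-puppet from the local checkout
      git:
        repo: file:///home/zuul/src/opendev.org/opendev/ansible-role-puppet
        dest: /etc/ansible/roles/puppet
        force: yes    # resets an existing clone, but cannot overwrite a non-repo directory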
corvus#status log moved contents of /etc/ansible/roles on bridge to /etc/ansible/roles/old-2020-05-11 to allow ansible to recreate as git repos21:25
openstackstatuscorvus: finished logging21:25
corvusfungi: want to re-enqueue that change now?21:25
fungisure, can do21:28
fungidoes it matter which one we reenqueue?21:28
fungias long as it runs that job? (is it idempotent?)21:28
fungilike, should i reenqueue the one which merged most recently?21:29
clarkbfungi: it will use the change state21:29
clarkbfungi: so if you enqueue the child it will run with both, if you enqueue the parent only the first set of changes will apply21:29
fungiokay, so best to reenqueue 72690721:29
clarkb(I think its fine to enqueue either just don't do both out of order)21:29
fungithough 726907,3 was for system-config21:30
fungiit runs the same job21:30
fungithe earlier 726848,1 was for project-config21:30
clarkbah21:30
fungiso presumably the job will use the latest of the other repo regardless21:30
clarkbyes I believe so21:31
fungithey just happen to both run infra-prod-service-nodepool in deploy21:31
fungi#status log reenqueued 726907,3 for opendev/system-config into deploy pipeline after old .git dir was moved out of the way21:32
openstackstatusfungi: finished logging21:32
corvuscontinuing our thought about the zuul user... last week we said we would re-id the zuul user to 10001 on all system-config-run jobs so that the zuul nodes in the gate will be able to match the zuul nodes in prod; however, we do the same thing with the zookeeper user on our zk nodes:21:35
corvushttps://fa3204066787dd37fd86-ea893277118f144d3b928cfbb4823c04.ssl.cf1.rackcdn.com/726248/4/check/system-config-run-zuul/9df7763/bridge.openstack.org/ara-report/result/3c3e5a36-69cf-4813-aec8-d1794c2f41c0/21:35
corvusand presumably the same for the nodepool user...21:35
corvusso i don't think that approach is going to work21:35
mordredcorvus: oh! yeah - that is ... I didn't think about that21:36
clarkbwe use the same uid on all our images?21:36
corvusclarkb: yep21:37
mordredyeah21:37
clarkbI guess we could move zuul only on the zuul servers then21:37
mordredcorvus: dumb thought - perhaps what we want to do in our deployment is make a single user on all of the nodes with 10001 that is the user we use to run these services21:37
mordredinstead of a zuul, a zk and a nodepool user depending on host21:38
*** yuri has joined #opendev21:38
fungianother hacky workaround i see mentioned in places is to pass the uid/gid in when initializing the container process and chown stuff in the tree at that point21:38
corvusmordred: not a bad idea....  but thinking along a different tack -- how important is it our user be in /etc/passwd?  maybe we could define our own standardized series of users in opendev and ignore what's in the containers?21:39
mordredcorvus: yeah - and pass in a uid21:39
corvusfungi: yeah that's sort of the direction i'm thinking, except we don't need/want to chown anything (the user shouldn't be writing to anything in the image)21:39
mordredI don't think it's important for the user to be in /etc/passwd21:40
fungithat works21:40
clarkbya our images rely on externally mounted configs and logging and all that21:40
corvusi can't remember how important we thought it was that there be a zuul user in there...21:40
fungiand yeah, as long as any users/groups are referred to strictly by their uid/gid integer values, it's fine21:40
mordredas long as the dirs we mount that need to be written to match the user we tell docker to run the container as21:41
corvusthe potential pitfalls i can think of off the top of my head are the fingergw privilege drop and bwrap21:41
mordredall should be fine21:41
mordredcorvus: well - we aren't doing executors in docker yet - so that's fine :)21:42
mordredfor fingergw - we could tell fingergw to run on a different port and use the docker port expose thing to map it to the correct one and not do the priv drop in fingergw21:42
mordredbut that's maybe too complex21:42
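What mordred is sketching would look roughly like this in a compose file (the image name and ports are assumptions on my part; finger's standard port 79 is the privileged one):

    services:
      fingergw:
        image: zuul/zuul-fingergw
        user: "10001"
        ports:
          - "79:8079"   # host port 79 maps to an unprivileged port inside the container
    # zuul.conf would then point the fingergw listener at 8079 and skip the privilege drop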
fungianother workaround is to create the user(s)/group(s) at container initialization21:43
clarkbis the priv drop target user configurable too?21:43
corvusfungi: that's not possible without running the container as root21:44
fungigood point21:44
corvusfungi: (or else opening a vulnerability in the container; we used to do that in our images but we removed it)21:44
fungisame for if you needed to chown/chgrp i guess21:44
corvussounds like our best options are: a) create "containeruser" as 10001 and use that for all our containers; b) create zuul, nodepool, zk, etc, as 1000, 1001, 1002, etc and pass in --uid flags and ignore the lack of /etc/passwd entries.21:46
corvusi'm kinda leaning toward (a) just for simplicity21:46
mordredcorvus: yeah. I'm not sure having more than one user is really buying us much21:47
clarkbI think a) gets weird if we try to colocate more of these things and want to limit blast radiuses21:47
clarkbbut we aren't doing that yet and if we do do that we're probably resolving this problem anyway?21:47
mordredyeah - although a) does make running an AIO test node easier21:47
mordredand yeah21:47
fungii suppose the main risk with using a similar uid/gid across all containers is that if there's a non-root breakout vulnerability the process has access to the host side files for the processes of other containers21:47
fungithat seems fairly far down the list of things to worry about21:48
mordredyeah - because we don't _Actually_ run these that way21:48
clarkbfungi: can I take the gerrit reviewers plugin off the agenda? We concluded wait until gerrit 2.16 before adding new plugins right?21:48
fungi(also only if they're on the same server)21:48
mordredwe're pretty much service-per-machine21:48
fungiclarkb: oh, yes, though i do still owe the ml a follow-up thread on the plan21:49
* mordred has to run for the evening - see y'all tomorrows21:49
* fungi adds that to his to do list21:49
fungithanks mordred!!! have a great evening21:49
corvusi think that means on the executor we're going to need to run zuul as "container" as well21:57
openstackgerritJames E. Blair proposed opendev/system-config master: Run Zuul, Nodepool, and Zookeeper as the "container" user when in containers.  https://review.opendev.org/72695821:58
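Option (a) in compose terms is roughly the following, assuming a "container" account with uid 10001 exists on the host and owns the bind-mounted state directories (service name and paths are illustrative):

    services:
      scheduler:
        image: zuul/zuul-scheduler
        user: "10001:10001"          # the shared non-root service account
        volumes:
          - /etc/zuul:/etc/zuul
          - /var/lib/zuul:/var/lib/zuul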
fungiso the reenqueued 726907,3 also failed... the role 'puppet-install' was not found in /home/zuul/src/opendev.org/opendev/system-config/playbooks/roles:/etc/ansible/roles:/home/zuul/src/opendev.org/opendev/system-config/playbooks  The error appears to be in '/home/zuul/src/opendev.org/opendev/system-21:58
fungiconfig/playbooks/roles/run-puppet/tasks/main.yaml': line 5, column 11, ...21:59
fungiinclude_role:     name: puppet-install21:59
corvusfungi: puppet-install was in the old dir21:59
corvusfungi: how about i move it back for now21:59
clarkbcorvus: do we need the uids to match between hosts running zuul services?22:00
clarkbor is that just for consistency and simplicity?22:00
fungiwhat's normally supposed to put it there, i wonder22:00
corvusclarkb: no, but we use the same role to write out the same zuul.conf everywhere22:00
clarkbcorvus: gotcha22:00
clarkbfungi: that was one of the roles that was moved aside22:00
clarkbfungi: so I'm guessing we don't bootstrap those properly after galaxy was removed?22:00
fungiclarkb: i got that, but what installs it?22:00
fungiyeah, sounds like maybe nothing now22:01
corvusgrepping isn't turning up anything other than use of puppet-install22:01
fungiso this probably would have failed similarly on a fresh replacement server22:02
corvusoh there is a puppet-install role in the system-config repo22:02
clarkbya these are system-config/roles/ roles22:03
fungiand this was a change for system-config, even22:04
corvusoh and they're supposed to be copied into place so we can run ansible without a system-config repo?22:04
clarkband what we had on bridge were copies not symlinks22:04
clarkbcorvus: ya I think we expect proper copies22:04
clarkbthe config file that was pulling the other roles from galaxy doesn't list these roles22:04
clarkbI'm still not sure what mechanism was installing these before22:04
corvusi think the install-ansible role does it22:06
clarkbyup just found it22:07
corvuswhich we don't run in the service-nodepool playbook22:07
clarkb"Copy system-config roles into place" is the task name22:07
fungiokay, so we could probably just add install-ansible to service-nodepool?22:08
clarkbfungi: or wait for the hourly job (or enqueue the hourly job?)22:08
corvusi don't think we run that role in production -- only when we bootstrap bridge22:08
clarkboh22:08
clarkbhrm I would expect we'd keep those roles up to date :/22:08
corvusme too, maybe we dropped it from the cron script and didn't have a replacement22:09
clarkbagreed on not running it elsewhere22:09
clarkbseems we test it but don't have a prod side applying it22:09
corvusit used to be in bridge.yaml22:09
corvusbut hrm, even then, i think that was only run in zuul22:10
corvusyeah, i think this hasn't been done for a long time, if ever.22:11
*** sshnaidm is now known as sshnaidm|afk22:11
clarkbthis is me thinking out loud: we can copy those roles by hand now, but maybe also push up a change to run install-ansible in service-bridge.yaml then have mordred look that over tomorrow?22:11
corvusoh, i failed at grep; we did run service-bridge in run_all.sh22:13
*** yuri has quit IRC22:13
*** yuri has joined #opendev22:13
clarkbor maybe we extend install-ansible-roles to cover this set of roles too22:14
clarkb(that might make it easier to understand how this happens in the future)22:14
corvusclarkb: i agree with your 22:11 plan22:15
corvus(i also think moving this to install-ansible-roles might be a good idea, but non-trivial since it's a different mechanism)22:16
corvus(but at least it'd be in a role with the right name :)22:16
clarkbya the loop in install-ansible-roles is wrong for this set of roles22:16
clarkbfungi: ^ what do you think should we go ahead with idea at 22:11?22:16
fungiyes, seems reasonable. skimming, it doesn't seem destructive (and if we used to run it in run_all.sh then i think that confirms)22:17
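The change corvus pushes next amounts to running that role from the bridge service playbook, roughly like this (the real playbook carries more roles and variables):

    - hosts: "bridge.openstack.org:!disabled"
      roles:
        - install-ansible    # among other things, re-copies the system-config roles into /etc/ansible/roles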
openstackgerritJames E. Blair proposed opendev/system-config master: Add install-ansible to service-bridge  https://review.opendev.org/72696122:17
fungiand sorry, bit of a space cadet at the moment, just finishing up evening chores and cooking/eating dinner22:17
corvusmordred: https://review.opendev.org/72696122:18
corvusi see all the roles in the dir now22:18
corvusdid anyone move them into place, or did ansible just do that?22:19
fungii did nothing, so must be ansible ghosts22:19
corvusclarkb: ^?22:19
fungignomes?22:19
clarkbI did not22:19
fungii bet they're actually gremlins22:19
corvusi guess maybe something does run that role after all....?22:19
clarkbcorvus: must be22:19
corvusi'll just abandon that change22:19
fungithe hourly could have just run since we were looking22:19
corvusfungi: yeah, i just didn't think the hourly ran the role.  <shrug>22:20
clarkbinfra-prod-install-ansible <- did that job run?22:20
* clarkb is looking22:20
corvusfungi: anyway, want to re-run your re-enqueue?22:20
clarkbhttps://zuul.opendev.org/t/openstack/build/e0a401e507584688a5ce50d15bc1793b ya I think that was it22:21
fungiclarkb: last ran 2020-05-07T20:40:24 in deploy and took 4 mins 16 secs22:21
clarkband I shouldn't dismiss zuul things so quickly in my greps because zuul is running the show22:21
fungii thought we were looking after that though22:21
clarkbfungi: the one above ran just now22:21
fungiwow, yep22:21
fungi2020-05-11T21:00:1022:22
fungii should have refreshed22:22
fungiokay, 726907,3 reenqueued22:22
fungiclarkb: https://zuul.opendev.org/t/openstack/build/e0a401e507584688a5ce50d15bc1793b was over an hour ago... i'm guessing there was a newer one at 22:00 that just hasn't reported yet22:24
clarkboh I was off by an hour22:25
fungiit happens22:25
clarkbya I bet we haven't completed that buildset so no logs yet22:25
clarkbbut the job itself has run22:25
fungiright22:25
ianwis there a summary of what's going on, if i can help?22:26
fungiianw: the start was when we discovered builds in ovh were failing because our mirror servers there spontaneously went offline22:27
fungithat prompted pushing a change to turn max-servers in both ovh regions to 022:28
ianwright ok, that's the ones rebuilt with non-billable flavors now?22:28
fungiwell, not yet22:29
fungivery nice ovh folk turned everything back on, and let us know that we should use the non-billed flavors22:29
fungibut the max-servers never got applied on nl0422:29
fungii was waiting for utilization there to drop to 0 before starting mirror rebuilds22:29
fungithough i did delete the unused mirror instances and volumes in the meantime22:29
ianwahh ok; that seems like something i can help with if it's getting late for you22:30
fungiat this point we've just been iterating on figuring out why infra-prod-service-nodepool isn't working22:30
ianwalso on my todo was to add focal to the mirror testing22:30
clarkbI'm still around for a bit too but allergies have been really bad today (some of that was my own fault going on a bike ride not realizing the air was yellow)22:30
fungievening is rapidly bearing down on my slice of the globe22:30
fungiso i likely don't have much bandwidth to do mirror rebuilds22:31
fungibut this seems like a good opportunity to switch over to opendev.org mirrors with letsencrypt certs in ovh22:31
fungii think the non-billed flavors will likely need cinder volumes for apache/afs caches22:32
ianwok, i can look at that; so both bhs and gra?22:34
clarkbianw: oui22:34
fungiianw: thanks! i should be able to take over when i wake up, if there's still more to do22:35
fungii'll probably also be around for a few more hours to review config changes22:35
fungijust dealing with the usual waning brain function which accompanies sunset22:36
fungiinfra-prod-service-nodepool SUCCESS!22:36
fungimax-servers: 022:37
funginow we're cooking with gas22:37
fungihopefully utilization there will begin to trail off22:37
ianwahhh ok so we needed that to work to update the config :)22:37
fungiyes ;)22:37
*** DSpider has quit IRC22:38
fungithanks mordred, corvus, clarkb for working through that!22:38
ianwok i just need to transition to school supervisor mode ... (i.e. take laptop to dining room table and ensure ipads are used for good and not evil)22:39
clarkbianw: that sounds very familiar22:39
fungiwait, they're not being taught evil?22:39
clarkbfungi: kids are born with an innate ability to do evil22:39
fungioh, good point, no need to teach them more of that22:40
fungithey're already experts22:40
ianwi wonder if i can get in one run of the mirrors on focal to see if it's an option ...22:46
fungino objection here22:46
fungiwe debated it, but didn't want to venture there without tapping your current knowledge of the topic22:47
fungiparticularly regarding openafsclient et cetera22:48
ianwi think we've got everything ... openafs should be up and i think my change to test in base merged22:48
ianwactually no, but we probably should - https://review.opendev.org/#/c/725676/22:48
fungii'll take a look now22:48
ianwthat just keeps the basic setup clean22:49
fungilooks like the job still works, 41 tasks changed on the focal node22:50
openstackgerritIan Wienand proposed opendev/system-config master: Add focal testing for mirror nodes  https://review.opendev.org/72697022:53
*** tkajinam has joined #opendev22:55
fungiianw: i think you got cut-n-paste happy there and missed the critical bit, see inline comment22:56
ianwhaha yes was just wondering why it was pulling from bionic repos!22:57
clarkboh heh22:57
clarkbmy brain is already non functioning22:57
*** tosky has quit IRC22:57
fungii'm clearly not far enough into this beer yet22:57
openstackgerritIan Wienand proposed opendev/system-config master: Add focal testing for mirror nodes  https://review.opendev.org/72697022:58
ianwistr throwing an arm focal test at dib last week too, i think it failed22:59
clarkbianw: mordred has a change up to upload arm focal control nodes too fwiw23:00
clarkb(it also uploads non arm to all the other clouds too)23:00
ianwvia cloud launcher?23:01
clarkbya23:02
ianwok 726886 got it23:03
ianwanother one if you could take a poke at is https://review.opendev.org/#/c/726040/23:04
ianwit skips the kubectl stuff on arm64 so the arm64 base test works23:04
clarkbdone23:04
ianwonly because we have to keep the bridge roles minimally working on arm64, enough to just fire off ansible on the hosts we care about23:04
fungimakes sense23:05
openstackgerritIan Wienand proposed opendev/system-config master: Upload focal images to all control plane clouds  https://review.opendev.org/72688623:09
ianwclarkb: ^ as a test i could upload to OVH manually and bootstrap a test23:10
clarkbianw: ya I think its already manually uploaded to rax23:12
ianwovh has a Ubuntu 20.04 at any rate23:13
*** sshnaidm|afk has quit IRC23:14
*** sshnaidm has joined #opendev23:15
*** sshnaidm is now known as sshnaidm|afk23:16
openstackgerritMerged opendev/system-config master: Add focal to system-config base job  https://review.opendev.org/72567623:16
ianwso i'll try it with ssd-osFoundation-3 flavor23:16
ianwfungi: was there anything important to consider in creating volumes to attach as far as you know?23:18
clarkbianw: I think we need ~200GB of disk total for two different mount points. I think a single cinder volume is fine23:18
fungiyeah, basically giving apache and afs 200g combined to spread out in, without overrunning the rootfs23:19
clarkbapache wants 100gb iirc. we set the cache prune limit lower, but because it runs periodically we need headroom to grow into23:19
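For the rebuilds, attaching that space is a couple of CLI calls along these lines (sizes per the discussion above; server and volume names are illustrative):

    openstack --os-cloud=openstackci-ovh --os-region=GRA1 volume create --size 200 mirror02.gra1.ovh.opendev.org/main01
    openstack --os-cloud=openstackci-ovh --os-region=GRA1 server add volume mirror02.gra1.ovh.opendev.org mirror02.gra1.ovh.opendev.org/main01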
ianwfungi: umm, what if there is no ssd-osFoundation-3 flavor?23:22
clarkbI think they mentioned that and were going to fix it, I guess that didn't happen?23:23
clarkb14:41:00*          amorin | dont hesite to send me a mail, I can also enable the ssd-osFoundation-3 flavor on your other tenant (I remmeber you have 2 tenants)23:23
clarkbhrm I may have misread that as "we will add that flavor" but they weren't actually planning on it?23:23
ianwi'm not seeing it on openstackci or openstackzuul23:23
fungiinteresting, yeah i hadn't gotten far enough to check that yet23:24
clarkbssd-osFoundation-3 that is what we filter for in nodepool23:24
clarkbso it must be there in openstackzuul?23:24
fungimaybe you can catch amorin in the emea morning at least23:24
ianw# OS_CLIENT_CONFIG_FILE=/etc/openstack/all-clouds.yaml openstack --os-cloud=openstackzuul-ovh --os-region=GRA1 flavor list | grep osFoundation23:25
clarkbianw: note there isn't an all-clouds.yaml anymore we can just use the default now23:26
fungiyeah, no need to pass OS_CLIENT_CONFIG_FILE now23:28
fungialso the credentials for openstackjenkins-ovh seem to be incorrect in there23:31
fungiThe request you have made requires authentication. (HTTP 401) (Request-ID: req-1c302da4-7997-4810-9f46-49b041042b4f)23:31
clarkbhrm thats different than the http503 we had before (but similar error)23:32
clarkboh ya is it openstackzuul-ovh or jenkins?23:32
clarkbianw: ^ your grep might be hiding an error?23:32
ianwindeed, yeah it's jenkins23:32
fungiin the clouds.yaml on bridge it's jenkins23:32
fungibut either the credentials in there are wrong or that account is now disabled too23:33
ianwfungi: the openstackjenkins-ovh account just worked for me, and saw the osFoundation flavors23:33
ianwhowever, i'm not seeing it in openstackci-ovh23:33
fungihah, apparently the ~/launch-env/bin/openstack i have isn't authenticating for openstackjenkins-ovh but is authenticating for openstackci-ovh23:35
fungiif i use /usr/local/bin/openstack instead both work23:35
fungiand yeah, i also misread amorin's comment as indicating he had copied that flavor to the other project, but i hadn't yet gotten around to verifying that. oh well23:39
ianwi'll send a mail, cc infra-root23:40
clarkbianw: thanks23:40
fungiawesome, thanks a bunch!23:41
ianwTask: letsencrypt-create-certs : Populate service facts23:52
ianwMalformed output discovered from systemd list-unit-files: acpid.service                          disabled        enabled23:52
*** kevinz has quit IRC23:52
ianwthat i did not expect ...23:52
clarkbis that ansible complaining aobut list-unit-files output?23:52
ianwyeah, i wonder if ansible itself isn't focal happy23:52
ianwhttps://github.com/ansible/ansible/issues/6853623:55
ianwso that puts a bit of a twist in that plan too23:56
clarkbhas that made it into a bug fix for a release yet? seems like that would be important for ansible to do (hopefully we don't have to go through upgrading all of ansible just to do focal)23:58
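Per the linked issue, the parser trips over the extra VENDOR PRESET column that focal's systemd prints from list-unit-files; roughly what the module sees (illustrative output, matching the "acpid.service disabled enabled" line pasted above):

    $ systemctl list-unit-files --type=service | head -n 3
    UNIT FILE                 STATE      VENDOR PRESET
    accounts-daemon.service   enabled    enabled
    acpid.service             disabled   enabled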
