Saturday, 2020-05-23

corvusugh, we probably should have just frozen the whole system00:01
corvusi spent the day trying to get the letsencrypt playbook to run, cause we're still not at the point where my change could actually deploy00:01
corvusbut the project addition change jumped the gun on it :/00:01
corvus(presumably because it didn't have to run all the failing base playbooks)00:01
corvusmordred: i've got like another minute i can help; are things stable?00:06
mordredcorvus: I think things are stable yeah - the rename at least didn't seem to break anything further00:06
fungii think things were stable enough anyway, i just didn't realize tenant config updates were going to be blocked00:06
fungiso had started digging into it to figure out why00:07
corvuswell, they weren't blocked :/00:07
mordredcorvus: I think leaving it in place is fine - it should make the running of the service-zuul playbook work when the other parts do00:07
fungier, prevented from getting applied to the scheduler i mean00:07
corvusyeah, i mean, we basically may have possibly jumped to the end of the process00:07
corvusdepends on whether we whacked all the moles00:07
corvuswe'll need to check the periodic pipeline to see00:08
corvus(at this point, we should stop enqueuing old changes into deploy, since a much newer change has actually deployed)00:08
fungiunless someone's had a chance to run through the whole inventory, i expect we still probably have at least a few ipv4 addresses we'll need to accept ssh host keys for00:09
corvusfungi: what happened is that i expected the zuuld change to possibly break, but because of all the problems with the base and letsencrypt playbooks, it was never applied (that's what i've spent 2 days on).  but a project-config change applies the same zuul playbook without running the base or letsencrypt changes first, so that was when it was finally applied.  so it broke, as expected.  just not when expected.00:11
mordredfungi: I could run a quick ansible command on all and accept the rest of the host keys00:12
corvusso i think this is another aspect of the base playbook tree that we need to keep in mind as we consider how to re-work it00:12
fungii see. so we should have frozen any approval of project-config changes is what you were saying00:12
fungiat least any touching files which could set off the zuul deploy job anyway00:12
corvusfungi: yep; a characteristic of our new/current system is that there are multiple paths to get to the same deployment step.  that wasn't the case before.  project-config changes are a shortcut.00:13
fungimordred: like a no-op sort of command across the whole inventory? that ought to do the trick00:13
mordredfungi: yeah - I'm running -mshell -aecho right now00:14
fungiawesome, thanks00:14
mordredit was really pissed off about logstash-worker0100:15
corvusokay, i have to run.  i think the key thing to know is that we won't be sure the system is in equilibrium until we have a clean periodic run.  when that happens, we should be gtg.  until then, there could still be dragons.00:15
fungithanks corvus!00:16
fungii'll try to remember to check the periodic builds when i wake up tomorrow00:16
mordredfungi: ok. all done00:19
mordredfungi: I had to replace an old key for logstash-worker01 - but other than that all should be good and happy now00:19
fungithanks again!00:20
*** ysandeep|away is now known as ysandeep00:38
*** Meiyan has joined #opendev01:03
*** larainema has joined #opendev02:19
*** DSpider has joined #opendev02:21
*** stephenfin has quit IRC05:21
*** stephenfin has joined #opendev05:28
*** slaweq has joined #opendev06:20
*** sgw has quit IRC06:32
*** roman_g has quit IRC07:15
*** slaweq has quit IRC07:16
*** lpetrut has joined #opendev07:21
zbrmorning! any core around today or busy shopping for the bbq?07:21
*** lpetrut has quit IRC07:22
openstackgerritgugug proposed openstack/project-config master: Retire kolla-cli project - step 1 end project gating  https://review.opendev.org/73043107:31
openstackgerritgugug proposed openstack/project-config master: Retire kolla-cli project - step 3 remove from infra system  https://review.opendev.org/73043207:40
openstackgerritgugug proposed openstack/project-config master: Retire kolla-cli project - step 1 end project gating  https://review.opendev.org/73043107:40
*** Meiyan has quit IRC07:51
openstackgerritSorin Sbarnea (zbr) proposed opendev/elastic-recheck master: Bumped flake8  https://review.opendev.org/72932807:55
*** moppy has quit IRC08:01
*** moppy has joined #opendev08:01
*** slaweq has joined #opendev08:32
*** slaweq has quit IRC08:47
*** slaweq has joined #opendev08:49
zbrinfra-core: please merge ^08:54
openstackgerritSorin Sbarnea (zbr) proposed zuul/zuul-jobs master: Make gentoo jobs nv  https://review.opendev.org/72864009:17
*** slaweq has quit IRC10:12
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: fetch-subunit-output: Do not fail if find is not installed  https://review.opendev.org/73044810:23
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: fetch-subunit-output: Do not fail if find is not installed  https://review.opendev.org/73044810:36
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: fetch-subunit-output: Do not fail if find is not installed  https://review.opendev.org/73044810:38
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: WIP: add simple test runner  https://review.opendev.org/72868410:44
*** tosky has joined #opendev11:00
*** slaweq has joined #opendev11:02
*** yuri has quit IRC11:11
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: zuul-test: Add more labels  https://review.opendev.org/73044911:11
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: zuul-test: Add more labels  https://review.opendev.org/73044911:14
*** hrw has quit IRC13:07
*** Eighth_Doctor is now known as Eleventh_Doctor13:11
*** hrw has joined #opendev13:19
mordredzbr: looks good - I left a +2 with a comment that could either be done on that or as a followup13:29
fungimordred: looks like all our periodic deploy jobs ran cleanly: https://zuul.opendev.org/t/openstack/builds?pipeline=periodic&project=opendev%2Fsystem-config13:40
fungiso i think that means the ssh host key update took care of those13:40
mordredfungi: \o/13:40
fungi*however* we don't seem to run the zuul deploy in periodic?13:40
mordredI think we run it hourly?13:41
mordredfungi: opendev-prod-hourly is the pipeline13:41
mordredfungi: and it is red13:41
fungiaha, yep13:42
fungihttps://zuul.opendev.org/t/openstack/builds?pipeline=opendev-prod-hourly&project=opendev%2Fsystem-config13:42
fungii just went looking for it13:42
fungialso infra-prod-remote-puppet-else is failing there13:42
fungi"chown failed: failed to look up user zuul"13:44
fungialso the puppet-else log seems to include a lot of failures due to "fatal: Unable to create '/etc/ansible/roles/puppet/.git/index.lock': File exists."13:47
fungion various servers13:47
fungii assume removing /etc/ansible/roles/puppet (or all of /etc/ansible/roles maybe?) on those servers will cause them to get re-cloned correctly on the next pass13:48
mordredfungi: yeah13:48
mordredfungi: I mean - I'm honestly not sure why we're pushing ansible roles to those servers13:48
mordredfungi: WTF13:53
mordredfungi: OH13:53
mordred***OH****13:53
fungion the "failed to look up user zuul" message?13:53
mordredno - the puppet one13:53
fungiis there a become?13:53
fungioh, okay13:53
mordredwe're running install-ansible-roles delegated to localhost but on each host13:54
mordredwhich means we're trying to clone in parallel13:54
* mordred makes patch13:54
openstackgerritMonty Taylor proposed opendev/system-config master: Only install ansible roles once per run  https://review.opendev.org/73046013:59
mordredfungi: ^^ that should fix the git error13:59
fungioh! they were git collisions on bridge.o.o and not the individual servers?14:03
fungithat makes more sense14:03
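The collision mordred describes (one task per inventory host, all delegated to the same machine and all touching the same checkout) can be illustrated with a small Python sketch. This is not the playbook code or the merged fix: the function names and the `run_once` flag are hypothetical, and the model is deliberately simplified so that every task overlaps with the others. The lock file is created atomically and fails if it already exists, which is the condition behind the "File exists" error quoted above.

```python
import os
import tempfile


def try_lock(lock_path):
    """Create the lock file atomically; return False if it already exists."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def simulate(hosts, run_once):
    """Return the hosts whose delegated task found the lock already held.

    Assumption: all tasks overlap in time, so the lock is never released
    while the loop runs. `run_once` is a hypothetical flag meaning "only
    the first host performs the task".
    """
    with tempfile.TemporaryDirectory() as repo:
        lock = os.path.join(repo, "index.lock")
        failed = []
        targets = hosts[:1] if run_once else hosts
        for host in targets:
            if not try_lock(lock):
                failed.append(host)
        return failed
```

With three hypothetical hosts, `simulate(hosts, run_once=False)` reports every host after the first as failed, while `simulate(hosts, run_once=True)` reports none, since only one task ever touches the checkout.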
openstackgerritMonty Taylor proposed opendev/system-config master: Fix a few missing zuul_user usages  https://review.opendev.org/73046114:05
mordredfungi: yeah14:05
mordredfungi: and that should fix the zuul issues14:06
fungiaha, yep14:07
fungiand at least these are triggered hourly so we should know quickly if there's anything else hiding behind those errors14:08
mordredyah - well, they're also triggered from landing - so we should know real quick14:15
fungioh, right that too ;)14:22
corvusmordred, fungi: retro +2d thx :)14:41
mordredcorvus: I think as fallout goes, that was pretty low :)14:44
corvusya :)14:44
openstackgerritMerged opendev/system-config master: Only install ansible roles once per run  https://review.opendev.org/73046014:56
openstackgerritMerged opendev/system-config master: Fix a few missing zuul_user usages  https://review.opendev.org/73046115:28
fungion the run triggered by 730460 we still hit "fatal: Unable to create '/etc/ansible/roles/puppet/.git/index.lock': File exists." on logstash-worker01 and logstash-worker1815:37
fungithose are the only task failures i see in the log though15:37
mordredWEIRD15:51
mordredI cannot account for that behavior15:51
fungialso the infra-prod-service-zuul build triggered by 730461 failed, though i'm mucking with my internet connection so can't check the log for it just yet16:08
fungilooks much better though16:36
fungithe executors all failed with errors like "usermod: user zuuld is currently used by process NNNN"16:36
fungiwhich in retrospect is expected16:36
fungiwe need the executors offline to apply that change16:36
fungii suppose i could stop half of them, wait for a periodic cycle, start those again and stop the other half, wait for another periodic cycle, then start them back up16:37
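The half-at-a-time approach fungi suggests can be written down as a simple plan. The sketch below is only an illustration: the host names other than ze01 are hypothetical, the step labels are descriptive strings rather than real commands, and nothing here talks to a server.

```python
def rolling_batches(hosts, batches=2):
    """Split hosts into consecutive batches of roughly equal size."""
    if not hosts:
        return []
    size = -(-len(hosts) // batches)  # ceiling division
    return [hosts[i:i + size] for i in range(0, len(hosts), size)]


def rolling_plan(hosts, batches=2):
    """Return (action, hosts) steps: stop a batch, wait for a deploy
    cycle to apply the change while it is offline, then start it again."""
    steps = []
    for batch in rolling_batches(hosts, batches):
        steps.append(("stop", batch))
        steps.append(("wait for periodic cycle", batch))
        steps.append(("start", batch))
    return steps
```

For four executors this yields six steps, so at most half of the executors are offline at any point in the plan.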
fungithe tenant config change from Open10K8S yesterday finally got applied at least16:38
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: WIP: add simple test runner  https://review.opendev.org/72868417:12
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: zuul-test: Add more labels  https://review.opendev.org/73044917:13
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: zuul-test: Add more labels  https://review.opendev.org/73044917:13
openstackgerritMarcin Juszkiewicz proposed opendev/base-jobs master: add arm64 nodesets  https://review.opendev.org/72881019:31
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: ensure-package-repositories: fix loopvar collision  https://review.opendev.org/73047720:14
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Fix deprecation warning from multinode tests  https://review.opendev.org/73047921:26
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Fix deprecation warning from multinode tests  https://review.opendev.org/73047921:28
*** DSpider has quit IRC22:14
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: fetch-subunit-output: stop using system os-testr  https://review.opendev.org/73048222:34
fungiinfra-root: out of curiosity i stopped zuul-executor on ze01 and that allowed ansible to run to completion on it in the next hourly pulse, *but* now the service won't start. first thing i spot is that the initscript references USER=zuul22:38
fungi(and of course now /etc/passwd only has a zuuld user)22:38
fungii'll leave it stopped for now22:39
fungimaybe we only need to patch https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/zuul-executor/files/zuul-executor.init#L21 ?22:40
openstackgerritJeremy Stanley proposed opendev/system-config master: Update username in Zuul executor initscript  https://review.opendev.org/73048322:43
fungipushed that ^ in case folks think it's the next step22:43
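The mismatch fungi found (an initscript that names a user no longer present in the password database) is the kind of thing a small check can catch. The sketch below is hypothetical and works on text passed in as strings; the only detail taken from the discussion is that the script contains `USER=zuul` while the host has only a `zuuld` account. The helper names and the sample contents are assumptions.

```python
import re


def initscript_user(script_text):
    """Return the value of the first USER=... assignment, or None."""
    match = re.search(
        r'^\s*USER=["\']?([A-Za-z0-9_-]+)', script_text, re.MULTILINE
    )
    return match.group(1) if match else None


def passwd_users(passwd_text):
    """Return the set of account names in passwd-format text."""
    return {
        line.split(":", 1)[0]
        for line in passwd_text.splitlines()
        if line and not line.startswith("#")
    }


def missing_user(script_text, passwd_text):
    """Return the initscript's user if it has no passwd entry, else None."""
    user = initscript_user(script_text)
    if user is not None and user not in passwd_users(passwd_text):
        return user
    return None
```

Given a script containing `USER=zuul` and passwd text that only lists `zuuld`, `missing_user` returns `"zuul"`; once the script says `USER=zuuld` it returns `None`.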
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: fetch-subunit-output: stop using system os-testr  https://review.opendev.org/73048222:46
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: WIP: add simple test runner  https://review.opendev.org/72868423:05
*** slaweq has quit IRC23:05
*** Meiyan has joined #opendev23:19
*** tosky has quit IRC23:27
