Saturday, 2020-04-18

*** DSpider has quit IRC [00:12]
*** mlavalle has quit IRC [00:31]
*** roman_g has quit IRC [03:37]
*** DSpider has joined #opendev [07:58]
*** ralonsoh has joined #opendev [09:19]
*** ralonsoh has quit IRC [11:27]
openstackgerrit: Monty Taylor proposed opendev/system-config master: Make applytest files outside of system-config  https://review.opendev.org/720848 [14:00]
openstackgerrit: Monty Taylor proposed opendev/system-config master: Move puppet apply jobs to system-config repo  https://review.opendev.org/720887 [14:00]
openstackgerrit: Monty Taylor proposed opendev/system-config master: Remove unused rspec tests  https://review.opendev.org/720802 [14:14]
openstackgerrit: Monty Taylor proposed opendev/system-config master: Make applytest files outside of system-config  https://review.opendev.org/720848 [14:14]
openstackgerrit: Monty Taylor proposed opendev/system-config master: Move puppet apply jobs to system-config repo  https://review.opendev.org/720887 [14:14]
openstackgerrit: Monty Taylor proposed opendev/system-config master: Remove global variables from manifest/site.pp  https://review.opendev.org/720800 [14:14]
openstackgerrit: Monty Taylor proposed openstack/project-config master: Use legacy infra puppet jobs from system-config  https://review.opendev.org/720889 [14:14]
mordred: fungi, clarkb, corvus: if you have some bored time - that stack above ^^ the applytest patch is the more important one (I fixed the error I think) but the others are part of starting to clean up the mess that is our puppet testing [14:21]
openstackgerrit: Monty Taylor proposed opendev/system-config master: Run Zuul using Ansible and Containers  https://review.opendev.org/717620 [14:29]
openstackgerrit: Monty Taylor proposed opendev/system-config master: Run Zuul using Ansible and Containers  https://review.opendev.org/717620 [14:30]
*** roman_g has joined #opendev [14:37]
*** roman_g has quit IRC [14:37]
AJaeger: mordred: one suggestion for https://review.opendev.org/720887 [14:40]
openstackgerrit: Monty Taylor proposed opendev/system-config master: Stop cloning a bunch of puppet modules we don't use  https://review.opendev.org/720892 [14:42]
openstackgerrit: Monty Taylor proposed opendev/system-config master: Make applytest files outside of system-config  https://review.opendev.org/720848 [14:45]
openstackgerrit: Monty Taylor proposed opendev/system-config master: Move puppet apply jobs to system-config repo  https://review.opendev.org/720887 [14:45]
openstackgerrit: Monty Taylor proposed opendev/system-config master: Remove global variables from manifest/site.pp  https://review.opendev.org/720800 [14:45]
openstackgerrit: Monty Taylor proposed opendev/system-config master: Stop cloning a bunch of puppet modules we don't use  https://review.opendev.org/720892 [14:45]
mordred: AJaeger: ++ [14:46]
openstackgerrit: Monty Taylor proposed openstack/project-config master: Use legacy infra puppet jobs from system-config  https://review.opendev.org/720889 [15:31]
openstackgerrit: Monty Taylor proposed openstack/project-config master: Stop running jobs on unused puppet repos  https://review.opendev.org/720900 [15:31]
openstackgerrit: Monty Taylor proposed openstack/project-config master: Retire unused puppet modules  https://review.opendev.org/720901 [15:31]
openstackgerrit: Monty Taylor proposed opendev/system-config master: Stop cloning a bunch of puppet modules we don't use  https://review.opendev.org/720892 [15:31]
openstackgerrit: Monty Taylor proposed openstack/project-config master: Use legacy infra puppet jobs from system-config  https://review.opendev.org/720889 [15:55]
openstackgerrit: Monty Taylor proposed openstack/project-config master: Stop running jobs on unused puppet repos  https://review.opendev.org/720900 [15:55]
openstackgerrit: Monty Taylor proposed openstack/project-config master: Retire unused puppet modules  https://review.opendev.org/720901 [15:55]
openstackgerrit: Monty Taylor proposed opendev/system-config master: Run Zuul using Ansible and Containers  https://review.opendev.org/717620 [15:58]
*** olaph_ has joined #opendev [16:00]
*** olaph has quit IRC [16:00]
*** olaph_ is now known as olaph [16:00]
openstackgerrit: Monty Taylor proposed opendev/system-config master: Remove global variables from manifest/site.pp  https://review.opendev.org/720800 [16:03]
openstackgerrit: Monty Taylor proposed opendev/system-config master: Stop cloning a bunch of puppet modules we don't use  https://review.opendev.org/720892 [16:03]
AJaeger: mordred: let's first migrate the jobs, merge 720892 - and then retire repos. [16:07]
AJaeger: Please propose for all repos a change to "empty" it [16:07]
mordred: AJaeger: ++ [16:17]
openstackgerrit: Monty Taylor proposed opendev/system-config master: Remove global variables from manifest/site.pp  https://review.opendev.org/720800 [16:43]
corvus: okay, i'm going to muck about on the zk servers [16:46]
mordred: corvus: sounds like fun - I'm here if you want eyeballs on anything [16:47]
corvus: mordred: thanks! [16:48]
corvus: zk01 is the leader; i'm going to kill it and see what happens [16:49]
openstackgerrit: Monty Taylor proposed opendev/system-config master: Fix a typo in letsencrypt cron job name  https://review.opendev.org/720904 [16:51]
openstackgerrit: Monty Taylor proposed opendev/system-config master: Run Zuul using Ansible and Containers  https://review.opendev.org/717620 [16:53]
corvus: restarted zk01, no problems there; zk03 is the new leader [16:56]
corvus: i'll restart 02 now [16:56]
mordred: corvus: I wonder - if you shut one of them down (so that a leader election happens but also so that zk notices there are only 2) - and then run ansible to write out the config files [16:57]
corvus: mordred: ? [16:59]
mordred: corvus: hrm - as I try to write a better version of that I think maybe I'm just on crack - so nevermind :) [16:59]
corvus: mordred: currently zk does not have write permission for either of the config files, so they shouldn't be changing [17:00]
corvus: 2 is back up; no problems so far [17:00]
corvus: i'm going to restart 3 now; it's the current leader, which means we'll get another epoch (we started at 0xd, we're currently at 0xf) [17:01]
corvus: 2 won the election, the new epoch is 0x1 [17:03]
corvus: starting 3 [17:03]
corvus: it's back up, no complaints [17:04]
mordred: corvus: that is disappointing [17:05]
corvus: that appears to be a complete rolling restart with no errors.  the sequence was: leader (1), follower (2), newleader (3) [17:05]
corvus: i think the sequence previously was: follower (1), follower (2), leader (3).  perhaps that makes a difference [17:05]
corvus: maybe next i should try 1-3-2 to try to approximate that [17:06]
corvus: oh, another difference, is that i am running 'docker stop' and not 'docker-compose down' [17:07]
corvus: i'll keep using docker stop for the next test [17:07]
mordred: ++ [17:08]
corvus: (the permissions should forbid it writing any state anywhere, but maybe i'm wrong about that -- maybe there's something in /tmp or somesuch) [17:08]
corvus: i've restarted 1 and 3 without incident; 2 (leader) is next [17:11]
corvus: 3 is new leader [17:12]
corvus: epoch xs 0x11 [17:12]
corvus: is ^ [17:12]
mordred: corvus: this is unfortunately problem free [17:13]
corvus: yeah, 2 is up and running again [17:13]
clarkb: perhaps docker stop is more graceful than docker-compose down? [17:13]
corvus: maybe [17:13]
clarkb: that said you'd expect zk to handle ungraceful power outages [17:13]
mordred: yeah [17:13]
mordred: kind of pointless to have an HA system if it can't [17:14]
corvus: but since 3 is the leader now, we can replay the sequence from yesterday: follower (1), follower (2), leader (3) [17:14]
mordred: ++ [17:14]
corvus: i'll do that, but still using docker stop [17:14]
corvus: so the only known variables will be docker vs docker-compose and the traffic volume [17:14]
corvus: (there's still enough volume for us to have increasing zxids, but clearly not as much as yesterday) [17:15]
corvus: (that could have secondary effects though, like driving when a snapshot is written, which could have some impact) [17:15]
corvus: anyway, i'll do 1-2-3(leader) now with docker stop [17:16]
corvus: 1,2 done; starting 3 now [17:19]
corvus: 2 is leader [17:20]
corvus: 3 is back up; no issues [17:20]
corvus: okay, maybe it's time to try 1-3-2(leader) with docker-compose down [17:21]
corvus: i'm poking around inside the container -- the entire zk binary installation dir is writable by the zookeeper user (however, we are not running as the zookeeper user, so we can't write to it) [17:24]
corvus: (that seems like an error in the build process) [17:24]
corvus: there's nothing interesting in /tmp [17:24]
corvus: okay, restarting 1 with d-c down [17:25]
openstackgerrit: Monty Taylor proposed opendev/system-config master: Remove global variables from manifest/site.pp  https://review.opendev.org/720800 [17:26]
corvus: all back up without error; 3 is the new leader [17:28]
corvus: i want to repeat that once more with the 1-2-3 sequence from yesterday with docker-compose [17:28]
*** hrw has quit IRC [17:29]
corvus: yeah, that worked flawlessly too [17:30]
corvus: in my mind, that suggests that either the error yesterday was due to some inconsistency that has been ironed out since (maybe something related to the formatting of the myid files after the upgrade); or it's driven by load either directly (like a race condition with connections) or indirectly (like timing of snapshots during the restart process) [17:32]
corvus: at least it doesn't happen every time.  so i think maybe we should keep an eye on the system and see if it happens again [17:33]
mordred: ++ [17:33]
mordred: yeah- like [17:33]
mordred: absent more data - it seems like there's nothing more we can do - but I suppose it's good that you've rolling restarted it multiple times now with no issue [17:33]
*** hrw has joined #opendev [17:34]
* fungi is around-ish now as well [17:38]
fungi: sounds like a heisenbug :/ [17:40]
mordred: zomg! [17:43]
mordred: corvus: in good news - https://review.opendev.org/#/c/717620/ and https://review.opendev.org/#/c/720527/ (zuul and nodepool in containers) are both finally green! [17:44]
*** hashar has joined #opendev [17:49]
*** hashar has quit IRC [17:49]
openstackgerrit: James E. Blair proposed opendev/system-config master: Meetpad: proxy through meetpad to etherpad.opendev.org  https://review.opendev.org/720095 [18:03]
corvus: mordred: zomg :) [18:03]
mordred: corvus: ikr? [18:09]
openstackgerrit: Monty Taylor proposed opendev/system-config master: Remove global variables from manifest/site.pp  https://review.opendev.org/720800 [18:12]
mordred: corvus: quick meetbot patch: https://review.opendev.org/#/c/720904/ [18:12]
corvus: mordred: huh.  we don't actually use that [18:13]
corvus: maybe we should send that patch upstream [18:13]
corvus: (that's just for our fork of the image) [18:13]
corvus: mordred: https://github.com/jitsi/docker-jitsi-meet/blob/master/web/rootfs/etc/cont-init.d/10-config#L39 [18:13]
fungi: yeah, our letsencrypt renewals are all handled from system-config [18:15]
*** tobiash has quit IRC [18:17]
*** tobiash has joined #opendev [18:18]
mordred: corvus: ah - hah [18:33]
mordred: (I mostly just found it when I was grepping for our letsencrypt stuff but typoed on the command line :) ) [18:33]
corvus: heh, i wondered :) [18:33]
fungi: clearly a common typographical error [18:35]
fungi: or typing error at least [18:36]
*** smcginnis has quit IRC [18:36]
openstackgerrit: Monty Taylor proposed opendev/system-config master: Make applytest files outside of system-config  https://review.opendev.org/720848 [18:42]
openstackgerrit: Monty Taylor proposed opendev/system-config master: Move puppet apply jobs to system-config repo  https://review.opendev.org/720887 [18:42]
openstackgerrit: Monty Taylor proposed opendev/system-config master: Remove global variables from manifest/site.pp  https://review.opendev.org/720800 [18:42]
openstackgerrit: Merged opendev/system-config master: Meetpad: proxy through meetpad to etherpad.opendev.org  https://review.opendev.org/720095 [18:53]
mordred: infra-root: cleanup stack ending https://review.opendev.org/#/c/720800/ is green and ready to go [19:39]
*** smcginnis has joined #opendev [20:43]
*** noonedeadpunk has quit IRC [21:06]
*** noonedeadpunk has joined #opendev [21:07]
*** DSpider has quit IRC [21:40]
*** hrw has quit IRC [22:34]
*** paladox has quit IRC [22:37]
*** hrw has joined #opendev [22:40]
*** paladox has joined #opendev [22:42]
