Wednesday, 2018-10-10

00:30 <tristanC> matburt: looking now
00:32 <tristanC> for some reason, nodepool unregistered the static nodes
00:38 <tristanC> ok, commenting the nodepool provider section and uncommenting triggered the re-registering, jobs are scheduled now
00:40 <pabelanger> tristanC: is it possible the provider died in nodepool-launcher during the rdocloud outage?
00:46 <matburt> we did notice what looked like an rdocloud outage
00:47 <matburt> it lasted, maybe 10 or 15m?
00:47 <pabelanger> or maybe OOM? I've seen a nodepool provider die but not restart a few times. I cannot remember the last one
00:48 <pabelanger> haven't really used the static node driver, maybe some edge case where it stops working
00:49 <tristanC> matburt: certainly related to the rdocloud outage, we usually need to clean service state afterward
00:49 <tristanC> pabelanger: in this case the provider didn't die, but the static node was no longer registered and the provider kept the request in the queue because of missing quota
00:50 <pabelanger> tristanC: any thoughts on how we can have nodepool re-register the provider?
00:52 <tristanC> pabelanger: we should discuss having a "check" periodic method in nodepool, so that drivers can make sure zookeeper state matches configuration and reality
00:52 <tristanC> in the nodepool driver api*
00:52 <pabelanger> tristanC: with a static node, does min-ready do anything?
00:53 <tristanC> pabelanger: i don't think so, the driver works differently, the nodes are created by the provider
00:53 <tristanC> and they are re-used
00:54 <pabelanger> yah, not really sure how it works right now
00:54 <pabelanger> I think tobiash is using it at bmw, but not sure who else
00:56 <tristanC> pabelanger: iiuc this bit: https://git.zuul-ci.org/cgit/nodepool/tree/nodepool/driver/static/provider.py#n96
00:56 <tristanC> pabelanger: then the driver simply deleted the nodes during the outage because they were unreachable
00:56 <tristanC> pabelanger: it seems like the driver is missing code to re-add them automatically
01:01 <pabelanger> tristanC: yes, that is not surprising to me TBH.
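[editor's note: the periodic "check" reconciliation tristanC proposes above could be sketched as the toy loop below. This is a minimal illustration under stated assumptions — the `StaticNodeChecker` class, its method names, and the reachability-set interface are all hypothetical, not the real nodepool driver API.]

```python
# Hypothetical sketch of a periodic "check" hook for a static-node driver.
# All names here are illustrative; this is NOT nodepool's actual API.

class StaticNodeChecker:
    def __init__(self, configured_nodes):
        # Nodes listed in the provider configuration (the source of truth).
        self.configured = set(configured_nodes)
        # Nodes currently registered in ZooKeeper (starts empty here).
        self.registered = set()

    def check(self, reachable):
        """Reconcile registered state with configuration.

        `reachable` is the set of nodes that currently answer; unreachable
        nodes are deregistered (what the driver already does), and
        configured nodes that became reachable again are re-registered
        (the missing piece discussed above).
        """
        # Drop registered nodes that are no longer reachable.
        self.registered -= (self.registered - reachable)
        # Re-add configured nodes that are reachable again.
        self.registered |= (self.configured & reachable)
        return self.registered


checker = StaticNodeChecker(["node1", "node2"])
checker.check({"node1", "node2"})   # both registered
checker.check(set())                # outage: both deregistered
checker.check({"node1", "node2"})   # back up: both re-registered
```

With such a periodic check, an outage would only deregister the nodes temporarily; the next cycle would re-register them instead of leaving requests queued against missing quota.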
01:09 <tristanC> hum, and ansible jobs are not starting, here is the tb: https://ansible.softwarefactory-project.io/paste/show/21/
01:11 <matburt> that's strange... it's mentioning a tag that we removed recently
01:12 <pabelanger> oh, if you delete the tag, that might make zuul unhappy
01:12 <matburt> we weren't happy to have to do it... it leaked over from our private tower repo
01:12 <matburt> what sort of resolution do we have for this?
01:12 <pabelanger> I would guess reclone the awx repos in zuul
01:13 <pabelanger> but the only way I know to do that today is to rm -rf the repo
01:13 <tristanC> matburt: pabelanger: yep, i can state: absent /var/lib/zuul/executor/github.com/ansible/
01:13 <tristanC> pabelanger: rm, gross :-)
01:13 <tristanC> matburt: could it be possible the tag was removed while zuul was d/c from github?
01:13 <tristanC> e.g. during the outage yesterday?
01:15 <pabelanger> I'm not actually sure what zuul will do if you delete a tag it has
01:18 <tristanC> pabelanger: removing the executor cache doesn't seem to be enough...
01:19 <pabelanger> I think you would also need to do the merger
01:20 <pabelanger> or, if you dequeue / enqueue the job, does that cause zuul to refer git refs
01:20 <tristanC> pabelanger: if only we could dequeue/enqueue github changes...
01:20 <pabelanger> refresh*
01:21 <pabelanger> tristanC: close / reopen the PR
01:21 <pabelanger> should do it
01:24 <matburt> hmm
01:24 <matburt> so these 4 PRs need to be closed and then opened again?
01:25 <tristanC> pabelanger: it seems like the executor and merger need to be restarted, they are holding an internal cache of refs according to https://git.zuul-ci.org/cgit/zuul/tree/zuul/merger/merger.py#n548
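[editor's note: the failure mode described here — a deleted remote tag lingering in a long-lived process — can be shown with the toy ref cache below. This is a simplification for illustration only; `TinyMerger` and its method are hypothetical, not zuul's actual merger code.]

```python
# Toy illustration of why a deleted remote tag can linger: a process that
# caches refs in memory keeps serving the stale tag until the cache is
# dropped (e.g. by restarting the merger/executor or cycling the change).
# Hypothetical sketch only; NOT zuul's actual merger implementation.

class TinyMerger:
    def __init__(self):
        self._refs = None  # populated on first use, then reused

    def get_refs(self, remote_refs):
        # Cache on first call; later remote deletions are never observed.
        if self._refs is None:
            self._refs = dict(remote_refs)
        return self._refs


merger = TinyMerger()
remote = {"refs/tags/v1.0": "abc123", "refs/heads/master": "def456"}
merger.get_refs(remote)

# The tag is deleted upstream...
del remote["refs/tags/v1.0"]

# ...but the cached view still contains it until the process restarts.
assert "refs/tags/v1.0" in merger.get_refs(remote)
```

Closing and reopening the PR works around this because it forces a fresh merge operation for the change, sidestepping the stale cached refs.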
01:26 <pabelanger> yah, deleting tags is bad :)
01:26 <pabelanger> I say, try to cycle the PRs, see if that helps
01:26 <pabelanger> otherwise, maybe we confirm in #zuul
01:26 <pabelanger> and restart zuul-mergers and zuul-executors
01:26 <pabelanger> jeblair likely knows the right play here
01:27 <matburt> okay just did it on 2391... lets see what happens
01:28 <matburt> okay done on all of them
01:29 <matburt> looks like things are getting happy again? Is this something we should report to the zuul devs or is it known?
01:30 <tristanC> matburt: oh indeed, it fixed the issue. thanks!
01:30 <tristanC> matburt: i left a message on #zuul, surely something to be fixed in the merger logic
01:31 <pabelanger> I would have expected the jobs with the failing tag to fail
01:31 <pabelanger> then report back to the PR
01:31 <pabelanger> but looks like that didn't happen?
01:31 <tristanC> "deleting a tag when jobs are queued but not running" is quite an edge case :)
01:31 <tristanC> pabelanger: no, i think it failed too early to be considered a job failure
01:31 <pabelanger> yah, we should be able to write a unit test for it
01:32 <matburt> I've spun up a few more static nodes that I'd love to add to the nodepool… I can't remember if I set up that PR before or if you did tristanC?
01:32 <matburt> Another thing that I wanted to talk to yall about
01:32 <pabelanger> matburt: because I don't know, where are the static nodes now?
01:33 <tristanC> matburt: i did, the change was: https://softwarefactory-project.io/r/#/c/13747/
01:33 <matburt> they are static nodes in GCE also... identical to the ones we have now
01:33 <pabelanger> matburt: is there something specific to GCE here, or just that you have capacity there right now?
01:34 <matburt> We like dealing with GCE, it's also where we host our k8s clusters and registries
01:34 <matburt> So from zuul's perspective… they are just other static nodes, the fact that they are in GCE isn't super relevant from a technical perspective
01:37 *** tristanC has quit IRC
01:37 *** logan- has quit IRC
01:37 *** sbadia has quit IRC
01:37 *** gundalow has quit IRC
01:37 *** sshnaidm|afk has quit IRC
01:37 *** jpena|off has quit IRC
01:37 *** trishnag has quit IRC
01:37 *** mattclay has quit IRC
01:37 *** pabelanger has quit IRC
01:37 *** ganeshrn has quit IRC
01:37 *** matburt has quit IRC
01:37 *** zoli has quit IRC
01:37 *** jruzicka has quit IRC
01:37 *** mnaser has quit IRC
01:37 *** mordred has quit IRC
01:37 *** fbo has quit IRC
01:37 *** rcarrillocruz has quit IRC
01:37 *** nhicher has quit IRC
01:37 *** jangutter has quit IRC
01:37 *** shanemcd has quit IRC
01:37 *** dmsimard has quit IRC
01:37 *** mhu has quit IRC
01:37 *** chandankumar has quit IRC
01:37 *** fc__ has quit IRC
01:37 *** spredzy has quit IRC
01:37 *** ChanServ has quit IRC
01:43 *** chandankumar has joined #softwarefactory
01:43 *** logan- has joined #softwarefactory
01:43 *** mhu has joined #softwarefactory
01:43 *** dmsimard has joined #softwarefactory
01:43 *** shanemcd has joined #softwarefactory
01:43 *** jangutter has joined #softwarefactory
01:43 *** nhicher has joined #softwarefactory
01:43 *** rcarrillocruz has joined #softwarefactory
01:44 *** ganeshrn has joined #softwarefactory
01:44 *** matburt has joined #softwarefactory
01:44 *** zoli has joined #softwarefactory
01:45 *** fc__ has joined #softwarefactory
01:45 *** spredzy has joined #softwarefactory
01:45 *** sshnaidm|afk has joined #softwarefactory
01:45 *** jpena|off has joined #softwarefactory
01:45 *** trishnag has joined #softwarefactory
01:45 *** mattclay has joined #softwarefactory
01:45 *** sbadia has joined #softwarefactory
01:45 *** card.freenode.net sets mode: +o sbadia
01:45 *** gundalow has joined #softwarefactory
01:46 *** jruzicka has joined #softwarefactory
01:46 *** mnaser has joined #softwarefactory
01:46 *** mordred has joined #softwarefactory
01:46 *** pabelanger has joined #softwarefactory
01:46 *** fbo has joined #softwarefactory
01:46 <dmsimard> it's just a server to ssh into :p
01:47 <matburt> that was a super unfortunate netsplit… I'm not sure what the last message yall got was
01:47 <pabelanger> 01:34:45    matburt | So from zuul's
01:47 <pabelanger> was my last message
01:47 <pabelanger> then I replied
01:47 <pabelanger> okay, cool. would be great to show awx jobs running in other places than GCE, in case there are issues with that provider for some reason.
01:47 <matburt> obviously... once shanemcd has some time we'd like to start ramping up to test out the kubernetes integration
01:47 *** ChanServ has joined #softwarefactory
01:47 *** card.freenode.net sets mode: +o ChanServ
01:48 *** tristanC has joined #softwarefactory
01:49 <tristanC> matburt: that's correct
01:49 <matburt> there's another thing I'd like to talk to yall about but this might not be the best place to do it
01:50 <tristanC> matburt: i can do bluejeans now if you prefer
01:51 <dmsimard> oh I can do bluejeans too
01:52 <matburt> yep lets do that
01:53 <matburt> https://bluejeans.com/5442700661
02:10 <dmsimard> matburt: https://pagure.io/standard-test-roles/ is the fedora thing that spawns a vm to run tests on, based on ansible
02:16 <matburt> Iiinteresting
03:59 *** nilashishc has joined #softwarefactory
07:09 *** logan- has quit IRC
07:11 *** logan- has joined #softwarefactory
07:55 <spredzy> tristanC: yo
07:56 <spredzy> Could I ask for your approval on https://github.com/ansible/zuul-jobs/pull/23
07:59 <spredzy> I'll remove the become: yes from ansible/zuul-config
08:02 <tristanC> spredzy: done
08:03 <spredzy> thanks for the comment, I missed that
08:03 * spredzy blesses reviews
08:22 *** sshnaidm|afk is now known as sshnaidm
08:23 *** logan- has quit IRC
08:27 *** logan- has joined #softwarefactory
08:30 *** rcarrillocruz has quit IRC
08:32 *** jangutter has quit IRC
08:33 *** jangutter has joined #softwarefactory
08:37 *** jangutter has quit IRC
08:37 *** jangutter has joined #softwarefactory
08:45 *** nilashishc has quit IRC
09:31 *** zoli is now known as zoli|lunch
09:31 *** zoli|lunch is now known as zoli
10:58 *** jpena|off has quit IRC
11:04 *** sshnaidm is now known as sshnaidm|afk
11:24 *** sshnaidm|afk is now known as sshnaidm
12:01 *** zoli is now known as zoli|wfh
12:01 *** zoli|wfh is now known as zoli|afk
12:01 *** zoli|afk is now known as zoli
14:15 *** zoli is now known as zoli|brb
14:19 *** sfbender has joined #softwarefactory
14:19 <sfbender> Fabien Boucher created software-factory/sf-config master: Make the welcome page resources connections aware  https://softwarefactory-project.io/r/13892
14:25 *** zoli|brb is now known as zoli
14:25 *** zoli is now known as zoli|wfh
14:25 *** zoli|wfh is now known as zoli
14:55 *** chandankumar is now known as chkumar|off
15:58 *** sshnaidm has quit IRC
15:59 *** sshnaidm has joined #softwarefactory
17:25 *** zoli is now known as zoli|gone
17:25 *** zoli|gone is now known as zoli
22:33 *** sshnaidm is now known as sshnaidm|afk

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!