Thursday, 2018-09-27

01:11 *** rcarrillocruz has quit IRC
03:53 <sfbender> Tristan de Cacqueray created software-factory/sfinfo master: Add patternfly-react-ui-deps package  https://softwarefactory-project.io/r/13789
03:55 <sfbender> Merged software-factory/sfinfo master: Add patternfly-react-ui-deps package  https://softwarefactory-project.io/r/13789
05:41 <sfbender> Merged www.softwarefactory-project.io master: 2018-38 summary  https://softwarefactory-project.io/r/13706
06:06 *** sfbender has quit IRC
06:07 *** nijaba has quit IRC
06:08 *** nijaba has joined #softwarefactory
07:18 *** chkumar|off is now known as chandankumar
08:01 *** jpena|off is now known as jpena
08:16 <spredzy> tristanC: yo
08:17 <spredzy> If you're around, any way you can help me figure out why https://github.com/ansible/awx/pull/2309 isn't running the proper job on zuul?
08:17 <spredzy> I see the event being picked up by zuul, but it seems no job gets attached to it
08:20 <tristanC> spredzy: would you like to join mumble?
08:21 * spredzy joins
08:22 <spredzy> https://github.com/ansible/zuul-config/blob/master/zuul.d/projects.yaml#L7-L12
08:22 <spredzy> https://github.com/ansible/zuul-jobs/blob/master/zuul.d/templates.yaml
08:31 <tristanC> spredzy: i think you need https://ansible.softwarefactory-project.io/docs/zuul/user/config.html#attr-pragma.implied-branch-matchers
08:32 <tristanC> because zuul.layout: Project template <ProjectConfig python source: ansible/zuul-jobs/zuul.d/templates.yaml@master {ImpliedBranchMatcher:master}> did not match item
08:35 <tristanC> spredzy: https://ansible.softwarefactory-project.io/docs/zuul/user/config.html#attr-pragma.implied-branches
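The fix tristanC is pointing at is Zuul's pragma directive: by default, project templates defined in a project repo only match the branch they are defined on (here, master), which is why the template did not match the pull request. A minimal sketch of what a pragma stanza at the top of zuul.d/templates.yaml could look like; the branch names listed are illustrative, not taken from the actual repo:

    # Sketch only: stop applying an implied "master" branch matcher to the
    # templates defined in this file.
    - pragma:
        implied-branch-matchers: false

    # Or, explicitly list which branches the templates should match
    # (branch names here are placeholders):
    - pragma:
        implied-branches:
          - master
          - devel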
08:57 *** sfbender has joined #softwarefactory
08:57 <sfbender> Merged software-factory/managesf master: Fix issue where we expect to have the group resources  https://softwarefactory-project.io/r/13776
09:02 <sfbender> Merged software-factory/sf-config master: Fix config/sf-jobs location path gen for external gerrit  https://softwarefactory-project.io/r/13760
09:59 *** zoli is now known as zoli|lunch
09:59 *** zoli|lunch is now known as zoli
11:06 *** jpena is now known as jpena|lunch
11:42 <matburt> tristanC how's that nodepool config with the ansible static nodes looking?
11:49 <tristanC> matburt: it's looking like: https://softwarefactory-project.io/cgit/config/tree/nodepool/ansible.yaml
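For reference, a nodepool static-driver provider of the kind that ansible.yaml describes generally looks like the sketch below; the hostname, label, and username are placeholders rather than the actual production values:

    # Sketch of a nodepool static provider (names and keys are placeholders)
    providers:
      - name: ansible-static
        driver: static
        pools:
          - name: main
            nodes:
              - name: static01.example.com
                labels:
                  - ansible-static-node
                username: zuul
                host-key: "ssh-rsa AAAA..."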
11:50 <tristanC> matburt: shanemcd-: i sent you a mail about it btw (titled "Ansible zuul/nodepool setup with AWS"), but it seems it didn't reach any of the recipients' inboxes :(
11:51 <matburt> let me look
12:01 <matburt> hah I do have it... not sure why I glossed over it.
12:02 <sfbender> Merged www.softwarefactory-project.io master: Add Kubernetes Nodepool Driver blog post  https://softwarefactory-project.io/r/13725
12:03 <matburt> tristanC given that we have the template defined in zuul-jobs (and no current definition on runner or awx) can you assign the nodeset to a project template?
12:04 <tristanC> matburt: yes, you can set the "nodeset" job attribute similarly to the "vars", e.g. here: https://github.com/ansible/zuul-jobs/blob/master/zuul.d/templates.yaml#L12
12:05 <tristanC> or here is another example: https://softwarefactory-project.io/cgit/DLRN/tree/.zuul.yaml#n53
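What tristanC describes, setting "nodeset" on a job variant inside a project template the same way "vars" is set, would look roughly like this; the template, job, and label names are illustrative only:

    # Illustrative project-template; all names and labels are placeholders
    - project-template:
        name: ansible-tox-jobs
        check:
          jobs:
            - tox-linters:
                nodeset:
                  nodes:
                    - name: test-node
                      label: ansible-static-node
                vars:
                  tox_envlist: linters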
12:05 <tristanC> spredzy: not sure what happened, but https://github.com/ansible/zuul-jobs/commits/master 503 :)
12:06 <tristanC> oh nevermind, it's now loading
12:17 <matburt> https://github.com/ansible/zuul-jobs/pull/12
12:18 <matburt> let me rekick my awx job... which is busted now, but I want to see what's needed on the static nodes
12:19 <tristanC> matburt: we might want to adapt the base job to make sure the src directory is absent before copying the workspace, i don't think it's actually cleaned otherwise
12:19 <matburt> hmm sounds good
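Because the workspace persists between builds on a static node, the base job's pre playbook would need a cleanup step of roughly this shape before the repos are copied; the path is an assumption based on the usual Zuul work directory layout:

    # Sketch: remove a stale src/ tree from a previous build
    # (the path is an assumption, not confirmed in the log)
    - name: Clean up leftover source directory on static node
      file:
        path: "{{ ansible_user_dir }}/src"
        state: absent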
12:19 <matburt> I have to hop on a meeting, I'll be back in a few
12:22 <matburt> scratch that... no meeting.
12:22 <matburt> I reckon I'm going to need some dependencies installed on the static nodes
12:24 <tristanC> matburt: we should keep static node customisation in a playbook to be merged in ansible/zuul-config/nodepool, next to the script i proposed to create the k8s image
12:24 <tristanC> matburt: then we could have a post and periodic job that runs it
12:25 <matburt> agreed
12:25 <tristanC> matburt: we actually do that for the runC slave, a similar job can be added to the ansible/zuul-config post pipeline
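The wiring tristanC has in mind would roughly be a job in ansible/zuul-config that runs the customization playbook, attached to the post and periodic pipelines. Everything below is a hypothetical sketch: the job and playbook names are invented, and how the job reaches the static hosts (a nodeset versus a generated inventory) is left open:

    # Hypothetical sketch of a post/periodic customization job
    - job:
        name: ansible-static-customize
        run: nodepool/static-customize.yaml

    - project:
        post:
          jobs:
            - ansible-static-customize
        periodic:
          jobs:
            - ansible-static-customize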
12:27 <tristanC> matburt: shanemcd-: btw, the blogpost about the k8s nodepool driver is now published here: https://www.softwarefactory-project.io/kubernetes-nodepool-driver-tech-preview.html
12:28 <matburt> nice... I want to dig into that more for our smoke tests tristanC
12:29 <matburt> which I'll need to turn up pretty soon
12:30 <matburt> This week the goal is to get linters and unit/functional tests running. Next week shanemcd- and I are going to be at AnsibleFest
12:31 <tristanC> matburt: sure, well i haven't added the k8s provider to the production nodepool, it's only working in my sandbox. let me know when you are ready to use it and we'll enable it on sf-project.io
12:31 <matburt> tristanC do you think that k8s job should merge?
12:32 <tristanC> what do you mean by "should merge"?
12:32 <matburt> okay gotcha. I don't want to rock the boat... if we have some time we might be able to turn it up this week (in the short time we have left), otherwise we might wait until after fest.
12:32 <matburt> https://github.com/ansible/zuul-config/pull/21
12:32 <matburt> that's passing the checks and just waiting to merge
12:33 <tristanC> matburt: actually that job should be marked as "abstract", it doesn't have a run phase and doesn't do anything, it's meant to be used as a parent job for your smoke tests job
12:33 <matburt> gotcha
12:34 <tristanC> matburt: we could merge and iterate over the script, or keep it open until we are satisfied with the content, i don't mind either way
12:34 <matburt> it looks like this just produces an image so it might not necessarily need to run on every build?
12:34 *** jpena|lunch is now known as jpena
12:35 <matburt> I might need to look into how abstract jobs work and parent jobs and how those work together with regular jobs
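An abstract job of the kind tristanC describes only carries shared setup (for example the nodeset and a pre-run playbook); it cannot be run directly, and concrete child jobs inherit from it and supply the run phase. A minimal sketch with all names invented for illustration:

    # Sketch: abstract parent carrying the shared setup
    - job:
        name: awx-smoke-base
        abstract: true
        pre-run: playbooks/smoke-pre.yaml
        nodeset:
          nodes:
            - name: container
              label: pod-fedora

    # Concrete child job that can actually be attached to a pipeline
    - job:
        name: awx-smoke
        parent: awx-smoke-base
        run: playbooks/smoke.yaml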
12:35 <tristanC> matburt: the awx-test-image.sh isn't actually used, it's just the recipe i used for docker://docker.io/softwarefactoryproject/awx-test-image
12:36 <matburt> yep indeed
12:36 <matburt> also a good chance to dive into buildah ;)
12:36 <matburt> Oh I see... base-pod
12:43 * spredzy would need to dive into it too - never used it up until now
12:44 <matburt> tristanC do you have some examples of how y'all typically prepare nodepool members for use? Something I could borrow for inspiration when putting together the static nodepool systems
12:49 <tristanC> matburt: it's not pretty, but we run these tasks on the runC host: https://softwarefactory-project.io/cgit/config/tree/nodepool/runC/customize.yaml
12:49 <matburt> excellent, I appreciate that
12:50 <tristanC> you could write a static-customize playbook that runs on an "ansible-static" group, then we could generate the inventory out of that list: https://softwarefactory-project.io/cgit/config/tree/nodepool/ansible.yaml
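The static-customize playbook tristanC suggests would target the "ansible-static" inventory group and install whatever the AWX test jobs need on those hosts. This is only a sketch: the file name and package list are invented for illustration:

    # Hypothetical static-customize.yaml; package names are placeholders
    - hosts: ansible-static
      become: true
      tasks:
        - name: Install build and test dependencies for AWX jobs
          package:
            name:
              - git
              - gcc
              - python3-devel
              - libffi-devel
            state: present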
13:05 *** zoli is now known as zoli|afk
13:43 <pabelanger> tristanC: nhicher: we have 6 nodes marked ready in nodepool, but zuul isn't using them
13:43 <pabelanger> can you look at nodepool-launcher and see why?
13:46 <pabelanger> remote:   https://softwarefactory-project.io/r/13792 Drop max-ready-age to 30mins
13:47 <pabelanger> I am going to approve ^ to see if we properly clean them up
13:47 <pabelanger> but, I also think we are reaching the limits of a single nodepool-launcher
13:48 <pabelanger> I'd love for us to create nl02 for software-factory
13:54 <pabelanger> okay, nodepool managed to clean them up
13:54 <pabelanger> but would be interested to see why we are leaking them to start with
13:55 <nhicher> pabelanger: there is a vexxhost-ansible-network-mtl1 node in-use
13:56 <pabelanger> nhicher: yah, before that we had 5 vexxhost-ansible-network-sjc1 nodes ready for 30+ mins
13:56 <pabelanger> that shouldn't happen, because we set min-ready: 0, to avoid billing charges
13:57 <pabelanger> so, don't know if nodepool-launcher is overloaded, or we are somehow launching too many VMs against requests
13:57 <pabelanger> I've lowered max-ready-age to 30mins to help, but that means in some cases we now wait 30mins for jobs to run
13:58 <pabelanger> I'm thinking 5mins is likely a good time for max-ready-age, but really want to know why we are leaking them in the first place
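For context, min-ready and max-ready-age are per-label settings in the nodepool configuration, so the change pabelanger mentions would look roughly like the snippet below; the label name and exact values are illustrative:

    # Sketch of a nodepool label entry; max-ready-age is in seconds
    labels:
      - name: ansible-fedora-28-vexxhost
        min-ready: 0          # don't keep pre-launched nodes around (billing)
        max-ready-age: 1800   # delete a ready-but-unused node after 30 minutes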
14:00 <nhicher> pabelanger: there are a lot of errors in launcher.log, I will check what the issue is after my daily meeting
14:00 <pabelanger> nhicher: great, thanks
14:00 <nhicher> not only for ansible-network, but tripleo, rdocloud ...
14:02 <pabelanger> nhicher: is it quota errors?
14:08 <nhicher> pabelanger: ERROR nodepool.NodeLauncher-0000193913: Launch failed for node $uuid
14:09 <pabelanger> nhicher: is there a traceback?
14:11 <pabelanger> nhicher: is nodepool-launcher at 100% CPU by chance?
14:14 *** zoli|afk is now known as zoli
14:17 <nhicher> floating ip issue pabelanger :/
14:18 <pabelanger> nhicher: for vexxhost?
14:18 <pabelanger> we shouldn't be using FIPs there
14:18 <nhicher> no, for rdo-cloud*
14:18 <pabelanger> nhicher: ack, so are you seeing any errors for vexxhost specifically?
14:19 <nhicher> I have to check, the error was for ansible-network-vyos-1.1.8
14:19 <pabelanger> nhicher: ah, okay
14:19 <pabelanger> yes, that is still on rdocloud
14:20 <pabelanger> 1 sec, let me get you a UUID for vexxhost
14:20 <nhicher> right now, there are 4 nodes in-use for vexxhost
14:21 <pabelanger> nhicher: ansible-fedora-28-vexxhost-ansible-network-mtl1-0000194492
14:21 <pabelanger> nhicher: can you check the state changes for that in nodepool-launcher
14:21 <pabelanger> eg: I think that was ready for a good 20 mins
14:21 <pabelanger> but never allocated to zuul
14:21 <nhicher> https://softwarefactory-project.io/paste/show/1219/
14:23 <nhicher> https://softwarefactory-project.io/paste/show/1220/
14:24 <pabelanger> nhicher: yah, running jobs are working great. There seems to be some lag in state changes between nodepool and zuul.
14:24 <pabelanger> openstack says the node is ready, but nodepool or zuul doesn't see it as ready, I think
14:25 <pabelanger> So, trying to figure out if it is related to excess load on nodepool-launcher (100% cpu) or some other issue in nodepool / zuul
14:29 <nhicher> nodepool-launcher 30% cpu (we have 4 cores)
14:29 <nhicher> pabelanger: load average: 0.16, 0.24, 0.31
14:29 <pabelanger> nhicher: what about the nodepool-launcher pid?
14:30 <pabelanger> sadly it isn't multicore
14:46 <nhicher> pabelanger: we already had this issue https://tree.taiga.io/project/morucci-software-factory/issue/1561
14:47 <pabelanger> nhicher: yah, the launcher logs should help here.
14:47 <pabelanger> nhicher: 0000194492 should be a good example to look at
14:51 <nhicher> pabelanger: for 194492, nodepool started to build the node at 14:17:38 and zuul started the job at 14:18:19
14:51 <pabelanger> hmm
14:51 <pabelanger> nhicher: okay, let me try and reproduce
14:51 <pabelanger> need to get a few jobs into the check pipeline
14:52 <nhicher> pabelanger: https://softwarefactory-project.io/paste/show/1221/
14:54 <shanemcd-> Hi tristanC, apologies, I have been MIA for most of this week. Things have been hectic here getting ready for AnsibleFest. I will pick this up ASAP.
14:54 <shanemcd-> Thank you again for all your help on this.
15:00 <pabelanger> nhicher: classic heisenbug
15:01 <pabelanger> things seem to be working well now
15:01 <pabelanger> the next time it happens, I'll grab the uuid
15:03 <nhicher> pabelanger: ok
15:46 *** chandankumar is now known as chkumar|off
16:01 <sfbender> Fabien Boucher created software-factory/managesf master: wip - resources add cli tool  https://softwarefactory-project.io/r/13794
16:48 <matburt> tristanC if you get a chance can you take a look at this: https://ansible.softwarefactory-project.io/logs/09/2309/c781c6a116ce5bddf875379204187a0d86277de2/check/tox/015f1ef/ara-report/
16:55 <matburt> n/m... revoked sudo for zuul
16:57 <pabelanger> looks like a network issue
16:57 <pabelanger> https://ansible.softwarefactory-project.io/logs/09/2309/c781c6a116ce5bddf875379204187a0d86277de2/check/tox/015f1ef/job-output.txt.gz#_2018-09-27_16_45_22_770843
16:58 <pabelanger> oh, is this a VM?
16:58 <pabelanger> or oci
16:58 <pabelanger> static-ansible
16:59 <pabelanger> guessing a container
16:59 <pabelanger> matburt: I think the issue is, you don't have access to sudo there
16:59 <pabelanger> so anytime you use it, it will fail
16:59 <matburt> it actually was that zuul *had* access to sudo and couldn't revoke it on its own
17:00 <matburt> this is a GCE instance so google has special sudo groups that the user needed to be removed from
17:00 <pabelanger> okay, then we likely need to update the revoke-sudo role
17:01 <pabelanger> https://git.zuul-ci.org/cgit/zuul-jobs/tree/roles/revoke-sudo
17:02 <pabelanger> matburt: keep in mind, you might be the first person using GCE, so expect issues with some of the zuul-jobs
17:02 <matburt> yah, they are just static nodes
17:02 <matburt> trying to get our deps to line up
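The GCE wrinkle matburt hit is that the image's default user gets sudo through Google-managed groups rather than only through sudoers drop-ins, so removing sudoers files alone isn't enough. A hedged sketch of the kind of extra step the node preparation (or a revoke-sudo follow-up) might need; the group name google-sudoers is an assumption about GCE images, not something confirmed in the log:

    # Hypothetical extra step for GCE static nodes: drop the zuul user from
    # Google-managed sudo groups ("google-sudoers" is an assumed group name)
    - name: Remove zuul user from GCE sudo groups
      become: true
      command: "gpasswd -d {{ ansible_user }} google-sudoers"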
17:04 <pabelanger> matburt: there are VMs in rdocloud which will get the jobs working out of the box. you could start with them to get jobs working, then migrate once the jobs shake out
17:05 <matburt> That's what we did with runner, they were great but we were looking to get some dedicated capacity
17:05 <matburt> so we worked with tristanC to get these turned up
17:06 <matburt> With AWX we need some dependencies locally
17:06 <matburt> and some of our tests can be kind-of... heavy.
17:06 <pabelanger> there is also vexxhost to run some stuff too, we're using 2 regions there now but only in the ansible-network tenant atm. We still need to discuss some budget around that also
17:06 <pabelanger> matburt: what sort of VM requirements?
17:07 <matburt> We have 3x 4-core 16GB of memory systems
17:08 <pabelanger> matburt: okay, so jobs today need a 16GB system to run? That is an integration test right
17:08 <matburt> We're also working with tristanC to turn up a k8s nodepool... we have that active now, we'll start seeing about configuring that and getting it into y'all's infra
17:08 <matburt> Well, they could... good to have the breathing room
17:09 <matburt> a lot of awx development and testing runs out of containers and we'd love to run our smoke tests from k8s... this is a great opportunity to do that
17:09 <pabelanger> today in vexxhost we have capacity for 1vcpu/1gb and 4vcpu/4gb. We can launch more flavors, we just haven't because of the cost. But we have the ability to go all the way up to 64gb ram I think
17:09 <matburt> We have a good bit of wiggle room with what infrastructure we can bring up
17:10 <matburt> it's not unusual for us to bring up some extremely large instances for one-off testing
17:10 <pabelanger> matburt: yah, the only concern I have right now is that zuul itself hasn't landed container support yet. So this is all experimental; I know it will get landed upstream, but it's been in the works for some time. So, something to consider if you are hoping to base testing on that
17:11 *** jpena is now known as jpena|off
17:11 <pabelanger> https://review.openstack.org/#/c/560136/
17:11 <matburt> We're okay with being on the vanguard of that effort
17:11 <matburt> We can certainly *do* our testing in containers on the static nodes... it'd be fantastic if we could do it in k8s
17:13 <pabelanger> right, I would suggest maybe starting to work on your containers on VMs. get that 100%, then tristanC and the SF team land the upstream patches for k8s into zuul. I know people like bmw and godaddy are also looking for that support too
17:13 <pabelanger> and with fest next week, i think it's a great time to give that feedback to the zuul team
17:14 <pabelanger> talking with mnaser, vexxhost also has k8s capacity, which means we could launch the cluster there too
17:14 <pabelanger> even multi-region, if vexxhost supported it
17:18 <mnaser> i've been thinking of like
17:18 <mnaser> zun + kata
17:42 *** zoli is now known as zoli|gone
18:03 *** sshnaidm is now known as sshnaidm|off
