Thursday, 2018-09-27

01:11 *** rcarrillocruz has quit IRC
03:53 <sfbender> Tristan de Cacqueray created software-factory/sfinfo master: Add patternfly-react-ui-deps package  https://softwarefactory-project.io/r/13789
03:55 <sfbender> Merged software-factory/sfinfo master: Add patternfly-react-ui-deps package  https://softwarefactory-project.io/r/13789
05:41 <sfbender> Merged www.softwarefactory-project.io master: 2018-38 summary  https://softwarefactory-project.io/r/13706
06:06 *** sfbender has quit IRC
06:07 *** nijaba has quit IRC
06:08 *** nijaba has joined #softwarefactory
07:18 *** chkumar|off is now known as chandankumar
08:01 *** jpena|off is now known as jpena
08:16 <spredzy> tristanC: yo
08:17 <spredzy> If you're around, any way you can help me figure out why https://github.com/ansible/awx/pull/2309 isn't running the proper job on zuul?
08:17 <spredzy> I see the event being picked up by zuul, but it seems no job gets attached to it
08:20 <tristanC> spredzy: would you like to join mumble?
08:21 * spredzy joins
08:22 <spredzy> https://github.com/ansible/zuul-config/blob/master/zuul.d/projects.yaml#L7-L12
08:22 <spredzy> https://github.com/ansible/zuul-jobs/blob/master/zuul.d/templates.yaml
08:31 <tristanC> spredzy: i think you need https://ansible.softwarefactory-project.io/docs/zuul/user/config.html#attr-pragma.implied-branch-matchers
08:32 <tristanC> because zuul.layout: Project template <ProjectConfig python source: ansible/zuul-jobs/zuul.d/templates.yaml@master {ImpliedBranchMatcher:master}> did not match item
08:35 <tristanC> spredzy: https://ansible.softwarefactory-project.io/docs/zuul/user/config.html#attr-pragma.implied-branches
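The fix tristanC is pointing at is Zuul's pragma directive: by default, project templates defined in a project repo only match the branch they are defined on (here, master), which is why the template did not match the pull request. A minimal sketch of what a pragma stanza at the top of zuul.d/templates.yaml could look like; the branch names listed are illustrative, not taken from the actual repo:

    # Sketch only: stop applying an implied "master" branch matcher to the
    # templates defined in this file.
    - pragma:
        implied-branch-matchers: false

    # Or, explicitly list which branches the templates should match
    # (branch names here are placeholders):
    - pragma:
        implied-branches:
          - master
          - devel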
08:57 *** sfbender has joined #softwarefactory
08:57 <sfbender> Merged software-factory/managesf master: Fix issue where we expect to have the group resources  https://softwarefactory-project.io/r/13776
09:02 <sfbender> Merged software-factory/sf-config master: Fix config/sf-jobs location path gen for external gerrit  https://softwarefactory-project.io/r/13760
09:59 *** zoli is now known as zoli|lunch
09:59 *** zoli|lunch is now known as zoli
11:06 *** jpena is now known as jpena|lunch
11:42 <matburt> tristanC how's that nodepool config with the ansible static nodes looking?
11:49 <tristanC> matburt: it's looking like: https://softwarefactory-project.io/cgit/config/tree/nodepool/ansible.yaml
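For reference, a nodepool static-driver provider of the kind that ansible.yaml describes generally looks like the sketch below; the hostname, label, and username are placeholders rather than the actual production values:

    # Sketch of a nodepool static provider (names and keys are placeholders)
    providers:
      - name: ansible-static
        driver: static
        pools:
          - name: main
            nodes:
              - name: static01.example.com
                labels:
                  - ansible-static-node
                username: zuul
                host-key: "ssh-rsa AAAA..."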
11:50 <tristanC> matburt: shanemcd-: i sent you a mail about it btw (titled "Ansible zuul/nodepool setup with AWS"), but it seems it didn't reach any of the recipients' inboxes :(
11:51 <matburt> let me look
12:01 <matburt> hah I do have it... not sure why I glossed over it.
12:02 <sfbender> Merged www.softwarefactory-project.io master: Add Kubernetes Nodepool Driver blog post  https://softwarefactory-project.io/r/13725
12:03 <matburt> tristanC given that we have the template defined in zuul-jobs (and no current definition on runner or awx) can you assign the nodeset to a project template?
12:04 <tristanC> matburt: yes, you can set the "nodeset" job attribute similarly to the "vars", e.g. here: https://github.com/ansible/zuul-jobs/blob/master/zuul.d/templates.yaml#L12
12:05 <tristanC> or here is another example: https://softwarefactory-project.io/cgit/DLRN/tree/.zuul.yaml#n53
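What tristanC describes, setting "nodeset" on a job variant inside a project template the same way "vars" is set, would look roughly like this; the template, job, and label names are illustrative only:

    # Illustrative project-template; all names and labels are placeholders
    - project-template:
        name: ansible-tox-jobs
        check:
          jobs:
            - tox-linters:
                nodeset:
                  nodes:
                    - name: test-node
                      label: ansible-static-node
                vars:
                  tox_envlist: linters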
12:05 <tristanC> spredzy: not sure what happened, but https://github.com/ansible/zuul-jobs/commits/master 503 :)
12:06 <tristanC> oh nevermind, it's now loading
12:17 <matburt> https://github.com/ansible/zuul-jobs/pull/12
12:18 <matburt> let me rekick my awx job... which is busted now, but I want to see what's needed on the static nodes
12:19 <tristanC> matburt: we might want to adapt the base job to make sure the src directory is absent before copying the workspace, i don't think it's actually cleaned otherwise
12:19 <matburt> hmm sounds good
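Because the workspace persists between builds on a static node, the base job's pre playbook would need a cleanup step of roughly this shape before the repos are copied; the path is an assumption based on the usual Zuul work directory layout:

    # Sketch: remove a stale src/ tree from a previous build
    # (the path is an assumption, not confirmed in the log)
    - name: Clean up leftover source directory on static node
      file:
        path: "{{ ansible_user_dir }}/src"
        state: absent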
12:19 <matburt> I have to hop on a meeting, I'll be back in a few
12:22 <matburt> scratch that... no meeting.
12:22 <matburt> I reckon I'm going to need some dependencies installed on the static nodes
12:24 <tristanC> matburt: we should keep static node customisation in a playbook to be merged in ansible/zuul-config/nodepool, next to the script i proposed to create the k8s image
12:24 <tristanC> matburt: then we could have a post and periodic job that runs it
12:25 <matburt> agreed
12:25 <tristanC> matburt: we actually do that for the runC slave, a similar job can be added to the ansible/zuul-config post pipeline
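The wiring tristanC has in mind would roughly be a job in ansible/zuul-config that runs the customization playbook, attached to the post and periodic pipelines. Everything below is a hypothetical sketch: the job and playbook names are invented, and how the job reaches the static hosts (a nodeset versus a generated inventory) is left open:

    # Hypothetical sketch of a post/periodic customization job
    - job:
        name: ansible-static-customize
        run: nodepool/static-customize.yaml

    - project:
        post:
          jobs:
            - ansible-static-customize
        periodic:
          jobs:
            - ansible-static-customize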
12:27 <tristanC> matburt: shanemcd-: btw, the blogpost about the k8s nodepool driver is now published here: https://www.softwarefactory-project.io/kubernetes-nodepool-driver-tech-preview.html
12:28 <matburt> nice... I want to dig into that more for our smoke tests tristanC
12:29 <matburt> which I'll need to turn up pretty soon
12:30 <matburt> This week the goal is to get linters and unit/functional tests running. Next week shanemcd- and I are going to be at AnsibleFest
12:31 <tristanC> matburt: sure, well i haven't added the k8s provider to the production nodepool, it's only working in my sandbox. let me know when you are ready to use it and we'll enable it on sf-project.io
12:31 <matburt> tristanC do you think that k8s job should merge?
12:32 <tristanC> what do you mean by "should merge"?
12:32 <matburt> okay gotcha. I don't want to rock the boat... if we have some time we might be able to turn it up this week (in the short time we have left), otherwise we might wait until after fest.
12:32 <matburt> https://github.com/ansible/zuul-config/pull/21
12:32 <matburt> that's passing the checks and just waiting to merge
12:33 <tristanC> matburt: actually that job should be marked as "abstract", it doesn't have a run phase and doesn't do anything, it's meant to be used as a parent job for your smoke tests job
12:33 <matburt> gotcha
12:34 <tristanC> matburt: we could merge and iterate over the script, or keep it open until we are satisfied with the content, i don't mind either way
12:34 <matburt> it looks like this just produces an image so it might not necessarily need to run on every build?
12:34 *** jpena|lunch is now known as jpena
12:35 <matburt> I might need to look into how abstract jobs work and parent jobs and how those work together with regular jobs
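An abstract job of the kind tristanC describes only carries shared setup (for example the nodeset and a pre-run playbook); it cannot be run directly, and concrete child jobs inherit from it and supply the run phase. A minimal sketch with all names invented for illustration:

    # Sketch: abstract parent carrying the shared setup
    - job:
        name: awx-smoke-base
        abstract: true
        pre-run: playbooks/smoke-pre.yaml
        nodeset:
          nodes:
            - name: container
              label: pod-fedora

    # Concrete child job that can actually be attached to a pipeline
    - job:
        name: awx-smoke
        parent: awx-smoke-base
        run: playbooks/smoke.yaml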
12:35 <tristanC> matburt: the awx-test-image.sh isn't actually used, it's just the recipe i used for docker://docker.io/softwarefactoryproject/awx-test-image
12:36 <matburt> yep indeed
12:36 <matburt> also a good chance to dive into buildah ;)
12:36 <matburt> Oh I see... base-pod
12:43 * spredzy would need to dive into it too - never used it up until now
12:44 <matburt> tristanC do you have some examples of how y'all typically prepare nodepool members for use? Something I could borrow for inspiration when putting together the static nodepool systems
12:49 <tristanC> matburt: it's not pretty, but we run these tasks on the runC host: https://softwarefactory-project.io/cgit/config/tree/nodepool/runC/customize.yaml
12:49 <matburt> excellent, I appreciate that
12:50 <tristanC> you could write a static-customize playbook that runs on an "ansible-static" group, then we could generate the inventory out of that list: https://softwarefactory-project.io/cgit/config/tree/nodepool/ansible.yaml
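The static-customize playbook tristanC suggests would target the "ansible-static" inventory group and install whatever the AWX test jobs need on those hosts. This is only a sketch: the file name and package list are invented for illustration:

    # Hypothetical static-customize.yaml; package names are placeholders
    - hosts: ansible-static
      become: true
      tasks:
        - name: Install build and test dependencies for AWX jobs
          package:
            name:
              - git
              - gcc
              - python3-devel
              - libffi-devel
            state: present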
13:05 *** zoli is now known as zoli|afk
13:43 <pabelanger> tristanC: nhicher: we have 6 nodes marked ready in nodepool, but zuul isn't using them
13:43 <pabelanger> can you look at nodepool-launcher and see why?
13:46 <pabelanger> remote:   https://softwarefactory-project.io/r/13792 Drop max-ready-age to 30mins
13:47 <pabelanger> I am going to approve ^ to see if we properly clean them up
13:47 <pabelanger> but, I also think we are reaching the limits of a single nodepool-launcher
13:48 <pabelanger> I'd love for us to create nl02 for software-factory
13:54 <pabelanger> okay, nodepool managed to clean them up
13:54 <pabelanger> but would be interested to see why we are leaking them to start with
13:55 <nhicher> pabelanger: there is a vexxhost-ansible-network-mtl1 node in-use
13:56 <pabelanger> nhicher: yah, before that we had 5 vexxhost-ansible-network-sjc1 nodes ready for 30+ mins
13:56 <pabelanger> that shouldn't happen, because we set min-ready: 0, to avoid billing charges
13:57 <pabelanger> so, don't know if nodepool-launcher is overloaded, or we are somehow launching too many VMs against requests
13:57 <pabelanger> I've lowered max-ready-age to 30mins to help, but that means in some cases we now wait 30mins for jobs to run
13:58 <pabelanger> I'm thinking 5mins is likely a good time for max-ready-age, but really want to know why we are leaking them in the first place
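For context, min-ready and max-ready-age are per-label settings in the nodepool configuration, so the change pabelanger mentions would look roughly like the snippet below; the label name and exact values are illustrative:

    # Sketch of a nodepool label entry; max-ready-age is in seconds
    labels:
      - name: ansible-fedora-28-vexxhost
        min-ready: 0          # don't keep pre-launched nodes around (billing)
        max-ready-age: 1800   # delete a ready-but-unused node after 30 minutes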
14:00 <nhicher> pabelanger: there are a lot of errors in launcher.log, I will check what the issue is after my daily meeting
14:00 <pabelanger> nhicher: great, thanks
14:00 <nhicher> not only for ansible-network, but tripleo, rdocloud ...
14:02 <pabelanger> nhicher: is it quota errors?
14:08 <nhicher> pabelanger: ERROR nodepool.NodeLauncher-0000193913: Launch failed for node $uuid
14:09 <pabelanger> nhicher: is there a traceback?
14:11 <pabelanger> nhicher: is nodepool-launcher at 100% CPU by chance?
14:14 *** zoli|afk is now known as zoli
14:17 <nhicher> floating ip issue pabelanger :/
14:18 <pabelanger> nhicher: for vexxhost?
14:18 <pabelanger> we shouldn't be using FIPs there
14:18 <nhicher> no, for rdo-cloud*
14:18 <pabelanger> nhicher: ack, so are you seeing any errors for vexxhost specifically?
14:19 <nhicher> I have to check, the error was for ansible-network-vyos-1.1.8
14:19 <pabelanger> nhicher: ah, okay
14:19 <pabelanger> yes, that is still on rdocloud
14:20 <pabelanger> 1 sec, let me get you a UUID for vexxhost
14:20 <nhicher> right now, there are 4 nodes in-use for vexxhost
14:21 <pabelanger> nhicher: ansible-fedora-28-vexxhost-ansible-network-mtl1-0000194492
14:21 <pabelanger> nhicher: can you check the state changes for that in nodepool-launcher
14:21 <pabelanger> eg: I think that was ready for a good 20 mins
14:21 <pabelanger> but never allocated to zuul
14:21 <nhicher> https://softwarefactory-project.io/paste/show/1219/
14:23 <nhicher> https://softwarefactory-project.io/paste/show/1220/
14:24 <pabelanger> nhicher: yah, running jobs are working great. There seems to be some lag in state changes between nodepool and zuul.
14:24 <pabelanger> openstack says the node is ready, but nodepool or zuul doesn't see it as ready, I think
14:25 <pabelanger> So, trying to figure out if it is related to excess load on nodepool-launcher (100% cpu) or some other issue in nodepool / zuul
14:29 <nhicher> nodepool-launcher 30% cpu (we have 4 cores)
14:29 <nhicher> pabelanger: load average: 0.16, 0.24, 0.31
14:29 <pabelanger> nhicher: what about the nodepool-launcher pid?
14:30 <pabelanger> sadly it isn't multicore
14:46 <nhicher> pabelanger: we already had this issue https://tree.taiga.io/project/morucci-software-factory/issue/1561
14:47 <pabelanger> nhicher: yah, the launcher logs should help here.
14:47 <pabelanger> nhicher: 0000194492 should be a good example to look at
14:51 <nhicher> pabelanger: for 194492, nodepool started to build the node at 14:17:38 and zuul started the job at 14:18:19
14:51 <pabelanger> hmm
14:51 <pabelanger> nhicher: okay, let me try and reproduce
14:51 <pabelanger> need to get a few jobs into the check pipeline
14:52 <nhicher> pabelanger: https://softwarefactory-project.io/paste/show/1221/
14:54 <shanemcd-> Hi tristanC, apologies, I have been MIA for most of this week. Things have been hectic here getting ready for AnsibleFest. I will pick this up ASAP.
14:54 <shanemcd-> Thank you again for all your help on this.
15:00 <pabelanger> nhicher: classic heisenbug
15:01 <pabelanger> things seem to be working well now
15:01 <pabelanger> the next time it happens, I'll grab the uuid
15:03 <nhicher> pabelanger: ok
15:46 *** chandankumar is now known as chkumar|off
16:01 <sfbender> Fabien Boucher created software-factory/managesf master: wip - resources add cli tool  https://softwarefactory-project.io/r/13794
16:48 <matburt> tristanC if you get a chance can you take a look at this: https://ansible.softwarefactory-project.io/logs/09/2309/c781c6a116ce5bddf875379204187a0d86277de2/check/tox/015f1ef/ara-report/
16:55 <matburt> n/m... revoked sudo for zuul
16:57 <pabelanger> looks like a network issue
16:57 <pabelanger> https://ansible.softwarefactory-project.io/logs/09/2309/c781c6a116ce5bddf875379204187a0d86277de2/check/tox/015f1ef/job-output.txt.gz#_2018-09-27_16_45_22_770843
16:58 <pabelanger> oh, is this a VM?
16:58 <pabelanger> or oci
16:58 <pabelanger> static-ansible
16:59 <pabelanger> guessing a container
16:59 <pabelanger> matburt: I think the issue is, you don't have access to sudo there
16:59 <pabelanger> so anytime you use it, it will fail
16:59 <matburt> it actually was that zuul *had* access to sudo and couldn't revoke it on its own
17:00 <matburt> this is a GCE instance so google has special sudo groups that the user needed to be removed from
17:00 <pabelanger> okay, then we likely need to update the revoke-sudo role
17:01 <pabelanger> https://git.zuul-ci.org/cgit/zuul-jobs/tree/roles/revoke-sudo
17:02 <pabelanger> matburt: keep in mind, you might be the first person using GCE, so expect issues with some of the zuul-jobs
17:02 <matburt> yah, they are just static nodes
17:02 <matburt> trying to get our deps to line up
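The GCE wrinkle matburt hit is that the image's default user gets sudo through Google-managed groups rather than only through sudoers drop-ins, so removing sudoers files alone isn't enough. A hedged sketch of the kind of extra step the node preparation (or a revoke-sudo follow-up) might need; the group name google-sudoers is an assumption about GCE images, not something confirmed in the log:

    # Hypothetical extra step for GCE static nodes: drop the zuul user from
    # Google-managed sudo groups ("google-sudoers" is an assumed group name)
    - name: Remove zuul user from GCE sudo groups
      become: true
      command: "gpasswd -d {{ ansible_user }} google-sudoers"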
17:04 <pabelanger> matburt: there are VMs in rdocloud which will get the jobs working out of the box. you could start with them to get jobs working, then migrate once the jobs shake out
17:05 <matburt> That's what we did with runner, they were great but we were looking to get some dedicated capacity
17:05 <matburt> so we worked with tristanC to get these turned up
17:06 <matburt> With AWX we need some dependencies locally
17:06 <matburt> and some of our tests can be kind-of... heavy.
17:06 <pabelanger> there is also vexxhost to run some stuff too, we're using 2 regions there now but only in the ansible-network tenant atm. We still need to discuss some budget around that also
17:06 <pabelanger> matburt: what sort of VM requirements?
17:07 <matburt> We have 3x 4-core 16GB of memory systems
17:08 <pabelanger> matburt: okay, so jobs today need a 16GB system to run? That is an integration test right
17:08 <matburt> We're also working with tristanC to turn up a k8s nodepool... we have that active now, we'll start seeing about configuring that and getting it into y'all's infra
17:08 <matburt> Well, they could... good to have the breathing room
17:09 <matburt> a lot of awx development and testing runs out of containers and we'd love to run our smoke tests from k8s... this is a great opportunity to do that
17:09 <pabelanger> today in vexxhost we have capacity for 1vcpu/1gb and 4vcpu/4gb. We can launch more flavors, we just haven't because of the cost. But we have the ability to go all the way up to 64gb ram I think
17:09 <matburt> We have a good bit of wiggle room with what infrastructure we can bring up
17:10 <matburt> it's not unusual for us to bring up some extremely large instances for one-off testing
17:10 <pabelanger> matburt: yah, the only concern I have right now is that zuul itself hasn't landed container support yet. So this is all experimental; I know it will get landed upstream, but it's been in the works for some time. So, something to consider if you are hoping to base testing on that
17:11 *** jpena is now known as jpena|off
17:11 <pabelanger> https://review.openstack.org/#/c/560136/
17:11 <matburt> We're okay with being on the vanguard of that effort
17:11 <matburt> We can certainly *do* our testing in containers on the static nodes... it'd be fantastic if we could do it in k8s
17:13 <pabelanger> right, I would suggest maybe starting to work on your containers on VMs. get that 100%, then tristanC and the SF team land the upstream patches for k8s into zuul. I know people like bmw and godaddy are also looking for that support too
17:13 <pabelanger> and with fest next week, i think it's a great time to give that feedback to the zuul team
17:14 <pabelanger> talking with mnaser, vexxhost also has k8s capacity, which means we could launch the cluster there too
17:14 <pabelanger> even multi-region, if vexxhost supported it
17:18 <mnaser> i've been thinking of like
17:18 <mnaser> zun + kata
17:42 *** zoli is now known as zoli|gone
18:03 *** sshnaidm is now known as sshnaidm|off
