Tuesday, 2018-10-09

tristanCspredzy: let's just remove the zuul src dir before the job starts, and that's it00:25
tristanCpabelanger: why would you like to redesign the job for static nodes?00:27
tristanCpabelanger: static driver is merged in nodepool, are you saying we shouldn't use it?!00:27
pabelangertristanC: right, I'm suggesting we just use a node from nodepool for tox jobs, since that is how we designed them upstream. I am sure we can get them working on a static node, but if awx first moves to a VM, that will unblock the testing with zuul.00:36
tristanCspredzy: https://github.com/ansible/zuul-config/pull/33 should fix static node cleanup00:45
pabelangerwe really should be first proposing changes to base-minimal-test, to avoid merging untested code00:49
pabelangerbut seems ansible/zuul-config doesn't have a job for that00:49
pabelangerhttps://github.com/ansible/zuul-config/commit/7bb2282039c7d3241f9503916437d7679bb1ffa5#diff-dbd9cd0ce7e9a5628770143f1488ff59L4800:50
pabelangerthat should be reverted00:50
pabelangerI'll be online tomorrow to chat with awx team. Chat more in the morning00:51
matburtpabelanger tristanC Thanks for yalls help... we're just having some trouble with the checkout dir hanging around between test runs00:51
*** tristanC has quit IRC01:01
*** tristanC has joined #softwarefactory01:01
*** nilashishc has joined #softwarefactory04:36
*** nilashishc has quit IRC06:02
*** nilashishc has joined #softwarefactory06:04
*** nilashishc has quit IRC06:06
*** nilashishc has joined #softwarefactory06:06
tristanCthere are now 7 executors on sf-project.io and we updated the zuul webui to the new patternfly-react interface07:41
*** jangutter has quit IRC07:55
*** jangutter has joined #softwarefactory07:55
*** sshnaidm|afk is now known as sshnaidm08:29
spredzytristanC: looks good with the new patternfly-react interface :)08:35
tristanCspredzy: shouldn't we give https://github.com/ansible/zuul-config/pull/33/files a try? (e.g. merge, check if it works, revert if not, move-on if it worked...)09:05
spredzywait, let me put it on WIP. Waiting to hear back from pabelanger on an alternative approach today09:09
spredzyOopsie, that's not my PR but yours :)09:10
spredzytristanC: but we can still go with it - just to prove it's working. I mean even in non-static, worst case scenario the '{{ ansible_user_dir }}/src' doesn't exist anymore09:11
spredzys/anymore//09:11
*** zoli is now known as zoli|afk09:12
*** zoli|afk is now known as zoli09:12
spredzytristanC: I'll merge it09:13
tristanCspredzy: i think that's fair09:13
ganeshrn https://softwarefactory-project.io/r/#/c/13885/ <-- can someone please review and merge this PR09:14
* tristanC waiting for a recheck09:14
tristanCganeshrn: done09:15
ganeshrntristanC: cool, thanks!09:15
tristanCspredzy: seems to be good, 2018-10-09 09:15:54.352773 | TASK [Clean workspace]    static | changed09:16
tristanCspredzy: i rechecked a runc job to make sure base still works09:17
sfbenderMerged software-factory/sf-config master: zuul: adapt gateway rewrite for React interface  https://softwarefactory-project.io/r/1372709:17
tristanCspredzy: all good, container | ok09:18
tristanCspredzy: oops, actually: https://ansible.softwarefactory-project.io/logs/66/2266/d68989bfd8f34a572f383d42daaf22aaf037d2aa/check/tox-awx-ui/ddd2d79/log-classify.html09:19
spredzytristanC: we can use become: True at that stage09:19
spredzyrevoke-sudo hasn't been called yet09:20
spredzyif I am not mistaken09:20
tristanCspredzy: hum, but can zuul become on static node? revoke-sudo would be permanent...09:20
spredzycorrect sorry09:21
tristanCif that's the case, then those files are leftovers which need to be manually cleaned09:22
spredzytristanC: can we not revoke-sudo ?09:22
tristanCspredzy: yes sure, but then zuul won't be able to sudo on subsequent run...09:22
spredzyat least until we work out why those files are root and have the necessary change in to make them zuul:zuul09:23
sfbenderMerged software-factory/sf-ci master: zuul: fix gateway test for React interface  https://softwarefactory-project.io/r/1372809:23
spredzytristanC: not following09:23
tristanCspredzy: what's the owner of 35.230.187.160:/home/zuul/src/github.com/ansible/awx/awx.egg-info/PKG-INFO ?09:23
spredzyif we don't revoke-sudo, so basically leave zuul in the sudoers file09:23
spredzywhat happens with the subsequent run?09:23
tristanCspredzy: if you revoke sudo, then the zuul user of the static node won't be able to sudo09:23
spredzyI said "not revoking" it09:23
spredzyso zuul can always become: True09:24
spredzyuntil we manage file permissions properly09:24
tristanCspredzy: hum, then you need that job: https://review.openstack.org/#/c/593150/09:25
tristanCspredzy: but you shouldn't let zuul sudo on a static node, it should be restricted to user commands only, since sudo can easily leave persistent stuff behind...09:26
spredzyAgree, but currently our CI is blocked because of tests currently not passing. What I'd like to get is tests passing so CI is unlocked, and then work to make it clean (proper permission, new base-job if necessary, ...)09:27
spredzySo revoke-sudo should be put back by end of week09:27
tristanCspredzy: in https://github.com/ansible/zuul-jobs/blob/master/zuul.d/jobs.yaml#L4, add a run: playbook that only uses the tox role09:28
tristanCspredzy: similar to https://review.openstack.org/#/c/593150/1/playbooks/tox-with-sudo/run.yaml09:28
tristanCspredzy: otherwise, the default tox jobs do: https://git.zuul-ci.org/cgit/zuul-jobs/tree/playbooks/tox/run.yaml09:29
tristanCwhich calls revoke-sudo by default09:29
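
The run playbook suggested above, one that applies only the tox role and so skips revoke-sudo, might look roughly like the sketch below. The job name, parent, and playbook path are assumptions (the path is borrowed from the linked review URL); none of them are taken from the actual ansible/zuul-jobs repository.

```yaml
# playbooks/tox-with-sudo/run.yaml (path assumed from the linked review)
# Applies only the tox role; per the discussion, the default tox
# run playbook also calls revoke-sudo.
- hosts: all
  roles:
    - tox
---
# zuul.d/jobs.yaml (job name and parent are hypothetical)
- job:
    name: tox-with-sudo
    parent: tox
    # A child job's run playbook replaces the one inherited from its parent.
    run: playbooks/tox-with-sudo/run.yaml
```
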
spredzyGot it09:29
tristanCspredzy: you don't want to put back revoke-sudo, you should just remove zuul from sudoers, no need to let the job do that09:29
spredzyDo you have access to the node?09:29
tristanCspredzy: nop09:30
tristanCspredzy: https://github.com/ansible/zuul-config/pull/34 should work09:31
tristanCi added ignore_errors to avoid failures when zuul is no longer sudoer09:32
tristanCwhen zuul is no longer sudoer, then we should manually clean /home/zuul/src from all nodes and revert that pull/34 change09:32
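
A workspace cleanup along the lines discussed here could be sketched as the following Ansible play. This is an assumption about the shape of the change, since the content of pull/34 is not shown in the log; only the task name, the src path, `become` and `ignore_errors` come from the conversation.

```yaml
# Hypothetical pre-run play; not the actual content of pull/34.
- hosts: all
  tasks:
    - name: Clean workspace
      file:
        path: "{{ ansible_user_dir }}/src"
        state: absent
      # Root-owned leftovers need privilege escalation to be removed.
      become: true
      # Do not fail the job once zuul is no longer in sudoers.
      ignore_errors: true
```
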
*** nilashishc has quit IRC09:34
spredzytristanC: my concern is that since it's a static node, sudo has already been revoked as of now09:35
spredzySo is zuul still in the sudoers for the next run ?09:35
tristanClet's look at the logs, i think revoke-sudo only checks a special file09:35
tristanCspredzy: https://ansible.softwarefactory-project.io/logs/66/2266/d68989bfd8f34a572f383d42daaf22aaf037d2aa/check/tox-awx-api/fc1b82f/ara-report/result/be45dc48-dbc8-4f17-b938-e85dc6da66a5/09:36
spredzyhttps://github.com/openstack-infra/zuul-jobs/blob/master/roles/revoke-sudo/tasks/main.yaml#L6-L1109:36
spredzyso currently zuul is not in the sudoers anymore09:37
tristanCspredzy: heh :) then who has access to the node and can add zuul back?09:38
tristanCspredzy: in the meantime, you should fix that awx-tox job to not do the revoke-sudo :)09:38
spredzyhttps://github.com/ansible/zuul-jobs/pull/22/files09:39
spredzydone09:39
spredzyI have a PR from awx depending on that one09:39
spredzyLet me see if I can get to the node09:39
spredzyI am on the nodes09:42
spredzyUpdating sudoers.d09:42
tristanCspredzy: ok, then don't forget about https://github.com/ansible/zuul-config/pull/34 , i'm going afk, bbl09:43
spredzymerged09:44
spredzyack, catch you later09:44
spredzythanks for your help09:44
*** zoli is now known as zoli|lunch10:26
*** zoli|lunch is now known as zoli10:26
*** nilashishc has joined #softwarefactory11:27
*** jangutter has quit IRC11:29
*** jangutter has joined #softwarefactory11:30
*** chandankumar has joined #softwarefactory12:19
pabelangertristanC: thanks, just seen a job run on new executor.  Do you have an ETA when we can start to scale out nodepool-launcher / nodepool-builders? Also, we likely want to start thinking of dedicated zookeeper also12:33
pabelangertristanC: http://paste.openstack.org/show/731761/12:39
pabelangerseems console.html is no longer closing properly12:39
pabelangernow when left over, and job finished, keeps saying --- END OF STREAM ---12:39
pabelangereg12:39
pabelangerhttps://ansible-network.softwarefactory-project.io/zuul/stream/1dd29174f2044ec8834775e586886853?logfile=console.log12:39
sfbenderMerged software-factory/sf-config master: config-update: skip host only running mergers or executors  https://softwarefactory-project.io/r/1327213:15
tristanCpabelanger: would you mind opening a story with suggestion/improvement?13:16
tristanCpabelanger: no eta for more scale out, i don't know if this has even been planned13:16
pabelangerokay, we're going to need a scale out soon. nodepool-launcher doesn't handle multiple providers well13:18
pabelangerI suspect it is already at 100% cpu13:18
tristanCpabelanger: zs, which runs zookeeper, nodepool-launcher and zuul-scheduler is currently at 91% idle: https://softwarefactory-project.io/grafana/?panelId=28239&fullscreen&orgId=1&var-datasource=default&var-server=zs.softwarefactory-project.io&var-inter=$__auto_interval_inter&from=now-24h&to=now13:20
pabelangertristanC: how many cores is the server?13:22
pabelangeriirc nodepool-launcher isn't multi core13:23
tristanCpabelanger: i think the issue is that nodepool tries every provider for each request, even the ones that don't have the requested label registered13:23
pabelangeryah, that sounds familiar13:23
tristanCpabelanger: isn't nodepool multi-threaded?13:23
pabelangerall i know, the more providers we add to nodepool-launcher, the slower it will eventually get13:23
tristanCpabelanger: which i think can be fixed in nodepool by being more clever about assigning providers to requests13:24
tristanCpabelanger: zs has 4 vcpu13:25
pabelangerYup, I think it should also be fixed. but for now, scaling out nodepool-launchers per tenant, could be another option13:25
pabelangerbut understand that means updates for sf13:25
tristanCpabelanger: yes, and a bad user experience as each launcher needs a custom configuration...13:25
pabelangeryah, upstream we have per launch configs13:26
tristanCpabelanger: another neat improvement would be to have per-tenant launcher, then it would make sense to run a nodepool-launcher on ansible-network.sf-project.io13:28
pabelangertristanC: right, I suggested that to nhicher last week. If we could run our own launcher, that would be great. But means it would need access to zookeeper13:29
pabelangersomething I'd like to discuss in Berlin13:30
pabelangeralong with regional zuul-executors, I think rcarrillocruz is going to pick that up again13:30
*** nilashishc has quit IRC14:37
*** zoli is now known as zoli|gone16:04
*** zoli|gone is now known as zoli16:04
pabelangeron the list of things to automate17:11
pabelangerwrong window17:12
*** sfbender has quit IRC19:56
spredzyis it me or RDO-cloud seems down?19:58
nhicherspredzy, pabelanger: rdocloud outage, sf and review.rdo are not reachable20:04
pabelangernhicher: thanks20:04
pabelangerreally looking forward to the day the control plane moves out of rdocloud20:04
spredzymerci nico20:06
spredzyHopefully, with Ansible (and ideally more teams) relying on it, that will help get funding for that to happen20:07
nhicherspredzy, pabelanger rdocloud is back20:57
matburtlooks like our jobs are hung on the ansible tenant :/22:51
*** sshnaidm is now known as sshnaidm|afk23:32
dmsimardI'd look but I don't have access to that one :/23:50
