Wednesday, 2018-07-04

00:04 <tristanC> gundalow: on another topic, we'll soon upgrade the ansible.sf-project.io host to the latest version of software-factory (3.1)
00:05 <tristanC> gundalow: in that version, the ansible*/zuul-config project will now host that file: https://softwarefactory-project.io/cgit/config/tree/zuul/ansible_networking.yaml
00:05 <gundalow> Cool. Is there an email list I should subscribe to for planned upgrade/outage notices?
00:06 <tristanC> gundalow: we don't have such a mailing list yet; there shouldn't be any outages
00:06 <gundalow> :)
00:06 <tristanC> gundalow: the upgrade will propose an update to the zuul-config project, though, so there will be a PR to accept before the new version takes effect
00:07 <gundalow> cool, will keep an eye out for that. Thanks for the heads-up
00:07 <tristanC> it seems like we could do that in a couple of weeks, one month tops
00:39 <sfbender> Paul Belanger created software-factory/sf-config master: Fix grapaha graph for executor memory usage  https://softwarefactory-project.io/r/12862
00:46 <sfbender> Paul Belanger created software-factory/sf-config master: Add executor HDD usage to zuul-status graph  https://softwarefactory-project.io/r/12863
01:03 <sfbender> Paul Belanger created software-factory/sf-config master: Add max_servers metric to nodepool test nodes graph  https://softwarefactory-project.io/r/12864
01:06 <sfbender> Merged www.softwarefactory-project.io master: Add 3.0 release note for new sf-config and acme-tiny version  https://softwarefactory-project.io/r/12826
01:14 <tristanC> logan-: i published new sf-config and acme-tiny packages in the 3.0 release repository. This should fix the bug you reported, thanks for the feedback! (release notes: http://www.softwarefactory-project.io/releases/3.0/)
01:21 *** caphrim007 has joined #softwarefactory
01:26 *** caphrim007 has quit IRC
01:35 *** caphrim007 has joined #softwarefactory
02:04 *** Guest38444 has quit IRC
02:08 *** Guest38444 has joined #softwarefactory
02:53 <sfbender> Tristan de Cacqueray created software-factory/sf-config master: zuul: install missing packages for config-check  https://softwarefactory-project.io/r/12865
03:01 *** caphrim007_ has joined #softwarefactory
03:04 *** caphrim007 has quit IRC
03:04 <sfbender> Tristan de Cacqueray created software-factory/sf-ci master: Switch back to base job since log-classify is now integrated  https://softwarefactory-project.io/r/12866
03:32 <logan-> awesome tristanC, thanks for the follow-up. i'm interested in deploying 3.1 and trying out a private config repo. is this the process I should be looking at, using zuul_rpm_build.py? https://softwarefactory-project.io/docs/contributor/prepare_dev_environment.html
03:36 <tristanC> logan-: you could give the current 3.1 candidate a try by running this task: https://softwarefactory-project.io/paste/show/1128/
03:36 <tristanC> then continue with the update process as documented here: https://softwarefactory-project.io/docs/operator/upgrade.html
03:36 <tristanC> e.g. yum update sf-config && sfconfig --upgrade
03:37 <logan-> thanks!
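
A condensed sketch of the upgrade path described above, for reference. The linked paste is assumed to switch the host to the 3.1 candidate repository (its contents are not reproduced in this log); the last two commands follow the operator upgrade documentation quoted above.

    # first enable the 3.1 candidate repository (assumed to be what the paste at
    # softwarefactory-project.io/paste/show/1128/ does; exact steps not shown here)
    sudo yum update -y sf-config    # pull the new sf-config package
    sudo sfconfig --upgrade         # run the upgrade playbooks
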
03:38 <tristanC> though note that the private config repo has not been tested, so there are probably still some issues with it
03:39 <tristanC> for example, we need a toggle to restrict the default acl here: https://softwarefactory-project.io/cgit/software-factory/sf-config/tree/ansible/roles/sf-repos/templates/config/resources/_internal.yaml.j2#n59
03:39 <tristanC> and this task also needs to be updated: https://softwarefactory-project.io/cgit/software-factory/sf-config/tree/ansible/roles/sf-repos/tasks/fetch_config_repo.yml#n5
03:39 <tristanC> (the current process is to fetch the config repo on every host to apply new config, and this assumes public access to the repo)
03:40 <tristanC> so to enable a private config repo, we'd have to set up the access key on every host managed by sfconfig
03:40 <tristanC> or we could change the logic and push the config repo content from the install-server instead of pulling
03:41 <logan-> yeah, similar to how prepare-workspace pushes the repos
03:43 <tristanC> basically, any task using config_public_location needs to be fixed
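
A rough sketch of the push-based alternative mentioned above (config repo content pushed from the install-server rather than pulled on every host), similar in spirit to how the prepare-workspace role pushes repos to test nodes. This is not the actual sf-config task; the paths shown are hypothetical.

    # hypothetical replacement for fetch_config_repo.yml: rsync the install-server's
    # local checkout of the config repo to each managed host instead of cloning it
    # over config_public_location (which requires public access to the repo).
    - name: Push the config repo checkout to every managed host
      synchronize:                  # runs rsync from the install-server (the ansible controller)
        src: /root/config/          # hypothetical checkout path on the install-server
        dest: /root/config/
        delete: yes
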
03:45 <tristanC> logan-: also, even if we support a private config repo (e.g. in gerrit), zuul may still leak its content, e.g. config-check and config-update job logs will be visible on the zuul status page and in the builds history
03:46 <logan-> good point
03:56 <tristanC> logan-: that can also be parametrized, e.g. if the private config option (TBD) is set, then we could make the task no_log and keep the artifacts locally on the executor
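
A minimal sketch of that parametrization; the option name config_private is hypothetical (the real name is marked TBD above), and the command is only a placeholder for the real config-update tasks.

    # hypothetical: suppress task output when the config repo is private, so the
    # config-check / config-update job logs don't leak repo content.
    - name: Apply the updated configuration
      command: placeholder-config-update-command   # stands in for the real task
      no_log: "{{ config_private | default(false) }}"
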
03:57 <tristanC> feel free to try the 3.1 candidate version though, it still adds many great new features :-)
04:04 <logan-> will do!
06:08 <sfbender> Tristan de Cacqueray created software-factory/sf-config master: nodepool: fix dib cache location  https://softwarefactory-project.io/r/12868
06:13 *** nchakrab has joined #softwarefactory
06:27 <sfbender> Tristan de Cacqueray created software-factory/sf-docs master: Add log-classify user documentation  https://softwarefactory-project.io/r/12869
06:36 <sfbender> Tristan de Cacqueray created logreduce master: Fix ARA report directory link to ara-report  https://softwarefactory-project.io/r/12870
06:36 <sfbender> Tristan de Cacqueray created logreduce master: Update zuul-jobs log-classify role  https://softwarefactory-project.io/r/12871
06:38 <sfbender> Merged logreduce master: Fix ARA report directory link to ara-report  https://softwarefactory-project.io/r/12870
06:40 <sfbender> Merged logreduce master: Update zuul-jobs log-classify role  https://softwarefactory-project.io/r/12871
07:01 <sfbender> Merged www.softwarefactory-project.io master: Add sprint 2018-26  https://softwarefactory-project.io/r/12804
07:28 *** Guest38444 has quit IRC
07:31 *** Guest38444 has joined #softwarefactory
08:04 *** jpena|off is now known as jpena
09:52 <sfbender> Merged software-factory/sf-ci master: Switch back to base job since log-classify is now integrated  https://softwarefactory-project.io/r/12866
10:59 *** jpena is now known as jpena|lunch
11:18 <sfbender> Merged software-factory/sf-config master: nodepool: fix dib cache location  https://softwarefactory-project.io/r/12868
11:28 <sfbender> Merged software-factory/sf-config master: Fix grapaha graph for executor memory usage  https://softwarefactory-project.io/r/12862
11:35 <sfbender> Merged software-factory/sf-config master: zuul: install missing packages for config-check  https://softwarefactory-project.io/r/12865
11:51 <sfbender> Merged software-factory/cauth master: cauth/repoxplorer: Harden in case of repoxplorer or elasticsearch down  https://softwarefactory-project.io/r/12831
12:01 *** Guest38444 has quit IRC
12:02 <sfbender> Fabien Boucher created software-factory/managesf master: managesf/configuration/repoxplorer: Fix in case tenant does not have default-connection  https://softwarefactory-project.io/r/12878
12:10 *** Guest38444 has joined #softwarefactory
12:16 *** jpena|lunch is now known as jpena
12:31 <sfbender> Fabien Boucher created software-factory/managesf master: managesf/configuration: handle the private attribute  https://softwarefactory-project.io/r/12879
12:42 <rcarrillocruz> folks, any issues with the oci server?
12:42 <rcarrillocruz> seeing a lot of node job reschedules
12:42 <rcarrillocruz> just got a retry limit
12:44 <tristanC> rcarrillocruz: yes, though it doesn't seem related to oci, other jobs are also failing with retry limit on the dib nodeset
12:44 <rcarrillocruz> oki
12:46 <tristanC> rcarrillocruz: i haven't found the bottleneck yet, i'll have a look tomorrow
12:47 <tristanC> we are migrating rdoproject.org jobs over to the sf-project.io zuul, this may be causing scaling issues between zuul and nodepool, or maybe the executors are overloaded
12:47 <tristanC> e.g.: https://softwarefactory-project.io/grafana/d/000000001/zuul-status?panelId=43&fullscreen&orgId=1&from=now%2FM&to=now
12:49 <tristanC> pabelanger: that graph seems a bit odd https://softwarefactory-project.io/grafana/d/000000001/zuul-status?panelId=44&fullscreen&orgId=1&from=now%2FM&to=now, shouldn't the executor load be lower?
12:50 <tristanC> they only have 4 cpus each
12:53 <tristanC> pabelanger: symptoms are jobs taking a long time to start, and sometimes bailing out with 'retry_limit'
12:57 *** nchakrab_ has joined #softwarefactory
12:57 <sfbender> Fabien Boucher created software-factory/sf-config master: cgit and hound config: take care of the private attribute  https://softwarefactory-project.io/r/12880
13:00 *** nchakrab has quit IRC
13:06 <tristanC> pabelanger: zuul.conf currently uses load_multiplier=2.5, i think we could lower this to 2 or even 1.5
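
For context, load_multiplier lives in the [executor] section of zuul.conf: the executor's load governor stops accepting new jobs once the system load average exceeds load_multiplier times the CPU count, so lowering it makes the 4-cpu executors back off sooner. The change suggested above would look roughly like this (the file path may differ per deployment):

    # /etc/zuul/zuul.conf on each executor host (path assumed)
    [executor]
    load_multiplier=2.0
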
13:52 <pabelanger> tristanC: if you look at the executors at https://softwarefactory-project.io/grafana/d/000000001/zuul-status?orgId=1 it will show whether or not they are accepting jobs; if they're not accepting jobs, no builds will start
13:52 <pabelanger> that is likely because of the governor
13:53 <pabelanger> tristanC: the starting-builds graph looks good too
13:58 <pabelanger> rcarrillocruz: have a log?
13:58 <tristanC> pabelanger: e.g. PS5 of https://softwarefactory-project.io/r/#/c/12763/
14:00 <tristanC> pabelanger: i wonder if executors may accept a job but then fail to start the build. sometimes on the status page, console logs just stop with END OF STREAM, e.g.: https://softwarefactory-project.io/zuul/t/rdoproject.org/stream.html?uuid=934eac35fa994e049ed78484318e57fd&logfile=console.log
14:01 <pabelanger> tristanC: so, i don't have proof yet, but I think we are seeing poor IO on the zuul-executors, which could mean merge operations take too long before running the playbook; if that fails, i believe the job will be rescheduled by the scheduler
14:02 <pabelanger> tristanC: can you send a copy of the executor logs for the jobs above^
14:02 <tristanC> or that, yes
14:02 <pabelanger> we should be able to see a timeout in the logs
14:03 <pabelanger> tristanC: do we have SSDs in these compute nodes? I believe we should look at mounting /var/lib/zuul on SSD, not ceph, to help get better IO
14:03 <pabelanger> local disk vs network disk
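
A generic sketch of that change (not Software Factory specific): move /var/lib/zuul from the ceph-backed volume onto a local disk. The device name is hypothetical and would need to match whatever local SSD the compute node actually exposes.

    systemctl stop zuul-executor
    mkfs.xfs /dev/vdb                                   # hypothetical local SSD device
    mount /dev/vdb /mnt && cp -a /var/lib/zuul/. /mnt/ && umount /mnt
    echo '/dev/vdb /var/lib/zuul xfs defaults 0 0' >> /etc/fstab
    mount /var/lib/zuul
    systemctl start zuul-executor
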
14:04 <tristanC> pabelanger: indeed: WARNING zuul.AnsibleJob: [build: a24ebac1a2744059b7692512e36405d5] Ansible timeout exceeded
14:05 <pabelanger> tristanC: was that pre-run?
14:09 <tristanC> pabelanger: here is the log of the retry_limit rcarrillocruz got: https://ansible.softwarefactory-project.io/paste/show/piAZSihmXcQixLtnFb2z/
14:11 <tristanC> pabelanger: and here is a similar failure happening with a dib nodeset: https://softwarefactory-project.io/paste/show/Q5Jb0maoKsnMf2A2wBrQ/
14:13 <pabelanger> Hmm, that timeout looks to be short
14:13 <pabelanger> and we don't seem to log it
14:16 <pabelanger> tristanC: -9 is abort
14:16 <pabelanger> tristanC: so zuul aborted the run for some reason
14:16 <pabelanger> tristanC: new patchset?
14:17 *** nchakrab_ has quit IRC
14:17 <pabelanger> I don't think it is the new hdd sensor, it should only stop jobs from running, not abort them
14:20 <tristanC> pabelanger: iirc those were reported as retry_limit, and there is a warning about the ansible timeout
14:20 <pabelanger> tristanC: did zuul-executor get restarted during that time?
14:21 <pabelanger> tristanC: the scheduler log should give more info on the retries too
14:21 *** nchakrab has joined #softwarefactory
14:23 <tristanC> pabelanger: scheduler logs for the second build are at https://softwarefactory-project.io/paste/show/HmAeuHC0Xb2tRjYxZTkG/
14:23 <tristanC> pabelanger: first build is https://ansible.softwarefactory-project.io/paste/show/gx6lceANyJ8964dwJeoe/
14:24 <tristanC> i've got to go now, i'll debug more tomorrow
14:40 *** nchakrab has quit IRC
15:53 <rcarrillocruz> folks, i need to debug a weird issue on vyos_config, within the context of a zuul job run
15:53 <rcarrillocruz> who can help me out to do an autohold and inject my pubkey?
15:54 <pabelanger> fbo: ^
15:54 <pabelanger> rcarrillocruz: sorry, I don't have access myself
15:54 <rcarrillocruz> ah, nhicher is not around
15:54 <rcarrillocruz> :/
15:55 <fbo> rcarrillocruz: yep
15:55 <pabelanger> rcarrillocruz: I think he's on PTO
15:55 <rcarrillocruz> fbo: https://github.com/ansible-network/cloud-vpn/pull/3
15:56 <rcarrillocruz> let me know when i should push a new patchset
15:56 <rcarrillocruz> so the hold is made
15:56 <rcarrillocruz> my keys: https://github.com/rcarrillocruz.keys
15:57 <rcarrillocruz> or a recheck rather, i don't really have anything to change on the PR
15:58 <fbo> looks like I need a job name
15:58 <rcarrillocruz> cloud-vpn-aws-vyos-to-aws-vpn
15:58 <fbo> rcarrillocruz: ^
15:58 <fbo> ok
15:59 <fbo> rcarrillocruz: ok, let's recheck your change
15:59 <rcarrillocruz> done
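
For reference, the operator-side hold that fbo sets up here is Zuul's autohold command, run on the scheduler host; it would look something like the following, where the tenant name is an assumption (the project and job names are the ones given above):

    zuul autohold --tenant ansible-network \
        --project ansible-network/cloud-vpn \
        --job cloud-vpn-aws-vyos-to-aws-vpn \
        --reason "rcarrillocruz debugging vyos_config" --count 1

Nodepool then keeps the node after a failed run instead of deleting it, which is what the discussion relies on below.
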
16:01 <fbo> rcarrillocruz: the link to your pub key?
16:02 <rcarrillocruz> any from the link i pasted earlier
16:02 <rcarrillocruz> https://github.com/rcarrillocruz.keys
16:03 <fbo> thanks
16:04 <rcarrillocruz> what's the IP? i don't think that's logged in the job log
16:05 <rcarrillocruz> or wait, i think i can get it on the nodes dashboard
16:06 <rcarrillocruz> bah, no
16:07 <fbo> rcarrillocruz: zuul@38.145.33.133
16:07 <rcarrillocruz> thx mate
16:08 <rcarrillocruz> where is the workspace put these days?
16:08 <rcarrillocruz> [zuul@host-10-0-0-11 ~]$ pwd
16:08 <rcarrillocruz> /home/zuul
16:08 <rcarrillocruz> [zuul@host-10-0-0-11 ~]$ ls
16:09 <rcarrillocruz> wait
16:09 <rcarrillocruz> i think you need to put the key on the zuul-worker user
16:10 <fbo> Oh, but I was unable to connect as zuul-worker, and it's zuul that was defined in the nodepool config
16:10 <fbo> for that image
16:10 <rcarrillocruz> thing is, the workspace (per the job def) is checked out in the zuul-worker home folder
16:11 <rcarrillocruz> this is odd
16:11 <rcarrillocruz> [zuul@host-10-0-0-11 ~]$ cd /home
16:11 <rcarrillocruz> [zuul@host-10-0-0-11 home]$ ls
16:11 <fbo> rcarrillocruz: you can sudo -i, can't you?
16:11 <rcarrillocruz> zuul
16:11 <rcarrillocruz> i can sudo, but i don't see the zuul-worker home folder anywhere
16:12 <fbo> same here.
16:12 <fbo> Is the image correct? I mean I did nothing specific, just logged in to it
16:12 <rcarrillocruz> well, the image is a f27-oci
16:12 <rcarrillocruz> that's not managed by me
16:13 <rcarrillocruz> i think you may have given me access to a node that is not part of the job
16:13 <fbo> ah, so that's not the right image then
16:13 <rcarrillocruz> https://github.com/ansible-network/zuul-config/blob/master/zuul.d/jobs.yaml
16:13 <rcarrillocruz> what i need to get is access to the node that is (still) running the job
16:14 <rcarrillocruz> if you did the autohold, it should not be deleted by nodepool right after the job ends
16:15 <rcarrillocruz> ok
16:15 <rcarrillocruz> so the node is
16:15 <rcarrillocruz> 0000046738
16:15 <rcarrillocruz> per https://ansible.softwarefactory-project.io/zuul/nodes.html
16:16 <rcarrillocruz> what's the IP of that node?
16:16 <rcarrillocruz> nodepool list should show it
16:16 <fbo> rcarrillocruz: this is a container, not sure I can give you access then
16:16 <fbo> for the autohold there isn't an option for specifying the image, so the node on hold should be the right one
16:16 <rcarrillocruz> i would assume containers do run an ssh daemon, and they don't have another access mechanism?
16:17 <rcarrillocruz> i mean, the zuul executor connects to the node
16:17 <rcarrillocruz> i'd be surprised if the container is accessed by the zuul executor by connecting to the host, then doing something like a primitive docker exec or the like
16:18 <rcarrillocruz> if you run in nodepool
16:18 <rcarrillocruz> nodepool list | grep 0000046738
16:18 <rcarrillocruz> what does it show?
16:19 <fbo> rcarrillocruz: ok so let's try zuul-worker@38.145.33.82
16:20 <fbo> try port 34999
16:21 <fbo> that's the one specified in nodepool list --detail
16:21 <rcarrillocruz> can ssh to 22, cannot to 34999
16:22 <rcarrillocruz> otoh, i'm in, but there's no checked-out project; i assume it got deleted maybe, dunno
16:22 <rcarrillocruz> will try to recreate from this
16:27 <fbo> rcarrillocruz: well that's not the way to do it. I removed your key from there as this was the main oci node.
16:27 <rcarrillocruz> so that's the host?
16:28 <rcarrillocruz> you know what
16:28 <rcarrillocruz> i'll change the node type
16:28 <rcarrillocruz> and try to recreate from a real fedora node
16:29 <rcarrillocruz> pabelanger: did you create any fedora dib nodes on our tenant?
16:29 <rcarrillocruz> i was off last week, unsure what you did there
16:29 <fbo> I managed to connect on 34999 and was in the container
16:29 <rcarrillocruz> let me retry then
16:30 <rcarrillocruz> [ricky@ricky-laptop ~]$ ssh zuul-worker@38.145.33.82 -p 34999
16:30 <rcarrillocruz> Received disconnect from 38.145.33.82 port 34999:2: Too many authentication failures
16:30 <rcarrillocruz> Authentication failed.
16:30 <rcarrillocruz> you may need to put my pubkey in the container's zuul-worker authorized keys
16:30 <fbo> oh ok, you had network access to the port, cool
16:31 <rcarrillocruz> so the way it works, apparently:
16:31 <rcarrillocruz> 22 is for the host
16:31 <rcarrillocruz> then for each container
16:31 <rcarrillocruz> it spawns an sshd process on 34999
16:31 <fbo> retry
16:31 <rcarrillocruz> just so zuul-executor can connect to it
16:31 <rcarrillocruz> oci slaves, that's it
16:31 <rcarrillocruz> i'm in now
16:31 <rcarrillocruz> \o/
16:31 <rcarrillocruz> and the change is checked out there
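
To summarize the access path worked out above: port 22 lands on the oci host itself, while each container runs its own sshd on a high port (34999 here) that the zuul-executor, or anyone debugging a held node, connects to directly:

    ssh zuul@38.145.33.82                     # the oci host
    ssh -p 34999 zuul-worker@38.145.33.82     # the held container that ran the job
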
16:32 <rcarrillocruz> thx fbo, i can debug now
16:32 <fbo> rcarrillocruz: sorry it took a while to figure out how to do that.
16:33 <rcarrillocruz> hey, you solved it  ;-)
17:03 *** jpena is now known as jpena|off
17:14 *** fbo is now known as fbo|off
17:31 <sfbender> Fabien Boucher created software-factory/managesf master: wip - managesf/resources: add extra validation for the private attribute  https://softwarefactory-project.io/r/12883
18:43 *** Guest38444 has quit IRC
18:46 *** Guest38444 has joined #softwarefactory
19:00 *** caphrim007_ has quit IRC
19:01 *** caphrim007 has joined #softwarefactory
20:08 <sfbender> Merged software-factory/sf-config master: zuul: integrate log-classify post actions  https://softwarefactory-project.io/r/12763
22:34 <gundalow> Created a new branch `stable-2.5` and protected it in GitHub, though Zuul doesn't seem to be running: https://github.com/ansible-network/network-engine/pull/107; I can't see anything in the dashboard
22:58 *** Guest38444 has quit IRC
23:13 *** Guest38444 has joined #softwarefactory
