Monday, 2020-05-04

*** dpawlik has joined #softwarefactory06:04
*** dpawlik has quit IRC06:04
*** dpawlik has joined #softwarefactory06:07
*** dpawlik has quit IRC06:07
*** dpawlik has joined #softwarefactory06:08
*** apevec has joined #softwarefactory06:43
*** sshnaidm|off is now known as sshnaidm07:33
*** jpena|off is now known as jpena07:49
*** brendangalloway has joined #softwarefactory11:29
*** jpena is now known as jpena|lunch11:30
*** rfolco has joined #softwarefactory12:04
brendangallowaytristanC: Last week I had a question about the ara-report folder not being visible.  Is there a config setting I can change to make it visible again?  Our devs don't like the change to their workflow12:13
brendangallowayI'm also noticing an issue where the post-config job does not appear to be updating the nodepool.yaml file correctly.  The playbook runs '/bin/managesf-configuration nodepool --output "/etc/nodepool/nodepool.yaml" --extra-launcher --hostname runc', which generates a file with an empty providers entry.  If I manually run the utility without the extra launcher, the file appears to be generated correctly12:15
tristanCthe ara-report folder is no longer visible but it is still available: either by appending `/ara-report/` to the log_url, or by clicking the `Ara Report` link from the build result12:15
tristanCbrendangalloway: that post-config issue rings a bell, let me check12:16
brendangallowayWe found you could type the url back in manually, but having the link in the folder was a lot more convenient when debugging.  If it's not possible we'll live, but it would be preferred if we could restore the previous behaviour somehow12:19
tristanCbrendangalloway: that's unfortunate. This is happening because we switched the ara-api to be a dedicated service so that it could run the python3 version (previously it was running under mod_wsgi in apache, which meant it had to be py2 on centos)12:21
tristanCbrendangalloway: and we couldn't find a way to instruct apache to perform a rewrite rule while keeping the folder available in the generated index12:22
brendangallowayAnd lastly, I'm trying to set up a kubernetes cluster in preparation for runc being deprecated in the next release.  The documentation on what needs to be done is a bit scattered though and I'm struggling to piece together exactly what has to be done.  Does adding the hypervisor-k1s role to arch.yaml set up a kubernetes cluster on the specified node, or just install the tools needed for nodepool to talk to the cluster defined in the kube_file in sfconfig.yaml?12:22
brendangallowayok, that is unfortunate.  If there is some way to restore the link in the future it would be appreciated12:24
tristanCbrendangalloway: if you can setup a kubernetes and provide the kube_file that would be the best12:24
tristanCbrendangalloway: otherwise, using the k1s component will set up a fake kubernetes endpoint that will work for nodepool/zuul workloads, and it will work similarly to runc, e.g. nodepool will be auto configured and there will be a _managed_k1s provider added to the config repo12:25
tristanCftr, the code is currently available here: https://pagure.io/software-factory/k1s 12:25
brendangallowayThanks.  I have deployed a kubeadm cluster on another network in our openstack cluster and copied across the admin config file, updated sfconfig.yaml and run sfconfig --no-install12:27
brendangallowayThe operate nodepool docs refer to the provider defined in _local_hypervisor_k1s.yaml, but that only gets created when using the hypervisor-k1s role?  Do I still need to define a provider, or can I simply refer to the one defined in the kube_file?  If the latter, how do I do so?12:29
tristanCbrendangalloway: well the migration from runc to kubernetes remains to be defined and documented :)12:30
tristanCbrendangalloway: until then, you can setup a custom (not managed by sfconfig) provider, like this one: https://opendev.org/zuul/nodepool/src/branch/master/nodepool/tests/fixtures/kubernetes.yaml#L10-L24 12:31
tristanCthe `context` attribute should match a context from the kube_file you provided12:31
*** jpena|lunch is now known as jpena12:31
brendangallowayI'm not planning on migrating the existing runc jobs just yet (I see there's a stream of patches that are required to do so), but do want to see if we can use the kube ourselves before then.12:33
tristanCbrendangalloway: are you using the opendev.org/zuul/zuul-jobs project?12:35
brendangallowayI think so - we're using the zuul-jobs that were provided as part of the software-factory install12:38
tristanCbrendangalloway: ok good, (we are still waiting for some roles to be accepted upstream to help with replacing runc with kubectl), and iirc they are included in the zuul-jobs copy shipped with the software-factory install12:40
brendangallowayok, I will not try to migrate to kubectl just yet.12:41
brendangallowayOk, thanks for all the help.  I see the k1s driver provides mechanisms to store dockerfile definitions of images in the config repo itself.  Is there a similar mechanism for managing external kubernetes clusters in the CI definitions, or would we have to have any custom containers defined in a repo and just refer to them in the provider definition?12:47
tristanCbrendangalloway: about the post-config job failing to produce a valid nodepool configuration, i can't find a fix. But looking at the code, i think there may have been an issue with the ansible fact cache mechanism12:47
tristanCbrendangalloway: in particular, the `--extra-launcher` argument is added if `ansible_hostname` is not the `first_launcher` ( from https://softwarefactory-project.io/cgit/software-factory/sf-config/tree/ansible/roles/sf-nodepool/tasks/update.yml#n26 )12:48
brendangallowayand are there any config requirements for containers to be zuul workers?  I see the centos-7 dockerfile example provided for k1s performs the equivalent of the zuul-worker dib element.  Do we have to do something similar for external pods?12:49
tristanCbrendangalloway: and first-launcher is set to be the `name` of the host in the arch file ( from https://softwarefactory-project.io/cgit/software-factory/sf-config/tree/sfconfig/arch.py#n89 )12:49
tristanCbrendangalloway: thus could you share the output of `grep ^first_launcher /var/lib/software-factory/ansible/group_vars/all.yaml` and `ansible -m setup runc | grep ansible_hostname`12:51
tristanCand perhaps dropping the file in `/var/lib/software-factory/ansible/facts/` would help fix that issue?12:52
brendangallowayI'm guessing this is a problem:12:52
brendangallowayansible -m setup runc | grep ansible_hostname
[WARNING]: Could not match supplied host pattern, ignoring: runc
[WARNING]: No hosts matched, nothing to do12:52
brendangallowayfirst launcher is 'main'12:53
tristanCbrendangalloway: about managing custom container images, this is currently specific to k1s. We are working on a generalized solution named `zuul-images-jobs`, but that is a lot of work12:53
tristanCbrendangalloway: i would recommend you manage the image manually at first12:54
tristanCe.g. either publish them to a public registry, or push them directly to the host running the kubernetes cluster12:54
brendangallowaytristanc: ok, and we would need to prep the image with the zuul-worker steps?  We have a private docker repo on site for our containerised components12:55
brendangallowayby dropping the file I should delete the runc file in that folder?12:56
tristanCbrendangalloway: it depends on your job, but if you use the upstream jobs like `tox`, there are some run statement that help with that, in particular: https://softwarefactory-project.io/cgit/software-factory/sf-config/tree/ansible/roles/sf-repos/files/config/containers/centos-7/Dockerfile#n29 12:57
tristanCand installing tool like python3-devel, rsync and such12:57
tristanCotherwise any image (with at least linux-util or busybox and python) should work12:58
brendangallowayDon't all worker nodes require a zuul login, zuul-minimum packages and sudo permission for the executor to use them?  We've encountered lots of problems with static nodes not being set up in exactly the manner that zuul expects13:00
brendangallowayand are there public images that have already been configured as zuul workers?13:00
tristanCfor kubernetes it is different as ansible will use the `kubectl` connection plugin, e.g. it runs `kubectl exec` commands, thus there is no need for a login or an existing user13:00
tristanCbut some zuul-jobs will perform a `revoke-sudo` task, and this will fail if sudo is not configured, as suggested by the two RUN statements from the `centos-7/Dockerfile#n29` above link13:03
brendangallowayok, so the requirements for what are installed will depend on the job13:03
tristanCbrendangalloway: yes. And the `centos-7/Dockerfile` should provide something equivalent of the default runc-centos label13:04
brendangallowayand if we want to implement any jobs that use/inherit the fetch output role, we need to use the new role as in the patches at https://review.opendev.org/#/q/topic:zuul-jobs-with-kubectl ?13:05
tristanCthat is correct, any roles that perform synchronize are going to fail otherwise13:07
brendangallowayspecifically synchronise to the worker?13:08
brendangallowaynot synchronise in general, for example copying the log files across to the elk node13:09
tristanCbrendangalloway: the roles that use synchronize to fetch artifact files from the worker to the executor need to be adapted to copy the files to the ~/zuul-output directory, and let the base job fetch the files13:09
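The adaptation tristanC describes could look roughly like this in a job's playbook; the artifact path is illustrative, and the exact zuul-output subdirectory layout (logs/, artifacts/, docs/) follows the zuul-jobs convention:

```yaml
# Hypothetical post-run task: instead of synchronizing files back to
# the executor, stage them under ~/zuul-output on the worker, where
# the base job's fetch-output step collects them.
- hosts: all
  tasks:
    - name: Copy the test report into zuul-output
      copy:
        src: "{{ zuul.project.src_dir }}/report.html"  # illustrative path
        dest: "{{ ansible_user_dir }}/zuul-output/logs/"
        remote_src: true
```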
tristanCbrendangalloway: about the config-update job failure, i guess you were able to fix it manually in /usr/share/sf-config/ansible/roles/sf-nodepool/tasks/update.yml ?13:10
brendangallowayOk - I think the only places we use that at the moment are in the runc containers13:10
tristanCbrendangalloway: iiuc that failure, you have a host in the arch file named `main`, and another one named `runc` ?13:10
brendangallowayI fixed it by running the bin by hand without the --extra-launcher flag and then restarting nodepool13:11
brendangallowayyes, the runc containers are on a separate host to the executor13:11
tristanCbrendangalloway: that may be reverted once the config-update job runs again; until we understand the issue and have a fix, you'd better remove the `extra-launcher` argument from the `sf-nodepool/tasks/update.yml` file13:12
brendangallowayI will do so.  Now that you mention that - when running sfconfig --upgrade I had to edit the timeout command in /usr/share/sf-config/ansible/roles/sf-elasticsearch/tasks/postconf.yml +4213:17
brendangallowaywhat is the correct way to submit bug reports or similar for issues like that?13:18
tristanCbrendangalloway: on this page: https://tree.taiga.io/project/morucci-software-factory/issues 13:19
brendangallowaytristanC: Thank you so much for all the help.  I think that is everything I needed to know13:22
tristanCbrendangalloway: you're welcome, thank you for the feedback!13:29
tristanCbrendangalloway: so we are looking at the `extra-launcher` issue, and it seems like ansible may be using an incorrect fact. Could you check the `ansible_hostname` values in `/var/lib/software-factory/ansible/facts` and see if they are consistent? If not, i think dropping the file should be enough to prevent that failure, but we'll add some check to at least detect when this is happening13:30
tristanCturns out we just had a similar issue in another deployment, and the `ansible_hostname` from the fact is not the same as the hostname of the host, resulting in that incorrect `extra-launcher` argument being set13:31
brendangallowaywould hostname vs FQDN be a big enough difference?13:32
brendangallowaythe facts file has the hostname as runc, the arch file has it as runc.domain13:33
tristanCwe would like the ansible_hostname to match what is in the arch file, that is just the name, without the domain13:33
brendangallowayso I would need to remove the domains in the arch file?13:34
tristanCbrendangalloway: i guess you are not running the nodepool-launcher on the runc host, thus i suspect the "main" host has an `ansible_hostname` fact that refers to runc13:34
brendangallowayyes nodepool launcher is on main.ci, runc containers are running on runc.ci13:35
tristanCbrendangalloway: changing the runc host name from the arch file shouldn't be required13:35
brendangallowaySo I must change it in the facts file?13:36
tristanCbrendangalloway: in our case, we found that `grep ansible_hostname /var/lib/software-factory/ansible/facts/main-host.org.name` shows `nodepool-builder`, instead of `main-host`13:36
tristanCiirc, removing the fact file should ensure that ansible_hostname is correct, but we are looking for the why, and how to prevent that :)13:38
brendangalloway!!13:41
brendangalloway[root@main.domain facts]# grep ansible_hostname *
builder.domain:    "ansible_hostname": "builder",
elk.domain:    "ansible_hostname": "elk",
main.domain:    "ansible_hostname": "runc",
merger.domain:    "ansible_hostname": "merger",
runc.domain:    "ansible_hostname": "runc",13:42
brendangallowayso the hostname being runc in the main file is the issue?13:42
tristanCbrendangalloway: yes13:42
tristanCthat confused the nodepool.yaml generation logic, resulting in an empty provider list13:43
brendangallowayany idea how that file ended up being wrong?13:43
brendangallowayI assume I should change that and revert the change to the upgrade task?13:43
tristanCthat's the issue we are trying to understand13:44
tristanConce this is fixed, the upgrade task bandaid can be reverted13:44
brendangallowayIs there other information I can provide that would help you understand the issue?13:45
tristanCthat's ok thank you, we are debugging an affected setup13:45
brendangallowayok, let me know if I can assist13:46
sfbenderDaniel Pawlik created software-factory/managesf master: DNM - Added external-project parameter to compute repo by hound  https://softwarefactory-project.io/r/18204 14:56
sfbenderTristan de Cacqueray created software-factory/sf-config master: sfconfig: add an update facts task  https://softwarefactory-project.io/r/18205 14:56
tristanCbrendangalloway: https://softwarefactory-project.io/r/18205 seems to be a solution for the invalid fact ansible_hostname14:57
*** dpawlik has quit IRC15:59
*** jpena is now known as jpena|off17:04
*** sshnaidm is now known as sshnaidm|afk18:07
sfbenderMerged www.softwarefactory-project.io master: Add previous sprints summaries  https://softwarefactory-project.io/r/18069 18:10
*** brendangalloway has quit IRC19:20
*** rfolco has quit IRC21:27
*** rfolco has joined #softwarefactory22:03

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!