Friday, 2019-10-04

openstackgerritPaul Belanger proposed zuul/zuul master: Remove support for ansible 2.5  https://review.opendev.org/65043100:01
openstackgerritPaul Belanger proposed zuul/zuul master: Switch ansible_default to 2.8  https://review.opendev.org/67669500:01
openstackgerritPaul Belanger proposed zuul/zuul master: WIP: Support Ansible 2.9  https://review.opendev.org/67485400:01
openstackgerritTristan Cacqueray proposed zuul/zuul-registry master: Add OCI containers/storage support  https://review.opendev.org/68651200:05
tristanCcorvus: the zuul-registry api is quite docker centric, though i was able to make it work with skopeo and a local containers/image vfs store with ^00:06
tristanC(in case we want to use the zuul-registry without docker)00:06
tristanCi meant the Storage layer assumes docker paths, thus the OCI driver needs to decode the path component back to get the repo, tag, blob info00:16
openstackgerritIan Wienand proposed zuul/nodepool master: [wip] validate diskimages have top-level labels  https://review.opendev.org/68651400:39
mnaserabout ianw patch there, why do we even have top level labels in the nodepool config file?00:43
ianwumm, i guess that's where you set min-ready?00:44
mnaserianw: i guess maybe then we can assume a default value of min-ready and not hard fail if a label is missing there?00:45
ianwso basically auto-populate it?00:49
mnaserianw: kinda, i guess00:50
ianwi think that list of labels has to be there (e.g. http://zuul.openstack.org/labels) ... but possibly in the config loader it could see the missing label and auto-add it.  but then it might get confusing where values are coming from00:51
openstackgerritMohammed Naser proposed zuul/nodepool master: Added failing configuration check  https://review.opendev.org/68651501:00
*** CrayZee has joined #zuul01:10
*** jangutter_ has joined #zuul01:10
*** panda|off has quit IRC01:12
*** panda has joined #zuul01:12
*** jangutter has quit IRC01:12
*** shachar has quit IRC01:13
ianwpabelanger: i think the auto detection will help with the selinux bindings01:16
ianwalso note this fix for the streamer maybe related? -> https://review.opendev.org/#/c/682556/01:18
ianwmnaser: did you want me to rebase on that, or are you still working on something better?01:18
*** rfolco has quit IRC01:19
ianwbut also note the auto support i'll re-propose after next zuul release : https://review.opendev.org/#/c/682797/101:20
*** rfolco has joined #zuul01:27
*** shanemcd has quit IRC01:29
*** bstinson has quit IRC01:30
*** wznoinsk has quit IRC01:31
*** shanemcd has joined #zuul01:31
*** bstinson has joined #zuul01:40
mnaserianw: late here and I'm out already, so feel free to squash or whatever. I worked on a test first :p01:41
ianwok, well i'll have some lunch and think about it ;)  ttyl01:47
corvustristanC: i'm a little confused by that change.  it's true that zuul-registry should, eventually, not require docker (though i've only tested it with docker so far).  but it's meant to implement the registry protocol, so it should be able to shadow dockerhub, quay.io, gcr.io, etc eventually.  as long as it implements the protocol, that should work -- i don't understand why a special storage backend would be needed01:53
corvustristanC: there's nothing docker-specific about the backend storage.  it's actually not compatible with the docker registry.  it's a private backend storage format meant only for this service.  it's not meant to share the backend storage with anything else.01:56
*** jamesmcarthur has joined #zuul02:13
*** rfolco has quit IRC02:45
tristanCcorvus: couldn't the job building the image without docker start the zuul-registry to let the rest of the buildset pull the eventual built content?02:46
openstackgerritTristan Cacqueray proposed zuul/zuul-registry master: Add OCI containers/storage support  https://review.opendev.org/68651202:52
*** bhavikdbavishi has joined #zuul03:29
openstackgerritIan Wienand proposed zuul/nodepool master: Validate openstack provider pool labels have top-level labels  https://review.opendev.org/68651403:30
*** bhavikdbavishi1 has joined #zuul03:32
*** bhavikdbavishi has quit IRC03:33
*** bhavikdbavishi1 is now known as bhavikdbavishi03:33
*** jamesmcarthur has quit IRC03:34
*** jamesmcarthur has joined #zuul03:35
*** jamesmcarthur has quit IRC03:39
openstackgerritIan Wienand proposed zuul/nodepool master: Validate openstack provider pool labels have top-level labels  https://review.opendev.org/68651403:53
openstackgerritIan Wienand proposed zuul/nodepool master: Validate openstack provider pool labels have top-level labels  https://review.opendev.org/68651404:00
*** jamesmcarthur has joined #zuul04:05
*** jamesmcarthur has quit IRC04:12
*** gouthamr has quit IRC04:15
*** gouthamr has joined #zuul04:16
*** jamesmcarthur has joined #zuul05:08
*** jamesmcarthur has quit IRC05:13
*** raukadah is now known as chandankumar05:13
*** rlandy|bbl is now known as rlandy05:43
*** jamesmcarthur has joined #zuul06:09
*** jlk has joined #zuul06:12
*** jamesmcarthur has quit IRC06:15
*** spsurya has joined #zuul06:33
*** jamesmcarthur has joined #zuul07:11
*** pcaruana has joined #zuul07:15
*** jamesmcarthur has quit IRC07:16
*** tosky has joined #zuul07:18
*** hashar has joined #zuul07:26
*** jpena|off is now known as jpena07:38
openstackgerritMatthieu Huin proposed zuul/zuul master: Add OpenAPI description for enqueue, dequeue, autohold  https://review.opendev.org/67425707:53
*** hashar_ has joined #zuul07:58
*** hashar has quit IRC08:00
*** jamesmcarthur has joined #zuul08:13
*** mhu has joined #zuul08:17
*** jamesmcarthur has quit IRC08:18
mhumorning, I see my change to add sphinx-contrib/openapi to the projects under tenant zuul was merged (https://opendev.org/openstack/project-config/src/branch/master/zuul/main.yaml#L1685) but the project doesn't appear in http://zuul.openstack.org/projects08:19
mhudoes zuul need a restart/reconfig?08:19
openstackgerritFabien Boucher proposed zuul/zuul master: Pagure - Support for branch creation/deletion  https://review.opendev.org/68511608:20
*** zbr is now known as zbr|ruck08:23
*** hashar_ has quit IRC09:02
*** hashar has joined #zuul09:02
fricklermhu: that's the url for the openstack tenant, check https://zuul.opendev.org/t/zuul/projects09:11
mhufrickler, ah thanks09:12
*** jamesmcarthur has joined #zuul09:14
*** jamesmcarthur has quit IRC09:19
openstackgerritMatthieu Huin proposed zuul/zuul master: Zuul Web: add /api/user/authorizations endpoint  https://review.opendev.org/64109909:23
*** rfolco has joined #zuul09:23
*** panda is now known as panda|bbl09:25
*** bhavikdbavishi has quit IRC09:40
*** hashar_ has joined #zuul09:50
*** hashar has quit IRC09:53
openstackgerritFabien Boucher proposed zuul/zuul master: Gitlab - Basic handling of merge_requests event  https://review.opendev.org/68599010:00
*** hashar_ has quit IRC10:02
*** mugsie has quit IRC10:03
*** mugsie has joined #zuul10:05
*** jamesmcarthur has joined #zuul10:15
*** jamesmcarthur has quit IRC10:19
*** saneax has joined #zuul10:23
*** badboy has joined #zuul10:26
*** saneax has quit IRC10:32
*** panda|bbl is now known as panda10:39
*** jamesmcarthur has joined #zuul10:47
*** jamesmcarthur has quit IRC10:51
mordredtristanC: https://review.opendev.org/#/c/686542/10:57
*** jpena is now known as jpena|lunch11:00
jangutter_we've got a local hack on our openstack nodepool driver to bypass cleanupLeakedPorts and cleanupLeakedInstances. (long and convoluted story _why_)...11:07
*** jangutter_ is now known as jangutter11:07
jangutterI'd like to fix this upstream: there are two options. One is just to make it a config option that you can flip to turn off the leak cleanup...11:08
jangutterThe second is to expose how long you wait until you decide the port/instance has leaked (in seconds).11:08
jangutterany preference? I seem to remember there was some wish for the second one to move out the hardcoded values into config...11:09
openstackgerritMonty Taylor proposed zuul/zuul-registry master: WIP Consume typing from openstacksdk and keystoneauth  https://review.opendev.org/68640311:12
*** zbr|ruck is now known as zbr|lunch11:15
*** donnyd has joined #zuul11:18
*** panda is now known as panda|eat11:43
*** jamesmcarthur has joined #zuul11:48
*** hashar has joined #zuul11:52
*** EmilienM has quit IRC11:53
*** jamesmcarthur has quit IRC11:53
*** EmilienM has joined #zuul11:54
*** jpena|lunch is now known as jpena12:00
*** badboy has quit IRC12:07
*** spsurya has quit IRC12:10
tristanCmordred: nice! Also I wasn't able to make it work speculatively in tox and used this command to test it instead: MYPYPATH=/home/fedora/openstacksdk/ ./.tox/pep8/bin/mypy --strict zuul_registry/12:11
pabelangerianw: in our case, auto detection won't work, we want to force python38 for testing reasons. auto will just find any python12:12
pabelangerianw: cool, will see when the next zuul release for zuul_console is. However, I think I also see some warnings from ansible about the callback plugin, maybe not python3 safe for other things12:13
*** jamesmcarthur has joined #zuul12:15
tristanCjangutter: both options could be implemented though, a provider cleanup toggle, and perhaps a cleanup-min-age ttl value12:16
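The two options discussed here (a per-provider cleanup toggle and a minimum age before a DOWN port counts as leaked) can be sketched roughly as below. This is only an illustration and not the nodepool driver code: the `Port` class, `leaked_ports`, and the `cleanup_enabled` / `min_age` parameters are hypothetical names, and the 600-second default simply mirrors the 10-minute wait mentioned later in this log.

```python
from dataclasses import dataclass


@dataclass
class Port:
    """Hypothetical stand-in for a port record; not an openstacksdk type."""
    id: str
    status: str             # e.g. "DOWN" or "ACTIVE"
    first_seen_down: float  # timestamp (seconds) when the port was first seen DOWN


def leaked_ports(ports, now, cleanup_enabled=True, min_age=600):
    """Return the ports that would be treated as leaked.

    cleanup_enabled: hypothetical provider-level toggle; False disables cleanup.
    min_age: hypothetical threshold in seconds a port must stay DOWN first.
    """
    if not cleanup_enabled:
        return []
    return [
        port for port in ports
        if port.status == "DOWN" and now - port.first_seen_down >= min_age
    ]


ports = [
    Port("a", "DOWN", 0.0),
    Port("b", "DOWN", 500.0),
    Port("c", "ACTIVE", 0.0),
]
# With the default threshold only port "a" has been DOWN long enough.
print([port.id for port in leaked_ports(ports, now=600.0)])
```

Keeping the default behaviour unchanged while exposing both knobs matches what jangutter says he wants further down; a deployment with always-DOWN baremetal ports would set the toggle to off.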
pabelangerjangutter: at first guess, clean-floating-ips is a setting so could see the other being one too. But also likely want to understand why in case of some other issue12:17
jangutterpabelanger: it's because of ironic, ironically :-p12:18
mordredtristanC: it worked speculatively for me - or, at least - I did ".tox/pep8/bin/pip install -e ../../openstack/openstacksdk" and it worked12:19
jangutterpabelanger: if you don't enable the ironic neutron driver, the ports of baremetal nodes are _always_ marked down. So they helpfully get cleaned up even though the instance is still using them.12:19
mordredtristanC: if we wanted to see it in the gate, we'd need to make a job (or a version of the job) with ksa and openstacksdk in the required-projects list12:19
mordredjangutter: oh that's lovely12:20
mordredjangutter: I can see why that would be displeasing :)12:20
jangutterpabelanger: it's possible for me to clean up all that - but it's going to take me writing an HA tripleo agent for ironic agent.... or swap two lines in the nodepool driver :-p12:20
janguttermordred: it's kinda well known and there are ways around it, but it's also only a mild annoyance, except in this case.12:21
jangutterpabelanger: piggybacking off of clean-floating-ips is exactly what I did :0-p12:22
jangutterpabelanger: apologies for the really weird :-p typO.12:22
*** rfolco has quit IRC12:24
*** rfolco has joined #zuul12:25
pabelangerjangutter: maybe propose your patch, with extra info in the commit and see what others think. But in general, I wouldn't want to disable clean up features in nodepool, as it usually does a good job at keeping tenant / project resources free. I am sure others will likely reply too12:25
jangutterpabelanger: yep, cleanup is pretty useful - I'd actually like to keep the default behaviour the same with the option to tweak it added.12:28
fungijangutter: for opendev's deployment we ran into nodepool not waiting long enough in one of our providers, so it was deleting the new ports because it took too long for server instances to get far enough along to be attached to them. after some reevaluation of the port cleanup routine we upped the wait time from 3 minutes to 10 minutes. that went into effect in nodepool 3.8.0. what vintage of nodepool are you running?12:30
jangutterfungi: it's the software-factory one - 3.8.0. Haven't checked to see lately if it checks if the ports are assigned to an instance rather than just down.12:32
jangutterfungi: they never go to UP, for the entire lifetime of the baremetal instance managed by ironic.12:33
*** rlandy has joined #zuul12:34
*** nhicher has quit IRC12:35
*** nhicher has joined #zuul12:36
jangutterfungi: ah, yeah, logic's still the same in tip. If the ports are DOWN they are eligible for cleanup. https://opendev.org/zuul/nodepool/src/branch/master/nodepool/driver/openstack/provider.py#L54712:36
*** panda|eat is now known as panda12:37
jangutterplease forgive noobie question: is there a way to crank up the verbosity of the ansible log for the zuul executor?12:41
fungijangutter: got it, so the ports need to exist but will never be reported as "up" by the api?12:41
jangutterfungi: yeah, if you add the ironic neutron agent, or use the ironic neutron mechanism driver with supported HW, they can go 'up'12:42
fungiand i take it there's some reason not to do that12:42
tristanCjangutter: https://softwarefactory-project.io/docs/operator/zuul_operator.html#troubleshooting-the-executor12:42
pabelangerhttps://zuul-ci.org/docs/zuul/admin/components.html#id11 too12:43
janguttertristanC: I tried that... ps -a says ansible-playbook is running with -vvv, but either I can't find the logs or somehow they get sanitized.12:44
pabelangerso, tried python38 on fedora-30 last night for ansible. Ran into issues with libselinux python bindings missing, guess fedora-32 will have them. Time to try out python38 PPA for bionic12:44
tristanCjangutter: oh right, you need to set level: DEBUG for the zuul logger in /etc/zuul/executor-logging.yaml12:44
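As a rough illustration of what "level: DEBUG for the zuul logger" means, here is the equivalent setting expressed as a Python `logging.config.dictConfig` dictionary. This assumes the YAML logging file follows Python's dictConfig schema; the handler, formatter, and the `zuul.executor` child logger name are hypothetical choices for the sketch, not a copy of any shipped configuration.

```python
import logging
import logging.config

# Minimal dictConfig sketch: only the "zuul" logger name and the DEBUG level
# come from the discussion; everything else is an assumed placeholder.
LOGGING = {
    "version": 1,
    "formatters": {
        "simple": {"format": "%(asctime)s %(levelname)s %(name)s: %(message)s"},
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "simple",
            "level": "DEBUG",
        },
    },
    "loggers": {
        "zuul": {"handlers": ["console"], "level": "DEBUG", "propagate": False},
    },
}

logging.config.dictConfig(LOGGING)

# Child loggers inherit the effective level of "zuul" unless they set their own.
log = logging.getLogger("zuul.executor")
log.debug("debug output is now emitted")
```

Both the logger level and the handler level have to allow DEBUG; raising only one of them is a common reason verbose output still goes missing.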
jangutterfungi: yeah the two options to manage the ports are either I have to build a tripleo HA service for the ironic neutron agent, or we have to get actual supporting switches.12:45
jangutterthanks tristanC, gonna try that now.12:45
*** jamesmcarthur has quit IRC12:47
*** jamesmcarthur has joined #zuul12:47
*** rlandy is now known as rlandy|mtg13:02
mnaserhmm, zuul k8s with in-cluster config seems to not be working properly13:24
mnaseri think it has to do with two facts, k8s passes a few env variables (`KUBERNETES_SERVICE_HOST`, `KUBERNETES_SERVICE_PORT`)13:24
mnaserand also the `/var/run/secrets/kubernetes.io/serviceaccount` needs to be exposed inside bwrap13:25
mnaseri can imagine how the latter can actually be a security concern tho..13:25
mnaserim not sure what can be a good path for that13:25
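The two preconditions mnaser lists for in-cluster configuration can be checked with a few lines of Python. This is a sketch only: the helper name `in_cluster_config_available` is hypothetical, and it checks just the environment variables and the service account token file, which is not everything a Kubernetes client does when loading in-cluster config.

```python
import os

# Directory named in the discussion; the "token" file inside it is the
# service account credential mounted into pods.
SERVICE_ACCOUNT_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"


def in_cluster_config_available(environ=None, sa_dir=SERVICE_ACCOUNT_DIR):
    """Return True if both in-cluster prerequisites appear to be present."""
    environ = os.environ if environ is None else environ
    has_env = bool(environ.get("KUBERNETES_SERVICE_HOST")) and bool(
        environ.get("KUBERNETES_SERVICE_PORT")
    )
    has_token = os.path.isfile(os.path.join(sa_dir, "token"))
    return has_env and has_token


# Inside a sandbox that hides the env vars or the mount, this prints False.
print(in_cluster_config_available())
```

Running a check like this inside the sandboxed environment would show which of the two pieces is missing before digging into the kubeconfig path.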
*** zbr|lunch is now known as zbr|ruck13:25
tristanCmnaser: fwiw nodepool k8s driver creates a service account, pass the token along with the node object and zuul-executor should setup a standalone (per-build) kubeconfig file13:37
mnasertristanC: oh, that's not what ended up happening when i tried it the other day13:37
tristanCmnaser: hum, what happened then? :)13:38
mnaseri pretty much got a "cant find a config, au revoir"13:38
mnaserbut i didn't dig more into it13:38
mnaserbut now im seeing that function13:38
tristanCmnaser: could you share the logs of that behavior?13:40
mnaseri wonder if we're not pointing to the kubeconfig file?13:40
mnaseryeah, ill have to dig them again and rebuild things to use k8s13:40
tristanCmnaser: could it be that kubectl isn't installed in the zuul-executor image?13:40
mnaserit was the upstream one13:41
mnasermaybe that was it13:41
mnaserno but actually i remember cp-ing kubectl into the container with an initcontainer (as a hack)13:41
mnasertristanC: shouldn't we set `kubectl_kubeconfig` ?13:41
mnaserwe''re preparing the file in work_root/.kube/config13:42
mnaserbut im not actually seeing us point to the config file we generate13:43
tristanCmnaser: the work_root should be the home of the zuul user inside the bwrap13:44
mnaserah ok, so that implies things should work then13:44
mnaserill dig a bit more13:44
tristanCmnaser: i can confirm this work, in some situation at least, here is a job running tox job with a kubectl connection: https://review.opendev.org/68204913:45
ShrewstristanC: where did we land on the openshift job instability last night? Is https://review.opendev.org/686474 the proposed fix?14:24
*** bhavikdbavishi has joined #zuul14:24
tristanCShrews: i'm not familiar enough with glean to tell if this is going to fix the issue.14:25
pabelangerI believe the issue was slow nested virt, at least what ianw was looking into14:26
*** bhavikdbavishi1 has joined #zuul14:27
Shrewspabelanger: we have MANY issues, i believe. that being one of them14:27
Shrewsi'm hoping we can all put our heads together and wipe them out to get np tests back to stable14:28
fungipabelanger: current theory there is that because it was a long stack of changes which got rebased, all the jobs those queued resulted in suboptimal performance for booting some of the cirros vms14:28
*** bhavikdbavishi has quit IRC14:28
*** bhavikdbavishi1 is now known as bhavikdbavishi14:28
pabelangerAh, regarding glean, reading the source TYPE does seem optional, so unsure if that will have an impact to fix. We should also confirm with centos, I looked at latest fedora14:28
tristanCShrews: well, it's not related to the openshift job, it's just that fedora instance on fortnebula doesn't have ipv4 address setup in the guest14:28
ShrewstristanC: has anyone spoken with donnyd about this yet?14:29
fungitristanC: right, in that case it sounds like another instance of the kernel/networkmanager race on v6 autoconfiguration14:29
donnydhi14:29
tristanCShrews: well we could, but the cloud metadata are correct and centos do get the ipv4, thus it seems like a dib/glean issue that only affect fedora image14:30
donnydwhich fedora14:30
donnydis it all fedoras or just a particular version?14:30
tristanCdonnyd: https://zuul.opendev.org/t/zuul/build/dc0765e1b481464e8d9823f3d5f7e4f0/log/zuul-info/zuul-info.launcher.txt#2614:30
fungibasically if the kernel receives and acts on a v6 route advertisement before glean fires networkmanager, then nm will see there's already a v6 address assigned and will refuse to touch the interface, even to perform v4 address assignment14:30
tristanCdonnyd: should show https://zuul.opendev.org/t/zuul/build/dc0765e1b481464e8d9823f3d5f7e4f0/log/zuul-info/inventory.yaml#3214:31
*** rlandy|mtg is now known as rlandy14:32
fungiwe've already done some tweaking to try and eliminate that race condition in opendev's images, but it's a very-long-standing nm bug which as far as i know has never been resolved14:32
donnydwell I upped the RA interval a while back to fix an issue with the def route being cleared and RA's not coming in fast enough to not fail the job14:32
ShrewstristanC: fungi: can we use an image other than fedora? would that help?14:33
Shrewsi'm not sure why one is fedora and one is centos, tbh14:33
fungii known basically nothing about openshift so i can't begin to guess14:33
fungis/known/know/14:33
pabelangerwe should try to collect glean log for that job, it should be in /tmp14:34
pabelangerthat will give some more info what is happening14:34
fungii was assuming it was that it needed a newer kernel or some particular userspace tools available only on f29+14:34
donnydI can fire some test instances and report back if that is faster14:34
tristanCShrews: fungi: i've only used openshift on centos, it should work on fedora, but with all the docker issues i don't know how well14:34
pabelangerwe've seen cases where glean doesn't run at the right time, and interfaces don't setup properly in the past14:35
tristanCShrews: fungi: we are using fedora for the nodepool service which needs python3, perhaps we could switch to ubuntu but i don't know how to install k8s client there...14:35
fungipabelanger: yeah, we should also collect the kernel log so we see messages from it about slaac autoconfiguration, to be able to compare timestamps and see if it's the same nm race as we've already been struggling with elsewhere14:36
pabelanger++14:36
Shrewsfungi: yeah, my knowledge is zero as well14:37
donnydalso does this only affect fedora2914:38
donnydwhat about 28 or 3014:38
clarkbShrews: fungi jangutter maybe we should put port cleanup behind a flag like floating ip cleanup14:38
Shrewsdonnyd: i don't think we know the answer to that yet14:39
pabelangerIMO, we should at least bump to fedora-30, given it is the latest and we usually remove older versions from nodepool.14:39
clarkband re the NM issue ya the debian bug is like 4 years old14:39
openstackgerritFabien Boucher proposed zuul/zuul master: Gitlab - Basic handling of merge_requests event  https://review.opendev.org/68599014:39
pabelangerhowever, fedora also have changes to NM recently for DIB, so maybe we are dealing with that14:39
Shrewsdonnyd: i think all we know is that it fails consistently in FN with the current setup. other providers seem to be ok14:40
fungihttps://bugs.debian.org/755202 suggests setting net.ipv6.conf.*.{autoconf,accept_ra}=0 as a workaround, so that the kernel won't do autoconf and then configure nm to do that instead, but i have no idea if we've tried it in opendev yet14:40
openstackDebian bug 755202 in network-manager "network-manager: keeps creating and using new connection "eth0" that does not work" [Important,Open]14:40
Shrewsso if it was all a glean issue, seems like it would hit across other providers14:40
openstackgerritTristan Cacqueray proposed zuul/nodepool master: Switch to fedora-30 for the openshift integration job  https://review.opendev.org/68673714:40
donnydwell they don't use v6 as the primary means of communication14:40
Shrewsis FN our only v6 provider?14:41
donnydI think limestone may be using v6 as well14:41
pabelangerlimestone is14:41
pabelangerIIRC14:41
donnydcan you get this job to run there too14:41
jangutterclarkb: yep, that's my idea - got side-tracked by some other shiny things, but should be sending some reviews next week.14:41
funginope, but not all providers use the same v6 config distribution, advertisement timings, have the same performance profiles, et cetera14:41
fungiand the bug i linked is a race condition14:42
donnydfungi: we hit this in the beginning if I am not mistaken14:42
pabelangerwe could also setup autohold for job, and inspect the node14:42
fungidonnyd: yeah, but not on the same distros14:42
clarkbfungi: I tested that and it did not work14:42
pabelangerbut, logs should also expose the info too14:42
clarkbI believe ianw subsequently tested it too14:42
clarkbthis will be the third time we debug this issue14:43
clarkbquite a bit is known about it14:43
fungiclarkb: any idea if we tried setting that on the per-interface sysctls or only on the all sysctl (which only propagates on interface creation)?14:43
pabelangerclarkb: ack, I've missed that, will defer to you :)14:44
clarkbfungi: I think I set it using the all sysctl when testing and not on the per interface sysctl14:44
clarkbit is easy enough to test if you manually boot nodes in fn14:44
fungithat may not have propagated to the actual interfaces then14:44
clarkbyou get no ipv6 with that updated sysctl at all14:44
clarkbconfirmed via ssh from second node over private ipv414:45
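The difference between setting the workaround on the `all` key and on each interface can be made concrete by generating the sysctl keys explicitly. The sketch below only builds the key/value pairs suggested in the Debian bug (`autoconf` and `accept_ra` set to 0); the function names and the example interface `eth0` are hypothetical, and it does not apply anything to a running system.

```python
def ra_sysctl_settings(interfaces, include_all=True):
    """Build sysctl keys that turn off kernel IPv6 autoconf and RA handling.

    interfaces: explicit interface names to cover.
    include_all: also emit the "all" and "default" pseudo-interface keys.
    """
    names = (["all", "default"] if include_all else []) + list(interfaces)
    settings = {}
    for name in names:
        for knob in ("autoconf", "accept_ra"):
            settings[f"net.ipv6.conf.{name}.{knob}"] = 0
    return settings


def render_sysctl_conf(settings):
    """Render the settings in sysctl.conf style, sorted for stable output."""
    return "".join(f"{key} = {value}\n" for key, value in sorted(settings.items()))


print(render_sysctl_conf(ra_sysctl_settings(["eth0"])), end="")
```

As clarkb reports above, disabling kernel autoconf this way leaves the node without IPv6 unless something else (NetworkManager, per the workaround) is configured to take over, so this is only half of the suggested fix.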
donnydhttp://paste.openstack.org/show/781087/14:45
fungiclarkb: ahh, but did we configure networkmanager to do v6 autoconf?14:45
donnydthat is the log from the test instance14:45
clarkbfungi: I don't recall if glean is configuring it to explicitly configure ipv6 via NM14:46
donnydwhere should I look for the glean log14:46
fungiif i'm reading the workaround there correctly, it's to tell the kernel not to do v6 autoconf and then use nm to do v6 autoconf instead14:46
clarkbdonnyd: glean logs to syslog14:46
donnydhttp://paste.openstack.org/show/781088/14:46
pabelanger[    9.033926] glean.sh[634]: DEBUG:glean:Writing output file : /etc/sysconfig/network-scripts/ifcfg-ens314:46
pabelangerthat looks right14:46
pabelangerbut we need to see the content14:46
clarkbfungi you can grab the ifcfg file ^ and check14:47
donnydmy manually fired instance works correctly14:47
donnydhttp://paste.openstack.org/show/781089/14:47
donnydclarkb: ^^^^14:47
fungidonnyd: since it's a race condition, i don't know whether we should expect it to fail consistently14:47
Shrewsfungi: i don't think we've seen *any* successful runs on FN, so not too racey ?14:48
clarkbI love that NM_CONTROLLED is completely useless as a flag14:48
pabelangerodd, TYPE=ethernet is set there14:49
fungiShrews: did they all fail on the missing f29 ipv4 address bug though?14:49
pabelangerbut, looks as I would expect14:49
fungican't rule out the possibility there's more than one problem causing those particular job failures14:49
clarkbIPV6_AUTOCONF defaults to true if IPV6FORWARDING is set to false14:52
donnydlogstash ```node_provider:"fortnebula-regionone" AND filename:"job-output.txt" AND message:"Upload logs to swift" AND build_node:"fedora-29"```14:52
clarkbbut the forwarding flag doesn't seem to have a documented default14:52
donnydshows only one success in the last day14:52
Shrewsfungi: i'm not sure. tristanC was the one that pointed out the FN connection via https://zuul.opendev.org/t/zuul/builds?job_name=nodepool-functional-openshift&result=FAILURE14:53
donnydhttps://75254dc8ba7993915a2c-0839d54e3d76bec33f8faada9b13d241.ssl.cf5.rackcdn.com/686515/1/check/nodepool-functional-openshift/02f127f/job-output.txt14:53
donnydfor example this one was successful14:53
fungiyeah, i think we need to update the job to collect kernel, nm and glean logs and then analyze the ones from a failed build there14:54
fungiand configs for them as well i guess14:55
donnydfungi: this seems pretty similar to the issues we hit when FN was being brought online14:55
fungiyep14:55
donnydwhere it worked... ish... when it wanted to14:56
donnydI do however apologize for all the hassles FN has created for everyone though14:56
fungiagain though, tuning timing parameters to address a race condition is just a workaround, and usually only reduces their frequency doesn't eliminate it14:56
fungidonnyd: i don't think it's creating hassles, just helping us find bugs14:57
*** chandankumar is now known as raukadah14:57
donnydthis one here is limestone14:59
donnydhttps://zuul.opendev.org/t/zuul/build/7572c0cc839b4b49b70777b42482171914:59
fungianyway, it would be excellent if folks with a personal investment in using rh distros for those jobs would help work out what the solution is... all this started cropping up when we switched glean to use networkmanager, in service of "better" rh distro network configuration14:59
jangutterin some other facepalm-newbie news, we think we might have figured out why zuul isn't reconnecting to the newly reformatted baremetal node in our jobs.14:59
donnydso could be ipv6 related14:59
donnydoh but that is a different failure15:00
openstackgerritJames E. Blair proposed zuul/zuul-registry master: Fix merge error in streaming support  https://review.opendev.org/68650515:00
jangutterzuul seems to generate a per-job key and revokes the original ssh key from it's inventory. Unfortunately we didn't back up and restore the .ssh keys between the reformat, so our bet is that zuul has no way of reconnecting.15:00
fungidonnyd: yeah, as i said, we shouldn't rule out the possibility that there's more than one way those jobs are failing15:00
clarkbjangutter: ya that is done to avoid cross talk between jobs15:00
jangutterclarkb: just saving and restoring $ZUUL_HOME/.ssh should do the trick?15:01
clarkbjangutter: likely yes15:01
corvusjangutter: yeah that's probably best, but also that is all done by ansible in roles in the base job, so if you need to change something about that, it's "just" job changes, not zuul code changes.15:02
corvusjangutter: (for instance, depending on when you do your formatting, you might be able to just re-order it so it happens after the key change)15:02
janguttercorvus: yep, I saw that, but I'm also a mite hesitant about effectively not dropping privileges, especially since EVERYTHING has the same .ssh key.15:03
clarkbfungi: and ya we were told NM is the way of the future on red hat distros so ianw started work early to get ahead of that switch15:03
pabelangerneat! ansible seems to work under python38 :D15:03
clarkbif anyone at red hat knows how to make ipv6 and ipv4 work reliably with NM that input would be greatly appreciated15:04
fungipabelanger: 3.8.0rc1?15:04
clarkbbest I could come up with is "it's broken and racy but increasing certain timeouts helps avoid it"15:04
* fungi apparently needs to update his 3.8.0b3 build to 3.8.0rc1 now that it's a thing15:05
donnydwell it would seem centos7 doesn't hit this issue15:05
clarkbdonnyd: it's all down to startup timing15:06
donnydso its not all RH distros, just more recent versions of NM right?15:06
clarkbcentos has a different kernel and services that start on boot15:06
donnydtrue15:06
clarkbif the timing works out such that NM starts before the kernel processes RAs it is fine15:06
clarkband we influence that by setting a delay for the kernel to begin processing RAs15:07
mordredfungi: WOAH: https://docs.python.org/3.9/whatsnew/3.8.html#assignment-expressions15:07
mordredfungi: I don't know what language this is anymore :)15:07
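For readers who have not seen the feature mordred links to, assignment expressions (the `:=` operator, available from Python 3.8) bind a name as part of a larger expression. A small self-contained sketch follows; the function name and sample data are made up for illustration.

```python
import re


def first_number(text):
    """Return the first run of digits in text as an int, or None."""
    # The match object is bound and tested in a single expression.
    if (match := re.search(r"\d+", text)) is not None:
        return int(match.group())
    return None


# A read loop that stops on the first empty chunk, without a separate
# priming read before the loop.
chunks = iter(["ab", "cd", ""])
collected = []
while (chunk := next(chunks)):
    collected.append(chunk)

print(first_number("python38"), collected)
```

The same code without `:=` needs either a duplicated call or an extra assignment line before each test, which is the trade-off the feature was introduced to remove.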
donnydcan't we just force NM to overwrite even if it does have a current RA15:07
clarkbdonnyd: I was unable to determine how to do that but possibly15:08
janguttermordred: assignment expressions is why Guido left as BDFL.15:08
*** arxcruz|ruck is now known as arxcruz|rover15:08
pabelangerfungi: yah! pulled it from https://launchpad.net/~deadsnakes/+archive/ubuntu/ppa so I don't have to fight selinux15:08
mordredjangutter: mind blown15:08
fungipabelanger: oh, i just build from source, but i don't do anything with selinux on debian so it's never been a problem for me15:09
*** jamesmcarthur has quit IRC15:09
donnydIs there anything else I can help to T/S???15:09
fungijangutter: it brings back memories of c15:09
clarkbdonnyd: I don't think so, it's a known issue and needs someone that wants working NM on redhat platforms to dig in15:10
clarkbto figure out how to get NM to ignore the kernels possible config for an interface or some other solution15:10
pabelangerfungi: yah, fedora-30 ships with python3.8 beta1 (I think) everything worked in ansible, except things that needed libselinux. So moved to bionic, it just worked15:11
openstackgerritMerged zuul/zuul-registry master: Fix merge error in streaming support  https://review.opendev.org/68650515:11
tristanCclarkb: i'm not sure to understand, how ipv6 stuff would prevent the ipv4 address to be assigned?15:11
*** jamesmcarthur has joined #zuul15:11
pabelangerthat is good enough for me, until fedora-32 comes out15:12
donnydmaybe a sweet udev rule that will flush the current kernel table when it detects a config drive is attached would work15:12
clarkbtristanC: because NM will not configure an interface if it is already configured by something else15:12
clarkbtristanC: see fungi's debian bug link above15:12
jangutterfungi: I recently watched some C++20 stuff, I'm thinking the half-life of the "idiomatic form" of any language is now probably 2 years or so.15:13
tristanCclarkb: thanks, i'll have a look15:13
donnydtristanC: I think udev could easily solve this issue. see https://github.com/clearlinux/micro-config-drive/pull/43/files15:15
donnydI think with a rule that RUNS some sort of flush on the kernel RA's would in turn allow NM to function correctly in the event a config drive is connected to a system15:16
donnydbut it would only do it for things that are using config drive15:17
donnydjust a thought on how to fix it15:18
*** donnyd is now known as donnyd_afk15:19
fungiclarkb: tristanC: no idea if it will help but i'm spending some time looking through the networkmanager upstream defect tracker in hopes of finding a similar report about the problem there15:19
fungisince the networkmanager package maintainer in debian seems to have been unable to reproduce the problem (possibly due to lack of trying?) it doesn't appear anything was done to forward the report to the gnome community's tracker (but also it looks like they switched to gitlab issues a year ago, so maybe they just didn't import old reports?)15:21
*** mattw4 has joined #zuul15:22
tristanCfwiw, on my fedora the ifcfg file contains IPV6INIT=yes IPV6_AUTOCONF=yes IPV6_DEFROUTE=yes and IPV6_FAILURE_FATAL=no which doesn't seem to be set by glean15:27
pabelangerglean for redhat doesn't support ipv6, somebody needs to still add support15:29
clarkbright it relies on the kernel accepting RAs to make that function. Which probably extra complicates things with this NM behavior15:31
clarkbmy understanding is this isn't an issue with NM not accepting RAs but instead the problem is NM will ignore any interface it believes is configured by some other entity15:32
clarkbthat said we can edit glean to update the config and see if it helps15:32
tristanCalso, cloud-init does set NETWORKING_IPV6 and IPV6_AUTOCONF in /etc/sysconfig/network for redhat system15:33
tristanChere is what a cloud-init based system's network sysconfig file looks like: https://review.opendev.org/68674915:38
tristanCperhaps this tricks NetworkManager into managing interface, even if it already has ip addresses?15:39
mordredtristanC: none of those settings look like *bad* ideas15:39
clarkbtristanC: left a comment on that15:42
*** saneax has joined #zuul15:47
tristanCclarkb: hum, but shouldn't we remove that 'if ipv6: continue' condition here: https://opendev.org/opendev/glean/src/branch/master/glean/cmd.py#L270 ?15:48
mnaser"just use ipv6, it solves all of our problems" they said15:48
clarkbtristanC: yes that is what pabelanger was talking about. red hat doesn't have support for this in glean today so those checks will have to be removed and the code updated to address it15:49
*** rlandy is now known as rlandy|brb15:49
clarkbmnaser: well if you don't use network manager it works great :)15:49
clarkbtristanC: also your latest patchset assumes the vlan case I think15:49
mnaserclarkb: i think that statement stands in general for ipv6 or not =P15:49
pabelangerclarkb: tristanC: was on list of things to do when working on openstack-infra, sadly never did finish it :(15:50
tristanCclarkb: i'm not sure i understand why we shouldn't enable ipv6 for every interface?15:51
clarkbtristanC: because the configuration is based on the config drive data. If that doesn't include ipv6 config we don't configure ipv615:51
tristanCclarkb: but the network manager would just set link local address in that case15:52
clarkbtristanC: the whole point is to configure it as per the config drive15:53
clarkband config drive may specify static addresses too15:53
clarkbor dhcpv615:53
clarkbyou can't assume this single config15:53
*** hashar has quit IRC15:56
tristanCwhat happens if an interface has both ipv4 and ipv6?16:00
*** donnyd_afk is now known as donnyd16:01
tristanCi'm not familiar with the config drive format, is this documented somewhere?16:01
clarkbunfortunately I don't know that nova documents their network data json file16:01
clarkbif an interface has both ipv4 and ipv6 there are different entries in the json for it and glean should configure the interface for both16:02
tristanCoh, so we need to edit a single file for two different interfaces when it has both v4 and v6?16:03
clarkbyou can share v4 and v6 on a single interface16:03
*** tosky has quit IRC16:05
*** jamesmcarthur has quit IRC16:06
fungitristanC: clarkb: best doc i can find for network_data.json is https://docs.openstack.org/nova/latest/user/metadata.html#openstack-format-metadata16:07
funginot exactly a formal specification, merely an example16:08
fungichatting in #openstack-nova about it now16:11
tristanCwhat's confusing is that glean is processing an interface list that may contain 2 entries for the same interface that need to be configured through a single file...16:11
*** jamesmcarthur has joined #zuul16:11
tristanCi'm tempted to merge the two entries into one to simplify the file template, but that's quite a refactor...16:12
clarkbtristanC: it may be more than 2 entries too because an interface can have multiple ipv4 and multiple ipv6 entries16:12
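The merge being discussed can be sketched as grouping the per-address entries by the interface they belong to, so that one file template sees every entry for that interface at once. This is a rough sketch and not glean's implementation: the function name `group_by_link`, the field names (`id`, `link`, `type`) and the sample data are assumptions for illustration.

```python
from collections import defaultdict


def group_by_link(networks):
    """Group network entries by the interface ("link") they belong to."""
    grouped = defaultdict(list)
    for net in networks:
        grouped[net["link"]].append(net)
    return dict(grouped)


# Hypothetical sample data: eth0 carries both an ipv4 and an ipv6 entry.
networks = [
    {"id": "network0", "link": "eth0", "type": "ipv4"},
    {"id": "network1", "link": "eth0", "type": "ipv6"},
    {"id": "network2", "link": "eth1", "type": "ipv4"},
]

for link, entries in group_by_link(networks).items():
    print(link, [entry["type"] for entry in entries])
```

Grouping into lists rather than merging into a single dict keeps the case clarkb mentions open, where an interface has more than two entries.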
tristanCclarkb: multiple ipv4 addresses for a single interface don't seem to be supported by the write_redhat_interfaces procedure16:13
clarkbtristanC: k, we can likely ignore that case for now then16:14
tristanCand do we have to support dhcpv6 or static ipv6 address?16:15
clarkbthe other platforms do, but as a first pass we can probably ignore those. None of our ipv6 only clouds currently rely on either method (rax relies on ipv6 + static but also has ipv4)16:15
fungiclarkb: tristanC: closest thing nova has to an api document for that file are these: https://specs.openstack.org/openstack/nova-specs/specs/liberty/implemented/metadata-service-network-info.html#rest-api-impact https://specs.openstack.org/openstack/nova-specs/specs/rocky/approved/multiple-fixed-ips-network-information.html#rest-api-impact16:17
fungias far as stability of that api, they try to only ever add keys and not remove any, but also it rarely gets anything new added anyway16:17
*** michael-beaver has joined #zuul16:24
Shrewscorvus: where is the wait-to-start.sh script the quick-start job uses to wait for mysql. we just hit that same error from yesterday16:25
Shrewsit's referenced in docker-compose.yaml, but grep is failing me16:25
tristanCfungi: clarkb: pabelanger: then perhaps the last PS of https://review.opendev.org/#/c/686749/ might do the trick16:25
fungiShrews: `git ls-files` in zuul/zuul says there's a doc/source/admin/examples/playbooks/wait-to-start.sh16:27
fungiis that what you're looking for?16:27
Shrewsfungi: yes! thx16:27
*** saneax has quit IRC16:27
corvusShrews: which cloud?16:27
Shrewscorvus: limestone16:28
corvusShrews: that's 2/2 on limestone16:28
Shrews\o/16:28
corvusShrews: hypothesis: something about limestone causes the inter-container networking in docker to be weird16:29
Shrewscorvus: i'm going to set a hold on that job to see if we can explore some more16:30
corvusShrews: ++16:30
Shrewsalso, to use funky new autohold commands16:30
clarkbtristanC: couple of notes there16:30
clarkbtristanC: I think it is close but we need to get the interface type checks correct16:31
*** rlandy|brb is now known as rlandy16:31
Shrewscorvus: i set autoholds for both np and zuul. playing the recheck game now16:37
*** zbr|ruck has quit IRC16:39
openstackgerritMerged zuul/zuul master: web: render log manifest consistently  https://review.opendev.org/68630716:44
*** zbr has joined #zuul16:50
*** jangutter has quit IRC16:56
*** jpena is now known as jpena|off17:08
*** bhavikdbavishi has quit IRC17:30
*** bhavikdbavishi has joined #zuul17:32
openstackgerritJames E. Blair proposed zuul/zuul-registry master: DNM null commit for testing  https://review.opendev.org/68679117:40
openstackgerritJames E. Blair proposed zuul/zuul-registry master: DNM: second null commit for more testing  https://review.opendev.org/68679317:42
fungiclarkb: tristanC: for a bit more history on nm's coexistence with kernel v6 autoconf, the discussion in https://bugzilla.gnome.org/show_bug.cgi?id=682932 seems relevant17:47
openstackGnome bug 682932 in IP and DNS config "settings: harmonize IPv6 methods with IPv4" [Normal,Assigned] - Assigned to psimerda17:47
tristanCcorvus: i'm working on adding skopeo support to the zuul-registry17:52
clarkbfungi: the "no touching" stuff seems particularly related. It's odd that this would be the behavior though if your intent is a working interface17:52
clarkbbut also we do tell NM to touch the device via sysconfig17:52
clarkbbut maybe the NM sysconfig plugin isn't a sufficiently strong trigger to override the RAs the kernel got?17:53
corvustristanC: cool.  yesterday i started working on implementing the proxying needed for shadowing; i may be busy with operational stuff today but will continue that next week.  i haven't hit any road-blocks so far.17:53
fungii also found references to [ipv6] method=ignore for interface configs, but that's not mentioned in the NetworkManager.conf(5) manpage on my modern systems17:53
corvusclarkb: ^ related, i can now answer questions about docker registry authentication.  i understand why (and still think it's dumb)17:54
corvusclarkb: (the really short version is that getting a token (ie "authenticating") is required even for anonymous pulls because they designed it that way.  it's even true for our intermediate registry, which we can now pull from using an "anonymous" user with no password (anon-ftp forever))17:55
fungiso there's code in zuul-registry to grant tokens?17:58
openstackgerritTristan Cacqueray proposed zuul/zuul-registry master: Add type annotations  https://review.opendev.org/68624918:02
openstackgerritTristan Cacqueray proposed zuul/zuul-registry master: Add support for skopeo copy  https://review.opendev.org/68680318:02
clarkbfungi: it is possible the basic auth stuff also just works?18:02
tristanCcorvus: and regarding type annotations, could we land it or should i rebase without it?18:02
corvustristanC: we can land it -- i just left some review comments on ps5 so you can see my thoughts -- i'm still really concerned that i won't be able to contribute effectively (or enjoy contributing) with mypy.  it seems very difficult to use -- especially the more you use python for what it's good at.  i still feel like i'd be happier using a language designed this way from the start like c++ or rust.  but i18:06
corvuspromised to give it a real try, so i will +2 it with reservations.  :)18:06
corvustristanC: and thank you for doing this.  :)18:07
corvustristanC: oh wait i spotted something in the types change18:08
*** rfolco is now known as rfolco|bbl18:12
openstackgerritJames E. Blair proposed zuul/zuul-registry master: DNM null commit for testing  https://review.opendev.org/68679118:12
openstackgerritJames E. Blair proposed zuul/zuul-registry master: DNM: second null commit for more testing  https://review.opendev.org/68679318:13
mordredcorvus, tristanC: even with rust and c++ there is a trend towards allowing for a greater amount of type inference and only using explicit types when needed ... so I think learning when adding annotations is useful and when it's not will be a key to this being enjoyable18:15
corvusmordred: yeah, it's those variable types that bug/worry me the most.  i'm assuming they were required since tristanC is using strict?18:15
mordreddunno - and yeah - the function ones feel like an immediate win. the variable ones are ... interesting18:16
corvustristanC: just to be clear, my -1 on 686249 is for the not_found change; i'll +2 it after that and we can continue playing with it18:17
* mordred goes to eat fish18:17
clarkbcorvus: the inline with assignment types?18:21
corvusclarkb: yeah18:21
clarkbI wonder if those become more readable (easy to accept too maybe?) if done in a more C like style of specifying type and declaring the var early before using it?18:24
*** jamesmcarthur has quit IRC18:25
tristanCcorvus: understood, i think it's worth a try on this new project, and i'd be happy to help explain how it works. Then if it's too cumbersome, then we could drop the strictness and only use it where it's not confusing18:25
tristanCclarkb: yes, mypy is often confused when new variables are introduced in if then block for example18:26
corvusi don't mind declaring variables that way if that improves things all around18:27
tristanCmordred: right, but because python isn't compiled, i think it's really hard to infer function type. iiuc, only Haskell can infer function types effectively, thus i would push for that language instead :-)18:27
corvustristanC: your skopeo change will need to install skopeo18:34
tristanCcorvus: replied to the not_found thing. The method currently doesn't return anything, thus it shouldn't matter to return None after right?18:34
corvustristanC: the patch i'm working on extends it to return things18:35
corvustristanC: right now, because you can't configure shadow, it always raises an exception, but the api is exception or returned data.18:35
tristanCalright, let me check if we can disable that check, mypy doesn't let you return func() if func returns none18:36
corvusaroo?18:36
corvus"def func(): return None; return func()" is valid18:36
Shrewsand totes useful function, too  :)18:37
tristanCand it fails with: zuul_registry/main.py:121: error: "not_found" of "RegistryAPI" does not return a value18:37
corvusNone is a value :)18:37
corvustristanC: do you maybe have to say not_found returns Optional something?18:38
tristanCOptional[T] is just sugar for Union[None, T]18:38
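The equivalence mentioned above can be checked directly with the standard `typing` module; a minimal sketch:

```python
from typing import Optional, Union

# Optional[T] compares equal to a Union of T and None, in either order.
assert Optional[int] == Union[None, int]
assert Optional[str] == Union[str, None]
```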
tristanCcorvus: https://github.com/python/mypy/issues/654918:39
corvusokay18:40
corvusso mypy has crossed from type checking into code style18:40
corvusand i *strongly* object to that18:40
*** jamesmcarthur has joined #zuul18:40
tristanCfair enough, then i don't mind abandoning 68624918:42
corvusa type checker would say "this function returns none, and the function you are calling returns none, therefore, the types match"18:42
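A minimal sketch of the pattern under discussion, using corvus's `func` example; the names `caller` and `caller_split` are made up for illustration. Both forms behave the same at runtime. The first is the one mypy rejected with "does not return a value" in this conversation, and the second splits the call from the return, which should avoid that complaint.

```python
def func() -> None:
    return None


def caller() -> None:
    # Valid Python; this is the form mypy objected to above.
    return func()


def caller_split() -> None:
    # Same runtime behaviour, with the call and the return kept separate.
    func()
    return None


assert caller() is None
assert caller_split() is None
```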
tristanCmay i reply that to the issue? :)18:43
corvustristanC: sure :)18:44
openstackgerritTristan Cacqueray proposed zuul/zuul-registry master: Add support for skopeo copy  https://review.opendev.org/68680318:49
openstackgerritTristan Cacqueray proposed zuul/zuul-registry master: Add type annotations  https://review.opendev.org/68624918:49
tristanCin the meantime i've switched the rebase order of the above changes to get the skopeo test in sooner18:49
*** bhavikdbavishi has quit IRC18:50
*** pcaruana has quit IRC18:51
tristanCfacebook/pyre-check does not like # type: ignore comments, google/pytype does not like empty list when a function returns List[str], and Microsoft/pyright needs nodejs :-)18:58
openstackgerritTim Burke proposed zuul/zuul-registry master: Rework the stream_blob/stream_object API  https://review.opendev.org/68682719:00
SpamapScorvus: tristanC I'm just catching up, but very curious about this function thing. Is it something in https://review.opendev.org/#/c/686249/ ?19:07
tristanCSpamapS: yes, we have to https://review.opendev.org/#/c/686249/7/zuul_registry/main.py@13919:09
SpamapScorvus: you say you feel that mypy is doing style checking instead of type checking, but I'm just not seeing it, even though I want to see it. :)19:09
clarkbSpamapS: https://github.com/python/mypy/issues/6549 is related19:10
SpamapSjust saw that19:10
clarkbSpamapS: basically they are going the extra step to say "this particular type signature is bad so we won't allow it" rather than simply checking the type signature matches how it is used19:10
SpamapSthe comment about "assumes that the return value of methods that return None should not be used" sounds like pure BS19:10
tristanCSpamapS: not necessarily, it depends on how 'None' is currently represented in mypy design19:12
SpamapShttps://github.com/python/mypy/issues/6549#issuecomment-53852593719:12
SpamapSI responded19:12
SpamapSit's crap19:12
SpamapSSaying you can't use None is like saying you can't use any class. Everything in python is an object. Period.19:13
*** jamesmcarthur has quit IRC19:14
clarkbhaskell even encodes this into a Maybe because it is so common https://wiki.haskell.org/Maybe19:15
SpamapSOption<T> is the rust equivalent, I think.19:17
tristanCSpamapS: it seems like NoneType is special in python, e.g. "not even defined in the standard library"19:18
clarkbtristanC: types.NoneType exists at least19:21
tristanCclarkb: not on my python-3.719:24
clarkboh interesting did they remove that in 3?19:26
SpamapSYeah I think because None is a singleton it doesn't make much sense.19:26
SpamapS-> None conveys exactly what is expected19:26
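For what it's worth, the type of `None` can always be reached through `type(None)`, whether or not the `types` module exposes a name for it (`types.NoneType` is missing from Python 3.0 through 3.9 and came back in 3.10). A small sketch of the singleton behaviour mentioned above:

```python
NoneType = type(None)

assert NoneType.__name__ == "NoneType"
# Calling the type does not create a new object; it returns the one None.
assert NoneType() is None
# None is an ordinary object as far as isinstance is concerned.
assert isinstance(None, object)
```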
*** hashar has joined #zuul19:27
tristanCit seems like None is special in python, for example on the repl, "None" doesn't display anything. Thus maybe mypy does a special case for it resulting in weird behavior such as the one described in the above issue19:28
SpamapSYeah I'm suggesting it definitely should never be a special case.19:29
SpamapSAnd I think that's what corvus was on about too.19:29
tristanCin Haskell, Nothing is a real type, and the repl doesn't swallow it when it is returned19:29
tristanCSpamapS: yeah I agree mypy shouldn't impose rules that are not related to type checking, but perhaps there is something more tricky to make it work...19:31
SpamapSAlso Guido doesn't like my candor. ;)19:31
corvuswhatever comes of this, it's a successful learning exercise19:35
*** saneax has joined #zuul19:36
*** hashar has quit IRC19:37
*** hashar has joined #zuul19:37
*** panda is now known as panda|off19:38
SpamapScorvus: agreed, and to be fair, I've not had this particular problem, despite mypy'ing a lot of code the last year.19:41
corvusi really like the 'call this function and stop processing' paradigm i guess.  but then, i also like to put "if foo is None: continue" on one line because i think it's one thing, so i'm clearly not one of the cool kids.19:44
tristanCcorvus: also that particular problem will go away once not_found does return something :-)19:45
corvustristanC: yes, i'll work on it from that angle :)19:45
corvusSpamapS: is there a bar where python 1.5.2 programmers go to drink?  i should be there19:46
*** zbr has quit IRC19:46
clarkbI guess they want you to use exceptions more?19:48
corvusfunny story -- i have python code running at home that's been in continuous production for so long that it entirely skipped "new-style" classes.  it uses "class foo:" because "class foo(object):" hadn't been invented yet, and now suddenly it's practically py3 ready with no work on my part!19:48
clarkbcorvus: I think the lesson there is if we ignore our problems they solve themselves :)19:49
corvusthat is absolutely what i have taken from that19:50
fungiif you sit by the river long enough, you will watch the bodies of your enemies float by (proverb mis-attributed to all manner of historical figures)19:57
*** saneax has quit IRC20:01
corvusfungi the great20:04
corvus(a hysterical figure)20:04
fungithe 9 o'clock show is different from the 7 o'clock. tell all your friends, and don't forget to tip your waiters20:13
*** EmilienM is now known as EvilienM20:18
*** EvilienM is now known as containerizes_hi20:19
*** containerizes_hi is now known as containerized20:19
SpamapScorvus: at least you aren't having to watch that code be ported to ruby. ;)20:19
*** containerized is now known as EvilienM20:19
openstackgerritTristan Cacqueray proposed zuul/zuul-registry master: Add support for skopeo copy  https://review.opendev.org/68680320:29
*** jamesmcarthur has joined #zuul20:33
*** jamesmcarthur has quit IRC20:44
*** jamesmcarthur has joined #zuul20:46
*** hashar has quit IRC20:50
*** tosky has joined #zuul21:31
*** jamesmcarthur has quit IRC21:34
*** rlandy has quit IRC21:40
*** armstrongs has joined #zuul21:51
*** EvilienM is now known as EmilienM21:52
*** armstrongs has quit IRC22:03
*** mattw4 has quit IRC22:53
*** rfolco|bbl has quit IRC22:54
*** tosky has quit IRC23:39
