Monday, 2020-05-04

*** sameo has joined #kata-dev05:07
*** ailan_ has joined #kata-dev05:37
*** pcaruana has joined #kata-dev06:31
*** jodh has joined #kata-dev06:43
*** sgarzare has joined #kata-dev07:23
*** davidgiluk has joined #kata-dev08:01
*** sgarzare has quit IRC08:08
*** gwhaley has joined #kata-dev08:08
*** sgarzare has joined #kata-dev08:16
*** tmhoang has joined #kata-dev08:57
*** ailan_ has quit IRC09:21
*** ailan_ has joined #kata-dev10:58
*** jodh has quit IRC11:58
*** vgoyal has joined #kata-dev12:16
*** devimc has joined #kata-dev12:35
*** pcaruana has quit IRC12:39
*** pcaruana has joined #kata-dev12:40
*** hashar has joined #kata-dev12:53
kata-irc-bot<fidencio> Howdy! So, last week we talked a bit about possible gating tests we could use for Fedora packages. libpod gating tests seemed like a good start and @wmoschet gave it a try. Now, I'm looking at the results and trying to understand whether the failures are "expected" or not (so, sorry, but I'll end up dropping a few questions here today, this being the first one).12:59
kata-irc-bot<fidencio> Is kata-runtime supposed to deal with uidmappings? I mean, something like `podman run --rm --uidmap 0:100:1000 fedora mount` ? I'm currently getting the following error: `Error: rpc error: code = Unknown desc = User namespaces enabled, but no user mapping found.: OCI runtime error`  Of course, it works as expected when not using kata as the runtime.12:59
fidenciodevimc: ^ :-)13:05
fidencioand buenos días!13:06
devimcfidencio: buenos dias13:16
devimcfidencio: take a look https://github.com/kata-containers/tests/blob/master/.ci/podman/configuration_podman.yaml13:16
fidenciodevimc: cool, so it basically means that if it's not in the list, it's not supported and intended to not be supported for now. Do I understand it correctly?13:18
devimcfidencio: yup13:20
fidenciodevimc: cool, that was easy. Is it worth opening an issue for (at least some of) the non-supported cases?13:22
fidenciodevimc: it could help us to keep track of things that are failing, in general13:23
devimcfidencio: yes, it'd great13:24
devimc*it'd be13:24
fidenciodevimc: okay, I'll open the issues against the "test" repo13:25
fidenciodevimc: thanks for the help!13:25
devimcfidencio: yw ;)13:25
*** hashar has quit IRC13:33
kata-irc-bot<wmoschet> @fidencio is it one of the cases that failed in my tests?13:41
kata-irc-bot<wmoschet> Another question: is there such a list of non-supported cases for cri-o as well?13:42
kata-irc-bot<fidencio> @wmoschet, yep, that's one of your cases and I'm going through case by case to check what's the reason for the failure13:47
kata-irc-bot<fidencio> and then I'll open issues on kata-containers that we can end up pointing to podman itself if something could be improved on their side13:47
kata-irc-bot<wmoschet> @fidencio got it. Then for those non-supported cases I can change my script to filter out the tests13:47
kata-irc-bot<wmoschet> cool13:47
kata-irc-bot<fidencio> Yep, for now we'll just skip the ones failing ... just gimme some time to go through all of them, so we can at least annotate which one we expect to be an easy fix, and whatnot13:48
kata-irc-botAction: fidencio has a meeting in a few minutes, so it'll take some time :slightly_smiling_face:13:48
kata-irc-bot<wmoschet> @fidencio sure, I have plenty of other things to do...so no pressure. And I am returning from vacation and still in slow motion mode :slightly_smiling_face:13:49
kata-irc-bot<wmoschet> s/vacation/extended weekend/13:50
*** pcaruana has quit IRC13:52
*** pcaruana has joined #kata-dev13:52
*** sbrivio has quit IRC14:03
*** st3 has joined #kata-dev14:04
*** devimc has quit IRC14:38
*** devimc has joined #kata-dev14:39
*** hashar has joined #kata-dev14:45
*** hashar has quit IRC14:50
*** hashar has joined #kata-dev15:51
*** gwhaley has quit IRC17:02
*** sgarzare has quit IRC17:03
*** ailan__ has joined #kata-dev18:22
*** Jeffrey4l has quit IRC18:23
*** ailan_ has quit IRC18:24
*** Jeffrey4l has joined #kata-dev18:31
*** dklyle has joined #kata-dev18:36
*** Jeffrey4l has quit IRC18:36
*** Jeffrey4l has joined #kata-dev18:43
fidenciodevimc: don't we control whether we register the handler or not?19:00
devimcfidencio: I think so, there is a list of signals to handle19:00
fidenciodevimc: so how would we hit that situation? Oo19:01
devimcfidencio: give more context about the test19:02
devimcwhat commands it runs?19:03
devimcfidencio: does podman use `kata-runtime` to signal the container?19:06
fidenciodevimc: podman --runtime=/usr/bin/kata-runtime run -d alpine sleep 60; podman --runtime=/usr/bin/kata-runtime stop $container_id;19:07
devimcfidencio: or the workload (in our case the kata-shim) is signaled directly?19:07
fidenciodevimc: that's basically what the test does19:07
devimcfidencio: ohh i see19:10
fidenciodevimc: somehow it triggers a SIGKILL instead of SIGTERM19:10
devimcfidencio: https://paste.centos.org/view/54cd00b719:13
devimcthis patch will fix it19:14
devimcfidencio: but now the question is why that condition was added?19:14
fidenciodevimc: yep, I'd dare to say the patch doesn't fix it unless we understand why that was added in the first place19:16
*** davidgiluk has quit IRC19:17
fidenciodevimc: let me open an issue for that so we can keep debugging and keep track of what we're doing19:17
fidenciodevimc: btw, this is *not* high-prio from my side19:17
devimcfidencio: oks, thx19:18
*** st3 is now known as sbrivio20:12
fidenciodevimc: https://github.com/kata-containers/tests/issues/2504 does the error look familiar? O:-)20:12
devimcfidencio: lol - not again20:14
fidenciodevimc: at least now we have it in a test20:15
devimcfidencio: does it run with selinux on?20:15
devimcwhy is it failing again?20:16
fidenciodevimc: so, I'm just checking which are the libpod failures and this one is the 3rd of the 6 we faced ...20:16
fidenciodevimc: of course, as I'm running it on a default RHEL, it comes with SELinux enabled20:17
fidenciodevimc: now I just deleted /var/lib/{vc,containers}20:17
fidenciodevimc: set the selinux to disabled, rebooted the machine ... and let's see20:17
fidenciodevimc: I'm just glad I've faced this before and you've helped me before ...20:18
sbriviohi, i'm getting the dreaded "Failed to check if grpc server is working: context deadline exceeded: OCI runtime error" while running a somewhat "heavy" kernel (say, KASan and lockdep)20:19
sbriviokata-agent is up and running after ~20s (if i boot up the image in qemu stand-alone), i'm running kata-runtime with podman, it gives up after ~17s20:20
sbrivioi changed defaultDialTimeout from 15s to 60s in agent, shim, and runtime, rebuilt, still it seems to time out after 15s, does anybody have any pointer?20:21
sbrivio(that change was inspired by devimc's https://github.com/jshachm/agent/commit/6bd9b01106f6b8570f48c6cd7b8403dc5a831d30 )20:23
devimcsbrivio: if the container has not been created after 10s, the container manager will kill it20:29
sbriviodevimc, i see, do you know where i could change that?20:30
devimcsbrivio: I tried, but it was impossible, that timeout is not configurable20:30
sbrivioshim/vendor/github.com/docker/docker/container/monitor.go:10:loggerCloseTimeout = 10 * time.Second20:31
devimcsbrivio: why is your kernel so "heavy"?20:31
sbriviothis one perhaps? hmm20:31
fidenciodevimc: debug kernel :-)20:31
sbriviodevimc, yeah, essentially, because i have KASan enabled :) lockdep alone would probably not take so long20:32
devimcsbrivio: have you tried with the kata cli ?20:32
sbriviooh, wait, that's just for docker, one of the many random results grep -rn "10 \* time" gave me20:32
sbriviodevimc, no, i haven't, not sure how to do that, is there some guide?20:33
devimcsbrivio: guide? - ppff I have something better.. source code20:34
devimcsbrivio: https://github.com/kata-containers/tests/blob/master/functional/vfio/run.sh#L53-L8020:34
sbriviodevimc, oh, that's a language i like! thanks :)20:35
devimcsbrivio: yw ;)20:35
fidenciookay, 3 out of 6 issues are on github. the rest will be opened tomorrow.20:38
fidenciodevimc: as usual, thanks for the help!20:38
devimcfidencio: cool! and yw20:39
fidenciotake care everyone and "siganme los buenos" O:-)20:40
devimchaha20:40
devimcfidencio: take care man!20:40
sbriviodevimc, [while trying to recycle a bundle that kata-runtime can digest] by "container manager" you mean podman or docker? because from my logs that doesn't seem to be the case20:51
devimcyes20:52
devimcsbrivio: what's the failure?20:52
sbriviothe first one i see in syslog from kata-runtime is a: "Stopping Sandbox"20:52
sbrivio(pasting logs...)20:53
devimcsbrivio: you can use paste.centos.org20:54
devimcsbrivio: https://paste.centos.org/20:54
sbriviodevimc, yes yes, i'm used to 0bin, using that as i've seen it used here already :)20:55
sbriviohttps://paste.centos.org/view/fb809212 sorry, it's horrible20:55
sbriviofailure seems to come at line 2920:55
*** sameo has quit IRC20:57
devimcsbrivio: are you running this nested ?20:58
devimcsbrivio: on azure?20:59
sbriviodevimc, nested on local kvm-amd box20:59
devimcsbrivio: run `kata-runtime kata-check`21:00
sbrivio# kata-runtime kata-check21:00
sbrivioSystem is capable of running Kata Containers21:00
sbrivioSystem can currently create Kata Containers21:00
devimcI see21:00
sbrivioi mean, it works with the same kernel built without KASan and lockdep (and a few others, perhaps)21:00
devimcsbrivio: the VM is started at 22:17:52 and stopped 14 seconds later21:02
devimcsbrivio: have you tried without vsocks?21:03
sbriviodevimc, 14 to 15 seconds, yes, that's what made me think it was that defaultDialTimeout21:03
sbrivionot yet, or i'm not even sure it's configured to use vsocks (sorry, i'm a kernel developer, relatively new to this) -- checking21:03
devimcsbrivio: it's using vsocks21:04
sbriviodevimc, oh, now that you mention it, i see it in the logs :)21:04
devimcsbrivio: take a look at /usr/share/defaults/kata-containers/configuration.toml21:04
devimcyou can disable it there21:05
sbrivioyep, editing now21:05
devimcor here /etc/kata-containers/configuration.toml21:06
sbrivioyeah, i have a copy in /usr/local/share that i'm using, it's actually disabled there, trying to find out where that might come from instead...21:06
devimcthe file in /etc takes precedence21:07
sbrivioi built the thing with SYSCONFDIR to be sure, anyway, checking it's not enabled anywhere now...21:07
sbrivio(the thing == kata-runtime)21:07
sbriviodevimc, okay, disabled it for real now :) and it goes further, thanks!21:09
sbriviojust hitting this for some reason now:21:10
sbrivioDEBU[0021] Starting container 434a74005f29c79a8fd7507a8e9e2a7b3c0a9f9b1d2f9180269e5b1696a2e178 with command [/bin/bash]21:10
sbriviorpc error: code = Unknown desc = path "memory" missing21:10
sbrivioERRO[0032] `/usr/bin/kata-runtime start 434a74005f29c79a8fd7507a8e9e2a7b3c0a9f9b1d2f9180269e5b1696a2e178` failed: exit status 121:10
sbriviobut i doubt it's related to vsocks21:10
devimcyeah, that's a different error21:10
devimcat least now `kata-runtime start` is executed21:10
devimcsbrivio: do you have enough (>3GB) RAM memory?21:11
sbrivio6GB, yeah21:12
sbrivio              total        used        free      shared  buff/cache   available21:12
sbrivioMem:        6795520     1227944     3007304        7852     2560272     525360421:12
devimcsbrivio: are you using podman?21:13
sbrivioyep:21:13
devimc`path "memory" missing` doesn't make sense to me21:13
sbriviosomething like this: podman --runtime /usr/bin/kata-runtime run --log-level=debug --security-opt label=type:container_kvm_t -it fedora21:13
devimcselinux on - ouch!21:14
sbrivioha, sorry! :D21:14
* sbrivio disables and retries21:14
sbrivionope, it's not that, grepping around now...21:15
devimcsbrivio: it should work,21:15
devimcsbrivio: but I don't use it21:15
devimcsbrivio: I think there are still some bugs - https://github.com/kata-containers/tests/issues/2504#issuecomment-62368907121:16
devimcdisable it, rm -rf /var/lib/{vc,containers}, and try again..21:16
* sbrivio tries that too21:16
sbriviodevimc, that's another failure now, Failed to check if grpc server is working: rpc error: code = Unavailable desc = transport is closing: OCI runtime error21:19
sbriviobut... i have a hint now, before "Stopping Sandbox", i'm getting a message by earlyoom21:20
sbrivioMay 04 23:18:53 localhost.localdomain earlyoom[1012]: mem avail:  4472 of  6636 MiB (67 %), swap free:    0 of    0 MiB ( 0 %)21:20
sbrivio(67%? come on, it's not much...)21:20
* sbrivio restarts the whole thing with way more memory21:20
devimcsbrivio: no21:20
devimcchange the default_memory21:20
devimcin the configuration file21:20
sbrivioalso in configuration.toml?21:20
sbriviookay21:20
devimc640 - 1024 should be enough21:21
devimcthe default is 204821:21
devimcsbrivio: and don't forget to enable_debug21:21
sbriviodevimc, thanks, done, set enable_debug for all the components now, same "path memory missing" thing:21:24
devimcsbrivio: that's weird21:24
* sbrivio fetching logs21:24
devimcsbrivio: how are you building the kernel?21:24
sbriviowith make, on the host21:25
sbriviodevimc, what do you mean exactly? kernel config?21:25
devimcsbrivio: yes21:25
sbriviojust a moment21:25
devimcsbrivio: seems like you forgot to enable something21:25
devimcsbrivio: clone this repo https://github.com/kata-containers/packaging21:26
sbriviodevimc, https://paste.centos.org/view/b2cec596 and still, it works without CONFIG_KASAN and other stuff21:26
devimcsbrivio: I see 5.721:27
devimcthat's too new for us21:27
devimcnot sure if we support it21:27
sbriviodevimc, yeah, it's pretty much latest upstream21:27
sbriviodevimc, still, it works without those options21:28
sbrivio(i mean, net-next.git kind of upstream)21:28
sbriviodevimc, so that could be something that kata-agent doesn't understand? or something unexpected happening in the guest, you mean?21:29
devimcsbrivio: yeah - something missing under /sys21:29
* sbrivio still trying to relate podman logs to syslog21:30
devimcsbrivio: sudo journalctl -b -t kata-proxy21:30
sbriviodevimc, oh, so much better, thanks (laugh if you want, but i'm not used to systemd :))21:31
*** vgoyal has quit IRC21:32
devimcsbrivio: devuan ?21:32
sbriviodevimc, and finally: https://paste.centos.org/view/5878ffb921:32
sbriviodevimc, debian, at some point devuan, now debian, still not used to it :)21:33
sbrivio(i thought, let's switch back to "proper" debian so that i learn... eventually...)21:34
devimcsbrivio: "Could not update parent cpuset cgroup (/sys/fs/cgroup/cpuset/libpod/cpuset.cpus) cpuset:'0': open /sys/fs/cgroup/cpuset/libpod/cpuset.cpus: no such file or directory"21:34
devimcsbrivio: cool!21:34
sbriviodevimc, i see, checking configuration :)21:34
devimcI like debian21:34
devimcsbrivio: seems like you don't have support for cgroups21:34
devimccpu cgroups21:35
sbrivioyeah, that would be totally weird, still, checking :)21:35
devimcsbrivio: take a look at this folder https://github.com/kata-containers/packaging/tree/master/kernel/configs/fragments21:36
devimcit contains all the CONFIGs that kata needs21:36
sbriviodevimc, useful, thanks. how do you "source" those?21:36
devimcmight be you missed one21:36
devimcsbrivio: scripts/kconfig/merge_config.sh21:38
devimcit's a kernel tool21:38
sbriviodevimc, ah, that :) yep, sure21:38
sbrivioCONFIG_CGROUP_CPUACCT=y can't be, really...21:38
sbrivioalso CPUSET set21:38
devimcsbrivio: don't forget all *_CGROUP_*21:39
sbriviodevimc, it's all there -- let me check what happens if i boot the guest in a similar way...21:40
sbrivio(with just qemu, as i would get with kata-runtime)21:40
devimcsbrivio: gotta go - let's continue this tomorrow -cu! take care21:41
*** devimc has quit IRC21:41
*** ailan__ has quit IRC21:52
*** hashar has quit IRC22:42
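
Note on the kernel config check discussed above: devimc points at the config fragments in the packaging repo and sbrivio compares them against his own kernel config by hand. A minimal sketch of that comparison follows, assuming the fragments and the kernel config both use the usual "CONFIG_FOO=y" line format. The function names and the sample data are hypothetical and not taken from the Kata repositories.

```python
import re

# Matches "CONFIG_FOO=y" style assignments in a kernel config or fragment.
_SET_RE = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")


def parse_config(text):
    """Return {option: value} for every option that is set in `text`.

    Comments, including "# CONFIG_FOO is not set", are ignored, so a
    disabled option is simply absent from the result.
    """
    options = {}
    for line in text.splitlines():
        match = _SET_RE.match(line.strip())
        if match:
            options[match.group(1)] = match.group(2)
    return options


def missing_options(fragment_text, kernel_config_text):
    """List options the fragment sets that the kernel config lacks or sets differently."""
    wanted = parse_config(fragment_text)
    actual = parse_config(kernel_config_text)
    return sorted(name for name, value in wanted.items() if actual.get(name) != value)


if __name__ == "__main__":
    # Hypothetical sample data, not the real Kata fragments.
    fragment = "CONFIG_CGROUP_CPUACCT=y\nCONFIG_CPUSETS=y\nCONFIG_CGROUP_PIDS=y\n"
    kernel = "CONFIG_CGROUP_CPUACCT=y\nCONFIG_CPUSETS=y\n# CONFIG_CGROUP_PIDS is not set\n"
    print(missing_options(fragment, kernel))
```

The comparison is strict on values, so an option the fragment wants as "y" but the kernel builds as "m" is reported too. Whether that matters depends on whether the guest image can load modules, which this sketch does not try to decide.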
