Thursday, 2021-08-19

kata-irc-bot<fupan> What’s the raw volume in /mnt?  Can you paste the test cases here, since the kata-agent only uses that check to make sure it mounts a blk storage from a device file in the guest.02:13
kata-irc-bot<dgibson> @fupan this device bind mount thing is still confusing me04:55
kata-irc-bot<dgibson> @fupan afaict, every place that sets the "blk" storage driver on the runtime side also sets the storage source to a PCI path, so it won't hit that /dev/ path04:55
kata-irc-bot<fupan> @dgibson Do you mean it wouldn’t reach https://github.com/kata-containers/kata-containers/blob/main/src/agent/src/mount.rs#L378 here, instead it would go to here: https://github.com/kata-containers/kata-containers/blob/main/src/agent/src/mount.rs#L386 ?05:20
kata-irc-bot<dgibson> well assuming this case goes to that function at all, which I'm not 100% sure of yet05:22
kata-irc-bot<fupan> but there’s no difference, and it would reach https://github.com/kata-containers/kata-containers/blob/main/src/agent/src/mount.rs#L392 in the end, which would also do the bind mount, and the mount src would be the real /dev/<vdx> path.05:23
kata-irc-bot<fupan> Here https://github.com/kata-containers/kata-containers/blob/main/src/agent/src/mount.rs#L389 the storage source would be set to the path of /dev/<vdx>.05:24
kata-irc-bot<dgibson> right, I guess so05:26
kata-irc-bot<dgibson> I guess that mount wouldn't work anyway if it was hitting that /dev path - the host path of the /dev node wouldn't match the guest path anyway05:26
kata-irc-bot<dgibson> I think it's a bug that both this bind mount of a device node, and actually mounting a device with a filesystem go through the same "driver"05:27
kata-irc-bot<dgibson> they're very different operations05:27
kata-irc-bot<fupan> @dgibson Actually, in the case of the /dev path, it isn’t the host path; it’s the guest path, which the runtime works out from the device’s virtual path in the guest.05:35
kata-irc-bot<dgibson> hmm... that's not really clear.  The source field has different precise semantics depending on a bunch of different factors05:35
kata-irc-bot<dgibson> in some cases, yes, the runtime predicts the guest path (often in ways I suspect aren't really reliable), but I'm not sure that's true in every case05:36
kata-irc-bot<dgibson> for PCI devices the host has no way of predicting the guest /dev path05:36
kata-irc-bot<dgibson> well, usually05:36
kata-irc-bot<fupan> But for a blk volume, sometimes the runtime can’t know the real filesystem, and just inherits the original “bind” mount to deal with the storage mount.05:38
kata-irc-bot<dgibson> in the case you're describing there is no "real filesystem"05:40
kata-irc-bot<dgibson> or at least the filesystem is not relevant to the operation05:41
kata-irc-bot<dgibson> you're binding a device node into the container05:41
kata-irc-bot<dgibson> on runc that's a straightforward bind mount, on kata we need to translate host to guest device node, then do a guest-local bind mount05:41
kata-irc-bot<dgibson> that's a totally different operation from asking that the filesystem contained on a host device node be mounted into the guest05:41
kata-irc-bot<dgibson> instead it's like mounting a single regular file into the container05:43
kata-irc-bot<dgibson> also a simple bind mount on runc, but for kata we'd need to use virtiofs (or 9p) instead05:43
kata-irc-bot<dgibson> I'm not sure where those two cases (dev vs. regular file) are separated in the kata runtime05:44
kata-irc-bot<fupan> It doesn’t mount the host device node into the guest; it actually hotplugs the host device into the guest and mounts the guest device node05:44
kata-irc-bot<dgibson> "it" is unclear05:45
kata-irc-bot<dgibson> we're trying to match runc semantics, right05:45
kata-irc-bot<fupan> It means kata runtime.05:46
kata-irc-bot<dgibson> I mean there are several different related, but distinct cases here05:46
kata-irc-bot<dgibson> for runc, -v /dev/sdX:/some/path is basically identical to -v /tmp/somefile:/some/path05:47
kata-irc-bot<dgibson> for Kata they're different05:47
kata-irc-bot<fupan> Yes, and only when the runtime can’t detect the block device’s filesystem does kata try to match runc’s semantics and bind mount the device node.05:47
kata-irc-bot<dgibson> no, that's not correct05:47
kata-irc-bot<dgibson> even when it knows the filesystem, with runc -v /dev/sdX:/some/path won't mount the filesystem in the container05:48
kata-irc-bot<dgibson> it will mount the device node in the container05:48
kata-irc-bot<dgibson> the container can use the device raw, or it can mount it itself05:48
kata-irc-bot<dgibson> in fact, runc really never needs to mount filesystems, other than bind mounts05:48
kata-irc-bot<dgibson> Kata does, as an optimization05:48
kata-irc-bot<dgibson> the normal case for "blk" is, AFAICT, where we mount a block device on the host, then bind that mounted filesystem into the container05:49
kata-irc-bot<dgibson> with runc, it just bind mounts the host mount05:49
kata-irc-bot<dgibson> for Kata we could virtiofs the host mount, but that's slow05:49
kata-irc-bot<dgibson> so AIUI, we have an optimization to instead map the underlying device into the guest and mount it within the guest05:50
kata-irc-bot<dgibson> Hrm... wait.. now I'm less sure about this05:50
kata-irc-bot<dgibson> They *should* be totally different operations, but OCI is kind of broken05:51
kata-irc-bot<dgibson> the semantics of the container shouldn't depend on whether the runtime recognizes the filesystem or not05:51
kata-irc-botAction: dgibson rereads OCI05:53
kata-irc-bot<dgibson> oh for goodness' sake05:53
kata-irc-bot<dgibson> no, you're at least partially right, and OCI is totally broken05:53
kata-irc-bot<dgibson> "source" for a mount can be a device name or a host path05:53
kata-irc-bot<fupan> that’s kata’s optimization: once it knows the filesystem, it does the real filesystem mount into the container. yes, it breaks the OCI semantics, but it’s what the user would expect.05:53
kata-irc-bot<dgibson> ugh...05:54
kata-irc-bot<dgibson> so the distinguishing thing is whether it's a bind mount (from the host side description) or not05:54
kata-irc-bot<dgibson> it's still nothing to do with whether the runtime can recognize the filesystem or not05:54
kata-irc-bot<dgibson> it's whether the container spec says it's a real fs or a bind mount05:55
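The distinction dgibson describes can be read straight off an OCI mount entry: on Linux, a mount counts as a bind mount when its options include `bind` or `rbind`, and in that case the `type` field is only a dummy value. A minimal sketch of that check follows; the function name and the two example entries are hypothetical and are not taken from the Kata code.

```python
def is_bind_mount(mount: dict) -> bool:
    """Return True if an OCI mount entry describes a bind mount.

    The decision looks only at the mount options, never at whether
    the runtime can recognize a filesystem on the source.
    """
    options = mount.get("options") or []
    return "bind" in options or "rbind" in options


# Hypothetical spec entries: same source, different meaning.
bind_volume = {
    "destination": "/some/path",
    "source": "/dev/sdX",
    "type": "none",            # dummy type, as is usual for bind mounts
    "options": ["rbind", "rw"],
}
fs_volume = {
    "destination": "/some/path",
    "source": "/dev/sdX",
    "type": "ext4",            # a real filesystem type
    "options": ["rw"],
}
```

With the first entry the container should see the device node itself; with the second it should see the filesystem stored on the device.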
kata-irc-bot<dgibson> the thing is that "bind" mounts as specified in the container spec make no sense in the Kata context05:56
kata-irc-bot<greg.bock> I’ve been carrying forward these: https://github.com/kata-containers/agent/pull/407  https://github.com/kata-containers/runtime/pull/882 05:56
kata-irc-bot<dgibson> for regular files "Bind" mounts for the container mean virtiofs mounts within the guest05:57
kata-irc-bot<dgibson> but for device nodes we have to do extra magic05:57
kata-irc-bot<dgibson> my point is that the special case of exposing a host device node into the guest should be a special case of the virtio-fs mount path, not of the block mount path05:58
kata-irc-bot<dgibson> it has nothing to do with block *mount*ing05:58
kata-irc-bot<dgibson> we're passing through the host side fstype to the guest, and that can never be correct for bind mounts05:59
kata-irc-bot<fupan> But there’s no point in exposing a host device node into the guest via virtio-fs, since the container couldn’t access the host device node.06:00
kata-irc-bot<dgibson> exactly, which is why we can't implement it with virtiofs06:00
kata-irc-bot<dgibson> but my point is the difference in container semantics between "filesystem" volume and "bind" volume is a more fundamental distinction06:00
kata-irc-bot<dgibson> the decision is like this06:02
kata-irc-bot<dgibson> if (volume-is-bind-in-spec) {       if (bind-target is regular files & dirs) {06:02
kata-irc-bot<dgibson> but at the moment paths A & C are ending up in the same place and we have to re-special case them with logic that doesn't really make sense06:06
kata-irc-bot<fupan> @dgibson Did you mean that B and C are blended into the same place?06:50
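The pseudocode in the log is cut off after its first two conditions, so the path labels A, B and C cannot be recovered from it. Going only by the three cases described earlier in the conversation (bind of regular files and directories, bind of a device node, and a real filesystem mount), the decision could be sketched roughly as below. The enum, the function name and the error branch are assumptions made for illustration; this is not the Kata runtime's actual logic.

```python
import stat
from enum import Enum


class Handling(Enum):
    # Bind of regular files/dirs: share the host path with the guest.
    SHARE_VIA_VIRTIOFS = "share host path via virtiofs (or 9p)"
    # Bind of a device node: pass the device through, then bind the
    # guest-side node into the container.
    PASS_DEVICE_AND_BIND_NODE = "hotplug device, bind mount guest device node"
    # Non-bind volume: pass the device through and mount its filesystem.
    MOUNT_FILESYSTEM_IN_GUEST = "hotplug device, mount filesystem in guest"


def choose_handling(is_bind: bool, source_mode: int) -> Handling:
    """Pick a handling strategy from the spec's bind flag and the
    st_mode of the host-side source (as returned by os.stat)."""
    if is_bind:
        if stat.S_ISREG(source_mode) or stat.S_ISDIR(source_mode):
            return Handling.SHARE_VIA_VIRTIOFS
        if stat.S_ISBLK(source_mode):
            return Handling.PASS_DEVICE_AND_BIND_NODE
        # Other source types are outside this sketch.
        raise ValueError("unsupported bind source type")
    return Handling.MOUNT_FILESYSTEM_IN_GUEST
```

The outer test depends only on what the container spec says, which is the point made in the discussion: the device-node case sits under the bind branch, not under the filesystem mount branch.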
kata-irc-bot<eric.adams> According to https://vsupalov.com/docker-latest-tag/ the latest is the default tag when no tag is specified. Browsing around some other popular Docker hub repos they used :latest to refer to the last release which usually is some alpha.  I think using latest as a tag to match the last image posted makes sense. The "stable" version will always be a little bit older which makes sense to me.21:55
kata-irc-bot<fidencio> Cool, it matches with what was my understanding!21:57
