15:02:01 #startmeeting libvirt
15:02:02 Meeting started Tue Jul 15 15:02:01 2014 UTC and is due to finish in 60 minutes. The chair is danpb. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:02:03 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:02:05 The meeting name has been set to 'libvirt'
15:02:08 \o
15:02:10 o/
15:03:01 o/
15:03:25 o/
15:03:49 ....no agenda so...
15:03:55 #topic Open Discussion
15:03:57 o/
15:04:44 Who here is going to the minisummit?
15:05:15 I'll be there
15:05:16 you mean the Nova mid-cycle meetup?
15:05:20 yeah
15:05:24 I'll be there
15:05:29 afraid I won't be there - clashes with my holiday
15:05:33 I am sadly giving it a miss this time
15:05:37 Ahhhh gotcha
15:05:39 another conf just before that
15:05:58 Okey dokey
15:06:04 I'm going to be there too
15:06:39 Want to discuss that review, s1rp_?
15:07:14 sure, so the idea is that we want to add config-drive support to lxc guests
15:07:27 but config-drive as it is in trunk only exposes block-devices
15:07:58 so this patch adds a new backend type called `fs` which just drops the configdrive info into a directory on the host
15:08:03 right yes, i think we discussed in the past doing a bind mount of a filesystem
15:08:06 yes
15:08:09 from there, we copy it into the rootfs to avoid a bind mount
15:08:28 we avoid a bind mount because there were recent security issues with it
15:08:47 well when i say bind mount, i don't mean nova doing a bind mount
15:08:57 i mean just add a element to libvirt XML for it
15:09:14 and let libvirt figure out the bind mount setup it wants to do
15:09:38 in fact you would not actually necessarily need to change nova config drive support
15:09:44 right, but bind mounts were recently shown to be hackable: the kernel data structure allowed you to traverse to the root of the bind-mounted filesystem
15:09:48 you could let nova continue to create a FAT or ISO image
15:09:53 so we wanted to avoid that entirely
15:10:00 and then libvirt would just mount the image in the container
15:10:35 s1rp_: do you have a link for that problem
15:10:46 s1rp_: basically everything about the way LXC is set up involves bind mounts
15:10:59 so config drive use of bind mounts is the least of your worries if that's got a flaw
15:11:05 danpb: it's based on this: http://seclists.org/oss-sec/2014/q2/565
15:11:10 danpb: yeah i'll try and dig that up... was on HN a few weeks ago... apmelton thomasem happen to remember where we saw that
15:11:27 oh that one
15:11:35 danpb: userns insulates you from this problem, but we wanted to be extra careful
15:11:42 that's a docker flaw due to lack of userns
15:11:56 if you don't use userns then you must *not* give out capabilities to the container if you want to be secure
15:12:05 that's not a bind mount flaw
15:12:21 so i don't think that's any reason to avoid use of bind mounts for config drive
15:12:49 using bind mount can also expose details about the host
15:12:49 so in general, we'd like to shrink the attack surface of lxc containers by not exposing underlying host resources
15:12:50 we already document that LXC in Nova is considered insecure (until we have the userns feature done)
15:13:21 for instance if we're bind mounting from the host's root filesystem, details like capacity are exposed to the container
15:13:37 apmelton: yeah good point, that was another issue with it
15:13:42 ok, so that's a more reasonable argument
15:14:06 so if that's a concern then instead of introducing a new configdrive type = fs, just use the existing type=iso or type=fat
15:14:15 and we can just loopback mount the config drive image in the container
15:14:25 at a well-known location of /config or some such
15:14:49 danpb: isn't that more moving parts?
15:15:20 that'd have the advantage that we'd not have so much difference between lxc & non-lxc setup from nova's pov
15:15:44 danpb: does that expose the iso as a device, or as a filesystem?
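To make danpb's proposal concrete (keep Nova's existing ISO/FAT config drive image and have libvirt loopback mount it inside the container at a well-known location such as /config), here is a minimal Python sketch that builds the libvirt `<filesystem type='file'>` element. It assumes the LXC driver's `type='file'` filesystem with a `<driver type='loop' format='raw'/>` child, as described in the libvirt domain XML documentation; the image path used below is hypothetical, and the exact element layout should be checked against your libvirt version.

```python
import xml.etree.ElementTree as ET


def config_drive_filesystem_xml(image_path, target_dir="/config"):
    """Build a <filesystem type='file'> element that loop-mounts a config drive image.

    Assumption: the LXC driver mounts the image at target_dir inside the container
    and auto-detects the filesystem (ISO9660 or VFAT) on it.
    """
    fs = ET.Element("filesystem", type="file")
    # A raw image attached through a loop device on the host.
    ET.SubElement(fs, "driver", type="loop", format="raw")
    ET.SubElement(fs, "source", file=image_path)
    ET.SubElement(fs, "target", dir=target_dir)
    # Config drive data is metadata, so expose it read-only to the guest.
    ET.SubElement(fs, "readonly")
    return ET.tostring(fs, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical instance path, for illustration only.
    print(config_drive_filesystem_xml("/var/lib/nova/instances/demo/disk.config"))
```

The appeal of this design, as discussed, is that Nova keeps producing the same image for LXC and non-LXC guests, and the container sees only the image's contents rather than a bind-mounted host directory.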
15:15:53 configdrive already creates a temp directory and then moves that into the blockdev... the proposed lxc approach just 'blesses' that temp directory into a real directory on the host
15:15:59 then we move that into rootfs
15:16:09 apmelton: if you use then you provide a file containing a filesystem which gets mounted at the desired place
15:16:38 apmelton: so the container would see the filesystem, just as it would if you'd done a bind mount but without exposing host FS
15:17:09 ah so we'd do ?
15:17:11 s1rp_: you'd have to keep that temp directory around forever though now
15:17:27 s1rp_: so it introduces extra cleanup that has to happen for LXC
15:17:40 danpb: how come, it just gets moved into rootfs and we can forget about it
15:17:40 apmelton: no, just type=file; libvirt should auto-detect the format and mount it
15:17:57 ah cool
15:18:02 did not know that
15:18:08 s1rp_: oh hmm, you're actually moving the directory
15:18:16 danpb: yep
15:18:28 danpb: code happily will skip the rm if it's not there..
15:18:55 danpb: also need to add `mv` to compute.filters to allow the copy into the rootfs
15:19:07 tried to lock that down with a regex though
15:19:25 s1rp_: i think from the cloud admin POV it'd still be compelling if we could do this without introducing the need to use CONF.config_drive_format == 'fs'
15:19:37 s1rp_: because that would be one less magic config setting they need to remember to change for LXC only
15:19:50 danpb: yeah i definitely ++ that point
15:20:27 so could you just try out the approach to see if it will "just work" with the config_drive_format = fat or iso9660 or both
15:20:44 if we hit problems with that idea then we could revisit it
15:21:01 sounds good
15:21:16 i'll comment on the review for the sake of history
15:21:22 coolness
15:21:50 danpb: do you know if filesystem supports blkiotune?
15:23:00 like, can we set limits for the underlying block device supporting the filesystem?
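s1rp_ mentions adding `mv` to compute.filters but restricting it with a regex. The idea is that the privileged helper only runs `mv` when every argument matches an expected pattern. Below is a minimal Python sketch of that idea, similar in spirit to rootwrap's regex-based filters but not its actual code; the path patterns and the `mv_allowed` helper are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical patterns, one per argv position; not Nova's real paths.
# The instance-id class [0-9a-f-] excludes "/" and ".", which blocks "../" traversal.
MV_PATTERNS = [
    r"mv",
    r"/var/lib/nova/instances/[0-9a-f-]+/configdrive",
    r"/var/lib/nova/instances/[0-9a-f-]+/rootfs/config",
]


def mv_allowed(argv, patterns=MV_PATTERNS):
    """Allow the command only if argv has one argument per pattern and each fully matches."""
    if len(argv) != len(patterns):
        # Extra flags (e.g. "-f") or missing arguments are rejected outright.
        return False
    return all(re.fullmatch(p, arg) is not None for p, arg in zip(patterns, argv))
```

Using `re.fullmatch` rather than `re.match` matters here: a prefix match would accept a path with arbitrary trailing components. Note that this sketch does not check that the source and destination refer to the same instance.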
15:24:26 apmelton: there are two block I/O tuning options in libvirt: global VM-wide settings and per-disk settings
15:24:47 the former uses cgroups and should work with LXC IIRC, and the latter uses built-in QEMU rate limiting, for which there's no equivalent with LXC
15:25:26 if the LXC guests are each backed by unique loopback or nbd devices though
15:25:49 the top-level global I/O tuning with cgroups is probably sufficient, since each loop/nbd dev would have a distinct major/minor number
15:26:09 so there's that, but also tunes on the config drive
15:26:34 we wouldn't want a container to be able to peg the drive hosting nova's data dir
15:26:50 so we'd wanna set limits on that loopback device pretty low
15:27:18 yep, this is probably something that could use some enhancement
15:27:47 I was doing some digging a little while ago, and I don't think the filesystem tag worked with blkiotune
15:28:27 I didn't see any cgroups getting set at least
15:28:38 it currently doesn't - you'd have to figure out which loop device was used and then set the top-level policy
15:30:06 I suppose that could be tricky if you're setting two different tunings on filesystems hosted on the same block device
15:30:59 yeah, that's why QEMU added support for per-device limits itself
15:31:10 because cgroups can't distinguish that scenario
15:31:20 gotcha
15:33:46 Well this has certainly been a productive chat.
15:35:07 cool, so any other topics....
15:37:21 ok, let's call this a wrap
15:37:24 #endmeeting
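danpb's workaround (find the loop device backing the filesystem, then set the top-level policy) and his caveat about shared devices both follow from cgroup throttling being keyed by a device's major:minor number. The minimal Python sketch below shows both halves under stated assumptions: a domain-level `<blkiotune>` entry using the `<device>` child elements (`path`, `read_bytes_sec`, `write_bytes_sec`) described in the libvirt domain XML documentation (check that your libvirt version supports them), and the cgroup v1 `blkio.throttle.*_bps_device` line format `major:minor bytes`. The loop device path and limits used below are hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def blkiotune_xml(device_path, read_bytes_sec, write_bytes_sec):
    """Build a domain-level <blkiotune> entry throttling one host block device."""
    tune = ET.Element("blkiotune")
    dev = ET.SubElement(tune, "device")
    ET.SubElement(dev, "path").text = device_path
    ET.SubElement(dev, "read_bytes_sec").text = str(read_bytes_sec)
    ET.SubElement(dev, "write_bytes_sec").text = str(write_bytes_sec)
    return ET.tostring(tune, encoding="unicode")


def cgroup_throttle_entry(rdev, bytes_per_sec):
    """Format a cgroup v1 blkio throttle line, keyed only by major:minor.

    Two filesystems on the same block device share one rdev, so they collapse
    to the same key: cgroups cannot give them different limits.
    """
    return f"{os.major(rdev)}:{os.minor(rdev)} {bytes_per_sec}"
```

For a real device the `rdev` would come from `os.stat("/dev/loop0").st_rdev`. Because each loop or nbd device has its own major:minor pair, per-container limits work when every guest has a dedicated backing device, which is exactly the condition danpb describes.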