15:02:15 #startmeeting XenAPI
15:02:15 Meeting started Wed Dec 11 15:02:15 2013 UTC and is due to finish in 60 minutes. The chair is johnthetubaguy. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:02:16 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:02:17 hello all
15:02:18 The meeting name has been set to 'xenapi'
15:02:23 who is around today?
15:02:29 hi
15:02:57 hi
15:03:25 cool, so let's get cracking
15:03:33 #topic Blueprints
15:03:40 Sorry guys
15:03:45 anyone got anything to chat about blueprints?
15:03:56 yes :)
15:04:03 cool, fire away
15:04:19 I added a link to an etherpad and I don't know if it is a good practice
15:04:37 Could you just link to the bp, thouveng?
15:04:42 in here I mean
15:04:48 I'm being thick and can't find it
15:05:00 https://blueprints.launchpad.net/nova/+spec/pci-passthrough-xenapi
15:05:09 and the link to the etherpad is https://etherpad.openstack.org/p/pci-passthrough-xenapi
15:05:22 Perfect
15:05:28 that's right, isn't it johnthetubaguy?
15:05:59 yeah, that looks good
15:06:29 ok cool.
15:07:13 are there any bits you want to discuss in that?
15:07:18 or just get a general review?
15:07:48 just a general review for the moment
15:08:15 I think it LGTM - but that might be because we've discussed it outside of the BP
15:08:17 what is all the hiding of PCI devices?
15:08:51 I thought all the hiding was implemented inside nova?
15:08:54 I'll let you answer thouveng - but if you want me to step in, let me know
15:09:21 yes, please go ahead - I didn't get the question, sorry
15:09:54 thouveng: you mention passing a "hide" option to the pciback module - why is that?
15:10:00 In order to do PCI passthrough for PV guests in xen (and stably for HVM guests) the devices should use the pciback driver in dom0, to make sure dom0 doesn't use them for other things
15:10:20 Therefore they need to be "hidden" from the normal kernel boot so pciback can collect them
15:10:34 hence pciback.hide=(device_id) on the kernel command line
15:11:11 the KVM approach is to change the module dynamically, but that's a little less stable - and you still have to enumerate the devices in nova.conf anyway
15:11:16 so the two might as well be combined
15:12:25 OK, so I am not sure I understand what is going on there. I know there is a whitelist that tells nova which devices it can pass to a guest, and in the flavor we tell nova which devices to pass to a specific server
15:13:10 Post-whitelist, KVM will try to disconnect the device from dom0 and attach it to the KVM equivalent of the pciback driver
15:13:16 oh, I see
15:13:36 so we have to disconnect some set of devices from dom0, and that's what we are updating?
15:13:37 All we're saying is we'll do that at boot time since that's better practice - particularly for xen
15:13:58 yes - but they are not "disconnected", just never connected to dom0, as it's a dom0 kernel option
15:14:15 right
15:14:28 so this is moving into host aggregates, and the config file is going away
15:14:33 does that create an issue?
15:14:42 what is moving into host aggregates?
15:15:08 and what config file? do you mean the whitelist option in nova.conf?
15:15:26 but no, I'm sure it won't create a problem
15:15:36 there are two different things to configure for pci passthrough. The whitelist is just used by the compute node.
15:16:02 yes, the configuration in nova.conf is going into the DB
15:16:08 well - the compute node uses the whitelist to report to the scheduler what it can provide, right?
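[For illustration, a minimal sketch of the two pieces of configuration being discussed. The PCI address, vendor/product IDs, alias name and flavor below are hypothetical, not taken from the meeting, and the nova.conf options are shown roughly as they looked at the time. The device is hidden from dom0 at boot so pciback can claim it, and the nova whitelist/alias then tells the compute node it may hand that device to guests:

    # dom0 kernel command line - hide the device so pciback grabs it at boot
    pciback.hide=(0000:04:00.0)

    # nova.conf on the compute node - allow the device and give it an alias
    pci_passthrough_whitelist={"vendor_id":"8086","product_id":"10fb"}
    pci_alias={"vendor_id":"8086","product_id":"10fb","name":"bignic"}

    # flavor extra spec - request one such device per instance
    nova flavor-key pci.small set "pci_passthrough:alias"="bignic:1"

Only devices that are both behind pciback in dom0 and matched by the nova whitelist end up available to guests.]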
15:16:13 so I think that it doesn't create any issue, since we just replaced the whitelist with the boot command line detection.
15:16:15 BobBall: yes
15:16:17 that includes the whitelist I think
15:16:18 that won't be a problem
15:16:27 I think :P
15:16:45 erm, except the user now specifies the whitelist through an administration API?
15:16:53 as in a REST API
15:17:31 That won't be possible in the current thinking - and possibly isn't possible _at all_ with Xen
15:18:16 hmm, so that's what we decided at the summit for PCI passthrough, oh dear...
15:18:42 so, it's not all bad right… as long as you expose more PCI devices than you want in your nova-based whitelist
15:18:49 true
15:18:57 so it might need to be an intersection of the two
15:19:03 i.e. you have to configure in dom0 to expose it
15:19:03 cool, I think we are good
15:19:11 yeah, +1
15:19:14 and then nova can only use it if it's both there and in the whitelist
15:19:28 expose some in dom0, then you can configure some of those in the dynamic whitelist, it all works
15:19:33 yup
15:19:35 cool
15:19:39 any more blueprint stuff?
15:19:50 It might be possible with newer versions of xen btw - so Augusta or beyond - which use Xen 4.2+
15:20:04 but not with existing versions (which don't have the xl pci-set-assignable stuff)
15:20:14 -xen +XenServer
15:20:39 ah, OK
15:20:41 good to know
15:20:44 nice
15:20:47 Does that make sense thouveng? I think that's right?
15:20:58 yes I think so
15:21:34 I would just get in there, throw up some code, and we can help you through it
15:21:44 all sounding really good :)
15:22:03 good good
15:22:08 #topic Docs
15:22:11 any news?
15:22:33 #topic QA
15:22:49 matel: want to update us on the tempest gate work?
15:23:10 Yep, nodepool is not prepared for server restarts
15:23:24 so I'm proposing a patch, so that this concept fits in.
15:23:38 I am looking at the other side: assuming we get nodepool sorted and we have a VM setup, what can we do
15:23:59 at the same time, going to see what I can do with config drive to get IPs into it, inside the rax cloud
15:24:08 If that's ready, 2 more items left on the list: 1) prepare the instance for snapshotting 2) come up with a localrc.
15:24:20 yep
15:24:22 Yes, config drive would save us some reboots
15:24:36 I am kinda looking at (2), in theory anyway
15:24:50 Apart from that, an email will go to the infra list with our ideas.
15:24:56 bobball: how is making tempest stable going?
15:25:00 Hopefully today.
15:25:04 matel: some fine work
15:25:35 in the RS cloud, just waiting for Mate to collect the logs so we can try and figure out why it's not working properly up there, but it works better here
15:25:38 anyway
15:25:53 I've got a whole heap of changes stuck waiting for review that are needed to get tempest stable + fast enough
15:26:05 BobBall: is that full tempest failing or smoke too?
15:26:13 Bob - what's up with changing to raring?
15:26:16 smoke in the RS cloud fails
15:26:23 That's what I was just going to say - but I want suacy
15:26:25 saucy*
15:26:39 anyway - the hopefully last issue is a kernel bug in precise
15:26:45 Oh, so you say we should try saucy?
15:26:45 which we've just confirmed as a kernel bug
15:26:50 yeah
15:26:53 saucy is newer
15:26:57 new = good, right?
15:27:03 something like that
15:27:04 I'm a bit confused, saucy is the latest?
15:27:08 Maybe raring is good enough
15:27:09 yes
15:27:12 raring = 13.04
15:27:14 It's not true in the software world.
15:27:15 saucy = 13.10
15:27:26 matel: what is in your XVA at the moment?
15:27:43 A debootstrapped install.
15:27:48 wait a sec...
15:27:53 Anyway - the kernel bug in precise causes a semaphore to be held when the userspace program finishes
15:27:58 causing all sorts of things to fail randomly
15:28:01 nasty
15:28:05 like lsof or lvs or anything really
15:28:15 eek, nice
15:28:20 which in turn (very disappointingly) means that tempest fails
15:28:22 john: which xva are you asking about?
15:28:27 john: gimme the url.
15:28:33 I thought all XVAs were precies?
15:28:35 precise*
15:28:43 matel: the one in your script in gerrit?
15:28:57 Ah, OK, I thought that you were interested in the package list.
15:29:05 So yes, that's a precise one.
15:29:24 OK, so there is (3) update the xva to the latest ubuntu
15:29:26 Guys, should we agree to go for saucy?
15:29:45 we should go for whatever works for you locally at the moment
15:29:50 anyway - the frustrating thing is that for some reason we don't seem to hit this kernel issue if we don't have one of my changes... but other things in tempest randomly fail without it
15:30:00 lol
15:30:06 that sucks
15:30:26 #link https://review.openstack.org/#/c/60253/ is the one that fixes some real nova failures and seems to somehow expose the kernel bug
15:30:38 that's not lol, Bob is losing his hair.
15:30:56 So, Saucy?
15:30:57 Bob?
15:30:59 It's true. I have pulled most of it out in the last week.
15:31:03 I say yes matel
15:31:11 no point sticking on precise IMO
15:31:24 you make it go so fast… you hit a kernel bug
15:31:24 it happens to us all
15:31:28 Okay, I will go for that as well.
15:31:28 either saucy, or just say "sod it" and go for centos like the rest of the infra jobs ;)
15:31:37 john - do you know if anyone is on saucy?
15:31:39 but that's a bigger change
15:31:51 I'm happy with trying raring if it's easier - e.g. exists in RS
15:31:52 Yep, I am afraid of the unknowns.
15:31:58 I dunno, can't remember what we use, don't think it's ubuntu
15:32:07 Okay, let's go with raring.
15:32:14 RS are standardising on Debian moving forward
15:32:15 raring is fine for now
15:32:24 wheezy?
15:32:40 Bob, do you think wheezy would be a good option?
15:32:48 not sure which one matel ... antonym did tell me, but I can't remember
15:32:50 BobBall: that's all I remember, debian
15:33:01 but debian vs ubuntu yes
15:33:05 but some folks want centos, but hey
15:33:07 I'm just afraid of being the only team on the edge.
15:33:20 yeah, let's just pick something that works
15:33:25 maybe it was sid - Rackspace like being on the edge ;)
15:33:28 if it falls over, we pick something else, right
15:33:44 The problem is the cost of these probes, John.
15:34:02 It's quite expensive, so thinking for a while is a good idea.
15:34:13 sure, but we know precise is broken, I would rather pick an LTS, but whatever works for now
15:34:15 Can we just remove the XVA and run a dozen or so smokes overnight to see if raring works?
15:34:26 BobBall: +1
15:34:28 anyways
15:34:33 let's move on I think
15:34:51 not precise, as precise is broken for us; seems OK for now
15:35:09 but let's leave that for now
15:35:16 we need the nodepool working first
15:35:30 let's get a failing test rather than no test
15:35:33 will add you to the reviewers.
15:35:39 cool, sounds good
15:35:55 any bugs that people want to talk about?
15:36:12 https://review.openstack.org/#/c/60808/ is always fun
15:36:20 got a very weird thing happening
15:36:24 but it's not the cause of the kernel bug
15:36:40 basically we get kernel messages
15:36:49 saying the device is in use by nova when we're trying to unplug it
15:36:53 and it leaks grant entries
15:37:04 which isn't a "problem" - but something that's very weird
15:37:16 the device _does_ unplug
15:37:27 because the next loop sees it as inactive, so then it's ok
15:37:33 but compounded it might cause a problem
15:37:39 after hundreds of the grant entries leak
15:37:53 so I was trying to fix it with syncs and direct access to disks
15:37:57 all of which should prevent it
15:37:59 but it's not :(
15:38:00 yeah, might run out of handles or something...
15:38:06 hmm
15:38:11 not handles - but Xen will get very unhappy
15:38:25 I assert the changes I've made are good changes and worthwhile to have
15:38:29 which is why I haven't pulled them
15:38:39 but they haven't fully fixed the issue I'm seeing
15:38:58 and I can't explain why, because everything is so disconnected it's impossible to trace back to the nova code that's causing this
15:39:12 AND it only happens in parallel tempest, at random times too
15:39:23 yuck
15:39:46 But maybe it'll be fixed by upgrading to the latest version of Ubuntu
15:39:58 And if it's not, we can just wait for 14.04 :)
15:40:12 yeah, sounds nasty, would PVHVM be better?
15:40:36 can't run that in the RS cloud
15:40:49 I assume you mean HVM rather than PVH
15:40:56 true, damn half-working nested virt
15:41:16 yeah, HVM with PV drivers, but we can't do that either, I assume
15:41:33 PVH will be cool when it exists
15:41:38 +1
15:41:47 cool, so let's move on...
15:41:57 #topic Open Discussion
15:42:05 anything else for today's meeting?
15:42:32 Doc Bug Day 12/20 -- next Friday
15:42:34 follow the sun!
15:42:50 regarding the direct IO for writing the config drive that you just commented on, John - I don't have a bug that I can say is fixed by this, which is why it doesn't have a link
15:43:00 ah, my last day at work, that sounds like a good time to help update docs, I will put that in my diary
15:43:05 the change is one that we should be doing, but the symptoms I saw weren't fixed by this
15:43:06 It would be great to clean up/consolidate the Xen doc bugs, I think they're mostly tagged accurately. https://bugs.launchpad.net/openstack-manuals/+bugs?field.tag=xen
15:43:21 johnthetubaguy: yeah the timing is pretty cool. My team is gonna put on a movie in the afternoon :)
15:44:26 #action johnthetubaguy sort out doc bugs on doc day
15:45:12 cool, so I guess we are all done?
15:48:02 #endmeeting
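[For illustration, a minimal sketch of the "syncs and direct access to disks" idea mentioned above for writing the config drive: write the image straight through to the block device, bypassing the page cache, and flush before the VBD is unplugged. The image name and device path are hypothetical, and this only sketches the technique, not the actual content of review 60808:

    # write the config drive image with direct I/O, then flush outstanding writes
    dd if=configdrive.iso of=/dev/xvdc bs=1M oflag=direct
    sync

The intent, per the discussion above, is that dom0 no longer holds cached references to the device when nova unplugs the VBD, which should avoid the "device in use" messages and the leaked grant entries, although the discussion notes it did not fully fix the symptoms seen.]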