15:01:11 #startmeeting XenAPI
15:01:12 Meeting started Wed Dec 4 15:01:11 2013 UTC and is due to finish in 60 minutes. The chair is johnthetubaguy. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:13 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:16 Hi everyone
15:01:17 The meeting name has been set to 'xenapi'
15:01:19 hi
15:01:24 hi
15:01:25 who is around for today's meeting?
15:01:34 hi
15:01:42 I'm here
15:01:47 cool
15:01:51 so let's get cracking
15:02:05 #link https://wiki.openstack.org/wiki/Meetings/XenAPI
15:02:17 #topic Blueprints
15:02:32 Icehouse-1 is closing, and Icehouse-2 is starting
15:02:42 anyone got any worries or plans with that?
15:03:00 any blueprints we need to get in during Icehouse-2?
15:03:01 nope - but there is a BP that thouveng will want to talk about for I-2
15:03:11 o/
15:03:13 first - hi thouveng !
15:03:24 hello everybody
15:03:30 hi BobBall
15:03:32 thouveng: hello! feel free to discuss your blueprint, and give a quick intro
15:03:44 now?
15:03:48 sure
15:03:53 ok
15:03:56 Just as a brief intro - thouveng is from bull.net, who are interested in PCI passthrough and potentially vGPU in future
15:04:19 So first here is the link: https://blueprints.launchpad.net/nova/+spec/pci-passthrough-xenapi
15:04:27 ah, cool
15:04:42 The goal of this bp is to add support for PCI passthrough in the xenapi driver
15:05:07 I see two tasks. First, add support for updating the status of the compute host
15:05:17 and second, add the mechanism to attach a PCI device to a VM when booting.
15:05:56 What do you mean by status of the host? The PCI devices that it has available?
15:06:02 sounds good, I have added them as work items in the blueprint
15:06:20 The PCI devices that are available for PCI passthrough
15:06:52 So what's the process for getting this prioritised / cores assigned, johnthetubaguy?
15:07:05 does the current structure in Nova look OK for wiring up to XenAPI?
15:07:25 BobBall: just target it to a milestone, and it should get reviewed, I am kinda doing that as we type
15:07:44 johnthetubaguy: Yes, it seems to have all the needed wires
15:08:15 cool, I will add a note in the blueprint saying we expect no new configuration settings, does that seem fair?
15:09:22 Yes. For the configuration I have some doubts, but I think that I only need to catch the PCI devices that are passed on the dom0 command line when booting the host.
15:09:38 So I think that we don't need extra configuration.
15:09:53 thouveng: how are you going to wire up with xenapi, does it need a plugin?
15:10:19 johnthetubaguy: Yes, exactly.
15:10:22 thouveng: that sounds right to me, the filtering and grouping that is getting added in nova can be wired up once it drops
15:10:49 thouveng: cool, it would be good to get these details in the blueprint, it helps set expectations, and review the direction the implementation will take.
15:10:55 let me just add that in
15:10:56 I should be able to add the function into an existing plugin. So I just need to upgrade the plugin version.
15:11:36 BobBall: does this really need a plugin, are we missing some XenAPI functions for this stuff?
15:11:51 The issue is we need to list the PCI devices on the host
15:12:04 of course anything on the host can be exposed through XAPI - but XAPI doesn't do that ATM
15:12:17 right, I guess that was it, that's OK
15:12:23 so any use of a plugin could be considered "missing functionality" from XAPI
15:12:52 In particular thouveng is going to look at the boot command line and/or the output of lspci -Dv to see which modules are loaded for the devices
15:13:02 To me that feels too specific to be exposed through XAPI
15:13:05 and that's what plugins are for
15:13:14 yeah, probably
15:13:25 sounds good
15:13:28 thouveng isn't proposing a plugin that _modifies_ state from dom0 - just reads it
15:13:31 … final question
15:13:36 all the modification can be done through XAPI
15:14:09 agree
15:14:15 BobBall: cool, that's what I remember from looking at this, I have added it to the blueprint
15:14:19 so that question
15:14:33 do you think the code will be ready during Icehouse-2?
15:14:52 Sorry, I don't remember the date for I-2?
15:15:05 23rd Jan
15:15:22 yeah, so it needs to be up for review early Jan
15:15:38 or 23th Jan if you want to be pedantic and copy the schedule
15:15:43 #link https://wiki.openstack.org/wiki/Icehouse_Release_Schedule
15:15:46 we can always move things if we have to, but sooner is better
15:16:02 Hey, quick question: since this is piggy-backing on libvirt support, does libvirt handle hot-plugging?
15:16:10 The first part about updating host status will be available.
15:16:27 OK, well that's enough to say I-2 for now, and see how it goes
15:16:46 leif: I kinda assume this is attach at boot, not hot-plug right now, but thouveng?
15:16:48 I hope that the part that attaches the PCI device will be ready too, but I didn't do that much on that part.
15:16:48 think so leif - but I might be missing something from the code. don't know to be honest.
15:17:11 johnthetubaguy: attach at boot yes. No hotplug.
15:17:22 I think it's just at boot via the flavour right now, so that's all cool
15:17:40 johnthetubaguy: exactly
15:17:46 Yeah, no problem with the level of support. Was asking: since hotplug is supported on libvirt, what do we return back?
15:18:00 "level" - only support at boot.
15:18:23 no idea leif
15:18:24 Wanted to know if this is an immediate error or a post-launch error.
15:18:34 well, it should just be extra wiring up, it's the reporting status I have more worries about
15:18:35 no problem, just asking at this point.
15:18:50 immediate error?
15:19:06 Actually looking at the code - I might be mistaken
15:19:09 https://review.openstack.org/#/c/39891/40/nova/virt/libvirt/driver.py
15:19:12 at least in that changeset
15:19:33 the "hotplugging" only happens during VM lifecycle operations (e.g. reboot, suspend, snapshot)
15:19:47 so it might just be reqs from libvirt for a reboot to work etc
15:19:53 yeah, there is no "attach PCI device" api call I know about
15:20:00 but there could be, I suppose
15:20:19 anyways, let's not get sidetracked
15:20:25 blueprint approved
15:20:26 agreed. :-)
15:20:45 w00t. So - john - how do we get another core signed up?
15:20:49 :)
15:21:06 BobBall: you ask them, or you get me to ask them
15:21:29 but to be honest, just see who is interested and adds their name in the first instance
15:21:32 heh :) I thought this was now a managed process
15:21:54 well, it's crowd-sourced
15:22:02 Ah - if it's low priority then that's one thing... I assumed this would be a medium blueprint, so it needed two cores to sign up before approval or something
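For readers unfamiliar with the plugin mechanism discussed above, here is a minimal sketch of what a read-only dom0 plugin function along these lines might look like. The function name `get_pci_device_details`, the use of plain `lspci -D`, and the raw-string return value are illustrative assumptions, not the actual code from the blueprint:

```python
# Hypothetical sketch of a read-only dom0 plugin function; the names and
# output format are illustrative, not the blueprint's implementation.
import subprocess

import XenAPIPlugin  # dispatcher used by xapi plugins running in dom0


def get_pci_device_details(session, args):
    """Return raw `lspci -D` output so nova-compute can work out which
    PCI addresses are available for passthrough."""
    proc = subprocess.Popen(['lspci', '-D'],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate()
    if proc.returncode != 0:
        raise Exception("lspci failed: %s" % err)
    return out


if __name__ == '__main__':
    XenAPIPlugin.dispatch({'get_pci_device_details': get_pci_device_details})
```

The driver side would presumably call this through the existing XenAPI session plugin-call mechanism and parse the output when reporting the host's available PCI devices.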
15:22:07 I see blueprints I want, so I sign up to do them
15:22:14 or if I see xenapi ones, I sign up
15:22:32 it's low because there are not two cores signed up
15:22:42 it can be promoted if another core signs up
15:23:01 ok
15:23:01 it's nice and loose
15:23:05 anyways
15:23:23 awesome stuff, I remember talking about this in San Diego, and never getting time to do it :D
15:23:37 let's move on...
15:23:46 #topic Docs
15:23:58 anyone got any doc things?
15:24:09 no doc fun this week
15:26:11 john?
15:26:55 oh
15:26:59 yeah, sorry
15:26:59 network issues?
15:27:08 #topic Bugs & QA
15:27:14 how is the tempest work going?
15:27:21 * johnthetubaguy looks at matel
15:27:25 matel first?
15:27:32 Okay.
15:27:57 So I am working on scripts to create a cloud-ready XenServer
15:28:13 #link https://github.com/citrix-openstack/xenapi-in-the-cloud
15:28:36 is that "working" now?
15:29:01 it's working, but I am speeding it up - so that we have an image that could just be launched.
15:29:07 … and have we done a tempest run in that cloud-based XenServer yet?
15:29:11 Thanks to antonym - tap devices were fixed :)
15:29:15 Yes, we did smoke.
15:29:24 how "fast" was smoke?
15:29:28 and did it work?
15:29:39 Ran 223 tests in 624.476s
15:29:59 tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern[compute,image,volume] 164.053
15:29:59 tempest.scenario.test_snapshot_pattern.TestSnapshotPattern.test_snapshot_pattern[compute,image,network] 98.140
15:29:59 tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario[compute,image,network,volume] 91.124
15:29:59 tempest.api.compute.servers.test_server_rescue.ServerRescueTestJSON.test_rescue_unrescue_instance[gate,smoke] 38.828
15:30:04 tempest.scenario.test_server_basic_ops.TestServerBasicOps.test_server_basicops[compute,network] 36.813
15:30:07 tempest.thirdparty.boto.test_ec2_volumes.EC2VolumesTest.test_create_volume_from_snapshot[gate,smoke] 33.709
15:30:10 tempest.api.compute.v3.servers.test_server_actions.ServerActionsV3TestXML.test_rebuild_server[gate,smoke] 31.742
15:30:13 tempest.api.compute.servers.test_server_rescue.ServerRescueTestXML.test_rescue_unrescue_instance[gate,smoke] 23.964
15:30:14 hmm, OK, that could be worse
15:30:16 tempest.scenario.test_dashboard_basic_ops.TestDashboardBasicOps.test_basic_scenario[dashboard] 23.038
15:30:19 tempest.api.compute.v3.servers.test_server_actions.ServerActionsV3TestJSON.test_rebuild_server[gate,smoke] 18.980
15:30:22 Some really slow ones.
15:30:24 So that's smoke.
15:30:30 No full runs yet
15:30:39 any errors with smoke?
15:30:42 Bob has fixes to make the test runs more stable.
15:30:42 I guess not?
15:30:49 They were a bit unstable.
15:30:53 hmm, OK
15:30:58 I'll pass it to Bob.
15:31:02 so can we wire this up into Zuul now?
15:31:08 As he found the reason.
15:31:28 I've found some really fun errors in full tempest...
15:31:30 I would need someone who would help wire it up to that black box.
15:31:51 Ranging from lack of memory, leading to compute memory fragmentation, leading to VBD plug failures, which then clearly fails the tempest test
15:31:52 OK, so have you seen the docs and other changes where people added tests? I can help with that
15:31:57 I would say we are getting closer to wiring it up, but it's measured in weeks...
15:32:14 why weeks?
15:32:37 Because there is work to be done, and work needs time.
15:32:54 Lots of unknowns still in exactly how to integrate with the existing zuul stuff
15:33:02 agreed, but digging into that, can I help make that go faster, or is it not that kind of work?
15:33:07 like exactly where the split between creating a VM and putting it in the nodepool should be etc
15:33:11 Sure, you can make it faster.
15:33:26 I would need to have a description of the zuul entry points.
15:33:42 AFAIK, it's "PREPARE IMAGE" and "RUN DEVSTACK"
15:33:50 Yup, OK, so I will have a chat with people, but I can come and sit next to you tomorrow and help move it forward?
15:34:05 And we would need to find out how to "customise" the localrc - so that it knows about XenServer.
15:34:24 yup, that all makes sense
15:34:29 Yes, so I am working on a script that is producing an image.
15:34:38 that sounds good
15:34:45 I am happy to look into Zuul bits if that helps
15:34:46 which is really the first step - I am close (this week)
15:34:55 anyways, we are making good progress
15:34:55 Okay, stay in touch
15:34:58 let's move on
15:35:04 http://ci.openstack.org/
15:35:06 BTW
15:35:18 those docs are rubbish
15:35:24 perhaps
15:35:25 we were looking at them yesterday :)
15:35:37 Yes, I am so upset about the lack of documentation.
15:35:42 OK, any more on QA?
15:35:42 I know more of the overview than is in the docs just from playing
15:35:48 Well, we can talk about my tempest fun
15:35:52 sure
15:35:59 you got patches and bugs up?
15:36:00 So john, if you could put together a nice diagram and put it on the docs page, that would help.
15:36:04 As I said - if you give domU too little memory, all sorts of things can go wrong
15:36:12 matel: my plan is to read the code
15:36:14 I've got a patch up for devstack - but you'll need to be aware of it in production
15:36:26 Sure, and create a diagram as well.
15:36:29 BobBall: yes, well, we try not to run out of memory I guess
15:36:29 if your compute VM is swapping then you might be hitting an issue where VBD plug fails
15:36:53 we don't run rabbit in our compute domUs, oddly
15:36:55 it's not running out of memory that's the problem - it's memory fragmentation that prevents you from allocating a 128k block - which is quite large in memory terms
15:37:01 but it's a good catch though
15:37:13 so if you have a long-running and busy compute VM with only just enough memory, you _will_ hit this sooner or later
15:37:42 I suspect all three of those will apply at Rackspace, since I know you squeeze compute's memory to get more VMs on a box
15:38:03 yeah, not sure about the headroom though, needs a good look
15:38:12 what's the fix?
15:38:16 more memory
15:38:23 lol, OK
15:38:23 or reboot compute to stop fragmentation
15:38:33 yeah, that's possible
15:38:46 but who knows how often you have to reboot
15:38:57 parallel tempest was hitting it within 60 minutes
15:38:58 never under normal operation, but hey
15:39:27 Anyway - next fun - we already synchronize on VBD.plug, but we have to synchronise on VBD.unplug too (both at the same time), otherwise we get the same random racy failures
15:39:36 #link https://review.openstack.org/#/c/59856/
15:39:47 Ah, yes, I did ask about that at the time, I guessed we might need that
15:40:35 and there are some weird things I'm looking at ATM that I don't have a solution for
15:40:38 cool, any more?
15:40:40 timeouts with volumes
15:40:52 the really annoying thing is that full tempest does pass
15:40:59 oh, is that the old kernel issue with iSCSI on the same box again?
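A minimal sketch of the synchronisation idea BobBall describes above: serialising VBD.plug and VBD.unplug behind one lock so the two operations cannot race each other. The lock name and helper functions below are illustrative assumptions, not the exact code in https://review.openstack.org/#/c/59856/:

```python
# Illustrative only: serialise VBD.plug and VBD.unplug on one shared lock
# so plug/unplug operations cannot interleave; the lock name is made up.
from nova import utils

_VBD_LOCK = 'xenapi-vbd-plug-unplug'


@utils.synchronized(_VBD_LOCK)
def vbd_plug(session, vbd_ref):
    # Taking the same lock as vbd_unplug means only one VBD operation
    # runs against the host at a time.
    session.call_xenapi('VBD.plug', vbd_ref)


@utils.synchronized(_VBD_LOCK)
def vbd_unplug(session, vbd_ref):
    session.call_xenapi('VBD.unplug', vbd_ref)
```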
15:40:59 so it's timeouts / races that we're fighting against
15:41:03 no
15:41:06 Mate fixed that
15:41:16 but it could be related?
15:41:31 I doubt it at this point
15:41:39 you get deadlocks in tapdisk, is that related?
15:41:41 the iscsi issue is definitely mitigated
15:41:55 there may be other deadlocks - but not that one
15:42:08 that one is mitigated by adding a new memcopy in the process
15:42:24 if we're still seeing deadlocks then it's a different issue
15:42:29 Let me dig up the change...
15:42:34 OK, no worries
15:42:43 just checking the things that came to mind
15:42:48 (the memcopy isn't actually real - it's about changing the IO mode used by the SR)
15:43:05 #link https://github.com/citrix-openstack/qa/blob/master/install-devstack-xen.sh#L314
15:43:49 cool, making progress though, which is good to see
15:43:57 so, in other news...
15:44:17 I plan to use all your good work, and copy and paste it to run cloud cafe tests, assuming we keep using those
15:44:27 but that's just for context
15:44:42 so...
15:44:48 #topic Open Discussion
15:44:56 any more for any more?
15:45:19 can't for the life of me think what it was
15:45:22 but I was going to say something
15:46:08 I'm okay, I just need zuul-capable people.
15:46:11 to talk to.
15:46:24 Just ask on -infra
15:46:30 they usually answer
15:46:36 Oh, yes.
15:46:52 and they only charge £5 per question
15:46:54 quite reasonable
15:46:57 yeah, so I am happy to take on the zuul integration with you if that helps?
15:47:37 Let's find the entry points, that's good enough.
15:47:38 while you make the stand-alone script stable, and Bob works on getting tempest running stably
15:47:46 yup, I am happy to work on that
15:47:49 johnthetubaguy: other question - while we're here - a guy in #openstack-dev is asking a question I don't know the answer to...
15:47:58 okay, sounds like a plan.
15:48:02 hi, any idea why two rules are used instead of a single rule, at line numbers 137,147 in https://github.com/openstack/nova/blob/master/plugins/xenserver/networking/etc/xensource/scripts/ovs_configure_vif_flows.py
15:48:17 thought you might know :)
15:48:24 nope sorry, I am the wrong guy to ask about that stuff
15:48:36 I guessed it was different implementations of ARP - some using the host IP and others using 0.0.0.0 as the response
15:48:40 ah well
15:49:07 yeah, it's deep in the niggles of network land, I know the basics, not the details there :(
15:49:28 indeed
15:49:37 cool, are we all done?
15:49:48 a good meeting today I feel, thank you all for your hard work!
15:49:52 I know a fair bit, particularly around that tenant isolation, but that one had me stumped...
15:50:04 heh - you're welcome?
15:50:34 ...I was meaning more generally, not the hard work of the meeting
15:50:36 anyways
15:50:39 any more questions?
15:51:02 yeah
15:51:20 what's the deal with this furby boom thing? I hear it's a popular xmas toy but I just don't get it
15:51:23 maybe I'm too old...
15:51:31 hmmm - do you think that question is off-topic?
15:52:17 maybe
15:52:26 I never get these crazes
15:52:33 they always seem to pass me by
15:52:38 but maybe I am just tight
15:52:41 anyways...
15:52:45 #endmeeting
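As a footnote to the #openstack-dev question above, here is a small sketch of what BobBall's guess would look like in OVS flow terms: ARP anti-spoofing rules typically need two matches, because ARP probes (e.g. duplicate address detection) carry 0.0.0.0 as the sender address rather than the instance's own IP. The addresses, port number, and priority below are hypothetical and not taken from ovs_configure_vif_flows.py:

```python
# Hypothetical flows illustrating the "two ARP rules" guess; all values
# are made up and not taken from ovs_configure_vif_flows.py.
vm_ip = '192.0.2.10'   # example instance address
vm_port = 5            # example OVS port number for the VIF

allow_arp_flows = [
    # ARP packets carrying the instance's own IP as the sender address.
    'priority=4,in_port=%d,arp,arp_spa=%s,actions=normal' % (vm_port, vm_ip),
    # ARP probes, which legitimately use 0.0.0.0 as the sender address.
    'priority=4,in_port=%d,arp,arp_spa=0.0.0.0,actions=normal' % vm_port,
]

# Each entry would be installed with something like:
#   ovs-ofctl add-flow <bridge> '<flow>'
```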