17:01:12 #startmeeting XenAPI
17:01:13 Meeting started Wed Mar 6 17:01:12 2013 UTC. The chair is johnthetubaguy. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:01:14 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:01:16 The meeting name has been set to 'xenapi'
17:01:22 Yay :)
17:01:34 hi everyone
17:02:02 Morning John
17:02:05 or afternoon
17:02:09 depending on where everyone is
17:02:25 hi
17:02:29 #topic actions from last meeting
17:02:38 So I had a few actions
17:03:04 #action johnthetubaguy needs to do the actions from last week
17:03:14 haha
17:03:16 nice action
17:03:29 been stuck with XCP on CentOS, so not really started on the CentOS install docs
17:03:32 hey ho
17:03:34 I guess it was a busy week.
17:03:53 * johnthetubaguy bangs head against wall
17:03:59 anyways...
17:04:07 What's the progress on XCP on CentOS?
17:04:16 #topic blueprints
17:04:38 we've got a meeting on that, Tuesday 17:00 UTC in #centos-devel
17:04:59 summary: broken with odd permissions errors, no one really can tell why
17:05:05 sounds fun
17:05:14 joyous
17:05:22 so… blueprints?
17:05:36 just a call to look at the etherpad for the summit and add things
17:05:41 oh
17:05:43 we added the odd bit last time
17:05:48 I forgot to add stuff
17:05:52 lemme have a quick look
17:06:05 #link https://etherpad.openstack.org/HavanaXenAPIRoadmap
17:06:11 yeah
17:06:14 just got it from the web page
17:06:17 my bad, sorry
17:06:27 np
17:06:34 okay
17:06:46 there is one key thing that isn't there
17:06:55 or if it is then I can't see...
17:07:01 it's quantum support for XS
17:07:06 didn't make grizzly-3
17:07:13 it's a nova session though
17:07:26 so it should definitely be on the roadmap even if we don't need a new blueprint
17:07:28 ahhhhh
17:07:33 it was covered in the last summit
17:07:33 didn't spot that from the etherpad!
17:07:43 implementation agreed, code pushed for review
17:07:48 *nod*
17:07:53 but only one quantum core ever reviewed it
17:08:03 just to document / share
17:08:13 'Tis a shame it didn't make the grizzly cut!
17:08:41 indeed, we half planned a backport to folsom
17:09:06 but no one will review it, but hopefully that will change now
17:09:19 so anything else?
17:09:25 blueprint wise
17:09:53 not from my end
17:10:01 #topic docs
17:10:04 any news?
17:10:09 I didn't do my things
17:10:21 there was a note on the mailing list about live-migration docs
17:10:28 but I guess there will be an AOB for adding new features to the roadmap?
17:10:32 they look fairly poor, might need to expand them
17:10:50 sure, we can do
17:10:57 we've got a plan to look at some docs - Mate has an action this week to go through and look at what's there
17:11:03 cool
17:11:20 #action matelakat to look at state of XenAPI docs and report back next week
17:11:23 there we go :D
17:11:29 I was going to say can you add an #action
17:11:41 catch me on IRC if there are questions
17:11:51 #link https://github.com/citrix-openstack/bugstat/blob/master/bugreport/main_report.md#openstack-manuals----20
17:12:13 cool
17:12:36 A lot to do...
17:13:06 some of those don't affect XenAPI
17:13:23 it's just a dumb search
17:13:31 sure, no worries
17:13:38 I will look at them, and put them into categories.
17:13:42 e.g. https://bugs.launchpad.net/openstack-manuals/+bug/1095095
17:13:43 Launchpad bug 1095095 in openstack-manuals "Configuring for resize with KVM" [Medium,Confirmed]
17:13:59 just says KVM docs aren't as good as XenServer for resize
17:14:19 So it includes the string "XenServer"
17:14:22 indeed
17:14:35 but only in the context of "XenServer doesn't have this bug" :)
17:15:07 it's probably worth manually adding xenserver tags, and having a tag-only search
17:15:22 anyways, let's move on
17:15:36 any more for any more?
17:15:38 I'd like to keep the dumb search, but use the tagged search to say "this has been triaged by someone who knows it's a XS bug"
17:15:47 +1
17:16:07 that is what I meant
17:16:34 ah right
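(For reference, the two-pass triage described above - keep the dumb text search, treat the tag as the triaged list - could be scripted against Launchpad roughly as below. This is only a sketch using launchpadlib; the 'xenserver' tag is whatever the triager chooses to apply, and 'bugstat-sketch' is a placeholder consumer name.)

    from launchpadlib.launchpad import Launchpad

    # Anonymous read-only access is enough for searching.
    lp = Launchpad.login_anonymously('bugstat-sketch', 'production')
    manuals = lp.projects['openstack-manuals']

    # Dumb search: anything mentioning XenServer, false positives included.
    candidates = manuals.searchTasks(search_text='XenServer')

    # Tagged search: only bugs a human has triaged as genuinely XS-related.
    triaged = manuals.searchTasks(tags=['xenserver'])

    for task in triaged:
        print('%s: %s [%s]' % (task.bug.id, task.bug.title, task.status))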
17:16:50 #topic QA and Bugs
17:16:59 anything major worrying people?
17:17:11 matelakat have you got the link to your bug finder?
17:17:26 I guess one thing that surprised me is that devstack multihost doesn't seem to be tested by anyone else
17:17:49 #link https://github.com/citrix-openstack/bugstat
17:18:01 we'd like to get some eyes on this: https://review.openstack.org/#/c/23662/
17:18:41 *has a butchers*
17:20:32 That one's an interesting issue!
17:20:42 and a little painful :)
17:21:13 I'm going to try it out
17:21:23 So has the whole SR gone away?
17:21:34 hopefully with the patch, yes
17:21:49 ahhh - is this an iSCSI SR?
17:21:52 yes
17:21:53 I guess the point is, if your iSCSI target dies, then the VM will not start
17:22:00 johnthetubaguy: exactly
17:22:04 yup - not surprising.
17:22:05 okay
17:22:05 got it
17:22:13 I was getting a little confused
17:22:45 I'm not sure what happens in the other HVs' case
17:22:59 but I also don't have to worry about that case
17:23:07 *grin*
17:23:34 I'll have to have a think about this one
17:23:56 yeah, sounds like an excessive timeout, would be nice to be able to specify that in the check call
17:24:12 s1rp and I talked about it quite a bit, and this seems to be the best we could come up with on short notice
17:24:18 mad props to him for making it work
17:24:23 you mean the XAPI timeout?
17:24:55 erm, the timeout in the xapi operation
17:25:06 I guess this is currently a critical issue for you guys?
17:25:12 well, it's an ugly one
17:25:20 requires ops to go in and nuke the SR
17:25:28 it doesn't happen often
17:25:35 maybe some kind of health check would be better, with a tunable timeout
17:25:59 it should only happen if something happens to the network or we lose a storage node
17:26:01 I like scan SR because it should be quick in the working cases
17:26:01 so XS doesn't timeout the SR?
17:26:36 oh, I see you only call that in error cases
17:26:55 johnthetubaguy: yeah, we don't do anything unless it doesn't boot
17:27:14 makes sense
17:27:31 how long ago had the SR gone away?
17:27:34 or had it only just gone?
17:27:55 shame we can't have a non-destructive error case again, do we need to tell cinder we detached the volume?
17:27:57 the SR is still there, it just can't make the iSCSI connection
17:28:10 johnthetubaguy: compute manager does that
17:28:14 Also, could you just post the XS error log to the bug so that we've got a traceback
17:28:28 the fun part was propagating the bad devices back up to compute
17:28:38 *grin* that does look fun
17:28:43 ah, got ya, didn't get there yet
17:29:11 BobBall: I'll try to remember to paste a stack
17:29:28 this _handle_bad_volumes_detached case?
17:29:53 well, I'll grab the xen log from the failed boot
17:30:04 that's perfect
17:30:36 pull out the network cable between your iSCSI target and hypervisor, and it should repro OK
17:30:41 * BobBall is impressed with this one
17:30:46 I like that bug
17:30:48 glad you like it
17:30:56 is there a Bug Of The Month award?
17:31:01 we were hoping XS would boot without all the volumes, but alas
17:31:08 yup
17:31:21 well we might also be able to patch ISCSISR.py to do something
17:31:22 not sure
17:31:24 good old xapi trying to protect us from doing bad things again
17:31:26 depends how the SR is failing
17:31:35 unlikely to be XAPI
17:31:52 oh, OK
17:32:02 is it the VM start that fails? I'm almost surprised the shutdown works OK if the SR is timing out
17:32:17 it probably got shut down before that, right?
17:32:22 I'm not sure
17:32:28 or is this the first start?
17:32:30 it's a reboot, so it wasn't really shut down
17:32:35 ah
17:33:00 * johnthetubaguy remembers bug report…drrr
17:33:08 ok well might I suggest that John, you and I take an action to look at it?
17:33:26 I see you've already added yourself!
17:33:30 hah :)
17:33:43 make sure the SR is behaving correctly, for the "graceful" fix
17:34:09 guitarzan, do you happen to know if this is a soft or hard reboot?
17:34:49 #action johnthetubaguy guitarzan to look into broken SR issues https://review.openstack.org/#/c/23662
17:35:19 BobBall: not sure
17:35:24 I'll try both
17:35:31 maybe hard because soft failed...
17:35:33 cool
17:35:39 probably both tbh
17:35:47 that's my guess
17:35:58 cool
17:36:02 *not sure if XAPI handles the SRs differently for the two cases*
17:36:05 Anyway - let's move on :)
17:36:08 indeed
17:36:13 any more bugs?
17:36:33 me guessing that is a no...
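(To make the approach discussed for https://review.openstack.org/#/c/23662/ concrete: on a failed boot, the SRs backing the instance's volumes are re-scanned to identify which iSCSI connections have died, and those devices are then reported back up to compute. The sketch below shows the same idea against the raw XenAPI Python bindings; the host URL, credentials, and VM label are placeholders, and the actual change under review lives in nova's xenapi driver, not in a standalone script like this.)

    import XenAPI

    session = XenAPI.Session('https://xenserver.example.com')  # placeholder host
    session.xenapi.login_with_password('root', 'password')     # placeholder creds
    try:
        vm = session.xenapi.VM.get_by_name_label('instance-00000001')[0]
        try:
            session.xenapi.VM.start(vm, False, False)  # start_paused, force
        except XenAPI.Failure:
            # Boot failed: re-scan each attached SR. A scan against a dead
            # iSCSI target raises, which identifies the volume that went away.
            for vbd in session.xenapi.VM.get_VBDs(vm):
                vdi = session.xenapi.VBD.get_VDI(vbd)
                if vdi == 'OpaqueRef:NULL':
                    continue  # e.g. an empty CD drive has no VDI
                sr = session.xenapi.VDI.get_SR(vdi)
                try:
                    session.xenapi.SR.scan(sr)
                except XenAPI.Failure:
                    print('bad volume: SR %s is unreachable'
                          % session.xenapi.SR.get_name_label(sr))
            raise
    finally:
        session.xenapi.session.logout()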
17:36:54 #topic Open Discussion
17:37:07 so, bobball has a few things?
17:37:22 ohai guys...
17:37:34 hey
17:37:34 we were just talking about you
17:37:37 yeah the clean_reboot operation hangs for 120 secs...
17:37:50 luckily a subsequent SR.scan seems to be quick-ish
17:37:56 ahhhhh
17:37:59 only the first one after unplugging seems to be slow
17:38:11 it's like it stores some data somewhere marking it as failed (?)
17:38:12 BobBall: I haven't tried that iscsi patch you sent me yet
17:38:18 that figures, cool
17:38:57 s1rp, I thought it was the SR scan that waited 120 seconds
17:39:14 failing fast in clean_reboot is likely to be a XAPI thing waiting for the SR to respond to its attach request
17:39:15 that too... lemme clarify
17:39:32 sorry john - we're derailing the agenda :D
17:39:38 so if you do an sr-scan w/o a reboot, then that call will take 120 secs (this is what I was doing on the command line to troubleshoot this)
17:40:03 it's OK, it's important
17:40:07 but, and I'm not 100% sure on this, if you do a clean_reboot, that will cause an underlying timeout, but I *think* the next SR.scan will actually fail fast
17:40:08 ah - but a failed reboot followed by sr-scan to find the failing device is fast
17:40:27 BobBall: yeah, need to triple check that case, but I believe so
17:40:28 unfortunately that might mean the timeout is in iscsiadm?
17:40:37 ... or fortunately :)
17:40:43 that might be easy to fix
17:40:50 right, hack the RD
17:40:54 lol SR
17:41:01 or just an other-config
17:41:07 I think we can pass some iscsiadm flags through
17:41:09 even better
17:41:18 not 100% on that though. Maybe only 73% sure.
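(One possibly relevant data point for the iscsiadm theory: the 120-second figure matches open-iscsi's default node.session.timeo.replacement_timeout. If that is the timeout being hit, a dom0-side experiment could shorten it per target, roughly as sketched below. The IQN and portal are placeholders, and whether the XenServer ISCSISR driver can be made to apply such a setting via an SR other-config key is exactly the open question above.)

    import subprocess

    TARGET = 'iqn.2013-03.com.example:volume-0001'  # placeholder IQN
    PORTAL = '192.0.2.10:3260'                      # placeholder portal

    # Drop the replacement timeout from the 120s default to 15s so a dead
    # target fails fast instead of stalling clean_reboot / SR.scan.
    subprocess.check_call([
        'iscsiadm', '-m', 'node', '-T', TARGET, '-p', PORTAL,
        '-o', 'update',
        '-n', 'node.session.timeo.replacement_timeout',
        '-v', '15',
    ])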
17:41:54 btw john, my stuff on libvirt can wait until next week if we have other things to get through :)
17:42:14 BobBall: thanks
17:43:19 s1rp, was saying to guitarzan that we'd like some of the XS logs in the bug report just for traceability if that's ok
17:44:35 BobBall: cool, we can get those over to you; luckily this is very easy to replicate!
17:44:55 sounds good, any more on that one?
17:45:26 no :) Let's leave that one for now
17:45:27 can always take it to the ML
17:45:44 cool, bobball summit stuff you wanted to mention?
17:46:18 Uhhhh maybe? I don't remember which summit stuff you're referring to?
17:47:12 ok, misunderstood
17:47:24 put stuff on the etherpad to help discuss at the summit
17:47:47 Sorry - I could have been clearer! :)
17:47:51 assuming that session goes ahead, if there is loads, might ask for extra sessions
17:48:04 Summit stuff then - looking forward to it. matelakat and I have booked our flights so we'll see you there
17:48:24 sounds like a Xen on libvirt vs XenAPI discussion might be good, as long as it stays sensible and not too religious
17:48:50 I was thinking in the summit
17:48:57 but bob you wanted to bring that up this weel?
17:49:00 week?
17:49:19 Well what I'd like to understand is the primary value that the XenAPI integration gets from XAPI that can't be provided by libvirt
17:50:22 not sure if we've got enough time to explore that question properly today
17:50:24 which is fine :)
17:50:58 we do a lot of weird stuff w/ dom0 plugins, but that probably could be handled with proper hooks in the libvirt layer
17:51:37 s1rp: we can't use both today, we would freak out xapi
17:51:44 but yes, I get your point
17:52:13 I think the question is, should we evolve XAPI/XCP or should we evolve libvirt
17:52:30 and how much effort is each approach, at this point
17:52:50 I guess what is missing between libvirt+Xen vs xapi+Xen in openstack today
17:53:19 Well the question is more that there are lots of things that are getting first dibs in libvirt, and whether a libvirt-on-xen/xapi hybrid approach would bring us much, and what level of pain it would be for XAPI to tolerate such a hybrid approach
17:53:22 I get the idea that the gap could be quite small, but I never got libvirt+Xen working that well, though I didn't try very hard
17:53:47 hmm, maybe
17:54:02 but that is like how many years out?
17:54:29 It's not this week, that's true
17:54:30 well, maybe that fits into evolving XAPI actually...
17:54:49 I wondered about using xenopsd instead
17:55:26 that's the thing under xapi
17:56:09 anyways, we can take this offline
17:56:13 anything else?
17:57:22 cool
17:57:29 #endmeeting
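(For anyone following up on the dom0-plugin point from the discussion above: nova's xenapi driver ships Python plugins into dom0, under /etc/xapi.d/plugins, and invokes them through XAPI's host.call_plugin, roughly as sketched below. The plugin and function names here are illustrative rather than nova's actual ones; this in-dom0 hook mechanism is the sort of thing with no direct libvirt equivalent today.)

    import XenAPI

    session = XenAPI.Session('https://xenserver.example.com')  # placeholder host
    session.xenapi.login_with_password('root', 'password')     # placeholder creds
    host = session.xenapi.host.get_all()[0]

    # Arguments are a string->string map and the plugin returns a string;
    # the named plugin must already exist in dom0's /etc/xapi.d/plugins.
    result = session.xenapi.host.call_plugin(
        host, 'example_plugin', 'do_something', {'instance': 'instance-00000001'})
    print(result)

    session.xenapi.session.logout()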