14:00:16 <PaulMurray> #startmeeting Nova Live Migration
14:00:17 <openstack> Meeting started Tue Jun 28 14:00:16 2016 UTC and is due to finish in 60 minutes.  The chair is PaulMurray. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:19 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:21 <openstack> The meeting name has been set to 'nova_live_migration'
14:00:25 <davidgiluk> o/
14:00:25 <andrearosa> hi
14:00:27 <diana_clarke> o/
14:00:28 <mdbooth> o/
14:00:29 <pkoniszewski> o/
14:00:29 <PaulMurray> Hi all
14:00:30 <andreas_s> o/
14:00:31 <luis5tb> o/
14:00:46 <mriedem> o/
14:00:52 <PaulMurray> agenda https://wiki.openstack.org/wiki/Meetings/NovaLiveMigration
14:01:12 <PaulMurray> Did anyone notice that the agenda was up last Friday!?
14:01:25 * kashyap didn't
14:01:26 * PaulMurray organized for once
14:01:26 <luis5tb> :D
14:02:02 <PaulMurray> #topic Non-priority features
14:02:04 <paul-carlton2> hi
14:02:16 <PaulMurray> June 30: non-priority feature freeze
14:02:21 <PaulMurray> two days away
14:02:37 <PaulMurray> so a quick check for anything that needs to be pushed along
14:02:38 <kashyap> Here's the URL, that PaulMurray has been shy to post here :-) - https://wiki.openstack.org/wiki/Meetings/NovaLiveMigration
14:02:43 <paul-carlton2> https://review.openstack.org/#/c/308198
14:03:07 <paul-carlton2> and https://review.openstack.org/#/c/328280
14:03:17 <PaulMurray> kashyap, it's up a few lines ^^^^
14:03:24 * PaulMurray not so shy
14:03:36 <kashyap> Oh, then I joined late :-) Sorry
14:03:39 <PaulMurray> immediately before you joined in fact
14:03:43 <PaulMurray> :)
14:04:12 <PaulMurray> I found Re-Proposes to check destination on migrations is complete
14:04:18 <PaulMurray> I haven't looked today
14:04:33 <PaulMurray> Automatic Live Migration Completion (luis5tb, pkoniszewski, paul-carlton)
14:04:40 <PaulMurray> how is this one going ? ^^
14:04:52 <pkoniszewski> well, it is in very good shape
14:04:55 <luis5tb> looking good
14:04:57 <pkoniszewski> i think everything is already accepted
14:04:58 <PaulMurray> looks nearly there
14:05:05 <PaulMurray> https://review.openstack.org/#/q/topic:bp/auto-live-migration-completion
14:05:17 <pkoniszewski> so we are waiting for the last +2/+W on libvirt cleanup
14:05:31 <mriedem> yeah https://review.openstack.org/#/c/327763/
14:05:33 <mdbooth> paul-carlton2: Incidentally, are you aware of this patch: https://review.openstack.org/#/c/270289/
14:05:39 <mriedem> danpb just pinged me to look at that one after the meeting
14:05:41 <mdbooth> paul-carlton2: Also looks fairly close to the line
14:05:45 <woodster_> o/
14:06:28 <PaulMurray> Live Migration of Rescued Instances (paul-carlton)
14:06:30 <paul-carlton2> mdbooth, hadn't seen that
14:06:39 <PaulMurray> https://review.openstack.org/#/q/topic:bp/live-migrate-rescued-instances
14:07:08 <PaulMurray> this looks to have more to go - paul-carlton2 ?
14:07:18 <mriedem> no tempest test for that? ^
14:07:45 <paul-carlton2> mriedem, I haven't done the tempest test for it but plan to, been delayed, my dad died two weeks ago
14:08:04 <mdbooth> mriedem: Incidentally, how can we add a tempest test for something which hasn't landed yet? What's the dance to avoid breaking the gate?
14:08:04 <PaulMurray> sorry to hear that paul-carlton2
14:08:20 <PaulMurray> depends-on
14:08:21 <diana_clarke> paul-carlton2: I'm so sorry, Paul. I'm learning tempest & happy to try a rescue test scenario.
14:08:24 <mriedem> paul-carlton2: sorry to hear that
14:08:29 <mriedem> mdbooth: yeah, depends-on
14:08:34 <mdbooth> PaulMurray: Thanks
14:08:47 <mriedem> mdbooth: e.g. https://review.openstack.org/#/c/327191/
14:09:07 <paul-carlton2> I think I know what is needed to do tempest test, just need to implement it
14:10:02 <PaulMurray> then you have a merge conflict and a couple of test failures - can you fix those ?
14:10:23 <paul-carlton2> yep
14:10:43 <PaulMurray> ok, next
14:10:45 <PaulMurray> Make checks before live-migration async (tdurakov)
14:10:55 <PaulMurray> https://review.openstack.org/#/q/topic:bp/live-migrate-rescued-instances
14:11:10 <PaulMurray> oops - that's wrong
14:11:23 <PaulMurray> https://review.openstack.org/#/q/topic:bp/async-live-migration-rest-check
14:11:30 <pkoniszewski> tdurakov said that he will skip the live migration meeting today
14:11:59 <mriedem> i had asked for a tempest test on the api changes there also
14:12:13 <mriedem> the api change is the last thing in the series https://review.openstack.org/#/c/314932/
14:12:36 <mriedem> it also had other issues
14:12:58 <PaulMurray> that last one is his also
14:12:59 <PaulMurray> Remove compute-compute communication in live-migration (tdurakov)
14:13:09 <PaulMurray> https://review.openstack.org/#/q/topic:bp/remove-compute-compute-communication
14:13:24 <mriedem> that has 4 open changes, some have -1s
14:13:45 <PaulMurray> pkoniszewski, is tdurakov just missing the meeting - i.e. is he working on these ?
14:14:14 <pkoniszewski> he is just missing the meeting
14:14:17 <pkoniszewski> he keeps working on them
14:14:23 <PaulMurray> ok
14:14:28 <pkoniszewski> that's what i understood
14:14:37 <PaulMurray> So a few of those are close
14:14:55 <PaulMurray> let's see if we can get them landed - call for help reviewing if needed
14:15:18 <PaulMurray> #topic CI
14:15:33 <PaulMurray> I think the lm job has moved to check queue ?
14:15:45 <pkoniszewski> yes
14:16:20 <mriedem> the results are way up and down
14:17:11 <davidgiluk> mriedem: you mean time wise or failures?
14:17:23 <mriedem> failures
14:17:27 <mriedem> very sporadic
14:17:34 <davidgiluk> got logs to look at?
14:18:36 <mriedem> http://tinyurl.com/zcceagy
14:19:40 <davidgiluk> logs of any of the failed ones?
14:20:14 <mriedem> davidgiluk: there are 2 bugs on http://status.openstack.org//elastic-recheck/index.html related to live migration failures
14:20:24 <mriedem> following the logstash links for those would get you some failure jobs and links to logs
14:20:43 <mriedem> problem is those are also for the multinode job which is running on stable branches too
14:21:20 <PaulMurray> http://bugs.launchpad.net/bugs/1539271
14:21:20 <openstack> Launchpad bug 1539271 in OpenStack Compute (nova) "Libvirt live block migration migration stalls" [High,Confirmed]
14:21:37 <PaulMurray> http://bugs.launchpad.net/bugs/1524898
14:21:37 <openstack> Launchpad bug 1524898 in OpenStack Compute (nova) "Volume based live migration aborted unexpectedly" [High,Confirmed]
14:21:48 <PaulMurray> are those the ones you mean ?
14:22:39 <mriedem> yeah
14:22:53 <mriedem> but like i said, e-r is also tracking failures for those from stable branches, so it skews the results a bit
14:23:02 <PaulMurray> is there a way to tell which job they are coming from ?
14:23:04 <mriedem> you'd have to filter on the live migration job name and master branch only i'd think
14:23:08 <mriedem> build_name
14:23:13 <mriedem> on the left side, in logstash
14:23:40 <mriedem> we want to filter on build_name:"gate-tempest-dsvm-multinode-live-migration"
14:24:51 <PaulMurray> we can look later (I expect everyone is trying right now though)
14:25:47 <PaulMurray> They are the same ones that have been around a long time - it would be good to find a way into them
14:25:58 <mriedem> so what i'm seeing is,
14:26:08 <mriedem> it's this one http://status.openstack.org//elastic-recheck/index.html#1524898
14:26:36 <mriedem> 1539271 isn't showing up in the last 7 days on that job
14:27:08 <mriedem> so, we could try skipping the volume-backed live migration test for now and see if the job stabilizes
14:27:10 <kashyap> Is anybody able to reproduce that 'stalling' bug? 1539271
14:27:30 <mriedem> kashyap: 1539271 isn't the problem in the live migration job from what logstash is showing me
14:27:38 <PaulMurray> we tried here and couldn't reproduce
14:27:40 <mriedem> the failures are coming from bug 1524898
14:27:40 <openstack> bug 1524898 in OpenStack Compute (nova) "Volume based live migration aborted unexpectedly" [High,Confirmed] https://launchpad.net/bugs/1524898
14:27:45 <davidgiluk> mriedem: Is that still running the old qemu ?
14:27:48 * kashyap is reading that
14:28:20 <mriedem> davidgiluk: from one of the random failures http://logs.openstack.org/55/328055/12/check/gate-tempest-dsvm-multinode-live-migration/e4d241f/logs/subnode-2/
14:28:28 <kashyap> davidgiluk: Not modern enough for sure
14:28:44 <mriedem> qemu-kvm                              1:2.5+dfsg-5ubuntu10.2
14:28:57 <mriedem> libvirt-bin                           1.3.1
14:29:07 <mriedem> these are ubuntu 16.04 nodes
14:29:25 <mriedem> kashyap: to be fair, your answer is always that we aren't running against trunk libvirt/qemu :)
14:29:27 <kashyap> 1.2 QEMU was released in 2012 DEC
14:29:36 <mriedem> kashyap: this is qemu 2.5
14:29:36 <kashyap> mriedem: :-) I knew someone would say that
14:29:58 <davidgiluk> mriedem: OK, that's much better than it used to be
14:29:59 <kashyap> mriedem: Okay, "I'll shut up and look at logs"
14:30:31 <mriedem> so the volume backed live migration aborts
14:30:31 <mriedem> not sure why
14:30:48 <mriedem> the live migration job could be failing for other reasons we aren't tracking yet, e.g. i saw one fail this weekend which was due to ansible failing
14:30:51 <mriedem> but we didn't have the ansible logs from the node to see why (failed during inventory collection)
14:30:54 <davidgiluk> mriedem: Instead of holding the meeting up, can you pm and help me work through the logs?
14:31:03 <mriedem> i'm out some this afternoon
14:31:18 <mriedem> davidgiluk: danpb is probably best to ask questions about libvirt logs for a failure
14:31:20 <mriedem> in the -nova channel
14:31:22 <davidgiluk> mriedem: ok, no problem; but I'd be happy to try and work through the logs with someone
14:31:25 <mriedem> s/afternoon/morning/
14:31:33 <mriedem> work with kashyap and danpb
14:31:49 <kashyap> davidgiluk: Yeah, happy to assist
14:32:08 <PaulMurray> is there anything else for CI ?
14:32:36 <PaulMurray> let's go on then
14:32:42 <PaulMurray> #topic Libvirt Storage Pools
14:33:07 <diana_clarke> No news from me, really. Image backend reviews are always welcome though. Otherwise, I'm looking into tempest coverage.
14:33:36 * mdbooth is currently churning quite a lot of stuff
14:33:48 <PaulMurray> mdbooth, diana_clarke I still want to dig into this - but later this week. (as I said last week)
14:33:57 <mdbooth> I now have a large and growing outstanding patch queue, which I intend to prepend to diana_clarke's patch queue
14:34:18 <mdbooth> It's long enough that I'll have to send out an explanatory covering email when it's done
14:34:55 <PaulMurray> thanks mdbooth
14:35:12 <PaulMurray> you thought we would not get there this cycle - is that still your opinion ?
14:35:27 <mdbooth> PaulMurray: Unfortunately yes
14:36:02 <PaulMurray> ok - I'll catch up later
14:36:11 <PaulMurray> moving on
14:36:26 <PaulMurray> #topic Use target VIF information on Migration
14:36:44 <PaulMurray> this topic is about awareness for something coming up
14:36:45 <PaulMurray> https://blueprints.launchpad.net/nova/+spec/migration-use-target-vif
14:37:10 <andreas_s> right, it requires some coordination between nova and neutron
14:37:51 <andreas_s> neutron work needs to be done first (new api), so what neutron is looking for is some statement from nova that this fits somehow in their plans
14:38:09 <andreas_s> on the other side nova of course wants to know the priorities on neutron side for this...
14:38:24 <PaulMurray> If anyone can't remember
14:38:41 <PaulMurray> the problem is to do with the port binding after live migration
14:38:49 <PaulMurray> taking too long and causing problems
14:38:59 <PaulMurray> so the plan is to bind ports in advance
14:39:10 <andreas_s> and happening too late! if it fails, the instance is stuck in error state on the target
14:39:36 <mriedem> so we're basically going to have nova ask neutron for a host's capabilities during scheduling for live migration
14:40:00 <mriedem> which is not really the same as generic resource pools/providers
14:40:29 <mriedem> andreas_s: i know carl_baldwin is going to be at the nova midcycle, and armax might be too but i'm not sure about armax,
14:40:39 <mriedem> we're too late in newton to do anything with this in nova,
14:40:48 <carl_baldwin> Yes, I'll be there the first two days.
14:40:49 <mriedem> but we should talk to them about it there if the neutron team has thought about it
14:40:49 <andreas_s> mriedem, I know, that's not a problem
14:41:18 <andreas_s> could also be next release
14:41:28 <armax> mriedem: I am working it out, do you have an updated agenda?
14:41:37 <mriedem> andreas_s: so work on getting any design docs / background / issues put together before the meetup so we can discuss
14:41:47 <mriedem> armax: we have an etherpad with topics
14:42:02 * armax looks
14:42:08 <mriedem> https://etherpad.openstack.org/p/nova-newton-midcycle
14:42:19 <andreas_s> mriedem, ok, all stuff should be covered by the 2 specs (nova + neutron)
14:42:47 <PaulMurray> andreas_s, I would like to help get this lined up
14:43:00 <PaulMurray> so ask if you need anything
14:43:10 <mriedem> andreas_s: i've added an entry to the etherpad, if there are specs please link them in
14:43:17 <PaulMurray> what is the state of the neutron side at the moment ?
14:43:19 <andreas_s> PaulMurray, thanks!
14:43:24 <andreas_s> mriedem, I'll do!
14:43:29 <armax> mriedem: if I can come, I’ll be able to participate full time July 20, so it would be helpful perhaps for the Neutron topics to defer them until then
14:43:42 <andreas_s> PaulMurray, I'm currently working on a database rework which is a prereq
14:43:43 <armax> mriedem: but I’ll give you a more definite answer later this week
14:43:50 <mriedem> armax: sounds good
14:44:18 <andreas_s> to be honest, I don't think that the new api will make newton!
14:44:28 <andreas_s> so probably ocata
14:45:08 <mriedem> andreas_s: i'd avoid adding a new api to neutron until we have discussed it on the nova side
14:45:16 <mriedem> as in nova would actually use the new api and fits what we need as a client
14:45:27 <andreas_s> mriedem, exactly
14:46:01 <andreas_s> mriedem, but the database restructure was planned anyhow - that's what I'm currently working on
14:46:38 <PaulMurray> andreas_s, did you way you are coming to our mid cycle ?
14:46:51 <PaulMurray> s/way/say/
14:47:10 <andreas_s> PaulMurray: I don't think there's a way, as our nova guy is also not able to come...
14:47:29 <andreas_s> but I can try to ask my mgmt again with this item in mind
14:47:42 <mriedem> is markus_z coming?
14:47:56 <mriedem> i'm assuming that's your nova guy?
14:48:11 <andreas_s> mriedem, yes, not he just said that he will not come
14:48:24 <andreas_s> confusing sentence...
14:48:28 <andreas_s> he will not come
14:48:39 <mriedem> ok
14:49:10 <andreas_s> mriedem, PaulMurray: So I'll update the etherpad with the specs
14:49:17 <PaulMurray> yep, thanks
14:49:25 <andreas_s> try to get as much feedback as possible before the midcycle
14:49:39 <andreas_s> and will brief armax and carl_baldwin to discuss that
14:49:41 <PaulMurray> and ping me if you need help
14:50:03 <PaulMurray> ...next on the agenda
14:50:12 <andreas_s> In addition I'll send out a mail via ML in a minute or so
14:50:16 <andreas_s> thanks!
14:50:22 <PaulMurray> #topic reviews / open discussion
14:50:31 <PaulMurray> nothing on the agenda
14:50:39 <PaulMurray> does anyone have anything ?
14:50:43 <PaulMurray> for last few minutes
14:50:51 <PaulMurray> or we end
14:51:18 <PaulMurray> I'll take that as a 'no'
14:51:24 <PaulMurray> thanks for coming everyone
14:51:37 <PaulMurray> let's get those last patches in for FF
14:51:51 <PaulMurray> #endmeeting