14:01:02 <PaulMurray> #startmeeting Nova Live Migration
14:01:05 <openstack> Meeting started Tue May 17 14:01:02 2016 UTC and is due to finish in 60 minutes.  The chair is PaulMurray. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:06 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:09 <openstack> The meeting name has been set to 'nova_live_migration'
14:01:15 <diana_clarke> o/
14:01:17 <davidgiluk> o/
14:01:22 <luis_> o/
14:01:28 <PaulMurray> Hi all, just doing a ping on the other channel
14:01:30 <abhishekk> \o
14:01:31 * johnthetubaguy lurks with mild intent
14:01:37 <andreas_s> hi
14:01:43 <paul-carlton1> hi
14:01:44 <mdbooth> o/
14:01:48 <ametts> o/
14:01:58 <pkoniszewski> o/
14:02:06 <PaulMurray> I normally do that in advance but got a little delayed
14:02:28 <PaulMurray> agenda: https://wiki.openstack.org/wiki/Meetings/NovaLiveMigration
14:02:46 <andrearosa> o/
14:03:00 * kashyap waves
14:03:12 <PaulMurray> #topic CI
14:03:24 <PaulMurray> jumping straight in
14:03:30 <mriedem> o/
14:03:35 <PaulMurray> we had a couple of actions from last week
14:03:56 <PaulMurray> tdurakov, did you get anything from markus or tony ?
14:04:06 <PaulMurray> the action was: tdurakov to follow up with markus_z/tonyb about the trunk libvirt/qemu repo
14:04:41 <PaulMurray> don't see a wave from tdurakov - maybe he will turn up in a minute
14:04:47 <mriedem> tdurakov was also looking at using xenial nodes
14:05:00 <mriedem> i think he had a patch up actually
14:05:05 * mriedem looks
14:05:27 <mriedem> https://review.openstack.org/#/c/314636/
14:05:55 <mriedem> according to that, gate-tempest-dsvm-multinode-live-migration should be running on xenial nodes now
14:05:56 <PaulMurray> ah good, do you know if it's made any difference ?
14:06:12 <mriedem> i haven't seen that switch yet, but i know where to look
14:06:34 <kashyap> (For those not familiar, "xenial" == Ubuntu 16.04 Long Term Support (LTS) release)
14:06:40 <mriedem> http://logs.openstack.org/03/315703/1/experimental/gate-tempest-dsvm-multinode-live-migration/1288a98/
14:06:58 <mriedem> http://logs.openstack.org/03/315703/1/experimental/gate-tempest-dsvm-multinode-live-migration/1288a98/logs/dpkg-l.txt.gz
14:07:06 <mriedem> libvirt 1.3.1
14:07:13 <mriedem> qemu 2.5
14:07:31 <PaulMurray> that's more recent than we need, so good
14:07:34 <mdbooth> kashyap: Thanks, I was assuming it was hypervisor related
14:07:35 <mriedem> and considering https://review.openstack.org/#/c/315703/ no it doesn't appear to have helped with stability
14:07:48 <kashyap> mdbooth: Me too.  I thought it was a play on 'Xen'
14:08:13 <mriedem> in that change, the job has failed 3 times
14:08:20 <mriedem> before it ever gets to setup ceph
14:08:21 <andrearosa> :(
14:08:29 <mriedem> so the default config is failing tests
14:08:56 <mriedem> http://logs.openstack.org/03/315703/1/experimental/gate-tempest-dsvm-multinode-live-migration/1288a98/logs/screen-n-cpu.txt.gz?level=TRACE#_2016-05-13_21_32_30_216
14:08:58 <mdbooth> mriedem: Have you looked at the failures, btw?
14:09:03 <mriedem> so
14:09:03 <mriedem> Unable to find CPU definition: gate64
14:09:10 <mdbooth> mriedem: Right, thanks
14:09:10 <mriedem> it looks like the job itself is just busted
14:09:30 <mriedem> there was a lot of tinkering for cpu models for the multinode live migration job on trusty,
14:09:49 <mriedem> switching to xenial didn't magically work so the cpu models will have to be tinkered with again it looks like
14:10:01 <kashyap> mriedem: I recall clarkb playing with CPU models in Gate Infra
14:10:06 <mriedem> yeah
14:10:54 <PaulMurray> ok, so something to get into
14:11:02 <mdbooth> mriedem: I take it the infrastructure is genuinely heterogeneous?
14:11:20 <kashyap> Specifically this: "Update libvirt cpu map before starting nova" -- https://review.openstack.org/#/c/168407/
14:11:23 <mriedem> mdbooth: yes
14:11:38 <mdbooth> mriedem: Ok. Maybe we could just hardcode some lcd?
14:11:55 <PaulMurray> what's lcd ?
14:12:05 <mdbooth> lowest common denominator
14:12:18 <PaulMurray> :) it's always the easy ones
14:12:20 <mriedem> i thought that's what they were doing
14:12:21 <mdbooth> It's not as if the cpu is really that important to us
14:12:26 <kashyap> mdbooth: It was hard-coded, if you're referring to CPU features:  https://review.openstack.org/#/c/168407/9/tools/cpu_map_update.py
14:12:38 <mriedem> probably an action item for someone to talk to clarkb in infra after the meeting
14:12:49 <mdbooth> k
14:13:12 <kashyap> mriedem: I can follow up
14:13:27 <PaulMurray> I'll make it an action
14:14:00 <kashyap> (Or if someone with better Infra access than me wants to take it up, that's fine too.)
14:14:07 <PaulMurray> #action kashyap to follow up with clarkb about experimental job running xenial fix (presumed cpu model)
14:14:39 <PaulMurray> it's a good start anyway
14:14:44 <mriedem> https://review.openstack.org/#/c/168407/9/tools/cpu_map_update.py doesn't appear to bomb out at all
14:14:54 <mriedem> i.e. if it can't find what it's looking for i don't see it fail
14:15:59 <mriedem> http://logs.openstack.org/03/315703/1/experimental/gate-tempest-dsvm-multinode-live-migration/1288a98/logs/libvirt/cpu_map.xml
14:16:15 <mriedem> <model name="gate64">
14:16:16 <mriedem> is in there
14:17:02 <kashyap> Hmm, but somehow the CPU def. is lost / not recognized.
14:17:26 <mriedem> it's also in the subnode http://logs.openstack.org/03/315703/1/experimental/gate-tempest-dsvm-multinode-live-migration/1288a98/logs/subnode-2/libvirt/cpu_map.xml
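[Editor's note] mriedem's point above is that the map-update step does not fail when the model it expects is missing. A minimal sketch of a check that does fail loudly is given below. It is not the code from cpu_map_update.py; the function names are hypothetical, and the SAMPLE layout (a <cpus> root with <arch> children holding <model name="..."> elements) is a simplified assumption about cpu_map.xml.

```python
import xml.etree.ElementTree as ET

# Simplified, assumed layout of a libvirt cpu_map.xml (illustration only).
SAMPLE = """<cpus>
  <arch name="x86">
    <model name="gate64">
      <vendor name="Intel"/>
    </model>
  </arch>
</cpus>"""


def has_cpu_model(cpu_map_xml, model_name):
    """Return True if a top-level <model> under any <arch> has this name."""
    root = ET.fromstring(cpu_map_xml)
    # Only direct children of <arch> count as definitions; nested <model>
    # elements inside a definition are references to a parent model.
    return any(m.get("name") == model_name
               for m in root.findall("./arch/model"))


def require_cpu_model(cpu_map_xml, model_name):
    """Hypothetical helper: raise instead of silently continuing."""
    if not has_cpu_model(cpu_map_xml, model_name):
        raise LookupError("CPU model %r not defined in cpu map" % model_name)


if __name__ == "__main__":
    require_cpu_model(SAMPLE, "gate64")
    print("gate64 is defined")
```

A check like this only confirms that the definition is present in the file. As the discussion notes, gate64 is present in both logged cpu_map.xml files, so the "Unable to find CPU definition" error has to come from somewhere else.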
14:18:21 <PaulMurray> let's not debug this here, we can pick it up outside the meeting
14:18:37 <kashyap> Yeah.
14:18:40 <PaulMurray> The next action was on mriedem
14:18:43 <PaulMurray> mriedem to hack up devstack-gate changes to test lvm and raw image backends
14:18:51 <PaulMurray> I saw a patch
14:19:00 <mriedem> https://review.openstack.org/#/c/316298/
14:19:05 <mriedem> ^ is the lvm experimental queue job
14:19:13 <mriedem> there is a dependency on a devstack change and a nova change
14:19:54 <PaulMurray> this one isn't merged: https://review.openstack.org/#/c/316295/
14:20:04 <mriedem> yeah i need to update it quick
14:20:07 <mriedem> easy peasy
14:20:09 <mriedem> should be merged today
14:20:22 <mriedem> then just need another +2 on the infra change, which i can wrastle up in -infra later
14:21:09 <PaulMurray> is it this: https://review.openstack.org/#/c/215929/
14:21:16 <PaulMurray> that's merged ?
14:21:19 <mriedem> no
14:21:48 <PaulMurray> oh, you mean your one
14:21:57 <mriedem> yeah the nova change for the blacklist rc file
14:22:04 <mriedem> the file has a .txt extension which i need to drop
14:22:09 <mriedem> oomichi: found that
14:22:12 <mriedem> quick fix
14:22:23 <mriedem> anyway, should have this all merged today so we can use the job
14:22:36 <mriedem> note it's on the experimental queue, so to run it you have to comment with 'check experimental' on your patch
14:22:39 <mriedem> mdbooth: ^
14:23:06 <mdbooth> mriedem: Cool, thanks. We're not quite there yet, though. diana_clarke will update.
14:23:15 <mriedem> as for a raw job,
14:23:19 <mriedem> i didn't get that far
14:23:32 <mriedem> last week we talked about maybe switching one of the existing qcow2 jobs to use raw
14:23:49 <mriedem> since everything except ceph and this new lvm job is using qcow2
14:24:10 <diana_clarke> mriedem: yes, thanks!
14:24:20 <mriedem> i could test that out with a devstack-gate patch too, i think it's just a matter of setting use_cow_images=False right mdbooth?
14:24:44 <mdbooth> mriedem: Assuming images_type is defaulted, then yes.
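[Editor's note] The exchange above turns on how the two options interact: use_cow_images only decides the backend when images_type is left at its default. The sketch below illustrates that rule under stated assumptions. The function name is hypothetical, and the option groups ([DEFAULT] for use_cow_images, [libvirt] for images_type), the "default" sentinel value and the option defaults are assumptions about Nova's configuration at the time, not taken from the log.

```python
import configparser


def effective_image_backend(conf_text):
    """Hypothetical helper: work out which image backend a nova.conf implies."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # Assumed defaults: images_type = default, use_cow_images = True.
    images_type = parser.get("libvirt", "images_type", fallback="default")
    use_cow = parser.getboolean("DEFAULT", "use_cow_images", fallback=True)
    if images_type != "default":
        # An explicit images_type wins; use_cow_images is then irrelevant.
        return images_type
    return "qcow2" if use_cow else "raw"


if __name__ == "__main__":
    raw_conf = "[DEFAULT]\nuse_cow_images = False\n"
    print(effective_image_backend(raw_conf))
```

Under these assumptions, a job that sets only use_cow_images=False ends up on the raw backend, which matches mdbooth's caveat that images_type must still be defaulted.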
14:26:36 <PaulMurray> Do you know which job to use?
14:26:50 <mriedem> no
14:26:59 <mriedem> was thinking maybe the postgresql job
14:27:13 <mriedem> that's already an odd duck of a job anyway
14:27:38 <PaulMurray> :)
14:28:04 <PaulMurray> you still happy to do it
14:28:16 <mriedem> i can push the d-g patch yeah
14:28:18 <mriedem> and see how it goes
14:28:24 <mriedem> lots of meetings today though
14:28:47 <PaulMurray> #action mriedem to change an existing job to use raw instead of qcow2
14:28:54 <PaulMurray> that's life at the top
14:29:17 <PaulMurray> Anything else needed for CI today ?
14:29:38 <PaulMurray> ok - moving on
14:29:43 <PaulMurray> #topic Libvirt Storage Pools
14:29:51 <PaulMurray> any update mdbooth diana_clarke
14:30:10 <diana_clarke> I've added the new methods (create_from_image & create_from_func) to the following backends: Ploop, Rbd, Flat (aka Raw, aka NoBacking). I'll toss them up for review later today
14:30:39 <PaulMurray> thanks
14:30:44 <diana_clarke> I believe Matt is auditing the BDM object usage in libvirt/driver.py in preparation for adding the driver_info to the BDM object. Please correct me if I'm wrong, Matt.
14:31:21 <mdbooth> Yup. It's not quite as clear cut as I'd hoped. I need to draw some pictures to work out how BDM info gets from compute/manager into the driver
14:31:52 <mdbooth> If anybody here is intimately familiar with that process and has some time I'd love to pick your brains, btw
14:32:18 <PaulMurray> mdbooth, there's a wiki or doc page somewhere for bdm - if you discover something not on there it would be good to update it
14:32:41 <PaulMurray> mdbooth, or tell me and I'll update it
14:32:51 <mriedem> mdbooth: i know some of it
14:32:59 <paul-carlton1> My specs for storage pools and libvirt migration are awaiting review https://review.openstack.org/#/c/310505/ and https://review.openstack.org/#/c/310538/
14:33:08 <mriedem> mdbooth: it's all wrapped up in the mystical magical 3 different block_device.py modules in nova
14:33:19 <mriedem> you might have a dict, you might have an object, you might have a dict that wraps an object
14:33:30 <mdbooth> mriedem: Yeah, it's the relationship between those which isn't yet clear to me.
14:33:55 * mdbooth suspects that these days you probably always have an object, and that the code is unnecessarily crufty
14:34:09 <PaulMurray> probably true
14:34:13 <PaulMurray> or it's close
14:34:50 <PaulMurray> Was it nikola who was doing that
14:34:56 <mdbooth> Originally, yes
14:35:03 <mriedem> yeah, some of the cruft is from the legacy bdm stuff
14:35:07 <mriedem> converting v1 bdms to v2
14:35:17 <mriedem> so there is a lot of facade stuff and wrapping
14:35:26 <PaulMurray> I remember he was the only one who knew some of this stuff
14:35:37 <mriedem> i've fixed some bugs in some of it
14:35:46 <mriedem> while we're deprecating apis, maybe we should deprecate bdm v1
14:36:44 <PaulMurray> getting a bit off topic now
14:36:46 <mdbooth> mriedem: Yeah. It's weird that we're not converting those at the api layer.
14:36:51 <mdbooth> PaulMurray: Indeed, sorry.
14:37:05 <PaulMurray> paul-carlton1 mentioned his specs above
14:37:19 <PaulMurray> would be good to get those sorted
14:37:39 <PaulMurray> Is there anything else to add on that paul-carlton1
14:37:41 <PaulMurray> ?
14:38:43 <paul-carlton1> hopefully they speak for themselves but I'm anxious that they don't miss cut for Newton specs
14:38:58 <mdbooth> paul-carlton1: When is that?
14:39:01 <PaulMurray> it's a priority so there is a bit to go, but yes
14:39:22 <mriedem> n-1
14:39:25 <mriedem> 6/2
14:39:31 <mriedem> is non-priority spec approval freeze
14:39:40 <paul-carlton1> end of may I think
14:39:48 <mriedem> we said, however, that libvirt storage pools are a priority
14:39:55 <PaulMurray> mriedem, 6th of February - great
14:40:02 <mriedem> this is our release schedule https://wiki.openstack.org/wiki/Nova/Newton_Release_Schedule
14:40:10 <mriedem> PaulMurray: 'merican date format only here
14:40:32 <mriedem> priorities https://specs.openstack.org/openstack/nova-specs/priorities/newton-priorities.html#libvirt-storage-pools-live-migration
14:40:35 <paul-carlton1> ok, so no immediate panic!
14:40:49 <PaulMurray> but better to get it settled anyway
14:41:13 <PaulMurray> moving on again
14:41:24 <PaulMurray> #topic Specs
14:41:31 <PaulMurray> well, sort of moving on
14:41:48 <davidgiluk> where is the migration force/postcopy/autoconverge/etc spec up to?
14:41:53 <PaulMurray> the Post-copy and auto-converge merged with auto completion seems close to agreement
14:42:23 * PaulMurray ah, trying to cut paste
14:42:24 <paul-carlton1> mriedem, when is priority feature freeze, I don't see it on wiki page
14:42:37 <PaulMurray> https://review.openstack.org/#/c/306561 - Automatic Live Migration Completion
14:42:40 <mriedem> paul-carlton1: same as normal FF
14:42:50 <mriedem> paul-carlton1: sept 2nd
14:43:32 <PaulMurray> That spec did have a +2 from danpb
14:43:37 <johnthetubaguy> so isn't switching to post-copy automated somehow?
14:43:42 <paul-carlton1> ta
14:43:54 <paul-carlton1> johnthetubaguy, nope
14:43:58 <PaulMurray> johnthetubaguy, no, we have to tell it to do it
14:44:11 <PaulMurray> johnthetubaguy, so the spec talks about when and how aggressive to be
14:44:20 <PaulMurray> in the monitoring thread
14:44:24 <johnthetubaguy> gotcha
14:44:52 <PaulMurray> It also adds a --force-complete flag to live migrate API
14:44:53 <johnthetubaguy> I just don't like config options changing the API semantics, but I kinda get why we want that here, I just need some theory about that first
14:45:32 <mdbooth> johnthetubaguy: Automated by Nova.
14:46:06 <PaulMurray> johnthetubaguy, the idea is always use post-copy if you can
14:46:19 <PaulMurray> but allow ops to disable it if they really want
14:46:28 <PaulMurray> so default is to have it
14:47:01 <paul-carlton1> if available, post-copy is a recent feature
14:47:36 <PaulMurray> This spec is not priority so we could do with reviews
14:47:48 <johnthetubaguy> hmm, I thought we said post-copy was dangerous, but I guess it's optional
14:47:51 <PaulMurray> at earliest convenience
14:48:23 <mdbooth> johnthetubaguy: I think the 'dangerous' moniker is largely misinformation.
14:48:57 <PaulMurray> mdbooth, I think the exposure is small, but the main problem is it's new
14:49:06 <paul-carlton1> johnthetubaguy, there are risks of instance failing and needing a reboot if network goes down but networks are pretty reliable and a reboot is not end of the world
14:49:09 <PaulMurray> and in production we find new = doesn't always work
14:49:13 <mdbooth> PaulMurray: Always apropos bugs.
14:49:51 <paul-carlton1> the alternative may be a suspend or auto-converge which is potentially worse
14:50:06 <PaulMurray> Any other specs to mention?
14:50:12 <mdbooth> Basically, almost everything which would cause post-copy to fail would also have caused some other failure.
14:50:12 <paul-carlton1> https://review.openstack.org/#/c/307131/
14:50:47 <PaulMurray> johnthetubaguy, you -1 that before ^^^
14:50:48 <paul-carlton1> Live Migration of rescued instances, I have an implementation ready(ish) for this
14:50:55 <PaulMurray> I think it covers what you asked for ?
14:51:12 <paul-carlton1> I've updated it since speaking to johnthetubaguy
14:51:39 <johnthetubaguy> cool, I should get back to that one and take another look soon
14:51:39 <PaulMurray> time is moving on
14:51:58 <PaulMurray> #topic reviews
14:52:08 <PaulMurray> anything specific in review ?
14:52:20 <abhishekk> hi
14:52:24 <abhishekk> https://review.openstack.org/#/c/215483/ - Set migration status to 'error' on live-migration failure
14:53:01 <PaulMurray> I think that one is there now - just needs core review
14:53:23 <abhishekk> yes, please do the needful when you get time :)
14:53:35 <mriedem> not sure i can handle an 8 LOC patch
14:53:53 <PaulMurray> I'll ping the cores that had an opinion on it if they don't respond soon
14:54:15 <PaulMurray> #topic Open Discussion
14:54:18 <abhishekk> thank you PaulMurray
14:54:30 <PaulMurray> a few minutes for any other business
14:55:34 <PaulMurray> I guess we're done
14:55:39 <PaulMurray> thank you all for coming
14:55:46 <PaulMurray> please do some reviews for others
14:55:59 <johnthetubaguy> I guess the etherpad is all up to date right?
14:56:05 <PaulMurray> #action all review sub team patches
14:56:17 <PaulMurray> the sub team section on the review tracking page is
14:56:27 <PaulMurray> I need to clean up our own page
14:56:46 <PaulMurray> but for now the https://etherpad.openstack.org/p/newton-nova-priorities-tracking is the place to go
14:56:55 * mdbooth is getting substantive review on the bottom of the instance storage patch stack, btw
14:57:22 <PaulMurray> good mdbooth
14:57:25 <PaulMurray> bye
14:57:25 <mdbooth> which is awesome
14:57:31 <PaulMurray> #endmeeting