14:03:34 <edleafe> #startmeeting nova_scheduler
14:03:35 <openstack> Meeting started Mon Sep 12 14:03:34 2016 UTC and is due to finish in 60 minutes.  The chair is edleafe. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:03:36 <zigo> Let's continue on #openstack-pkg
14:03:36 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:03:38 <openstack> The meeting name has been set to 'nova_scheduler'
14:03:45 <edleafe> who's here?
14:03:47 <Yingxin> o/
14:03:47 <cdent> o/
14:03:48 <bauzas> \o
14:03:56 <alex_xu> o/
14:03:57 * bauzas with very little support tho
14:04:14 <edleafe> This should be super-quick
14:04:18 <cdent> jay's either on a plane, or on his way to. He promises to review resource provider related stuff on the place
14:04:33 <bauzas> and I need to do my homework as well
14:04:39 <edleafe> cdent: or even the plane?
14:04:43 <edleafe> :_)
14:04:50 <bauzas> but for the moment, working on a regression bugfix
14:05:02 <edleafe> Anyway, everything we need to focus on is here: https://etherpad.openstack.org/p/placement-next
14:05:03 <cdent> edleafe: that too :)
14:05:34 <cdent> we probably need to take a moment (now or after) to make sure that etherpad is up to date, but yeah, it's the center of things at the moment
14:05:49 <edleafe> One thing that just came up, though, is alex_xu's comments on https://review.openstack.org/#/c/368035
14:06:04 <alex_xu> edleafe: thanks
14:06:18 <bauzas> cdent: at least focusing on the "Things we need for Newton" section, hope people can understand :)
14:06:23 <edleafe> that involves including the RP generation both at the request level, and at each individual inventory record
14:06:24 <alex_xu> question about why we have two resource_provider_generation in the request
14:06:34 <edleafe> bauzas: good point
14:06:50 <edleafe> cdent: can you comment on that?
14:06:58 <edleafe> (the multiple generation question)
14:07:12 <cdent> alex_xu: I think the gist there is that the generation at the top level is required, the one within the inventory itself is ignored, but allowed to be there because it is required in the post
14:07:24 <cdent> so if you have code that is generating inventories for both situations you can reuse it
14:07:46 <cdent> it is weird
14:07:57 <bauzas> okay, so we don't have schemas validating the JSON output ?
14:08:02 <alex_xu> cdent: ok, so we still have to use that old format, right?
14:08:20 <cdent> it comes about because there was originally only POSTing one inventory at /inventories and then PUT for several inventories at /inventories was added
14:08:25 <bauzas> could we maybe enforce that (the JSON output) at the unittest level ?
14:08:41 <cdent> schemas are not set for output, just input
14:08:49 <bauzas> cdent: is gabbi able to enforce that ?
14:09:03 <bauzas> I mean validating our outputs ?
14:09:05 <cdent> validation of the output is currently only done in the gabbi tests but unit tests are possible as well, if that's desired
14:09:13 <cdent> bauzas: they already do don't they?
14:09:24 <bauzas> cdent: sorry I'm unclear
14:09:30 <cdent> that is: the gabbi tests would fail if the output wasn't what the gabbi tests wanted
14:09:33 <bauzas> cdent: I know we do validate the output with gabbi
14:09:42 <bauzas> cdent: but we validate that per-field, right?
14:09:56 <bauzas> cdent: I was more or less thinking of comparing the whole dict
14:10:03 <cdent> the jsonpaths won't work if the overall structure isn't right
14:10:12 <cdent> but yes, there's no single chunk that validates an entire structure
14:10:29 <johnthetubaguy> bauzas: you mean like the tempest tests that run the output validation json schema things?
14:10:31 <bauzas> cdent: I'm asking that for clarity, ie. something we could point users to
14:10:50 <bauzas> cdent: for the moment, we can only point to specs, right?
14:11:02 <cdent> Yes, we could. If you think we should do it now, instead of ocata, could you make a bug?
14:11:15 <bauzas> cdent: not really for Newton-ish
14:11:20 <cdent> k
14:11:52 <bauzas> cdent: it's related to the point I missed alex_xu's point mostly because I guess I haven't correctly figured out the JSON output in my mind
14:12:05 * cdent nods
14:12:28 <alex_xu> cdent: the current code doesn't use the generation for each inventory... why do we need to keep that?
14:13:04 <cdent> alex_xu: we don't _need_ to keep it, but as I said above it is more flexible to do so: "so if you have code that is generating inventories for both situations you can reuse it"
14:13:06 <bauzas> johnthetubaguy: well, I'm rather thinking of the api sample tests that verify the response too
14:13:24 <johnthetubaguy> yeah, I get you now, the samples that then flow into the API docs
14:14:00 <bauzas> johnthetubaguy: I like reading the api sample templates every time I'm looking for some API response
14:14:06 <cdent> bauzas, johnthetubaguy: we can write, in gabbi if we want to, tests that validate the full request and full response. Most of the gabbi tests right now validate only parts of the response, as they aren't intended to be testing the serializers, but as bauzas points out doing so would make good inspectability
14:14:10 <bauzas> because I'm getting the whole JSON schema
14:14:32 <alex_xu> cdent: ah, thanks
14:14:35 <bauzas> anyway, that's something not really needed for Newton, so no rush
14:14:44 <johnthetubaguy> cdent: yeah, something to capture / check the samples would be good, but yeah, that's for next time
14:16:04 <edleafe> Probably was good not to have it now, as the structures changed so often as we figured things out
14:16:12 <edleafe> But once they settle down...
14:16:26 <cdent> edleafe++
14:17:25 <edleafe> So cdent - is there anything we can help with (besides reviews)?
14:17:35 * edleafe has a few spare cycles
14:17:43 <cdent> I seem to recall there was a bug posted late last week about something to do with migrations and allocations?
14:17:45 * cdent looks for it
14:18:08 <bauzas> cdent: yeah, we don't track migrations like we should
14:18:14 <cdent> https://bugs.launchpad.net/nova/+bug/1621709
14:18:15 <openstack> Launchpad bug 1621709 in OpenStack Compute (nova) "There is no allocation record for migration action" [Medium,Confirmed] - Assigned to Alex Xu (xuhj)
14:18:24 <alex_xu> yeah, I filed that bug
14:18:37 <bauzas> cdent: we only reconcile when we run the periodic update
14:18:43 <alex_xu> but after checking the situation, it looks like not a very easy one
14:18:47 <bauzas> oh, alex_xu, you're on it ?
14:18:49 <edleafe> alex_xu: do you need help with that? I can pick it up after your day ends
14:19:02 <cdent> is the periodic thing not sufficient?
14:19:03 <bauzas> alex_xu: well, that seems easy to me
14:19:06 <bauzas> cdent: it is
14:19:22 <johnthetubaguy> the problem is failed live-migrates I assume?
14:19:23 <alex_xu> ok, you guys can feel free to take it :)
14:19:26 <bauzas> cdent: so that's not really a big deal for newton, only a stretch goal
14:19:35 <bauzas> johnthetubaguy: live migrations are worse than that
14:19:36 <johnthetubaguy> right, it's not a big deal for newton
14:19:44 <alex_xu> johnthetubaguy: nothing failed, just we needn't record for claim
14:19:45 <cdent> the other thing, which mriedem wanted now not later, was https://bugs.launchpad.net/nova/+bug/1621888
14:19:46 <openstack> Launchpad bug 1621888 in OpenStack Compute (nova) "placement-api http responses are not marked for translation" [Medium,Confirmed] - Assigned to Chris Dent (cdent)
14:19:50 <bauzas> johnthetubaguy: here, I'm talking of resizes and cold migrations
14:19:51 <alex_xu> johnthetubaguy: agree
14:20:00 <bauzas> johnthetubaguy: because we have the MoveClaim
14:20:07 <cdent> which I've assigned myself, but I was leaving it until the representations and logs really settled
14:20:16 <cdent> which I'm assuming won't be the case until the resource tracker has stabilized
14:20:16 <mriedem> cdent: we should just fix that today
14:20:19 <bauzas> johnthetubaguy: but, but, we don't have any claims for live-migration, which is evel
14:20:20 <bauzas> evil
14:20:24 <igordcard> :q
14:20:24 <mriedem> RC1 is on thursday
14:20:28 <johnthetubaguy> so, long term, if you don't have a claim, and new instance takes the spot of the place you want to move to, it all goes odd
14:20:52 <bauzas> johnthetubaguy: which means that even if we add the placement api call on migrations, we'd still derail on live-migs
14:21:05 <cdent> mriedem: there's a couple of server side things to merge, hopefully today.
14:21:08 <bauzas> but then the world would be fixed every 60 secs
14:21:17 <alex_xu> and we don't clean up allocations that the compute node doesn't know about in update_available_resource
14:22:03 <bauzas> okay, I need to get my cellsv2 patch done super quick so I can help with the migration allocation patch
14:22:27 <mriedem> fyi, we have a grenade full job that runs with the placement API in the experimental queue
14:22:35 <cdent> mriedem++
14:22:46 <mriedem> if you're working on placement changes, you can 'check experimental' that
14:23:10 <bauzas> mriedem: <3
14:23:25 <edleafe> mriedem: kewl
14:23:28 <Yingxin> there might be another bug that "can_host" is always 0 for compute node resource providers
14:23:52 <cdent> Yingxin: ah, good catch. I think we've probably just forgotten that
14:24:05 <Yingxin> https://bugs.launchpad.net/nova/+bug/1622538
14:24:06 <openstack> Launchpad bug 1622538 in OpenStack Compute (nova) "Wrong "can_host" field of compute node resource providers" [Undecided,New] - Assigned to Yingxin (cyx1231st)
14:24:34 <Yingxin> easy fix
14:24:36 <cdent> Yingxin: I suspect that was just an oversight when doing the PUT
14:24:46 <Yingxin> cdent: yup
14:25:44 <cdent> dansmith: is your inventory stuff synced up with jay's representation changes?
14:25:55 <edleafe> Be sure to update the etherpad with these bugs and their fixes
14:26:15 <edleafe> #link https://bugs.launchpad.net/nova/+bug/1621709
14:26:18 <openstack> Launchpad bug 1621709 in OpenStack Compute (nova) "There is no allocation record for migration action" [Medium,Confirmed] - Assigned to Alex Xu (xuhj)
14:26:18 <edleafe> #link https://bugs.launchpad.net/nova/+bug/1622538
14:26:19 <openstack> Launchpad bug 1622538 in OpenStack Compute (nova) "Wrong "can_host" field of compute node resource providers" [Undecided,New] - Assigned to Yingxin (cyx1231st)
14:26:45 <Yingxin> edleafe: ok
14:27:18 <cdent> edleafe: in terms of "things to do to help" I reckon it's the same as last time: just try it in devstack and see what's wrong
14:27:37 <mriedem> are we holding up rc1 for those bugs?
14:27:39 <cdent> there are things like: https://bugs.launchpad.net/nova/+bug/1620748
14:27:40 <openstack> Launchpad bug 1620748 in OpenStack Compute (nova) "In placement when an attempt is made to write to missing inventory the error message is ugly" [Medium,Confirmed] - Assigned to Chris Dent (cdent)
14:27:48 <cdent> which are not critical, but present
14:28:10 <edleafe> cdent: ok
14:28:11 <cdent> mriedem: migration one probably not, can_host probably yes
14:28:21 <cdent> but the latter is a very quick fix
14:28:24 <mriedem> my assumption is we're not going to hold rc1 for placement bugs as it's optional
14:28:33 <mriedem> and bugs can be backported
14:28:44 * cdent shrugs
14:28:51 <mriedem> but hit me up if there is something ready that people feel we should get in
14:28:58 <cdent> I don't understand all these rules and regulations :)
14:29:18 <dansmith> cdent: no, I should probably rebase that on his I guess
14:29:42 <dansmith> cdent: I had expected mine to be merged already and that his would just fix it up in place, but...
14:30:06 * cdent nods at dansmith with ellipsis in his eyes
14:30:07 <edleafe> cdent: http://ru.memegenerator.net/instance/63122651
14:30:40 <bauzas> well, MHO is that I don't think we should hold RC1 for placement bugs - but, RC2 and later could be a possibility
14:30:40 <dansmith> mriedem: placement api changes should go in before rc1 if we can
14:30:45 <dansmith> mriedem: otherwise it's pretty messy
14:31:08 <mriedem> dansmith: sure, but the range of what those changes are is pretty diverse
14:31:34 <dansmith> mriedem: there's only one left, AFAIK
14:31:52 <dansmith> this one: https://review.openstack.org/#/c/368035/
14:32:40 <mriedem> ok, um https://etherpad.openstack.org/p/placement-next
14:32:46 <mriedem> whoever is updating the 'things we need for newton'
14:32:56 <mriedem> are we sure?
14:33:07 <mriedem> my point is let's just make sure the list in https://etherpad.openstack.org/p/placement-next is sane
14:33:41 <mriedem> anyway, i'll be hitting up people later after meetings
14:33:54 <edleafe> ...or just hitting people
14:33:56 <cdent> huh
14:34:18 <cdent> I apparently never have enough context when people mention things whether they mean now or later
14:34:52 <cdent> Nor does the arbitrariness of it all ever become clear
14:35:01 <mriedem> if something is going to be bad to backport,
14:35:12 <mriedem> or make placement much worse to migrate to if you don't have it at rc1,
14:35:17 <mriedem> then let's mark it for rc1
14:35:25 <mriedem> but like the translation bug can be a non-rc1 thing
14:35:31 <dansmith> definitely
14:35:48 <dansmith> mriedem: I just made some updates to the list
14:35:55 <cdent> mriedem: so, on the translation thing, why do _any_ of it now instead of just waiting?
14:35:59 <dansmith> mriedem: three patches for newton, but one is the critical, the others could go later if we *had* to
14:36:05 <cdent> your comments on the bug are confusing
14:37:22 <mriedem> cdent: b/c it'll take 5 minutes to do it?
14:37:35 <mriedem> and there will be translations after rc1
14:37:46 <cdent> sure, but then we have some of it marked and some of it not, which just seems weird
14:37:57 <mriedem> i'm unclear on why we're waiting for things on the server side that's preventing us from translating the api side
14:38:03 <mriedem> s/server/RT/
14:38:07 <mriedem> unless api == server
14:38:18 <cdent> api does == server
14:38:24 <cdent> because the bug is about api responses
14:38:29 <mriedem> anyway, if there are new changes that are adding new exceptions, then let's just mark those for translation in the changes that introduce them...
14:38:31 <mriedem> and not add more gorp
14:39:17 <cdent> k
14:39:42 <edleafe> Anything else?
14:39:51 <edleafe> Or do we get back to work?
14:40:21 <mriedem> back to the pile
14:40:24 * edleafe only hears the ringing in my ears
14:40:33 <edleafe> ok, everyone - thanks!
14:40:35 <edleafe> #endmeeting