14:00:11 <edleafe> #startmeeting nova_scheduler
14:00:12 <openstack> Meeting started Mon Feb 12 14:00:11 2018 UTC and is due to finish in 60 minutes.  The chair is edleafe. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:13 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:16 <openstack> The meeting name has been set to 'nova_scheduler'
14:00:25 <leakypipes> o=#
14:00:25 <efried> @/
14:00:30 <takashin> o/
14:00:32 <jroll> \o
14:00:32 <efried> leakypipes: Is that the Enterprise?
14:00:33 <ttsiouts> o/
14:00:42 <leakypipes> efried: no, it's tendonitis in my elbow.
14:00:46 <mriedem> o/
14:00:48 <efried> ouch
14:00:51 <jroll> I just thought you were holding a waffle
14:00:58 <efried> hahaha
14:01:09 <leakypipes> jroll: wafflehands Jay.
14:01:11 <edleafe> leakypipes: too many beer curls?
14:01:16 <leakypipes> edleafe: indeed.
14:01:24 <jroll> heh
14:01:30 <edleafe> um, it's Monday, leakypipes
14:01:35 <cdent> o/
14:01:37 <leakypipes> oh..
14:01:40 <efried> Friday never ended for leakypipes
14:01:53 <edleafe> ...hence the sore elbow
14:01:57 * bauzas yawns
14:02:05 <jaypipes> heh, no, Friday definitely ended on Friday.
14:02:23 <bauzas> days are subjective
14:02:31 <edleafe> #topic Specs and Reviews
14:02:40 <bauzas> you can claim every day is Friday, unless someone proves the contrary
14:02:45 <edleafe> Work for Rocky is starting to appear
14:03:02 <edleafe> Anyone have a particular spec/review to discuss?
14:03:19 <jaypipes> nope. I need to finish writing one and start reviewing a bunch of em.
14:03:21 <bauzas> PTG is in 2 weeks, do we really need to discuss specs now?
14:03:43 <edleafe> bauzas: well, that's why I asked instead of going over everything
14:03:46 <bauzas> unless someone wants an opinion while writing a spec?
14:03:52 <arvindn05> https://review.openstack.org/#/c/541507/
14:03:54 <cdent> bauzas: presumably we want to have our brains warmed up?
14:03:59 <edleafe> this way, unless there is something pressing, we can move ahead
14:04:10 <bauzas> cdent: it's snowing here, I don't need to warm up
14:04:21 <cdent> as you wish
14:04:32 <arvindn05> unfortunately i think some folks won't be able to make the PTG...so preliminary reviews are appreciated :)
14:04:39 <edleafe> arvindn05: is that spec just for glance?
14:04:59 <edleafe> #link Support traits in Glance https://review.openstack.org/#/c/541507/
14:05:03 <jaypipes> arvindn05: interesting idea, thanks for the link. will review.
14:05:13 <mriedem> do traits support a TEXT blob?
14:05:16 <arvindn05> nope...it's for nova to honor traits available in glance image metadata
14:05:22 <mriedem> b/c glance image metadata is a TEXT blob and not size limited in the glance API
14:05:39 <jroll> traits in placement are limited to 255 characters iirc
14:05:41 <mriedem> https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api_models.py#L579
14:05:43 <mriedem> yup
14:05:48 <mriedem> so that is issue #1
14:05:49 <efried> minus CUSTOM_
14:06:07 <arvindn05> Thanks
14:06:22 <efried> ...though I guess the image meta could have standard traits too, duh.
14:06:22 <edleafe> so glance "traits" != placement "traits"?
14:06:46 <mriedem> having not read the spec,
14:06:49 <edleafe> guess I should read the spec first
14:06:52 <edleafe> jinx
14:06:56 <mriedem> i'm not sure how the glance image meta key/value turns into a single string tag
14:07:18 <jroll> tl;dr glance would have properties like trait:HW_CPU_X86_AVX2=required
14:07:21 <arvindn05> >so glance "traits" != placement "traits"?
14:07:22 <arvindn05> yes...unfortunate naming from my side i guess
14:07:26 <jaypipes> mriedem: looks like it's pretty much identical to how we're doing the flavor metadata key/value conversion for traits.
14:07:50 <arvindn05> yes @jaypipes
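A minimal sketch of the convention being compared here, assuming the glance proposal reuses the flavor-style trait:<NAME>=required keys (the property and trait names below are illustrative examples, not merged Nova/Glance behavior):

# Both a flavor's extra_specs and (under the proposal) an image's properties
# would carry required traits as keys; the value is always the string "required".
TRAIT_PREFIX = "trait:"

flavor_extra_specs = {
    "trait:HW_CPU_X86_AVX2": "required",
    "hw:cpu_policy": "dedicated",             # unrelated key, ignored by the trait scan
}

image_properties = {
    "trait:CUSTOM_GPU_CLASS_A": "required",   # hypothetical custom trait
    "os_type": "linux",
}

def required_traits(metadata):
    """Collect trait names from a dict using the trait:<NAME>=required convention."""
    return {
        key[len(TRAIT_PREFIX):]
        for key, value in metadata.items()
        if key.startswith(TRAIT_PREFIX) and value == "required"
    }

print(required_traits(flavor_extra_specs))  # {'HW_CPU_X86_AVX2'}
print(required_traits(image_properties))    # {'CUSTOM_GPU_CLASS_A'}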
14:08:12 <arvindn05> the glance image metadata also has the 255 character limit so it fits in well there
14:08:26 <mriedem> arvindn05: not in glance v2
14:08:30 <mriedem> which is the only thing that nova supports
14:08:35 <mriedem> arvindn05: see https://review.openstack.org/#/c/526900/
14:09:18 <mriedem> https://github.com/openstack/glance/blob/master/glance/db/sqlalchemy/models.py#L159
14:09:22 <jaypipes> arvindn05: well, note that the length for these things' value is always going to be 8 (the length of the string "required"). it's the *key* that changes..
14:09:24 <bauzas> I think we agreed on having traits being something understandable by operators
14:09:28 <jaypipes> mriedem: ^^
14:09:47 <bauzas> traits longer than 255 characters don't look operator-friendly to me
14:10:15 <arvindn05> ahh...ok. Will check the limits in glance
14:10:36 <arvindn05> https://developer.openstack.org/api-ref/image/v2/index.html#create-an-image
14:11:02 <jaypipes> mriedem, bauzas: keys are still limited in glance to 255. https://github.com/openstack/glance/blob/master/glance/db/sqlalchemy/models.py#L158
14:11:07 <arvindn05> >List of tags for this image. Each tag is a string of at most 255 chars.
14:11:07 <arvindn05> i don't see the above referencing key length...will look at the code
14:11:12 <bauzas> jaypipes: then cool
14:11:18 <mriedem> i didn't realize it was the image meta key
14:11:33 <mriedem> https://review.openstack.org/#/c/541507/4/specs/rocky/approved/glance-image-traits.rst@35
14:11:43 <mriedem> ^ not sure why image meta with host aggregates doesn't solve that problem already
14:11:52 <efried> Depending on how close we're getting to how it's expressed in flavors, there'll be that trait<N>: prefix too.
14:12:17 * efried also hasn't read the spec yet
14:12:41 <bauzas> mriedem: well, it's also possible, for sure
14:12:50 <edleafe> We should probably read the spec and continue the discussion there
14:13:01 <efried> The resource class affinity spec should be a pretty quick read.  I'm starting to warm up to the idea presented in the first section of Alternatives.  It's a lot more powerful, even though it's less user-friendly.  Anyway, I'd love some feedback on that (and the idea in general).  Wouldn't mind having one direction picked out before the PTG.  https://review.openstack.org/543062
14:13:11 <arvindn05> yup...that would be great
14:13:11 <bauzas> what I just want to make sure is that if Glance provides a traits API, those are exactly the same as in Nova
14:13:12 <jroll> mriedem: agree with that, but I think there's plenty of use cases for this. easy example: my hypervisor image requires VT bit :)
14:13:25 <bauzas> or we could have problems like Cinder vs. Nova AZs :p
14:13:46 <edleafe> #link Resource Class Affinity Spec https://review.openstack.org/543062
14:14:57 <edleafe> Anything else for specs/reviews?
14:15:00 <arvindn05> bauzas: we are overloading the metadata field in glance for expressing traits. The data model matches almost exactly https://github.com/openstack/glance/blob/master/glance/db/sqlalchemy/models.py#L158
14:15:18 <jaypipes> bauzas: ack. though this isn't a proposed "API" per se, but rather just a convention of naming metadata keys in the same fashion that Nova's flavor metadata embeds the whole trait:XXX encoding in the flavor metadata key
14:16:07 <jaypipes> really, it's just "standardizing" image metadata keys to be (more like) os-traits standard traits. which I think is a very good idea.
14:16:11 <efried> ...and having the scheduler pay attention to that and feed those traits into the allocation candidates request
14:16:19 <jaypipes> yup, efried
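A rough sketch of what "feed those traits into the allocation candidates request" could mean in practice: merge flavor- and image-derived required traits into the required= query parameter of GET /allocation_candidates. That parameter exists in newer placement microversions; auth, microversion negotiation, and granular (numbered) request groups are omitted here.

from urllib.parse import urlencode

def allocation_candidates_query(resources, flavor_traits, image_traits):
    """Build the query string for GET /allocation_candidates."""
    required = sorted(set(flavor_traits) | set(image_traits))
    params = {
        "resources": ",".join(
            f"{rc}:{amount}" for rc, amount in sorted(resources.items())),
    }
    if required:
        params["required"] = ",".join(required)
    return "/allocation_candidates?" + urlencode(params)

print(allocation_candidates_query(
    {"VCPU": 1, "MEMORY_MB": 2048, "DISK_GB": 10},
    flavor_traits={"HW_CPU_X86_AVX2"},
    image_traits={"CUSTOM_GPU_CLASS_A"},   # hypothetical image-derived trait
))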
14:16:39 * edleafe is back after a network hiccup
14:16:40 <efried> ...and attach those traits to which parts of the flavor request?
14:16:53 <efried> Perhaps this is answered in the spec, which I still haven't read since four minutes ago.
14:17:03 <jaypipes> :)
14:17:11 <efried> But yeah, this sounds like it has promise.
14:17:18 <cdent> efried: the integration with flavors is a question left on the spec, which you'll see when you read it ;)
14:17:33 <edleafe> So what did I miss - still discussing the glance traits spec?
14:17:47 <jaypipes> edleafe: ya, but I think we should move on.
14:17:57 <efried> We just can't seem to let it go.  We're all so intrigued.
14:18:12 <arvindn05> yea...we have to deal with conflicts between image and flavor traits...it's briefly addressed in the spec so I will add more details based on comments
14:18:24 <efried> #action everybody read the spec and we'll discuss it next week?
14:18:34 <edleafe> efried: or comment on the spec??
14:18:38 <efried> that too :)
14:18:42 <arvindn05> thanks for the discussions...we can move on i think...i look forward to the comments :)
14:18:45 <efried> ...discuss if needed
14:18:52 <edleafe> #topic Bugs
14:18:53 <edleafe> #link Placement bugs https://bugs.launchpad.net/nova/+bugs?field.tag=placement
14:19:00 <edleafe> No new bugs this week
14:19:16 <mriedem> i've got a bug
14:19:18 <mriedem> that's not a placement bug
14:19:30 <mriedem> https://review.openstack.org/#/q/topic:bug/1746483+(status:open+OR+status:merged)
14:19:42 <mriedem> looking for any historical background on IsolatedHostsFilter if anyone knows,
14:19:46 <mriedem> wrt boot from volume,
14:20:07 <jaypipes> ah, BFV again
14:20:07 <mriedem> that filter never worked properly with boot from volume requests, unless it did by just saying, 'i don't care about those types of requests'
14:20:22 <mriedem> the problem is the requestspec.image.id isn't set for a bfv request,
14:20:30 <mriedem> so the filter doesn't get the image id to tell if it's isolated or not
14:20:43 <cdent> mriedem: I say fix it most real, and if people discover they were getting accidental results, now their life is better
14:20:49 <mriedem> if the purpose of the filter is to only keep ephemeral images pinned to isolated hosts, then i guess it's fine
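For context, a simplified sketch (not Nova's actual IsolatedHostsFilter code) of the failure mode described above: with no image id in the request spec for a boot-from-volume request, an image-isolation check has nothing to compare against, so the pragmatic behavior is to let those requests pass. The image and host values are hypothetical config.

ISOLATED_IMAGES = {"11111111-2222-3333-4444-555555555555"}  # hypothetical config
ISOLATED_HOSTS = {"host-iso-1", "host-iso-2"}               # hypothetical config

def host_passes(host_name, request_spec):
    image_id = getattr(getattr(request_spec, "image", None), "id", None)
    if image_id is None:
        # Boot from volume: the request spec carries no image id, so we
        # cannot tell whether the backing image is isolated. Let it pass.
        return True
    if image_id in ISOLATED_IMAGES:
        # Isolated images may only land on isolated hosts.
        return host_name in ISOLATED_HOSTS
    # Non-isolated images stay off the isolated hosts.
    return host_name not in ISOLATED_HOSTS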
14:20:53 <bauzas> mriedem: I saw your email, sorry missed to reply
14:21:11 <efried> Wait, we allow scheduler topics in this meeting that *don't* involve placement??
14:21:21 <mriedem> efried: i know it's crazy right
14:21:25 * cdent cookies efried
14:21:33 <mriedem> and we still get non-placement scheduler bugs
14:21:44 <bauzas> efried: there is a world beyond placement
14:21:50 <cdent> lies
14:21:51 <mriedem> given this has been broken since mitaka, it's obvious not many people use this filter
14:21:55 <edleafe> Well, I *suppose* we can allow those questions
14:22:09 <jaypipes> mriedem: you mean not many people use this filter AND use BFV.
14:22:16 <bauzas> mriedem: AFAIK, operators prefer to use filters with host aggregates and image mapping
14:22:25 <mriedem> jaypipes: i assume more people use bfv than this filter
14:22:38 <mriedem> bauzas: yes this filter is redundant with image meta + aggregates
14:22:55 <bauzas> voilà, problem solved
14:22:59 <mriedem> so my plan is to fix the obvious regression, backport that,
14:23:00 <bauzas> call it deprecated :p
14:23:09 <mriedem> and then just leave this not supporting bfv as always
14:23:16 <bauzas> I agree
14:23:39 <bauzas> I mean
14:23:40 <mriedem> it did make me wonder if we should put the image id in the request spec anyway, for later lookups if needed, since we have the information in the api when we create the instance,
14:23:49 <mriedem> but i don't have a good use case for that right now
14:24:09 <mriedem> it's just another one of those things where the request spec can't really be trusted later
14:24:38 <bauzas> if you want MHO,
14:24:41 <mriedem> we've talked in the cells meeting about relying on the request spec if a cell is down
14:24:58 <bauzas> the problem is just the fact that we agreed on having BFV instances not have an image
14:25:10 <bauzas> it breaks the model
14:25:24 <mriedem> you mean 'we agreed' being some guys 7 years ago?
14:25:40 <bauzas> I'm in the community
14:25:52 <gibi> but there are cases when a BFV instance does not have an image at all. I can create a volume in cinder and pass it to nova to boot from it
14:25:53 <bauzas> so, we == me, even if I wasn't providing bugs to Nova yet
14:26:05 <bauzas> gibi: I know
14:26:07 <mriedem> gibi: we pull the image from the volume metadata
14:26:18 <mriedem> we rely on that for things like rebuild,
14:26:19 <bauzas> gibi: I'm just saying we shot ourselves in the foot when we accepted that
14:26:23 <mriedem> where we now check to see if the image is changing
14:26:36 <mriedem> anyway, we can move on
14:26:42 <mriedem> i can talk with dansmith about the cellsv2 thing
14:26:49 <bauzas> mriedem: agreed, will review your series ASAP
14:26:59 <bauzas> mriedem: ping me if I haven't done that today
14:27:09 <mriedem> ok
14:27:20 <bauzas> for the RequestSpec thing, I'd love to discuss that too with dan
14:27:36 <bauzas> because I agree, we need to think hard about all the corner cases
14:27:50 <bauzas> the deletion thing is like a big problem
14:27:56 <mriedem> jaypipes: i thought you might like this trick https://review.openstack.org/#/c/543257/2/nova/tests/functional/regressions/test_bug_1746483.py@93
14:27:58 <mriedem> for the functional test
14:28:00 <bauzas> but maybe we also have moar problems
14:30:34 <jaypipes> mriedem: yeah, saw that...
14:31:25 <mriedem> what's next?
14:31:31 <bauzas> mriedem: I'm not getting the story for that contextmanager, but let's punt that offline
14:31:47 <mriedem> it's a private me + jay story
14:32:07 <bauzas> heh
14:32:07 <mriedem> discussed publicly in -nova once
14:32:13 <edleafe> Well, I didn't want to interrupt your fun
14:32:21 <edleafe> Let's move on
14:32:25 <edleafe> #topic Open discussion
14:32:37 <edleafe> Nothing on the agenda
14:32:53 <jroll> random question before I bother proposing a blueprint, would you placement folks be open to something like this? https://review.openstack.org/#/c/542992/
14:33:00 <edleafe> We have had several discussions regarding affinity/distance in placement lately
14:33:06 <jroll> (tl;dr a quick config to enable oslo's healthcheck middleware)
14:33:36 <jaypipes> jroll: for the API services, sure.
14:33:52 <edleafe> jroll: looks sane
14:34:04 <jroll> sweet, thank you
14:34:09 * jroll puts on rocky todo list
14:34:10 <cdent> jroll: is that the _new_ healthcheck middleware?
14:34:22 <jroll> cdent: it's the only one I know of?
14:34:25 <mriedem> is there a way to discover if that's available?
14:34:34 <mriedem> besides just trying  to hit it and fail?
14:34:43 <bauzas> jroll: wouldn't that be an API SIG discussion?
14:34:44 * alex_xu waves very late
14:34:52 <cdent> jroll: https://review.openstack.org/#/c/531456/
14:34:58 <jroll> bauzas: I ask about placement specifically, because it doesn't use paste
14:34:59 <bauzas> jroll: and I wonder why you can't just call the root
14:35:08 <bauzas> and see if 20x or else
14:35:17 <jroll> bauzas: our current healthcheck calls / and looks for a 401 :P
14:35:23 <arvindn05> would be good to know what type of calls the other services make to confirm they are healthy
14:35:30 <cdent> jroll: (note that I don't like that spec, I think it is internet over-engineering task force)
14:35:32 <bauzas> I mean, I used haproxy in the past a lot
14:35:39 <cdent> jroll: that's been fixed, / is no auth now
14:35:47 <jroll> ah neat
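A minimal sketch of the kind of probe being discussed, assuming the placement root now answers without auth (the endpoint URL is a placeholder):

import requests

PLACEMENT_ROOT = "http://placement.example.com/"  # placeholder endpoint

def placement_is_up(timeout=2):
    """Return True if the placement root answers with a 2xx."""
    try:
        resp = requests.get(PLACEMENT_ROOT, timeout=timeout)
    except requests.RequestException:
        return False
    # Any 2xx means the WSGI app answered the unauthenticated version
    # document; it says nothing about DB health (bauzas' point below).
    return 200 <= resp.status_code < 300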
14:35:57 <bauzas> and IIRC, you can healthcheck whatever you want without requiring a specific WSGI middleware
14:36:13 <mriedem> yeah this doesn't seem very useful
14:36:14 <bauzas> because if your WSGI app is stuck, the middleware will still be happy
14:36:23 <cdent> bauzas: the spec https://review.openstack.org/#/c/531456/ has a very different opinion, you might find it interesting
14:36:32 <mriedem> if i can hit / and get version doc it's functioning right?
14:36:43 <bauzas> what I'd like honestly is some kind of guidance from the SIG :)
14:37:04 <bauzas> because application monitoring is a laaaaaarge domain
14:37:24 <jroll> sure, part of why I like it is that it would be standard across all API services (for me)
14:37:44 <bauzas> personally, I'd trust more a 3rd-party monitoring system that would just notify my balancer to take a faulty node out of rotation
14:37:53 <mriedem> i see that in the spec, and looks like it uses custom tests per service for what healthy means
14:38:32 <mriedem> if this becomes a thing some projects do, i can smell the community wide goal for S
14:38:34 <bauzas> it creates a dependency on the monitoring system, but do people seriously run in production without having a reliable monitoring system?
14:38:42 <bauzas> mriedem: ++
14:38:46 <jroll> to be clear, what I have now in production is this middleware on every service that uses paste for config, and I wanted to add it to those that don't. I prefer to do such a thing upstream, so floating the idea
14:38:46 <bauzas> not a placement thing
14:38:59 <jaypipes> jroll: big +1 from me.
14:39:16 <bauzas> i.e. placement should follow whatever direction the whole community agrees on
14:39:35 <jaypipes> jroll: again, on the API services. the non-API services like nova-compute and nova-conductor don't have the same place to hook middleware up.
14:39:36 <cdent> jroll: the easy way to do it upstream would be to do some kind of config or env check in deploy.py to include the middleware in the stack
14:39:44 <bauzas> hence the API SIG thing or the TC goal, I don't really care about the formalism
14:39:50 <jroll> jaypipes: right
14:40:04 <jroll> cdent: agree: https://review.openstack.org/#/c/542992/3/ironic/api/app.py
14:40:13 <bauzas> yeah
14:40:14 <cdent> jroll: deploy.py is performing the same pattern as paste
14:40:27 <bauzas> I mean, that really depends on what you want to monitor, again
14:40:29 <cdent> yup, totes
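A sketch of the deploy.py approach being described, assuming oslo.middleware's Healthcheck can be wrapped directly around the WSGI app the way its paste filter factory does; the config option name and wiring are hypothetical.

from oslo_middleware import healthcheck

def deploy(conf, application):
    """Return the WSGI app, optionally fronted by a /healthcheck responder."""
    app = application
    if getattr(conf, "enable_healthcheck", False):  # hypothetical option
        # Requests to /healthcheck are answered by the middleware itself;
        # everything else falls through to the wrapped application.
        app = healthcheck.Healthcheck(app, {})
    return app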
14:40:49 <mriedem> has this been queued up for ptg discussion? first i've heard of this.
14:40:58 <cdent> bauzas: the api sig is unlikely to make any assertions about healthcheck stuff until the debate on the oslo spec resolves
14:40:58 <mriedem> so i assume i'm not the only one
14:40:59 <bauzas> me too
14:41:00 <jroll> no, this is an idea I had thursday
14:41:15 <bauzas> cdent: that's a reasonable outcome and I think I agree with you
14:41:20 <jroll> literally just asking "does anyone vehemently hate this"
14:41:21 <mriedem> well https://review.openstack.org/#/c/531456/ suggests it was discussed in sydney
14:41:24 <bauzas> cdent: probably more a cross-project thing
14:41:53 <bauzas> do we have some kind of operator-developer SIG about monitoring?
14:42:02 <bauzas> I know we have a logs WG
14:42:04 <bauzas> -ish
14:42:15 <jroll> anyway, I hear y'all, will go ahead and engage the rest of openstack
14:42:35 <mriedem> ptg discussion during ptg in the api sig is probably a good starting point
14:42:42 <mriedem> at the ptg,
14:42:44 <mriedem> while ptging
14:42:47 <mriedem> sorry
14:43:08 <mriedem> oh right the self healing sig
14:43:10 <mriedem> forgot about that one
14:43:15 <cdent> mriedem: feel free to add to https://etherpad.openstack.org/p/api-sig-ptg-rocky if you want
14:43:27 <mriedem> will do
14:44:35 <edleafe> OK, then, anything else on your collective minds?
14:45:25 * edleafe hears a collective yawn
14:45:33 <edleafe> Thanks everyone!
14:45:35 <edleafe> #endmeeting