14:00:11 #startmeeting nova_scheduler
14:00:12 Meeting started Mon Feb 12 14:00:11 2018 UTC and is due to finish in 60 minutes. The chair is edleafe. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:13 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:16 The meeting name has been set to 'nova_scheduler'
14:00:25 o=#
14:00:25 @/
14:00:30 o/
14:00:32 \o
14:00:32 leakypipes: Is that the Enterprise?
14:00:33 o/
14:00:42 efried: no, it's tendonitis in my elbow.
14:00:46 o/
14:00:48 ouch
14:00:51 I just thought you were holding a waffle
14:00:58 hahaha
14:01:09 jroll: wafflehands Jay.
14:01:11 leakypipes: too many beer curls?
14:01:16 edleafe: indeed.
14:01:24 heh
14:01:30 um, it's Monday, leakypipes
14:01:35 o/
14:01:37 oh..
14:01:40 Friday never ended for leakypipes
14:01:53 ...hence the sore elbow
14:01:57 * bauzas yawns
14:02:05 heh, no, Friday definitely ended on Friday.
14:02:23 days are subjective
14:02:31 #topic Specs and Reviews
14:02:40 you can call out everyday is Friday, unless someone proves you the contrary
14:02:45 Work for Rocky is starting to appear
14:03:02 Anyone have a particular spec/review to discuss?
14:03:19 nope. I need to finish writing one and start reviewing a bunch of em.
14:03:21 PTG is in 2 weeks, do we really need to discuss now about specs ?
14:03:43 bauzas: well, that's why I asked instead of going over everything
14:03:46 unless someone wants some opinion when writing a spec ?
14:03:52 https://review.openstack.org/#/c/541507/
14:03:54 bauzas: presumably we want to have our brains warmed up?
14:03:59 this way, unless there is something pressing, we can move ahead
14:04:10 cdent: it's snowing here, I don't need to warm up
14:04:21 as you wish
14:04:32 unfortunately i think some folks wont be able to make PTG...so preliminary reviews are appreciated :)
14:04:39 arvindn05: is that spec just for glance?
14:04:59 #link Support traits in Glance https://review.openstack.org/#/c/541507/
14:05:03 arvindn05: interesting idea, thanks for the link. will review.
14:05:13 do traits support a TEXT blob?
14:05:16 nope...its for nova to honor trait hints available in glance image meta data
14:05:22 b/c glance image metadata is a TEXT blob and not size limited in the glance API
14:05:39 traits in placement are limited to 255 characters iirc
14:05:41 https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api_models.py#L579
14:05:43 yup
14:05:48 so that is issue #1
14:05:49 minus CUSTOM_
14:06:07 Thanks
14:06:22 ...though I guess the image meta could have standard traits too, duh.
14:06:22 so glance "traits" != placement "traits"?
14:06:46 having not read the spec,
14:06:49 guess I should read the spec first
14:06:52 jinx
14:06:56 i'm not sure how the glance image meta key/value turns into a single string tag
14:07:18 tl;dr glance would have properties like trait:HW_CPU_X86_AVX2=required
14:07:21 >so glance "traits" != placement "traits"?
14:07:22 yes...unfortunate naming from my side i guess
14:07:26 mriedem: looks like it's pretty much identical to how we're doing the flavor metadata key/value conversion for traits.
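
A minimal sketch of the trait:<NAME>=required naming convention being discussed, assuming a plain dict of image properties; the property names below are made up for illustration and this is not code from the spec:

    def required_traits_from_image_props(properties):
        # Glance property keys are capped at 255 characters, which also bounds
        # the trait names (including any CUSTOM_ prefix) expressible this way.
        return {key[len('trait:'):]
                for key, value in properties.items()
                if key.startswith('trait:') and value == 'required'}

    example_properties = {
        'trait:HW_CPU_X86_AVX2': 'required',  # standard os-traits name
        'trait:CUSTOM_MAGIC': 'required',     # hypothetical custom trait
        'hw_disk_bus': 'scsi',                # unrelated image property
    }
    print(required_traits_from_image_props(example_properties))
    # -> {'HW_CPU_X86_AVX2', 'CUSTOM_MAGIC'} (set order may vary)
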
14:07:50 yes @jaypipes
14:08:12 the glance image meta data also has the 255 character limit so fits in well there
14:08:26 arvindn05: not in glance v2
14:08:30 which is the only thing that nova supports
14:08:35 arvindn05: see https://review.openstack.org/#/c/526900/
14:09:18 https://github.com/openstack/glance/blob/master/glance/db/sqlalchemy/models.py#L159
14:09:22 arvindn05: well, note that the length for these things' value is always going to be 8 (the length of the string "required"). it's the *key* that changes..
14:09:24 I think we agreed on having traits being something understandable by operators
14:09:28 mriedem: ^^
14:09:47 having traits being more than 255 letters looks to me not fine for operators
14:10:15 ahh...ok. Will check the limits in glance
14:10:36 https://developer.openstack.org/api-ref/image/v2/index.html#create-an-image
14:11:02 mriedem, bauzas: keys are still limited in glance to 255. https://github.com/openstack/glance/blob/master/glance/db/sqlalchemy/models.py#L158
14:11:07 >List of tags for this image. Each tag is a string of at most 255 chars.
14:11:07 i dont see the above referencing key length...will look at the code
14:11:12 jaypipes: then cool
14:11:18 i didn't realize it was the image meta key
14:11:33 https://review.openstack.org/#/c/541507/4/specs/rocky/approved/glance-image-traits.rst@35
14:11:43 ^ not sure why image meta with host aggregates doesn't solve that problem already
14:11:52 Depending how close we're getting to how it's expressed in flavors, there'll be that trait: prefix too.
14:12:17 * efried also hasn't read the spec yet
14:12:41 mriedem: well, it's also possible, for sure
14:12:50 We should probably read the spec and continue the discussion there
14:13:01 The resource class affinity spec should be a pretty quick read. I'm starting to warm up to the idea presented in the first section of Alternatives. It's a lot more powerful, even though it's less user-friendly. Anyway, I'd love some feedback on that (and the idea in general). Wouldn't mind having one direction picked out before the PTG. https://review.openstack.org/543062
14:13:11 yup...that would be great
14:13:11 what I just want to make sure is that if Glance provides a traits API, those are exactly like the same than in Nova
14:13:12 mriedem: agree with that, but I think there's plenty of use cases for this. easy example: my hypervisor image requires VT bit :)
14:13:25 or we could have problems like Cinder vs. Nova AZs :p
14:13:46 #link Resource Class Affinity Spec https://review.openstack.org/543062
14:14:57 Anything else for specs/reviews?
14:15:00 bauzas: we are overloading the metadata field in glance for expressing traits. The data model matches almost exactly https://github.com/openstack/glance/blob/master/glance/db/sqlalchemy/models.py#L158
14:15:18 bauzas: ack. though this isn't a proposed "API" per-se, but rather just a convention of naming metadata keys in the same fashion as Nova's flavor metadata embeds the whole trait:XXX encoding in the flavor metadata key
14:16:07 really, it's just "standardizing" image metadata keys to be (more like) os-traits standard traits. which I think is a very good idea.
14:16:11 ...and having the scheduler pay attention to that and feed those traits into the allocation candidates request
14:16:19 yup, efried
14:16:39 * edleafe is back after a network hiccup
14:16:40 ...and attach those traits to which parts of the flavor request?
14:16:53 Perhaps this is answered in the spec, which I still haven't read since four minutes ago.
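
A rough sketch of the end state described above, assuming image-required traits simply join the flavor-required traits in the plain required= parameter of GET /allocation_candidates (efried's question about which part of the flavor request they attach to is still open in the spec). URL building only; trait names are illustrative, and a real client also needs auth and a placement microversion that understands required=:

    def allocation_candidates_query(resources, flavor_traits=(), image_traits=()):
        # Merge flavor- and image-required traits into one "required=" value.
        parts = ['resources=' + ','.join(
            '%s:%d' % (rc, amount) for rc, amount in sorted(resources.items()))]
        required = sorted(set(flavor_traits) | set(image_traits))
        if required:
            parts.append('required=' + ','.join(required))
        return '/allocation_candidates?' + '&'.join(parts)

    print(allocation_candidates_query(
        {'VCPU': 1, 'MEMORY_MB': 2048, 'DISK_GB': 20},
        flavor_traits={'HW_CPU_X86_AVX2'},
        image_traits={'CUSTOM_MAGIC'}))  # hypothetical custom trait
    # -> /allocation_candidates?resources=DISK_GB:20,MEMORY_MB:2048,VCPU:1&required=CUSTOM_MAGIC,HW_CPU_X86_AVX2
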
14:17:03 :)
14:17:11 But yeah, this sounds like it has promise.
14:17:18 efried: the integration with flavors is a question left on the spec, which you'll see when you read it ;)
14:17:33 So what did I miss - still discussing the glance traits spec?
14:17:47 edleafe: ya, but I think we should move on.
14:17:57 We just can't seem to let it go. We're all so intrigued.
14:18:12 yea...we have to deal with conflicts between image and flavor traits...it's briefly addressed in the spec so will add more details based on comments
14:18:24 #action everybody read the spec and we'll discuss it next week?
14:18:34 efried: or comment on the spec??
14:18:38 that too :)
14:18:42 thanks for the discussions...we can move on i think...i look forward to the comments :)
14:18:45 ...discuss if needed
14:18:52 #topic Bugs
14:18:53 #link Placement bugs https://bugs.launchpad.net/nova/+bugs?field.tag=placement
14:19:00 No new bugs this week
14:19:16 i've got a bug
14:19:18 that's not a placement bug
14:19:30 https://review.openstack.org/#/q/topic:bug/1746483+(status:open+OR+status:merged)
14:19:42 looking for any historical background on IsolatedHostsFilter if anyone knows,
14:19:46 wrt boot from volume,
14:20:07 ah, BFV again
14:20:07 that filter never worked properly with boot from volume requests, unless it did by just saying, 'i don't care about those types of requests'
14:20:22 the problem is the requestspec.image.id isn't set for a bfv request,
14:20:30 so the filter doesn't get the image id to tell if it's isolated or not
14:20:43 mriedem: I say fix it most real, and if people discover they were getting accidental results, now their life is better
14:20:49 if the purpose of the filter is to only keep ephemeral images pinned to isolated hosts, then i guess it's fine
14:20:53 mriedem: I saw your email, sorry missed to reply
14:21:11 Wait, we allow scheduler topics in this meeting that *don't* involve placement??
14:21:21 efried: i know it's crazy right
14:21:25 * cdent cookies efried
14:21:33 and we still get non-placement scheduler bugs
14:21:44 efried: there is a world beyond placement
14:21:50 lies
14:21:51 given this has been broken since mitaka, it's obvious not many people use this filter
14:21:55 Well, I *suppose* we can allow those questions
14:22:09 mriedem: you mean not many people use this filter AND use BFV.
14:22:16 mriedem: AFAIK, operators prefer to use filters with host aggregates and image mapping
14:22:25 jaypipes: i assume more people use bfv than this filter
14:22:38 bauzas: yes this filter is redundant with image meta + aggregates
14:22:55 voilà, problem solved
14:22:59 so my plan is to fix the obvious regression, backport that,
14:23:00 call it deprecated :p
14:23:09 and then just leave this not supporting bfv as always
14:23:16 I agree
14:23:39 I mean
14:23:40 it did make me wonder if we should put the image id in the request spec anyway, for later lookups if needed, since we have the information in the api when we create the instance,
14:23:49 but i don't have a good use case for that right now
14:24:09 it's just another one of those things where the request spec can't really be trusted later
14:24:38 if you want MHO,
14:24:41 we've talked in the cells meeting about relying on the request spec if a cell is down
14:24:58 that's just the fact we agreed on having BFV instances not having an image that's the problem
14:25:10 it breaks the model
14:25:24 you mean 'we agreed' being some guys 7 years ago?
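
A heavily simplified illustration of the IsolatedHostsFilter problem mriedem describes above; this is not the actual nova filter code (see the linked bug topic for the real fix), and the parameter names only echo the filter's config options. The point is the guard: a boot-from-volume request spec carries no image id, so the check has to cope with image_id being None rather than treating the request as isolated or blowing up:

    def host_passes(host_name, image_id, isolated_hosts, isolated_images,
                    restrict_isolated_hosts_to_isolated_images=True):
        host_isolated = host_name in isolated_hosts
        # image_id is None for a BFV request; treat it as "not isolated".
        image_isolated = image_id is not None and image_id in isolated_images
        if restrict_isolated_hosts_to_isolated_images:
            # isolated images only on isolated hosts, and vice versa
            return host_isolated == image_isolated
        # otherwise only keep isolated images off non-isolated hosts
        return host_isolated or not image_isolated
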
14:25:40 I'm in the community
14:25:52 but there are cases when a BFV instance does not have an image at all. I can create volume in cinder and pass it to nova to boot from it
14:25:53 so, we == me, even if I wasn't yet providing bugs to Nova yet
14:26:05 gibi: I know
14:26:07 gibi: we pull the image from the volume metadata
14:26:18 we rely on that for things like rebuild,
14:26:19 gibi: I'm just saying we shot ourselves in the feet when we accepted that
14:26:23 where we now check to see if the image is changing
14:26:36 anyway, we can move on
14:26:42 i can talk with dansmith about the cellsv2 thing
14:26:49 mriedem: agreed, will review your series ASAP
14:26:59 mriedem: ping me if I haven't done that today
14:27:09 ok
14:27:20 for the RequestSpec thing, I'd love to discuss that too with dan
14:27:36 because I agree, we need to think hard about all the corner cases
14:27:50 the deletion thing is like a big problem
14:27:56 jaypipes: i thought you might like this trick https://review.openstack.org/#/c/543257/2/nova/tests/functional/regressions/test_bug_1746483.py@93
14:27:58 for the functional test
14:28:00 but maybe we also have moar problems
14:30:34 mriedem: yeah, saw that...
14:31:25 what's next/
14:31:26 ?
14:31:31 mriedem: I'm not getting the story for that contextmanager, but let's punt that offline
14:31:47 it's a private me + jay story
14:32:07 heh
14:32:07 discussed publicly in -nova once
14:32:13 Well, I didn't want to interrupt your fun
14:32:21 Let's move on
14:32:25 #topic Open discussion
14:32:37 Nothing on the agenda
14:32:53 random question before I bother proposing a blueprint, would you placement folks be open to something like this? https://review.openstack.org/#/c/542992/
14:33:00 We have had several discussions regarding affinity/distance in placement lately
14:33:06 (tl;dr a quick config to enable oslo's healthcheck middleware)
14:33:36 jroll: for the API services, sure.
14:33:52 jroll: looks sane
14:34:04 sweet, thank you
14:34:09 * jroll puts on rocky todo list
14:34:10 jroll: is that the _new_ healthcheck middleware?
14:34:22 cdent: it's the only one I know of?
14:34:25 is there a way to discover if that's available?
14:34:34 besides just trying to hit it and fail?
14:34:43 jroll: wouldn't that be an API SIG discussion ?
14:34:44 * alex_xu waves very late
14:34:52 jroll: https://review.openstack.org/#/c/531456/
14:34:58 bauzas: I ask about placement specifically, because it doesn't use paste
14:34:59 jroll: and I wonder why you can't just call the root
14:35:08 and see if 20x or else
14:35:17 bauzas: our current healthcheck calls / and looks for a 401 :P
14:35:23 would be good to know what type of calls the other services make to confirm they are healthy
14:35:30 jroll: (note that I don't like that spec, I think it is internet over-engineering task force)
14:35:32 I mean, I used haproxy in the past a lot
14:35:39 jroll: that's been fixed, / is no auth now
14:35:47 ah neat
14:35:57 and IIRC, you can healthcheck whatever you want without requiring a specific WSGI middleware
14:36:13 yeah this doesn't seem very useful
14:36:14 because if your WSGI app is stuck, the middleware will still be happy
14:36:23 bauzas: the spec https://review.openstack.org/#/c/531456/ has a very different opinion, you might find it interesting
14:36:32 if i can hit / and get version doc it's functioning right?
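
For services that don't use paste (placement builds its WSGI stack in deploy.py), the wiring jroll is floating might look roughly like the sketch below. The helper name and enable flag are invented for illustration; the middleware and its default /healthcheck path come from oslo.middleware:

    from oslo_middleware import healthcheck

    def maybe_add_healthcheck(app, enabled=True):
        # When enabled, requests to /healthcheck are answered by the
        # middleware; everything else passes through to the wrapped app.
        if not enabled:
            return app
        return healthcheck.Healthcheck(app, {'path': '/healthcheck'})

As mriedem notes, hitting GET / on placement and checking for the version document already works without any extra middleware, so the value of the middleware is mostly consistency across API services.
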
14:36:43 what I'd like honestly is some kind of guidance from the SIG :)
14:37:04 because application monitoring is a laaaaaarge domain
14:37:24 sure, part of why I like it is that it would be standard across all API services (for me)
14:37:44 personally, I'd trust more a 3rd party monitoring system that would just notify my balancer to put oor a faulty node
14:37:53 i see that in the spec, and looks like it uses custom tests per service for what healthy means
14:38:32 if this becomes a thing some projects do, i can smell the community wide goal for S
14:38:34 it creates a dependency on the monitoring system, but do people seriously run in production without having a reliable monitoring system ?
14:38:42 mriedem: ++
14:38:46 to be clear, what I have now in production is this middleware on every service that uses paste for config, and I wanted to add it to those that don't. I prefer to do such a thing upstream, so floating the idea
14:38:46 not a placement thing
14:38:59 jroll: big +1 from me.
14:39:16 ie. placement should ack to the direction that the whole community would agree
14:39:35 jroll: again, on the API services. the non-API services like nova-compute and nova-conductor don't have the same place to hook middleware up.
14:39:36 jroll: the easy way to do it upstream would be to, in deploy.py do some kind of config or env check to include the middleware in the stack
14:39:44 hence the API SIG thing or the TC goal, I don't really care about the formalism
14:39:50 jaypipes: right
14:40:04 cdent: agree: https://review.openstack.org/#/c/542992/3/ironic/api/app.py
14:40:13 yeah
14:40:14 jroll: deploy.py is performing the same pattern as paste
14:40:27 I mean, that really depends on what you want to monitor, again
14:40:29 yup, totes
14:40:49 has this been queued up for ptg discussion? first i've heard of this.
14:40:58 bauzas: the api sig is unlikely to make any assertions about healthcheck stuff until the debate on the oslo spec resolves
14:40:58 so i assume i'm not the only one
14:40:59 me too
14:41:00 no, this is an idea I had thursday
14:41:15 cdent: that's a reasonable outcome and I think I agree with you
14:41:20 literally just asking "does anyone vehemently hate this"
14:41:21 well https://review.openstack.org/#/c/531456/ suggests it was discussed in sydney
14:41:24 cdent: probably more a cross-project thing
14:41:53 do we have kind of operator-developer SIG about monitoring ?
14:42:02 I know we have a logs WG
14:42:04 -ish
14:42:15 anyway, I hear y'all, will go ahead and engage the rest of openstack
14:42:35 ptg discussion during ptg in the api sig is probably a good starting point
14:42:42 at the ptg,
14:42:44 while ptging
14:42:47 sorry
14:43:08 oh right the self healing sig
14:43:10 forgot about that one
14:43:15 mriedem: feel free to add to https://etherpad.openstack.org/p/api-sig-ptg-rocky if you want
14:43:27 will do
14:44:35 OK, then, anything else on your collective minds?
14:45:25 * edleafe hears a collective yawn
14:45:33 Thanks everyone!
14:45:35 #endmeeting