15:00:07 <therve> #startmeeting heat
15:00:08 <openstack> Meeting started Wed Jun  8 15:00:07 2016 UTC and is due to finish in 60 minutes.  The chair is therve. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:10 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:13 <therve> #topic Roll call
15:00:13 <openstack> The meeting name has been set to 'heat'
15:00:35 <jdob> o/
15:00:39 <jaimguer> o/
15:00:46 <jasond> o/
15:00:50 <zaneb> howdy y'all
15:00:53 <ananta> Hi
15:00:53 <duvarenkov__> hi
15:00:53 <ramishra> hi
15:00:54 <rpothier> o/
15:00:56 <cwolferh> o/
15:01:20 <spzala> o/
15:01:44 <Drago> o/
15:01:57 <ochuprykov> hi
15:02:23 <therve> OK!
15:02:43 <therve> #topic Adding items to agenda
15:02:54 <therve> #link https://wiki.openstack.org/wiki/Meetings/HeatAgenda#Agenda_.282016-06-08_1500_UTC.29
15:03:31 <therve> #topic Convergence status
15:03:48 <therve> OK! I added this item because we missed the n1 release
15:04:00 <therve> Got some small issues, like the tripleo skip not merged
15:04:39 <therve> I believe it's in now, so we should proceed
15:04:52 <ananta> therve: I think we are ready, we should discuss if there is something stopping us from doing so
15:05:08 <shardy> therve: yup it merged https://review.openstack.org/#/c/321090/
15:05:24 <prazumovsky> Hi!
15:05:29 <ricolin> o/
15:05:30 <therve> ananta, Oh the unittest one is even merged already?
15:05:33 <jdandrea> o/
15:05:39 <ananta> yup
15:05:46 <therve> Awesome
15:05:52 <therve> Let's do this then :)
15:05:59 <zaneb> SHIP. IT.
15:06:00 <ananta> https://review.openstack.org/#/c/325798/
15:06:02 <shardy> therve: I'm planning to get an experimental tripleo job working that enables it
15:06:14 <jdob> three seconds later "Guys, the CI rack is literally on fire."
15:06:45 <shardy> thanks to the patches from cwolferh, zaneb and others the heat memory utilization is looking quite a lot less bad now
15:06:55 <therve> Wednesday is as good a day as any to break everybody :)
15:06:58 <shardy> so it'll be interesting to see how much it explodes now
15:07:00 <zaneb> shardy: nice, any numbers on that?
15:07:08 <therve> Please pay attention to CI breakage from other projects
15:07:13 <shardy> I know stevebaker also has some db optimizations which I've been meaning to test
15:07:32 <shardy> zaneb: I've got some locally but not yet plotted them - will try to do so later
15:07:44 <zaneb> shardy: awesome, thanks
15:07:54 <therve> #topic Performance improvements
15:08:08 <therve> Retro topic!
15:08:31 <therve> shardy, Do you think we'll see some other things taking memory now that the files thing is fixed?
15:09:23 <shardy> therve: I'm sure there are others, but that was definitely a big part of the problem
15:09:44 <shardy> I'll try to quantify it later then we can decide how much effort to invest in further memory profiling etc
15:10:00 <therve> Yeah I was hoping it would clear things enough that we could see other problems
15:11:03 <therve> We'll see how it goes
15:11:23 <therve> That makes me think of another topic
15:11:31 <therve> #topic Refactor db API
15:11:36 <shardy> I think the next big issue is DB optimization, e.g bug #1578854
15:11:36 <openstack> bug 1578854 in heat "resource-list makes SQL calls for every resource which is a nested stack" [High,In progress] https://launchpad.net/bugs/1578854 - Assigned to Steve Baker (steve-stevebaker)
15:11:54 <therve> zaneb, So the other day you mentioned having more transactions around db operations
15:11:55 <shardy> we've got operators waiting well over 10 minutes for a single heat resource-list command :(
15:12:29 <therve> There is this new enginefacade API in oslo_db which may make things more obvious
15:12:34 <therve> shardy, Yeah I hope it will help
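[Editor's note: a minimal sketch of the oslo_db enginefacade API therve mentions, assuming a transaction-aware request context; the model import and function names are illustrative rather than Heat's actual DB API:]

    from oslo_db.sqlalchemy import enginefacade

    # Illustrative model import; Heat's models live in heat.db.sqlalchemy.models.
    from heat.db.sqlalchemy import models


    @enginefacade.writer
    def stack_update(context, stack_id, values):
        # Everything in this function shares one session and one
        # transaction: commit on return, rollback on exception.
        stack = context.session.query(models.Stack).get(stack_id)
        stack.update(values)


    @enginefacade.reader
    def stack_get(context, stack_id):
        # Read-only transaction; can be routed to a reader engine.
        return context.session.query(models.Stack).get(stack_id)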
15:12:48 <zaneb> shardy: doesn't it time out at 10 minutes even on an undercloud?
15:13:11 <shardy> real 13m23.083s
15:13:13 <shardy> it appears not
15:13:22 <shardy> I'm not sure if config was modified tho
15:13:32 <zaneb> maybe
15:13:47 <therve> shardy, It's with -n5 or something though, right?
15:14:04 <shardy> therve: Yeah, and it was a big deployment (about 300 nodes)
15:14:34 <therve> Yeah
15:14:40 <zaneb> I've seen some of stevebaker's patches land already... looks like they should make a big difference
15:14:57 <therve> But even some 100 sql queries shouldn't take 10 minutes :)
15:15:13 <therve> Especially when heat has basically one user
15:15:33 <shardy> zaneb: cool, it'd be good to see if we can get them tested in some real large deployments - it's hard to really prove in dev environments
15:15:44 <zaneb> therve: I suspect there is some n^2-ness to how the current queries work ;)
15:16:33 <therve> zaneb, Yeah I know... I reintroduced one recently even...
15:16:43 <therve> Because of that required by property
15:17:09 <zaneb> shardy: just need to find some sucke^W testers to try it out
15:18:29 <shardy> zaneb: heh, those with a spare few hundred nodes ideally ;)
15:18:53 <shardy> slagle got some time on the OSIC cluster, I'll try to find out when that's happening
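[Editor's note: a sketch of the N+1 query shape behind bug #1578854; Resource and nested_stack_id are hypothetical names (Heat really links nested stacks through the stack table), not the actual schema:]

    # N+1 shape: one query for the top level, then one more query per
    # nested stack, which balloons on a ~300 node deployment.
    def list_resources_naive(session, stack_id):
        rows = session.query(Resource).filter_by(stack_id=stack_id).all()
        result = list(rows)
        for r in rows:
            if r.nested_stack_id:    # one extra round trip per nested stack
                result += list_resources_naive(session, r.nested_stack_id)
        return result

    # Batched alternative: one query per nesting *level* instead.
    def list_resources_batched(session, stack_ids):
        result = []
        while stack_ids:
            rows = (session.query(Resource)
                    .filter(Resource.stack_id.in_(stack_ids)).all())
            result += rows
            stack_ids = [r.nested_stack_id for r in rows if r.nested_stack_id]
        return result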
15:19:25 <therve> #topic Pending reviews
15:19:35 <therve> So hum, we have some reviews around
15:19:40 <therve> Please review!
15:19:59 <therve> https://review.openstack.org/135492 seems ready to me, it'd be cool to have that in
15:20:25 <therve> The conditions thing is almost there, so if you want to have a look before it's too late, please do
15:20:29 <therve> cc zaneb  :)
15:20:54 <zaneb> yeah, need to re-review that then
15:21:19 <therve> And yeah the performance tweaks by stevebaker too
15:21:32 <therve> Though I'd like rally to be working first, but well :)
15:21:40 <jdob> are those tweaks under the same topic?
15:21:50 <jdob> or just look for baker patches
15:23:03 <therve> jdob, I think there are a bunch of them
15:23:24 <shardy> I'd like to see stevebaker's https://review.openstack.org/#/c/280963/ stack failures list patch land
15:23:26 <therve> At least bug/1588561 and bug/1578854
15:23:34 <shardy> we'll probably enable it by default in tripleoclient
15:23:38 <jdob> kk
15:25:03 <therve> shardy, It seems to query all the resources?
15:25:37 <shardy> therve: it recurses over the failed ones, yes
15:25:56 <therve> shardy, I mean it does a plain resources.list() and then filters the result, instead of just querying the failed ones
15:26:07 <shardy> probably scope for optimization, but the output it provides is good
15:26:09 <shardy> therve: aha
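[Editor's note: illustrating therve's point with hypothetical SQLAlchemy names; filtering in the query keeps the database from shipping every row to the client:]

    # Fetch-then-filter: every resource row crosses the wire and most
    # are discarded client-side.
    failed = [r for r in session.query(Resource).all()
              if r.status == 'FAILED']

    # Query-side filter: the database returns only the failed resources.
    failed = (session.query(Resource)
              .filter(Resource.status == 'FAILED')
              .all())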
15:26:48 <ramishra> therve: what about the external_id patch? I've raised an issue, though there may not be an easy fix for it atm.
15:26:57 <ramishra> so it can probably go in.
15:27:57 <therve> ramishra, I don't really understand your concern
15:28:17 <therve> You mean you can't reference a removed resource from the properties?
15:28:59 <ramishra> therve: if we leave the properties with a reference to another resource in the template (for an external resource)
15:29:15 <ramishra> then it's used for dependency calculation
15:29:32 <ramishra> so when that referenced resource is removed from the template it would fail
15:29:44 <ramishra> when you try to update
15:29:49 <therve> That seems correct to me?
15:29:59 <therve> I don't understand which behavior you're expecting
15:30:28 <therve> Completely ignore the properties of the external resource?
15:30:28 <ramishra> It would not fail while the reference is still there.
15:31:04 <ramishra> yes, ignore them for dep calculation, or don't do dep calculation for external resources
15:31:14 <therve> I don't really see why.
15:31:29 <therve> Maybe it's worth documenting it, but it seems reasonable as-is to me
15:31:58 <ramishra> what do the deps for an external resource even mean?
15:32:10 <zaneb> without having looked at the patch, I don't think we would ever want to ignore anything in the dependency calculation
15:32:31 <zaneb> but this is perhaps a topic best discussed after the meeting :)
15:32:42 <therve> ramishra, Not much, but that's a user problem
15:32:56 <therve> Don't add dep to other resources if you have an external resource
15:33:03 <therve> Yeah
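[Editor's note: a sketch of the case ramishra describes, using the external_id syntax from the patch under review at the time (details may differ from what eventually merged), shown as a parsed-template dict:]

    # A pre-existing (external) port whose properties still reference a
    # template-managed network.
    template = {
        'heat_template_version': '2016-04-08',
        'resources': {
            'net': {'type': 'OS::Neutron::Net'},
            'port': {
                'type': 'OS::Neutron::Port',
                'external_id': 'a1b2c3',  # Heat never creates/deletes this
                'properties': {
                    'network': {'get_resource': 'net'},
                },
            },
        },
    }
    # The get_resource reference adds net -> port to the dependency
    # graph, so removing 'net' from the template fails the next update
    # even though Heat takes no action on 'port' itself.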
15:33:07 <therve> #topic Open discussions
15:33:51 <zaneb> I had one thing to make folks aware of
15:34:04 <ramishra> I've one more thing to discuss: why do we have policy enforcement for resource_types in heat's default policy? Doesn't it mask the policies of the services?
15:34:36 <jdob> i have a quick one: Kanagaraj and I have been working towards a PoC and spec changes for the hot-template stuff, so there should be something to show in the next few days
15:35:27 <therve> ramishra, They allow providers to give a better user experience
15:35:45 <therve> jdob, Awesome!
15:36:09 <zaneb> back in Kilo when we split nested stacks and called them over RPC, we didn't have a way of cancelling them if another failure occurred in the parent stack, with the result that the user has to wait for everything to time out after a failed operation before they can try again
15:36:24 <zaneb> we put in code to fix this on updates in Liberty
15:36:42 <zaneb> unfortunately that code is not working even in Mitaka :(
15:37:00 <zaneb> also, we don't have any code that even attempts to cancel creates
15:37:12 <ramishra> therve: hmmm.. I am not sure how that's a better experience, when there can be the possibility of conflicts between the heat policy and the service policy
15:37:34 <zaneb> and if the code was present/working it would still be doing stuff wrong
15:37:41 <zaneb> so that's what I'm working on at the moment :)
15:38:13 <therve> ramishra, http://specs.openstack.org/openstack/heat-specs/specs/liberty/conditional-resource-exposure-roles.html was approved
15:38:43 <zaneb> ramishra: eventually there's going to be a keystone API for this. in the meantime this is probably the best we can do
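[Editor's note: the entries under discussion look roughly like this in Heat's default policy.json, hiding admin-only resource types from regular users' resource-type-list (illustrative excerpt):]

    {
        "resource_types:OS::Nova::Flavor": "rule:project_admin",
        "resource_types:OS::Cinder::VolumeType": "rule:project_admin"
    }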
15:39:19 <therve> zaneb, That's cool. Do you think that will touch what we talked about with regards to transactions?
15:40:27 <zaneb> therve: hopefully we're going to interrupt threads at yield points instead of randomly, so that may solve/mask many of the problems
15:40:45 <therve> zaneb, Hummm. Don't we do that already?
15:40:45 <zaneb> therve: but I still think we should use transactions for everything
15:41:06 <ramishra> zaneb: my suggestion is to get rid of them; anyway the user will get to know about service policy enforcement. Though I know there would not be many takers for this :)
15:41:36 <zaneb> therve: for stack-cancel-update yes. but when you initiate a delete it just kills the thread
15:42:14 <zaneb> therve: it will be a lot of work to do this everywhere though. and even then there will be times when we have to kill a thread as a fallback
15:42:19 <therve> zaneb, Right, but killing an eventlet thread is just raising an exception at a yield point
15:42:22 <therve> AFAIU at least
15:43:18 <zaneb> therve: by 'yield' I meant explicit 'yield' in a co-routine. so can't happen in the middle of a series of DB operations like eventlet switch can
15:43:34 <therve> Ah, okay
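[Editor's note: the distinction zaneb is drawing, sketched with eventlet; the resource and DB helper names are hypothetical:]

    import eventlet

    def delete_stack():
        # eventlet may switch greenthreads on any I/O inside these calls,
        # so a kill() can land between the two writes.
        for res in resources:
            res.handle_delete()
            db_update_resource(res)

    gt = eventlet.spawn(delete_stack)
    gt.kill()   # raises GreenletExit at the thread's next switch point

    # A co-routine, by contrast, can only be stopped at its explicit
    # yields, never mid-sequence:
    def delete_task():
        for res in resources:
            res.handle_delete()
            db_update_resource(res)
            yield   # the task runner checks for cancellation only here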
15:44:00 <zaneb> ramishra: that's what we used to have and it was terrible ;)
15:44:19 <therve> Well I'm very interested in that work at any rate :)
15:44:27 <zaneb> ramishra: we had to keep admin-only resource plugins in /contrib. this is the price of moving everything in-tree
15:45:13 <zaneb> therve: great, you can review my mega patch series when it's ready ;)
15:45:19 <therve> :D
15:45:24 <therve> OK, anything else?
15:45:59 <zaneb> er, that read wrong. it's a very long series, not a series of very long patches
15:46:45 <cwolferh> feedback on change 303692 (resource properties data model change) would be appreciated, especially the last comment
15:46:48 <therve> I totally expected 28 patches each 13 lines long by reading that :)
15:47:46 <therve> cwolferh, Do you mean https://review.openstack.org/267953 ?
15:48:21 <cwolferh> er, right, the other patch :-)
15:48:36 <therve> cwolferh, So yeah that test interacts with other tests somewhat on purpose
15:48:57 <therve> The purge is something operators are going to run while heat is running, so it needs to work while stuff is happening
15:49:22 <zaneb> therve: they're not going to run it with time=0 though, surely?
15:49:23 <cwolferh> i'm not sure purge_deleted 0 should really be encouraged, but ok
15:49:42 <therve> zaneb, They're not, but I'm not sure it should make a difference?
15:50:01 <cwolferh> well, the good thing at least, it didn't cause the other tests to fail
15:50:55 <therve> I'm not sure why it's suddenly a problem with the new table
15:51:09 <therve> If you calculate what to remove correctly it should work as usual
15:51:35 <cwolferh> i'm not sure it wasn't stepping on other things and not being noticed before
15:52:15 <therve> I'm pretty sure it was stepping on other things and some issues were found :)
15:52:39 <zaneb> therve: if we e.g. write the DELETE_COMPLETE event after marking the stack DELETE_COMPLETE then there could be a race there that only occurs with purge_deleted 0
15:53:34 <zaneb> and that's not necessarily an illegitimate way to do it, although it does sound fixable
15:54:12 <zaneb> although stack-level events should have properties attached anyway...
15:54:15 <therve> zaneb, events reference stacks though?
15:54:22 <therve> So we would see this issue on master
15:55:03 <zaneb> the failure was an event referencing a deleted properties data row iirc
15:55:10 <cwolferh> right
15:55:24 <cwolferh> also a resource
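[Editor's note: a sketch of the race zaneb hypothesizes, with made-up helper names; heat-manage purge_deleted 0 purges rows for deleted stacks with no age cutoff:]

    # Engine, finishing a stack delete in two separate commits:
    mark_stack_delete_complete(session, stack)   # commit 1
    write_final_event(session, stack)            # commit 2: the event row
                                                 # references properties data

    # An operator running `heat-manage purge_deleted 0` between the two
    # commits removes the stack's rows, so the event written in commit 2
    # references a purged properties data row.

    # Doing both writes in one transaction closes the window:
    with session.begin():
        mark_stack_delete_complete(session, stack)
        write_final_event(session, stack)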
15:55:38 <zaneb> we can move this one back to #heat too I think
15:55:44 <therve> Sure
15:55:44 <cwolferh> yep
15:55:50 <therve> #endmeeting