15:00:07 #startmeeting heat
15:00:08 Meeting started Wed Jun 8 15:00:07 2016 UTC and is due to finish in 60 minutes. The chair is therve. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:10 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:13 #topic Roll call
15:00:13 The meeting name has been set to 'heat'
15:00:35 o/
15:00:39 o/
15:00:46 o/
15:00:50 howdy y'all
15:00:53 Hi
15:00:53 hi
15:00:53 hi
15:00:54 o/
15:00:56 o/
15:01:20 o/
15:01:44 o/
15:01:57 hi
15:02:23 OK!
15:02:43 #topic Adding items to agenda
15:02:54 #link https://wiki.openstack.org/wiki/Meetings/HeatAgenda#Agenda_.282016-06-08_1500_UTC.29
15:03:31 #topic Convergence status
15:03:48 OK! I added this item because we missed the n1 release
15:04:00 Got some small issues, like the tripleo skip not merged
15:04:39 I believe it's in now, so we should proceed
15:04:52 therve: I think we are ready, we should discuss if there is something stopping us from doing so
15:05:08 therve: yup it merged https://review.openstack.org/#/c/321090/
15:05:24 Hi!
15:05:29 o/
15:05:30 ananta, Oh the unittest one is even merged already?
15:05:33 o/
15:05:39 yup
15:05:46 Awesome
15:05:52 Let's do this then :)
15:05:59 SHIP. IT.
15:06:00 https://review.openstack.org/#/c/325798/
15:06:02 therve: I'm planning to get an experimental tripleo job working that enables it
15:06:14 three seconds later "Guys, the CI rack is literally on fire."
15:06:45 thanks to the patches from cwolferh, zaneb and others the heat memory utilization is looking quite a lot less bad now
15:06:55 Wednesday is as good a day as any to break everybody :)
15:06:58 so it'll be interesting to see how much it explodes now
15:07:00 shardy: nice, any numbers on that?
15:07:08 Please pay attention to CI breakage from other projects
15:07:13 I know stevebaker also has some db optimizations which I've been meaning to test
15:07:32 zaneb: I've got some locally but not yet plotted them - will try to do so later
15:07:44 shardy: awesome, thanks
15:07:54 #topic Performance improvements
15:08:08 Retro topic!
15:08:31 shardy, Do you think we'll see some other things taking memory now that the files thing is fixed?
15:09:23 therve: I'm sure there are others, but that was definitely a big part of the problem
15:09:44 I'll try to quantify it later then we can decide how much effort to invest in further memory profiling etc
15:10:00 Yeah I was hoping it would clear things enough that we could see other problems
15:11:03 We'll see how it goes
15:11:23 That makes me think of another topic
15:11:31 #topic Refactor db API
15:11:36 I think the next big issue is DB optimization, e.g bug #1578854
15:11:36 bug 1578854 in heat "resource-list makes SQL calls for every resource which is a nested stack" [High,In progress] https://launchpad.net/bugs/1578854 - Assigned to Steve Baker (steve-stevebaker)
15:11:54 zaneb, So the other day you mentioned having more transactions around db operations
15:11:55 we've got operators waiting well over 10 minutes for a single heat resource-list command :(
15:12:29 There is this new enginefacade API in oslo_db which may make things more obvious
15:12:34 shardy, Yeah I hope it will help
15:12:48 shardy: doesn't it time out at 10 minutes even on an undercloud?
15:13:11 real 13m23.083s
15:13:13 it appears not
15:13:22 I'm not sure if config was modified tho
15:13:32 maybe
15:13:47 shardy, It's with -n5 or something though, right?
15:14:04 therve: Yeah, and it was a big deployment (about 300 nodes)
15:14:34 Yeah
15:14:40 I've seen some of stevebaker's patches land already...
looks like they should make a big difference
15:14:57 But even some 100 sql queries shouldn't take 10 minutes :)
15:15:13 Especially when heat has basically one user
15:15:33 zaneb: cool, it'd be good to see if we can get them tested in some real large deployments - it's hard to really prove in dev environments
15:15:44 therve: I suspect there is some n^2-ness to how the current queries work ;)
15:16:33 zaneb, Yeah I know... I reintroduced one recently even...
15:16:43 Because of that required by property
15:17:09 shardy: just need to find some sucke^W testers to try it out
15:18:29 zaneb: heh, those with a spare few hundred nodes ideally ;)
15:18:53 slagle got some time on the OSIC cluster, I'll try to find out when that's happening
15:19:25 #topic Pending reviews
15:19:35 So hum, we have some reviews around
15:19:40 Please review!
15:19:59 https://review.openstack.org/135492 seems ready to me, it'd be cool to have that in
15:20:25 The conditions thing is almost there, so if you want to have a look before it's too late, please do
15:20:29 cc zaneb :)
15:20:54 yeah, need to re-review that then
15:21:19 And yeah the performance tweaks by stevebaker too
15:21:32 Though I'd like rally to be working before, but well :)
15:21:40 are those tweaks under the same topic?
15:21:50 or just look for baker patches
15:23:03 jdob, I think there are a bunch of them
15:23:24 I'd like to see stevebaker's https://review.openstack.org/#/c/280963/ stack failures list patch land
15:23:26 At least bug/1588561 and bug/1578854
15:23:34 we'll probably enable it by default in tripleoclient
15:23:38 kk
15:25:03 shardy, It seems to query all the resources?
15:25:37 therve: it recurses over the failed ones, yes
15:25:56 shardy, I mean it does a plain resources.list() and then filters on the result, instead of just querying the failed ones
15:26:07 probably scope for optimization, but the output it provides is good
15:26:09 therve: aha
15:26:48 therve: what about the external_id patch?
I've raised an issue, though there may not be an easy fix for it atm.
15:26:57 so probably can go in.
15:27:57 ramishra, I don't really understand your concern
15:28:17 You mean you can't reference a removed resource from the properties?
15:28:59 therve: if we leave the properties with a reference to another resource in the template (for an external resource)
15:29:15 then it's used for dependency calculation
15:29:32 so when that dependent resource is removed from the template it would fail
15:29:44 when you try to update
15:29:49 That seems correct to me?
15:29:59 I don't understand which behavior you're expecting
15:30:28 Completely ignore the properties of the external resource?
15:30:28 It would not fail, when the reference is there.
15:31:04 yes, ignore them for dep calculation, or don't do dep calculation for external resource
15:31:14 I don't really see why.
15:31:29 Maybe it's worth documenting it, but it seems reasonable as-is to me
15:31:58 what do the deps for an external resource mean?
15:32:10 without having looked at the patch, I don't think we would ever want to ignore anything in the dependency calculation
15:32:31 but this is perhaps a topic best discussed after the meeting :)
15:32:42 ramishra, Not much, but that's a user problem
15:32:56 Don't add dep to other resources if you have an external resource
15:33:03 Yeah
15:33:07 #topic Open discussions
15:33:51 I had one thing to make folks aware of
15:34:04 I've one more thing to discuss: why do we have policy enforcement for resource_types in heat default policy? Don't they mask the policies from the services?
15:34:36 i have a quick one: Kanagaraj and I have been working towards a PoC and spec changes for the hot-template stuff, so there should be something to show in the next few days
15:35:27 ramishra, They allow providers to give a better user experience
15:35:45 jdob, Awesome!
15:36:09 back in Kilo when we split nested stacks and called them over RPC, we didn't have a way of cancelling them if another failure occurred in the parent stack, with the result that the user has to wait for everything to time out after a failed operation before they could try again
15:36:24 we put in code to fix this on updates in Liberty
15:36:42 unfortunately that code is not working even in Mitaka :(
15:37:00 also, we don't have any code that even attempts to cancel creates
15:37:12 therve: hmmm.. I am not sure how that's a better experience, when there can be a possibility of conflicts in the heat policy vs service policy
15:37:34 and if the code was present/working it would still be doing stuff wrong
15:37:41 so that's what I'm working on at the moment :)
15:38:13 ramishra, http://specs.openstack.org/openstack/heat-specs/specs/liberty/conditional-resource-exposure-roles.html was approved
15:38:43 ramishra: eventually there's going to be a keystone API for this. in the meantime this is probably the best we can do
15:39:19 zaneb, That's cool. Do you think that will touch what we talked about with regards to transactions?
15:40:27 therve: hopefully we're going to interrupt threads at yield points instead of randomly, so that may solve/mask many of the problems
15:40:45 zaneb, Hummm. Don't we do that already?
15:40:45 therve: but I still think we should use transactions for everything
15:41:06 zaneb: my suggestion is to get rid of them, anyway user will get to know about service policy enforcements. Though I know there would not be many takers for this:)
15:41:36 therve: for stack-cancel-update yes. but when you initiate a delete it just kills the thread
15:42:14 therve: it will be a lot of work to do this everywhere though.
and even then there will be times when we have to kill a thread as a fallback
15:42:19 zaneb, Right, but killing an eventlet thread is just raising an exception at yield point
15:42:22 AFAIU at least
15:43:18 therve: by 'yield' I meant explicit 'yield' in a co-routine. so can't happen in the middle of a series of DB operations like eventlet switch can
15:43:34 Ah, okay
15:44:00 ramishra: that's what we used to have and it was terrible ;)
15:44:19 Well I'm very interested in that work at any rate :)
15:44:27 ramishra: we had to keep admin-only resource plugins in /contrib. this is the price of moving everything in-tree
15:45:13 therve: great, you can review my mega patch series when it's ready ;)
15:45:19 :D
15:45:24 OK, anything else?
15:45:59 er, that read wrong. it's a very long series, not a series of very long patches
15:46:45 feedback on change 303692 (resource properties data data model change) would be appreciated, especially last comment
15:46:48 I totally expected 28 patches each 13 lines long by reading that :)
15:47:46 cwolferh, Do you mean https://review.openstack.org/267953 ?
15:48:21 er, right, the other patch :-)
15:48:36 cwolferh, So yeah that test interacts with other tests somewhat on purpose
15:48:57 The purge is something operators are going to run while heat is running, so it needs to work while stuff is happening
15:49:22 therve: they're not going to run it with time=0 though, surely?
15:49:23 i'm not sure purge_deleted 0 should really be encouraged, but ok
15:49:42 zaneb, They're not, but I'm not sure it should make a difference?
15:50:01 well, the good thing at least, it didn't cause the other tests to fail
15:50:55 I'm not sure why it's suddenly a problem with the new table
15:51:09 If you calculate what to remove correctly it should work as usual
15:51:35 i'm not sure it wasn't stepping on other things and not being noticed before
15:52:15 I'm pretty sure it was stepping on other things and some issues were found :)
15:52:39 therve: if we e.g.
write the DELETE_COMPLETE event after marking the stack DELETE_COMPLETE then there could be a race there that only occurs with purge_deleted 0
15:53:34 and that's not necessarily an illegitimate way to do it, although it does sound fixable
15:54:12 although stack-level events should have properties attached anyway...
15:54:15 zaneb, events reference stacks though?
15:54:22 So we would see this issue on master
15:55:03 the failure was an event referencing a deleted properties data row iirc
15:55:10 right
15:55:24 also a resource
15:55:38 we can move this one back to #heat too I think
15:55:44 Sure
15:55:44 yep
15:55:50 #endmeeting
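
Note on the 15:43:18 point about explicit yield points: the idea that a task which only gives up control at an explicit 'yield' cannot be cancelled in the middle of a series of DB operations can be sketched with a plain Python generator. This is only an illustration under stated assumptions, not Heat's code: the names Cancelled, delete_resources and run are hypothetical, and the list of tuples stands in for database writes.

```python
class Cancelled(Exception):
    """Hypothetical signal raised inside a task at a yield point."""


def delete_resources(names, log):
    """Hypothetical task: each loop iteration is one group of DB writes."""
    for name in names:
        # Both writes run back to back. The task only gives up control at
        # the explicit yield below, so a cancel cannot land between them.
        log.append(("mark_deleting", name))
        log.append(("mark_deleted", name))
        yield name


def run(task, cancel_after):
    """Drive the task and cancel it after `cancel_after` completed steps."""
    done = []
    try:
        for count, name in enumerate(task, start=1):
            done.append(name)
            if count == cancel_after:
                # The exception is raised inside the generator at the point
                # where it is suspended, which is always the yield.
                task.throw(Cancelled())
    except Cancelled:
        pass
    return done


log = []
done = run(delete_resources(["a", "b", "c"], log), cancel_after=2)
print(done)
print(log)
```

Cancelling after the second step leaves every started group of writes complete, since the exception can only arrive while the task is parked at its yield. A thread killed at an arbitrary switch point offers no such guarantee, which is the contrast drawn in the discussion.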