08:01:42 #startmeeting Heat
08:01:43 Meeting started Wed Apr 6 08:01:42 2016 UTC and is due to finish in 60 minutes. The chair is therve. Information about MeetBot at http://wiki.debian.org/MeetBot.
08:01:44 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
08:01:47 The meeting name has been set to 'heat'
08:02:10 #topic Rollcall
08:02:17 o/
08:02:57 skraynev?
08:03:13 Who's supposed to be here
08:04:02 stevebaker, Maybe?
08:04:19 o/
08:04:23 o/
08:04:32 o/
08:04:42 therve: I am here
08:04:45 :)
08:04:59 Alright, that's a start :)
08:05:04 ramishra?
08:05:21 #topic Adding items to agenda
08:05:37 #link https://wiki.openstack.org/wiki/Meetings/HeatAgenda#Agenda_.282016-04-06_0800_UTC.29
08:06:08 #topic Add new tag for stable branches
08:06:14 skraynev, So what's up
08:07:39 therve: so...
08:08:06 I thought about adding new tags for stable branches like liberty and kilo
08:08:25 looks like we have enough new patches in these branches
08:08:37 and the last tags were applied a long time ago
08:09:20 I discussed it with ttx yesterday and he said that having a tag every 2 months sounds ok
08:09:24 OK I have no idea what those do and how to do it :)
08:09:38 however I also need to ask the same question in the openstack-release channel
08:09:49 is this the new model for stable releases, e.g. per-project tagging instead of a coordinated periodic stable release?
08:09:59 Something like 5.0.2 on kilo?
08:10:05 therve: I suppose that it's just a patch to the releases repo...
08:10:10 therve: yeah
08:10:55 That seems reasonable
08:11:00 shardy: I am not sure.. maybe we don't have a coordinated periodic stable release now...
08:11:10 Let's see after the meeting if we can do it?
08:11:42 shardy: it's another good reason to ask this question on openstack-stable
08:11:50 therve: sure
08:11:58 OK moving on
08:12:09 #topic AWS resources future
08:12:16 I don't know who put that in :)
08:13:33 skraynev, You presumably?
08:13:46 therve: yeah. :)
08:14:23 What's your question?
08:14:34 so I remember that we planned not to remove AWS resources
08:15:08 but I was surprised that the nova team deprecated AWS support and totally removed it...
08:15:18 skraynev: well, it moved into a different project
08:15:19 and moved it to another repo.
08:15:29 shardy: right
08:15:41 I personally don't feel like it's a huge issue for us, the maintenance overhead seems pretty small
08:15:41 so my question was: do we need to do the same?
08:15:51 and we still have some coupling between resources I believe
08:15:51 We don't need to, no
08:16:17 I feel like, one day, we might want to move the cfn compatible stuff (APIs and resources), but not now, or even anytime soon
08:16:28 The cfn API is still fairly useful by default until we have a nice signature story
08:16:30 shardy: as I remember, we wanted to decouple them
08:16:48 therve: Yeah, by default SoftwareDeployment still uses it
08:17:06 I started a discussion about changing that a few months ago, but then realized that certain config groups won't work without it
08:17:13 the os-apply-config one in particular
08:17:26 I'm working to remove that from TripleO, but it may be in use elsewhere also
08:17:26 shardy: aha. I see :)
08:17:52 shardy, What doesn't work?
08:18:14 therve: signalling back to heat, because the os-refresh-config script expects a pre-signed URL
08:18:20 the swift signal approach may work
08:18:23 Ah I see
08:18:28 but the heat native one won't atm
08:18:32 Not a bug in Heat per se
08:18:49 I could probably fix the hook script, but have focussed instead on trying not to use it
08:19:03 therve: No, just a known use-case, and reason for not ripping out the CFN stuff just yet ;)
08:19:13 Right
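
The default CFN-based signalling discussed above is selectable per template. A minimal sketch of switching a deployment to the heat-native transport (all names below are illustrative); it assumes a config group such as "script" whose hook supports native signalling, since, per the point above, the os-apply-config group relies on the pre-signed cfn URL:

    heat_template_version: 2015-10-15
    parameters:
      image: {type: string}
      flavor: {type: string}
    resources:
      server:
        type: OS::Nova::Server
        properties:
          image: {get_param: image}
          flavor: {get_param: flavor}
          user_data_format: SOFTWARE_CONFIG
          # poll heat's native API for deployment metadata instead of the cfn API
          software_config_transport: POLL_SERVER_HEAT
      config:
        type: OS::Heat::SoftwareConfig
        properties:
          group: script
          config: |
            #!/bin/sh
            echo "configured"
      deployment:
        type: OS::Heat::SoftwareDeployment
        properties:
          config: {get_resource: config}
          server: {get_resource: server}
          # signal completion via the heat API rather than a pre-signed cfn URL
          signal_transport: HEAT_SIGNAL

POLL_TEMP_URL together with TEMP_URL_SIGNAL would be the swift-based combination mentioned above as likely to work.
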
08:19:45 skraynev, So yeah, we'll keep it unless we have a good reason, like legal crap or a real advantage to rip it out
08:20:01 mostly, templates in our product team are using AWS resources :(
08:20:07 I always imagined we'd end up with CFN resources as provider resources (nested stack templates)
08:20:21 therve: ok :) I just wanted to clarify it in light of changes in nova ;)
08:20:30 but our stack abstraction has ended up so heavyweight I think the performance penalty would be too much atm
08:21:13 shardy, We can still do that. And then optimize :)
08:21:36 #topic Summit sessions
08:21:42 tiantian: Thanks, that's good feedback - I suspected they are in quite wide use but it's hard to know for sure :)
08:21:45 #link https://etherpad.openstack.org/p/newton-heat-sessions
08:22:01 +1000 on performance improvements
08:22:15 Yeah I believe that's one focus
08:22:24 Heat has become, by far, the worst memory hog on the TripleO undercloud recently
08:22:38 Still, there are probably other topics :). I sent an email to the list, but didn't get any feedback yet
08:22:41 we fixed it a while back, but it's crept back up again :(
08:22:59 shardy, We can split that topic into several sessions, 40 minutes is probably not enough
08:23:47 improve +1
08:24:43 We have 12 slots. If we have relic topics from Tokyo we can revive them too
08:24:53 #link https://etherpad.openstack.org/p/mitaka-heat-sessions
08:26:27 *cricket sound*
08:26:40 therve: do you plan to prepare a list of pain points for performance?
08:26:48 I don't feel like we've made much progress on the composition improvements or breaking the stack barrier
08:26:57 those may be worth revisiting
08:27:06 therve: honestly I did not research this area deeply :)
08:27:16 skraynev, I started to
08:27:32 skraynev, I'm interested in what Rally can do for us, if you know
08:27:41 One thing I've also been wondering about is do we need a heat option which optimizes for all-in-one environments
08:28:10 we've been optimizing more and more for breaking work out over RPC, but this makes performance increasingly bad for typical all-in-one deployments
08:28:10 therve: awesome, it will be really useful to discuss the described issues.
08:28:43 shardy, Yeah I think there is a tradeoff to be made
08:28:54 therve: what's the estimated date for proposing topics for sessions?
08:29:17 Final deadline next week
08:29:27 We still have next meeting
08:29:32 therve: i.e. when do you plan to start migrating it from the etherpad to the schedule?
08:29:40 therve: aha
08:29:45 got it
08:31:04 I'd rather have it done before, obviously :)
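
On the Rally question above, a minimal sketch of what a task file could look like, assuming the stock HeatStacks scenarios Rally shipped at the time; the template path and the run counts are placeholders:

    ---
      HeatStacks.create_and_delete_stack:
        -
          args:
            # any HOT template you want to measure; path is a placeholder
            template_path: "templates/example.yaml"
          runner:
            type: "constant"
            times: 20
            concurrency: 2
          context:
            users:
              tenants: 1
              users_per_tenant: 1

Run with something like "rally task start <task file>"; comparing the resulting timings with convergence enabled and disabled is roughly the kind of comparison skraynev mentions below.
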
08:31:38 shardy, I'm also a bit concerned with convergence, which performs even worse than the regular workflow
08:31:56 therve: Yeah, that's kinda the root of my question
08:32:15 e.g. do we need an abstraction which defeats some of that overhead for e.g. developer environments
08:32:40 My changes to RPC nested stacks were the first step, and convergence goes a step further
08:33:18 We don't do the expensive polling over RPC though
08:33:36 shardy, For dev, you can try the fake messaging driver
08:33:58 therve: Yeah, this is the discussion I'd like to have (not necessarily at summit, just generally)
08:34:12 is there a way to offer at least docs on how to make aio environments work better
08:34:19 or at least as well as they once did ;)
08:34:30 or do we need to abstract things internally to enable that
08:35:07 Well, I'd like all envs to perform well :)
08:35:23 shardy: I have some graphs comparing convergence vs legacy (based on the rally scenarios Angus wrote)
08:35:26 I don't think there is any reason for things to perform badly, except bad implementation
08:35:29 therve: Sure, me too - I'm just pointing out we're regressing badly for some sorts of deployments
08:35:37 however I measured it on liberty code :(
08:36:32 shardy: btw... what about the plans for enabling a TripleO job against convergence?
08:36:43 I remember that we discussed it earlier
08:37:04 skraynev: This is related to my question - we're hitting heat RPC timeouts without convergence due to the TripleO undercloud being an all-in-one environment
08:37:12 I'm pretty sure convergence will make that worse
08:37:35 we can do some tests, but I don't think we can enable convergence for tripleo until we resolve some of the performance issues
08:37:55 e.g. we increased the worker count because of hitting RPC timeouts for nested stacks
08:38:03 which made us run out of memory
08:38:15 convergence will make the situation worse AFAICT
08:38:43 Obviously it's a similar experience for developers and other aio users
08:38:44 Most likely
08:40:03 shardy: hm.. interesting. do you have a guess why it happens? I mean, where exactly is the performance poor?
08:40:38 Presumably we're doing too many things in parallel? Maybe we should try to limit that
08:40:39 skraynev: because we create >70 nested stacks with 4 workers, which causes Heat to DoS itself and use >2GB of memory
08:41:03 Doing it a bit more sequentially could work much better
08:41:27 therve: Yeah, we've already done stuff like unrolling nesting to improve things
08:41:44 effectively compromising on our use of the heat template model due to implementation limitations
08:42:02 shardy: could you try to use https://review.openstack.org/#/c/301323/ ?
08:42:06 but we don't want to do too much sequentially, because we've got constraints in terms of wall-time for our CI
08:42:17 or https://review.openstack.org/#/c/244117/
08:42:33 sounds like it may help you
08:43:24 The batch work may be an interesting thing to try indeed
08:43:28 Anyway
08:43:33 #topic Open discussion
08:43:36 skraynev: thanks, looks interesting, although zaneb appears to be opposing it
08:44:02 shardy: IMO it will be a really good test case for these patches
08:44:19 skraynev: yup, I'll pull them and give it a try, thanks!
08:44:39 :) good.
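
For the trade-off described above (raising the worker count to avoid nested-stack RPC timeouts versus the memory those workers consume), the options most directly involved are roughly the following; the values are illustrative only, not recommendations from the meeting:

    # /etc/heat/heat.conf
    [DEFAULT]
    # more heat-engine workers ease nested-stack RPC contention but cost memory;
    # fewer workers save memory but make RPC timeouts more likely
    num_engine_workers = 4
    # oslo.messaging RPC timeout, the one reportedly being hit for nested stacks
    rpc_response_timeout = 600
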
08:46:04 shardy, A good thing would be to find a reproducer without tripleo
08:46:11 Ideally without servers at all
08:46:33 I wonder if just nested stacks of TestResource would trigger the issues
08:47:33 therve: I'm afraid that we may have issues reproducing it on devstack; in that case it will be really bad for normal development/fixing.
08:47:37 therve: Yeah I did raise one against convergence with a reproducer that may emulate what we're seeing
08:47:41 let me find it
08:48:08 I'm pretty sure it can be reproduced, I'll raise a bug
08:48:39 Cool
08:48:44 Anything else?
08:50:16 Alright thanks all
08:50:18 #endmeeting
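
The server-less reproducer suggested above (nested stacks of TestResource) might look roughly like the following pair of templates; the counts, file names and wait time are illustrative, and it assumes OS::Heat::TestResource is available in the target environment:

    # inner.yaml: one nested stack holding a handful of no-op test resources
    heat_template_version: 2015-10-15
    resources:
      group:
        type: OS::Heat::ResourceGroup
        properties:
          count: 5
          resource_def:
            type: OS::Heat::TestResource
            properties:
              wait_secs: 2

    # outer.yaml: fans out into many nested stacks, similar in shape to the
    # >70 nested stacks mentioned for the TripleO undercloud
    heat_template_version: 2015-10-15
    resources:
      stacks:
        type: OS::Heat::ResourceGroup
        properties:
          count: 70
          resource_def:
            type: inner.yaml

Creating it with something like "heat stack-create -f outer.yaml perf-repro" (the client resolves the inner.yaml reference from the local filesystem) should exercise the nested-stack code paths without involving nova at all.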