20:01:23 <shardy> #startmeeting heat
20:01:24 <openstack> Meeting started Wed Aug 21 20:01:23 2013 UTC and is due to finish in 60 minutes.  The chair is shardy. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:01:25 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:01:27 <openstack> The meeting name has been set to 'heat'
20:01:34 <shardy> #topic rollcall
20:01:40 <shardy> hi all, who's around?
20:01:42 <jpeeler> jpeeler here
20:01:42 <funzo> o/
20:01:43 <m4dcoder> o/
20:01:44 <radix> hello!
20:01:45 <stevebaker> here
20:01:48 <spzala> Hi!
20:01:55 <therve> Hi!
20:02:01 <zaneb> \o
20:02:36 <shardy> #link https://wiki.openstack.org/wiki/Meetings/HeatAgenda
20:03:10 <SpamapS> o/
20:03:14 <SpamapS> (leaving early)
20:03:17 <adrian_otto> o/
20:03:24 <shardy> asalkeld, sdake?
20:03:37 <shardy> ok lets get started
20:03:56 <shardy> #topic Review last week's actions
20:04:27 <shardy> #link http://eavesdrop.openstack.org/meetings/heat/2013/heat.2013-08-14-20.00.html
20:04:35 * shardy shardy to post mission statement
20:04:42 <shardy> oops, I forgot to do that, again
20:04:59 <shardy> does anyone have a link to the thread where this was requested, then I'll reply after this meeting?
20:05:08 <shardy> #action shardy to post mission statement, again
20:05:20 <stevebaker> I think people just post new threads with the statement
20:05:33 <shardy> stevebaker: Ok, thanks
20:05:44 <shardy> anything else from last week before we move on?
20:06:23 <shardy> #topic Reminder re Feature proposal freeze
20:07:10 <shardy> So it's this Friday, any features posted for review after that will require an exception, so we should start -2ing stuff which gets posted late
20:07:12 <therve> The gate already decided for us it seems
20:07:25 <shardy> therve: haha, yea quite possibly ;)
20:07:29 * bnemec is here, but got distracted playing with stackalytics...
20:07:39 <stevebaker> maybe we should review late items at the meeting
20:08:15 <shardy> stevebaker: yup, next item is h3 bps
20:08:24 <shardy> #topic h3 blueprint status
20:08:32 <zaneb> so, I'm working on https://bugs.launchpad.net/heat/+bug/1176142
20:08:34 <uvirtbot> Launchpad bug 1176142 in heat "UPDATE_REPLACE deletes things before it creates the replacement" [High,In progress]
20:08:40 <zaneb> dunno if you would call that a "feature"
20:08:46 <shardy> #link https://launchpad.net/heat/+milestone/havana-3
20:08:49 <zaneb> but it is *not* going to land by friday
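For context, bug 1176142 is about ordering: UPDATE_REPLACE currently deletes the old resource before its replacement exists. A minimal sketch of the create-before-delete ordering the fix aims for, using hypothetical method names rather than Heat's actual resource API:

    # Hypothetical sketch only -- the names below are illustrative, not Heat's API.
    def update_replace(stack, old_resource, new_definition):
        """Replace a resource without a window where neither copy exists."""
        # Create the replacement first, so the stack never loses the resource.
        new_resource = stack.create_resource(new_definition)
        new_resource.wait_until_create_complete()

        # Repoint anything that referenced the old resource at the new one.
        stack.rewire_references(old_resource, new_resource)

        # Only now is it safe to delete the original.
        old_resource.delete()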
20:08:50 <SpamapS> zaneb: \o/
20:09:03 <SpamapS> zaneb: that is a bug 100%
20:09:04 <uvirtbot> Launchpad bug 100 in launchpad "uploading po file overwrites authors list" [Medium,Fix released] https://launchpad.net/bugs/100
20:09:09 <SpamapS> haha doh
20:09:22 <zaneb> basically, rebasing my current work takes longer than the time I have to work on it
20:09:24 <shardy> zaneb: well technically I think it's only BPs which are frozen, but IMO we should start deferring any big bugs which contain new functionality not just fixes
20:09:36 <zaneb> so I have a negative amount of time to spend on the actual problem
20:09:40 <shardy> zaneb: so maybe we defer until early icehouse?
20:09:45 <zaneb> which is already *really* hard
20:09:48 <shardy> #link https://launchpad.net/heat/+milestone/havana-3
20:10:21 <shardy> Ok so I think heat-trusts will slip too - I've just had too many problems with keystone, and it's taken too long to get the client support merged
20:10:27 <SpamapS> zaneb: are you saying that you have so many things in-flight that you spend all day rebasing?
20:10:30 <zaneb> yeah, let's see how things go over the next week or so, but I'm thinking it will probably have to be bumped :(
20:10:40 <radix> shardy: sad :(
20:10:59 <therve> zaneb, Considering it's a bug, can't it go past the feature freeze?
20:11:08 <therve> Or is it too big of a behavior change?
20:11:18 <zaneb> it's a big behaviour change, tbh
20:11:19 <adrian_otto> bugs are exempt
20:11:21 <shardy> zaneb, radix: IMO it's much better to bump big stuff which is too late than land loads of risky stuff and make our first integrated release really broken
20:11:31 <adrian_otto> there is no point in delaying a bugfix
20:11:57 <radix> shardy: yeah, I understand
20:12:03 <shardy> adrian_otto: This is a complete rework of our update logic, so it's not a normal bugfix
20:12:20 <adrian_otto> same logic applies
20:12:24 <shardy> but yeah, in general bugs are fine
20:12:29 <adrian_otto> unless you are adding a new feature?
20:12:29 <m4dcoder> zaneb: if the UPDATE_REPLACE is fixed, i think it's going to be easier to implement the rolling update for as-update-policy.
20:12:35 <radix> I was just sad that the keystone stuff is buggy
20:13:04 <shardy> radix: well it's very new stuff, AFAICT we're the first to try to really use it
20:13:33 <SpamapS> shardy: I think landing potentially broken stuff early is also an anti-pattern. Lets land good stuff.
20:13:34 <radix> yeah, that's usually how it goes when you have inter project dependencies like that
20:14:31 <shardy> SpamapS: yeah, but if we've got stuff which is rushed and may be flaky, or depends on stuff known-to-be-flaky in other projects, now is not really the time to land it
20:14:45 <shardy> early in the cycle is much less risky as we've got more time to fix and test
20:14:52 <SpamapS> Yes, what I am saying is, lets not count on it being landable, flaky, on day 1 of icehouse.
20:15:07 <SpamapS> There is never a time where it is ok to risk breaking everything.
20:15:53 <shardy> SpamapS: sure, but deferring gives those working on said features more time to test and solidify their stuff before it's merged
20:16:16 <SpamapS> deferring +1. Planning to de-stabilize trunk, -1.
20:16:40 <SpamapS> (and I acknowledge that you were just saying "lets defer")
20:16:53 <SpamapS> I'm just saying, don't defer, and stop working on it.
20:17:02 <SpamapS> defer landing, keep stabilizing it.
20:17:11 <shardy> Yeah, who was saying destabilize trunk, I think you just made that up ;P
20:17:17 <shardy> anyways..
20:17:32 <SpamapS> The entire software industry made that up. It's called "let's just drop it in trunk when we re-open after freeze".
20:17:43 <shardy> There are a few bp's still in "Good Progress", some of which have patches posted I think:
20:17:45 * SpamapS moves on :)
20:18:05 <shardy> #link https://blueprints.launchpad.net/heat/+spec/hot-parameters
20:18:24 <shardy> #link https://blueprints.launchpad.net/heat/+spec/multiple-engines
20:18:52 <shardy> #link https://blueprints.launchpad.net/heat/+spec/heat-multicloud
20:18:53 <stevebaker> Should have something to submit today
20:18:58 <stevebaker> for multi-engines
20:19:07 <stevebaker> i mean multicloud :/
20:19:18 <stevebaker> too many multis
20:19:23 <shardy> #link https://blueprints.launchpad.net/heat/+spec/oslo-db-support
20:19:45 <shardy> Ok, cool, just wanted to see if some of those should actually be either Implemented or Needs Code Review
20:19:57 <shardy> anyone know if hot-parameters is really done?
20:20:34 <zaneb> I think it's really sort-of done
20:20:42 <shardy> https://review.openstack.org/#/q/status:merged+project:openstack/heat+branch:master+topic:bp/hot-parameters,n,z
20:21:08 <shardy> All the stuff posted is merged, so can we claim it Implemented for h3 purposes?
20:21:22 <zaneb> for h3 purposes I think so, yes
20:21:43 <shardy> zaneb: Ok, cool, thanks
20:22:14 <zaneb> best to double check with nanjj too though
20:22:19 <radix> then maybe it needs another bp for post-h features
20:22:57 <shardy> I moved a few bugs into h3 which look like it would be good to fix, if anyone has bandwidth and needs something to do please pick them up :)
20:22:57 <zaneb> radix: yes, quite likely
20:23:20 <shardy> Ok I'll ping nanjj to check tomorrow
20:23:43 <shardy> Anyone else have anything to raise re h3 before we move on?
20:23:58 <therve> I may slip my lbaas bp in it
20:24:09 <therve> It "just" needs one more branch I think
20:24:33 <shardy> Overall I think we've done really really well, 27 bps and 60 bugs atm, if we land most of that it's going to be a great effort :)
20:24:33 <radix> what's the remaining branch?
20:24:54 <shardy> therve: Ok, cool, if things are up for review but not targeted please add them
20:24:59 <therve> radix, https://review.openstack.org/#/c/41475/
20:25:07 <m4dcoder> i'm still working to get my last patch submitted for review for as-update-policy before end of week.  as-update-policy was moved to "next".
20:25:17 <shardy> I think m4dcoder posted a patch which got bumped and we may be able to pull back
20:25:21 <shardy> snap
20:25:21 <radix> oh OK
20:25:59 <shardy> m4dcoder: Ok, if it looks like it's going to land, I'll pull it back
20:26:19 <m4dcoder> thx.  i have 1 in review and another 1 i'm going to submit before end of week for as-update-policy.
20:26:45 <shardy> m4dcoder: Ok, sounds good thanks
20:27:43 <shardy> #topic Open Discussion
20:27:54 <shardy> anyone have anything?
20:28:05 <spzala> shardy: I will follow up with nanjj tonight on the hot-parameter validation blueprint, I think it is done like you mentioned. Sorry I was in another meeting.
20:28:22 <m4dcoder> can the havana release of heat still work with an openstack installation on grizzly?
20:28:31 <shardy> spzala: Ok, that would be great, thanks, pls change the Implementation status if so
20:28:43 <stevebaker> there was a recent change where if a nova boot fails, the instance resource deletes the server during create. Do we want to be doing this?
20:29:02 <zaneb> stevebaker: excellent question
20:29:02 <shardy> m4dcoder: You mean on a grizzly openstack install?
20:29:06 <spzala> shardy: OK, no problem. Yup, will do.
20:29:08 <m4dcoder> shardy: yes
20:29:14 <zaneb> I don't want to be doing that, it frightens me
20:29:29 <shardy> m4dcoder: maybe, but it's not something we support, you need to use stable/grizzly
20:29:44 <stevebaker> I'm inclined to leave the failed server there, for post-mortem if nothing else
20:29:54 <radix> I have been wondering about having an explicit "try to converge" operation in heat
20:30:37 <radix> which would basically clean up and retry to get things to look like the template
20:30:39 <SpamapS> Are we deleting it, and re-trying?
20:30:42 <SpamapS> I do like that
20:30:46 <SpamapS> an ERROR state is a dead instance.
20:30:54 <stevebaker> just deleting and putting the resource in FAILED state
20:30:55 <shardy> stevebaker: I agree, I don't think we want to delete, or try to delete until stack delete
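A rough sketch of the behaviour stevebaker and shardy are arguing for here: on a failed nova boot the resource goes to (CREATE, FAILED), but the server is left in place for post-mortem until stack delete. All names below are illustrative, not the actual Instance resource code:

    # Illustrative only; not Heat's actual Instance resource implementation.
    class CreateFailed(Exception):
        """Raised to move the resource to (CREATE, FAILED) without cleanup."""

    def check_create_complete(server):
        # Poll the server, but never delete it on failure: leaving it around
        # allows post-mortem debugging, and cleanup is deferred to stack delete.
        if server.status == 'ERROR':
            raise CreateFailed('nova reported ERROR for server %s' % server.id)
        return server.status == 'ACTIVE'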
20:31:07 <stevebaker> i shall raise a bug
20:31:13 <radix> no, it doesn't currently retry. and I don't think it should by default
20:31:27 <m4dcoder> shardy: thanks.
20:31:29 <SpamapS> yeah that could lead to a large bill. ;)
20:31:42 <radix> instance group does this too BTW - deletes the sub resources
20:32:24 <shardy> radix: yeah I was looking at that in conjunction with a patch from liang
20:32:46 <zaneb> how did a major change to the behaviour of instances sneak in in a commit that claimed to be just adding a Rackspace resource?
20:32:48 <therve> radix, It'd be nice to have at least an API call to "retry" create
20:32:50 <zaneb> https://github.com/openstack/heat/commit/2684f2bb4cda1b1a23ce596fcdb476bb961ea3f8
20:32:53 <shardy> radix: IMO that is also wrong, the InstanceGroup resource should go into a failed state (probably UPDATE, FAILED) if it can't adjust
20:32:57 <zaneb> that's extremely uncool
20:33:18 <shardy> therve: there is a bug for allowing retry of create/update
20:33:20 <radix> yeah. I didn't do it, I just tried to maintain the behavior through my refactor :)
20:34:08 <therve> zaneb, That's a bit sad and untested :/
20:34:30 <bnemec> zaneb: Wow, that was a really bad commit.  It introduced the ResourceFailure bug too.
20:34:32 <SpamapS> shardy: all these transitions to failed state make the urgency of needing a "RETRY" capability go up.
20:34:39 <shardy> radix: sure, well let's raise a bug and fix it
20:35:02 <zaneb> bnemec: no, I introduced the ResourceFailure bug by not spotting that (bizarre) change
20:35:04 <SpamapS> have not had time to address the lack thereof.. but would still like to very much
20:35:08 <shardy> SpamapS: well IIRC it's assigned to you...
20:35:33 <zaneb> bnemec: and by "not spotting" I mean "relying on the unit tests instead of grep"
20:35:42 <shardy> SpamapS: if you don't have the bandwidth, let me know and we'll reassign
20:35:45 <SpamapS> Yeah, I keep running into these things where heat is a time bomb waiting to eat all of your memory/disk/cpu ... can't seem to prioritize retry over those. ;)
20:35:48 * zaneb goes to the naughty corner
20:36:06 <bnemec> zaneb: Yeah, part of the problem with that is it's too large. 800-some lines is too much to review properly.
20:36:29 <kebray> SpamapS we're running into the same problem.. other general scalability issues are keeping us from getting to implementing a Retry
20:36:47 * SpamapS watches the ducks line up
20:36:56 <kebray> but, +1 on wanting retry, both retry create, and retry individual steps of the create.
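To make the retry request concrete, one possible shape for a create-retry pass over a stack; the state tuples follow Heat's (ACTION, STATUS) convention, but every method name here is a placeholder, not an existing API:

    # Placeholder sketch of a stack-level "retry create" action.
    def retry_create(stack):
        for resource in stack.resources_in_dependency_order():
            if resource.state == ('CREATE', 'FAILED'):
                # Clean up whatever half-created backing object exists,
                # then attempt creation again.
                resource.cleanup()
                resource.create()
            # Resources already in ('CREATE', 'COMPLETE') are left untouched;
            # that is the point of retry versus deleting the whole stack.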
20:36:56 <SpamapS> and now I have an appointment that I have to get to.
20:36:59 <zaneb> bnemec: well, the problem is when it says "Add resource for Rackspace Cloud Servers" but actually makes fundamental changes to other resources :)
20:37:00 <SpamapS> anybody need me for something before I go?
20:37:44 <shardy> kebray: OK well lets coordinate getting someone (if SpamapS can't get to it) looking at that soon
20:37:54 <kebray> shardy sounds good.
20:38:01 <shardy> SpamapS: o/
20:38:21 * SpamapS goes poof
20:38:22 <shardy> So I have a general question re upgrade strategy when we move to trusts..
20:38:38 <stevebaker> shoot
20:39:00 <shardy> the cleanest way is to drop the DB and just use trusts for all user_creds, but I'm thinking we need to allow transition, i.e. existing stacks should still work
20:39:24 <bnemec> zaneb: Sure, but in a 200 line change that probably gets shot down by reviewers.  In 800 it gets lost in the noise.
20:39:32 <bnemec> (sorry for the tangent in the middle of the meeting)
20:39:45 <shardy> so my current plan is to extend the context and user_creds adding a trust_id, which we use if it's there, otherwise we fall back to the user/pass for old, existing stacks
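An illustration of the fallback shardy describes: use a trust when user_creds carries a trust_id, and fall back to the stored username/password for stacks created before trusts. The field names and client helpers are placeholders, not the eventual implementation:

    # Placeholder sketch of the trust-or-password fallback for user_creds.
    def keystone_client_for(user_creds, make_trust_client, make_password_client):
        if user_creds.get('trust_id'):
            # New-style stack: the heat service user acts on the owner's
            # behalf via the stored trust.
            return make_trust_client(user_creds['trust_id'])
        # Old-style stack created before trusts: fall back to stored creds.
        return make_password_client(user_creds['username'],
                                    user_creds['password'])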
20:39:59 <therve> shardy, Is there a way to migrate existing stacks?
20:40:04 <zaneb> bnemec: fair point; that's hard to avoid when you're adding a whole new resource though
20:40:05 <stevebaker> shardy: how about storing in resource_data?
20:40:22 <shardy> therve: not really, because you don't have a connection to keystone at DB migrate time
20:40:40 <zaneb> bnemec: but yes, that should have been at least 3 patches. and at least 1 should have been rejected ;)
20:40:45 <shardy> but we could write a tool which creates a trust using the stored credentials, and migrate it that way
20:41:05 <bnemec> zaneb: Agreed. :-)
20:41:25 <stevebaker> shardy: does this mean new secrets need to make it onto instances?
20:41:34 <shardy> stevebaker: I was thinking of just adding the trust_id to the stack table, but that makes the overlap between old/new methods harder
20:41:38 <stevebaker> oh, this is just for api requests
20:41:38 <zaneb> shardy: heat-manage could do that maybe?
20:41:54 <shardy> stevebaker: no, not yet, this is just the credentials for periodic tasks in the engine
20:42:19 <shardy> zaneb: hmm, yeah, but heat-manage would need the ID of the heat service user
20:42:36 <stevebaker> short answer is, it would be nice if the old method continued to work in parallel
20:42:42 <shardy> I guess it could read it from a cli arg or config file
20:43:00 <zaneb> shardy: it would need the whole config file to decrypt the credentials anyway
20:43:08 <zaneb> shardy: but that seems doable
20:43:13 <shardy> zaneb: good point
20:43:35 <zaneb> shardy: if you want to get really fancy, you could do it in the db migration ;)
20:43:44 <shardy> stevebaker: agreed, but then do we publish that we'll transition to just trusts after, e.g., one cycle?
20:44:21 <shardy> zaneb: haha, yeah I guess, was trying to keep things simple ;)
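A sketch of the migration tool under discussion (whether run from heat-manage or a db migration): decrypt each stored credential using the service config, create a trust on the stack owner's behalf, and drop the password. Every name below is hypothetical:

    # Hypothetical one-shot migration from stored passwords to trust IDs.
    def migrate_user_creds_to_trusts(session, decrypt, create_trust):
        for creds in session.all_user_creds():
            if creds.trust_id:
                continue  # already migrated
            username, password = decrypt(creds)
            # Create a trust from the stack owner to the heat service user,
            # then discard the recoverable password.
            creds.trust_id = create_trust(username, password)
            creds.username = None
            creds.password = None
        session.commit()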
20:44:45 <stevebaker> shardy: actually, we may need to keep the old way for a while if we support older openstack clouds
20:45:07 <shardy> Ok thanks all for the input, will try to get a wip patch up for review soon, aiming for early Icehouse when the keystoneclient patches etc have landed
20:45:19 <stevebaker> like, indefinitely. And have some way of discovering if the keystone supports trusts
20:45:29 <zaneb> ick
20:45:36 <shardy> stevebaker: you mean for multicloud?
20:45:42 <stevebaker> shardy: yes
20:45:48 <shardy> gah
20:45:59 <zaneb> it's extremely, extremely uncool that we are storing credentials at all
20:46:00 <stevebaker> :D
20:46:02 <shardy> that could get really messy
20:46:17 <zaneb> I think it's better to say that if you don't have a compatible keystone, you lose out
20:46:23 <stevebaker> zaneb: yes, unless it is a private heat installation
20:46:39 <zaneb> than to say that if you don't have a compatible keystone, we store your password in a really insecure way
20:46:48 <shardy> zaneb: that's why I was hoping everyone would say drop-the-db, kill the stored-creds ;)
20:46:51 <stevebaker> plaintext!
20:46:59 <zaneb> it may as well be
20:47:31 <zaneb> if we keep both around, you'll never be sure which we're doing
20:47:43 <stevebaker> maybe we can look into a keystore for icehouse
20:47:47 <shardy> stevebaker: So we say master only supports havana for native openstack deployments, are you saying we somehow have to maintain indefinite backwards compat for multicloud?
20:48:29 <shardy> zaneb: If we can manage a flag-day migration, I would much prefer it, and the resulting code will be much much easier to maintain
20:48:30 <stevebaker> shardy: that is something we should discuss
20:48:44 <stevebaker> sorry, I have to go.
20:48:56 <shardy> stevebaker: Ok lets pick it up on the ML
20:49:15 <shardy> anyone else have anything for the last few minutes?
20:49:37 <radix> I was distracted for a bit, was a bug filed about not deleting failed instances?
20:49:45 <radix> I can file one for the InstanceGroup and take that on
20:50:20 <zaneb> radix: I think stevebaker was going to file one
20:50:32 <zaneb> for Instance, that is
20:50:34 <shardy> radix: not yet, please do, it was discussed ref https://review.openstack.org/#/c/42462/
20:51:03 <radix> okie doke
20:51:17 <zaneb> radix: https://bugs.launchpad.net/heat/+bug/1215132
20:51:19 <radix> whoah. nice new format for jenkins comments :D
20:51:20 <uvirtbot> Launchpad bug 1215132 in heat "Nova server gets deleted immediately after failed create" [High,Confirmed]
20:51:33 <radix> zaneb: ok, I'll create a similar one for InstanceGroup.
20:52:09 <shardy> funzo: Did you get the feedback you needed re autoscaling last week?
20:52:30 <shardy> funzo: guess you mainly need to speak with asalkeld re alarms etc from your ML post?
20:53:06 <funzo> I didn't have a long discussion, it was just pointing to the doc
20:53:14 <radix> I can't assign bugs to milestones, so if someone wants to put https://bugs.launchpad.net/heat/+bug/1215140 in h3 that'd be peachy
20:53:15 <uvirtbot> Launchpad bug 1215140 in heat "InstanceGroup shouldn't delete instances that failed to be created" [Undecided,New]
20:53:57 <funzo> shardy: I believe the general thought is we could use nested stacks as a first cut, but I've been focused more on DIB this past week.
20:54:21 <shardy> funzo: Ok, well shout if there's any info you need from us :)
20:54:23 <funzo> shardy: I'll probably talk more about the scaling work when the rhel images are booting in os
20:54:30 <funzo> shardy: definitely will, thx
20:54:34 <shardy> funzo: Ok, cool
20:54:47 <shardy> anything else before we wrap things up?
20:55:25 <shardy> Ok then, well thanks all!
20:55:31 <shardy> #endmeeting