20:01:10 #startmeeting heat
20:01:11 Meeting started Wed Jun 26 20:01:10 2013 UTC. The chair is shardy. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:01:12 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:01:14 The meeting name has been set to 'heat'
20:01:19 #topic rollcall
20:01:25 o/
20:01:30 hi
20:01:32 ahoy!
20:01:34 hello
20:01:34 yo
20:01:38 o/
20:01:42 pong
20:01:45 Hoy
20:02:05 oh hey
20:02:21 asalkeld: I now realize just how late this meeting is for you. you're dedicated :)
20:02:30 well, early I guess :)
20:02:31 it is 6am
20:02:32 o/
20:03:12 Ok, hi all, let's get started! :)
20:03:36 #topic Review last week's actions
20:03:41 o/
20:03:50 hey sdake
20:03:56 #link http://eavesdrop.openstack.org/meetings/heat/2013/heat.2013-06-19-20.00.html
20:04:03 two actions:
20:04:14 #info sdake to raise BP re gold images
20:04:23 that happened:
20:04:51 #link https://blueprints.launchpad.net/heat/+spec/cloudinit-cfntools
20:05:00 quit
20:05:18 #info asalkeld to write up AS/Ceilometer wiki
20:05:27 that also happened:
20:05:47 #link https://wiki.openstack.org/wiki/Heat/AutoScaling
20:05:52 yes, thanks asalkeld :)
20:06:00 thanks asalkeld :)
20:06:06 Indeed
20:06:11 so still need to chat to radix and therve about
20:06:12 it
20:06:24 sdake: the cloud-init cfntools looks interesting
20:06:32 sdake: aiming for h2 or h3?
20:06:49 maybe h3 depends how native-nova-instance comes along
20:06:57 sdake: ok, cool
20:07:00 I finally got my devstack working - so can make some progress
20:07:10 that blueprint will only take a couple days
20:07:15 the one you linked above
20:07:21 anyone else got anything to raise from last week?
20:07:23 sdake, is that a generic approach for installing any kind of "agent" or just cfn-tools?
20:07:27 sdake: SpamapS's os-collect-config may be doing the metadata fetch for cfntools, just something to keep in mind for that bp
20:07:31 any agent tspatzier
20:07:52 sdake, sounds interesting
20:08:05 my thinking is you put an agent directory in a config file, and bam it gets loaded
20:08:11 Right I just submitted os-collect-config to stackforge
20:08:20 sdake, There is some limit on how much you can put in user data no?
20:08:36 therve yes, I added compression to userdata
20:08:41 but there may be some upper limits
20:08:55 unfortunately userdata compresses into base64 ;(
20:08:58 there will be cloud-provider-specific limits too, but we can sort that
20:08:59 can't send binary
20:09:05 Yeah
20:09:07 sdake: any idea what the compressed limit will be, roughly?
20:09:14 the uncompressed limit is really low iirc
20:09:16 sdake: does this put the actual agent source in the userdata? cfntools tarball is already close to the limit
20:09:20 shardy no idea
20:09:23 https://github.com/SpamapS/os-collect-config.git until it is added in stackforge. :)
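To make the user-data sizing question above concrete, here is a minimal sketch (plain Python, synthetic payload) of the gzip-then-base64 arithmetic being discussed: because user-data cannot carry raw binary, the compressed agent still has to be base64-encoded, and whatever is left under the 16 KiB EC2-style cap is what remains for the template's own user-data. The limit constant and the stand-in payload are assumptions for illustration only, not Heat's actual compression code.

    import base64
    import gzip

    USERDATA_LIMIT = 16 * 1024  # bytes; the EC2 figure quoted above, other clouds differ

    def encoded_size(payload: bytes) -> int:
        # user-data can't carry raw binary, so the gzip'd payload still has to be
        # base64-encoded, which adds roughly a third of overhead back on top
        return len(base64.b64encode(gzip.compress(payload)))

    # stand-in for the real agent payload (e.g. a heat-cfntools tarball)
    agent = b"fake agent bytes " * 2000

    used = encoded_size(agent)
    print("encoded size: %d bytes, room left for template user-data: %d"
          % (used, USERDATA_LIMIT - used))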
20:09:28 stevebaker yes
20:09:48 It's fairly low in EC2, like 16k or something
20:09:51 it puts the agent files in the userdata and compresses them, then base64 encodes them
20:09:52 o/
20:10:03 ec2 is 16k, not sure what openstack is
20:10:06 sdake: I'm just thinking we need to ensure we don't leave a tiny amount of space for template userdata stuff by filling up the payload with agents
20:10:08 16k is ridiculous
20:10:20 shardy it will take some testing to see what happens
20:10:20 It's pretty small
20:10:31 the payload might have to be an install script rather than the actual agent
20:10:35 my compression blueprint ends with a 4k userdata on a small template
20:10:45 So, I'd almost rather see heat just serve the cfntools tarball to the instance rather than shove it into userdata.
20:10:57 or from swift?
20:11:01 stevebaker: that's what I was thinking, but cloud-config already allows you to specify scripts, which could e.g. pip install stuff
20:11:15 randallburt, nice idea
20:11:50 pip install has problems
20:11:51 so why not just add a pre-userdata script to install cfntools?
20:12:05 if pip mirrors are down, instance won't start
20:12:06 SpamapS: or just specify a url in a config file, which could be anything, webserver, pre-signed swift url, etc
20:12:07 randallburt: yeah having it in swift or just having a configurable URL for them in the engine seems like it would work fine.
20:12:13 if network is slow, pip install times out
20:12:22 really we don't want to pip install anything at image start time
20:12:43 key is to be local
20:12:44 I'm sure the NSA has clouds that don't have Internet access at all, too :)
20:12:47 Not if you don't have a mirror
20:12:47 spamaps I'll try that out
20:12:59 Doesn't have to be pip.. just needs to be something which can encapsulate tools.
20:13:11 ya config drive might work too
20:13:32 anyway feel free to discuss on the blueprint :)
20:13:39 Ok, sounds good
20:13:58 #topic h2 blueprint status
20:14:02 sdake, agent tools always a popular discussion point
20:14:09 lol
20:14:09 asalkeld groan :(
20:14:20 #link https://launchpad.net/heat/+milestone/havana-2
20:14:30 two weeks until h2
20:14:54 stevebaker been busy - 17 bps ;)
20:15:04 #link https://wiki.openstack.org/wiki/Havana_Release_Schedule
20:15:18 I've bumped a couple of blocked/not started things to h3 already
20:15:21 oh i guess those are bugs
20:15:41 anything else there which probably won't make it?
20:15:58 depends on reviews, but I'm good
20:16:04 https://bugs.launchpad.net/heat/+bug/1154139
20:16:06 I'm starting to worry that I haven't started on rolling updates, so it may not make H3, but nothing in H2.
20:16:16 not a real high priority and not sure I'll have time to fix it
20:16:47 Ok, cool, well if in doubt please start bumping rather than suddenly deferring lots of things near the milestone
20:17:05 I'll be switching to quantum by default, so should start getting some traction on those quantum bugs
20:17:31 stevebaker: sounds good
20:18:05 i'd like to switch to quantum by default too, stevebaker if you sort out devstack + quantum can you share your localrc?
20:18:10 mine seems to not get dhcp addresses
20:18:10 so on a related-to-h2 note, I'm unfortunately going to be on holiday for the two weeks around the milestone (booked before I took the PTL role on..)
20:18:32 sdake: I made some progress yesterday, will pm
20:18:37 thanks
20:18:48 so I need a volunteer to handle the release coordination, speaking to ttx about go/no-go and generally making sure what we end up with works
20:18:52 anyone keen?
20:18:57 I can
20:19:25 stevebaker: OK, thanks - I'll follow up with you to discuss the details
20:19:48 SpamapS: we should do a heat-cfntools release too, and switch the element to install from pip?
20:19:49 #info stevebaker to do h2 release management while shardy on pto
20:20:07 stevebaker: +1
20:20:17 stevebaker: IMO we should do the heat-cfntools release now
20:20:20 and way more often
20:20:24 it has had a ton of improvement
20:20:29 yeah
20:20:38 anyone like an action to do that?
20:20:53 even if I am trying to kill it, as yet unsuccessful, it's rather wily, like its old man
20:21:00 lol
20:21:03 * SpamapS raises Dr. Evil eyebrow at heat-cfntools
20:21:27 it's not being killed, just neutered ;)
20:21:56 #info heat-cfntools release to be made prior to h2
20:21:58 * SpamapS suggests we rename cfn-hup to SCOTTYDONT
20:22:33 Ok, that's all I have for today
20:22:38 #info open discussion
20:22:51 GenericResource for integration tests
20:23:03 +1
20:23:12 asalkeld, therve: do you guys want to have a chat now/after the meeting?
20:23:26 I'd like to register GenericResource by default again, so it can be used in tempest templates
20:23:39 Well after is a bit late for me
20:23:49 stevebaker: could it just be loaded as a plugin?
20:23:51 and extend it so it can do mock delays during in-progress states
20:23:52 therve: ok
20:24:08 So..
20:24:12 ResourceFacade
20:24:14 radix might help to do an informal first
20:24:30 stevebaker: or do you mean put it in engine/resources?
20:24:41 SpamapS, that is only usable to a nested stack
20:24:42 shardy: it could, devstack would have to configure that though - I don't really see the harm in it being registered by default though.
20:24:45 ok
20:24:49 so small user base
20:24:53 shardy: yes, back to engine/resources
20:25:03 by the way: https://twitter.com/radix/status/349986392971022336
20:25:15 it could be useful to users, as a placeholder in their templates
20:25:25 stevebaker: I guess it's not that big of a deal, but if it's only ever going to be used for testing, it'd be kind of nice for it to be in a test-appropriate location
20:25:32 asalkeld: The term still makes no sense to me.
20:26:13 SpamapS, open to others...
20:26:24 asalkeld, therve: anyway I kind of want to let you guys talk about linear vs circular dependency
20:26:28 radix: is it you or the cat typing? ;)
20:26:31 I'm in the middle
20:26:36 shardy: we cooperate ;-)
20:26:36 shardy, stevebaker: would it make sense just to leave it where it is and configure the tempest integration stuff to load plugins from tests/resources?
20:26:57 radix: that explains all the references to mice.. ;)
20:27:04 hehe :)
20:27:10 randallburt: that's basically what I was suggesting, but I'm not strongly opposed to moving it if needed
20:27:27 asalkeld, Right, what do you think of "duplicating" the server creation in the AS service?
20:27:29 fwiw, I'd rather it stay in a test-y place.
20:27:32 randallburt: I'll look into that, but I'm not against moving it
20:27:35 Anyway, please respond on openstack-dev if you have some insight for me to help me understand what Fn::ResourceFacade means.
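On the question of loading GenericResource as a plugin rather than registering it in the engine by default: a minimal sketch of what such a test-only plugin module might look like, assuming the plugin_dirs / resource_mapping() convention Heat used at the time. The class name, resource type string, and behaviour here are hypothetical placeholders, not the actual tests code.

    # Hypothetical plugin module, dropped into a directory listed in heat.conf's
    # plugin_dirs, so devstack/tempest can load a test-only resource without it
    # being registered in the engine by default.
    from heat.engine import resource


    class GenericTestResource(resource.Resource):
        """A do-nothing resource for integration-test templates."""

        def handle_create(self):
            # No real backing object; creation succeeds immediately.
            return None


    def resource_mapping():
        # The engine calls this to discover the resource types a plugin provides.
        return {'Test::Heat::GenericResource': GenericTestResource}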
20:27:36 asalkeld, So that it's possibly standalone
20:27:50 SpamapS: I agree the resource facade naming is awkward, but I've got no better alternatives to suggest
20:27:57 radix, well we need to dream up a usable solution
20:28:10 just not sure on what feature that provides
20:28:16 I think SpamapS's use cases may also be relevant to the way the AS-Service and Heat fit together
20:28:38 what use case was that?
20:28:40 asalkeld: ok, I'll send an email to openstack-dev (or just add something to the wiki page) about the dreamt up solutions. I can write up two alternatives
20:28:52 asalkeld: he's talked about being able to address the individual nodes in a scaling group
20:29:10 I have some ideas for that
20:29:33 radix, do you have time for a chat after this?
20:29:36 radix: I was working on the InstanceGroup resource today, and thought it's pretty dumb that we don't store the instances in the DB
20:29:41 asalkeld: yep, plenty
20:29:45 cool
20:30:10 I guess that will be fixed as part of bug 1189278 tho
20:30:13 shardy, i still think instancegroup should be a special type of nested stack
20:30:13 shardy: yeah but I just want to understand the use cases better
20:30:41 asalkeld: Yeah, maybe that would work
20:31:02 For the record, I am -1 on AS spinning up instances for itself. I'd much rather see auto scaling as a subset of ceilometer's functionality suitable for general user consumption.
20:31:26 asalkeld: that would actually make the stuff in there which converts from string into GroupedInstance resources much cleaner (or unnecessary)
20:31:29 SpamapS, agree
20:31:43 shardy, exactly
20:31:44 SpamapS, I'm not sure those are contradictory?
20:31:53 so, I think we have a pretty strong interest in *allowing* autoscale to be able to spin up instances without needing to use Heat templates
20:31:58 SpamapS, I mean, ceilometer would need some kind of service to autoscaling?
20:31:59 at least, without requiring the user to care about Heat templates
20:32:31 SpamapS: You may be right, but we need to get the first stage of ceilometer integration done, then we'll have a clearer picture of what should do what
20:32:34 therve: ceilometer needs a place to send notifications to.
20:32:41 so use webhooks and let orchestration set up autoscale how it sees fit while allowing others to spin up their own orchestration/scaling solutions.
20:32:46 shardy: agreed
20:32:47 radix I get that, but what I don't get is how you plan to communicate to the service the metadata which describes the autoscaling
20:32:57 SpamapS: my expectation was that CM would notify us when to spin up instances, or kill them
20:33:06 not that CM would do the orchestration part
20:33:12 sdake: do you mean in the "user is using Heat templates" case?
20:33:27 radix I mean the user needs some userdata, and some parameters around scaling
20:33:29 sdake, launchconf
20:33:30 shardy, Well not really? It would notify us when a load is high, or something
20:33:30 shardy: precisely my thinking as well. If AS is allowed to do that independent of Heat, then it will make the instances inside AS unaddressable (or special cased) by Heat.
20:33:37 yes launchconfig from aws speak
20:33:45 that is like a template
20:34:01 sdake: so yeah, that's the debate. I see two possibilities
20:34:01 therve: Yeah, which we then map to an AS action via the ScalingPolicy associated with the alarm/event
20:34:11 Right ok
20:34:17 I think I'm going to write up these two possibilities and we can all have concrete options to discuss
20:34:23 I'd advocate a special nested stack that AS can update
20:34:33 radix sounds like a good openstack-dev topic ;)
20:34:37 yes
20:34:45 so to me, an instance group should just be an array of resources. AS would use notifications to operate on that array. Said arrays are useful without AS too, so I believe the instance management should only be in Heat.
20:34:54 asalkeld: I'm starting to agree with you.. :)
20:35:01 with a parameter giving the number of instances
20:35:11 then an AS action is just handle_update, like the loadbalancer now is
20:35:14 so no worries about config
20:35:21 yea
20:35:25 gotta pick robyn up from airport bbl
20:35:28 o/
20:35:29 asalkeld: you also need to be able to say "remove this instance that is performing inefficiently"
20:35:32 asalkeld: +1
20:35:48 I'll make sure to include use cases
20:36:18 or "remove this instance that has been compromised ...
20:36:35 yea
20:36:42 Both of those things are things Heat needs to support regardless of AS being smart enough to remove inefficient instances eventually.
20:36:48 SpamapS: that's a different type of resource, a specialized InstanceGroup
20:37:13 shardy: what makes it special?
20:37:25 works on an array?
20:37:40 not just one resource
20:37:54 SpamapS: the fact that you differentiate health of individual instances, rather than just adding or removing them in a dumb way, like we currently do
20:38:24 SpamapS: I'm not saying it's a bad idea, but just that it should be e.g. InstanceMonitorGroup or something
20:38:27 shardy: right, I'm saying that independent of AS.. as an operator.. one needs these capabilities.
20:38:36 gah
20:38:48 one group with all the capabilities please :)
20:39:07 (maybe not everything turned on, but don't make me delete all the servers just to turn on a nice feature)
20:39:15 Yeah, I kinda feel like there should be an "InstanceGroup" living in Heat that is manipulated by other things
20:39:18 SpamapS: what you're suggesting implies a more detailed set of metrics than is necessary for simple scaling
20:39:30 an administrator might remove an instance, an autoscale service might add an instance
20:39:34 shardy, that is ok
20:39:49 more policy
20:39:58 more complexity
20:39:58 in other words, composition instead of inheritance :)
20:40:09 asalkeld: so would CM alarms allow aggregation of several metrics, to make the alarm decision?
20:40:13 shardy: Agreed! What I'm suggesting is that the fundamental thing that AS should operate on is a fully addressable array of servers, and that operators need that too.
20:40:29 yea you just tag the instances
20:40:34 +1
20:40:37 to SpamapS
20:40:51 SpamapS: yep, which moving to a nested stack would actually give you, as asalkeld pointed out
20:41:15 yeah that's basically already what I'm doing in tripleo because InstanceGroup doesn't allow addressing the servers. :-P
20:41:19 in the nested stack model, are the individual instances actually Resources?
20:41:28 yes
20:41:41 SpamapS: OK, well I think we're all agreed that we need to fix that ;)
20:41:44 so... doesn't this contradict the opinion that resources shouldn't be created/removed dynamically via API?
20:41:51 no
20:41:51 https://github.com/stackforge/tripleo-heat-templates/blob/master/nova-compute-group.yaml
20:41:59 for reference :)
20:42:00 you are updating the stack
20:42:09 asalkeld: are you suggesting that all manipulation of the substack be done with uploading a template?
20:42:21 radix: no, you are updating the resource, which just happens to be a nested stack underneath
20:42:23 Note "NovaCompute0" ... You can guess how we scale up/down :)
20:42:29 radix, no
20:42:33 shardy: right but there are resources you're creating and removing inside that substack
20:42:43 use the Stack() class
20:42:52 and have some smarts in heat
20:43:03 but talk to it via update
20:43:09 radix: yes, which is fine, and a side-effect of that is that they are addressable via a fully-qualified name via the API
20:43:10 or some action
20:43:10 I see
20:43:20 because the nested resources are real resources
20:43:26 just not defined at the top-level
20:43:49 so the "adjust" API (to be created) on the InstanceGroup will add or remove Instance resources from its substack?
20:43:52 internal substack
20:44:17 it also means we can use the same update method for user-template updates and internal resource updates (like is now done for LoadBalancer, thanks andrew_plunk ;)
20:44:38 ;)
20:44:39 radix, or update
20:44:50 yeah, whatever it's called
20:44:53 okay, that sounds sensible to me
20:44:56 and pass it some new metadata
20:45:06 therve: what do you think of this?
20:45:25 I like the idea of AutoScale-Service hitting this API in Heat to add or remove instances
20:45:34 radix, Trying to digest it :)
20:45:57 radix: that sounds like a good separation of concerns
20:45:59 if you want we can chat further about it
20:46:05 radix: there's no adjust API needed really, you just update the template, or trigger a scaling event
20:46:13 shardy: what's "trigger a scaling event"?
20:46:31 radix: webhook, either called from ceilometer, or somewhere else
20:46:33 shardy, You need an API if you want to support SpamapS' use cases
20:46:38 Like removing a specific instance
20:46:59 shardy: once we have the AS-Service, ceilometer should only be talking to it, IMO, and the AS-Service should talk to Heat in response
20:47:00 Hum or maybe there is a property of nested stack I'm missing
20:47:03 "removing a specific instance" should just be a template update as well, though wouldn't it?
20:47:16 assuming the nested stack implementation
20:47:21 randallburt: the problem is that substack wasn't created by a template
20:47:24 randallburt, You may be right
20:47:38 Yeah but it would be
20:47:45 radix, heat will always talk to as
20:48:17 therve: I'm still not convinced we need a whole new API to allow that, but I guess we'll see :)
20:48:17 you have to create/destroy the as group/as policy resource
20:48:38 asalkeld: right I agree... I don't think that's related to my point though
20:48:53 I was just saying that long-term, ceilometer should not talk directly to Heat, but rather go through the AS-Service which is where scaling policies will live
20:48:53 radix: it's not created by a template, but it has one, and that template can be manipulated by any entity, AS or not.
20:49:09 therve: I think a special type of resource update is all that is needed, i.e. a native scaling group resource with a very flexible properties schema
20:49:21 shardy, Sounds about right
20:49:24 SpamapS: agreed. I just think that if you as an administrator want to remove a specific instance it's a crappy workflow to download a reflected template, delete a few lines from it, and then reupload it to that stack
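A rough illustration of the "InstanceGroup as a nested stack" idea being discussed above: the group size is just the number of named instance resources in an internally generated template, so growing, shrinking, or dropping one specific instance is an ordinary update of that nested stack, and each instance remains addressable as a real resource. The helper and template contents below follow the NovaCompute0-style naming from the tripleo template linked earlier, but they are illustrative only, not Heat's actual implementation.

    import json


    def build_group_template(size, launch_config):
        # One named resource per group member; renaming or dropping an entry
        # targets a specific instance, changing `size` scales the group.
        resources = {}
        for i in range(size):
            resources['Instance%d' % i] = {
                'Type': 'AWS::EC2::Instance',
                'Properties': dict(launch_config),
            }
        return {'AWSTemplateFormatVersion': '2010-09-09', 'Resources': resources}


    launch_config = {'ImageId': 'F17-x86_64-cfntools', 'InstanceType': 'm1.small'}
    print(json.dumps(build_group_template(3, launch_config), indent=2))
    # Scaling to 4 (or removing a compromised 'Instance2') is just another build
    # of this template followed by an update of the nested stack.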
20:49:45 I agree there's a need for an API for stack manipulation in an imperative fashion... that would need to be the only way users were allowed to mess with internally created stacks.
20:49:47 but that's not related to my problem :)
20:49:54 radix, that is not what we are suggesting
20:50:01 radix: maybe so but one that's really easy to automate
20:50:03 * stevebaker has to go
20:50:15 asalkeld: my statement to spamaps is not about autoscaling at all
20:50:21 Basically AS is just a smart, hyper vigilant robotic operator. :)
20:50:25 it's unrelated. just an aside.
20:50:30 radix: I'm saying you allow control of it from the top-level template definition
20:50:44 or via a special type of webhook update
20:50:51 shardy: even to talk about individual instances? I guess there could be an array of IDs in the properties?
20:51:01 I think we're going to need more concrete foundations to stand on if we're going to continue this discussion.
20:51:08 yes, need a list of use cases.
20:51:08 Perhaps ideas can be put into templates and sent to the ML?
20:51:10 or a map/json property
20:51:27 radix: something like that, yes, expose the array via a resource attribute, and allow explicit specification of a list of instance uuids via a property
20:51:28 templates/api calls/etc I mean.
20:51:43 shardy: yeah I can get behind that
20:51:54 anyway, my poor kids are about to melt, have to take them for lunch now.
20:51:59 SpamapS: seeya :)
20:52:03 thanks for the input
20:52:09 radix: and thank you. :)
20:52:28 really thank you everyone for coming to this discussion with open minds... I think the end product is going to be really great.
20:52:35 * SpamapS disappears
20:52:55 Ok, well as mentioned earlier, I think we'll be better placed to plan the next steps of this after asalkeld is done with the watch-ceilometer work
20:53:04 but we can keep discussing it :)
20:53:22 real-world use cases are useful, let's keep adding those
20:53:28 :)
20:53:36 Ok, anyone have anything else before we wrap things up?
20:53:37 is anyone else particularly interested in starting an openstack-dev thread about this, or shall I still do that?
20:53:48 Just one thing maybe
20:53:59 I have been trying to think of how to write an email for like two days but still haven't been able to, but now with this conversation I think I have a good base
20:54:00 radix, you could alter the wiki
20:54:07 There was some concern that the ceilometer integration was backward incompatible
20:54:18 think about use cases
20:54:19 Could we create new resources to work around that?
20:54:22 radix: I'd keep adding to and refining the wiki, and use openstack-dev for more contentious stuff where we need to argue a bit first ;)
20:54:28 alright, sounds good
20:54:44 therve: don't know what you're talking about
20:55:00 you mean the stuff asalkeld is working on?
20:55:04 radix, Keep the current resources using the cfn monitoring
20:55:05 Yeah
20:55:28 well we want to get rid of our cw
20:55:35 it is not scalable
20:55:49 and basically sucks
20:55:50 Okay :)
20:55:52 what's cw?
20:55:58 cloud watch
20:56:00 ah. ok
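For context on replacing the in-engine CloudWatch/watchrule code: below is the rough shape of a Ceilometer threshold alarm that calls back into Heat (or a future AS service) via a webhook when it fires, which is the direction the watch-ceilometer work is heading. The field names mirror Ceilometer's alarm concepts of the time but are illustrative only; the URL and values are hypothetical, not the actual integration.

    # Rough, illustrative sketch of a threshold alarm definition whose action is
    # a webhook, instead of Heat evaluating metrics itself via watchrules.
    alarm = {
        'name': 'web-cpu-high',
        'meter_name': 'cpu_util',
        'threshold': 80.0,
        'comparison_operator': 'gt',
        'evaluation_periods': 3,
        'period': 60,
        # Pre-signed URL for the scaling policy to invoke when the alarm fires.
        'alarm_actions': ['https://heat.example.com/v1/signal/scaling-policy-1'],
    }
    print(alarm)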
20:56:08 therve: at one point we did say we were going to maintain the old stuff for one cycle, but tbh, I'm not sure if we want to do that
20:56:11 but we do need to be careful
20:56:36 migrating alarms would be _interesting_
20:56:42 if we do, then I guess we'll have to say there are limitations, like multiple engines can only work when the cw stuff is disabled
20:56:47 shardy, Fair enough. It's not like the current implementation is going to work for many people :)
20:57:18 therve: exactly, it was just a starting point, now we know there's a better way, so may as well just do that :)
20:57:44 when we created the cw/watchrule stuff, CM was just starting, so we had no choice
20:57:55 3 mins
20:58:22 shardy, any response about task work?
20:58:28 (qpid)
20:58:40 asalkeld: didn't see any, no...
20:58:41 can celery grow qpid support
20:58:47 as usual this has been a very helpful meeting
20:58:49 bummer
20:58:56 I'm sure it's possible (or oslo rpc)
20:59:02 just a question of who does it
20:59:07 yip
20:59:10 it's a blocker for now for sure
20:59:17 right out of time, thanks all!
20:59:17 celery uses kombu, but doesn't work directly with qpid last I checked
20:59:25 #endmeeting