20:00:49 #startmeeting heat
20:00:50 Meeting started Wed Jul 30 20:00:49 2014 UTC and is due to finish in 60 minutes. The chair is zaneb. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:51 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:53 The meeting name has been set to 'heat'
20:01:14 #topic roll call
20:01:18 here
20:01:20 hi
20:01:22 */
20:01:22 o/
20:01:24 o/
20:01:26 Hi
20:01:29 hey
20:01:32 good evening :)
20:01:46 shardy is on PTO
20:02:03 ryansb?
20:02:05 SpamapS: you about?
20:02:45 #topic Review action items from last meeting
20:02:54 There weren't any \o/
20:03:03 #topic Adding items to the agenda
20:03:14 #link https://wiki.openstack.org/wiki/Meetings/HeatAgenda#Agenda_.282014-07-30_2000_UTC.29
20:03:51 anything to add to that?
20:03:51 zaneb: I'd like to talk functional tests
20:03:57 ok
20:04:05 wow, pretty full agenda this week
20:04:15 indeed
20:04:18 functional tests = system tests = integration tests ?
20:04:40 mspreitz: sort of ;)
20:04:49 present
20:04:53 mspreitz: = Tempest scenario tests
20:04:55 #topic Spec approval criteria
20:05:22 it has been brought to my attention that the spec approval criteria are too wooly ;)
20:05:42 stevebaker: I do not understand why the answer is not "yes", but maybe it does not matter. We certainly need more better in tempest
20:06:21 so, to be clear: I don't think we want the PTL to be the only approver
20:06:52 my goal as PTL is to *reduce* the number of places where the PTL is the single point of failure
20:07:10 zaneb: I think some threshold rule before any core could +A
20:07:19 ... would make sense
20:07:27 zaneb: agree
20:07:28 however, it makes people nervous when I just say "use your judgement on when something has had enough discussion to approve:
20:07:33 mspreitz: functional testing and integration testing are conceptually a bit different, but practically speaking with heat there is huge overlap
20:08:05 so the proposal from asalkeld is that it would be OK to approve after 3x +2
20:08:31 obviously that wouldn't preclude using your judgement as well!
20:08:49 I think that makes sense for average specs. For something like convergence maybe a bit more?
20:08:55 so you don't have to approve after 3x +2, but nobody will yell at you if you do
20:09:04 that sounds fine, I've also been assuming that there is no barrier to start working on something which has yet to be approved
20:09:19 stevebaker: +1
20:09:43 tspatzier: the problem with that is that people become nervous about whether they should approve at all
20:09:49 for anything
20:09:49 stevebaker: possibly it will be faster in some cases :)
20:10:05 Although I hear some crazy companies insist that no work happens without an approved spec ;)
20:10:20 stevebaker: +1, although you have to accept the risk of your patch getting -2'd ;)
20:10:33 zaneb: absolutely
20:10:57 zaneb: agree. and I actually think that _had_ enough discussion. so we kind of reached threshold there. I was more thinking if something like this would come again ...
20:11:43 stevebaker: which is the Right Way IMO - the engineer working on it is best-placed to weigh the risk
20:12:20 stevebaker: o/
20:12:24 talking about specs, could someone look at https://review.openstack.org/#/c/98742 and see if this is good to go? I am especially looking at stevebaker; SpamapS had some opinion as well some time back.
20:12:25 sorry lunch went long
20:12:28 sounds like there are no objections to this plan
20:12:34 tspatzier: I think people can be relied upon to seek out wider consensus for bigger changes like that without encoding it in the rules
20:12:52 tspatzier: will do
20:13:30 stevebaker: thanks - all your comments should be addressed; the last few cycles were typos only. ... and I started implementing it ;)
20:13:50 tspatzier: ok, sorry for being tardy getting back to it
20:13:53 zaneb: yes, agree. and I was not thinking of hard rules like fixed numbers.
20:14:04 zaneb: should we vote now about solution 3x +2 ?
20:14:10 stevebaker: np
20:14:44 zaneb: can we discuss functional testing early in the agenda, I need to leave early
20:14:53 skraynev: do we need a vote?
20:15:02 nobody has spoken up with a -1
20:15:10 skraynev: lazy consensus!
20:15:17 zaneb: ok, cool :)
20:15:29 #agreed Specs can be approved after 3 +2 votes from core team members, though of course people should exercise their judgement on more major changes, just as they already do with regular reviews
20:15:30 stevebaker: yeah..
20:15:51 #topic functional testing
20:15:54 stevebaker:
20:16:39 So, it has been decided to move much project-specific functional testing out of tempest and back into the projects, because the process isn't scaling
20:16:48 \o/
20:17:12 And I've started moving the scenario tests (which use heatclient) into heat.tests.functional
20:17:21 Hooray!
20:17:32 tspatzier: FYI, my feeling on that spec has changed of late. I'll discuss w/ you in open discussion or after in #heat.
20:17:44 the heat-slow job will go away, and there will be a new heat-functional job
20:17:50 SpamapS: ok
20:17:57 I'm quite happy that we'll be more responsible for our own functional testing.
20:18:03 stevebaker can we start submitting patches for new tempest tests then?
20:18:08 the functional tests will have zero dependency on tempest, so there is a bit of forklifting required
20:18:15 I never really liked the fact that we couldn't land our own tests.
20:18:31 Seemed like an unnecessary hurdle.
20:18:41 #link https://review.openstack.org/#/q/topic:bp/functional-tests,n,z
20:18:55 there is a spec and some WIP code ^
20:19:06 stevebaker: I suppose, that other tests will be moved in heat.tests.unittests , right?
20:19:29 in the medium term there will be a tempest-lib which will contain another heat client, and then the API tests will move into the heat tree too
20:19:47 skraynev: meh, maybe
20:20:08 so what I actually wanted to discuss was...
20:20:08 stevebaker: that last part seems unnecessary from our perspective
20:20:18 (moving the API tests)
20:20:33 zaneb: yeah, I'm not concerned about the timeline for that
20:21:13 so as I'm currently implementing the functional tests, the following is happening:
20:21:44 there will be a "functional" env defined in tox.ini which only runs the functional tests
20:22:20 there will be no configuration file to specify cloud-specific options (although it does use oslo.config)
20:22:47 stevebaker, How "functional" will it be? devstack-ish?
20:23:10 therve: it's using devstack AIUI
20:23:15 functional tests will run if credentials are sourced, even for unit tests. If no credentials are sourced, functional tests are skipped
20:23:20 Oh, remote functional testing?
20:23:46 therve: devstack will configure things for the functional tests to run. They will assume a full working cloud
20:24:34 if glance doesn't have the exact named image required for a test, the test will be skipped
20:24:44 stevebaker, So devstack configures heat, but not the other services?
20:25:15 We should probably agree on a few standard image names
20:25:33 so one thing I wanted discussion on is skipping vs failing when test preconditions are not met (credentials, images)
20:25:50 mspreitz: cirros and not-cirros ... done. ;)
20:26:17 BTW, is that image-existence-checking covered in general, or is it only looking at parameters named image?
20:26:20 stevebaker: will it be voting job?
20:26:21 therve: no, devstack configures a full cloud, and the functional tests bring up all manner of full stacks
20:26:32 skraynev: for heat, yes
20:26:39 stevebaker: seems like the default should be to fail if missing whatever would be in the gate/check environment.
20:26:51 stevebaker: TBH I'd prefer that we _not_ run functional tests when running the unit tests, and fail if the preconditions are not met
20:27:17 mspreitz, SpamapS: image name could be something like heat-functional-test-image
20:27:21 and yea, tox -epy27 shouldn't run functional tests.
20:27:34 but 'tox' should
20:27:35 SpamapS: +1
20:27:41 SpamapS++
20:27:43 SpamapS: +1
20:28:19 In that case we should probably tag individual tests as functional so we can do the same tox filtering as tempest
20:28:22 Will one image suffice for all the functional tests?
20:28:43 SpamapS +1
20:28:46 stevebaker: is there any value in keeping functional tests under heat.tests.*
20:28:47 ?
20:29:05 given that it's not actually using anything in heat, just heatclient
20:29:15 mspreitz: probably. It can have all the things installed on it with diskimage-builder
20:29:16 why not make it a separate package?
20:29:56 zaneb: I was wondering the same thing.
20:30:08 zaneb: heat.tests.functional seems to be an established convention in other projects already
20:30:22 but I have no strong feelings on that
20:30:23 It will be more cleanroom-ish if it is in its own repo
20:30:34 SpamapS: repo?!
20:30:37 no importing heat.common :)
20:30:41 stevebaker: yes
20:30:46 nooooooo
20:30:47 stevebaker: I would guess that they're doing something different with it
20:31:13 One could argue the best place for it is in heat-templates
20:31:45 SpamapS: there are advantages to a two-step commit process, but IMO only for the API tests
20:31:47 SpamapS: I do think it is important to keep it in tree, if for no other reason than to make it visible to heat developers. I will be doing a lot of "-1, write functional tests for this"
20:32:18 Will we have to write both functional and unit tests for everything?
20:32:27 mspreitz: absolutely, yes
20:32:29 Will functional test be enough, no unit test required?
20:32:41 I guess not
20:32:42 mspreitz: I mean, it depends on the feature
20:32:45 SpamapS: I do understand the suggestion, but I would vote for separate package in same repo
20:33:03 Yeah that's fine. I think it will make it easier to write tests, so +1
20:33:05 stevebaker: OMG. more and more tests... :)
20:33:28 Honestly we should sit down and review the tests we have, and see how many of them are actually functional.
20:33:34 mspreitz: a new resource needs full unit test coverage, and a functional test to prove it actually creates a real cloud resource
20:33:36 You know, scenario tests cover stuff outside Heat too
20:33:46 That's actually necessary as we move further into convergence, since some of the functional-ish unit tests will have to be exploded anyway.
20:33:49 stevebaker: it will be look like 5 strings main changes and 100 for tests :)
20:34:19 Anyway, sounds like we all agree and can move on?
20:34:21 any non-trivial template brings up things outside heat, so they're sort-of integration tests
20:34:38 stevebaker: if we're ready to move on, the next topic is peripherally related...
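A minimal sketch of the skip-versus-fail choice raised in the functional testing topic, for readers of this log. It is not the code under review: the helper names, the `strict` switch, and the decision to treat exactly these four standard OpenStack client variables as "credentials are sourced" are all assumptions made for illustration.

```python
import os
import unittest

# Standard OpenStack client credential variables. Treating exactly this set
# as "credentials are sourced" is an assumption of this sketch.
REQUIRED_ENV = ("OS_USERNAME", "OS_PASSWORD", "OS_TENANT_NAME", "OS_AUTH_URL")


def missing_credentials(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV if not environ.get(name)]


def check_preconditions(environ=None, strict=False):
    """Skip (strict=False) or fail (strict=True) when credentials are missing."""
    missing = missing_credentials(environ)
    if not missing:
        return
    message = "credentials not sourced: " + ", ".join(missing)
    if strict:
        # Fail loudly, as one would want in a gate/check environment.
        raise AssertionError(message)
    # Otherwise report the test as skipped rather than failed.
    raise unittest.SkipTest(message)


class FunctionalTestCase(unittest.TestCase):
    """Hypothetical base class for functional tests."""

    # Hypothetical switch: a gate job would set this to True.
    strict = False

    def setUp(self):
        super().setUp()
        check_preconditions(strict=self.strict)
```

With `strict = False` a developer without a cloud sees skipped tests; with `strict = True` the same missing precondition is reported as a failure, which matches the preference voiced in the meeting for failing when something the gate environment should provide is absent.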
20:34:45 sure
20:35:10 #topic Heat Gap Coverage plan
20:35:21 #link https://wiki.openstack.org/wiki/Governance/TechnicalCommittee/Heat_Gap_Coverage
20:35:27 there is our plan ^
20:35:39 the TC are happy with it
20:35:55 we need an owner and target for #3
20:36:19 if only there were a clear expert in this area...
20:36:23 well that's not so bad.
20:36:25 \o, but it will need to be rewritten for the brave new functional test world
20:36:26 * zaneb strokes chin
20:36:47 stevebaker: cool, thank you, and feel free to rewrite
20:36:55 OK, will do
20:37:03 we also need to target a milestone
20:37:19 which is weird because this is a task that is never really "done"
20:37:21 cool
20:37:38 I think the TC would like it to be juno-3
20:37:48 stevebaker: do you think that's realistic?
20:37:50 I'd like to think we can continue writing tests during feature freeze
20:37:51 (I don't)
20:38:09 kinda hinges on wether tests count as a feature
20:38:12 he-he. Now I know, that never = juno-3
20:38:24 It would be nice to have *a* voting job which runs something by juno-3
20:38:54 stevebaker: agreed
20:39:10 I don't know whether we could consider it "done" at that point though
20:39:10 I wonder how much infra work we'll need. That may be the bottleneck
20:39:16 maybe an appropriate milestone is juno release
20:39:20 the TC review at every milestone anyway
20:39:29 for progress
20:39:55 I don't feel like *a* voting job is too lofty a goal for j3
20:40:41 ryansb: no, it shouldn't be ;) It will be interesting to see how quick we can add the tests which have been lingering in the tempest reviews
20:40:52 stevebaker: ok, feel free to update the target as you see fit when you're rewriting it
20:41:09 #topic Scaling group member health maintenance
20:41:17 mspreitz: this is you
20:41:27 I'd like to get something done about this in Juno-3
20:41:29 * zaneb still hasn't finished the ML thread on this :/
20:41:34 zaneb: OK. The TC will end up with something radically different to approve, but I'm sure they'll be fine with that
20:41:49 I want to avoid junking up the interface,...
20:41:56 follow AWS's lead, lean interface...
20:42:04 stevebaker: that's fine. it should take all of 3 minutes to re-review
20:42:05 you automatically get something simple if you say nothing.
20:42:21 mspreitz, I looked at the spec, and it's nothing but simple :/
20:42:49 Also, I'm not sure we should go in the direction of adding monitoring to Heat
20:42:52 therve: maybe language issue, not sure what you mean. I think the interface is simple.
20:42:59 I am also concerned about the impl
20:42:59 -1 no monitoring in Heat
20:43:04 connect the API's
20:43:06 thats what we do
20:43:07 We barely removed what we add
20:43:14 added
20:43:16 Problems abound when I try to figure out how to outsource the monitoring
20:43:29 mspreitz: I'm concerned that we're going to spend a lot of time on something that is only applicable to autoscaling, when the same effort put into convergence fixes the problem everywhere
20:43:35 If we need actual monitoring, we should be asking a monitoring as a service to do it
20:43:45 mspreitz, I feel that the spec tried to address too many concerns at once
20:44:06 zaneb: not sure about that size comparison. I already have something running.
20:44:15 Also, we tried to decouple autoscaling and load balancing, and you're putting it right back
20:44:19 #link https://review.openstack.org/#/c/110354/1/specs/scaling-group-health.rst
20:44:58 therve: I think users want the option to have health determined by load balancer
20:45:14 therve: it's a better test than Nova existence and status
20:45:38 mspreitz, Sure, that doesn't mean the autoscaling group should know about the load balancer
20:45:47 It should be notified about the health of the members
20:46:11 So, no generalization of Ceilometer alarm actions in Juno-3
20:46:23 No adding notifications to Load Balancer in Juno-3
20:46:27 I already asked about those
20:46:51 Adding notifications to load balancer is a maybe for Kilo
20:47:13 wait, why is Heat monitoring your infrastructure at all (for any other reason than convergence)? Isn't that what you should set up via your template?
20:47:14 mspreitz, What do you mean, you asked in neutron and ceilometer projects?
20:47:29 I asked colleagues in those projects, yes
20:47:45 Okay, but Heat shouldn't take the burden of refused/postponed features
20:47:48 Heat isn't monitoring anybody's infrastructure.
20:48:21 I like Convergence, but (a) it will be a while before it is ready, and (b) it does not attempt to cover the whole problem
20:48:26 randallburt: I think what mspreitz is proposing is that you get some reasonable behavior comparable to AWS also if you don't do rocket science in a template
20:48:26 Convergence is not automatic
20:48:27 Heat does, however, query the state of API's it has requested things from, for the purpose of either re-requesting them or satisfying dependencies. Where we got this notion that it will monitor things, I don't really know.
20:48:39 Convergence does not pay attention to load balancer's opinion of health
20:48:50 Convergence does not take external advice on health
20:48:57 mspreitz: so write something that is automatic. Teach Heat to talk to it.
20:49:06 SpamapS: That's what I thought. I mean, convergence will want to know when reality doesn't match the template, but as far as the "health" of the service/infra you used heat to spin, up, Heat shouldn't monitor that.
20:49:41 randallburt: Heat cannot monitor that. Nor will it connect the dots from failure A to infrastucture piece B.
20:49:47 but Heat doesn't care about the semantics of the things it deploys and shouldn't; that should be expressed in your template using existing services IMO.
20:49:54 mspreitz: so if your external monitoring thing decides something is unhealthy, kill it and convergence will replace it
20:50:15 SpamapS: yeah, that's what I thought.
20:50:17 AutoScale is just a special case where we have simplified hooks for changing the scale of a group.
20:50:22 zaneb: no, convergence is not automatic
20:50:37 convergence is not
20:50:42 period
20:50:42 scaling group are where the user is NOT in control of the template
20:50:55 mspreitz: the idea of convergence is that it is automatic
20:50:55 but it will be, eventually, continuous, once we agree on how that should work.
20:51:57 * mspreitz is a little lost in parallel threads of conversation
20:52:09 SpamapS: Convergence will eventually be *what* ?
20:52:22 openstack IRC meeting
20:52:37 mspreitz: the point is that convergence, even when it is automatic, isn't going to try to provide ways that it can infer that if your response time is high that means your DB needs another slave or your app server farm needs another node. That is for the monitoring as a service bits to define. Heat just encapsulates the group.
20:53:10 mspreitz: continuous.. meaning eventually it will respond to the nova notification that your node went down
20:53:41 that is the last in the string of 3 sub-specs
20:53:43 SpamapS: in the interim while Convergence is not automatic, scaling a group does not replace a member that is deleted or sick, right?
20:53:53 mspreitz: right.
20:54:41 if you have an immediate need for bolting that on, I'd suggest doing so as a resource plugin
20:54:41 So a scaling group will accumulate junk (deleted or sick members) over time
20:54:47 or an external API
20:54:55 that's the current state
20:55:11 ((5 minutes))
20:55:33 SpamapS: may continue in #heat ?
20:55:38 Let me ask a few ground rules...
20:55:48 OK to add an index or table to the DB schema?
20:56:00 mspreitz: let's take this back to #heat
20:56:05 OK to put implementation stuff in the template that implements a scaling group?
20:56:10 we have another agenda item to deal with today
20:56:16 zaneb: OK
20:56:25 ack
20:56:26 #topic Using rally for Heat benchmark and quotas patches
20:56:32 ok
20:56:39 boris-42: o/
20:56:48 zaneb hi there
20:56:54 1. I told about short article: https://docs.google.com/a/mirantis.com/document/d/1s93IBuyx24dM3SmPcboBp7N47RQedT8u4AJPgOHp9-A/edit#heading=h.jr3hotxxhpit
20:57:14 sorry, it's not published yet, so just google doc.
20:57:21 ick
20:57:53 it's example how to use rally for heat performance
20:58:09 and second part about adding rally job for Jenkins reports
20:58:27 2 minutes
20:58:30 ok
20:58:36 zaneb I'll be here=)
20:59:01 skraynev: was this leading to a question?
20:59:07 We definitely need some benchmarks
20:59:17 metadata access was REALLY bad, it is just "kind of bad" now
20:59:30 +1 for benchmarks
20:59:31 anyway, we're out of time
20:59:40 to #heat!
20:59:50 zaneb: one question was about quotas patches.
20:59:58 * SpamapS steps outside into 98F weather ..
21:00:04 got to #heat...
21:00:06 skraynev: ok, let's discuss in #heat
21:00:10 #endmeeting