#openstack-meeting log

04:01:55 <ekcs> #startmeeting congressteammeeting
04:01:56 <openstack> Meeting started Fri Feb 22 04:01:55 2019 UTC and is due to finish in 60 minutes.  The chair is ekcs. Information about MeetBot at http://wiki.debian.org/MeetBot.
04:01:57 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
04:01:59 <openstack> The meeting name has been set to 'congressteammeeting'
04:02:31 <ekcs> hi. topics here as usual: https://etherpad.openstack.org/p/congress-meeting-topics
04:02:38 <akhil_jain> ekcs: Hi!
04:02:52 <ekcs> hi akhil_jain ! happy friday hope things are well.
04:03:25 <akhil_jain> Happy friday, yes everything fine. what about you?
04:03:35 <ekcs> I’m doing alright!
04:03:57 <ekcs> ok so first a quick reminder: feature freeze
04:03:58 <ekcs> 7 March 2019
04:04:11 <ekcs> RC1 due 21 March 2019
04:04:38 <ekcs> let’s dive in to the topics then.
04:04:47 <ekcs> #topic Managing alarms
04:04:54 <ekcs> akhil_jain: I assume that’s your topic?
04:05:35 <akhil_jain> Yes right, ekcs, as already discussed by mail. In my view there can be a faulty situation
04:07:41 <ekcs> yup. do you have a sample policy and scenario to help our analysis?
04:07:43 <akhil_jain> My use case: an alarm is raised and stays in congress in the ACTIVE state. Say the operator created policy1 and handled the alarm situation, but the alarm state is still ACTIVE. When he then creates policy2, it can execute actions again based on that alarm
04:08:07 <akhil_jain> which may have been handled by  policy1
04:09:31 <ekcs> right. ok I pulled up our email.
04:09:41 <akhil_jain> it's 1.3 according to your mail. the one where you suggested making a new table
04:09:43 <ekcs> so the situation is either 3 or 4 in the email
04:09:44 <ekcs> 3. Alarms which had been activated and triggered some action, but the
04:09:45 <ekcs> alarm remains active because the action do not resolve the alarm.
04:09:46 <ekcs> 4. Alarms which had been activated and triggered some action, and the
04:09:47 <ekcs> action is in the process of resolving the alarm, but in the mean time the
04:09:48 <ekcs> alarm remains active.
04:10:09 <akhil_jain> yes right
04:11:22 <ekcs> so.. i want to consider a slightly different scenario. what if policy1 and policy2 have been active the whole time. that seems like the usual case.
04:11:32 <akhil_jain> also we can publish that discussion on openstack-discuss to get input from various developers as well
04:12:00 <ekcs> then when the alarm is activated, it would trigger action from BOTH policy1 and policy2
04:12:28 <akhil_jain> yes right even that can be harmful in various scenarios
04:13:01 <ekcs> ok. so let’s focus on the simple case first that does not have to do with timing and ordering.
04:13:29 <akhil_jain> like 1 policy pausing vms and other evacuating
04:13:30 <ekcs> how would you like the system to behave in the case that alarm activates while both policy1 and policy2 are in place?
04:14:40 <akhil_jain> thats a tough question :D anyways i would like only one policy to be executed on one alarm
04:15:12 <akhil_jain> that will cause issue of priority
04:16:04 <ekcs> I see. so… in my opinion in this case it’s up to the policy writer to write the policy in a way which decides what happens.
04:16:48 <ekcs> for example, if the writer wants both actions to trigger, then she can write execute[action1] :- alarm1; execute[action2] :- alarm2
04:17:50 <ekcs> sorry i meant execute[action1] :- alarm1; execute[action2] :- alarm1
04:18:21 <ekcs> however, if she does not want both to trigger, then she can write something like execute[action1] :- alarm1, severity('low'); execute[action2] :- alarm1, severity('high')
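The two rule styles above can be sketched in Python. This is only an illustration, not Congress code or Datalog: the `Alarm` record, its `severity` field, and both helper functions are hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    name: str
    severity: str  # hypothetical field, 'low' or 'high'


def actions_unguarded(alarm):
    """Mirrors: execute[action1] :- alarm1; execute[action2] :- alarm1"""
    actions = []
    if alarm.name == "alarm1":
        actions.append("action1")
    if alarm.name == "alarm1":
        actions.append("action2")
    return actions


def actions_guarded(alarm):
    """Mirrors: execute[action1] :- alarm1, severity('low');
    execute[action2] :- alarm1, severity('high')"""
    actions = []
    if alarm.name == "alarm1" and alarm.severity == "low":
        actions.append("action1")
    if alarm.name == "alarm1" and alarm.severity == "high":
        actions.append("action2")
    return actions
```

With the unguarded rules a single alarm triggers both actions; with the mutually exclusive severity guards at most one action fires per alarm.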
04:20:39 <akhil_jain> i am not sure the policy writer can evaluate which alarm to take action on. they will just get a list of alarms. suppose one alarm is 15 days old and still says status=active. it was raised because compute node1 was down, but node1 is actually up now because the operator resolved the issue. acting on that alarm, the policy writer can harm that node or the vms created on it
04:22:15 <akhil_jain> saving alarms and computing actions on those is a bit complex and can be wrongly used in real time scenarios.
04:22:46 <ekcs> ok that makes sense.
04:23:00 <akhil_jain> reaching one solution will involve other communities as well i think. maybe monasca, aodh and others
04:23:03 <ekcs> so would 1.3 solve the problem? one table for all active alarm. another table for most recent alarms.
04:23:52 <akhil_jain> yes maybe one table for all alarms and other for those on which policy is executed. i am not sure
04:24:28 <akhil_jain> or one field of policy_executed with alarm. dont know just a thought
04:25:12 <ekcs> yea there are several possibilities.
04:25:43 <ekcs> but i’m still trying to isolate the problem. is it distinguishing between new and old? or distinguishing between action taken and action not taken.
04:25:59 <akhil_jain> also gmann wants to discuss it in PTG if you are planning one
04:26:09 <ekcs> got it.
04:26:12 <ekcs> that’s great.
04:26:39 <akhil_jain> i would say both are the possibilities
04:26:58 <akhil_jain> 1.new n old
04:27:10 <akhil_jain> 2. action taken or not
04:28:36 <akhil_jain> hopefully i will be available too. for the PTG
04:29:03 <ekcs> ok. hopefully we can work through examples to see whether both are needed or just one. we can certainly implement things for both, but if one suffices then it’d be good to know it.
04:30:44 <ekcs> in general, I think it’s a good idea to leave the flexibility in the hands of the policy writer.
04:31:18 <ekcs> perhaps adding congress functions as possible actions would be one solution.
04:32:57 <akhil_jain> i didnt get the last point
04:33:07 <ekcs> something like this:
04:33:09 <ekcs> policy1:
04:33:10 <ekcs> execute[add tuple handled('alarm1') to congress] :- alarm1
04:33:11 <ekcs> execute[action1] :- alarm1, NOT handled('alarm1')
04:33:12 <ekcs> policy2:
04:33:13 <ekcs> execute[add tuple handled('alarm1') to congress] :- alarm1
04:33:14 <ekcs> execute[action2] :- alarm1, NOT handled('alarm1')
04:34:32 <ekcs> that’s a way for the policy writer to say: remember which alarm was already handled, and don’t do something again on the same alarm.
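A rough Python sketch of the "remember what was handled" idea, for a single policy evaluated on repeated polls. The `run_policy` function and the in-memory `handled` set are hypothetical stand-ins for the proposed `handled(...)` table; this is not how Congress is implemented.

```python
def run_policy(active_alarms, handled, action):
    """Evaluate one policy against the currently active alarms.

    Executes `action` only for alarms not yet recorded in `handled`,
    then records every active alarm as handled.
    """
    # Rules see the handled table as it was at the start of the evaluation.
    snapshot = frozenset(handled)
    executed = [(action, alarm) for alarm in sorted(active_alarms)
                if alarm not in snapshot]
    handled.update(active_alarms)
    return executed


handled = set()
first_poll = run_policy({"alarm1"}, handled, "action1")
second_poll = run_policy({"alarm1"}, handled, "action1")
# A policy created later sees alarm1 as already handled.
later_policy = run_policy({"alarm1"}, handled, "action2")
```

Here the action runs on the first poll only, and a policy added afterwards does not act on the stale alarm.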
04:35:11 <akhil_jain> right seems good
04:36:58 <akhil_jain> adding another table can help this. also the present behavior will not be affected
04:37:34 <ekcs> well. unfortunately that still doesn’t solve the problem when both policy are active at the same time. the alarm will trigger both at the same time.
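The limitation can be illustrated with a small Python sketch, assuming (as a simplification, not a statement about Congress internals) that all rules in one evaluation cycle see the same snapshot of the `handled` table. The `evaluate_cycle` function is hypothetical.

```python
def evaluate_cycle(active_alarms, handled, policy_actions):
    """Evaluate every policy against the same snapshot of `handled`."""
    snapshot = frozenset(handled)
    executed = [(action, alarm)
                for action in policy_actions
                for alarm in sorted(active_alarms)
                if alarm not in snapshot]
    # The handled tuples only become visible to the next cycle.
    handled.update(active_alarms)
    return executed


handled = set()
cycle1 = evaluate_cycle({"alarm1"}, handled, ["action1", "action2"])
cycle2 = evaluate_cycle({"alarm1"}, handled, ["action1", "action2"])
```

In the first cycle neither policy sees the alarm as handled, so both actions fire; only later cycles are suppressed.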
04:38:30 <ekcs> i hope to understand some more why policy writers cannot make sure to add conditions to their rules so that one alarm cannot trigger both actions.
04:38:47 <ekcs> who are the ones expected to write the policy?
04:39:55 <akhil_jain> i am not sure about that. i am not much into deployment side
04:41:05 <ekcs> ok. is there any way to find out who writes the policy in the projected use case? I think the policy workflow really changes our solutions.
04:42:03 <ekcs> for example, if it’s one “person” writing all the policy, then it’s not hard to make sure multiple rules don’t trigger on the same alarm.
04:42:29 <ekcs> but if it’s many different “people” writing policy independently, then it’s very hard to make sure of the same thing.
04:43:20 <ekcs> another factor is: how often are the policies changed? are they expected to be changed frequently by the operator? or more just operate the way they’re deployed?
04:44:35 <ekcs> oh and yes it’s ok to share that email on ML.
04:45:08 <akhil_jain> yes, there are multiple serious cases.
04:45:30 <ekcs> I think maybe the next step is to start an etherpad or something to start documenting the problem scenarios and also possible solutions.
04:46:19 <ekcs> and the closer we can understand the real policies and policy writing workflow the better we can solve the problems =)
04:46:26 <akhil_jain> yes, every time we reach one solution, it gets altered again on thinking about it the next time
04:47:14 <ekcs> haha well this is an important and interesting problem. glad we’re discussing and hopefully solving.
04:47:31 <ekcs> should we move on to see if we can squeeze in the other topics before time’s up?
04:47:47 <akhil_jain> yes right
04:47:54 <ekcs> ok =)
04:47:57 <ekcs> #topic Adding created_at in nova servers
04:48:59 <ekcs> wanna start us off akhil_jain ?
04:49:02 <akhil_jain> oh yes. you already said it's good to go. but is it possible with the current code to calculate whether a server is older than one month or so
04:49:29 <akhil_jain> just adding created_at will be enough?
04:50:34 <akhil_jain> i am not sure if operators will work
04:51:22 <ekcs> oh hmm. is the created_at reported by nova api?
04:51:27 <akhil_jain> yes
04:51:40 <akhil_jain> field name is created
04:52:04 <ekcs> why do we need to calculate then?
04:52:15 <ekcs> sorry just clarifying so I understand what the considerations are.
04:52:35 <ekcs> ooooh
04:52:37 <ekcs> I get it now.
04:52:44 <ekcs> we’re going to add the field.
04:52:52 <akhil_jain> policy automatically calculating if servers are old enough and executing action on them
04:53:06 <ekcs> but the policy writer wants to find out whether a server is more than 1 month old. that requires builtins to handle parsing and calculating time.
04:53:18 <akhil_jain> yes
04:53:38 <ekcs> right. I’ll need to check the current builtins to see whether it handles the case.
04:53:45 <ekcs> i’ll get back to you on that.
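For reference, the age check a policy writer needs looks roughly like this in Python. It is a sketch of the calculation only, not a Congress builtin: `is_older_than` is a hypothetical helper, and the timestamp format (UTC, `YYYY-MM-DDTHH:MM:SSZ`) is an assumption about how the `created` field is reported.

```python
from datetime import datetime, timedelta, timezone


def is_older_than(created, max_age, now=None):
    """Return True if the `created` timestamp is more than `max_age` ago.

    `created` is assumed to be a UTC string like '2019-01-22T04:01:55Z'.
    """
    created_at = datetime.strptime(
        created, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return now - created_at > max_age


# Example with a fixed "now" so the result does not depend on the clock.
meeting_time = datetime(2019, 2, 22, 4, 52, tzinfo=timezone.utc)
old_server = is_older_than("2019-01-01T00:00:00Z", timedelta(days=30), meeting_time)
new_server = is_older_than("2019-02-10T00:00:00Z", timedelta(days=30), meeting_time)
```

A policy builtin would need the same two pieces: parsing the timestamp string and comparing the difference against a duration.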
04:54:08 <akhil_jain> ok great, thanks
04:54:27 <akhil_jain> last topic then, about tacker test
04:54:37 <ekcs> btw this is one case for why i have high hopes for the postgres version. they have already built all the builtins. and even when they haven’t someone online already built an extension for it.
04:54:58 <akhil_jain> yes i guess that can solve this
04:55:12 <ekcs> much harder for us to keep up with what’s needed and much harder for the users when we use our own language and policy engine.
04:55:20 <ekcs> ok anyway moving on then like you said
04:55:23 <ekcs> #topic tacker test
04:55:32 <ekcs> so a couple things.
04:55:50 <akhil_jain> i saw your patch. i will test using the same and add another test for vnf as well
04:55:52 <ekcs> 1. i added zuul config to enable tacker plugin in CI job. but it’s failing right now.
04:56:03 <akhil_jain> yes saw that
04:56:15 <ekcs> i’m not sure why yet.
04:56:50 <ekcs> and yes I added a patch just to test whether the generic approach works for this case. I think it should but haven’t been able to tell because of the devstack plugin failure.
04:56:51 <ekcs> https://review.openstack.org/#/c/638516/1/congress_tempest_plugin/tests/scenario/congress_datasources/test_tacker.py
04:57:14 <akhil_jain> hm that's why i will test it on my env and let you know
04:57:32 <ekcs> ok great!
04:57:37 <akhil_jain> earlier i didn't understand. now got it. thanks
04:57:45 <ekcs> awesome.
04:58:03 <ekcs> one last thing is:
04:58:53 <ekcs> if you have features you want to finish by feature freeze and don’t have time for the tacker tempest test, then we could merge the driver by FF and then merge the tempest test before RC1. not preferred but in a pinch it can be done.
04:59:06 <ekcs> just keep that in mind in prioritizing what you’d like to do =)
04:59:44 <ekcs> that’s all from me.
05:00:10 <akhil_jain> yes any reviews on tacker driver. i can complete that first. plus i am completing tempest test as well
05:00:54 <ekcs> ok. I think the tacker driver is good to go. I was just waiting for tempest before merging. but we can merge that first if tempest is not ready by FF.
05:00:55 <akhil_jain> nothing else from my side.
05:01:12 <ekcs> ok then. well time’s up too.
05:01:15 <akhil_jain> ok sounds good to me
05:01:17 <ekcs> #endmeeting