16:00:50 #startmeeting Mistral
16:00:51 Meeting started Mon Jan 9 16:00:50 2017 UTC and is due to finish in 60 minutes. The chair is rakhmerov. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:52 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:55 The meeting name has been set to 'mistral'
16:01:27 hi
16:01:32 hi
16:01:37 Hello!
16:01:44 hello-hello )
16:01:48 o/
16:02:25 ok, let's begin
16:02:57 we haven't had meetings I guess at least for a month
16:03:46 so, essentially I would like to touch base with you a little bit after long holidays
16:04:06 we haven't communicated for about two weeks
16:04:52 #topic General syncup after long holidays
16:04:58 o/
16:05:02 hi
16:05:20 so, any news that you'd like to share?
16:05:25 anything?
16:05:30 d0ugal, ddeja, sharatss
16:05:42 I was focused on gate fixing
16:05:51 Nothing much from me, with the holidays it has been very quiet.
16:05:51 ok
16:06:03 nothing from me too
16:06:09 I do hope to spend some time with rbrady looking at the custom actions work
16:06:09 ddeja: how is it going with the gate?
16:06:11 became active only today
16:06:12 and for the last 3 weeks I'd like to focused on making kombu driver multi-thread support
16:06:26 He put a patch up that needs reviews: https://review.openstack.org/#/c/411412/
16:06:27 rakhmerov: apart from the sshd_proxied action it is OK now
16:06:37 oh, yes, thanks d0ugal
16:06:40 ddeja: yes, it is important
16:07:18 hello
16:07:21 rbrady: we are just doing a quick catchup and I mentioned that I plan to help you with custom actions
16:07:25 ddeja: ok, then we can probably disable this test for now and make the gate voting?
16:07:25 rbrady: and I linked to your review.
16:07:26 d0ugal: oh, I was thinking it is a link to another patch...
16:07:28 rbrady: hi
16:07:37 thanks d0ugal
16:07:41 rakhmerov: the regular one - yes
16:07:44 ddeja: hah, I am curious which one :)
16:08:01 ddeja: I'm just afraid that if we make it voting it may give us huge troubles once in a while
16:08:17 if we are ready to take this risk then it's ok
16:08:19 rakhmerov: the kombu one, I'd like to wait until the end of this development cycle
16:08:27 rakhmerov: HM
16:08:32 yes, that's understandable
16:08:35 I think maybe we can wait till Pike?
16:08:37 rakhmerov: the risk isn't that big, we can always make it non-voting :)
16:08:38 with both
16:08:51 yes
16:08:56 I would go this path
16:09:08 and start new cycle with new gate
16:09:19 d0ugal: it's a pretty big patch
16:09:21 maybe I would be able to also fix the sshd_proxied test till then
16:09:30 rakhmerov: Which one?
16:09:33 #action rakhmerov: review https://review.openstack.org/#/c/411412/
16:09:42 d0ugal: https://review.openstack.org/#/c/411412/
16:09:50 :)
16:09:51 rakhmerov: yeah, that one. I think much of it is copied from Mistral but rbrady can explain it more.
16:09:58 I have only reviewed it
16:10:39 ok
16:10:56 that patch is WIP, but more feedback and discussion would probably be good. I started splitting some files up that may or may not be a good thing
16:11:04 so I guess it's pointless now to open our regular topic "Current status"
16:11:12 it's pretty much clear who is doing what
16:11:25 rbrady: ok
16:11:57 I'd like to take a look at it first but if you can give some general tips that could help review it please do
16:12:30 ddeja: btw, Dawid, there's another 'rerun' test that fails sometimes
16:12:36 for direct workflow
16:12:39 rakhmerov: Oh, ok
16:12:42 yep
16:12:42 I'll take a look
16:12:50 the namespaces have been changed a bit from what was laid out in the spec. following more of a pythonic approach.
16:12:50 I saw it failing pretty often
16:13:09 rbrady: ok
16:13:25 btw, guys, do you already know if you're going to the PTG?
16:13:36 I am...booking hotel and flight today
16:14:00 * d0ugal isn't going
16:14:04 rbrady: planning to join our sessions? At least partially
16:14:18 yes
16:14:27 d0ugal: I see, it's bad :(
16:14:31 rbrady: ok
16:14:38 * ddeja doesn't know yet...
16:14:54 ddeja: it would be cool if you could do that
16:15:17 rakhmerov: yeah, it is a shame. I shall try and add my input remotely if I can :)
16:15:41 d0ugal: maybe we can have you there via hangouts or something
16:16:02 as for this actions stuff, it's definitely a topic for the PTG, I want to define a goal to figure out all principal things related to it
16:16:03 rbrady: hah, that could be cool - or I can just read the notes and add questions for you all to answer ;)
16:16:12 * rbrady carries around virtual d0ugal at PTG
16:16:25 :)
16:17:03 http://www.conowego.pl/uploads/pics/DoubleRobot-660x440.jpg
16:17:20 I think somebody used one of those at a previous summit...
16:17:28 but now we are just getting off topic :-D
16:17:35 yeah, I saw those folks :)
16:17:45 it's pretty cool
16:18:21 in case you didn't see it in ML, this is an etherpad for the PTG plans: https://etherpad.openstack.org/p/mistral-ptg-pike
16:19:11 let's focus on gathering all challenges that we need to discuss in this etherpad
16:19:28 OK
16:20:34 ddeja: btw, could you please remind me something? Those periodic kombu gate failures are related exactly to the fact that our kombu RPC server is not thread-safe, right?
16:20:39 or you are not sure?
16:20:56 I remember we were discussing it but don't remember the conclusion
16:21:26 rakhmerov: yes, they are related to the fact that it is not thread safe - and here is the fix https://review.openstack.org/#/c/414533
16:21:57 (another gate failure is due to sshd_proxied action...)
16:22:11 ooh, awesome
16:22:22 so do you think it's finished? Ready to be reviewed?
16:22:31 I think so
16:22:34 * rakhmerov needs to review so much..
16:22:47 I spent 2 days looking into o.m code
16:22:53 and made mine similar
16:22:59 :))
16:23:01 I see
16:23:04 great
16:23:23 hi guys
16:23:25 d0ugal: how about your time thing?
16:23:32 hope not to disturb you
16:23:39 do you think you found the solution?
16:23:50 tuan__: chime in, np
16:23:56 :)
16:24:06 rakhmerov: ddeja found an issue with it, I understand it now I think - so I just need to update it again.
16:24:10 okes, can i say something about our problem
16:24:22 d0ugal: ok
16:24:31 tuan__: sure, go ahead
16:24:35 cool
16:24:42 so, like this:
16:24:50 d0ugal, rakhmerov: yes, IMO old code was OK, just tests need to be rewritten, but I may be wrong
16:24:59 when we have such sophisticated actions
16:25:20 ddeja: is there a patch for this already or you only found a reason?
16:25:34 ddeja: I think the old code is okay, but I think it is confusing. I don't think we should be passing around local datetimes anywhere.
16:25:35 if they fail, mistral will return a huge error with input parameters and the root cause of the action
16:25:36 tuan__: ok, like what?
16:26:06 the problem is that: with the very long error returned by mistral, it is hard for the operator to understand the root cause
16:26:17 it is based on the requirement of ETSI in telco
16:26:33 I've noticed this before. Tracking down where the problem is can be tricky
16:26:34 rakhmerov: there is a patch from d0ugal, it fixes it on his env (Scotland, so UTC0), but it breaks things on mine (Poland, so UTC+1)
16:26:42 stack trace must be human-readable
16:27:01 ddeja: :)
16:27:03 ok
16:27:09 ddeja: the real question is why does it behave differently on my machine and CI :)
16:27:09 e.g.: mistral will return something like this: Action failed....blabla
16:27:21 and then after that the root cause of the action
16:27:31 rakhmerov, tuan__ I've also noticed it, sometimes even errors can be very misleading
16:27:44 and lead to another action by mistake
16:27:50 tuan__: quick question
16:27:56 I guess it's a perfect topic for PTG
16:27:56 go ahead
16:28:06 rakhmerov: i am listening
16:28:22 ddeja: That is what we have to deal with
16:28:28 ddeja: yes, moreover, we already discussed this with some folks from Nokia and others
16:28:29 to make it human-readable
16:28:49 tuan__: can you give a very specific example?
16:28:58 with very sophisticated actions, the error returned is quite long
16:28:58 what action, what kind of failure?
16:29:03 and what Mistral returns
16:29:17 well, it is the custom action
16:29:17 sophisticated like what?
16:29:30 yep, just a second..
16:29:43 i am so sorry that i could not tell more since it is the policy
16:29:57 the reason I'm asking is that actions are allowed to return a structured result even in case of a failure
16:30:31 you can imagine that, we want mistral to run action that deploy hundreds of vms for specific vnf
16:30:47 therefore the input parameter of mistral is quite long
16:30:53 tuan__: ok, np, I understand. Then I'd like to ask you to come up with an example which is to some extent similar to your real one
16:30:55 when this action fails
16:31:12 tuan__: not necessary now (we don't have too much time)
16:31:21 mistral returns the error of something: Failed action ....
16:31:43 then at the end of the error is the root cause, something like: Javascript fails
16:31:45 I guess we need a bug report and a way to reproduce it
16:32:08 d0ugal: +1, that's what I'd like to achieve
16:32:10 but the operator has difficulty finding the root cause of this, the line: Javascript fails
16:32:40 IMHO it is not a bug, it is just the hierarchy of error
16:32:50 tuan__: yes, I see. Can you please file a ticket in our Launchpad?
16:32:52 or the return information from mistral
16:33:03 with as much info as possible
16:33:07 OK, i will try to do it
16:33:11 thanks renat
16:33:11 yeah
16:33:46 please try to be very specific, like, for example, you say "Mistral returns.."
16:33:59 what do you mean by that? Returns where?
16:34:25 are you talking about logs or fields of some objects stored in DB
16:34:27 etc
16:35:24 the point I'm trying to make is that we might be able to solve this problem (at least partially) by writing actions themselves in a certain way
16:35:46 so that they return something more or less readable
16:35:53 even in case of an error
16:36:04 there's a protocol for that
16:36:15 maybe it's not documented well, then it's a different issue
16:36:29 that's why I'd like to see some example
16:37:03 tuan__: once you file a ticket please just join our IRC channel any time and we will discuss it
16:37:34 sure
16:37:40 ok
16:37:43 otherwise
16:38:01 https://bugs.launchpad.net/mistral/+bug/1624284
16:38:01 Launchpad bug 1624284 in Mistral "MessagingTimeout when executing mistral actions" [Critical,In progress] - Assigned to Dawid Deja (dawid-deja-0)
16:38:13 yep
16:38:14 why this bug is partially
16:38:17 ?
16:38:20 tuan__: Mostly fixed
16:38:30 i also think that
16:38:30 there is one corner case that is not fixed yet
16:38:46 yes, it's 99% fixed, just one very crazy corner case is left
16:39:03 :)
16:39:09 but unless you do something like: 'mistral run-action `mistral_run_action`' it works fine
16:39:20 lol
16:39:29 ddeja: maybe it makes sense to close this bug and open another one which is more specific?
16:39:37 describing that specific corner case?
16:39:48 in other words: the only thing that is not fixed is when you use mistral run-action to start another action or workflow
16:39:53 rakhmerov: OK, I'll do it tomorrow
16:40:48 #action ddeja: close https://bugs.launchpad.net/mistral/+bug/1624284 and file a new more specific bug with the description of a corner case which is not yet fixed
16:40:48 Launchpad bug 1624284 in Mistral "MessagingTimeout when executing mistral actions" [Critical,In progress] - Assigned to Dawid Deja (dawid-deja-0)
16:41:13 tuan__: did you come across something similar?
16:41:21 It wouldn't take long - I guess my comment would make a good bug report ;)
16:41:48 rakhmerov: you mean the problem i have reported
16:41:48 ?
16:41:52 yes
16:42:07 this timeout bug
16:42:27 ahha, the bug of ddeja
16:42:29 okes
16:42:37 yep
16:42:52 I mean, I'm curious why you're asking about it
16:42:52 well, we have something related to timeout but I'm not sure it is because of this bug
16:43:03 let me try to describe it
16:43:09 yes
16:44:41 meanwhile, I'd like to share some info
16:45:17 just FYI: seems like I was able to make necessary changes so that we could run multiple Mistral engines safely
16:45:26 \o/
16:45:33 there's still a bunch of testing ahead, for sure
16:45:56 Nice
16:46:00 but I did a lot already on my local env and found no issues yet
16:46:03 yeah
16:47:05 my further plan regarding this is to create a gate where we could start multiple Mistral engines and run our Rally scenarios against Mistral
16:47:19 theoretically it should not be that hard
16:47:42 but then theory would meet her sister - actual work ;)
16:47:52 btw, I'm wondering if it's possible to create a gate that runs multiple VMs?
16:48:03 rakhmerov: I think so
16:48:04 ddeja: yes :)
16:48:13 Nova does so for testing upgrades
16:48:19 ddeja: do you know any examples?
16:48:23 hah...
16:48:26 interesting
16:48:28 ok
16:48:30 more specifically
16:48:51 they have a gate for testing if Live Migration of VMs is possible between computes in different versions
16:49:17 so there must be at least 2 VMs
16:49:18 #info Nova uses multiple VMs on their gate to test Live Migration
16:49:28 I see
16:49:32 cool
16:50:00 yeah, ideally I'd like to be able to use more than one VM with Mistral components
16:50:44 tuan__: would you like to describe your issue now or separately?
16:50:52 you can also just file a bug
16:51:03 anything works
16:51:13 rakhmerov: I am trying to find some logs for that
16:51:21 ok
16:51:24 it is just the report back to us
16:51:35 we did not have enough info
16:51:36 we have about 8 mins
16:51:44 oh yeah, im sorry
16:51:54 then i think i will send it later
16:52:13 that's ok, that's why I said that you can come to our channel and discuss it separately
16:52:17 yeah
16:52:19 ok
16:52:55 if it's something related to stability (timeouts, locks etc.) I'd love to know more about it
16:54:32 ddeja, d0ugal, rbrady, sharatss: guys, can you please mention again (in the conclusion of the meeting) what you're planning to do next?
16:54:47 what bothers you, your priorities etc.
16:55:12 or something that you'd like to be working on but can't for some reason
16:55:41 My plans: 1. End gate fixing 2. Focus on kombu driver multi-threading
16:55:51 ok
16:55:53 sounds good
16:56:06 and if I have any time before the O-3, I'll get back to preconditions
16:56:49 ddeja: with those RPC fixes we can start using Kombu RPC more actively
16:56:57 I'm planning to do that
16:57:06 great!
16:57:09 For me, custom actions. I'd like to know enough so that by the PTG it is easier for rbrady/you all to have a discussion about it.
16:57:10 custom actions: I want to complete the initial patch and incorporate mistral-lib as a dependency to mistral in an iterative way to ensure CI doesn't break. then look to expand/add features as necessary to ensure the data required in a custom action is injected into the context at execution.
16:57:20 theoretically it should perform better than o.m
16:58:07 rbrady, d0ugal: ok, sounds good
16:58:58 along with testing multiple engines this is going to be one of my top priorities moving forward too
16:59:18 ok, thanks a lot
16:59:29 I think it's time to close the meeting
16:59:56 thanks for coming, have a great week )
17:00:03 bye
17:00:18 #endmeeting
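
A minimal sketch of the point made at 16:29:57 and 16:35:24, that an action can return a structured result even when it fails, so the root cause is a short field instead of being buried at the end of a long error string. Everything here is an assumption for illustration: `ActionResult`, `deploy_vms` and `summarize` are hypothetical names, not Mistral's own classes or API, and the "deployment" is simulated.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ActionResult:
    """Hypothetical stand-in for an action result: holds data or an error."""
    data: Any = None
    error: Optional[dict] = None

    def is_error(self) -> bool:
        return self.error is not None


def deploy_vms(vm_names):
    """Hypothetical action: pretends to deploy VMs, fails on an empty name."""
    deployed = []
    for index, name in enumerate(vm_names):
        if not name:
            # Return a compact structured error instead of raising an
            # exception that echoes the whole (possibly huge) input.
            return ActionResult(error={
                "root_cause": "empty VM name",
                "failed_index": index,
                "deployed_count": len(deployed),
            })
        deployed.append(name)
    return ActionResult(data={"deployed": deployed})


def summarize(result):
    """Build a one-line, operator-facing message with the root cause first."""
    if result.is_error():
        return "Action failed: " + result.error["root_cause"]
    return "Action succeeded"
```

With this shape, an operator-facing tool could show `summarize(result)` first and keep the remaining fields of `result.error` for drill-down, whatever the size of the action input.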