16:02:20 #startmeeting Mistral
16:02:20 Meeting started Mon Oct 20 16:02:20 2014 UTC and is due to finish in 60 minutes. The chair is rakhmerov. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:02:21 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:02:24 The meeting name has been set to 'mistral'
16:02:26 sorry guys
16:02:27 :)
16:02:33 hi
16:02:35 rakhmerov: be my guest :)
16:02:56 thanks!
16:02:58 hi here
16:03:04 hi Dmitri
16:03:08 how are you?
16:03:09 hi
16:03:16 long time no type
16:03:30 hi bhavenst
16:04:13 hey bryan
16:04:18 how have you been?
16:04:44 doing fine, just busy @ work. But it's letting up, so I have been starting work on blueprints..
16:04:46 hi !
16:05:08 ooh, very cool
16:05:15 ok, let's start
16:05:27 I was talking about Mistral at an OpenStack automation meetup, and got strong feedback about the need for Ceilometer integration, exactly along the lines of Bryan's blueprint.
16:05:35 Hi, sorry, I am late
16:05:42 sorry for not sending out an agenda, I've been really really busy these days
16:06:01 yeaah, that is cool
16:06:07 hi Nastya :)
16:06:13 ok
16:06:34 we didn't have any AIs from the last meetings, they were really short
16:06:48 some folks were on vacation or busy with something
16:07:10 so let's go straight to the current status
16:07:22 #topic Current Status (by team members)
16:08:25 my status for the last couple of weeks is: I've been working mostly on bugs (both client and server), working on the examples, and preparing presentations
16:08:26 Almost all last week I worked on for-each, it just works fine now
16:08:38 I've added simple positive and negative tests for cron-triggers (API and CLI integration tests)
16:08:42 Started thinking about metrics blueprints, sent questions to a Ceilometer person, started an etherpad..
16:08:44 I still need to review it
16:09:00 And today I found a bug with auth in std.http
16:09:15 bhavenst, could you please send it out via openstack-dev?
16:09:31 nikolaym, ok, I saw your patch
16:09:34 Yeah, can do that when it's a bit more refined. :)
16:09:44 sure
16:10:20 as for for-each
16:10:36 what Nikolay did looks ok
16:10:56 however, it looks like we have a serious design issue
16:11:18 basically we have a race condition between some transactions
16:11:21 in our engine
16:11:42 and in the case of for-each it gets clearly revealed
16:12:37 the point is that when the engine starts the workflow it creates all the tasks in DB (and the execution) and currently we start tasks from within the same transaction
16:13:01 and there's a reason for this although it's considered an anti-pattern
16:13:22 I mean calling any external things like rabbit mq from DB transactions
16:14:06 so it may happen that a task is finished and its result comes back to the engine before that first transaction completes
16:14:20 and its on_task_result() method won't find the task in DB
16:15:06 oh, nice.
16:15:11 I thought in case of READ_COMMITTED transactions there shouldn't be race conditions because the second transaction should block on the same object that is not committed yet
16:15:47 but either 1) I was wrong or 2) we are doing something improperly somewhere else
16:16:04 e.g. configuring the transaction isolation level
16:16:09 I hit something similar many times during testing of the failed workflow bug I worked on, but that was before the refactoring so not sure if it applies.
16:16:15 so it's something that we need to test more carefully
16:16:30 it might
16:16:35 yes
16:17:05 Multiple tasks doing things like echos, which I guess are fast enough to cause such an issue
16:17:23 well, first of all, if you run mistral for something serious (not unit tests) then forget about sqlite
16:17:36 foreach exacerbates the problem indeed. Now we have many (way too many) calls to rabbit within transactional scope.
16:17:39 yes, bhavenst, exactly!
16:17:49 yes
16:17:54 100% right
16:17:56 My solution was to add sleeps.
:)
16:18:19 yeah, that's what Nikolay did I guess to make it work
16:18:49 so the two obvious options (ooh god, we discussed it already so many times):
16:19:25 1) run tasks after the transaction completes
16:20:09 2) leave as is and use something to do proper synchronization (even though it's not clear to me)
16:20:39 option 1 has the problem of being vulnerable to failures
16:21:17 so if the engine fails right after the transaction and before pushing tasks into rabbit then the system will end up in an inconsistent state
16:21:40 and there will be no way to figure out if some tasks have already been put into rabbit
16:22:02 this is deja vu. I need to recall all the details of the arguments we had...
16:22:06 so in other words, our DB state won't correspond to the state of the MQ
16:22:13 yeah
16:22:40 I think it's kind of a challenge to discuss it in IRC because the problem is too complicated
16:22:56 but I'm just asking you to think about it if you have a chance
16:22:58 I recall we discussed "QUEING" status for a task..
16:23:10 you may come up with some ideas
16:23:12 suggest we set up a time to brainstorm it.
16:23:17 yes
16:23:21 Are you guys going to be @ Paris summit?
16:23:26 so I'm just letting you know...
16:23:31 yes
16:23:35 we are
16:23:42 ok cool, I'll be there too
16:23:43 outside of this meeting (or if we have time left)
16:23:46 it may be a good time to get back to that problem
16:23:52 ooh, nice
16:24:08 I don't think we can fix it before the summit anyway
16:24:33 how will it work (or rather "not work") in between?
16:24:37 there's just a fundamental problem of keeping two systems (DB and MQ) in a consistent state
16:24:42 fast tasks will fail?
16:24:49 yes
16:25:10 surprisingly, it mostly works unless we use something like 'for-each'
16:25:32 I think the reason is that we always run tasks via oslo
16:25:37 even echo :)
16:26:00 again, I'm still hoping that we just need to configure mysql properly
16:26:06 but
16:26:13 it may not really be helpful
16:26:30 so, the general problem is keeping two systems in sync
16:26:33 the direction I will be thinking is "to rely on one source of truth", not DB and MQ. Use DB as a source of truth.
16:26:48 usually people use something like XA transactions for this
16:27:01 which are not available at this point for us
16:27:16 maybe
16:27:50 we could even bypass it somehow if, say, executors could access DB
16:28:06 but not really likely
16:28:14 ok
16:28:28 let's move on now, just asking you to think about it again
16:28:53 w/o it the system won't actually work under any more or less serious load
16:29:20 another thing I was planning to discuss really quick is our planned release 0.2
16:29:44 #topic Release 0.2
16:29:47 https://launchpad.net/mistral/+milestone/0.2
16:30:11 the thing is that it looks like we're seriously behind schedule with it
16:30:29 basically we have just 9 business days left
16:30:54 If there is anything relatively simple I don't mind taking it, since blueprints are not at all pressing.
16:31:08 and our resources turned out to be not enough because Nikolay and I got buried with a lot of unplanned stuff
16:31:26 yeah
16:31:28 winson has done #1380873 locally, expect it on review today/tomorrow.
16:31:40 ok, that's good
16:31:52 bhavenst, let me see what we have
16:31:55 sure
16:32:20 but he and I won't have time to do events mistral-event-listeners-http
16:32:32 but generally the situation is that of 9 days I have effectively 4-5 days, the rest I'll have to spend on summit preps and other activities
16:32:37 Nikolay too
16:32:46 yeah, I see
16:32:53 so two options again:
16:33:17 1) we shrink the scope of 0.2 and push it on Oct 31 as planned
16:33:48 2) we move the due date 2-3 weeks further
16:34:02 what do you think?
16:34:20 I guess what we could do is:
16:34:32 (by "do" I mean complete)
16:34:51 1. https://blueprints.launchpad.net/mistral/+spec/mistral-direct-workflow-join-control
16:35:32 2. https://blueprints.launchpad.net/mistral/+spec/mistral-pause-before-policy (btw, this one should be pretty easy and I could assign it to Bryan)
16:35:42 There will be a holiday in Russia and the Paris summit, so one week will be out of scope
16:36:00 3. https://blueprints.launchpad.net/mistral/+spec/mistral-dataflow-collections It's mostly done except it's not gonna be usable with that race condition
16:36:11 yes
16:36:14 good concern
16:36:44 4. https://blueprints.launchpad.net/mistral/+spec/mistral-workflow-resume - Likely we could knock this down as well
16:37:36 I thought that n.2 is connected with n.4
16:37:46 so we definitely won't be able to tackle HA (testing etc.), I guess HTTP listeners and I have doubts about workflow resume too
16:37:58 Resume and pause-before
16:38:45 well, logically yes. But strictly speaking they're separate things both needed for "manual checkpoints"
16:38:59 they could be done separately
16:39:22 hi there! seems I missed the beginning of the meeting. Do you have an open discussion section :)?
16:39:50 hi Timur
16:39:53 not yet )
16:40:00 but soon
16:40:14 Hi Timur
16:40:19 so what do you guys think about the release date?
16:40:27 let me put it this way...
16:40:41 IMO move out.
16:40:55 do you think it makes a lot of sense to push it before the summit whatever it takes?
16:41:03 but still do a few things by Paris.
16:41:18 rakhmerov, okay. I have a little update about Merlin Workbook Builder for Mistral
16:41:24 without any official announcements?
16:41:38 tsufiev, sure, a couple of mins pls
16:41:54 "without any official announcements?" how do you mean?
16:42:00 rakhmerov, np
16:42:11 my opinion: nobody will really get familiar with the release if we push it two days before the summit
16:42:49 dzimine, I mean "We're pleased to announce Mistral 0.2, here's the link to the new capabilities etc. etc."
16:43:29 so my suggestion is to move it out but yes, implement the most important things
16:43:46 for example, join
16:43:53 and try to fix that race condition
16:44:10 thoughts?
16:44:19 let's vote :)
16:44:20 the three big areas to me are 1) resuming workflow 2) work under load (including this transaction problem we discussed) and 3) refine REST API
16:44:21 sounds reasonable
16:44:57 and fixing the race condition, etc., we need to take on as needed.
16:45:06 dzimine, I agree but don't see chances to fully address all this before Nov
16:45:12 there's no point in "pause before" till we learn to resume :)
16:45:24 race condition for me is actually the #1 problem
16:45:27 that's why I am for moving the date out.
16:45:35 ok, I see
16:45:47 ok, any objections?
16:45:50 Nikolay?
16:45:54 Nastya?
16:46:10 and I agree race condition is #1 prio.
16:46:13 I guess that we need to move the release
16:46:19 ok
16:46:24 nikolaym?
16:46:44 Yes, move out the release
16:46:46 basically we already have all estimates for the BPs so I could take some time and suggest a new date
16:47:02 like I said I guess it should be at least 2 weeks later
16:47:41 we just need to look at everyone's schedule and make a conscious decision
16:47:46 ok, decided
16:47:58 tsufiev, please speak :)
16:48:25 #action Race condition in engine is the #1 problem to fix
16:48:49 rakhmerov, so, good news: there is a chance that we'll get the UI for Workbook Builder in Merlin done by a designer, not me :)
16:48:51 #action Suggest a new date for 0.2 release
16:49:00 ooh
16:49:01 cool
16:49:19 who is it going to be? Already known to us?
16:49:48 so if you have more feedback to share about the current state of Workbook Builder, you are strongly encouraged to share it - so it will be taken into account
16:50:01 rakhmerov, I've spoken with Bogdan Dudko
16:50:09 ok
16:50:21 he is from the Mirantis Fuel team, and may have some free cycles to help Merlin
16:50:36 tsufiev, ooh, that is awesome
16:51:15 so, remember I sent you a list of suggestions..
16:51:37 rakhmerov, yep, it will be the primary input for Bogdan
16:51:46 do you think they all could be done using this JS framework?
16:51:58 or how is it going to be done?
16:52:36 I'd like to keep as many interactions as possible on the client side to make Merlin more responsive (fewer calls to server)
16:52:37 I mean I am not really sure what depends on barricade JS and the designer skills :)
16:52:46 ok
16:52:55 let's see
16:53:10 and btw, we need to sync up on DSL changes again
16:53:11 well, the project definitely needs at least 2 people generating some ideas )
16:53:32 I looked at Merlin about 3 days ago and there are some discrepancies
16:53:36 because I have some problems with simultaneously creating a new design and implementing it
16:53:52 yup, totally understandable
16:54:17 rakhmerov, could you write about them to ML?
16:54:34 there is a thread already known to you...
16:55:25 I think some of us will be able to contribute after the summit when the dust settles
16:55:35 ok, I'll do that
16:55:44 rakhmerov, thanks!
16:55:52 Only after our release )
16:56:22 #action Write about DSL discrepancies in Merlin to ML
16:56:31 :))
16:56:37 I hope to find some contributors at summit or at least make some advertising from Merlin :)
16:56:46 s/from/for/
16:56:51 will you be there too?
16:56:54 yes
16:56:58 cool
16:57:12 ok
16:57:17 guys, anything else?
16:57:41 I discussed the most important things that I wanted (race condition and 0.2 release date)
16:57:58 so let's then close the meeting
16:58:17 bye!
16:58:24 Bye
16:58:32 bye!
16:58:52 bhavenst, I would suggest you try 'pause-before' BP and we're waiting for the news about Ceilometer integration
16:59:08 I'll assign it to you
16:59:15 :)
16:59:15 OK, will give it a shot
16:59:21 ok, cool
16:59:24 thanks guys
16:59:27 bye-bye
16:59:29 bye
16:59:34 #endmeeting
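Editor's sketch (not part of the meeting): the race described between 16:11 and 16:14, and the "use DB as a source of truth" direction mentioned at 16:26:33, can be illustrated with a small example. It is only one possible reading of the "QUEING" status recalled at 16:22:58, not Mistral's implementation. Everything in it is an assumption made for illustration: the `tasks` table, the `QUEUED`/`RUNNING`/`SUCCESS` state names, and the functions `start_workflow`, `dispatch_queued` and `demo`. The standard-library `sqlite3` module stands in for the real database and a plain list stands in for RabbitMQ. `on_task_result` here is a simplified stand-in that only borrows the name used in the discussion.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical minimal schema; a real task model would carry more fields.
    conn.execute(
        "CREATE TABLE tasks ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, state TEXT NOT NULL)"
    )
    conn.commit()


def start_workflow(conn, task_names):
    """Create every task as QUEUED in one transaction; publish nothing here."""
    with conn:  # commits on success, rolls back on an exception
        for name in task_names:
            conn.execute(
                "INSERT INTO tasks (name, state) VALUES (?, 'QUEUED')", (name,)
            )


def dispatch_queued(conn, publish):
    """Publish committed QUEUED tasks, then mark each one RUNNING.

    A crash between publish() and the UPDATE leaves the task QUEUED, so the
    next pass publishes it again: delivery is at-least-once and executors
    would have to tolerate duplicates.
    """
    rows = conn.execute(
        "SELECT id, name FROM tasks WHERE state = 'QUEUED' ORDER BY id"
    ).fetchall()
    for task_id, name in rows:
        publish({"task_id": task_id, "name": name})
        with conn:
            conn.execute(
                "UPDATE tasks SET state = 'RUNNING' "
                "WHERE id = ? AND state = 'QUEUED'",
                (task_id,),
            )
    return len(rows)


def on_task_result(conn, task_id, state="SUCCESS"):
    """Record a result; the row exists because it was committed before publish."""
    with conn:
        cursor = conn.execute(
            "UPDATE tasks SET state = ? WHERE id = ?", (state, task_id)
        )
    if cursor.rowcount != 1:
        raise LookupError("task %s not found" % task_id)


def demo():
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    queue = []  # stand-in for the message queue
    start_workflow(conn, ["echo-1", "echo-2"])
    dispatch_queued(conn, queue.append)
    for message in queue:  # even an instant result finds its task
        on_task_result(conn, message["task_id"])
    return conn.execute("SELECT name, state FROM tasks ORDER BY id").fetchall()
```

The trade-off in this sketch is that the database alone records which tasks still need to be sent, so an engine crash after the commit loses nothing: the rows stay `QUEUED` and a later dispatch pass picks them up. The cost is possible duplicate delivery, which is a different failure mode from the inconsistent DB and MQ state described for option 1 at 16:21:17.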