14:03:07 #startmeeting sahara
14:03:08 Meeting started Thu Oct 18 14:03:07 2018 UTC and is due to finish in 60 minutes. The chair is tosky. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:03:09 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:03:11 The meeting name has been set to 'sahara'
14:03:18 o/
14:03:50 Telles is on vacation, but still, good to check the status
14:03:59 indeed
14:04:00 #topic News/Updates
14:04:40 I've been working on S3 testing (still, I know, but now there is more working code)
14:05:44 i have not had any quality time to work on features recently (unlike last cycle, the features are a bit more complicated). very slowly making progress on health repair (which I have a small discussion point about)
14:05:49 apart from that, I've been following the status of the gates and discussed a few issues and changes with -infra and -releases
14:06:10 thanks tosky for holding down the fort, both on the openstack-wide stuff and also on those big patches like s3 testing, doc refactor, etc
14:06:45 in the doc refactor one, at some point I had too many windows open with code floating
14:06:48 but yeah :)
14:07:50 last time I spoke with Telles, the unit tests for the split repositories were passing and he was going to start the real testing
14:08:14 I'm writing an email about the planned impact on deployment tools and users
14:08:46 indeed, i've seen interesting stuff on telles's github
14:09:00 and yes, the email is a good idea
14:09:03 when we are done with the split, apart from the bugs that can come from it, we can go full steam with API v2 and Python 3
14:09:15 which are the other big things
14:09:46 yup
14:11:00 I guess we are done with the news - any specific point to discuss?
Until I'm out from the S3 pit, I don't have a lot more to add
14:12:11 oh, health repair
14:12:18 let's go for it, jeremyfreudberg
14:12:24 yes, health repair
14:12:26 #topic Health repair
14:13:34 so, i won't discuss every aspect of health repair, but there's one aspect that i've been grappling with recently
14:13:47 as you know (or not), the idea was to base health repair off of the existing health checks mechanism
14:14:02 I'm re-reading the minutes from the PTG
14:14:18 and in looking at that code, i'm surprised at how much database stuff is involved
14:15:28 and from what i can tell, the point of putting health checks in the DB is to make things stateful-- don't start a check when one is already in progress, etc. plus with the checks being (configurably) periodic the db kinda acts as a log
14:15:47 anyway, my question for today is
14:16:05 is all that DB stuff really necessary for health repair?
14:16:46 my sense is, kinda, but not totally
14:17:14 don't you see the same need for a synchronization point, so that the same operation doesn't start again?
14:17:24 or anyway, do you think it could be implemented differently?
14:17:53 short answer- yes to both
14:17:54 long answer-
14:20:32 because health repair is thought to only be executed by user request (NOT periodic), the synchronization point is easier to pin down.
AND, I planned to make the repair modes as idempotent and non-disruptive as possible, so theoretically i don't care if the repair call gets sent twice in quick succession
14:20:53 but then again, some kind of locking mechanism seems intuitively necessary
14:21:44 not to mention, if there is no lock, then the user could send way-too-many repair requests launching way-too-many subprocesses
14:21:52 yep, that's the risk
14:22:02 "why it's not working, repair, REPAAAAIR"
14:22:03 yeah
14:22:18 so, a lock of some kind is necessary, i'm just not convinced that tossing around db state is the right way to go about it
14:22:23 not sure how else to do it, though
14:22:45 maybe minimizing the use of the DB may be enough
14:23:55 i'll see what i can trim out
14:24:02 i haven't done much of a deep dive yet
14:24:03 or we can have a hard requirement on tooz (which is optional right now, used only for one functionality)
14:24:18 yes, there is tooz
14:24:24 ... if we can make it work with python3, the better
14:24:51 another subtopic about health repair:
14:25:18 i wrote this on the story last night
14:25:43 that there won't be a direct correspondence between all the existing health checks, and the new health repair modes
14:25:54 at least in the basic case
14:26:55 uh, what is the story? I didn't get the notification
14:27:07 but I should be subscribed to all sahara* notifications
14:27:20 the description update doesn't seem to trigger the email
14:27:25 https://storyboard.openstack.org/#!/story/2003842
14:27:28 oh, ok
14:29:24 let me try to remember what i mean by my point, actually
14:29:26 what would the main difference be then?
14:29:30 yeah, better
14:31:08 actually, i disagree now, with what i just said (not sure what i was thinking last night)
14:31:18 all of the health checks can have an inverse which is its repair mode
14:31:51 with the exception of this check https://github.com/openstack/sahara/blob/master/sahara/service/health/health_check_base.py#L133
14:36:34 oh, i think my point from last night was, health repair can eclipse health check
14:36:50 as in, we can write MORE health repair modes, beyond what limited checks we have
14:37:28 and my other point-- there is a minimal amount of work that needs to be done before the plugin split (the plugin-specific health repair modes need to be able to import the right stuff from core)
14:37:29 and then we will have more checks? If we have more repair modes, it means that we can check that something is really broken
14:37:54 uh, I didn't check if Telles also considered that
14:38:21 yes-- hopefully new repair modes will encourage new checks
14:39:06 regarding the split, i think it should only be one new import to cover
14:40:07 there would be a new module similar to sahara/service/health/health_check_base.py which has exceptions and the base class, for the plugin-specific repair modes to consume
14:41:50 actually, telles did something which i don't understand
14:42:40 looking at what he has on github (which may not be accurate), he simply moved health_check_base.py from sahara/service/health to sahara/plugins
14:42:58 but he didn't change the imports within that file
14:43:01 uh
14:44:01 he did fix the import on the, for example, sahara-plugin-ambari side though
14:44:06 anyway, i have to signoff in a minute
14:44:51 oki, I guess we discussed enough points
14:45:02 yes
14:45:08 i'll be sure to look further into health repair
14:45:10 Telles, when you read this, remember to recheck the rechecks
14:45:17 thanks!
14:45:32 so if there is nothing else to discuss, we can close it here
14:45:37 yup, thanks, let's close
14:45:48 see you next week
14:46:00 bye
14:46:14 (or even before, on the usual channel)
14:46:21 #endmeeting
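
The locking concern raised under the Health repair topic (reject a second repair request while one is still running, without keeping state in the DB) can be sketched as follows. This is a minimal in-process illustration, not Sahara code: `RepairGuard`, `RepairAlreadyRunning` and the `repair` callable are hypothetical names chosen for the example. It assumes a single process; with several service processes a distributed lock would be needed instead of `threading.Lock`, for example through tooz as suggested in the discussion.

```python
import threading


class RepairAlreadyRunning(Exception):
    """Raised when a repair for the same cluster is still in progress."""


class RepairGuard:
    """Per-cluster guard for repair requests (hypothetical, in-process only)."""

    def __init__(self):
        self._locks = {}
        self._registry_lock = threading.Lock()

    def _lock_for(self, cluster_id):
        # Create each cluster's lock exactly once, under a registry lock.
        with self._registry_lock:
            return self._locks.setdefault(cluster_id, threading.Lock())

    def run(self, cluster_id, repair):
        lock = self._lock_for(cluster_id)
        # Non-blocking acquire: a duplicate request is rejected, not queued,
        # so repeated requests cannot pile up subprocesses.
        if not lock.acquire(blocking=False):
            raise RepairAlreadyRunning(cluster_id)
        try:
            return repair(cluster_id)
        finally:
            # Always release, even if the repair itself fails.
            lock.release()
```

A caller would wrap each user-triggered repair in `guard.run(cluster_id, repair_fn)` and report `RepairAlreadyRunning` back to the user. Keeping the repair modes idempotent, as planned in the discussion, is still worthwhile: the guard only prevents overlapping runs, not repeated ones.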