#openstack-meeting-3 log

14:03:07 <tosky> #startmeeting sahara
14:03:08 <openstack> Meeting started Thu Oct 18 14:03:07 2018 UTC and is due to finish in 60 minutes.  The chair is tosky. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:03:09 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:03:11 <openstack> The meeting name has been set to 'sahara'
14:03:18 <jeremyfreudberg> o/
14:03:50 <tosky> Telles is on vacation, but still, good to check the status
14:03:59 <jeremyfreudberg> indeed
14:04:00 <tosky> #topic News/Updates
14:04:40 <tosky> I've been working on S3 testing (still, I know, but now there is more working code)
14:05:44 <jeremyfreudberg> i have not had any quality time to work on features recently (unlike last cycle, the features are a bit more complicated). very slowly making progress on health repair (which I have a small discussion point about)
14:05:49 <tosky> apart from that, I've been following the status of the gates and discussed a few issues and changes with -infra and -releases
14:06:10 <jeremyfreudberg> thanks tosky for holding down the fort, both on the openstack-wide stuff and also on those big patches like s3 testing, doc refactor, etc
14:06:45 <tosky> in the doc refactor one, at some point I had too many windows open with code floating
14:06:48 <tosky> but yeah :)
14:07:50 <tosky> last time I spoke with Telles, the unit tests for the split repositories were passing and he was going to start the real testing
14:08:14 <tosky> I'm writing an email about the planned impact on deployment tools and users
14:08:46 <jeremyfreudberg> indeed, i've seen interesting stuff on telles's github
14:09:00 <jeremyfreudberg> and yes, the email is a good idea
14:09:03 <tosky> when we are done with the split, apart from the bugs that can come from it, we can go full steam with API v2 and Python 3
14:09:15 <tosky> which are the other big things
14:09:46 <jeremyfreudberg> yup
14:11:00 <tosky> I guess we are done with the news - any specific point to discuss? Until I'm out of the S3 pit, I don't have a lot more to add
14:12:11 <tosky> oh, health repair
14:12:18 <tosky> let's go for it, jeremyfreudberg
14:12:24 <jeremyfreudberg> yes, health repair
14:12:26 <tosky> #topic Health repair
14:13:34 <jeremyfreudberg> so, i won't discuss every aspect of health repair, but there's one aspect that i've been grappling with recently
14:13:47 <jeremyfreudberg> as you know (or not), the idea was to base health repair off of the existing health checks mechanism
14:14:02 <tosky> I'm re-reading the minutes from the PTG
14:14:18 <jeremyfreudberg> and in looking at that code, i'm surprised at how much database stuff is involved
14:15:28 <jeremyfreudberg> and from what i can tell, the point of putting health checks in the DB is to make things stateful-- don't start a check when one is already in progress, etc. plus with the checks being (configurably) periodic the db kinda acts as a log
14:15:47 <jeremyfreudberg> anyway, my question for today is
14:16:05 <jeremyfreudberg> is all that DB stuff really necessary for health repair?
14:16:46 <jeremyfreudberg> my sense is, kinda, but not totally
14:17:14 <tosky> don't you see the same need for a synchronization point, so that the same operation doesn't start again?
14:17:24 <tosky> or anyway, do you think it could be implemented differently?
14:17:53 <jeremyfreudberg> short answer- yes to both
14:17:54 <jeremyfreudberg> long answer-
14:20:32 <jeremyfreudberg> because health repair is thought to only be executed by user request (NOT periodic), the synchronization point is easier to pin down. AND, I planned to make the repair modes as idempotent and non-disruptive as possible, so theoretically i don't care if the repair call gets sent twice in quick succession
14:20:53 <jeremyfreudberg> but then again, some kind of locking mechanism seems intuitively necessary
14:21:44 <jeremyfreudberg> not to mention, if there is no lock, then the user could send way-too-many repair requests launching way-too-many subprocesses
14:21:52 <tosky> yep, that's the risk
14:22:02 <tosky> "why it's not working, repair, REPAAAAIR"
14:22:03 <tosky> yeah
14:22:18 <jeremyfreudberg> so, a lock of some kind is necessary, i'm just not convinced that tossing around db state is the right way to go about it
14:22:23 <jeremyfreudberg> not sure how else to do it, though
14:22:45 <tosky> maybe minimizing the use of the DB may be enough
14:23:55 <jeremyfreudberg> i'll see what i can trim out
14:24:02 <jeremyfreudberg> i haven't done much of a deep dive yet
14:24:03 <tosky> or we can have a hard requirement on tooz (which is optional right now, used only for one functionality)
14:24:18 <jeremyfreudberg> yes, there is tooz
14:24:24 <tosky> ... if we can make it work with python3, even better
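
The locking idea discussed above (reject a second repair request for a cluster while one is still running, without keeping state in the DB) could be sketched roughly as follows. This is only an illustration under stated assumptions: `RepairGate`, `RepairAlreadyRunning` and `repair_fn` are hypothetical names, not Sahara code, and the guard only works inside a single process. With several API or engine processes, a shared lock (for example through tooz, as suggested above) would be needed instead.

```python
import threading


class RepairAlreadyRunning(Exception):
    """Hypothetical error: a repair for this cluster is already in progress."""


class RepairGate:
    """Allow at most one running repair per cluster id (single process only)."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the set below
        self._running = set()           # cluster ids currently being repaired

    def run(self, cluster_id, repair_fn):
        # Reject immediately instead of queueing, so repeated requests
        # cannot pile up behind a repair that is still running.
        with self._guard:
            if cluster_id in self._running:
                raise RepairAlreadyRunning(cluster_id)
            self._running.add(cluster_id)
        try:
            return repair_fn(cluster_id)
        finally:
            # Always release, even when the repair itself fails.
            with self._guard:
                self._running.discard(cluster_id)
```

Rejecting rather than blocking means a burst of repeated "repair" requests costs one set lookup each and starts no extra subprocesses.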
14:24:51 <jeremyfreudberg> another subtopic about health repair:
14:25:18 <jeremyfreudberg> i wrote this on the story last night
14:25:43 <jeremyfreudberg> that there won't be a direct correspondence between all the existing health checks, and the new health repair modes
14:25:54 <jeremyfreudberg> at least in the basic case
14:26:55 <tosky> uh, what is the story? I didn't get the notification
14:27:07 <tosky> but I should be subscribed to all sahara* notifications
14:27:20 <jeremyfreudberg> the description update doesn't seem to trigger the email
14:27:25 <jeremyfreudberg> https://storyboard.openstack.org/#!/story/2003842
14:27:28 <tosky> oh, ok
14:29:24 <jeremyfreudberg> let me try to remember what i mean by my point, actually
14:29:26 <tosky> what would the main difference be then?
14:29:30 <tosky> yeah, better
14:31:08 <jeremyfreudberg> actually, i disagree now, with what i just said (not sure what i was thinking last night)
14:31:18 <jeremyfreudberg> all of the health checks can have an inverse which is its repair mode
14:31:51 <jeremyfreudberg> with the exception of this check https://github.com/openstack/sahara/blob/master/sahara/service/health/health_check_base.py#L133
14:36:34 <jeremyfreudberg> oh, i think my point from last night was, health repair can eclipse health check
14:36:50 <jeremyfreudberg> as in, we can write MORE health repair modes, beyond what limited checks we have
14:37:28 <jeremyfreudberg> and my other point-- there is a minimal amount of work that needs to be done before the plugin split (the plugin-specific health repair modes need to be able to import the right stuff from core)
14:37:29 <tosky> and then we will have more checks? If we have more repair modes, it means that we can check that something is really broken
14:37:54 <tosky> uh, I didn't check if Telles also considered that
14:38:21 <jeremyfreudberg> yes-- hopefully new repair modes will encourage new checks
14:39:06 <jeremyfreudberg> regarding the split, i think it should only be one new import to cover
14:40:07 <jeremyfreudberg> there would be a new module similar to sahara/service/health/health_check_base.py which has exceptions and the base class, for the plugin-specific repair modes to consume
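
A module of the kind described here (exceptions plus a base class for plugin-specific repair modes) might look something like the sketch below. Every name in it (`RepairError`, `BaseRepairMode`, `is_available`, `repair`, `execute`, `RestartServiceRepair`) is an assumption made for illustration and is not taken from the Sahara tree; the returned status strings are likewise invented.

```python
import abc


class RepairError(Exception):
    """Hypothetical base exception raised when a repair mode fails."""


class BaseRepairMode(abc.ABC):
    """Hypothetical base class that plugin-specific repair modes subclass."""

    def __init__(self, cluster):
        self.cluster = cluster

    @abc.abstractmethod
    def is_available(self):
        """Return True if this repair mode applies to the cluster."""

    @abc.abstractmethod
    def repair(self):
        """Perform the repair; should be idempotent and non-disruptive."""

    def execute(self):
        # Common driver: skip modes that do not apply, report failures.
        if not self.is_available():
            return "skipped"
        try:
            self.repair()
        except RepairError as exc:
            return "failed: %s" % exc
        return "repaired"


class RestartServiceRepair(BaseRepairMode):
    """Example plugin-side mode; the cluster is a plain dict in this sketch."""

    def is_available(self):
        return "service" in self.cluster

    def repair(self):
        if self.cluster.get("broken"):
            raise RepairError("cannot restart %s" % self.cluster["service"])
        self.cluster["restarted"] = True
```

With a layout like this, a plugin would need only one import from core (the base module) to define its own repair modes.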
14:41:50 <jeremyfreudberg> actually, telles did something which i don't understand
14:42:40 <jeremyfreudberg> looking at what he has on github (which may not be accurate), he simply moved health_check_base.py from sahara/service/health to sahara/plugins
14:42:58 <jeremyfreudberg> but he didn't change the imports within that file
14:43:01 <tosky> uh
14:44:01 <jeremyfreudberg> he did fix the import on the, for example, sahara-plugin-ambari side though
14:44:06 <jeremyfreudberg> anyway, i have to signoff in a minute
14:44:51 <tosky> oki, I guess we discussed enough points
14:45:02 <jeremyfreudberg> yes
14:45:08 <jeremyfreudberg> i'll be sure to look further into health repair
14:45:10 <tosky> Telles, when you read this, remember to recheck the rechecks
14:45:17 <tosky> thanks!
14:45:32 <tosky> so if there is nothing else to discuss, we can close it here
14:45:37 <jeremyfreudberg> yup, thanks, let's close
14:45:48 <tosky> see you next week
14:46:00 <jeremyfreudberg> bye
14:46:14 <tosky> (or even before, on the usual channel)
14:46:21 <tosky> #endmeeting