17:08:21 <aspiers> #startmeeting self-healing
17:08:22 <openstack> Meeting started Wed Jun  5 17:08:21 2019 UTC and is due to finish in 60 minutes.  The chair is aspiers. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:08:23 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:08:25 <openstack> The meeting name has been set to 'self_healing'
17:09:13 <aspiers> So this morning witek mentioned some ongoing discussions around billing, and the idea that instrumenting service code in order to provide metrics might work better than black-box monitoring for that
17:09:27 <aspiers> which ties in with https://storyboard.openstack.org/#!/story/2005632
17:09:45 <aspiers> #topic exporting metrics from services
17:10:03 <aspiers> BTW we seem to have a duplicate story in storyboard for this I think?
17:10:15 <aspiers> https://storyboard.openstack.org/#!/story/2005640
17:10:37 <aspiers> seem to remember some weirdness with StoryBoard when we were submitting stories recently
17:11:10 <ekcs> oh weird. yea I may have created a duplicate because of the weirdness.
17:11:22 <ekcs> I guess we should delete one?
17:11:33 <aspiers> yeah, https://storyboard.openstack.org/#!/story/2005632 has one fewer task
17:11:50 <ekcs> ok I’ll delete that one.
17:11:54 <aspiers> thanks
17:12:37 <aspiers> not much more to say on that right now except link to this morning's minutes
17:12:45 <aspiers> #link http://eavesdrop.openstack.org/meetings/self_healing/2019/self_healing.2019-06-05-09.05.html this morning's minutes
17:13:06 <aspiers> #topic heat + octavia + aodh
17:13:12 <ekcs> great. yea I read up on the morning meeting. sounds like there isn’t great support just yet, but great thing that witek is working on it.
17:13:22 <aspiers> so this popped up on the mailing list:
17:13:37 <aspiers> #link http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006582.html demo of app auto-healing via heat+octavia+aodh
17:13:52 <aspiers> Didn't get a response though
17:14:22 <aspiers> We can either keep chasing or try to document at least a skeleton for it ourselves
17:14:25 <aspiers> #action aspiers to create a story for documenting that use case
17:15:57 <ekcs> got it. yea first step maybe simply to link to that video in a skeletal doc. I can take a stab at that.
17:16:02 <aspiers> I'll finish that after the meeting
17:16:10 <aspiers> I mean, finish creating the story
17:16:18 <aspiers> That would be awesome if you could kick it off
17:16:24 <ekcs> yup.
17:16:26 <aspiers> We can totally merge a skeleton and flesh it out later
17:16:39 <aspiers> Main thing is promoting the discoverability / awareness
17:16:48 <aspiers> If people are aware and they need more details, they'll probably ask for them
17:17:06 <aspiers> #topic automated testing
17:17:08 <ekcs> sounds good
17:17:12 <aspiers> This old chestnut :)
17:17:34 <aspiers> So we *may* have an intern doing a masters thesis on this topic
17:17:44 <aspiers> in which case we could expect to see some progress
17:17:47 <aspiers> but nothing guaranteed yet
17:17:55 <aspiers> fingers crossed!
17:18:27 <ekcs> oh very nice! I also see that ricolin started some basic tempest setup.
17:18:50 <aspiers> Yup. IIRC it's still marked WIP so not sure if he needs any help with that
17:19:16 <ricolin> aspiers, ekcs, yes, it's working already, but I'm mostly working on how to make the test scenario more stable
17:19:26 <aspiers> ricolin: cool!
17:19:43 <aspiers> #link https://storyboard.openstack.org/#!/story/2005830 New story for documenting Heat+Octavia+Aodh
17:19:58 <aspiers> ricolin: Let us know if you need any help
17:20:08 <ekcs> awesomeness
17:20:16 <aspiers> I think that was all I had for now
17:20:25 <aspiers> #topic AOB
17:20:26 <ricolin> the self-healing scenario is very unstable in https://review.opendev.org/656070; I'm trying to figure out why
17:20:34 <aspiers> ah OK
17:20:38 <aspiers> anything else?
17:20:52 * aspiers takes a look at that review
17:21:45 <ekcs> ricolin: are these similar to tests already being run on heat repos?
17:21:45 <aspiers> heat_tempest_plugin.common.exceptions.TimeoutException: Request timed out
17:21:59 <aspiers> Details: Stack SelfHealingTest-243821469/c9e222f4-e0f0-4cbf-ba58-dea30d2d6a08 failed to reach UPDATE_COMPLETE status within the required time (1200 s).
17:22:15 <aspiers> #topic heat self-healing tests
17:22:34 <ekcs> knowing what’s new compared to existing heat tests may help us diagnose.
17:22:41 <aspiers> true
17:23:24 <ricolin> the timeout happens when the healing process didn't start for whatever reason
17:23:41 <aspiers> OK
17:24:03 <aspiers> that's beyond my familiarity right now
17:24:31 <ricolin> Heat should play a better role during the entire process and help make sure all components work well
17:24:49 <ricolin> and reduce the unstable cases
17:25:02 <aspiers> do you know why it didn't start?
17:25:46 <ricolin> I think I got some idea
17:26:04 <ricolin> but since next week is part of my wedding ceremony, I won't be that available before 6/15
17:26:15 <aspiers> Ah! No problem, enjoy! :-D
17:26:30 <ekcs> oh wow congrats!
17:27:02 <ricolin> and the rest happens on 11/17, so it's going to be a very long year for me! lol
17:27:08 <ricolin> ekcs, aspiers thx!
17:27:14 <aspiers> haha
17:27:35 <aspiers> alright
17:27:44 <aspiers> anything else anyone wants to discuss?
17:27:57 <ricolin> aspiers, in short, I think that test case fails because Heat doesn't make sure the Mistral workflow is up and running stably before we move on to the next step
17:28:08 <aspiers> ahah, I see
17:28:24 <ricolin> I will look into that and hope I can bring some good knews
17:28:31 <ricolin> knews/news
17:28:31 <aspiers> perfect
17:28:44 <ekcs> great!
17:28:51 <ricolin> Once that test is stable, the rest of the gate job setup will be easy
17:29:10 <ricolin> since all the required patches are already there
17:29:25 <aspiers> nice
17:29:46 <aspiers> I guess we need a short doc explaining it too
17:31:22 <ekcs> not a discussion topic per se, but I’ve been wavering in my personal priority between identifying and supporting new use cases vs documenting existing use cases. I think I settled on documenting existing as higher priority at this stage of the sig.
17:31:45 <aspiers> personally I think either is fine
17:31:55 <aspiers> Whatever you are more excited about ;)
17:32:06 <ekcs> = )
17:32:34 <aspiers> Any small contributions are a lot better than nothing :)
17:32:54 <ekcs> yup
17:33:18 <aspiers> We're all busy with other stuff, so IMO there's no problem at all with being selective and time-boxing SIG work
17:34:00 <aspiers> Alright, sounds like we're done for today?
17:34:17 <ekcs> yup
17:34:28 <aspiers> cool
17:34:34 <aspiers> thanks, and catch you soon!
17:35:00 <ekcs> yup later guys! have a great week!
17:35:28 <aspiers> o/
17:35:30 <aspiers> #endmeeting