08:01:02 <ifat_afek> #startmeeting vitrage
08:01:02 <openstack> Meeting started Wed Aug 24 08:01:02 2016 UTC and is due to finish in 60 minutes.  The chair is ifat_afek. Information about MeetBot at http://wiki.debian.org/MeetBot.
08:01:03 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
08:01:05 <openstack> The meeting name has been set to 'vitrage'
08:01:15 <ifat_afek> Hi everyone
08:01:32 <ddeja> hello
08:02:30 <idan_hefetz> Hi!
08:02:44 <alexey_weyl> Hey guys
08:02:58 <eyalb> hi
08:03:01 <elisha_r> hi
08:03:02 <nbloom> hi!
08:03:59 <ifat_afek> #topic Status and Updates
08:04:09 <ifat_afek> Newton3 feature freeze is tomorrow
08:04:24 <ifat_afek> I marked vitrage version 1.0.0 this morning
08:04:39 <ifat_afek> Hope to mark another version with the latest updates tomorrow. After that, we are supposed to push only critical bug fixes into the stable/Newton branch
08:05:04 <ifat_afek> In the last TC meetings, they talked about defining cross-project goals for Ocata. The two goals they defined for now are to support Python 3.5, and to remove copies of incubated oslo code
08:05:12 <ifat_afek> #link https://review.openstack.org/349069
08:05:21 <ifat_afek> #link https://review.openstack.org/349070
08:05:36 <ifat_afek> Another issue is the design summit in Barcelona. We need to decide how many design sessions we want. I’ll email you the details
08:06:02 <ifat_afek> My updates: I’m almost done with the auto-evacuation of vms from a failed host. Will push it today
08:06:13 <ifat_afek> I created a blueprint:
08:06:20 <ifat_afek> #link https://blueprints.launchpad.net/vitrage/+spec/call-nova-host-evacuate
08:06:29 <ifat_afek> I also created a blueprint according to our discussion in the last meeting:
08:06:40 <ifat_afek> #link https://blueprints.launchpad.net/vitrage/+spec/zabbix-trigger-configure-affected-resource
08:06:46 <ifat_afek> Who wants to update?
08:07:22 <idan_hefetz> I will
08:08:01 <idan_hefetz> As part of the Vitrage testing we've been doing lately, I found a few issues regarding stability and committed a fix
08:08:49 <idan_hefetz> As testing progresses (with help from Arik) we should open more issues in Launchpad
08:09:18 <idan_hefetz> Seems that we are much more stable in 1.0 :)
08:09:33 <ifat_afek> :)
08:10:14 <nbloom> Hi, I'll update next
08:11:49 <nbloom> I'm currently working on a bug that involves upload timing. It sometimes happens that upon starting, the nova client is initialized in Vitrage as a datasource but nova itself hasn't fully loaded yet, which results in a broken client.
08:13:30 <nbloom> Now, we poll info from nova periodically every 10 min, so what happens is that for 10 minutes no information from nova is received, which is not good of course. I'm trying to solve it either by retrying failed polling or by rescheduling the polling. Updates to follow.
08:14:08 <ifat_afek> nbloom: thanks. And if we use nova notifications, does that solve the problem?
08:15:42 <nbloom> not quite, the nova notifications feature notifies vitrage via nova of every new event that happens, e.g. creation/deletion of a host, meaning that if nothing happened during the polling interval, nothing will show until the next polling interval
08:15:57 <ifat_afek> I see, thanks
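Editor's sketch: the "retry failed polling" option nbloom mentions could look roughly like the following. This is a minimal illustration and not Vitrage code; `poll_with_retry`, `fetch`, `max_attempts` and `retry_delay` are hypothetical names, and the assumption is that a broken client surfaces as an exception when the datasource is polled.

```python
import time
from typing import Callable, List, Optional


def poll_with_retry(
    fetch: Callable[[], List[dict]],
    max_attempts: int = 5,
    retry_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[List[dict]]:
    """Call fetch() until it succeeds or max_attempts is reached.

    Returns the fetched entities, or None if every attempt failed, in
    which case the caller falls back to the next regular polling cycle.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception:
            # Assumed failure mode: the service behind the client is not
            # ready yet. Wait a short time instead of a full polling interval.
            if attempt < max_attempts:
                sleep(retry_delay)
    return None
```

With a short `retry_delay`, the gap without nova data would shrink from a full 10 minute interval to roughly the time nova needs to finish loading.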
08:16:31 <alexey_weyl> I will update
08:16:46 <alexey_weyl> we are working on the Heat datasource
08:17:21 <alexey_weyl> we will finish by tomorrow
08:17:46 <alexey_weyl> we will support heat stacks and heat resources and connect them into the vitrage graph
08:18:23 <alexey_weyl> and on top of that we will be able to add our templates for different behaviours and scenarios
08:19:00 <alexey_weyl> in addition, hopefully I will finally add another tempest rule in the gate for the different datasources
08:19:17 <alexey_weyl> today we have only the topology tempest tests at the gate
08:19:33 <alexey_weyl> that's it
08:19:40 <ifat_afek> thanks, sounds great
08:20:05 <ifat_afek> note that the gate is extremely busy today. there is a queue of almost 24 hours…
08:21:29 <elisha_r> I have an update
08:22:14 <elisha_r> for those of you who have not seen the mailing list, alexey_weyl did some research into the size of our updates
08:22:21 <elisha_r> get_all and updates, that is
08:23:02 <elisha_r> seems that from Nova each instance comes with about 800K (!!) of data, most of it not important for Vitrage
08:23:22 <elisha_r> However, for large systems this could cause some performance issues, especially during the startup phase
08:23:51 <elisha_r> so, we've sent out questions to the Nova team to see if this could be reduced
08:23:57 <elisha_r> will update once we know more
08:24:01 <elisha_r> Also
08:24:18 <elisha_r> We plan on removing the validation for relationship types
08:24:58 <elisha_r> this way, when adding new datasources, and even more significantly when adding entities in the static-physical config file, there is much more flexibility
08:25:47 <elisha_r> however, it's important to keep in mind that in the first phase, this flexibility will mean more chances for spelling mistakes in the different files (config and templates) to cause problems
08:26:27 <elisha_r> the correct solution should probably be to dynamically collect all the relationships from the config files and datasources and validate the templates against these
08:26:48 <elisha_r> I plan on opening a BP for this
08:26:51 <elisha_r> that's it
08:27:11 <ifat_afek> elisha_r: I agree, let’s fix it for Ocata, and at the moment let’s just remove the validations
08:27:13 <ifat_afek> thanks
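Editor's sketch: the longer-term approach elisha_r describes (collect the relationship types dynamically, then validate templates against them) might be outlined as below. The dictionary layout and the names `collect_relationship_types`, `find_unknown_relationships`, `relationship_types` and `relationships` are assumptions made for illustration, not the actual Vitrage config or template schema.

```python
from typing import Dict, Iterable, List, Set


def collect_relationship_types(sources: Iterable[Dict]) -> Set[str]:
    """Gather every relationship type declared by datasources and config files."""
    known: Set[str] = set()
    for source in sources:
        known.update(source.get("relationship_types", []))
    return known


def find_unknown_relationships(template: Dict, known: Set[str]) -> List[str]:
    """Return relationship types used in a template but declared nowhere."""
    used = [rel["relationship_type"] for rel in template.get("relationships", [])]
    return sorted({r for r in used if r not in known})
```

A template that misspells a relationship type would then be reported at load time instead of silently never matching.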
08:29:59 <ifat_afek> Any other updates?
08:32:39 <Arik> From the QA side, so far I have mainly tested the basic functionality of Vitrage. I checked that all the relevant datasources are fetched on each polling cycle and that I see them clearly in the Vitrage UI.
08:33:59 <ifat_afek> cool, thanks
08:34:06 <ifat_afek> any interesting bugs?
08:34:43 <Arik> As for now, I am starting to test Vitrage more deeply by creating different kinds of alarms manually and seeing that they are fetched by Vitrage.
08:35:10 <ifat_afek> ok, thanks
08:35:24 <ifat_afek> anyone else wants to update?
08:36:08 <Arik> I did find several bugs, which are already solved or being investigated as we speak. Some of the bugs concern the startup of the vitrage graph process in certain high availability scenarios.
08:36:51 <ifat_afek> I guess you are talking about the bug that nbloom is working on?
08:38:30 <Arik> Yes, indeed. There was also an integration bug with the Zabbix datasource not being fetched properly. However, it seems that a simple restart of the Vitrage graph service solved most of the issues.
08:38:53 <ifat_afek> ok, great. anything else?
08:39:45 <ddeja> I'm not a Vitrage developer, but I saw your mail on the mailing list regarding host evacuation http://lists.openstack.org/pipermail/openstack-dev/2016-August/101648.html
08:40:03 <ifat_afek> ddeja: welcome :-)
08:40:06 <Arik> We also found that users without the right permissions get exceptions when trying to open the vitrage UI. That's all for now. Still working on it.
08:40:18 <ifat_afek> Arik: thanks!
08:40:27 <ddeja> I've been working on automation of host evacuation for more than a year now
08:40:43 <ddeja> and wanted to ask what your plans are for this functionality?
08:41:18 <ifat_afek> ddeja: do you know a bit about vitrage?
08:41:26 <ddeja> ifat_afek: not much
08:41:52 <ddeja> from what I understand, it can raise alarms based on some events
08:42:50 <ifat_afek> ok, so in two lines: the main purpose of Vitrage is to collect resources and alarms, and provide insights about them. one use case is to show RCA (root cause analysis) information. another use case is to raise “deduced alarms”, e.g. in case of a host NIC failure, we can raise an alarm on every instance running on this host, saying that the instance is unreachable
08:43:44 <ddeja> ifat_afek: OK, I see. And how does 'raising the alarm' connect with taking actions? Is it also part of Vitrage?
08:44:18 <ifat_afek> a few weeks ago, we added support for “mark-down” functionality. In Vitrage templates, you can say that in case a specific severe alarm was raised on a host (e.g. by nagios or zabbix), you would like to notify nova that the host is down. nova has a specific force-down api for this purpose
08:44:40 <ifat_afek> now, I’m about to add an option to also call nova and ask to evacuate all instances running on this host
08:44:54 <ddeja> ifat_afek: you mean to use nova-host-evacuate?
08:45:46 <ifat_afek> nova-host-evacuate is a cli that calls the nova evacuate api for every instance. I can’t call the cli directly from vitrage, so I call nova evacuate for every instance
08:45:56 <ddeja> ifat_afek: oh, I see
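Editor's sketch: the per-instance approach ifat_afek describes (one evacuate call for every instance on the failed host) can be illustrated as follows. `ComputeClient` and its two methods are a hypothetical minimal interface invented for this example; they are not the real nova client API, and continuing past a failed instance is an assumed design choice.

```python
from typing import Dict, List, Protocol


class ComputeClient(Protocol):
    """Hypothetical minimal interface; not the real nova client API."""

    def list_instances_on_host(self, host: str) -> List[str]: ...

    def evacuate_instance(self, instance_id: str) -> None: ...


def evacuate_host(client: ComputeClient, host: str) -> Dict[str, List[str]]:
    """Request evacuation of every instance on host, one call per instance.

    A failure on one instance is recorded and does not stop the loop, so
    the remaining instances still get an evacuation request.
    """
    result: Dict[str, List[str]] = {"evacuated": [], "failed": []}
    for instance_id in client.list_instances_on_host(host):
        try:
            client.evacuate_instance(instance_id)
            result["evacuated"].append(instance_id)
        except Exception:
            result["failed"].append(instance_id)
    return result
```

Returning both lists lets the caller decide whether to retry the failed instances or raise a further alarm on them.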
08:46:17 <ifat_afek> I’ll be happy to understand what your project is about, as you said you’ve been working on evacuate as well
08:46:39 <ddeja> basically, there is no project unfortunately
08:46:55 <ddeja> but there is a group of people meeting as a 'HA-community'
08:47:08 <ifat_afek> as part of openstack?
08:47:14 <ddeja> ifat_afek: yes
08:47:33 <ifat_afek> and do you have any insights about it?
08:47:52 <ddeja> in Austin we decided to use project Mistral to perform the 'evacuate all the instances task'
08:48:02 <ifat_afek> I also know that this issue is discussed as part of OPNFV Doctor project
08:48:18 <ddeja> ifat_afek: yes, I'm aware of that project too
08:48:37 <ddeja> I'm also aware of at least two more solutions
08:48:41 <ifat_afek> and are you working on it already? I have some colleagues working on mistral
08:48:52 <ddeja> ifat_afek: I'm core in Mistral ;)
08:49:44 <ddeja> but in case of automatic host evacuation, I'm aware of 4 or 5 projects (inside or outside openstack) that are aiming to provide such functionality
08:50:11 <ddeja> and as an HA-community we would like to get everybody to collaborate on one solution
08:50:36 <ddeja> so that is why I wanted to meet with you guys and know a little bit more about your plans :)
08:51:13 <ifat_afek> indeed it sounds like we should collaborate :-)
08:51:38 <ifat_afek> do you have any other questions about what we’re doing?
08:52:06 <ifat_afek> and how do you suggest we proceed?
08:52:08 <ddeja> two short questions
08:52:19 <ddeja> 1) is it possible to collect alarms from pacemaker?
08:52:38 <ddeja> 2) is it possible to call mistral from vitrage?
08:54:08 <ifat_afek> 1) vitrage provides a pluggable infrastructure for getting alarms, either via polling or using push notifications. if pacemaker has an api or notifications, then it should be quite easy to add its alarms to vitrage
08:54:12 <ifat_afek> as a new datasource
08:54:35 <ifat_afek> and as a second phase, we can add vitrage templates to correlate pacemaker alarms with other alarms in the cloud
08:54:36 <ddeja> ifat_afek: oh, that's great
08:54:40 <ifat_afek> sounds cool
08:55:39 <ifat_afek> 2) sure. vitrage also has pluggable notifiers, so if you have a specific use case that involves mistral then it should be easy to integrate
08:56:02 <ifat_afek> I’m from nokia, and we have a bunch of mistral guys here :-)
08:56:22 <ddeja> ifat_afek: OK, thanks. I need to learn more about vitrage, but I will surely come back to you with more questions
08:56:42 <ifat_afek> ddeja: cool. you are welcome to ask more questions, either in irc or on the mailing list
08:56:57 <ddeja> and I hope to have a talk with you in Barcelona about this specific use-case
08:57:34 <ifat_afek> we have a lot of documentation and some demos on the vitrage wiki page: https://wiki.openstack.org/wiki/Vitrage
08:57:45 <ddeja> thanks for your time and sorry for overtaking your meeting ;)
08:58:10 <ifat_afek> no problem, I was just about to move to “Open Discussion” topic anyhow :-)
08:58:28 <ifat_afek> and I see that we are running out of time. let me know if you have any other questions
08:59:14 <ifat_afek> goodbye everyone
08:59:31 <ifat_afek> #endmeeting