08:01:02 #startmeeting vitrage
08:01:02 Meeting started Wed Aug 24 08:01:02 2016 UTC and is due to finish in 60 minutes. The chair is ifat_afek. Information about MeetBot at http://wiki.debian.org/MeetBot.
08:01:03 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
08:01:05 The meeting name has been set to 'vitrage'
08:01:15 Hi everyone
08:01:32 hello
08:02:30 Hi!
08:02:44 Hey guys
08:02:58 hi
08:03:01 hi
08:03:02 hi!
08:03:59 #topic Status and Updates
08:04:09 Newton3 feature freeze is tomorrow
08:04:24 I marked Vitrage version 1.0.0 this morning
08:04:39 Hope to mark another version with the latest updates tomorrow. After that, we are supposed to push only critical bug fixes into the stable/newton branch
08:05:04 In the last TC meetings, they talked about defining cross-project goals for Ocata. The two goals they defined for now are to support Python 3.5 and to remove copies of incubated Oslo code
08:05:12 #link https://review.openstack.org/349069
08:05:21 #link https://review.openstack.org/349070
08:05:36 Another issue is the design summit in Barcelona. We need to decide how many design sessions we want. I'll email you the details
08:06:02 My updates: I'm almost done with the auto-evacuation of VMs from a failed host. Will push it today
08:06:13 I created a blueprint:
08:06:20 #link https://blueprints.launchpad.net/vitrage/+spec/call-nova-host-evacuate
08:06:29 I also created a blueprint according to our discussion in the last meeting:
08:06:40 #link https://blueprints.launchpad.net/vitrage/+spec/zabbix-trigger-configure-affected-resource
08:06:46 Who wants to update?
08:07:22 I will
08:08:01 As part of the Vitrage testing we've been doing lately, I found a few issues regarding stability and committed a fix
08:08:49 As testing progresses (with help from Arik), we should open more issues in Launchpad
08:09:18 Seems that we are much more stable in 1.0 :)
08:09:33 :)
08:10:14 Hi, I'll update next
08:11:49 I'm currently working on a bug that involves timing at startup. It sometimes happens that upon starting, the nova client is initialized in Vitrage as a datasource, but nova itself hasn't fully loaded yet, which results in a broken client.
08:13:30 Now, we poll info from nova periodically, every 10 minutes, so what happens is that for 10 minutes no information from nova is received, which is not good of course. I'm trying to solve it either by retrying failed polling or by rescheduling the polling. Updates to follow.
08:14:08 nbloom: thanks. And if we are using nova notifications, does that solve the problem?
08:15:42 not quite, the nova notifications feature has nova notify Vitrage for every new event that happens, e.g. creation/deletion of a host, meaning that if nothing happened during the polling interval, nothing will show until the next polling interval
08:15:57 I see, thanks
08:16:31 I will update
08:16:46 we are working on the Heat datasource
08:17:21 we will finish by tomorrow
08:17:46 we will support Heat stacks and Heat resources and connect them into the Vitrage graph
08:18:23 and on top of that we will be able to add our templates for different behaviours and scenarios
08:19:00 in addition, hopefully I will finally add another tempest rule in the gate for the different datasources
08:19:17 today we have only the topology tempests at the gate
08:19:33 that's it
08:19:40 thanks, sounds great
08:20:05 note that the gate is extremely busy today.
there is a queue of almost 24 hours…
08:21:29 I have an update
08:22:14 for those of you who have not seen the mailing list, alexey_weyl did some research into the size of our updates
08:22:21 get_all and updates, that is
08:23:02 it seems that from Nova each instance comes with about 800K (!!) of data, most of it not important for Vitrage
08:23:22 However, for large systems this could cause some performance issues, especially during the startup phase
08:23:51 so, we've sent questions to the Nova team to see if this could be reduced
08:23:57 will update once we know more
08:24:01 Also
08:24:18 We plan on removing the validation for relationship types
08:24:58 this way, when adding new datasources, and even more significantly when adding entities in the static-physical config file, there is much more flexibility
08:25:47 however, it's important to keep in mind that in the first phase, this flexibility means a higher chance of spelling mistakes in the different files (config and templates) causing problems
08:26:27 the correct solution should probably be to dynamically collect all the relationships from the config files and datasources, and validate the templates against them
08:26:48 I plan on opening a BP for this
08:26:51 that's it
08:27:11 elisha_r: I agree, let's fix it for Ocata, and for the moment let's just remove the validations
08:27:13 thanks
08:29:59 Any other updates?
08:32:39 From the QA side, I have mainly tested the basic functionality of Vitrage so far. I checked that all the relevant datasources are fetched on each polling cycle and that I see them clearly in the Vitrage UI.
08:33:59 cool, thanks
08:34:06 any interesting bugs?
08:34:43 As for now, I am starting to test Vitrage more deeply by creating different kinds of alarms manually and checking that they are fetched by Vitrage.
08:35:10 ok, thanks
08:35:24 anyone else want to update?
08:36:08 I did find several bugs, which are already solved or being investigated as we speak.
some bugs regarding the startup of the Vitrage graph process in certain high-availability scenarios.
08:36:51 I guess you are talking about the bug that nbloom is working on?
08:38:30 Yes, indeed. There was also an integration bug with the Zabbix datasource not being fetched properly. However, it seems that a simple restart of the Vitrage graph service solved most of the issues.
08:38:53 ok, great. anything else?
08:39:45 I'm not a Vitrage developer, but I saw your mail on the mailing list regarding host evacuation http://lists.openstack.org/pipermail/openstack-dev/2016-August/101648.html
08:40:03 ddeja: welcome :-)
08:40:06 We also found that users without the right permissions get exceptions when trying to open the Vitrage UI. That's all for now. Still working on it.
08:40:18 Arik: thanks!
08:40:27 I have worked on automation of host evacuation for more than a year now
08:40:43 and wanted to ask what your plans are with this functionality?
08:41:18 ddeja: do you know a bit about Vitrage?
08:41:26 ifat_afek: not much
08:41:52 from what I understand, it can raise alarms based on some events
08:42:50 ok, so in two lines: the main purpose of Vitrage is to collect resources and alarms, and provide insights about them. One use case is to show RCA (root cause analysis) information. Another use case is to raise "deduced alarms", e.g. in case of a host NIC failure, we can raise an alarm on every instance running on this host, saying that the instance is unreachable
08:43:44 ifat_afek: OK, I see. And how does 'raising the alarm' connect with taking actions? Is that also part of Vitrage?
08:44:18 a few weeks ago, we added support for "mark-down" functionality. In Vitrage templates, you can say that in case a specific severe alarm was raised on a host (e.g. by Nagios or Zabbix), you would like to notify nova that the host is down.
nova has a specific force-down API for this purpose
08:44:40 now, I'm about to add an option to also call nova and ask it to evacuate all instances running on this host
08:44:54 ifat_afek: you mean to use nova-host-evacuate?
08:45:46 nova-host-evacuate is a CLI that calls the nova evacuate API for every instance. I can't call the CLI directly from Vitrage, so I call nova evacuate for every instance
08:45:56 ifat_afek: oh, I see
08:46:17 I'll be happy to understand what your project is about, as you said you've been working on evacuation as well
08:46:39 basically, there is no project unfortunately
08:46:55 but there is a group of people meeting as an 'HA community'
08:47:08 as part of OpenStack?
08:47:14 ifat_afek: yes
08:47:33 and do you have any insights about it?
08:47:52 in Austin we decided to use project Mistral to perform the 'evacuate all the instances' task
08:48:02 I also know that this issue is discussed as part of the OPNFV Doctor project
08:48:18 ifat_afek: yes, I'm aware of that project too
08:48:37 I'm also aware of at least two more solutions
08:48:41 and are you working on it already? I have some colleagues working on Mistral
08:48:52 ifat_afek: I'm a core in Mistral ;)
08:49:44 but in the case of automatic host evacuation, I'm aware of 4 or 5 projects (inside or outside OpenStack) that aim to provide such functionality
08:50:11 and as an HA community we would like to get everybody to collaborate on one solution
08:50:36 so that is why I wanted to meet with you guys and learn a little bit more about your plans :)
08:51:13 indeed it sounds like we should collaborate :-)
08:51:38 do you have any other questions about what we're doing?
08:52:06 and how do you suggest we proceed?
08:52:08 two short questions
08:52:19 1) is it possible to collect alarms from Pacemaker?
08:52:38 2) is it possible to call Mistral from Vitrage?
08:54:08 1) Vitrage provides a pluggable infrastructure for getting alarms, either via polling or using push notifications.
if Pacemaker has an API or notifications, then it should be quite easy to add its alarms to Vitrage
08:54:12 as a new datasource
08:54:35 and as a second phase, we can add Vitrage templates to correlate Pacemaker alarms with other alarms in the cloud
08:54:36 ifat_afek: oh, that's great
08:54:40 sounds cool
08:55:39 2) sure. Vitrage also has pluggable notifiers, so if you have a specific use case that involves Mistral then it should be easy to integrate
08:56:02 I'm from Nokia, and we have a bunch of Mistral guys here :-)
08:56:22 ifat_afek: OK, thanks. I need to learn more about Vitrage, but I will surely come back to you with more questions
08:56:42 ddeja: cool. You are welcome to ask more questions, either on IRC or on the mailing list
08:56:57 and I hope to have a talk with you in Barcelona about this specific use case
08:57:34 we have a lot of documentation and some demos on the Vitrage wiki page: https://wiki.openstack.org/wiki/Vitrage
08:57:45 thanks for your time and sorry for overtaking your meeting ;)
08:58:10 no problem, I was just about to move to the "Open Discussion" topic anyhow :-)
08:58:28 and I see that we are running out of time. Let me know if you have any other questions
08:59:14 goodbye everyone
08:59:31 #endmeeting