15:06:29 #startmeeting third-party
15:06:29 Meeting started Mon Jul 11 15:06:29 2016 UTC and is due to finish in 60 minutes. The chair is anteaya. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:06:31 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:06:33 The meeting name has been set to 'third_party'
15:06:36 thanks lennyb
15:06:46 I was deep in figuring out storyboard apis
15:06:48 :)
15:07:16 hi anteaya
15:07:23 how are you today, mmedvede?
15:07:30 all good, thanks
15:07:36 oh I'm so glad
15:07:39 nice day here
15:07:53 had a deer on my lawn last night for the longest time
15:08:18 does anyone have anything they would like to discuss today?
15:08:19 lots of deer and rabbit around where I am
15:08:24 :)
15:08:26 mmedvede: nice
15:08:27 :)
15:08:31 I love watching them
15:08:52 hi there
15:09:00 hey wznoinsk
15:09:12 #info http://lists.openstack.org/pipermail/openstack-dev/2016-July/098992.html pin nodepool
15:09:19 hi anteaya
15:09:30 so have you all read asselin__'s post to dev about pinning nodepool?
15:09:37 wznoinsk: nice to see you
15:10:02 hi. i'm here but double-booked.
15:10:14 anteaya, good to be here ;-) even tho still on vacation
15:10:26 asselin__: thanks for the post to dev
15:10:51 wznoinsk: oh my, well glad you are here but I hope you enjoy vacation
15:11:05 does anyone have anything they would like to discuss today?
15:11:20 I have a question regarding OpenStack infra monitoring
15:11:27 mmedvede: go ahead
15:11:30 (trying to set something up myself)
15:11:46 does the team use any automated notification system?
15:11:50 no
15:11:52 or looked into having one?
15:12:03 humans are much faster than any automated notification system
15:12:04 anteaya, don't mind this little break to get back and set my mind into a technical mode
15:12:21 we have purposely not wanted any automatic notification system
15:12:32 none of the infra team has a pager, nor do we want one
15:12:35 mmedvede: what are you looking for?
15:12:40 wznoinsk: fair enough
15:13:03 some infra team members purposely chose to work in this environment as a way of leaving a pager behind
15:13:08 anteaya, this maybe due to amount of notifications teams may get and get around to do the 'really important' ones, isn't it?
15:13:30 I ideally am looking for alerts being sent to irc, triggered by anomaly detection in metrics
15:13:45 wznoinsk: well there are a couple of reasons, one is a lifestyle choice, when our infra folks are online they are responding to things in channel as fast as possible
15:13:55 when they are offline, they really need to be offline
15:13:56 anteaya: agree on no pager. I am talking about irc alerts
15:14:00 mmedvede, what monitoring system you have as the source of these alerts?
15:14:06 mmedvede: what kind of alerts?
15:14:38 wznoinsk: none yet. was considering graphite-beacon initially (simple thresholds trigger a script that sends irc message)
15:15:06 anteaya: any sorts of alerts, e.g. zuul lost connection to OpenStack gerrit
15:15:45 anteaya: but my question is mostly to find out if infra considered any tooling (as generally you chose good tools :) )
15:15:54 zuul would reconnect then would it not, if it lost a connection to gerrit?
15:16:11 ah, well our irc bots are in need of an overhaul
15:16:22 but no one has had the time & interest to do it
15:16:48 as our irc bots are fraught with issues I think due to threading which requires us to pin a bot to a server
15:16:58 then if the server goes down we lose the bot
15:17:36 so in terms of irc messaging tool I believe that infra does not feel we are using the latest bright and shiny
15:17:52 mmedvede, can an event be used (like running an arbitrary command) when alert occurs in graphite?
15:18:45 wznoinsk: yes, but you need external tool (e.g. graphite-beacon)
15:19:15 on the other hand, the more puzzles in the 'notification' system the more prone it is it will not work in some cases
15:19:51 my setup was nagios + nagstamon (windows app that sits on your desktop/systray and makes flashing/noise when nagios sees something bad)
15:19:52 anteaya: ok, that is more or less what I thought
15:20:27 mmedvede: it is an interesting discussion, I'm curious as to your motivation for it, what are you looking to fix or address?
15:20:43 I'd recommend something as simple as that for notification + externally hosted script to check the monitoring system (nagios, graphite) itself ;-)
15:20:48 wznoinsk: I found nagios so far hard to manage
15:21:26 mmedvede, I'm not promoting nagios in any form (it's just what I was used to and didn't want to learn new monitoring tool back then)
15:21:51 anteaya: main motivation is to react quickly to things going wrong. we do not have downstream users to complain, so sometimes it takes awhile to catch things
15:22:03 bad wording, again: I like nagios, but the above was just an example of notification simplicity (to avoid problems with the notification system itself)
15:22:34 mmedvede: ah for your personal use, yes that makes sense
15:23:07 mmedvede: um, how many of your tools send you email alerts on failures?
15:23:11 wznoinsk: I understand what you are saying :)
15:24:01 mmedvede, good ;-)
15:24:06 anteaya: I did not configure emails on failure. I think after awhile you would start ignoring them, as noise
15:24:35 well we are talking about a system for you to know when your system is failing, are we not?
15:25:02 if the tools have the ability to email you, have you tried to figure out how to get that feature to work for you?
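The graphite-beacon approach mmedvede describes (a simple threshold on a Graphite metric triggering a script that posts to IRC) can be sketched in a few lines of Python. This is a minimal illustration, not graphite-beacon's code: it assumes the JSON returned by Graphite's render API with `format=json` (a list of series, each with a `target` and `datapoints` as `[value, timestamp]` pairs), and the metric names, the threshold, and the `notify` callback are hypothetical placeholders. Requiring several consecutive points above the threshold is one way to keep a single spike from alerting the channel.

```python
import json
from typing import Callable, List


def breached_targets(render_json: str, threshold: float, min_points: int = 3) -> List[str]:
    """Return targets whose last `min_points` non-null values all exceed `threshold`.

    `render_json` is assumed to be the body of a Graphite render API call made
    with format=json: [{"target": ..., "datapoints": [[value, ts], ...]}, ...].
    """
    breached = []
    for series in json.loads(render_json):
        # Graphite reports gaps as null; drop them before evaluating.
        values = [v for v, _ts in series["datapoints"] if v is not None]
        recent = values[-min_points:]
        # Only alert when there is enough data and every recent point is high.
        if len(recent) == min_points and all(v > threshold for v in recent):
            breached.append(series["target"])
    return breached


def check_and_alert(render_json: str, threshold: float, notify: Callable[[str], None]) -> None:
    # `notify` is a hypothetical hook, e.g. a function that posts to an IRC channel.
    for target in breached_targets(render_json, threshold):
        notify(f"ALERT: {target} above {threshold}")


if __name__ == "__main__":
    # Hypothetical sample data in the render API's JSON shape.
    sample = json.dumps([
        {"target": "zuul.hypothetical.queue_length",
         "datapoints": [[5, 1], [40, 2], [45, 3], [50, 4]]},
        {"target": "nodepool.hypothetical.ready",
         "datapoints": [[1, 1], [None, 2], [2, 3], [1, 4]]},
    ])
    check_and_alert(sample, threshold=30, notify=print)
```

In practice `notify` would hand the message to whatever IRC bot you already run; the sketch deliberately leaves that part out, in line with wznoinsk's point that every extra piece in the notification path is another thing that can fail.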
15:25:02 mmedvede: we send emails on 5 failures
15:25:26 anteaya: I thought you meant CI failures
15:25:35 like jenkins test failed
15:25:45 I don't want your emails
15:26:00 but I thought we were talking about solving a problem you have
15:26:07 so get your tools to email you
15:26:21 and you can configure it as you see fit, as lennyb suggests
15:26:32 lennyb: that works to a degree, but you might wait 5 hours before you get email
15:27:16 anteaya: getting a tool to email is not a problem. Main puzzle piece is what to use to decide when to send an alert
15:27:26 there are a lot of options
15:28:12 mmedvede: correct. my assumption is that a single failure is a developer responsibility, if there are a number of failures, that probably means that the problem is my CI. I still have in my todo list a script to compare my CI failure to the others
15:28:18 mmedvede: ah, yes I do agree
15:29:51 anteaya, do we have a way to know if the zuul of infra CI is down ASAP? I'm thinking maybe that is mmedvede 's question?
15:30:08 mmedvede: is that your question?
15:30:27 mmedvede, does graphite have a configuration on how many attempts a check has to fail before it's marked as a WARNING/CRITICAL?
15:31:06 watanabe_isao: I brought up our CI's zuul as example
15:31:30 there are many more things we need to monitor, zuul was one of them that misbehaves frequently
15:31:58 wznoinsk: graphite is not monitoring tool, it is aggregation. So it does not have alerts
15:32:24 wznoinsk: so someone wrote graphite-beacon to monitor graphite metrics
15:32:29 mmedvede, I see. Well in my third party CI zuul hung before due to some issue, and I need to check it every day now, which is a nightmare.
15:32:30 (there are many others)
15:32:37 mmedvede, is that the graphite you're talking about ? http://graphiteapp.org/ ?
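lennyb's rule of thumb (one failure is probably the developer's problem, several in a row probably mean the CI itself is broken) and his still-on-the-todo-list comparison against other CIs might look roughly like this. The limit of 5 comes from the discussion; the majority-vote heuristic and all names are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ConsecutiveFailureAlarm:
    """Fire once when `limit` failures arrive in a row; reset on any success.

    The default of 5 mirrors "emails on 5 failures" from the discussion;
    it is a placeholder, not a recommendation.
    """
    limit: int = 5
    streak: int = 0

    def record(self, passed: bool) -> bool:
        """Record one CI result; return True exactly when the alarm should fire."""
        if passed:
            self.streak = 0
            return False
        self.streak += 1
        # Fire only on the transition to `limit`, not on every later failure.
        return self.streak == self.limit


def likely_ci_problem(my_result: bool, others: Dict[str, bool]) -> bool:
    """Hypothetical heuristic: my CI failed while most other CIs passed the same patchset."""
    if my_result or not others:
        return False
    passed = sum(1 for ok in others.values() if ok)
    return passed > len(others) / 2
```

The streak length is the trade-off mmedvede points out: a higher limit means less noise but a longer wait before anyone hears about a broken CI.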
15:32:52 mmedvede: well it sounds like there is no existing thing that does what you are looking for, my suggestion would be to put something in an etherpad that specifies _exactly_ what you want, since we seem to be getting lost guessing due to generalities
15:33:21 wznoinsk: in the context of OpenStack infra - http://graphite.openstack.org/
15:33:31 then once you get a few people to read the etherpad who can then repeat back what you say you need such that they understand what you want, post to the infra list
15:33:45 since I will be honest, currently I don't know what it is you want
15:34:25 anteaya: it is ok, I know what I want to try already. And this discussion confirmed I did not miss some obscure super-cool tool everyone is using
15:34:48 ah that was the point of this conversation
15:34:50 okay great
15:34:55 glad you got what you needed
15:35:11 and yeah, I don't think you are missing out in any of the latest hotness
15:35:35 does anyone have anything more for this discussion?
15:35:41 anteaya, are we only talking about 3rd party tool here? May I ask something about devstack-gate?
15:35:55 watanabe_isao: you can ask
15:36:00 this is the third-party meeting
15:36:21 we use devstack-gate (some of us)
15:36:23 so anything you ask will be viewed in the context of third party operators and their activities
15:36:27 mmedvede, ok - I was only reading the 'about' section of graphite, sometimes it's hard to tailor a tool for data collecting/metrics for the monitoring/notifications purposes... would graphite be your only source of data you want to alert/notify on? or would you want to monitor output of different kinds (i.e.: particular processes on a machine, run a completely custom check etc.) ?
15:36:53 Does anyone considered about a mid_test_hook?
15:37:07 wznoinsk: right now it seems graphite (statsd metrics) is a good way to aggregate everything
15:37:20 what do you want a test hook to do in the middle of a test?
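For wznoinsk's question about feeding custom checks into the same place, one low-effort route is the statsd line protocol mmedvede alludes to: a UDP datagram of the form `<name>:<value>|<type>` (`c` counter, `g` gauge, `ms` timer), sent by convention to port 8125. The sketch below assumes a statsd daemon reachable at the given host and port; the metric name in the usage note is hypothetical.

```python
import socket


def statsd_line(name: str, value: float, metric_type: str = "g") -> bytes:
    """Encode one metric in the plain statsd wire format: <name>:<value>|<type>."""
    if metric_type not in ("c", "g", "ms"):
        raise ValueError(f"unsupported statsd type: {metric_type}")
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(name: str, value: float, metric_type: str = "g",
                host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send; statsd gives no acknowledgement."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(statsd_line(name, value, metric_type), (host, port))
```

A check script could then call `send_metric("thirdparty.ci.zuul_up", 1)` on success and `send_metric("thirdparty.ci.zuul_up", 0)` on failure, and a threshold tool such as graphite-beacon could alert on the resulting series. Because UDP is fire-and-forget, a dead statsd daemon will not make the check itself fail.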
15:37:22 Which is used to execute some commands before tempest
15:37:46 watanabe_isao: do you mean after devstack, but before tempest?
15:38:02 anteaya, to set up the environment, like add a node to ironic.
15:38:09 mmedvede, yes
15:38:11 watanabe_isao, if you're thinking what I think you're thinking about you probably want to use local.sh that devstack itself runs at the very end
15:38:34 watanabe_isao: we actually have ironic job that does something like that
15:38:40 wznoinsk, I know it also can e.g. add the node to ironic
15:39:04 we use pre_test_hook to create config with baremetal node information
15:40:00 wznoinsk, for example you want to add a node to ironic as late as you can. But devstack install takes too long time.
15:41:03 mmedvede, when I think about it... I agree, I could probably tailor nearly all of my custom scripts to output some form of metric to graphite... monitoring tools usually have graphing tools built-in tho... my main aim is to monitor, alert/notify 2nd to see historical graphs hence I use nagios
15:42:02 watanabe_isao, sorry, I can't help you here, don't use Ironic here yet
15:42:41 watanabe_isao: we considered adding node later, but for POC job ended up de-facto adding it before devstack
15:42:47 but shouldn't a stacked node, given OS_URL and other links to the controller/keystone, register itself up (note: lack of Ironic knowledge here)
15:42:49 wznoinsk, it's ok. well in my use case, I also want to run some local scripts before tempest
15:42:57 watanabe_isao: did you consider a devstack plugin?
15:43:27 mmedvede, no just some local scripts.
15:44:10 mmedvede, but it is a good point I think.
15:45:43 anything more on this topic or the monitoring one?
15:45:52 we seemed to be doing both at the same time
15:46:21 does anyone have anything else they would like to discuss today?
15:46:41 anteaya, we've agreed with moshele from Mellanox to submit a barcelona talk abstract about SRIOV/NFV CI setups we have... we both use openstackci toolset... I'd like to bring it up with os infra guys, is tomorrow's 3rdparty WG meeting a good one for this?
15:46:42 anteaya, one more on ci-watch, please.
15:47:09 wznoinsk, you first.
15:47:32 watanabe_isao, go with yours, given the time
15:47:46 wznoinsk, thanks
15:48:30 anteaya, does any of us going to give ci-watch a filter?
15:48:44 what filter might they give?
15:49:37 hey all mmedvede is on his way back. Lost power and is reconnecting
15:49:48 mjturek1: thank you
15:49:59 np!
15:50:00 anteaya, my CI in cinder is always at below, and I don't want to see some CI's result. I'm talking about a filter to stop showing some results.
15:50:23 ah filter out results you don't want
15:50:35 anteaya, yes.
15:50:43 watanabe_isao: well the ci-watch code is using gerrit, so you could offer a patch
15:51:14 even if you aren't sure of the code, write a clear commit message saying what you want the patch to do, and hopefully kind reviewers can help you get the patch in shape
15:51:43 anteaya, got it. Currently it is just an idea. will do it.
15:52:26 #link http://git.openstack.org/cgit/openstack/third-party-ci-tools/
15:52:33 I believe that is the repo
15:52:49 great, more on this or shall we move to wznoinsk's question?
15:53:06 anteaya, thank you. yes. please go.
15:53:11 thanks
15:53:16 wznoinsk: you are up
15:54:18 wznoinsk: I believe your question was about making the infra team aware of a talk you have submitted?
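watanabe_isao's ci-watch request boils down to hiding results from CI accounts you are not interested in. The real change would have to fit ci-watch's own data model in the third-party-ci-tools repo, which is not described in this meeting, so the sketch assumes a hypothetical mapping from CI account name to verdict and an exclude list of shell-style patterns (matched case-sensitively with `fnmatch.fnmatchcase`).

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable


def filter_ci_results(results: Dict[str, str], hidden: Iterable[str]) -> Dict[str, str]:
    """Drop CI accounts whose names match any shell-style pattern in `hidden`.

    `results` maps a CI account name to its latest verdict; this shape is an
    assumption for illustration, not ci-watch's actual data model.
    """
    patterns = list(hidden)
    return {
        name: verdict
        for name, verdict in results.items()
        if not any(fnmatchcase(name, pattern) for pattern in patterns)
    }


if __name__ == "__main__":
    # Hypothetical CI account names.
    latest = {"Vendor-A CI": "SUCCESS", "Vendor-B CI": "FAILURE", "Jenkins": "SUCCESS"}
    print(filter_ci_results(latest, ["Vendor-B*"]))
```

Patterns rather than exact names let one entry such as `"Vendor-B*"` cover several related CI accounts; the account names here are made up.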
15:55:40 well a few things, I personally don't endorse anyone else's talk proposal, lest I be inundated with requests
15:55:41 you, os infra, guys would know best what community is looking for about 3rdparty CI setups so I wanted to have a chat about our yet-general sriov/nfv CI talk in barcelona on tomorrow's 3rd party WG meeting
15:56:25 if you want to discuss the content of the proposal prior to proposing that is fine, you can ask questions in the infra channel
15:56:59 if you want the whole team to discuss something (both men and women, not just the guys) then you can add an agenda item to the infra team meeting: https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
15:57:15 ok, I'll share more on this soon then
15:57:26 I have a lot of meetings already and am hard pressed to attend more
15:57:48 I can't speak for other infra team members but if there is someone you would like to invite you are welcome to ask them
15:58:18 thanks
15:58:22 more on this topic?
15:58:39 nope, thanks
15:58:42 thank you
15:58:49 anyone with anything else today?
15:58:55 about 1 minutes remaining
15:59:01 1 minute
15:59:07 anteaya, sorry that I'm new to this meeting. Do we have another meeting, tomorrow?
15:59:20 watanabe_isao: thanks for attending, glad to have you
15:59:40 watanabe_isao: all openstack meetings are listed here: http://eavesdrop.openstack.org/
16:00:07 #link http://eavesdrop.openstack.org/#Third_Party_Meeting
16:00:18 #link http://eavesdrop.openstack.org/#Third_Party_Working_Group_Meeting
16:00:28 those would be the links you are looking for
16:00:31 and time to end
16:00:32 anteaya, ohhh,
16:00:32 Third Party Working Group Meeting
16:00:43 thank you everyone see you next week
16:00:46 #endmeeting