15:00:51 #startmeeting ceilometer
15:00:51 Meeting started Thu Oct 3 15:00:51 2013 UTC and is due to finish in 60 minutes. The chair is jd__. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:53 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:55 The meeting name has been set to 'ceilometer'
15:01:00 o/
15:01:04 o/
15:01:05 o/
15:01:05 o/
15:01:09 o/
15:01:12 hi guys
15:01:22 o/
15:01:26 o/
15:01:26 o/
15:01:28 o/
15:01:29 o/
15:02:26 #topic Havana release
15:02:39 a few words about Havana: RC1 was released yesterday
15:02:41 * dragondm applauds
15:02:45 Wahoo!
15:02:47 \o/
15:02:51 o/
15:02:52 nice work, everyone!
15:02:54 congrats y'all
15:03:07 yaaay
15:03:16 the milestone-proposed branch has been cut so fixes can be included later if needed, in case we have to respin an RC2 release
15:03:23 it's time to test heavily!
15:03:34 congratulations to you all too! :)
15:03:48 jd__: re. needing an RC2 release ...
15:04:04 on https://review.openstack.org/47542 we discussed treating the leaking of admin-ness to non-admin users as a separate bug
15:04:05 you can merge patches in master once again and that'll be in Icehouse
15:04:16 sounds like an RC2 candidate?
15:04:58 eglynn: maybe
15:05:28 jd__: k, I'll file a bug and work on a fix
15:05:42 I think I'd like to see the fix before deciding
15:05:45 jd__: (we can decide later if it's targeted at Icehouse or RC2)
15:05:48 (I know that sounds weird)
15:05:58 jd__: no, that's fair enough
15:06:02 cool :)
15:06:49 and for people not following at home, I think I'll still be the benevolent democratic dictator for the Icehouse release
15:07:21 * dragondm hands jd__ a handful of medals and a funny hat.
15:07:26 hehe
15:07:26 with 100% of the vote, Stalin would have been proud of that ;)
15:07:39 Why'd you take my hat?
15:07:49 eglynn: 100% of 0 votes :-)
15:08:01 Lol
15:08:05 :)
15:08:11 jd__: you just broke maths.
15:08:38 :-)
15:08:40 #topic Release python-ceilometerclient?
15:08:49 I'd like to include https://review.openstack.org/#/c/49551 if possible
15:09:03 works for me
15:09:06 cool
15:09:21 anybody ping me or eglynn to release
15:09:33 eglynn: so ping me or yourself for this to happen
15:09:38 eglynn, cool
15:09:43 cool
15:10:35 #topic Talking about Hardware Agent
15:10:41 #link https://blueprints.launchpad.net/ceilometer/+spec/monitoring-physical-devices
15:10:47 lsmola: floor is yours
15:10:54 I have prepared several questions
15:10:59 42!
15:11:13 i will just throw them in so we can start discussing them
15:11:22 1. Should it act as a central agent? Or should it be deployed on every piece of hardware? It seems that the original author planned both:
15:11:23 http://www.cloudcomp.ch/2013/07/hardware-extension-for-ceilometer/
15:11:23 So the settings determining what is visible to the agent would reside in the agent somehow?
15:11:23 2. Regarding security, is it better to have an agent on each baremetal node? Or rather have only some daemon like snmpd there, allowing communication to and from the central agent?
15:11:23 3. There is very little documentation about how to set this up. I couldn't find where the agent takes the list of resources it should poll from, e.g. the list of routers or baremetal nodes it can see. Does anybody know?
15:11:23 4. About the IPMI inspector. It is probable that the IPMI credentials will reside in Ironic (at least for the undercloud), so the inspector should be able to talk either to the Ironic API or to IPMI directly, right? So it also works when Ceilometer knows the IPMI credentials.
15:11:32 ow my eyes
15:11:39 hehe
15:11:56 and you've got only 49 minutes left to answer this
15:12:01 there was a warning :-) you have to catch
15:12:21 what are the security implications of running an agent on the server with an image owned by the tenant?
15:12:31 and how would we get the agent into the image in the first place?
15:13:01 dhellmann, our primary use case is to use it in the undercloud via tripleo
15:13:13 ah, ok, that makes more sense :-)
15:13:15 dhellmann, so it would be an image element of tripleo
15:13:26 sure
15:13:26 lsmola: "should it act as a central agent" == "should it not run on-host" ?
15:13:58 * eglynn confused on the "central agent" reference in #1
15:14:03 eglynn, if I get it correctly, the central agent is running on the control node, right?
15:14:26 eglynn, so there is only one agent, polling everything
15:14:48 lsmola: well in theory it could be running anywhere, but yes you could call it a control node, and yes it polls everything
15:14:56 there would have to be something on the host answering the poll though.
15:14:57 jd__, btw how is the scaling of the central agent? I think that was the biggest concern of the tripleo guys
15:15:13 lsmola: so you mean, load the hardware monitoring piece as an extension into the existing central agent?
15:15:15 would a central agent use IPMI and SNMP?
15:15:23 sandywalsh, for snmp, it will be snmpd
15:15:27 hi guys, sorry I'm late... wanted to let you know I should have the Alembic removal patch updated and done today.
15:15:33 lsmola: gotcha
15:15:37 sandywalsh, there is also an image element for this in tripleo
15:15:51 lsmola: I should be working on it soon
15:16:04 sandywalsh, though you have to allow the communication in the firewall
15:17:00 ok so, does anybody have an idea how the architecture should work?
15:17:06 yes, there is also the rackspace agent, which is open sourced. Perhaps it could be adapted for guest-side reporting. It's available for Windows and Linux.
15:17:12 I'm not sure how we could handle baremetal securely using either mode for any use case other than tripleo. Are we trying for tenant images at all?
15:17:27 I think it's very Xen-specific currently though
15:17:53 sandywalsh: yah, the RS agent uses the Xen store for communication.
15:18:12 yeah :/
15:19:27 hm
15:19:34 if the user runs on baremetal, either the monitoring can be done via IPMI or the like, or the user could have an agent posting untrusted metrics via the API that wouldn't be used for billing but could be used for stuff like autoscaling
15:19:36 #link https://github.com/rackerlabs/openstack-guest-agents-unix
15:19:48 #link https://github.com/rackerlabs/openstack-guest-agents-windows-xenserver
15:20:28 is IPMI alone sufficient though?
15:21:01 (from what we discussed previously, it seemed that was limited to temperature, fan speed, voltage etc.)
15:21:02 eglynn, I am not sure; from the metrics I saw, it didn't have everything
15:21:27 eglynn, there is a full list in the IPMI blueprint
15:22:27 jd__, the other thing is that we might want to read syslog or other things from the baremetal
15:22:30 lsmola: you mean the monitoring-physical-devices BP?
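(A rough sketch of the kind of data an IPMI inspector could collect out-of-band, i.e. the temperature/fan speed/voltage readings mentioned above; it assumes ipmitool is installed, the host address and credentials are purely illustrative, and this is not the blueprint's actual implementation.)

    # Minimal sketch: poll IPMI sensor data (temperature, fan speed, voltage)
    # remotely via ipmitool over the lanplus interface.
    import subprocess

    def read_ipmi_sensors(host, user, password):
        # "ipmitool sensor" prints pipe-separated rows: name | value | unit | status | ...
        out = subprocess.check_output([
            "ipmitool", "-I", "lanplus",
            "-H", host, "-U", user, "-P", password,
            "sensor",
        ]).decode()
        readings = {}
        for line in out.splitlines():
            fields = [f.strip() for f in line.split("|")]
            if len(fields) >= 3 and fields[1] not in ("", "na"):
                readings[fields[0]] = (fields[1], fields[2])
        return readings

    # Illustrative BMC address and credentials only.
    print(read_ipmi_sensors("192.0.2.20", "admin", "secret"))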
15:22:31 eglynn: it may not indeed
15:22:43 jd__, so it would be much easier with an agent on the baremetal
15:22:52 lsmola: well everything that comes from baremetal cannot be trusted
15:23:00 you can run an agent on baremetal and use the API to post samples
15:23:03 but not everyone wants an agent on their machines
15:23:12 tradeoff
15:23:18 eglynn, https://blueprints.launchpad.net/ceilometer/+spec/ipmi-inspector-for-monitoring-physical-devices
15:23:19 it should just be tagged as something coming from the user, not the operator
15:23:26 yep
15:23:26 sandywalsh: clearly, it'll be optional
15:23:27 lsmola: thanks
15:23:32 eglynn, it should be the IPMI inspector for the hardware agent
15:24:24 sandywalsh, so it seems to me it should be a matter of configuration
15:24:43 yep, if you want the buttery goodness, you have to install it
15:25:18 sandywalsh, for tripleo, there could be several images prepared, so..
15:25:41 lsmola: right, our rackspace images have the agent preinstalled.
15:25:57 sandywalsh, yep
15:26:21 sandywalsh, and it could be preconfigured too (or some configuration scripts prepared)
15:26:49 some mount a "configuration drive"
15:27:24 so it seems like two alternative modes are possible (on- and off-host) ... are we saying that *both* approaches have their place?
15:27:39 by the way, regarding question 1, I have no idea how it works now; I haven't found anywhere the list of hosts that it polls, anybody have an idea?
15:27:40 (i.e. depends on the cloud deployer's policies)
15:27:57 eglynn, yes
15:27:57 yep, from the CM perspective I don't think things should really change. It's just a source of events.
15:28:18 be it from these agents/pollsters or something like Diamond or a periodic_task
15:28:43 whether it's a part of core CM is perhaps a bigger question
15:28:49 eglynn: I think so indeed
15:28:58 and in the off-host/central-agent case, is the potential for data acquisition more constrained?
15:29:05 should we just focus on providing a solid API for these things
15:29:13 (so the two approaches aren't fully interchangeable)
15:29:47 (api = http/udp/rpc-cast/whatever)
15:29:47 eglynn: no they wouldn't be
15:30:09 sandywalsh: we do have an HTTP API for posting samples, that should be enough
15:30:32 I don't think the problem is in Ceilometer, it's rather about how to build things around what it offers right now :)
15:30:33 jd__: yep, depends on the volume/frequency, but a good start for sure
15:30:43 jd__, yes exactly
15:31:02 sandywalsh: yeah, I also think there's a good chance your baremetal might not have access to RabbitMQ, for example for security reasons
15:31:07 sandywalsh: like different networks
15:31:44 sure, there's going to be a pile of deployment considerations ... too many to deal with. We should just focus on the API, and perhaps some example pollsters/agents
15:31:46 jd__, well for SNMP, you need to allow UDP communication with snmpd
15:31:49 (imho :)
15:31:56 jd__, if it's remote
15:32:12 lsmola: yeah but that's just for polling, not a problem :)
15:32:24 jd__, ok
15:32:37 does that answer enough of your questions lsmola ?
15:32:47 jd__, well
15:33:04 jd__, I have almost the same amount of confusion :-D
15:33:21 you're welcome.
15:33:29 hah
15:33:29 jd__, ok, so the main thing is, can we leave the hardware agent as it is?
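(For the "central agent polls snmpd on each host over UDP" setup discussed above, a minimal snmpd configuration sketch; the community string and the central agent's address are illustrative, and UDP port 161 has to be allowed through the host firewall as noted in the discussion.)

    # /etc/snmp/snmpd.conf on each baremetal host (illustrative values)
    agentAddress udp:161
    rocommunity public 192.0.2.10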
15:33:39 jd__, as a separate agent from the central one
15:33:57 jd__, as it can be used as a central or per-host agent
15:34:02 lsmola: good question, I think we are going to work on the central agent and improve it
15:34:12 having to do that twice for no reason does not sound like a good option to me
15:34:15 jd__, this is a starting point for me, to get the hardware agent in
15:34:20 considering the code for the hardware agent is quite the same AFAIK
15:34:38 question is: is there anything you can't do with the central agent right now?
15:35:05 jd__, well I guess you can put the central agent on each host too, if you want, right?
15:35:26 jd__, is there some horizontal scaling of the central agent?
15:35:45 lsmola: not yet, but this is going to be implemented
15:35:49 jd__, tripleo is afraid of large deployments
15:35:57 jd__, ok, fair enough
15:36:06 (we all are :)
15:36:14 hehe
15:36:21 lsmola: if there is a hardware agent I think it should be the agent polling things from inside the host and posting stuff to the REST API for example
15:36:36 so it's not really polling hardware, but instances-on-hardware
15:36:46 jd__, yes, or directly to the message bus
15:37:05 lsmola: if you can access it, for sure
15:37:10 jd__, yes
15:37:26 so that means basically that you can reuse the central agent
15:37:36 only enable the pollsters that poll local things like CPU time etc
15:37:47 jd__, ok cool
15:37:49 and write a new publisher to publish over HTTP
15:38:03 jd__, then there is no reason to have a separate hardware agent I guess
15:38:04 something like that (maybe I'm missing some detail, I'm thinking out loud)
15:38:11 jd__, at least I don't see any
15:38:19 lsmola: maybe there is and we don't see it yet, but for now I don't think so
15:38:24 let's keep things simple if we can!
15:38:24 if there's an agent posting samples to the REST API from on-host, then we have to propagate keystone credentials onto every host also, right?
15:38:30 jd__, agree
15:38:44 eglynn: right
15:39:04 eglynn: maybe we could use a new role or something to have a better split on permissions for posting samples from trusted sources
15:39:13 jd__, I guess it will be safer to run only daemons that can be polled on the hosts
15:39:14 that's another (part of the) issue
15:39:39 lsmola: I can't see how that would be safer :)
15:39:46 yeah, we should support anonymous posts
15:39:56 for monitoring ... not billing
15:40:01 jd__: hmmm, yeah a very limited RBAC policy, plus possibly rate limiting to avoid DDoSing
15:40:12 yep
15:40:19 everything should be rate limited
15:40:31 or just use UDP
15:40:36 eglynn: I'm pretty sure we can re-use or add a middleware in Oslo for that
15:40:42 eglynn: the DoSing thing
15:40:46 cool enough
15:40:51 but good idea
15:40:59 (yet another part of the problem :-)
15:41:29 lsmola: if you have further points don't hesitate to send a mail to the list I guess
15:41:49 jd__, ok, so my main testing scheme is to run the central agent, and snmpd on each host
15:42:13 jd__, I think I can go with this
15:42:25 cool
15:42:34 jd__, I will try to talk with llu about merging the hardware agent into the central agent
15:42:44 lsmola: sounds like a good idea
15:42:47 jd__, he seems to be on vacation
15:42:55 it's too bad llu is not here
15:42:59 ok :)
15:43:00 jd__, so when I catch him
15:43:19 jd__, could you also post the decision to the blueprint?
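(A minimal sketch of what "an agent posting samples to the REST API from on-host" could look like, as discussed above; the endpoint, token, meter name, and resource id are illustrative, and the field names assume the v2 sample schema.)

    # Post a single sample to the Ceilometer v2 API with a pre-obtained Keystone token.
    import json
    import requests

    CEILOMETER_URL = "http://controller:8777"  # illustrative API endpoint
    TOKEN = "<keystone-token>"                 # obtained from Keystone beforehand

    sample = [{
        "counter_name": "hardware.cpu.util",   # illustrative meter name
        "counter_type": "gauge",
        "counter_unit": "%",
        "counter_volume": 12.5,
        "resource_id": "baremetal-node-01",    # illustrative resource id
        "resource_metadata": {"source_host": "baremetal-node-01"},
    }]

    resp = requests.post(
        "%s/v2/meters/hardware.cpu.util" % CEILOMETER_URL,
        headers={"X-Auth-Token": TOKEN, "Content-Type": "application/json"},
        data=json.dumps(sample),
    )
    resp.raise_for_status()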
15:43:56 hm, I can try but if I forget feel free to do it instead
15:44:13 jd__, ok fair enough :-)
15:44:22 #topic Open discussion
15:44:42 jd__, I think that's all regarding the hardware agent, I have my starting point
15:44:45 :-)
15:45:31 Hello all. I want to ask about notification via email. Is it planned to add it?
15:45:34 if anyone has anything else go ahead
15:45:40 vvechkanov: for alarms?
15:45:43 Yes
15:45:52 vvechkanov: it's planned somehow, but nobody has taken the task over AFAIK
15:46:07 IIRC we were thinking Marconi would provide SNS-like functionality
15:46:17 (for email/SMS/etc notifications)
15:46:31 not sure if that's still on the Marconi roadmap tho'
15:46:36 you can write a Marconi target too eventually
15:46:50 *possibly
15:46:58 perhaps we could do with a ceilo/marconi session at summit?
15:47:05 As far as I know, Marconi doesn't plan to provide notifications...
15:47:08 k
15:47:17 Am I wrong?
15:47:35 dunno, I'd need to check with the Marconi folks
15:47:57 (I've had that on my mind also, so I'll chase it up)
15:48:13 I had a question about the notification.info queue... is that strictly for ceilometer? apparently some people have been listening on that queue (for whatever reason) and when we enable ceilometer, they start racing and grabbing random messages. wondering what's the correct way to implement this?
15:48:14 eglynn: yeah good idea
15:48:43 gordc: no, we consume from it by default but you can configure another queue to be used
15:48:57 gordc: just add more topics in nova.conf for example, and change the topic in ceilometer.conf
15:49:22 gordc: we have our services configured to publish to two exchanges, one for stacktach/ceilometer and another for our billing dept.
15:49:24 notification_topics=notification,ceilometer in nova.conf and notification_topics=ceilometer in ceilometer.conf
15:49:30 something like that
15:50:08 jd__, sandywalsh: cool cool. that's what I thought. wanted to confirm that's how everyone else is doing it before I make stuff up. :)
15:50:43 sandywalsh: by the way I think that your notification patch introduces a new issue
15:50:47 (kinda)
15:51:02 conflicting rabbit configs?
15:51:04 because in such a context, now Ceilometer publishes and consumes only on "notification_topics"
15:51:18 so with something like I've described, it's problematic
15:51:31 we should have notification_topics and notification_topics_to_consume
15:52:01 right, because we can't inject into the rpc_notifier ... so we'll need the _to_consume part
15:52:07 k, I'll look into that
15:53:26 cool
15:53:40 btw, I gather all of the -2's for FF will be lifted soon?
15:53:53 dragondm: already done (unless I forgot some)
15:54:06 Yah, there are still some.
15:54:34 dragondm: feel free to send them to me so I can lift them :)
15:54:53 The auto-expire hit a few. I've revived mine: https://review.openstack.org/#/c/42713/
15:55:02 ah indeed
15:55:14 I didn't look into expired ones
15:55:49 anything else guys? otherwise closing in a minute
15:56:34 #endmeeting
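(A sketch of the topic split jd__ describes above: the publishing service sends each notification to a shared topic plus a Ceilometer-only topic, while Ceilometer consumes only the dedicated one; topic names are illustrative.)

    # nova.conf: publish each notification to both topics
    [DEFAULT]
    notification_topics = notification,ceilometer

    # ceilometer.conf: consume only from the dedicated topic
    [DEFAULT]
    notification_topics = ceilometer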