20:00:04 #startmeeting Octavia
20:00:06 Meeting started Wed Apr 6 20:00:04 2016 UTC and is due to finish in 60 minutes. The chair is johnsom. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:07 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:09 The meeting name has been set to 'octavia'
20:00:12 o
20:00:15 hi
20:00:15 o/
20:00:21 Hi everyone
20:00:38 This will probably be a short meeting.
20:00:40 o/
20:00:47 is that a challenge?
20:00:51 #topic Announcements
20:00:54 HA
20:01:14 I would not do such a thing....
20:01:21 o/
20:01:23 We will end up talking about endpoints again
20:01:41 Ok, I don't really have any announcements this week. Anyone else?
20:01:41 o/
20:02:06 #topic Docu-geddon
20:02:11 I've got single-create "functioning" off the "/graphs" endpoint, but testing has been broken by my changes... I'm hunting it down now
20:02:13 #link https://bugs.launchpad.net/octavia/+bugs?field.tag=docs
20:02:19 O/
20:02:20 Ha ha, my bad.
20:02:27 I thought we were doing progress reports :D
20:02:38 FYI, we are still looking for folks to sign up for documentation bugs.
20:03:16 We had some folks asking about documentation again today, so it's well worth our time
20:03:22 #topic Brief progress reports
20:03:32 TrevorV Ok, now... grin
20:03:37 I've got single-create "functioning" off the "/graphs" endpoint, but testing has been broken by my changes... I'm hunting it down now
20:03:39 :D
20:03:45 the following patches need review:#link https://review.openstack.org/#/c/300689/
20:03:45 #link https://review.openstack.org/#/c/299687/
20:03:46 #link https://review.openstack.org/#/c/288208/
20:04:01 #link https://review.openstack.org/#/c/300689/
20:04:11 minwang2 missed a newline in there :D
20:04:23 ah, thanks for pointing that out
20:04:33 I have been working on moving haproxy into a namespace in the amphora. I think I have it functioning now; I still have some testing to do and need to clean up the unit tests I recently broke figuring this stuff out.
20:05:17 Otherwise it looks like a bunch of bug fixes going on.
20:05:20 Good stuff.
20:05:42 Any other progress reports to discuss?
20:05:56 #topic Open Discussion
20:06:02 I have a topic
20:06:06 Ok, other topics
20:06:26 Multiple controller workers. Are we anywhere near ready to have the conversation on "how to do this"?
20:06:37 Sure.
20:06:41 In theory it works
20:06:51 it just needs a new endpoint.
20:06:59 jk, put the damn torches down.
20:07:21 * johnsom considers a vote on worker endpoints
20:07:22 dougwig I think I'll give you a papercut during the summit... just to say I cut you
20:07:27 Do I need a paint brush?
20:08:00 TrevorV Was there a particular concern about running multiple workers?
20:08:10 I think we run multiple workers...
20:08:10 sorry I'm late
20:08:19 Yeah, we do
20:08:46 But not sure if it works at scale
20:09:04 johnsom not specifically, so much as we were talking about scaling issues coming up for the Rackspace deployment and stuff, and didn't know if we had this figured out
20:10:29 If it "works" then I guess we at Rackspace will just have to make sure it actually does when we have the chance :D
20:10:34 Yeah, to my knowledge multiple workers function just fine. We still want to move to job board for HA reasons, but I think multiple workers are functional.
20:10:36 I do have one more topic
20:11:10 Then, on the scale side, there is still the multiple tenant/multiple management LAN work
20:11:34 Which may/may not be an issue in your deployment
20:11:52 TrevorV Ok, what is next?
20:12:55 It may be johnsom
20:13:01 The next topic I have is "billing"
20:13:14 So at Rackspace we talked about billing on a few things, and were discussing how to collect that information
20:13:22 * johnsom thinks "there goes the short meeting"
20:14:02 crc32_znc (Carlos) mentioned the heartbeat mechanism already collects certain stats and sends them along, but we ignore them in favor of the "time" and the "id"
20:14:09 Today we have bytes in/out and connections in the octavia DB
20:14:18 Oh, we do?
20:14:24 Nice!
20:14:50 I'm not convinced there aren't bugs around failover and those stats, but yes, it's being collected
20:14:56 So what about "uptime"? Haproxy *should* be able to give that to us as well, right? Would adding that to the heartbeat and storing it afterwards be a "problem"?
20:14:59 Yeah. Though probably brittle since we take it from haproxy
20:15:13 do we have a defined set of metrics we want, and an api, and are we talking to ceilometer? if we make a ref-amphora-only solution, it'll be a pain to shoehorn in other drivers later if we're standalone.
20:15:49 I'm okay with it being modular, as much of our other driver-related implementations are, dougwig
20:15:54 And dougwig gets to the point....
20:16:27 I think I set it up modular with the Nixon
20:16:33 Mixin
20:16:54 I thought the mixin got removed...
20:17:14 I'm fairly certain it was removed as well
20:17:18 We can resurrect that...
20:17:31 I'm pretty sure it's still driverized
20:17:51 Right, so the thing I was getting at was the "size" of the heartbeat
20:17:59 So, for Octavia, we have this API defined: http://www.octavia.io/review/master/api/octaviaapi.html#list-listener-statistics
20:18:02 If we append a bit more information to it, will it be problematic?
20:18:35 TrevorV Help me with "uptime" as it is a highly abused word.
20:18:37 sorry, a bit late in joining - are we talking about ceilometer support for lbaas v2?
20:19:03 Yeah, wouldn't nova know that?
20:19:05 neelashah Kind of. We are talking about billing topics.
20:19:07 neelashah it's just a "metrics/billing" conversation in general
20:19:27 xgerman if you're talking about instance uptime, that's NOT the same as haproxy-service uptime.
20:19:29 in mitaka, ceilometer did add support for metrics for lbaas v2
20:19:54 TrevorV To me, uptime is a metric around availability, typically collected outside the system.
20:20:29 neelashah Right, but some deployers may not use ceilometer, so we'll need to help that be consumed another way
20:20:39 TrevorV: might be good enough for billing
20:20:46 If you are talking about how long a load balancer has been provisioned, that is probably something that would come from the database
20:21:20 johnsom except what if their LB dies for 6 hours, should we bill them?
20:21:27 yes
20:21:29 :)
20:21:40 +1
20:21:42 dammit brandon
20:21:44 Ok, so you are talking about traditional uptime
20:22:01 Sure, mostly because I didn't know another way to have that defined :D
20:22:16 they're still taking up resources and it's up to the cloud company to be proactive about refunds for the customer
20:22:18 I *think* haproxy has that information somewhere, right?
20:22:49 TrevorV I don't think so. If it's down it's not going to tell you much
20:23:12 johnsom sure, but does it have a stat that's something like "I've been active this long" or something?
20:23:13 We could also bill time and a half for LBs with no members attached.
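
The bytes in/out and connection counts mentioned above are the kind of counters haproxy exposes over its stats socket, which replies to a "show stat" command with a CSV dump. The sketch below shows one way such frontend counters could be read and mapped onto the fields of the listener-statistics API; the socket path and the field mapping are assumptions for illustration, not Octavia's actual amphora-side collection code.

    import csv
    import socket

    # Assumed socket path for illustration; the real per-listener path in
    # the amphora differs.
    STATS_SOCKET = '/var/lib/octavia/haproxy.sock'


    def get_frontend_stats(sock_path=STATS_SOCKET):
        """Return haproxy frontend counters keyed by proxy (listener) name."""
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(sock_path)
        sock.sendall(b'show stat\n')
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
        sock.close()

        # The reply is CSV whose header line starts with "# ".
        raw = b''.join(chunks).decode('utf-8').lstrip('# ')
        stats = {}
        for row in csv.DictReader(raw.splitlines()):
            if row.get('svname') != 'FRONTEND':
                continue  # only frontend rows correspond to a listener
            stats[row['pxname']] = {
                'bytes_in': int(row['bin'] or 0),
                'bytes_out': int(row['bout'] or 0),
                'active_connections': int(row['scur'] or 0),
                'total_connections': int(row['stot'] or 0),
            }
        return stats

As the discussion notes, counters gathered this way only reflect what haproxy itself sees, which is why the question of "uptime" needs a separate answer.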
20:23:23 We can calculate certain levels of downtime based on that number being reset and such
20:23:35 Yep, and the thing might be working fine in our world but some network is hosed, making it not work for the customer
20:24:05 I don't think so. Here is the list: http://cbonte.github.io/haproxy-dconv/configuration-1.5.html#9.1
20:24:06 xgerman which means we'd still bill them because it's out of our control. That's like saying "we won't bill you, mr customer, if you delete your app"
20:24:14 Again, I think you would have problems with failover, etc.
20:25:03 How do you bill a vm?
20:26:11 i think traditional uptime is a reasonable goal at first
20:26:17 xgerman I'm not sure, I'm just trying to get a feel for things, you know?
20:26:33 Ooops, left on accident
20:26:45 Never paid for mine :-)
20:26:57 So that was an interesting find, haproxy has a "downtime" value
20:27:27 I wonder if we should talk about billing the difference between instance up-time and the haproxy-reported "downtime"...
20:27:29 Hmmm
20:27:34 downtime as in all members were down
20:28:01 That's the customer's fault
20:28:06 Right
20:28:22 Oooh, I follow then
20:28:42 Then I'm not even sure how to collect that data for uptime
20:28:59 the traditional way
20:29:03 :)
20:29:05 Well, you can always count heartbeats
20:29:17 Pingdom?
20:29:27 :-)
20:29:46 Are heartbeats specific enough to indicate an instance is errored?
20:29:56 Sorry, we probably aren't being super helpful right now.
20:29:58 heartbeats overwrite each other
20:30:00 Like, the process is "running" but it's not actually load balancing?
20:30:11 so you wouldn't be able to see a history of heartbeats
20:30:14 except through logs maybe
20:30:21 It checks on the stats socket
20:30:47 Counting heartbeats would mean work
20:30:47 Yeah, heartbeats check stats and make sure the listener count matches what's expected.
20:30:57 xgerman so you're saying that in getting stats, haproxy is load balancing? There will never be a time that haproxy will report stats when it's not actually load balancing?
20:31:11 if the network in front of it is down
20:31:15 the vip network
20:31:22 or something north of that
20:31:23 +1
20:31:31 blogan is that in our control or no?
20:31:38 yes and no
20:32:02 ?
20:32:05 octavia's? no; company infrastructure? yes
20:32:22 In other words, they should not be billed for it.
20:33:05 i think that's better handled by auditing after an event like that happens
20:33:29 So, in my opinion, we (maybe just Rackspace) will need a mechanism to capture/calculate up-time and have it reported somehow to the CW to put it in the DB
20:33:32 Right now, for health monitoring, we consider it healthy if we get stats back and the listeners are all present.
20:33:33 https://github.com/openstack/octavia/blob/master/octavia/controller/healthmanager/update_db.py#L66
20:34:29 Right johnsom but we just identified a scenario where that's "inaccurate"
20:34:34 When the VIP network is having issues
20:34:35 TrevorV: the simple calculation of uptime by when the lb was created doesn't satisfy that?
20:34:44 blogan that's overbilling
20:34:58 TrevorV: that's subjective
20:35:04 Makes more money...
20:35:10 Yep, and Jorge would say we can't do that
20:35:31 Him!
20:35:32 TrevorV: would he?
20:35:35 blogan I can say that billing off "amphora" uptime wouldn't necessarily be a problem
20:35:49 I think it is really a policy thing to define your SLA and how it's measured.
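
As a rough illustration of the "count heartbeats" idea floated above: if per-amphora heartbeat counts were persisted (which, as noted in the discussion, the health manager does not do today since heartbeats overwrite each other), an uptime fraction could be estimated by comparing received heartbeats against the number expected for a billing window. The 10-second interval and the persisted counts are assumptions for the sketch, not current Octavia behavior.

    def uptime_from_heartbeats(received, window_seconds, heartbeat_interval=10):
        """Estimate the fraction of a window an amphora was reporting healthy.

        Assumes heartbeats arrive at a fixed interval and that missed
        heartbeats mean the amphora (or the path to it) was down.
        """
        expected = window_seconds / float(heartbeat_interval)
        if expected <= 0:
            return 0.0
        return min(received / expected, 1.0)


    # e.g. 8100 heartbeats received over 24 hours at a 10-second interval
    print(uptime_from_heartbeats(8100, 24 * 3600))  # -> 0.9375 (~94% uptime)

This only measures whether the amphora was reachable and reporting stats with the expected listeners; as pointed out above, a hosed VIP network would still look "up" by this measure.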
20:35:57 johnsom: agreed
20:36:01 Since we capture bandwidth and (other thing I can't remember) already, we'll still be billing for resources consumed
20:36:03 TrevorV: let's talk about it offline
20:36:05 with jorge
20:36:10 bc i think that's a lofty requirement
20:36:18 kk blogan
20:36:43 Looks like we're tired of the discussion, gents, but knowing that we already store those other 2 fields in the DB, we (at rackspace) should be able to move forward when we're ready, thanks guys!
20:37:13 TrevorV: we store it by amp, still need an aggregator
20:37:28 Yea, we discussed a good portion of this yesterday
20:37:32 blogan I know, but we can make that happen in the "driver" we write for it
20:37:40 which maybe the metering service handles, i don't know
20:37:47 ah okay
20:37:48 ptoohill my objective was to identify what parts "weren't" done yet.
20:37:50 Yeah, to a point I think as long as it's provisioned, it's billed, and it's our responsibility to keep it healthy within the SLA. I.e. Act/Stndby, etc.
20:38:09 man, i already feel out of the loop with internal workings
20:38:20 Yeah, I agree johnsom, I was just looking for the easiest way to get that information
20:38:31 Make things easier, etc etc
20:38:36 For the billing peeps I mean
20:38:41 Alright, I'm good.
20:39:05 Ok. Any other topics for this week?
20:39:47 Ok, thanks folks!
20:39:52 :D
20:39:53 #endmeeting
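
On the point above that statistics are stored per amphora and still need an aggregator before a billing consumer can use them, the aggregation step might look like the following sketch; the row layout and field names are assumptions for illustration, not Octavia's actual schema.

    from collections import defaultdict


    def aggregate_listener_stats(amphora_rows):
        """Sum per-amphora counters into per-listener totals for billing."""
        totals = defaultdict(lambda: {'bytes_in': 0, 'bytes_out': 0,
                                      'total_connections': 0})
        for row in amphora_rows:
            agg = totals[row['listener_id']]
            agg['bytes_in'] += row['bytes_in']
            agg['bytes_out'] += row['bytes_out']
            agg['total_connections'] += row['total_connections']
        return dict(totals)

Whether this lives in a billing "driver", as suggested above, or in an external metering service is left open by the discussion.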