17:02:04 <vinod1> #startmeeting Designate
17:02:04 <openstack> Meeting started Wed Sep 10 17:02:04 2014 UTC and is due to finish in 60 minutes.  The chair is vinod1. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:02:05 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:02:08 <openstack> The meeting name has been set to 'designate'
17:02:14 <vinod1> Who's around?
17:02:19 <Kiall> o/
17:02:20 <rjrjr_> o/
17:02:33 <Kiall> mugsie is AFK today
17:02:54 <vinod1> Emmanuel and others are busy with some other issue here
17:02:56 <Kiall> Nobody else here today?
17:03:00 <vinod1> So thin attendance today
17:03:09 <Kiall> Yep.. Oh well :)
17:03:13 <vinod1> #topic Action Items from last week
17:03:44 <vinod1> kiall to do client release today
17:03:53 <Kiall> Done - 1.1.0 is released.. https://pypi.python.org/pypi/python-designateclient/
17:04:12 <vinod1> cool
17:04:17 <vinod1> Kiall to discuss FF exceptions with Thierry during 1:1 tomorrow.
17:04:45 <Kiall> Also done, as an incubated rather than integrated project, it's up to the core team to make FFE decisions.
17:04:59 <Kiall> The process for "documenting" that is to mark the BP as being for juno-rc1
17:06:28 <vinod1> #topic Release Status (kiall - recurring)
17:06:41 <Kiall> Okay, so j3 is out the door - woo - https://launchpad.net/designate/+milestone/juno-3
17:06:51 <rjrjr_> congratulations!
17:06:59 <Kiall> and juno-rc1 bugs/bp's are being tracked here https://launchpad.net/designate/+milestone/juno-rc1
17:07:21 <Kiall> bug 1366821 is a pretty big one, hopefully the current review solves it
17:07:22 <uvirtbot> Launchpad bug 1366821 in designate "Backends don't implement create/update/delete_recordset" [Critical,In progress] https://launchpad.net/bugs/1366821
17:07:45 <Kiall> Beyond that - I don't think we have much else to discuss on rc1
17:07:58 <Kiall> Other than - rc1 is Sept 25th
17:08:02 <vinod1> How about TSIG?
17:08:04 <Kiall> 2 weeks
17:08:23 <Kiall> vinod1: well, dnspython released the fix, so in theory we can try to implement it as an FFE
17:08:45 <vinod1> just wanted to check if we want that as an FFE or move it to kilo?
17:09:01 <Kiall> But - getting the openstack/requirements change in to get the version we need will be harder - since dependencies are frozen too ;)
17:09:17 <vinod1> so move it to kilo then
17:09:33 <Kiall> I'll see about getting the o/r change in, and if it lands, we'll move it from kilo->rc1?
17:09:50 <vinod1> how about transfer zones? are we still targeting it as an FFE?
17:10:31 <Kiall> Yes, mugsie was about 80% through the rebase yesterday, I suspect he'll have it done in the next few days
17:10:45 <vinod1> ok
17:10:49 <vinod1> moving on
17:10:52 <Kiall> #action kiall to attempt an o/r change for dnspython
17:11:21 <vinod1> i will switch the order a bit to utilize Kiall's time here
17:11:28 <vinod1> #topic Server Pools Implementation Order
17:11:42 <vinod1> #link https://wiki.openstack.org/wiki/Designate/SubTeams/Pools#Server_Pools_Implementation_Order
17:11:59 <vinod1> I wrote up an initial implementation order of the work items for the first pass of server pools
17:12:01 <Kiall> I was going to suggest leaving that till next week when mugsie is about, I've not personally put much thought into it
17:12:16 <rjrjr_> +1
17:12:22 <vinod1> Okay
17:12:31 <vinod1> The remaining 2 items too are about server pools
17:12:41 <vinod1> #topic Server Pools - some questions clarifications
17:12:50 <vinod1> currently we have status values of pending, active, deleted - should we have a value for error? How long can a change be in pending? Do we need to track pending_since?
17:13:20 <rjrjr_> my thinking is we don't need error or any other status.
17:13:42 <Kiall> rjrjr_: so, you think status should go entirely from the API?
17:13:55 <rjrjr_> sorry, no new status.
17:14:11 <vinod1> This is the status in the database tables and communicating the status to the user
17:14:24 <rjrjr_> pending, active, deleted cover everything IMHO.
17:14:33 <Kiall> So.. I think we should - there are too many ways for things to fail, with an async request, how do we report failure to the user without "error"?
17:15:48 <Kiall> No other thoughts?
17:15:49 <vinod1> I agree
17:16:15 <Kiall> I'd agree with rjrjr_ partly though - I wouldn't want to see 1000's of statuses
17:16:19 <timsim> Pending, Active, Deleted, Error seem fine to me.
17:16:33 <rjrjr_> what does error report exactly?
17:16:48 <timsim> That something has gone wrong?
17:16:53 <timsim> Backend failure or something like that.
17:16:58 <rjrjr_> one server failed to get updated?
17:17:08 <vinod1> Something that the user has no direct control over
17:17:09 <rjrjr_> the threshold failed to get updated?
17:17:11 <timsim> If that's your threshold after a certain time.
17:17:38 <Kiall> rjrjr_: It reports that "something" exploded after your initial API call responded, that something is anything that we can't recover from automatically..
17:18:03 <Kiall> Usually that means errors we didn't think would happen, so didn't code around
17:18:31 <vinod1> Does the status move from ERROR to ACTIVE?
17:19:28 <rjrjr_> i hate being the odd man out here, but we have the server pool actively retrying things. my problem is i would like the communication between mini-dns and pool manager to be less, not more.
17:19:29 <Kiall> I think ERROR should only ever show when we have no way to auto-correct, which means it wouldn't automatically go from ERROR -> ACTIVE, but an admin might "fix" whatever the issue was and reset the state.. cinder/nova/etc have similar concepts
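[Editor's note: a minimal sketch of the status state machine Kiall describes above. This is illustrative, not actual Designate code - the map and function names are assumptions. The key property is that ERROR never transitions to ACTIVE automatically; only an explicit admin reset moves it back to PENDING, mirroring the cinder/nova convention.]

```python
# Hypothetical status state machine for Designate resources (illustrative
# names, not the real implementation). ERROR has no automatic exits.
ALLOWED = {
    "PENDING": {"ACTIVE", "DELETED", "ERROR"},
    "ACTIVE": {"PENDING", "DELETED", "ERROR"},
    "ERROR": set(),    # no automatic recovery from ERROR
    "DELETED": set(),  # terminal
}


def transition(current, new):
    """Return the new status, or raise if the move is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current} -> {new}")
    return new


def admin_reset(current):
    """An operator fixes the underlying issue and resets ERROR to PENDING."""
    if current != "ERROR":
        raise ValueError("only ERROR can be reset by an admin")
    return "PENDING"
```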
17:20:05 <timsim> rjrjr_: If it has to communicate active, what is communicating error adding?
17:20:20 <Kiall> rjrjr_: I actually think this isn't adding any more communication - it's what we do when, for example, an unhandled exception occurs
17:20:43 <rjrjr_> more calls.  i wanted to propose when mini-dns cannot do something (errors) to just not report that to the pool manager.
17:21:07 <rjrjr_> we can look at the timestamps of the last successful polls to determine if things are okay or not.
17:21:32 <rjrjr_> this gets into the whole pool manager design in the spec.
17:21:43 <vinod1> failing silently seems risky to me
17:22:06 <rjrjr_> it's not silent. pool manager is keeping track of the date of successful polls.
17:22:34 <Kiall> rjrjr_: so, I agree that mdns/poolmgr needs to decide how it handles errors - but this status isn't really mDNS or even poolmgr specific :)
17:22:45 <rjrjr_> also, i'm hoping you are showing problems in the mini-dns logs.
17:23:30 <rjrjr_> kiall, i understand.
17:24:15 <Kiall> For example, creating a new domain once multiple pools exist, the domain goes to PENDING and we return the user.. In the background, the future scheduler starts trying to find a suitable pool - if no suitable pool is found - we should have a way to return "That Failed" to the user..  similar to what happens when you boot a VM and have no capacity remaining ;)
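[Editor's note: a sketch of the "no suitable pool" failure mode Kiall gives as an example. The scheduler did not exist yet at the time of this meeting, so everything here - function name, pool attributes, capacity check - is a hypothetical illustration of why an ERROR status is needed, not Designate's real scheduler.]

```python
# Hypothetical scheduler pass: place a PENDING domain on the first pool
# with spare capacity; if none fits, the domain moves to ERROR rather
# than sitting in PENDING forever.
def schedule(domain, pools):
    """Assign a pool to the domain, or mark it ERROR when none is suitable."""
    for pool in pools:
        if pool["capacity"] > pool["domains"]:
            domain["pool_id"] = pool["id"]
            domain["status"] = "ACTIVE"
            return domain
    # "That Failed" - surfaced to the user on their next GET
    domain["status"] = "ERROR"
    return domain
```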
17:24:53 <rjrjr_> okay.  but we do not have a scheduler right now.
17:25:08 <Kiall> Sure - There's lots of possible things that might be a trigger for moving a resource to ERROR, mdns may or may not be that thing
17:26:13 <Kiall> if mDNS doesn't feed back out to the status field on error, that might be OK, but I think we have plenty of other places for stuff to explode and need a reporting mechanism the moment we switch to async
17:26:50 <rjrjr_> here's the problem, unless we are tracking each and every update with a request ID of some sort, it will be hard to report an error to the user.
17:27:17 <rjrjr_> and if we are tracking each update with a request ID, our pool manager database grows exponentially.
17:27:49 <rjrjr_> i have a design that keeps the pool manager database (table) as small as possible for performance reasons.
17:27:51 <Kiall> Yea, I see the concern :) It easy-ish to reason about what an ERROR state is when something like domain creation fails, but harder to come up with good examples for things like RRSet modifications etc
17:28:20 <rjrjr_> i have no problem with Error for domain create/failure.
17:29:01 <Kiall> (I've gotta run in 5mins)
17:29:36 <Kiall> So - Anyway, I think we should come back to this one next week with everyone around - and some concrete examples of what might trigger an ERROR status etc
17:30:01 <rjrjr_> i want to get rid of the status when we are polling for serial number and just have successful serial numbers reported back to pool manager.
17:30:42 <rjrjr_> the status for updates can't be used anyway.
17:30:51 <rjrjr_> agree about waiting until next week.
17:30:58 <Kiall> :)
17:31:06 <vinod1> okay will come back to it next week
17:31:12 <Kiall> Moving very quickly on :D When a recordset is deleted - do we show it to the user in the api? We track when it moves from pending to deleted. Do we give this information back to the user?
17:31:15 <rjrjr_> maybe we can brainstorm on error reasons and have a point to start the discussion next week.
17:31:44 <rjrjr_> kiall, only if the user queries designate again.
17:31:47 <Kiall> No, I don't believe we should ever show normal users deleted resources... But, an admin might want to see them
17:32:00 <rjrjr_> i'm thinking very similar to what nova does.
17:32:07 <rjrjr_> when a VM is created/deleted.
17:32:16 <Kiall> rjrjr_: yep - agreed
17:32:30 <vinod1> My point here is when a domain is deleted, do we track whether it was removed from the nameservers?
17:32:39 <rjrjr_> yes.
17:32:50 <Kiall> We certainly should anyway :)
17:33:18 <rjrjr_> how about i write up something and the team can add to it.
17:33:26 <Kiall> rjrjr_: sure, sounds good
17:33:28 <rjrjr_> for where errors can occur.
17:33:30 <vinod1> If we track the information, why not show it to the user?
17:33:45 <rjrjr_> vinod, we will, if they query again.
17:34:35 <vinod1> #topic Open Discussion
17:34:37 <rjrjr_> so, the user deletes a record.  async call.  they are done.  (think 'nova boot').  then they run a query to see if the record has been deleted yet or not.
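[Editor's note: a sketch of the client-side pattern rjrjr_ describes - an async delete followed by the caller polling until the record is gone, as with 'nova boot'. The function and its parameters are illustrative assumptions, not python-designateclient's real API.]

```python
import time


def wait_for_delete(get_record, record_id, timeout=30.0, interval=1.0):
    """Poll until the record is DELETED (or gone); return True on success.

    get_record(record_id) should return the record dict, or None once the
    API no longer shows it. Returns False if the timeout elapses first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        record = get_record(record_id)
        if record is None or record["status"] == "DELETED":
            return True
        time.sleep(interval)
    return False
```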
17:35:01 <Kiall> Okay - really gotta go :) Sorry for needing to bail early!
17:35:08 <rjrjr_> sure.  l8r
17:35:36 <vinod1> #action rjrjr_: Write up on handling error status in server pools
17:36:12 <vinod1> okay if there is nothing else then we can end the meeting
17:36:37 <vinod1> #endmeeting