20:00:15 <hub_cap> #startmeeting trove
20:00:16 <openstack> Meeting started Wed Jul 17 20:00:15 2013 UTC.  The chair is hub_cap. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:17 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:19 <openstack> The meeting name has been set to 'trove'
20:00:21 <vipul> o/
20:00:22 <djohnstone> o/
20:00:23 <hub_cap> #link https://wiki.openstack.org/wiki/Meetings/TroveMeeting
20:00:28 <datsun180b> o7
20:00:35 <hub_cap> &o
20:00:42 <juice> o/
20:00:58 <kevinconway> \o/
20:01:12 <hub_cap> crap i put a bad link on the wiki :p
20:01:22 <hub_cap> #link http://eavesdrop.openstack.org/meetings/trove/2013/trove.2013-07-10-20.00.html
20:01:30 <hub_cap> #topic action items
20:01:49 <hub_cap> not many AI's. SlickNik is not around?
20:01:58 <hub_cap> vipul: get a chance to do any more wikifying?
20:02:11 <vipul> hub_cap: No, didnt spend any time on this one
20:02:17 <vipul> likely an ongoing thing
20:02:28 <SlickNik> o/
20:02:31 <grapex> o/
20:02:31 <hub_cap> kk, lets action item it again
20:02:32 <imsplitbit> o/
20:02:40 <vipul> #action Vipul to continue to update reddwarf -> trove
20:02:40 <hub_cap> SlickNik: hey, yer up. initial stab @ dev docs
20:02:40 <pdmars> o/
20:02:42 <hub_cap> i saw something
20:02:47 <hub_cap> can u link the review?
20:02:57 <SlickNik> yeah, one sec.
20:03:20 <SlickNik> #link https://review.openstack.org/#/c/37379/
20:03:31 <hub_cap> SWEET
20:03:40 <hub_cap> good work.
20:03:46 <hub_cap> anything else to add wrt the action items?
20:03:57 <SlickNik> I've taken the initial info from the wiki and the trove-integration README.
20:04:14 <grapex> SlickNik: Nice!
20:04:36 <SlickNik> Once that's approved, I can turn on the CI-doc job that builds it.
20:04:40 <hub_cap> :)
20:04:40 <vipul> thanks SlickNik
20:04:49 <hub_cap> lets get that done then!!! ;)
20:04:55 <SlickNik> And then I need to contact annegentle to add the link to the openstack site.
20:05:13 <hub_cap> okey moving on then?
20:05:16 <SlickNik> yup.
20:05:23 <hub_cap> #topic h2 milestone released
20:05:27 <hub_cap> #link https://github.com/openstack/trove/tree/milestone-proposed
20:05:28 <hub_cap> WOO
20:05:33 <datsun180b> WOO
20:05:34 <hub_cap> they will cut it i think, thursday?
20:05:39 <SlickNik> w00t!
20:05:44 <konetzed> \o/
20:05:51 <hub_cap> #lnk http://tarballs.openstack.org/trove/
20:05:53 <hub_cap> doh
20:05:55 <hub_cap> #link http://tarballs.openstack.org/trove/
20:05:58 <hub_cap> there we are
20:06:04 <vipul> woah look at that
20:06:18 <datsun180b> Did you see all those issues marked as Released by Thierry C?
20:06:25 <hub_cap> yes i did
20:06:26 <SlickNik> yup :)
20:06:28 <hub_cap> cuz i get ALL of them ;)
20:06:38 <hub_cap> we can move critical bugs back to h2 if we need to
20:06:41 <hub_cap> i suspect we wont
20:06:49 <hub_cap> since no one is really gonna deploy it
20:07:01 <hub_cap> its more just to get us understanding how things work around here
20:07:06 <SlickNik> I don't know of any critical bugs, atm.
20:07:10 <hub_cap> Aight enough w/ the glass clinking, time to move on
20:07:20 <hub_cap> feel free to view the links
20:07:26 <hub_cap> #link https://wiki.openstack.org/wiki/GerritJenkinsGithub#Authoring_Changes_for_milestone-proposed
20:07:31 <hub_cap> #link https://wiki.openstack.org/wiki/PTLguide#Backporting_fixes_to_milestone-proposed_.28Wednesday.2FThursday.29
20:07:35 <hub_cap> if u want to know more about the process
20:07:46 <hub_cap> #topic Restart mysql
20:07:55 <hub_cap> doh forgot the word test
20:07:57 <hub_cap> #link https://github.com/openstack/trove/blob/master/trove/tests/api/instances_actions.py#L256-262
20:08:04 <hub_cap> lets spend a bit of time discussing the validity of this
20:08:10 <hub_cap> and then spend the rest of the time on replication
20:08:30 <hub_cap> SlickNik: all u
20:08:48 <SlickNik> So, I agree with grapex that we need a test to validate what the guest agent behavior is when mysql is down.
20:09:14 <SlickNik> But I think that that's exactly what the mysql stop tests are doing.
20:09:37 <hub_cap> link?
20:09:41 <vipul> #link https://github.com/openstack/trove/blob/master/trove/tests/api/instances_actions.py#L320-L324
20:10:11 <grapex> SlickNik: The only major difference is that it explicitly tells the guest to stop MySQL, versus letting the status thread do its thing
20:10:40 <hub_cap> as in, we are testing that the periodic task does its job?
20:10:44 <vipul> right but isn't the status thread still the thing that's updating status
20:10:51 <vipul> it's just a different way of stopping mysql
20:11:02 <vipul> one is explicit, the other is by messing up logfiles
20:11:20 <grapex> vipul: True, but the stop rpc call also updates the DB when its finished
20:11:32 <datsun180b> and that ib_logfile behavior is very deliberately for mysql, right?
20:12:05 <grapex> SlickNik: Can you give another summary of the issue the test itself is having?
20:12:23 <grapex> Isn't it that MySQL actually can't start up again when the test tries to restart it?
20:12:36 <SlickNik> grapex: when we corrupt the logfiles, mysql doesn't come up.
20:12:48 <SlickNik> the upstart scripts keep trying to respawn mysql since it can't come up.
20:13:01 <grapex> SlickNik: Does the reference guest not delete those iblogfiles?
20:13:17 <vipul> i think the tests do
20:13:30 <datsun180b> that sounds right
20:13:33 <hub_cap> correct
20:13:45 <SlickNik> grapex: not delete; but mess up so that they are zeroed out.
20:13:56 <hub_cap> so the difference is
20:14:06 <hub_cap> 1 test stops mysql, the other kills it behind the scenes
20:14:07 <SlickNik> Now since upstart is trying to bring mysql up, it has a lock on the logfiles.
20:14:16 <hub_cap> the latter test waits for the periodic task to signal its broken
20:14:23 <hub_cap> the former test updates the db as part of the stop
20:14:24 <hub_cap> ya?
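A minimal sketch of what "kills it behind the scenes" amounts to, assuming the test simply zeroes the InnoDB redo logs in the guest's datadir. The path, the file names, and running this directly from Python (rather than as root inside the guest) are assumptions for illustration; the real logic lives in the instances_actions.py test linked above.

    import os

    DATADIR = "/var/lib/mysql"   # assumed default datadir; illustrative only

    def zero_out_ib_logfiles():
        """Truncate the InnoDB redo logs so mysqld cannot start cleanly;
        upstart then keeps respawning it and holds the files open."""
        for name in ("ib_logfile0", "ib_logfile1"):
            path = os.path.join(DATADIR, name)
            open(path, "w").close()   # zero-length file == "messed up" log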
20:14:24 <grapex> So Sneaky Pete actually wipes the ib logfiles. Maybe that's something the reference guest should do?
20:14:41 <grapex> It does it as part of the restart command
20:14:57 <hub_cap> lets first try to figure out if the tests are truly different
20:15:10 <hub_cap> and then once we agree it needs to stay (if it does) we can think about solutions
20:15:11 <grapex> Well that one also makes sure the iblogfiles are wiped
20:15:30 <vipul> grapex: won't that mean mysql can start again?
20:15:40 <grapex> vipul: Yes.
20:15:43 <SlickNik> So there's also the point that this test takes about ~4-5 mins.
20:15:55 <vipul> then this test will fail, because the test expects that it cannot start
20:16:50 <SlickNik> So one question is that do we think that this 1 scenario (which isn't all that different from the stop tests) warrants an extra addition of ~4-5 minutes on every test run?
20:17:06 <hub_cap> if it tests something different i think its warranted
20:17:09 <SlickNik> (in parens) = my opinion
20:17:26 <hub_cap> is exactly the same != isnt all that different
20:17:29 <hub_cap> are they testing different things?
20:17:35 <grapex> I'm sorry, I misspoke about wiping the iblogfiles - that happens on resizes and other cases, not for restart
20:17:36 <hub_cap> thats what i want us to agree upon
20:17:54 <SlickNik> well, in either case we are testing for a broken connection.
20:18:00 <hub_cap> are we?
20:18:06 <SlickNik> And mysql not running is causing the broken connection.
20:18:07 <grapex> SlickNik: I disagree
20:18:14 <hub_cap> i thought the screw_up_mysql test checks that the periodic task updates the db properly
20:18:24 <hub_cap> and the explicit stop test checks that the stop updates the db synchronously
20:18:25 <grapex> I think also whether a test has value is a different question from whether we want to run it every single time as part of CI if it's impeding people
20:18:28 <hub_cap> is that not the case?
20:18:41 <SlickNik> grapex: what code paths do the restart tests hit that the resize tests don't also hit?
20:19:03 <grapex> restart truly makes sure the status thread sees MySQL die and updates appropriately
20:19:16 <vipul> so the stop_db code seems to set that state = None
20:19:19 <vipul> self.instance.update_db(task_status=inst_models.InstanceTasks.NONE)
20:19:22 <hub_cap> correct
20:19:24 <grapex> stop is actually stopping it, so it updates the database as part of that RPC code path, not the thread
20:19:27 <vipul> Which means the status thread will set it to shutdown
20:19:55 <hub_cap> sure but it does that based on different circumstances vipul
20:20:10 <hub_cap> 1) it checks the task is NONE vs 2) it cant talk to mysql, right?
20:20:34 <vipul> it checks the status is shutdown and can't talk to mysql
20:20:52 <hub_cap> ok
20:20:59 <hub_cap> does the other test update the task to none?
20:21:13 <vipul> restart also sets it to None
20:21:15 <grapex> Also keep in mind, the Sneaky Pete guest actually sets the status to stop as part of that RPC call. If you're saying the reference guest doesn't, I'm not sure why it wouldn't
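A rough sketch of the two code paths being compared. The FakeMySQL/FakeInstance stand-ins and the string statuses are invented for illustration; only stop_db, update_db(task_status=InstanceTasks.NONE), and the status-thread transition to SHUTDOWN come from the exchange above.

    # Illustrative stand-ins only; the real logic lives in trove's guestagent
    # and its periodic status thread.

    class FakeMySQL(object):
        def __init__(self):
            self.running = True

        def stop(self):
            self.running = False

        def is_reachable(self):
            return self.running

    class FakeInstance(object):
        def __init__(self):
            self.task_status = "BUILDING"    # stands in for InstanceTasks.*
            self.service_status = "RUNNING"

        def update_db(self, task_status):
            # Synchronous DB update, as in the stop_db RPC path discussed above.
            self.task_status = task_status

    def stop_db(instance, mysql):
        """Path exercised by the 'stop' test: the guest is told to stop MySQL
        and updates the DB as part of handling the RPC call."""
        mysql.stop()
        instance.update_db(task_status="NONE")

    def periodic_status_check(instance, mysql):
        """Path exercised by the 'unsuccessful restart' test: MySQL dies behind
        the scenes (corrupted ib_logfiles) and only the status thread notices."""
        if instance.task_status == "NONE" and not mysql.is_reachable():
            instance.service_status = "SHUTDOWN"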
20:21:44 <hub_cap> ok weve got 4 more min on this and im gonna call it for now
20:21:46 <hub_cap> as undecided
20:21:51 <hub_cap> id like to discuss replication
20:22:00 <imsplitbit> +1 on that
20:22:07 <hub_cap> lol imsplitbit
20:22:12 <grapex> Well, I want to suggest something
20:22:18 <hub_cap> sure
20:22:21 <hub_cap> youve got a few min
20:22:23 <KennethWilke> hub_cap: i will gladly accept a gist link of the chat
20:22:25 <hub_cap> go!
20:22:25 <grapex> If this test is really being a bother, lets just take it out of the "blackbox" group but keep it in the code.
20:22:46 <hub_cap> KennethWilke: its logged, you can see it on http://eavesdrop.openstack.org/meetings/trove/
20:22:48 <grapex> We run it at Rackspace all the time and I find it useful. It could still be run nightly or something.
20:22:52 <KennethWilke> hub_cap: ty
20:23:08 <SlickNik> grapex: I'd totally be fine with that.
20:23:09 <grapex> (nightly for the completely public Ubuntu / KVM / Reference guest code)
20:23:27 <hub_cap> ya but id argue we shouldnt remove it till we have the separate nightly tests in place
20:23:32 <grapex> SlickNik: Maybe the solution is to create a second group called "nightly" which just has more test groups added to it
20:23:35 <grapex> hub_cap: Seconded.
20:23:36 <hub_cap> if it in fact does test something different
20:23:45 <juice> +2
20:24:16 <grapex> hub_cap: +1
20:24:17 <cp16net> +1
20:24:18 <datsun180b> i vote keep it, even if it means moving it
20:24:26 <hub_cap> which im still not sure it _does_ test something different at this point
20:24:32 <hub_cap> but lets move on
20:24:39 <hub_cap> i think we have a reasonable consensus to keep it but move it
20:24:54 <hub_cap> i don't want your goddamn lettuce
20:25:08 <hub_cap> moving on?
20:25:09 <grapex> hub_cap: Are you talking to a rabbit?
20:25:09 <vipul> need some more research to verify that it is indeed different
20:25:12 <vipul> can we just action it?
20:25:13 <hub_cap> yes vipul
20:25:16 <hub_cap> go head
20:25:40 <vipul> #action SlickNik, vipul to compare Stop test and Unsuccessful Restart tests to identify differences
20:25:44 <hub_cap> grapex: no. google it
20:25:53 <hub_cap> ok. repl time
20:25:59 <hub_cap> #replication :o
20:26:01 <SlickNik> hub_cap / grapex: I'll move it for now so that we don't keep hitting it on rdjenkins. I'll also look to see if we can fix the test so we don't run into the upstart issue (also the research that vipul actioned).
20:26:02 <hub_cap> lol
20:26:12 <hub_cap> #topic replication :o
20:26:21 <imsplitbit> let me relink
20:26:21 <hub_cap> +1 SlickNik cuz we will have to deal w/ fedora soon too ;)
20:26:25 <hub_cap> plz do
20:26:32 <imsplitbit> #link https://wiki.openstack.org/wiki/Trove-Replication-And-Clustering-API
20:26:41 <imsplitbit> #link https://wiki.openstack.org/wiki/Trove-Replication-And-Clustering-API-Using-Instances
20:26:49 <imsplitbit> hub_cap: go!
20:26:56 <SlickNik> thanks guys, on to replication!
20:27:08 <hub_cap> #all i wanted was a cheeseburger
20:27:15 <hub_cap> ok SO
20:27:22 <hub_cap> weve gone back and forth on this topic for a while now
20:27:29 <SlickNik> #all I got was a lousy T-shirt?
20:27:35 <hub_cap> lol
20:27:47 <hub_cap> 2 schools of thought: /instances has ALL instances, with some error conditions on things like resize
20:28:06 <hub_cap> or /instances and /clusters, and things move from /instances to /clusters when they get promoted
20:28:16 <hub_cap> and cluster nodes are never a part of /instances
20:28:18 <vipul> no demorris :) we can decide w/o him
20:28:23 <hub_cap> HA nice
20:28:29 <imsplitbit> he should be here
20:28:31 <demorris> vipul: u wish
20:28:34 <imsplitbit> just to stir the pot
20:28:36 <imsplitbit> :)
20:28:36 <vipul> boo
20:28:38 <hub_cap> NICE
20:28:40 <SlickNik> lol, he's lurking.
20:28:45 <demorris> always
20:29:07 <hub_cap> ok i was of the opinion that when we promote to /clusters we move the instance there
20:29:12 <hub_cap> so u cant do things u shouldnt do on it
20:29:17 <hub_cap> as in, if its a slave, u shouldnt add a user
20:29:23 <hub_cap> or u shouldnt delete a master
20:29:42 <hub_cap> but after thinking about it for a while, there arent a LOT of failure cases for modifying /instances
20:29:50 <hub_cap> the only one i can think of is deleting a master
20:30:18 <hub_cap> u should be able to add a RO user to a slave, u should be able to resize a slave to something that might not be ok for the cluster
20:30:26 <hub_cap> the permutations for what u shouldnt be able to do are NOT small
20:30:35 <hub_cap> and are different for different cases of a cluster
20:30:41 <hub_cap> and different types of clusters
20:30:55 <hub_cap> hell they are probably close to infinite given different circumstances
20:31:13 <hub_cap> so id rather keep things in /instances, and just limit very few cases for modifying an instance in a cluster
20:31:33 <hub_cap> if we find something that should _never_ be done, then so be it, we add it as a failure case in /instances
20:31:42 <vipul> so.. would /cluster have the same set of operations that /instances has (create user, add db, etc)
20:31:47 <hub_cap> no
20:31:55 <hub_cap> it would be a helper for doing things to an entire cluster
20:31:58 <hub_cap> and thats it
20:32:11 <imsplitbit> create/resize/delete
20:32:12 <hub_cap> add db/create user
20:32:18 <hub_cap> we cant really define how a user will use a slave
20:32:29 <vipul> but they may not always be slaves right
20:32:31 <hub_cap> i had an extra db on slaves on some of my setups w/ urchin a while ago
20:32:38 <hub_cap> and different users
20:32:56 <vipul> you may have a galera cluster.. where the users / schemas will all be replicated across
20:33:00 <vipul> no matter which one you write to
20:33:02 <hub_cap> yes
20:33:09 <hub_cap> so given that case it doesnt matter where u write it
20:33:15 <hub_cap> so then we cant restrict it
20:33:20 <vipul> so why not write it to /cluster.. why do they have to pick one
20:33:22 <imsplitbit> or shouldn't
20:33:23 <hub_cap> there is no "master master" there
20:33:34 <hub_cap> because i want to add a RO user to slave 1
20:33:36 <hub_cap> how do i do that
20:33:40 <imsplitbit> vipul: I think there is a good case to add some helper things to /cluster
20:33:55 <imsplitbit> but it isn't needed to implement a cluster and support it
20:33:58 <SlickNik> So what's the ultimate reason for not doing this on the cluster but doing it on the individual instances?
20:34:08 <SlickNik> Duplication of code?
20:34:14 <hub_cap> duplication of schema
20:34:20 <hub_cap> and complication to the end user
20:34:35 <hub_cap> 1/2 my instances in one /path and 1/2 in /another seems very unintuitive
20:34:43 <imsplitbit> agreed
20:34:48 <hub_cap> i have a bunch of /instances, period
20:34:54 <hub_cap> at the end of the day thats what they are anyway
20:35:09 <hub_cap> and vipul im not trying to define what we can and cant do on /clusters
20:35:16 <hub_cap> im tryin to get consensus on where /instances live
20:35:19 <vipul> It seems like as we do auto-failover, etc.. we'd want to abstract the actual 'type' of instance away from the user.. so the user only sees a db as a single endpoint
20:35:47 <vipul> in a cluster.. you could see a single endpoint that's load balanced also
20:35:48 <imsplitbit> vipul: if we do that then you have to separate replication from clustering
20:35:54 <imsplitbit> because they aren't the same
20:36:00 <hub_cap> :o
20:36:03 <imsplitbit> yet they share a lot of functionality
20:36:29 <vipul> but is it that different?  if we promote a slave to a master on behalf of the user.. and spin up a new slave for them
20:36:35 <hub_cap> we will still have list /clusters
20:36:40 <hub_cap> and u can show a single endpoint
20:36:47 <vipul> from the user's perspective it doesn't matter if it's a multi-master or single master/slave
20:36:57 <hub_cap> fwiw tho not all clustering apis use a single endpoint
20:37:08 <vipul> agreed, we can't yet
20:37:09 <hub_cap> i believe tungsten uses its own internal code to determine where to write to
20:37:14 <grapex> I've got a question as the infamous No-NoSQL guy.
20:37:15 <cp16net> imsplitbit: as i am catching up on this feature those links of the API and API with instances are the 2 proposed plans we are debating?
20:37:16 <hub_cap> in its connector api
20:37:38 <imsplitbit> vipul: but you're assuming use, what if I have a db on one instance and I want to keep a spare copy of it warm on another host but also want to use that host as a db server for a completely different dataset?
20:37:40 <hub_cap> but again, if you list /clusters we can provide a single endpoint
20:37:47 <imsplitbit> cp16net:  yes
20:37:54 <imsplitbit> hub_cap: exactly
20:37:56 <hub_cap> but if you _want_ you can enact on a slave / "other master" in /instances
20:38:06 <hub_cap> im saying dont remove them from /instances
20:38:11 <imsplitbit> if the cluster type supports a single endpoint then /clusters should return that information
20:38:12 <hub_cap> we can still totally do what vipul wants in /clusters
20:38:18 <hub_cap> you are essentially paying for every /instance
20:38:21 <hub_cap> so we should show them
20:38:28 <hub_cap> even if u have auto failover
20:38:33 <hub_cap> u buy 2 or 3 or X instances
20:38:39 <hub_cap> and use one ip
20:38:52 <vipul> yea i think the instance info should be visible.. but at some point in the future.. we may have a single dns entry returned or something
20:38:55 <demorris> i would separate out billing from it though
20:38:56 <hub_cap> if i was paying for 9 instances in an auto failover cluster, id like to see them all in /instances
20:39:12 <imsplitbit> vipul: and that will be returned with the cluster ref
20:39:15 <hub_cap> demorris: there is no billing in it, just providing a point from a customer point of view
20:39:18 <imsplitbit> if applicable
20:39:24 <hub_cap> vipul: we can do that, now even if applic... grr dsal
20:39:31 <demorris> hub_cap: k
20:39:33 <hub_cap> just say what i was gonna say why dont ya
20:39:43 <hub_cap> i got a can of these baked beans too
20:39:49 <konetzed> vipul: why couldnt you create a single dns entry returned for the cluster but still have dns for each instance like it is now?
20:40:01 <hub_cap> id want that
20:40:09 <hub_cap> cuz if i had to connect to instance X to clean it up manually
20:40:12 <hub_cap> id want to be able to
20:40:19 <imsplitbit> konetzed: I would think most people would
20:40:21 <vipul> konetzed: I guess you could.. but then the customer would end up breaking if they happened to use one of the instance entries
20:40:26 <hub_cap> like auto-failover is not working, let me get on node X to promote it
20:40:38 <konetzed> vipul: you can only protect stupid so much
20:40:43 <hub_cap> HAH
20:40:43 <konetzed> :D
20:40:47 <cp16net> hah
20:40:51 <hub_cap> ya none of yall protected from me
20:40:54 <vipul> this is really a question of how much do we hide from the user, so even if they are stupid they can use it
20:40:59 <hub_cap> why u think they moved me to cali
20:41:13 <hub_cap> sure vipul and i think we could concede on some of that
20:41:20 <hub_cap> thats not set in stone
20:41:27 <konetzed> +1
20:41:29 <hub_cap> we could even rev the api a bit when we have > 1 cluster
20:41:30 <hub_cap> SHIT
20:41:32 <hub_cap> im out of power
20:41:32 <SlickNik> Okay, so I guess it depends on what we're shooting for here.
20:41:40 <imsplitbit> dude
20:41:44 <imsplitbit> hub_cap: FAIL
20:41:44 <hub_cap> sweet found a plug
20:41:45 <konetzed> vipul: i think you will find enough arguments for each way
20:41:53 <vipul> agreed
20:42:05 <imsplitbit> well there's 2 types of users right?  power users and button pushers
20:42:13 <imsplitbit> you need to find enough to facilitate both
20:42:16 <hub_cap> yes
20:42:24 <hub_cap> or provide a RBAC solution
20:42:31 <hub_cap> that allows the installer to decide
20:42:39 <SlickNik> If we're looking for a managed DB solution here that exposes a simple clustering API to the user, then I think that is probably better served by having a single endpoint for it.
20:43:04 <hub_cap> i think we are looking to provide a service that is extensible enough to do that
20:43:09 <hub_cap> _or_ allow the user access to all
20:43:10 <hub_cap> frankly
20:43:20 <hub_cap> we WILL NEVER be able to provide a fully turnkey solution
20:43:29 <hub_cap> otherwise someone else wouldve
20:43:32 <hub_cap> mysql is a tricky beast
20:43:34 <imsplitbit> SlickNik: no one is arguing against providing a single endpoint for users who want one
20:43:52 <hub_cap> we will always need to provide a way for a user or operator to get to any instance
20:43:55 <vipul> one thing to keep in mind is the more that we hide, the less the user can faak us up.. like break our ability to auto-failover
20:43:55 <SlickNik> But if we're talking about letting users do things like have an instance as part of a cluster, as well as be able to connect to the db directly, there's no way of getting away from a complex clustering API with actions spread across /instances and /clusters
20:44:07 <hub_cap> actions yes SlickNik
20:44:13 <hub_cap> but entities, no
20:44:18 <hub_cap> thats the first line of agreement
20:44:23 <hub_cap> as long as we are all on the same page there
20:44:30 <hub_cap> it makes the api closer to concrete
20:44:43 <hub_cap> im sure we can, eventually, hide instances if we want to
20:44:50 <hub_cap> shown_to_user=False
20:44:53 <hub_cap> easy as pie
20:44:59 <vipul> or at least not allow them to operate on them
20:45:02 <hub_cap> lets solve the easy solution first
20:45:07 <hub_cap> sure vipul
20:45:09 <demorris> I always go back to some of this being up to the provider / operator of Trove and separating that out from what the API supports
20:45:10 <hub_cap> managed vms
20:45:17 <konetzed> i was just going to say sounds like were going down a rabbit hole
20:45:17 <hub_cap> we need that anyway for nova
20:45:29 <demorris> why can't each cluster type have a policy that dictates what can and cannot be done to the cluster or instances themselves
20:45:34 <hub_cap> cuz they can just muck w/ them in nova if you are using their user to prov instances ;)
20:45:39 <hub_cap> yes demorris RBAC
20:45:40 <vipul> demorris: +1
20:45:47 <demorris> if my policy says, individual operations are not supported on /instances, then you don't allow it
20:45:47 <hub_cap> i said that like ~5 min ago
20:45:58 <vipul> it really is a deployment type of decision it seems
20:46:00 <esp> SlickNik: having a single endpoint might restrict users from building a system that reads from all nodes and only writes to one.
20:46:01 <hub_cap> lets just solve the easy solution first tho
20:46:05 <hub_cap> we are getting out of hand
20:46:10 <demorris> hub_cap: you know I can't follow every message in here…brain won't allow it :)
20:46:10 <hub_cap> we need to solve master/slave
20:46:15 <hub_cap> before we get to magical clustering
20:46:22 <hub_cap> demorris: transplant ;)
20:46:36 <vipul> hub_cap: is master/slave /cluster then?
20:46:39 <hub_cap> we understand the set of actions in /clusters can grow
20:46:42 <hub_cap> thats fine
20:46:44 <hub_cap> yes
20:46:50 <vipul> ok
20:46:53 <hub_cap> but both instances are avail via /instances
20:47:02 <imsplitbit> I don't like the use of the word clusters for replication because it implies too much
20:47:03 <hub_cap> and u can resize the slave down via /instances/id/resize
20:47:08 <imsplitbit> but we can't think of a better term for it
20:47:16 * hub_cap shreds imsplitbit with a suspicious knife
20:47:21 * hub_cap boxes imsplitbit with an authentic cup
20:47:21 * hub_cap slaps imsplitbit around with a tiny and bloodstained penguin
20:47:23 * hub_cap belts imsplitbit with a medium sized donkey
20:47:26 * hub_cap tortures imsplitbit with a real shelf
20:47:32 <imsplitbit> :)
20:47:40 <imsplitbit> I won't give up that fight
20:48:01 <imsplitbit> but I acknowledge that it doesn't need to be fought right now
20:48:12 <vipul> even though cluster is overloaded, it does fit even if it's master/slave
20:48:14 <vipul> imo
20:48:18 <hub_cap> does what i say make sense vipul?
20:48:22 <hub_cap> create master slave via /cluster
20:48:31 <hub_cap> resize both nodes cuz youre on oprah, /cluster/id/resize
20:48:33 <vipul> yep, makese sense
20:48:42 <hub_cap> resize indiv node cuz youre cheap /instance/id/resize
20:49:15 <hub_cap> create db on slave cuz u need a local store for some operation on an application /instance/id/db
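A hedged sketch of the calls hub_cap is walking through, written with the requests library against hypothetical endpoint names and payloads. BASE, the ids, the flavor refs, and the request bodies are illustrative assumptions, and the /clusters routes reflect the proposal at this point in the discussion rather than a finalized API.

    import requests

    BASE = "https://trove.example.com/v1.0/TENANT_ID"   # hypothetical endpoint
    HEADERS = {"X-Auth-Token": "TOKEN"}

    # create master + slave via /clusters
    requests.post(BASE + "/clusters", headers=HEADERS, json={
        "cluster": {"name": "prod", "type": "master-slave",
                    "flavorRef": "7", "volume": {"size": 10}}})

    # resize both nodes at once ("cuz youre on oprah")
    requests.post(BASE + "/clusters/CLUSTER_ID/action", headers=HEADERS,
                  json={"resize": {"flavorRef": "9"}})

    # resize an individual node ("cuz youre cheap"); it stays visible in /instances
    requests.post(BASE + "/instances/SLAVE_ID/action", headers=HEADERS,
                  json={"resize": {"flavorRef": "8"}})

    # create a db on just the slave, e.g. a local store for an application
    requests.post(BASE + "/instances/SLAVE_ID/databases", headers=HEADERS,
                  json={"databases": [{"name": "scratch"}]})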
20:50:01 <SlickNik> what about create db/user on master? does that go through /instance/id or /cluster/id?
20:50:02 <hub_cap> if u want to create it on all of them, create it on the master ;)
20:50:18 <hub_cap> u _know_ u have a master, why not let the user just do that
20:50:27 <hub_cap> this only applies for master/slave
20:50:44 <imsplitbit> hub_cap: I think that is the least prescriptive approach
20:50:47 <hub_cap> for what its worth
20:50:58 <vipul> right, but we should allow it to be created on the /cluster as well
20:51:00 <hub_cap> /clusters/id/resize is NOT going to be easy
20:51:07 <hub_cap> i have 9 instances
20:51:09 <hub_cap> 3 failed
20:51:11 <hub_cap> 1 is now broken
20:51:17 <hub_cap> the master just went down
20:51:17 <SlickNik> So is there a difference between create db on master vs create db on cluster?
20:51:19 <konetzed> fix it so it never fails
20:51:19 <hub_cap> what do i do
20:51:32 <hub_cap> konetzed: youre out yo mind
20:51:41 <SlickNik> i.e. if I do /instance/id/db CREATE, it is local to the instance and will not get replicated?
20:51:45 <konetzed> hub_cap: the hp ppl didnt know that already
20:51:47 <SlickNik> on the master
20:51:56 <vipul> hub_cap: but that same scenario would exist if you did a single instance resize... where that one failed
20:52:02 <vipul> now the user is stuck..
20:52:06 <vipul> cuz they have to fix it
20:52:14 <vipul> where as in /cluster/resize we'd fix it
20:52:18 <hub_cap> right but thats up to you to control vipul
20:52:27 <hub_cap> think about the permutations there vipul
20:52:28 <konetzed> SlickNik: i think user adds on the master would be replicated
20:52:34 <hub_cap> lets at least defer it
20:52:41 <hub_cap> till we see some real world scenarios
20:52:51 <hub_cap> id prefer "acting" on clusters to come later
20:52:56 <SlickNik> konetzed: what about db adds?
20:52:56 <hub_cap> because its /hard/
20:53:16 <konetzed> imsplitbit: arnt all crud operations done on the master sent to slaves?
20:53:20 <esp> resizing a cluster sounds like it might be easier to migrate the data to a new cluster..
20:53:41 <hub_cap> :P esp
20:53:41 <imsplitbit> konetzed: yes
20:53:43 <esp> rather than trying to resize each individual node if that's what we are talking about :)
20:53:49 <vipul> esp: that could be one way to do it..
20:53:56 <hub_cap> create db will go to a slave if issued on a master
20:54:00 <imsplitbit> esp: maybe so but if the dataset is 500GB that may not be true
20:54:05 <SlickNik> imsplitbit: you can choose to replicate only certain dbs if you so desire
20:54:05 <konetzed> imsplitbit: so to answer SlickNik's question user and db adds all get replicated
20:54:13 <esp> if you asked me to individually resize a 9 node cluster I would scream at you.
20:54:33 <hub_cap> esp: even if 90% of the time it failed for you if u did /cluster/id/resize
20:54:39 <imsplitbit> esp: agreed which is why we would want to support doing a cluster resize
20:54:42 <hub_cap> that means you would have to issue it 9 times anyway
20:54:47 <hub_cap> and if one failed to upgrade
20:54:50 <imsplitbit> but hub_cap's point is it's not gonna be easy
20:54:51 <hub_cap> then u gotta downgrade the others
20:54:52 <esp> imsplitbit: I gotcha, doesn't cover all cases.
20:54:52 <hub_cap> double downtime
20:54:57 <hub_cap> right
20:54:59 <SlickNik> imsplitbit: so why should we allow extraneous dbs (outside the cluster) to be created on slaves but not on master?
20:55:03 <hub_cap> lets defer "Actions" to /clusters
20:55:06 <hub_cap> to get _something_ done
20:55:13 <hub_cap> to summarize
20:55:15 <hub_cap> we have 5 min
20:55:22 <hub_cap> instances are all in /instances
20:55:23 <imsplitbit> SlickNik: because it's a mistake to assume what a user will want to do
20:55:27 <konetzed> i think we need to get past resizes failing, because that has nothing to do with clusters
20:55:27 <hub_cap> u can enact on them indiv
20:55:30 <vipul> SlickNik: good point.. is this a valid use case even?  i'm no DBA.. but why would you do that
20:55:37 <hub_cap> ok maybe no summary............
20:55:48 * hub_cap waits for the fire to calm down b4 going on
20:55:51 <vipul> do DBAs create dbs on slaves...
20:55:56 <hub_cap> why not vipul
20:56:02 <imsplitbit> vipul: I have configured db setups for very large corporations in our intensive department and I can say it happens often
20:56:07 <hub_cap> yes
20:56:08 <vipul> because at any time, you'd promote that
20:56:09 <hub_cap> i have done it
20:56:16 <hub_cap> not necessarily vipul
20:56:19 <vipul> and you'd need to do it again on the new slave
20:56:25 <hub_cap> read slaves are not 100% promotion material
20:56:29 <hub_cap> they are sometimes there _just_ to read
20:56:45 <demorris> you may just have a slave to run backups on
20:56:45 <hub_cap> we cant guarantee everyone will use it the same way
20:56:48 <vipul> yea I get that.. but they are reading master data
20:56:52 <hub_cap> hence the need to _not_ be prescriptive
20:56:59 <hub_cap> ya and could be 10 minutes behind vipul
20:57:09 <hub_cap> ok lets chill it out
20:57:12 <hub_cap> let me summarize
20:57:12 <vipul> demorris: then the additional dbs you created are also backed up..
20:57:14 <hub_cap> we have 3 min
20:57:19 <vipul> lol hub_cap
20:57:24 <hub_cap> or ill just decide w/o anyone elses input
20:57:29 <hub_cap> ill be the DTL
20:57:29 <SlickNik> hub_cap: you need a timer bot :)
20:57:33 <hub_cap> u can decide what the D means
20:57:41 <SlickNik> Guido van hub_cap
20:57:44 <hub_cap> summary
20:57:44 <vipul> if we have backup slaves.. should those additional DBs/Users be backed  up?
20:57:54 <hub_cap> lets take indiv questions offline vipul plz
20:58:02 <vipul> sorry :)
20:58:03 <hub_cap> here is the first cut of the api
20:58:19 <hub_cap> instances are in /instances, all of them, all visible, all actions can happen to them
20:58:35 <hub_cap> /clusters is used for create/delete only as a helper api
20:58:42 <hub_cap> that will be V1 of clusters
20:58:45 <demorris> hub_cap: and also some atomic actions
20:58:48 <hub_cap> as we decide we need more stuff, we will add it
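A minimal summary of the V1 surface as described here, written as an assumed route table rather than actual Trove code.

    # Assumed summary of the V1 cut described above -- not actual Trove routes.
    V1_API = {
        # every node is always listed here and all actions can happen to it
        "/instances": ["list", "show", "create", "delete", "resize",
                       "restart", "databases", "users"],
        # helper only for now: create/delete a whole cluster; actions deferred
        "/clusters": ["list", "show", "create", "delete"],
    }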
20:58:50 <kevinconway> hub_cap: I'm bought on the idea of instance stuff going in /instances. But does the instance still contain cluster data now?
20:59:01 <kevinconway> this magic "attributes" addition?
20:59:06 <hub_cap> yes kevinconway it will have to, we can decide that stuff later
20:59:14 <hub_cap> there will be some indication
20:59:41 <hub_cap> once we have a need for more operations, be them atomic or acting upon many instances, we will add to /clusters
20:59:43 <demorris> hub_cap: when did we drop having actions on clusters?
20:59:49 <hub_cap> otherwise we will be coding this forever
20:59:54 <SlickNik> kevinconway: It's necessary if you want to have any sort of ruleset dictating what is possible on the instance vs on the cluster.
20:59:57 <hub_cap> demorris: i made an executive decision for V1
21:00:01 <hub_cap> we can always add them
21:00:05 <hub_cap> but if they suck we cant remove them
21:00:15 <hub_cap> lets just get something up
21:00:17 <hub_cap> and working
21:00:19 <imsplitbit> no actions!!!!
21:00:21 <demorris> hub_cap: I would vote for V1 to have at least atomic actions - add nodes, resize flavors, resize storage…in that they happen to the whole cluster
21:00:24 <kevinconway> SlickNik: you can mark an instance as part of a cluster without modifying the instance resource though
21:00:26 <vipul> it seems like it's easier to have a /clusters API that's completely isolated from /instances.. if we remove the 'promote' existing instance requirement
21:00:38 <hub_cap> demorris: we had a whole conversation about problem permutations
21:00:46 <hub_cap> go to #openstack-trove to continue
21:00:48 <hub_cap> #endmeeting