18:00:46 #startmeeting trove-bp-review
18:00:47 Meeting started Mon Sep 15 18:00:46 2014 UTC and is due to finish in 60 minutes. The chair is SlickNik. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:48 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:00:51 The meeting name has been set to 'trove_bp_review'
18:00:55 o/
18:01:29 o/
18:01:31 o/
18:01:35 ./
18:01:40 agenda at:
18:01:40 O/
18:01:42 #link https://wiki.openstack.org/wiki/Meetings/TroveBPMeeting
18:02:04 o/
18:02:30 #topic OSProfiler integration
18:02:36 zhiyan_: around?
18:03:51 Looks like not.
18:04:08 Let's come back to this one later.
18:04:25 #Topic Cassandra clustering
18:04:36 it is mine
18:04:37 o/
18:04:39 #link https://blueprints.launchpad.net/trove/+spec/cassandra-cluster
18:04:48 o/
18:04:56 initial framework was merged
18:05:42 and as i can see, it's time to go forward with adding clustering support for other datastores
18:05:58 this topic is about Cassandra clustering
18:06:39 denis_makogon: https://wiki.openstack.org/wiki/Trove/cassandra-clustering seems a little light on details.
18:06:58 For example, what data partitioning strategy are we proposing to use?
18:07:09 agreed, it needs to resemble https://wiki.openstack.org/wiki/Trove/Clusters-MongoDB
18:07:22 I posted some comments this AM
18:07:23 Are we doing replication? How are Snitches handled?
18:07:28 amrith: denis_makogon SlickNik : pls give your valuable comments/suggestions on https://review.openstack.org/#/c/103186/
18:08:03 using the terminology of a code review; if this were put into an RST etc., I would believe that this needs more work
18:08:15 at this stage, I think it is premature to approve
18:08:31 because I'm not able to tell for sure whether cassandra will 'fit' in the current framework cleanly
18:08:42 and what the 'rough edges' could be.
18:08:46 but I think the bp is a start
18:08:51 and needs to be finished
18:09:06 also should we start using specs repo since we're looking at bps for kilo
18:09:18 amrith: +1
18:09:41 o/
18:09:42 o/
18:09:58 iccha2: Yes, getting that repo up and going is in the works. Should be done for next week's BP meeting.
18:10:07 In particular, I would like to make sure that whatever we implement/extend of the framework is also compatible with things (like MySQL/Percona)
18:10:14 sweet thanks SlickNik
18:10:15 iccha2, +1
18:10:49 iccha2 / amrith: https://review.openstack.org/#/c/121457/
18:11:32 But coming back to this BP: denis_makogon, can you switch to the rst format as you continue working on it?
18:11:51 I suspect folks will have quite a few comments on this, and it'll be easier to review the .rst
18:12:05 ok
18:12:41 denis_makogon: thanks.
18:13:35 #topic Clustering int-tests
18:13:53 that one is pretty simple
18:13:57 #link https://blueprints.launchpad.net/trove/+spec/clustering-int-tests
18:14:05 there's no int tests for mongo clustering at all
18:14:20 it was added to be approved by PTL
18:14:43 for now we have several API endpoints that should be tested
18:15:39 denis_makogon: is the intent of adding this to the agenda to ask about its status or ?
18:15:49 denis_makogon: The design for this needs to be thought through as well. Should every int-test be spinning up 5 instances for a cluster? How does this affect int-test workload, and run time?
18:17:06 amcrn, i planned to work on it, if there's no objections
18:17:06 I think we should add int-tests that are optional - as in, we don't *have* to run them all the time and ruin the gate - but *could* be run
18:17:24 and could also run in fake mode - which would require fixing the event simulator. :/ I could do that.
18:17:31 denis_makogon: afaik amcrn, and mat-lowery have been trying to answer some of these design questions to come up with something that is acceptable to run in the gate.
18:17:50 SlickNik: +1, but we wouldn't mind your help denis_makogon
18:18:04 SlickNik, ok, thanks
18:18:11 denis_makogon: we'll try to share some details fairly soon about the options we've been discussing
18:18:30 At this point, I think the work is "definition"
18:18:35 grapex: +1. Especially on getting fake mode to cover this so that we can test that in the gate.
18:18:40 amcrn, awesome, thanks
18:18:44 I understood the approval to mean that in principle we agree with the proposal
18:18:49 but the devil is in the details
18:18:55 so who is ironing out the details ;)
18:19:09 I'm assuming this is mat-lowery amrith SlickNik ...
18:19:11 yes?
18:19:31 mat-lowery: didn't you have tests that were almost running until you found out about the fake mode limitation?
18:19:43 Yeah due to the resources required for a real int test, I was planning on the fake mode-only route first. But then I ran into event simulator limitations.
18:19:56 amrith, i guess amcrn and mat-lowery would answer all questions since they are already working on them, but i didn't know about that (my bad =( )
18:20:01 Let's proceed with those but not add them to any groups which would run in the gate
18:20:08 then hopefully I can fix the event simulator limitations
18:20:36 denis_makogon: we should have flipped the blueprint assignee, i screwed up there, apologies.
18:20:38 amrith: I believe in addition to amcrn and grapex as well.
18:20:42 SlickNik amcrn amrith: Would that work for you?
18:20:56 amcrn, it's totally fine
18:21:19 grapex, absolutely
18:21:24 no objections from me.
18:21:25 grapex / mat-lowery: Yes, I think adding int-tests even if it's not part of the gate would be good.
18:21:54 I would like us to keep one thing in mind as we make these choices
18:21:56 not sure what i'm agreeing to, but sure :)
18:22:03 for example
18:22:10 SUSE is going to run their own CI
18:22:27 we should make sure that the things we choose will work on their CI as well, without too much of a burden
18:22:46 otherwise, things that we have approved, such as SUSE support would be impacted
18:23:02 because we implicitly are relying on their CI to find issues in code we commit
18:23:23 amrith, not sure about that, since SUSE CI is a third-party CI, if they would have any issues - they would come to us with questions
18:23:52 because we can't say for sure that it would work for them
18:24:16 denis_makogon, my issue was related to things like the number of instances we'd spin up
18:24:19 for each test
18:24:22 or things like that
18:24:31 ok, i get that
18:24:34 if we expect that to test clustering, your CI infrastructure should be enormous
18:24:40 that may have a complicating effect
18:24:46 on other database vendors who want to participate
18:25:01 I don't think there is an imminent threat of that happening
18:25:06 but just something to keep in mind
18:25:17 dougshelley66, has a CI setup that we're operating
18:25:22 I have to keep that in mind as well
18:25:27 similarly, Percona
18:25:33 georgelorch, will have one he has to operate
18:25:35 and so on
18:25:41 i'm onboard for fake-mode, but real int-tests is difficult to get working on a single box for clusters; we have all sorts of hacks to get it to work on a fairly beefy box.
18:26:14 * georgelorch nods
18:26:20 i can imagine
18:26:30 amrith: so if we can't run it, i'm not sure the real value of writing it
18:26:39 we might also speak with infra guys to understand how much of resources we're able to use for our clustering tests
18:26:43 amrith: Agreed.
I think we should accept having monstrous tests like clustering in the code even if we don't run them all the time, with the expectation that we can't expect them to always be catching bugs since they won't run frequently. But it would be nice to have them since they will run in fake mode, and we could still run them before the big lettered releases.
18:26:46 for the real one
18:27:15 amcrn, I think the value of having the tests in there is that there will be runs of those tests, even if they are not on a per commit basis
18:27:15 grapex: but we internally can't even run them on a single box without hacks to nova + diskimage-builder
18:27:19 as grapex says above
18:27:39 amcrn: Still, fake mode by itself will give a lot of feedback and keep smaller bugs from being introduced.
18:27:43 therefore to an earlier point that SlickNik made, there is the question of how these tests should be structured
18:28:08 The big risk left open is communication between Trove and the guest agent - but if people change clustering related RPC calls they will hopefully expect risk in those areas.
18:28:11 and I think that at this point that thinking is something that you (grapex, amcrn, mat-lowery, SlickNik ...) are in the best position to do.
18:28:15 grapex: right, i get fake-mode; i'm saying int-tests that aren't runnable.
18:28:23 (as if "hopefully expecting risk" has ever saved anyone in this profession)
18:28:30 bah, i mean non-fake isn't really runnable on a single box setup*.
18:28:40 amcrn, I think that's fine
18:28:42 wish irc had an edit button sometimes :)
18:29:22 I'm assuming that with replication and clustering there will be larger test infrastructures that are required for some testing. and that's goodness.
18:29:46 so the tl;dr here is we're going to go with fake-mode first, which has a dependency on the event simulator fix. beyond this, we need larger deployed test infras to run non-fake int-tests.
18:29:59 if i'm understanding correctly.
18:30:23 amcrn: I agree with that tl;dr
18:30:37 guys, we might take a look at Sahara project, since they are deploying heavy clusters of Hadoop over infra gates
18:30:54 denis_makogon: i believe they're doing single node, but good point, we should confirm.
18:30:56 In the meantime, I can start some conversations with the infra folks to see how we can get creative with testing this on infra.
18:31:35 SlickNik, that would be awesome
18:32:23 and when we'll be ready with fake-mode test we would proceed to real-mode ones, and that'll be a bit tough
18:32:24 denis_makogon / amcrn: I think that's the case too; denis_makogon, can you recheck with sergey?
18:32:33 SlickNik, sure
18:32:36 thanks
18:32:40 np
18:33:02 Alright, I think we have a plan in place for that.
18:33:09 zhiyan_: back yet?
18:33:35 .
18:33:42 #topic Open Discussion
18:33:46 question, what is tl;dr?
18:33:57 Too long; didn't read
18:33:58 too long; didn't read
18:34:05 HA!
18:34:06 aka summary
18:34:09 I love these calls
18:34:12 I learn something new
18:34:57 too long; didn't read
18:35:02 :-P
18:35:29 #endmeeting
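
Note on the int-tests plan: the agreement above (fake-mode cluster tests run routinely, real-mode ones stay in the tree but are opt-in and kept out of the gate) could look roughly like the sketch below. This is only an illustration under stated assumptions. It uses Python's standard `unittest` rather than Trove's actual test framework, and `FakeClusterApi`, `RUN_REAL_CLUSTER_TESTS`, and `run_suite` are hypothetical names that do not come from Trove.

```python
import os
import unittest

# Hypothetical opt-in switch (an assumption, not a real Trove setting).
RUN_REAL_CLUSTER_TESTS = os.environ.get("RUN_REAL_CLUSTER_TESTS") == "1"


class FakeClusterApi:
    """Hypothetical stand-in for a fake-mode backend: no instances are booted."""

    def __init__(self):
        self.clusters = {}

    def create(self, name, num_instances):
        # Record the cluster in memory instead of provisioning anything.
        self.clusters[name] = {"instances": num_instances, "status": "BUILDING"}
        return self.clusters[name]

    def finish_build(self, name):
        # Simulate the asynchronous build completing.
        self.clusters[name]["status"] = "ACTIVE"


class FakeModeClusterTest(unittest.TestCase):
    """Cheap enough to run on every commit."""

    def test_create_cluster(self):
        api = FakeClusterApi()
        cluster = api.create("c1", num_instances=5)
        self.assertEqual(cluster["status"], "BUILDING")
        api.finish_build("c1")
        self.assertEqual(api.clusters["c1"]["status"], "ACTIVE")


@unittest.skipUnless(RUN_REAL_CLUSTER_TESTS, "real-mode cluster tests are opt-in")
class RealModeClusterTest(unittest.TestCase):
    """Kept in the tree, but only attempted when explicitly enabled."""

    def test_create_cluster(self):
        # Placeholder: a real test would provision instances here.
        self.skipTest("placeholder: would provision real instances")


def run_suite():
    """Load both test classes and run them, returning the TestResult."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(FakeModeClusterTest))
    suite.addTests(loader.loadTestsFromTestCase(RealModeClusterTest))
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

With the switch unset, the real-mode class is reported as skipped and the fake-mode test still runs, which matches the "fake-mode first" summary; a larger test deployment would set the switch and replace the placeholder with real provisioning.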