18:00:40 #startmeeting trove
18:00:41 Meeting started Wed Jan 18 18:00:40 2017 UTC and is due to finish in 60 minutes. The chair is amrith. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:42 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:00:45 The meeting name has been set to 'trove'
18:00:50 o/
18:00:51 ping peterstac johnma slicknik dougshelley66 pmalik vgnbkr mvandijk trevormc aliadil spilla songjian trevormc apsarshaik
18:00:57 o/
18:00:58 hi peter
18:01:00 hi trevor
18:01:07 hi all
18:01:12 o/
18:01:16 #agenda https://wiki.openstack.org/wiki/Meetings/TroveMeeting
18:01:23 o/
18:01:50 johnma said she'd be here, let's give her a couple of minutes
18:02:14 peterstac, would you please remind vgnbkr that he has a -1 on 337914 that may be addressed now.
18:02:22 o/
18:02:54 sure, but he's not at his desk right now ...
18:02:58 hi mariam, let's get started.
18:03:04 thx peterstac when you get a chance
18:03:13 #topic Stability of the gate
18:03:23 so, it appears to be on the fritz again
18:03:29 thx peterstac for trying to get it going again
18:03:32 how goes that battle?
18:03:44 well, there seem to be multiple issues
18:03:55 specifically, are there a small set of tests that we can skip for now
18:03:55 not sure how many are our fault either
18:03:57 and get around it.
18:04:15 I've got a patch up to restrict resize-vol and resize-inst to just mysql
18:04:32 but then the postgres tests fail for an unrelated issue ...
18:04:35 i saw that, but it appears to have failed for some other reason
18:05:27 it's in the gate now, see if it merges this time
18:05:47 what of the postgres failure?
18:05:47 that'll help somewhat, but I still believe we have xenial/neutron issues on some of the clouds
18:06:00 you mean the timeout for qemu?
18:06:12 I believe it was one where the instance just didn't go active
18:06:36 that's the only case now where we don't get a guest log printed out so it'll be hard to debug
18:06:56 (doesn't just happen for postgresql either, I've seen it on redis too)
18:07:06 (any maybe mysql as well)
18:07:13 s/any/and/
18:07:33 ok
18:07:59 so basically I think it'll be more stable soon, but we're not entirely out of the woods yet
18:08:14 how confident are you that we can get (for example) a string of changes like i18n merged soon
18:09:03 I'm guessing we can still only expect 50-60% to run without any issues
18:09:16 ok, thx peterstac
18:09:26 will go look at those instance not going active cases
18:09:38 one thing that could cause that is a kernel panic in the guest
18:09:40 sounds good, thx amrith
18:09:42 which I've been watching for
18:09:49 it is a side effect of the kvm/qemu change
18:10:28 I might be out of the loop here amrith but what is this kvm/qemu change we are talking about
18:10:32 hmmm, hard to know if that's causing it - maybe we need to resurrect your 'pipe guest over to conductor' patch ?
18:10:42 johnma, a while ago I made a change
18:10:48 let me find a link
18:11:02 I85364c6530058e964a8eba7fb515d7deadfd5d72
18:11:19 #link https://review.openstack.org/#/c/413166/
18:11:24 the gist of it is this ...
18:11:40 #link https://review.openstack.org/#/c/412011/
18:11:41 we look during a test run and see whether we can force nova to use KVM
18:11:48 ^--- that's the first one
18:12:16 peterstac, yours was the test, https://review.openstack.org/#/c/413166/ was the finalized one which had your suggested changes (like the name of the method)
18:12:20 so anyway
18:12:26 at run time we see whether we can force KVM
18:12:37 now the infra folks warned me that if we do this we could have some number of panics
18:12:55 the symptoms would be just as peterstac felt, the guest never makes it far enough
18:13:05 oh ok
18:13:11 but in these cases, typically we have the whole test go away
18:13:30 that seems to happen around 5-10% of the time
18:13:39 that ~ instance not launching
18:13:53 right, and then the wheels fall off
18:13:55 looking at kernel panics, I see them about 1 in a 1000 vm's
18:14:12 so there's some orders of magnitude difference between the two
18:14:15 which needs investigating
18:14:37 in any event, I'll look into that
18:14:42 and get back to you at next meeting
18:14:45 right - it could also be a networking issue, or a path mismatch
18:14:58 yes, what was that .1 issue you mentioned?
18:15:05 might be quick to determine if we could get into the guest
18:15:21 ah, we calculate CONTROLLER_IP as the actual ip of the box
18:15:47 we use 'hostname -I' however that lists gateways and other stuff too
18:15:58 so all the .1 addresses were filtered out
18:16:17 some clouds however will give out .1 addresses as a valid ip
18:16:39 in that (rare) case the tests would fail since the controller ip wouldn't be set properly
18:16:53 worth pointing out that ironic and lbaas have been having similar issues with nested virt. It might be worthwhile to talk to them about what they are seeing and possibly debug together
18:16:59 I didn't see it that often, but enough that I wanted to fix it
18:17:01 hmm, ok
18:17:12 clarkb, I have replied to that email thread
18:17:28 there's also an issue with Redis backups - I've put a bug in for that
18:17:36 sounds good peterstac is there a fix for the .1 issue?
18:18:04 yes, it's merged (yesterday or early today) so hopefully that's resolved
18:18:08 yup I saw, but there is a lot more detail in here so far :) in any case better communication around shared issues can only help
18:18:18 #link https://bugs.launchpad.net/trove/+bug/1656432
18:18:18 Launchpad bug 1656432 in OpenStack DBaaS (Trove) "Redis backup can fail if auto one already running" [Undecided,New]
18:18:23 ^^^ redis bug
18:18:45 ok, thx
18:18:54 clarkb, I'll take a look at the thread and see if anything new is revealed
18:19:53 ok, so if we're done with that topic of gate stability, let's move along ...
18:20:12 #topic Code Reviews (peterstac)
18:20:18 peterstac, you are up
18:20:41 ok, it's just a couple of reviews that have been there for a little while that I'd like to go in soon
18:20:56 (mostly because there are other ones that are waiting on them)
18:21:11 module-instances
18:21:21 #link https://review.openstack.org/#/c/403287/
18:21:35 cluster-restart
18:21:37 #link https://review.openstack.org/#/c/417454/
18:22:01 And redis from compile (less critical, but I think it's ready)
18:22:03 #link https://review.openstack.org/#/c/416361/
18:22:50 oh amrith I talked to your concern on the cluster-restart one
18:22:53 ok, this afternoon. getting them to merge, left as an exercise to peterstac (the unbreaker of the gate)
18:22:59 let me look
18:23:40 thx peterstac, that was my hope, thanks for confirming
18:23:58 peterstac: I reviewed all of them this morning. I haven't gotten to testing them though. I tried but I think something about my env is messed up. Tried testing mongo and redis changes and things keep failing. So I am rebuilding my env. Hopefully once that's fixed I can test these real quick
18:24:37 thx johnma
18:25:21 thx johnma
18:25:49 peterstac, the only concern I have about the redis (and also postgresql) compile from source each time is that it is an awful waste of time
18:25:59 but I can't think of a good way to automate it either
18:26:08 sure, but with Redis we may not have a choice
18:26:24 we were using a ppa and it looks like the maintainer is letting it go stale
18:26:25 so what I was thinking was to make a repo of just the distros we use for testing
18:26:35 and call it something like (trove-db-packages)
18:26:41 and import packages from there
18:26:51 or have them push artifacts into some place we can get them from
18:26:59 and a build job to rebuild the packages, maybe?
18:28:02 anyway, we can take that up at a different time
18:28:12 anything else on the subject of reviews ...
18:28:28 I uploaded more troveclient changes for osc :)
18:28:32 #link https://review.openstack.org/#/q/status:open+project:openstack/python-troveclient+branch:master+topic:bp/trove-support-in-python-openstackclient
18:28:52 do we have any reviews left for troveclient before the deadline this week
18:29:25 johnma, good question
18:29:48 #link https://review.openstack.org/#/q/project:openstack/python-troveclient+status:open
18:30:45 so there are a couple
18:31:09 so if the patches aren't merged by the deadline then they will be going into pike?
18:31:09 peterstac, can you help get https://review.openstack.org/#/c/402802/6 to go?
18:31:19 trevormc, no
18:31:26 let's talk about the deadline later
18:31:31 this week is a soft freeze for the client
18:31:32 ok
18:31:37 so we can prioritize those for review
18:31:51 amrith, I can look at that
18:32:00 client freeze is next week
18:32:15 peterstac, if you can get that one to go, there's its client change which looks good
18:32:27 right, I'll test both at the same time
18:32:44 I'll +2 it but since i co-authored it, I won't approve
18:33:38 so the priority for the next week of reviews is https://review.openstack.org/#/q/project:openstack/python-troveclient+status:open
18:33:46 to get the client stuff done by next week
18:33:51 when our client will freeze
18:34:49 so let's move along
18:34:51 #topic Reviews for abandonment
18:34:57 I abandoned four reviews this morning
18:35:10 they've been sitting around and idle for a while with negative review comments
18:35:24 if someone feels strongly about them, please restore and get them ready to go.
18:35:42 that's all I wanted to mention
18:35:49 anyone want to add something ...
18:36:45 #topic Ocata Release Schedule - Update
18:36:54 #link https://releases.openstack.org/ocata/schedule.html
18:37:19 so, to johnma's earlier point, we have a soft freeze for the trove client and guest requirements this week
18:37:36 I see nothing changing that (other than the redis compile change, maybe)
18:37:44 we should be good to go with that
18:37:50 there's no actual 'deliverable' at this stage
18:37:55 the only deliverable comes next week
18:38:02 note that next week, we freeze the client.
18:38:09 no if's, and's, or but's.
18:38:23 trevormc, anything osc related that doesn't merge by jan 27 is not in ocata
18:38:34 similarly peterstac the volume type support client side stuff
18:39:51 we had one community goal for ocata
18:39:53 #link https://review.openstack.org/#/c/396267/
18:39:55 we completed it
18:40:11 it is also the string soft freeze
18:40:18 so the i18n changes should merge if at all possible
18:40:25 the hard freeze for the i18n changes would be R-3
18:40:33 any questions ...
18:41:05 hearing none
18:41:08 #topic open discussion
18:41:59 anybody ...
18:42:30 so it sounds like there is more pressure on reviews this week than uploading new patch sets to the troveclient. I can stop uploading patches if that's what we want for this week
18:43:05 meaning I'll have more time for reviews... :)
18:43:06 you can upload all you want, they may not get reviewed :) but it'd be great if all could review/test this week
18:43:10 I don't think that is what amrith meant trevormc.
18:43:28 yes, what johnma says
18:43:35 you should go ahead and upload whatever you want to get into this release
18:43:55 we should also do our part in helping with reviews :)
18:44:11 for the i18n stuff it's not the reviews that's the problem, it's the gate :)
18:44:34 if the gate is stable, the saturday morning merge schedule is best
18:44:48 yeah I do that a lot over weekends :D
18:44:49 I queued up the rechecks for 530am on saturday and they fired automatically
18:44:59 next time, I'll do them better; i.e. not 18 at a time :)
18:45:07 but two or three at a time
18:45:30 Thanks for doing the housekeeping on those.
18:46:11 no worries trevormc; happy to do it, I understand the constraint you have there
18:46:32 ok, anything else for open discussion ...
18:46:39 nothing here
18:46:47 https://review.openstack.org/#/c/421631/2 Perhaps we need to look at this patch, I reproduced the problem today, do not know whether it is universal
18:47:27 i was thinking I'd look at it once the submitter could get basic tests to pass
18:47:40 just a pep8 failure
18:47:42 I don't understand either the problem or the fix and the commit message is empty
18:47:45 should be easy to respin
18:48:00 so if that isn't fixed, and there's no explanation for the change, it will be harder to review
18:48:03 yeah, it'd be good if the bug info was copied into the commit message
18:48:14 maybe I'll put a note there :)
18:49:00 also, I'm not entirely positive of the environment setting that the guy shows in the bug
18:49:07 I don't know that it is all legitimate
18:49:23 are we sure that the settings aren't junk to begin with?
18:49:49 is it 'really a bug'?
18:49:57 what's the severity
18:50:08 should there be additional tests to make sure this isn't breaking something else
18:50:14 after all, we're 1 week from client freeze
18:50:28 I'm not going to do back flips to merge this unless we're damn sure that it is right
18:50:31 and not going to regress
18:50:53 and if we aren't sure, I'd rather not take it, and wait till ocata comes out and respin the client
18:51:53 well, it only affects log-tail so it's fairly low-risk
18:51:59 johnma, trevormc, aliadil, anything for open discussion
18:52:31 thx, I also think so, stay focused
18:52:34 no I'm good.
18:52:42 even so, if it is fairly low risk and limited to log tail, we can take it after the release is cut
18:52:46 I am good. Thanks amrith
18:53:00 ok, have a good afternoon folks
18:53:04 #endmeeting