19:02:20 #startmeeting tripleo
19:02:21 Meeting started Tue Mar 25 19:02:20 2014 UTC and is due to finish in 60 minutes. The chair is lifeless. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:02:22 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:02:23 O/
19:02:24 The meeting name has been set to 'tripleo'
19:02:26 hi
19:03:02 o/
19:03:18 hi
19:03:19 #topic agenda
19:03:46 bugs
19:03:47 reviews
19:03:47 Projects needing releases
19:03:47 CD Cloud status
19:03:47 CI
19:03:49 Insert one-off agenda items here
19:03:51 open discussion
19:03:55 #topic bugs
19:04:04 #link https://bugs.launchpad.net/tripleo/
19:04:04 #link https://bugs.launchpad.net/diskimage-builder/
19:04:04 #link https://bugs.launchpad.net/os-refresh-config
19:04:04 #link https://bugs.launchpad.net/os-apply-config
19:04:04 #link https://bugs.launchpad.net/os-collect-config
19:04:06 #link https://bugs.launchpad.net/tuskar
19:04:09 #link https://bugs.launchpad.net/python-tuskarclient
19:05:19 so first up, untriaged stuff...
19:05:58 https://bugs.launchpad.net/diskimage-builder/ has 4 untriaged bugs
19:06:30 and so does https://bugs.launchpad.net/tripleo
19:06:45 https://bugs.launchpad.net/diskimage-builder/+bug/1290541 is fixed, I believe
19:06:50 any suggestions for keeping on top of that other than saying 'we need to try harder'?
19:08:18 a script that will collect untriaged bugs and post a list of those to #tripleo periodically?
19:08:37 rpodolyaka1: sounds cool. are you volunteering?
19:08:42 lifeless: sure
19:08:45 \o/
19:08:53 #action rpodolyaka1 to make an untriaged nag bot
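[Editor's note: a minimal sketch of the untriaged-bug nag bot rpodolyaka1 signed up for just above. It assumes launchpadlib and treats "untriaged" as status New; the project list comes from the #link lines earlier in the topic, and the script name and the relay-to-IRC step are hypothetical.]

```python
# Hypothetical nag bot: list bugs nobody has triaged yet (status == New)
# across the TripleO projects, for a periodic post to #tripleo.
from launchpadlib.launchpad import Launchpad

PROJECTS = [
    'tripleo', 'diskimage-builder', 'os-refresh-config',
    'os-apply-config', 'os-collect-config', 'tuskar',
    'python-tuskarclient',
]


def untriaged(lp, project_name):
    """Return the bug tasks still in status New for one project."""
    return lp.projects[project_name].searchTasks(status=['New'])


def main():
    # Anonymous read-only access is enough for searching public bugs.
    lp = Launchpad.login_anonymously('tripleo-nag-bot', 'production')
    for name in PROJECTS:
        tasks = list(untriaged(lp, name))
        if tasks:
            print('%s: %d untriaged bug(s)' % (name, len(tasks)))
            for task in tasks:
                print('  %s  %s' % (task.web_link, task.bug.title))


if __name__ == '__main__':
    main()
```

The output could be handed to any IRC bot, with cron (or a simple sleep loop) covering the "periodically" part.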
19:09:08 criticals...
19:09:12 The dib bugs all have assignees, just no importance :/
19:10:15 greghaynes: that typically means self-filed-and-assigned, which is where folk need guidance
19:10:21 greghaynes: e.g. it's an antipattern all of its own
19:10:48 So, as someone not too actively working on TripleO, it occurs to me -- if a whole team of people is supposed to be doing triage, but no one is, that tells me that people really don't care about triage.
19:11:08 Or the problem would be self-correcting.
19:11:20 matty_dubs: or that triage is hard/annoying/painful - not doing doesn't imply not caring
19:11:35 matty_dubs: (see 'Switch' for citations on that)
19:11:53 Yeah, true enough.
19:12:12 erm, I mean 'not doing doesn't *only* imply not caring'
19:12:28 i think the bot rpodolyaka1 mentioned would help a good bit. For me the most annoying thing about triage is checking whether there are any untriaged bugs.
19:12:41 e.g. having to look per-project
19:12:41 https://bugs.launchpad.net/tripleo/+bugs?search=Search&field.importance=Critical
19:12:48 according to logstash we haven't had any occurrences of https://bugs.launchpad.net/tripleo/+bug/1292141 in ci since upping the pip timeout, we should close it and reopen another for a local pip cache (no longer critical)
19:12:48 jistr: yeah, it's quite terrible
19:13:18 derekh: I saw a different error string, but still a pypi issue last night
19:13:39 lifeless: any idea where? or what the error was?
19:13:41 derekh: broke my dib-less cp's patch (which was broken for other reasons, but spurious fail is still a problem)
19:13:54 derekh: sec
19:14:31 https://review.openstack.org/#/c/82683/ http://logs.openstack.org/83/82683/1/check-tripleo/check-tripleo-seed-precise/1f5ff5b/console.html
19:14:34 2014-03-25 01:25:40.327 | data = self._sock.recv(left)
19:14:37 2014-03-25 01:25:40.327 | error: [Errno 104] Connection reset by peer
19:15:44 derekh: so, I think we need more error strings for network / pip fails, and the bug is still really important - but if the data doesn't support that, sure, let's downgrade.
19:16:20 are 1293782 and 1295703 the same issue?
19:17:01 ah, the flow one is fixed
19:17:17 lifeless: ok, will downgrade since the error quoted in the bug seems to be gone - or should I close it and open a new one?
19:17:59 i submitted a couple of reviews this morning for https://bugs.launchpad.net/tripleo/+bug/1270646
19:18:02 derekh: IMO given two bugs with the same cause and different symptoms we should dupe them :)
19:18:22 lifeless: k
19:18:35 derekh: so I'm not sure there is any benefit in shuffling the metadata around - we depend on a network resource that isn't reliable - that's the bug
19:18:36 i think we should really consider just making the default mtu be 1400 in the dnsmasq options for neutron-dhcp-agent
19:18:39 on the overcloud
19:18:43 it fixes the issue for me
19:19:59 lifeless: yes, but this particular problem wasn't specific to our network (or at least may not be); I put a similar fix into devstack and it seems to have gotten rid of the bug there also. anyway, getting sidetracked
19:20:52 slagle: I'm torn. It's clearly not a deployment-caused bug. So yes, doing the workaround is appropriate.
19:21:13 slagle: OTOH OMG WTF ARE THEY THINKING, DON'T MESS WITH THE END-TO-END SIGNALLING MECHANISMS!
19:21:20 i don't feel like i have enough networking expertise to say that using 1400 is the "right" thing
19:21:39 but... all the solutions i can find online just say to use it and be done with it
19:21:56 let's talk briefly after the meeting
19:22:05 could just be collective ignorance :)
19:22:12 what about 1290490?
19:22:22 and 1292141?
19:22:32 ah, 141 is what derekh mentioned
19:22:53 downgrading the MTU is a must as things stand now
19:24:21 I've retitled 141
19:25:40 derekh: ^
19:26:05 lifeless: cool, ok to downgrade to high? since it's not as common as it was?
19:26:41 derekh: sure
19:26:50 I'd like to downgrade https://bugs.launchpad.net/tripleo/+bug/1290490 too
19:26:55 any objections?
19:28:57 lifeless: looks like it happens about 3 times a day; interestingly, in the last week it mostly happened on the 21st
19:29:12 1290490?
19:29:20 lifeless: yup
19:30:17 lifeless: anyway, no objection from me, just info
19:30:53 huh, I didn't realise 1290490 was hitting CI
19:31:01 if it is, I think it is still very important
19:31:05 derekh: got pointers?
19:31:26 ok, what about https://bugs.launchpad.net/tripleo/+bug/1271344?
19:31:35 lifeless: I'll refine my search later to ensure I'm searching on the correct thing
19:32:49 I think we can downgrade https://bugs.launchpad.net/tripleo/+bug/1271344 - it's very important but not breaking CI at the moment, right?
19:34:03 lifeless: correct, at least I haven't seen it
19:34:07 ok
19:37:05 ok, time to move on
19:37:18 we've touched on all the criticals for which people are here, I believe.
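[Editor's note: the MTU workaround slagle raises above is usually done by pointing neutron's DHCP agent at a custom dnsmasq config that forces DHCP option 26 (interface MTU) down to 1400, leaving headroom for GRE/VXLAN encapsulation overhead. A sketch, with illustrative file paths:]

```
# /etc/neutron/dhcp_agent.ini (path illustrative)
[DEFAULT]
dnsmasq_config_file = /etc/neutron/dnsmasq-neutron.conf

# /etc/neutron/dnsmasq-neutron.conf
# Option 26 is the interface MTU; 1400 keeps tunnelled packets under
# the physical 1500-byte MTU, which is what "fixes the issue" here.
dhcp-option-force=26,1400
```

As lifeless notes, this works around the symptom by shrinking every instance's MTU rather than fixing end-to-end path-MTU discovery.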
19:37:21 #topic reviews
19:37:29 http://russellbryant.net/openstack-stats/tripleo-openreviews.html
19:37:42 Stats since the last revision without -1 or -2:
19:37:42 Average wait time: 5 days, 6 hours, 2 minutes
19:37:42 1st quartile wait time: 0 days, 21 hours, 34 minutes
19:37:42 Median wait time: 3 days, 23 hours, 44 minutes
19:37:44 3rd quartile wait time: 5 days, 23 hours, 7 minutes
19:37:54 we're in trouble :(
19:38:09 any thoughts on why?
19:38:27 lifeless: lots more people on the team and no more cores
19:38:38 +1
19:38:41 lifeless: oh, and I could do better too
19:39:04 http://russellbryant.net/openstack-stats/tripleo-reviewers-30.txt if we want to look at reviewer activity
19:39:33 only 6 people doing more than 3 reviews a day on average
19:39:41 I've fallen way behind where I was :/
19:39:44 derekh: # of cores doesn't affect the stats above
19:40:10 maybe we should have a review day or afternoon
19:40:14 * slagle steals idea from ironic
19:40:15 derekh: because - if the patch is ready, cores can rubber-stamp it and go very fast
19:41:15 slagle: maybe! I'd be quite keen on that if it looked like we're overloaded, but what the stats above say to me is that we've collectively stopped pushing as a community
19:41:29 also
19:41:37 a lot of the new contributors are not yet reviewing consistently
19:42:09 like 1 review or so, and (from what I've seen) primarily on patches they need (e.g. from colleagues fixing something affecting them)
19:42:10 slagle: that we have review days is a symptom of cores not reviewing steadily IMHO
19:42:16 which isn't a bad thing
19:42:21 lifeless: e.g. I started contributing and I hope to be reviewing soon, but it takes a bit of time to get up to speed
19:42:23 but it's not the thing we need
19:42:28 andreaf: hey - cool!
19:42:41 slagle: and so when we have external deadlines, it takes a concerted effort to push through the backlog, which is not ideal
19:42:44 andreaf: please do. Note that the fastest way to get feedback on your reviews is to start commenting, right or wrong.
19:43:18 ok, fair enough
19:43:20 lifeless: I disagree - people have to spend time on a review to know it's ready... anyway, we also have a backlog from when CI was busted which never got cleared
19:44:04 derekh: point taken, and I'll send that mail asap :)
19:44:20 once I sort more bureaucrap @ work
19:44:35 It's a tricky project to review properly. Lots of interdependencies between git repos, lots of new (to me) components, and no unit tests, so you have to verify logic in addition to everything else.
19:45:12 stackalytics graph for review rate seems pretty steady lately
19:45:16 bnemec: that's interesting! perhaps we need a guide explaining things a bit more? To me, it's easy to review because we know 'if you change a public API, you'll break something'
19:45:45 lifeless: Right, but entire elements in dib and t-i-e are completely untested in the gate.
19:45:57 We're working on Fedora, but even with that we won't be hitting everything.
19:46:10 bnemec: yes, I know - long way to go
19:46:37 more stats
19:46:47 Total reviews: 2176 (72.5/day)
19:46:48 Total reviewers: 82 (avg 0.9 reviews/day)
19:46:48 Total reviews by core team: 1174 (39.1/day)
19:46:48 Core team size: 20 (avg 2.0 reviews/day)
19:46:48 New patch sets in the last 30 days: 1343 (44.8/day)
19:47:12 cores are doing 40 reviews/day and 44 patch sets are being pushed a day
19:47:28 but
19:47:29 Queue growth in the last 30 days: 64 (2.1/day)
19:47:33 so we're falling behind
19:47:42 what if we ask all cores to do *one more review a day*
19:48:06 that's 16 more reviews a day
19:48:14 which could in principle land 8 changesets a day
19:48:30 and tip things back in the right direction - can everyone here commit to 1 more review a day?
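[Editor's note: a back-of-envelope check of the arithmetic above. The 16 appears to assume roughly 16 active cores, and landing a changeset is assumed to take two core (+2) reviews; both assumptions are mine, not stated in the log.]

```python
# "One more review a day" arithmetic, under the stated assumptions.
active_cores = 16
extra_reviews_per_day = active_cores * 1    # one extra review each
reviews_per_landing = 2                     # two +2s to land a change

extra_landings_per_day = extra_reviews_per_day / reviews_per_landing
queue_growth_per_day = 2.1                  # from the stats above

print(extra_landings_per_day)  # 8.0 - comfortably above the 2.1/day growth
```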
19:48:46 * derekh doesn't count them but will try
19:48:47 so what is that? let's be explicit
19:48:53 2 reviews a day?
19:48:58 * jistr will try
19:49:10 There's some pretty low-hanging fruit out there too. For example: https://review.openstack.org/#/c/80337/
19:49:29 lifeless, I will try
19:49:57 slagle: sure, let's say that.
19:50:04 lsmola: thanks
19:50:33 #action lifeless to propose a minimum commitment of 2 reviews/day from core reviewers in tripleo
19:50:40 bnemec: more low-hanging fruit: https://review.openstack.org/#/c/81813/ and https://review.openstack.org/#/c/82035/
19:50:43 I will do
19:50:54 ack
19:51:56 we need to roll
19:52:04 #topic Projects needing releases
19:52:08 rpodolyaka1: you up for this?
19:52:42 lifeless: yep!
19:52:45 \o/
19:52:52 #topic CI cloud status
19:53:15 HP region is ok, tripleo-cd still paused, but SpamapS thinks we can unpause soon with the new heat shiny
19:53:28 Red Hat region I believe is ready to add to the CI system \o/
19:53:37 anyone have more to add?
19:53:38 \o/
19:53:50 lifeless: patch is ready, I've been holding off on getting the nodepool VMs back to 8G
19:53:50 #topic CI
19:54:05 derekh: ack, because of machine size in that region?
19:54:28 derekh: andreaf: tempest - I think tempest should just run from the slave like it does for devstack
19:54:29 lifeless: well, because of the number of them - 64G x 3 in the overcloud
19:54:32 lifeless: machine size and number of machines
19:54:40 dprince: ack
19:54:45 lifeless: hi! on CI, I'd like to bring up a question
19:54:50 devananda: shoot!
19:55:00 lifeless: specifically, how soon can Ironic start relying on tripleo-ci
19:55:05 derekh: andreaf: I mailed the -dev list about tempest.
19:55:12 lifeless: so we can't use an element to configure tempest then?
19:55:16 y'all are already posting -nv checks on our patch sets, which is great
19:55:35 andreaf: I think you'll get worse results
19:56:07 also, as a non-integrated project, afaik, it's OK for tripleo to vote on ironic... and I think I trust you guys ;)
19:56:07 derekh: we'll get more machines in the future so let's go for it now (regardless of size)
19:56:09 andreaf: you can, and may want to for prod deploys, but CI is resource-limited - we need to be judicious about where we run stuff
19:56:24 lifeless: yup, saw it, either is good with me, just thought it would be good to reuse what we have from the seed, but whatever
19:56:55 derekh: andreaf: you guys should do whatever works best, was really just getting my thoughts out there where you can see them :)
19:57:17 lifeless: ok, so we need some alternate lib/tempest to set up tempest for a tripleo kind of environment - or some more ifs in the existing one
19:57:18 devananda: CI for ironic with tripleo: this bug - https://bugs.launchpad.net/ironic/+bug/1297063
19:57:38 devananda: it lists *all* the outstanding patchsets needed to be running Ironic properly in check
19:57:41 dprince: ok, will push it up later
19:57:51 devananda: one each in Ironic, tripleo-incubator, image-elements, heat-templates
19:58:11 devananda: we get those four landed, and you should see failures such as the one reported in that bug
19:58:23 devananda: w.r.t. voting - once we're multi-region we'll start turning the voting bit on.
19:58:31 with infra's cooperation
19:58:45 andreaf: really be guided by derekh here
19:58:48 lifeless: once we are multi-region and stable
19:58:53 andreaf: he's spent way more time on it than I have
19:59:02 lifeless: voting globally vs. voting on ironic are different topics
19:59:21 for fedora CI runs we need https://review.openstack.org/#/c/82562/ and https://review.openstack.org/#/q/status:open+branch:master+topic:add-f20-jobs,n,z
19:59:21 devananda: let's follow that up when it's actually something we can do
19:59:27 lifeless: ack
19:59:28 lifeless: ok, that's fine - at least I've got a clear direction on where to focus now, thanks
20:00:24 #topic open discussion
20:00:32 30 seconds
20:00:40 lifeless: do you need anything from ironic folks for the tie/tht/t-i bugs?
20:00:48 devananda: reviews :)
20:00:55 ack
20:01:32 crap, time flies on these meetings
20:03:06 thanks guys, have a great week
20:03:14 thanks, see ya
20:05:08 #endmeeting