19:00:20 #startmeeting tripleo
19:00:20 Meeting started Tue Nov 19 19:00:20 2013 UTC and is due to finish in 60 minutes. The chair is lifeless. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:21 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:24 The meeting name has been set to 'tripleo'
19:00:25 hi
19:00:28 hello
19:00:31 #topic agenda
19:00:41 bugs
19:00:41 reviews
19:00:42 Projects needing releases
19:00:42 CD Cloud status
19:00:42 CI virtualized testing progress
19:00:44 Insert one-off agenda items here
19:00:46 open discussion
19:00:49 #topic bugs
19:00:55 #link https://bugs.launchpad.net/tripleo/
19:00:55 #link https://bugs.launchpad.net/diskimage-builder/
19:00:56 #link https://bugs.launchpad.net/os-refresh-config
19:00:56 #link https://bugs.launchpad.net/os-apply-config
19:00:56 #link https://bugs.launchpad.net/os-collect-config
19:00:58 #link https://bugs.launchpad.net/tuskar
19:01:00 #link https://bugs.launchpad.net/tuskar-ui
19:01:03 #link https://bugs.launchpad.net/python-tuskarclient
19:01:09 one critical fixed - bug 1251166
19:01:17 o/
19:01:24 \o
19:01:27 o/
19:01:31 good afternoon
19:01:31 |o
19:01:39 heya
19:01:40 I broke the wave
19:01:40 hi
19:01:43 I am so sorry
19:01:44 hey
19:02:04 'evening
19:02:12 hi
19:03:13 sorry, ELOCAL there
19:04:14 ok, so it looks like all other bugs are triaged
19:04:17 and no other criticals - cool
19:04:22 any other bug business?
19:05:21 lifeless: question about general OS fixes that affect tripleO as well.
19:05:27 dprince: shoot
19:05:36 lifeless: should we register those against tripleO in LP as well?
19:05:44 lifeless: and/or put them in the trello
19:06:13 lifeless: or just fix them in each project (Nova) and mention it on tripleO IRC
19:06:31 * dprince doesn't like extra busy work but wants to keep people in the loop
19:06:35 dprince: can you give a for-instance?
19:07:17 lifeless: for instance... if any sort of integration point breaks that we aren't gating on
19:07:17 the sqlalchemy-migrate versioning bug
19:07:24 ok
19:07:27 ^^ or that, sure
19:07:40 that broke nova and other API services for a few hours
19:07:51 so, IMO if we're going to make a code change (e.g. a version lock to work around it) then it's entirely appropriate to have a bug in TripleO
19:08:06 if it's something we're going to wait and see on / just fix in the source location immediately
19:08:18 then I think a trello card maybe, and/or a topic update to surface the info
19:08:38 dprince: how does that sound?
19:09:00 lifeless: okay, thanks
19:09:07 +1
19:09:20 ok
19:09:23 #topic reviews
19:09:31 http://russellbryant.net/openstack-stats/tripleo-openreviews.html
19:09:34 http://russellbryant.net/openstack-stats/tripleo-reviewers-30.txt
19:09:37 http://russellbryant.net/openstack-stats/tripleo-reviewers-90.txt
19:09:52 * SpamapS dons UTC watch
19:10:04 Median wait time: 0 days, 6 hours, 15 minutes
19:10:05 3rd quartile wait time: 1 days, 4 hours, 7 minutes
19:10:10 so we're a little behind, *but*
19:10:19 WIP?
19:10:26 https://review.openstack.org/#/c/52045/ is the reason, and I think it's meant to be marked work in progress
19:10:27 reviews seem to be totally under control... so much so that I frequently don't get to see things until after they land
19:10:29 but isn't
19:10:34 * matty_dubs actually wears UTC watch
19:10:57 marios: gerrit sadly requires you to click on 'work in progress' after each push
19:11:14 matty_dubs: but only 12h mode :P
19:11:49 lifeless: oh, is that what that was
19:12:13 sorry, fixing
19:12:30 I have not been doing my usual share of reviews the last week either. Just getting back into it today.
19:13:55 ok, so yeah, good stuff!
19:14:08 #topic Projects needing releases
19:14:14 we've landed a bunch of code
19:14:24 do we have a volunteer to do releases of them all?
19:14:34 pick me :)
19:14:39 rpodolyaka1: tag!
19:15:13 #action rpodolyaka1 to release all the things
19:15:25 #topic CD Cloud status
19:16:11 it's pretty unwell
19:16:24 I've not led by example and fixed it this week - sorry
19:17:09 I have a POC that will at least let us track the map of hardware<->instances
19:17:32 Should push that up for review later today.
19:17:52 I have a hypothesis that certain hardware is causing timeouts/problems while others allow it to finish properly.
19:18:25 SpamapS: do we have a mixed bag of hardware in the CD cloud then?
19:19:04 not really
19:19:08 we have two hardware configs
19:19:11 in theory
19:19:19 but we have some oddness
19:19:35 SpamapS: it's failing 100% of the time atm though
19:19:42 SpamapS: so I think diagnosing directly is needed
19:19:47 lifeless: network driver reload again?
19:20:06 SpamapS: I don't know :)
19:20:14 SpamapS: but I want a larger set of people responding and fixing
19:20:35 lifeless: agreed diagnosing directly is needed. My suggestion is that it is 100% failing because the good hardware is all taken. But I am willing to accept evidence of other problems. Right now we have very little data beyond "it fails"
19:20:49 SpamapS: we have 4 machines deployed and 50 in the rack.
19:20:53 SpamapS: your theory is wrong.
19:20:59 SpamapS: :)
19:21:02 unless we're using the same 4.
19:21:43 SpamapS: so, I don't want to rathole right now.
19:21:51 SpamapS: let's talk process and coordination instead.
19:21:57 nor do I. But I don't want to leave without a plan of action.
19:22:17 SpamapS: I think a volunteer poking directly and gathering data is a good start.
19:22:33 SpamapS: e.g. stop the service, run 'nova boot' and see if a machine comes up
19:23:18 SpamapS: I do want us to gather more automated data
19:23:28 SpamapS: but we also need to get into a production mindset and keep the thing itself running.
19:24:01 SpamapS: my concern is that right now we have 9 sysadmins, but the cloud has been down for 3+ days
19:24:30 the cloud is down for several minutes every hour
19:24:38 I don't think it matters to anyone yet that it goes down...
19:25:24 SpamapS: if it fails to deploy, that means we have a problem that may affect everyone doing devtest.
19:25:27 lifeless: I've been uninterested in part because it feels more important to preserve state first so we can have some kind of SLA/monitoring of "cloud is up" before we chase these problems.
19:25:39 lifeless: So I think the issue now is we (as a team) are chasing things on multiple fronts. Once CI and CD get closer we'll have more interest in CD cloud errors for sure.
19:25:58 lifeless: I only logged into the CD cloud for the first time this week...
19:26:09 lifeless: and then only briefly
19:26:14 dprince: mmmm, I can see that angle. The way I think of this is that this period is our learning period.
19:26:22 what happened to the status announcements in #tripleo?
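A minimal sketch of the manual probe lifeless suggests above ("stop the service, run 'nova boot' and see if a machine comes up"), assuming an undercloud reachable with standard novaclient credentials; the service unit, credentials file, image, flavor and key names below are placeholders, not the CD rack's actual configuration.

    # Stop whatever is consuming the cloud so the probe isn't racing it (placeholder unit name).
    sudo service jenkins stop
    # Load undercloud credentials (assumed location) and hand-run a single bare boot.
    source ~/stackrc
    nova boot --image overcloud-compute --flavor baremetal --key-name default probe-node
    # Watch whether the node ever reaches ACTIVE, or sits in BUILD / drops to ERROR.
    watch -n 30 nova list
    # On failure, the fault field and the host it was scheduled to are the first data points.
    nova show probe-node
    nova delete probe-node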
19:26:28 * dprince needs a crash course perhaps in our CD cloud setup
19:26:33 or is that part of what's down?
19:26:59 slagle: that would seem to be down too
19:26:59 slagle: it may be turned off.
19:27:19 slagle: which means either someone disabled the service (and didn't say so somewhere visible like the channel topic)
19:27:29 or the undercloud host has lost its firmware marbles again
19:27:32 :)
19:27:38 so that's part of the problem
19:27:44 agreed!
19:28:07 I have linked references to an HPCS ticket about these cards/firmware having issues
19:28:08 Let's just agree to make it a priority and do analysis on why it has been failing.
19:28:15 ok
19:28:25 SpamapS: can you continue the meeting? ELOCAL
19:28:50 lifeless: perhaps... does the bot let you transfer?
19:29:29 SpamapS: I don't think so... but we can unofficially follow your lead
19:29:48 Ok sure. :)
19:29:51 SpamapS: not 100% on that though
19:30:06 the documentation doesn't show any way to take it
19:30:38 anyway, I think we agree that analysis is needed and that we should probably hold it to a higher standard.
19:31:20 #topic CI virtualized testing progress
19:31:31 pleia2: is this still your area?
19:31:46 yeah
19:32:05 I owe a patch to create-nodes and would like to talk a little about boot-seed-vm
19:32:48 we could use boot-seed-vm in setting up the test environment, but we'd need to modularize it a bit more; right now there's an option for whether to actually boot the seed vm, and we could also make building the seed vm optional
19:33:16 so essentially it would just run configure-vm without building the image, or we could just run configure-vm outside of boot-seed-vm
19:33:42 thoughts?
19:33:46 pleia2: I like splitting things into smaller and smaller tools
19:34:46 anyone else have strong opinions?
19:35:17 hi, backish
19:35:30 ok, beyond that I'm working on how to tackle network addressing for all these test envs
19:35:33 we need to separate out all the libvirt manipulation
19:35:48 e.g. configure-vm should be run just once as part of setting up the test environment
19:36:08 so I'd like to see boot-seed-vm's behaviour change to assume the vm is defined
19:36:19 ok, great
19:36:20 and just build the image, copy it to $place, and start it.
19:36:30 * pleia2 nods
19:37:20 that's all from me; dprince and derekh are looking into the gearman stuff https://etherpad.openstack.org/p/3rYI32gvfu
19:37:24 * dprince wishes this meeting didn't run concurrently with the infra team meeting
19:38:00 pleia2: cool, good progress. The gearman stuff is coming along nicely too
19:39:33 On the gearman front I believe we are still hitting issues with the python gearman client. But if you see the etherpad above, the PHP client seems to work fine.
19:39:51 So... we might need to have a closer look at fixing that.
19:41:27 Is it worth having another CI-focussed google hangout this week? pleia2/derekh?
19:41:40 durrh, sorry, just saw the tuskar meeting invite email.
19:41:54 dprince: yeah, maybe tomorrow? I'm on the tail end of a flu so today still isn't great
19:42:21 pleia2: okay, sounds good
19:42:38 * dprince hopes pleia2 feels better
19:42:47 thanks :)
19:43:38 I think that's it for the CI stuff in this meeting
19:43:44 ok, properly back
19:43:48 ok, so let's just move on
19:44:04 #topic Open Discussion
19:44:06 dprince: I'm up for a CI hangout
19:44:16 oh, and now he's back ;)
19:44:22 so I didn't make the meeting last time, but wanted to give my +1 to the idea of the gate involving techs like puppet or chef, for projects where it's helpful
19:44:32 lifeless: cool, sounds like tomorrow is the best day for it
19:44:54 if we don't gate on it, the potential bugs won't go away, we'll just hit them more painfully
19:45:11 and in the end someone will have to fix it anyway
19:45:23 so we might as well fix it *before* stuff gets in
19:45:51 and if some dev feels blocked by tech he doesn't understand, he can ask for assistance
19:46:16 jistr: so, I agree, but I think it's a -infra discussion at this point
19:46:33 ok :)
19:49:02 #topic Open Discussion
19:49:11 I'll time out the meeting in a minute
19:49:49 lifeless: I have a question about my local dev env... which I've been chipping away at again
19:50:07 lifeless: sent you, derek and liz the email a few days back...
19:50:22 lifeless: got back to it today, and still hitting an issue
19:50:46 lifeless: which is, I can't ping my overcloud from outside my seed VM
19:50:56 lifeless: which sort of breaks the devtest story :(
19:51:00 the overcloud host
19:51:04 or an instance in the overcloud
19:51:13 lifeless: overcloud host (all-in-one)
19:51:31 lifeless: I can ping the undercloud host fine
19:51:34 log in via the console using stack:stack and check the networking is set up
19:51:40 sensibly
19:51:41 lifeless: did that
19:51:51 \/win 4
19:51:56 !
19:51:58 lifeless: didn't see anything too odd
19:52:07 dprince: in particular check there is a default route out via 192.0.2.1
19:52:17 dprince: because that being missing would give the symptoms you're describing
19:52:36 lifeless: I think it was, but I can check.
19:52:51 well, that could explain the 100% fail rate on the CD cloud...
19:54:22 haven't tested Dan's all-in-one template yet, but devtest is working right now
19:54:26 dprince: if the route is there, start a ping from outside to your overcloud host, and tcpdump each hop along the way
19:54:40 lifeless: So if it isn't set, should I use NeutronPublicInterfaceDefaultRoute in the template?
19:55:07 lifeless: did that. I saw incoming ICMP packets, but no responses on br-ex from the overcloud
19:55:32 only to override the default route set, AFAIK
19:55:42 lifeless: on a related note... it would be really cool if our standard image included tcpdump
19:55:54 dprince: it does
19:55:55 dprince: :)
19:56:02 dprince: maybe not on fedora?
19:56:03 lifeless: not on Fedora :(
19:56:14 dprince: I'm entirely happy for us to add it :)
19:56:40 lifeless: yep. Which element is it in again (DIB presumably)?
19:56:44 dprince: not sure if NeutronPublicInterfaceDefaultRoute would be appropriate
19:56:48 dprince: there is a problem with br-ex when hand-executing devtest (https://bugs.launchpad.net/tripleo/+bug/1252304), but it should only affect floating ips
19:56:57 not pings of the overcloud host
19:56:58 dprince: I'd add it to the fedora element TBH
19:57:12 rpodolyaka1: thanks, I did see your ticket earlier this week
19:57:25 lifeless: thanks
19:58:04 ok, we're out of time :)
19:58:12 thanks everyone!
19:58:14 #endmeeting
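A rough sketch of the connectivity checks walked through above (confirm the 192.0.2.1 default route, then tcpdump each hop), assuming the devtest defaults of a 192.0.2.0/24 baremetal network and br-ex on the overcloud host; the brbm bridge name and the 192.0.2.5 address are illustrative guesses, so substitute whatever your environment actually uses.

    # On the overcloud host (console login as stack:stack): is the default route sane?
    ip route show          # expect: default via 192.0.2.1 dev <public interface>
    ping -c 3 192.0.2.1    # can the host reach its gateway at all?
    # From outside (e.g. the host running the VMs), start a continuous ping toward
    # the overcloud host while watching each hop for the echo request and its reply:
    ping 192.0.2.5
    sudo tcpdump -ni brbm icmp     # on the VM host: does the request make it onto the bridge?
    sudo tcpdump -ni br-ex icmp    # on the overcloud host: request arriving? reply leaving?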