19:01:03 <lifeless> #startmeeting tripleo
19:01:05 <openstack> Meeting started Tue Apr 29 19:01:03 2014 UTC and is due to finish in 60 minutes.  The chair is lifeless. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:06 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:09 <openstack> The meeting name has been set to 'tripleo'
19:01:13 <derekh> yo
19:01:15 <SpamapS> some projects took this week off for meetings.
19:01:23 <greghaynes> O/
19:01:32 <marios> SpamapS: yeah, neutron did
19:01:39 <jdob> i think heat did too
19:01:48 <GheRivero> o/
19:01:53 * marios looks at slavedriver lifeless
19:02:07 <marios> :)
19:02:17 <lsmola2> hello
19:02:22 <lifeless> slackers
19:02:48 <lifeless> #topic agenda
19:02:49 <lifeless> bugs
19:02:49 <lifeless> reviews
19:02:49 <lifeless> Projects needing releases
19:02:50 <lifeless> CD Cloud status
19:02:51 <lifeless> CI
19:02:54 <lifeless> Insert one-off agenda items here
19:02:57 <lifeless> open discussion
19:02:59 <lifeless> #topic bugs
19:03:07 <lifeless> #link https://bugs.launchpad.net/tripleo/
19:03:07 <lifeless> #link https://bugs.launchpad.net/diskimage-builder/
19:03:07 <lifeless> #link https://bugs.launchpad.net/os-refresh-config
19:03:07 <lifeless> #link https://bugs.launchpad.net/os-apply-config
19:03:07 <lifeless> #link https://bugs.launchpad.net/os-collect-config
19:03:09 <lifeless> #link https://bugs.launchpad.net/tuskar
19:03:12 <lifeless> #link https://bugs.launchpad.net/python-tuskarclient
19:03:44 <lifeless> rpodolyaka1: https://review.openstack.org/#/c/88597/ needs a rebase
19:03:55 <rpodolyaka1> lifeless: ack
19:04:08 <lifeless> I think we're fully triaged. yay bots
19:04:16 <lifeless> now, what about criticals
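A quick way to pull the same critical-bug list outside the meeting is a launchpadlib query; the following is a rough sketch assuming anonymous access is enough, with 'tripleo' as the example project (the same loop works against the other trackers linked above):

    from launchpadlib.launchpad import Launchpad

    # Anonymous, read-only login against production Launchpad.
    lp = Launchpad.login_anonymously('tripleo-meeting-notes', 'production')

    project = lp.projects['tripleo']  # swap in diskimage-builder, tuskar, etc.
    # Open bug tasks currently marked Critical.
    for task in project.searchTasks(importance='Critical',
                                    status=['New', 'Triaged', 'In Progress']):
        print(task.bug.id, task.status, task.bug.title)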
19:04:18 <marios> been looking at https://bugs.launchpad.net/tripleo/+bug/1290486 after comparing notes with tchaypo today, seems not so easy to reproduce for f20 envs. updated ticket. will investigate some more tomorrow
19:04:55 <lifeless> slagle: https://bugs.launchpad.net/os-collect-config - two criticals there I thought the release last week would have closed ?
19:04:58 <tchaypo> Thanks marios
19:05:22 <lifeless> slagle: or was that the project SpamapS ninjad? and if so... SpamapS did you skip part of the process?
19:06:11 * lifeless pauses for SpamapS / slagle  :)
19:06:58 <SpamapS> lifeless: sorry, sorry, what? sorry, I'm having internet issues
19:07:14 <lifeless> 07:04 < lifeless> slagle: https://bugs.launchpad.net/os-collect-config - two criticals there I thought
19:07:17 <lifeless> the release last week would have closed ?
19:07:20 <lifeless> 07:04 < tchaypo> Thanks marios
19:07:22 <lifeless> 07:05 < lifeless> slagle: or was that the project SpamapS ninjad? and if so... SpamapS did you skip part
19:07:25 <lifeless> of the process?
19:07:58 <lifeless> SpamapS: which project did you release last week ?
19:08:24 <slagle> i don't think i released occ
19:08:46 <lifeless> then SpamapS - I'll leave it with you to resolve whether those bugs are meant to be closed
19:08:50 <lifeless> and keep the meeting moving
19:08:52 <SpamapS> lifeless: ninja release did not close the bugs. I closed them now.
19:09:02 <SpamapS> ahhh.. internet storm passed
19:09:12 <SpamapS> lifeless: os-collect-config is the one I released
19:09:20 <lifeless> https://bugs.launchpad.net/diskimage-builder/ has one critical, I've pushed a fix (I believe).
19:09:40 <lifeless> and https://bugs.launchpad.net/tripleo/ has 8 criticals
19:09:43 <slagle> so my fixes for https://bugs.launchpad.net/tripleo/+bug/1270646 have merged
19:09:49 <slagle> do you want to close that one?
19:10:02 <slagle> i don't know what else to do for it at this point
19:10:07 <lifeless> slagle: that's the workaround of advertising the MTU?
19:10:12 <slagle> yes
19:10:36 <lifeless> slagle: we should close the tripleo task then. Much as I have a huge philosophical issue with the approach, it's not our call
19:10:43 <slagle> there were fixes to tie and tht that have merged
19:10:46 <slagle> lifeless: ok
19:10:54 <lifeless> slagle: as in, it's an upstream ovs root-cause thing
19:11:04 <lifeless> and ranting and railing over here won't help
19:11:44 <lifeless> I believe 1272803 was biting SpamapS yesterday
19:12:14 <lifeless> I think we can close https://bugs.launchpad.net/tripleo/+bug/1280941
19:12:16 <derekh> lifeless: I dropped the ball on 1272803 , picking it back up now
19:12:17 <SpamapS> lifeless: yeah, derekh confirmed that and said he had a plan
19:12:27 <SpamapS> oh there you are :)
19:12:30 <lifeless> derekh: cool
19:12:49 <lifeless> I see slagle has patches open for 1287453
19:13:07 <lifeless> and there is a fun discussion about keystone v3 on the list related to it
19:13:16 <lifeless> I hopefully roped morganfainberg_Z into that
19:13:22 <lifeless> have not had time to check this morning
19:13:28 <slagle> yea, well, i marked my patches WIP
19:13:35 <slagle> since os-cloud-config already has the code as well
19:13:51 <slagle> makes sense to just get that ready and switch to that
19:13:56 <lifeless> yup
19:14:31 <lifeless> 1293782 - I don't believe stevenk is working on that right now, and we have a workaround in place. shall we unassign and drop to high ?
19:14:49 <lifeless> since the defect isn't 'cloud broken', it's 'cloud slow'
19:15:33 <lifeless> rpodolyaka1 has a fix for 1304424 that I noted needs a rebase above
19:15:47 <slagle> fine by me on 1293782
19:15:48 <lifeless> derekh: what about 1308407 is it still a thing ?
19:15:48 <slagle> will update
19:16:15 <SpamapS> wait
19:16:19 <SpamapS> os-cloud-config has the heat domain code?
19:16:28 <SpamapS> if so I have some patches that _I_ need to WIP :)
19:16:30 <SpamapS> or even abandon
19:16:48 <rpodolyaka1> lifeless: done
19:16:49 <slagle> now you're making me second guess :)
19:17:01 <derekh> lifeless: yup, 1308407 is still a thing, still waiting on reviews
19:17:05 <lifeless> SpamapS:
19:17:19 <lifeless> SpamapS: have a look in it, and its review queue :)
19:17:33 <lifeless> https://bugs.launchpad.net/tripleo/+bug/1306596 has an abandoned patch
19:17:46 <lifeless> Ng: do you know cial's IRC handle ?
19:17:48 <SpamapS> lifeless: will do!
19:18:01 <lifeless> Ng: I mean Cian
19:18:19 <slagle> SpamapS: yea, it's in http://git.openstack.org/cgit/openstack/os-cloud-config/tree/os_cloud_config/keystone.py
19:19:13 <lifeless> ok no comments re https://bugs.launchpad.net/tripleo/+bug/1280941 so closing it
19:19:23 <Ng> lifeless: hmm, no
19:19:30 <SpamapS> slagle: cool! I missed that.
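The heat domain setup slagle points at in os-cloud-config amounts to creating a dedicated keystone v3 domain plus a domain admin for heat to use. A rough sketch of that general idea with python-keystoneclient follows; the token, endpoint, and the 'heat'/'heat_domain_admin' names here are illustrative placeholders, and the exact flow in os-cloud-config's keystone.py may differ:

    from keystoneclient.v3 import client

    ADMIN_TOKEN = 'ADMIN'                      # placeholder bootstrap token
    KEYSTONE_V3 = 'http://192.0.2.1:35357/v3'  # placeholder keystone endpoint

    ks = client.Client(token=ADMIN_TOKEN, endpoint=KEYSTONE_V3)

    # Dedicated domain to own the users heat creates for stacks.
    domain = ks.domains.create(name='heat',
                               description='Owns users created by heat')
    # Domain admin that heat authenticates as when managing those users.
    user = ks.users.create(name='heat_domain_admin', domain=domain,
                           password='CHANGEME')  # placeholder password
    ks.roles.grant(ks.roles.find(name='admin'), user=user, domain=domain)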
19:20:04 <lifeless> Ng: can I ask you to ping them about that review, since it's a critical bug...
19:20:16 <lifeless> Ng: and while it's abandoned no one else can tweak it
19:20:40 <lifeless> rpodolyaka1: probably want to toggle in-progress to triaged for that bug too
19:21:10 <lifeless> ok any other bugs business?
19:21:23 <lifeless> fav bug? left-by-the-wayside bug?
19:21:26 <rpodolyaka1> lifeless: done
19:22:23 <Ng> lifeless: k
19:22:44 <lifeless> #topic reviews
19:22:52 <lifeless> Ng: they just need to un-abandon it
19:23:00 <lifeless> http://russellbryant.net/openstack-stats/tripleo-openreviews.html
19:23:02 <Ng> lifeless: yup, composing a mail now
19:23:05 <lifeless> http://russellbryant.net/openstack-stats/tripleo-reviewers-30.txt
19:23:27 <bnemec> Stats are having issues since the Gerrit upgrade.
19:23:30 <derekh> lifeless: I believe ^^ isn't getting updated since the gerrit upgrade
19:23:35 <lifeless> ugh
19:23:43 <lifeless> still, one day old will be reasonably indicative
19:23:49 <lifeless> ah, nothing.
19:23:53 <lifeless> *real* issues.
19:24:02 <bnemec> Yeah, openreviews is completely dead.
19:24:05 <lifeless> rustlebee: do you need a hand w/that ?
19:24:17 <bnemec> lifeless: He has a new baby. :-)
19:24:23 <SpamapS> http://www.stackalytics.com/report/contribution/tripleo-group/30
19:24:42 <SpamapS> http://www.stackalytics.com/report/reviews/tripleo-group/open
19:24:59 <SpamapS> lifeless: ^^
19:25:09 <marios> weird. the stackalytics one seems less updated for me
19:26:04 <lifeless> the stackalytics stuff is subtly different, no? completely different codebase
19:26:33 <bnemec> Yeah
19:27:24 <marios> lifeless: no idea. just pointing out the difference (i.e. if russell's stats are bad after the gerrit upgrade, stackalytics isn't faring better)
19:27:50 <SpamapS> lifeless: they've tended to be close enough that the variation is not statistically significant in my experience
19:27:53 <lifeless> marios: was responding to SpamapS - and yes, agree with you
19:27:58 <lifeless> anyhow
19:27:59 <SpamapS> lifeless: both show the trends and activity
19:28:06 <lifeless> clearly we're not in good shape
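While the russellbryant.net pages are stuck post-upgrade, raw review data can still be pulled straight from Gerrit's REST API; a minimal sketch, with tripleo-incubator as an example project and an arbitrary cap of 100 results:

    import json
    import requests

    GERRIT = 'https://review.openstack.org'
    resp = requests.get(GERRIT + '/changes/',
                        params={'q': 'project:openstack/tripleo-incubator status:open',
                                'n': 100})
    # Gerrit prefixes JSON responses with ")]}'" to defeat XSSI; strip it first.
    for change in json.loads(resp.text[4:]):
        print(change['_number'], change['subject'])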
19:28:39 <SpamapS> I've been focusing on making CI more healthy rather than doing reviews
19:28:46 <SpamapS> with the idea that healthy CI enables reviews :)
19:29:10 <lifeless> SpamapS: if you do three reviews a day, thats the commitment benchmark, and then its all gravy from there :)
19:29:18 <SpamapS> Well yeah I'm still hitting that :)
19:29:30 <lifeless> ok so
19:29:40 <lifeless> I see a bunch of passthrough candidates
19:29:53 <lifeless> but indeed we've had a lot of CI issues - different section to the meeting
19:30:00 <lifeless> any proposals around reviews?
19:30:02 <SpamapS> we have 11 cores that are not though
19:30:11 <bnemec> Ooh, more passthroughs for me to -2? :-)
19:30:14 <lifeless> and any omg I need X reviewed  requests ?
19:30:28 <lifeless> bnemec: more that they haven't gone from the system yet
19:30:33 <lifeless> bnemec: so we still see them in stats
19:30:35 <marios> lifeless: from a quick look, seems we still haven't -2'd all the things wrt 'config foo'
19:30:49 <bnemec> Oh, I mostly look at the stats since last -1 or -2.
19:31:19 <greghaynes> hrm, I tried to do a quick scan for those a while ago... maybe more have been added since
19:31:24 <lifeless> marios: yeah - or perhaps they are things we should be modelling which is some of them
19:31:34 <lifeless> bnemec: yeah, stackalytics isn't showing that
19:31:43 <lifeless> ok, moving on in a sec
19:32:01 <lifeless> #topic
19:32:02 <lifeless> Projects needing releases
19:32:07 <lifeless> #topic Projects needing releases
19:32:23 <lifeless> Do we have a selfless volunteer?
19:32:27 <SpamapS> IIRC there was some problem with os-apply-config 0.1.15 not reaching pypi
19:32:44 <SpamapS> Not sure what that problem was tho
19:32:48 <SpamapS> 0.1.14 is still the max version
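For the os-apply-config 0.1.15 question, the quickest sanity check is asking PyPI directly what it considers the latest release — a small sketch using the JSON API (the package name is the only real input here):

    import requests

    # Ask PyPI which version it currently serves for os-apply-config.
    info = requests.get('https://pypi.org/pypi/os-apply-config/json').json()['info']
    print(info['name'], info['version'])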
19:32:54 <rpodolyaka1> lifeless: I haven't done releases for a while :)
19:33:05 <lifeless> ok, can the volunteer get that fixed too ?
19:33:08 <lifeless> rpodolyaka1: \o/
19:33:12 <rpodolyaka1> ack
19:33:18 <lifeless> #action rpodolyaka1 volunteered to release the world
19:33:28 <lifeless> #topic
19:33:33 <lifeless> #topic CD Cloud status
19:33:47 <SpamapS> heh
19:34:09 <lifeless> AFAICT the underclouds and CI clouds in both regions are happy ?
19:34:17 <SpamapS> so I'm attacking the list of bad/missing/w'ever machines in the HP region with JIRA tickets for our DC ops.
19:34:19 <lifeless> with the exception of mellanox being mellanox
19:34:27 <lifeless> SpamapS: \o/
19:34:47 <SpamapS> My intention is to resurrect tripleo-cd, which needs a couple more machines available. I think we have 3 working, and we'll need 5 if we ever get to HA :)
19:35:13 <derekh> lifeless: things seem to be plodding along ci wise http://goodsquishy.com/downloads/s_tripleo-jobs.html
19:35:21 <lifeless> #topic CI
19:35:27 <derekh> try not to be dazzled by the pretty colors
19:35:27 <lifeless> derekh: you were saying :)
19:35:47 <SpamapS> Anyway, if we can get tripleo-cd up, we can then use the images from that to update ci-overcloud (as in, R1) to trusty and that should eliminate the mellanox fails.
19:35:52 <derekh> R1 overcloud jobs are running 30 minutes slower than R2
19:36:05 <derekh> but I don't think its the spec of the machines
19:36:15 <lifeless> derekh: linked that to the meetings page
19:36:15 <dprince> derekh: that is 30 minutes right?
19:36:41 <SpamapS> we are down two compute nodes in R1 right now
19:36:46 <derekh> quite a lot of the R1 jobs spend 20 minutes waiting on the testenv, so we have a but there somewhere I need to track down
19:36:52 <lifeless> SpamapS: you know you can just build the images directly right - ci-overcloud is deployed by devtest_overcloud.sh
19:36:57 <derekh> but -> bug
19:37:15 <SpamapS> lifeless: I do. But.. actual.. tested images.. would be amazing. :)
19:37:20 <lifeless> SpamapS: :)
19:37:24 <derekh> dprince: yup
19:37:33 <lifeless> derekh: could it be we have more slaves than testenvs ?
19:37:36 <SpamapS> I'll give up on it in a couple of days and just jam new images onto ci-overcloud if I can't get a working tripleo-cd
19:37:45 <lifeless> ok
19:38:03 <derekh> lifeless: I thought that was a possibility but it seems to be consistently 20 minutes
19:38:10 <lifeless> is R1 HP ?
19:38:22 <derekh> lifeless: yup R1 == HP
19:38:50 <derekh> lifeless: so I tried to rebuild 3 TE hosts today to confirm and they went to error state
19:38:52 <lifeless> yesterday I saw lots of spurious failures on seed node bringup
19:39:24 <lifeless> I'd love to migrate to Ironic in that rack
19:39:35 <lifeless> its so close to being 'there' for that
19:40:29 <derekh> lifeless: anyways in summary, I have 2 issues to look into that are currently causing a time difference between the 2 racks
19:41:22 <lifeless> 20m on the testenv
19:41:25 <lifeless> whats the second one ?
19:41:30 <lifeless> second issue I mean
19:41:47 <derekh> heat stack-create overcloud
19:41:56 <derekh> takes 10 minutes longer (at least) on R1
19:41:58 <SpamapS> lifeless: do we have patches available for preserve ephemeral?
19:42:10 <derekh> I'm guessing the ovs bridge needs tweaking
19:42:13 <lifeless> SpamapS: shrews has one up for the rebuild side of it
19:42:19 <derekh> lifeless:  ^
19:42:20 <tchaypo> R1 is HP/ubuntu, R2 is redhat?
19:42:28 <derekh> tchaypo: yup
19:42:32 <lifeless> SpamapS: I'm not sure the driver maps the ephemeral size across yet either. need to double check
19:42:42 <lifeless> derekh: *interesting*
19:43:07 <SpamapS> sounds like we need to do some measurement
19:43:32 <SpamapS> derekh: do the RH machines have battery backed write cache?
19:43:46 <derekh> SpamapS: no idea
19:43:53 <lifeless> SpamapS: we kvm in unsafe-cache mode
19:44:04 <SpamapS> lifeless: but we don't build images in vms
19:44:12 <SpamapS> lifeless: and we do eventually commit it all to disk
19:44:12 <lifeless> SpamapS: yeah we do
19:44:23 <SpamapS> oh I thought the images happened in the testenvs
19:44:30 <lifeless> SpamapS: though we don't build VMs in *those* Vms
19:44:32 <SpamapS> ok then _harumph_
19:44:36 <lifeless> SpamapS: we build images in jenkins slaves.
19:44:41 <lifeless> SpamapS: in the ci-overcloud
19:44:41 <SpamapS> right
19:44:43 <derekh> kvm unsafe should be on both regions
19:44:49 <SpamapS> understanding clicking back into place
19:45:06 <SpamapS> yeah so network probs seem more likely than anything else.
19:45:08 <lifeless> for the testenvs - dprince was looking at doing a nova feature to let us control that from the API
19:45:21 <derekh> this isn't building images, the time difference is from nova boot to bm-deploy-helper /POST
19:45:35 <lifeless> yeah
19:45:50 <SpamapS> might we also have divergent undercloud configurations?
19:46:08 <derekh> SpamapS: I'd say wildly different
19:46:13 <lifeless> SpamapS: not relevant in this context AFAICT though...
19:46:30 <lifeless> SpamapS: more relevant might be that the host OS for the cloud is f20 vs ubuntu
19:46:44 <derekh> anyway I'll get more info once I carve myself away a host to work with
19:46:58 <SpamapS> nova boot to bm-deploy-helper is all handled on the local network on the testenvs, right?
19:47:02 <lifeless> yup
19:47:20 <lifeless> I think the testenvs in region2 are also f20 based
19:47:21 <SpamapS> we should, if nothing else, give trusty testenvs a try and measure that.
19:47:30 <derekh> lifeless: yes they are
19:47:49 * lifeless is a little sad at the variation there - just having more variables
19:47:52 <SpamapS> Entirely possible something magical happened to local networking in 3.13 :-P
19:47:58 <derekh> SpamapS: I'm all for that, if it's still a problem we can dig further
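SpamapS's "do some measurement" can start very crudely: time the same boot from the API side in each region and compare. A throwaway sketch (the image and flavor names are placeholders, and it assumes region credentials are already sourced in the environment):

    import subprocess
    import time

    start = time.time()
    subprocess.check_call([
        'nova', 'boot', 'timing-probe',
        '--image', 'IMAGE_NAME',    # placeholder: any image both regions can boot
        '--flavor', 'FLAVOR_NAME',  # placeholder flavor
        '--poll',                   # block until the server is ACTIVE or errors
    ])
    print('nova boot took %.1f seconds' % (time.time() - start))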
19:48:05 <lifeless> for the clouds we have an entirely good reason to want variation - it's product!
19:48:22 <lifeless> ok, so
19:48:31 <lifeless> #topic open discussion
19:49:06 <lifeless> dprince: SpamapS: you guys were having an animated discussion in #tripleo
19:49:40 <SpamapS> Yes, let me see if I can summarize
19:49:43 <dprince> lifeless: yes, I'm sending a mail to the list about that.
19:49:51 <SpamapS> * We have infrastructure for our development efforts
19:50:04 <SpamapS> * Currently it is all over the place, monkey-patched and hand-deployed (at least, in R1)
19:50:19 <SpamapS> * We should collaborate on this infrastructure.
19:50:53 <SpamapS> * How we do that, and how it relates to openstack-infra, are all open questions.
19:52:28 <SpamapS> dprince: ^ agree?
19:53:01 <dprince> SpamapS: yes, specifically I was thinking about mirrors myself this weekend and how to go about setting them up in both racks.
19:53:43 <SpamapS> Right, so my thinking is, use diskimage-builder elements, os-*-config, and heat.
19:53:44 <dprince> SpamapS: and I was a bit surprised that you guys already have an Ubuntu mirror in the HP rack. We'll need one in the Red Hat rack as well... so how to do that?
19:54:04 <lifeless> dprince: the apt-mirror element
19:54:24 <SpamapS> lifeless: debian-mirror
19:54:25 <dprince> SpamapS: My initial thought was we are providing our CI overcloud as a resource to openstack-infra (i.e. they already run slaves there...)
19:54:28 <lifeless> dprince: + a tiny apache fragment, I *think* I pushed a WIP with it in it
19:54:29 <SpamapS> which is probably.. the wrong name
19:54:33 <lifeless> SpamapS: uh, yeah.
19:54:33 <dprince> SpamapS: So why not mirrors too?
19:55:17 <dprince> But the larger question to me was who is responsible for these things, thinking ahead to the fact that we want to be a gate...
19:55:45 <SpamapS> Well openstack-infra will already be maintaining mirrors for the supported OS's to support devstack.
19:55:53 <lifeless> SpamapS: they aren't
19:55:56 <SpamapS> they aren't?
19:55:59 <lifeless> SpamapS: they aren't
19:56:02 <SpamapS> So does it fall on cloud provider?
19:56:04 <derekh> I'm starting to think there are so many things we *could* mirror that we should instead start with a caching proxy in each rack, wait a few days, and then see what's still hurting us
19:56:04 <SpamapS> (that's good IMO)
19:56:17 <dprince> I can't answer this question myself. I have my opinion (which would probably be to use TripleO tools for it BTW). But I'm also trying to be mindful of it.
19:56:22 <lifeless> SpamapS: today yes, since they can't use glance even, anyhow. Thats changing though.
19:56:28 <SpamapS> lifeless: ah
19:56:30 <dprince> mindful of infra rather
19:56:48 <lifeless> FWIW I've had a number of inconclusive discussions about this (nearly all in -infra channel)
19:56:51 <dprince> derekh: exactly.
19:57:01 <dprince> derekh: my take was lets run a public mirror https://review.openstack.org/#/c/90875/
19:57:08 <dprince> derekh: and then squid it up in our Racks
19:57:20 <SpamapS> ok
19:57:30 <SpamapS> So a thread is likely going to be more productive. :)
19:57:32 * dprince has a squid element he will push today
19:57:43 <lifeless> the issues as dprince says are about gating requirements, responsibility, availability etc
19:57:44 <dprince> SpamapS: my thought as well
19:58:03 <SpamapS> dprince: wow, I'm actually shocked there's no squid element already. :)
19:58:39 <dprince> SpamapS: I think we need/want both. Mirrors are good for stability.
19:58:50 <dprince> SpamapS: squid is good at the local caching bits
19:58:50 <SpamapS> yes
20:00:05 <SpamapS> Ok, endmeeting?
20:00:16 <lifeless> #endmeeting