14:00:03 <shardy> #startmeeting tripleo
14:00:09 <ccamacho> o/
14:00:10 <openstack> Meeting started Tue Jun  7 14:00:03 2016 UTC and is due to finish in 60 minutes.  The chair is shardy. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:11 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:13 <openstack> The meeting name has been set to 'tripleo'
14:00:20 <shardy> #topic rollcall
14:00:22 <EmilienM> o/
14:00:27 <shardy> Hey all, who's around?
14:00:31 <marios> \o
14:00:36 <dprince> hey
14:00:41 <slagle> hi
14:00:46 <skramaja> hi
14:00:48 <rohitpagedar__> Hi
14:00:52 <derekh> o/
14:00:53 <jdob> o/
14:01:06 <jokke_> o/
14:01:21 <pradk> o/
14:01:24 <shadower> hey
14:01:24 <trown> o/
14:01:50 <shardy> #link https://wiki.openstack.org/wiki/Meetings/TripleO
14:02:01 <shardy> #topic agenda
14:02:02 <shardy> * one off agenda items
14:02:02 <shardy> * bugs
14:02:02 <shardy> * Projects releases or stable backports
14:02:02 <shardy> * CI
14:02:04 <shardy> * Specs
14:02:06 <shardy> * open discussion
14:02:15 <shardy> Anyone have anything to add today?  We don't have any one-off items in the wiki
14:02:48 <rohitpagedar__> o/
14:02:52 <jrist> o/
14:03:10 <rbrady> o/
14:03:11 <adarazs> o/
14:03:13 <beagles> o/
14:03:21 <d0ugal> o/
14:03:38 <shardy> Ok then, lets get started
14:03:43 <shardy> #topic bugs
14:04:10 <jistr> o/
14:04:13 <saneax> o/
14:04:13 <shardy> So, I flipped all released bugs to Fix Released for n-1:
14:04:16 <shardy> https://launchpad.net/tripleo/+milestone/newton-1
14:04:34 <shardy> if folks can target bugs to n-2 from now on so we can track them that would be good
14:05:28 <shardy> #link https://bugs.launchpad.net/tripleo/+bugs?orderby=-id&start=0
14:05:39 <shardy> Anyone have anything to highlight re bugs this week?
14:05:55 <shardy> I know we have various CI issues, but anything else we need to prioritize re bugs?
14:06:26 <shardy> I raised https://bugs.launchpad.net/tripleo/+bug/1589983 earlier, would be good if someone feels like picking that up
14:06:26 <openstack> Launchpad bug 1589983 in tripleo "tripleoclient leaks temporary files" [Medium,Triaged]
14:07:25 <weshay> \o
14:07:27 <shardy> beagles: how's your bug cleanup been going?
14:07:35 <myoung> o/
14:07:49 <shardy> I've been attempting to clean up old bugs too, a bunch of things are due to expire in the next 2 weeks, which is good
14:08:01 <shardy> and I've set all old Fix Committed things to Fix Released
14:08:10 <beagles> pretty well.. I didn't do much last week but got up to 1 year old bugs :)
14:08:23 <shardy> beagles: ack, well thanks for your help :)
14:08:42 <dprince> ls
14:08:45 <beagles> np.. I was just noticing on our current bug list that at least one is an RFE...
14:09:09 <beagles> what was the story with those... are bugs fine for that kind of thing
14:09:11 <shardy> beagles: that's OK, but they should be tagged spec-lite and marked as wishlist priority
14:09:18 <beagles> cool
14:09:48 <shardy> http://docs.openstack.org/developer/glance/contributing/blueprints.html#glance-spec-lite
14:10:03 <beagles> thanks
14:10:05 <shardy> we agreed a while back that folks could use a process similar to glance if they wish
14:10:18 <shardy> blueprints with/without specs as appropriate are also fine
14:10:35 <shardy> the main thing is we target things to the milestones so we have an idea on progress
14:10:45 <shardy> Ok anything else re bugs or shall we move on?
14:11:08 <shardy> #topic Projects releases or stable backports
14:11:27 <shardy> So, thanks very much to EmilienM and anyone else who helped get n-1 shipped last week :)
14:11:41 <trown> EmilienM++
14:11:41 <EmilienM> cool, I missed puppet release :P
14:11:45 <EmilienM> but no worries
14:12:12 <shardy> One thing I've seen discussed in the context of RDO is tagging releases for stable branches
14:12:32 <shardy> does anyone have any thoughts on how best to handle that, now that stable-maint don't tag periodic releases?
14:12:42 <shardy> trown: any suggestions on preferences there?
14:13:07 <shardy> the issue I've heard is some folks want to consume RDO stable/mitaka which is the latest tagged release, not the head of the stable branch
14:13:13 <slagle> we've done releases from stable branches before
14:13:13 <shardy> so they end up missing backported things
14:13:22 <trown> shardy: I wonder if we could just do stable milestone releases when we do the master milestone releases
14:13:25 <slagle> is it not just requesting a new release via openstack/releases?
14:13:36 <jokke_> shardy: In glance we have been trying to follow the idea that we tag whenever we merge something and it makes sense
14:13:52 <trown> shardy: assuming there is something new to release
14:13:56 <shardy> slagle: yup, I'm just wondering if we should have a plan re the period or other criteria which triggers semi-regular releases
14:14:06 <jokke_> as in we tag if something critical gets backported and otherwise we tag in bundles of few patches merged
14:14:33 <slagle> yea, i was more or less doing it on demand previously
14:15:01 <jokke_> shardy: that was discussed quite a bit when stable decided on no periodic releases, and basically the consensus was that anything worth backporting should be worth releasing
14:15:01 <shardy> jokke_: ack, yeah we could do that, but TripleO has quite a lot of repos, so it's somewhat easier to just do periodic releases of everything vs monitoring actively everywhere
14:15:32 <jokke_> shardy: isn't that what stable liaison is for ;)
14:16:36 <shardy> jokke_: Yeah, I think by default that's the PTL but if anyone else wants to step up to help I'm more than happy :)
14:16:57 <shardy> EmilienM has offered to help as a release liaison, so if anyone wants to share the load of proposing stable releases, that'd be good
14:17:00 <shardy> https://wiki.openstack.org/wiki/CrossProjectLiaisons
14:17:00 <EmilienM> yep
14:17:11 * coolsvap_mobile interested
14:17:18 <jokke_> shardy: let me get used to the codebase first, maybe next cycle ;)
14:17:21 <EmilienM> I'm happy to help with that, I'm already familiar with release management
14:17:28 <coolsvap_mobile> But I will need some help initially
14:17:58 <shardy> Ok, lets follow up in #tripleo - IMO it'd be best if we can spread the load around vs expecting one person to do everything
14:18:06 <shardy> given the number of repos/branches involved
14:18:25 <jokke_> I'm happy to help with any stable/release general questions so feel free to shoot so we don't overload the few carrying the responsibilities
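For context, proposing a stable release goes through the openstack/releases repo (as slagle notes above): a small patch to the relevant deliverable file naming the commit to tag. A minimal sketch, with purely illustrative version and hash values rather than real Mitaka ones:

```yaml
# deliverables/mitaka/tripleo-heat-templates.yaml (illustrative values only)
launchpad: tripleo
releases:
  - version: 2.0.1
    projects:
      - repo: openstack/tripleo-heat-templates
        hash: 1234567890abcdef1234567890abcdef12345678  # stable/mitaka commit to tag
```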
14:19:02 <shardy> Ok, anything else on release/backports before we move on to CI?
14:19:49 <shardy> #topic CI
14:19:57 <EmilienM> I noticed stable/mitaka nonha job is broken https://review.openstack.org/#/c/324527/
14:20:05 <EmilienM> I haven't spent time to figure why yet
14:20:21 <EmilienM> but imho we should high-prio this work too
14:20:28 <shardy> EmilienM: yeah, was just about to mention that - I started looking at it yesterday but not yet figured out the problem
14:20:33 <shardy> EmilienM: can you raise a bug please?
14:20:46 <EmilienM> shardy: I will
14:20:46 <shardy> AFAICT pingtest is failing because nova can't connect to keystone
14:21:09 <EmilienM> #action EmilienM to investigate & file a bug to why stable/mitaka nonha job is failing
14:21:29 <shardy> So, more generally, CI has been broken by the move to centos7 slaves
14:21:45 <shardy> pabelanger: Do you have any update on the revert of that?
14:22:01 <EmilienM> I ran a recheck on a patch to see if CI is now working
14:22:11 <EmilienM> but not sure if something else needs to be updated
14:22:19 <pabelanger> no, in fact, I don't believe the issue is related to centos-7 DIBs. But more the exposure of actual failures running on centos-7.
14:22:27 <pabelanger> 2 issues:
14:22:41 <pabelanger> 1) failure to connect to gearman servers. I don't have the ability to look into this
14:22:48 <fungi> the details in https://review.openstack.org/326182 were meager. is the suspected issue that your jobs currently don't work on centos 7 and you need time to port forward from centos 6?
14:23:05 <pabelanger> 2) openstack overcloud deploy takes 2+ hours (again no logs generated for this)
14:23:16 <dprince> pabelanger: I saw a few spurious gearman failures, but most of the issues were related to stack timeouts I think
14:23:26 <dprince> agree there may be multiple issues though
14:23:54 <pabelanger> To move forward, I think we need tripleo members to review the failures and see what is going on (and why no logs are produced)
14:24:49 <pabelanger> http://logs.openstack.org/11/326311/4/check-tripleo/gate-tripleo-ci-f22-nonha/208cda5/console.html
14:24:50 <trown> there was no way to test this in advance?
14:24:51 <dprince> using CentOS 7 would be ideal for TripleO. And pabelanger mentions that infra wants to soon deprecate the way TripleO builds up its jenkins image (they'd rather just use DIB directly)
14:25:01 <derekh> pabelanger: I wasn't in yesterday so haven't looked, but on 2) usually we get no logs if a job times out
14:25:04 <pabelanger> gearman failures are also happening on fedora-22
14:25:05 <dprince> so we do want to support the effort to switch sooner rather than later
14:25:21 <derekh> pabelanger: as the last part of the job is to copy logs off the nodes, this doesn't get executed
14:25:40 <derekh> pabelanger: what were the gearman failures?
14:25:50 <dprince> trown: I think we can test more in advance, yes
14:25:56 <EmilienM> derekh: why isn't it a jenkins publisher?
14:25:57 <pabelanger> derekh: okay, it should be easy to reproduce now in tripleo-experimental. I think we need logs to see what is going on with deploy scripts
14:26:43 <pabelanger> trown: I did test in advance
14:26:45 <derekh> EmilienM: the publisher copies them to the log server, but the end of the tripleo jobs copies them onto the jenkins slave
14:27:03 <shardy> gear.NoConnectedServersError: No connected Gearman servers
14:27:11 <EmilienM> yeah pabelanger did experimental-jobs testing, it worked fine afaict
14:27:14 <shardy> that doesn't sound like a deploy failure to me
14:27:30 <trown> pabelanger: how did it work then and not now?
14:27:38 <pabelanger> shardy: jobs are timing out after 2.5 hours
14:27:42 * bnemec recalls that the patch merged with 2 of the 3 experimental voting jobs failing
14:27:43 <fungi> we work around that in devstack-gate (i know you never got around to reimplementing under d-g) by having an inner timeout to kill the test payload with enough time to still reshuffle logs into the location the publisher expects
14:27:44 <trown> just trying to understand how we ended up with CI totally down
14:27:47 <bnemec> but I could be wrong.
14:27:51 <shardy> it looks like it's just getting stuck and not really running the job?
14:28:05 <derekh> shardy: seems like a problem with our infrastructure, especially as it was happening on fedora slaves too
14:28:33 <pabelanger> trown: because I believe the issue is not centos-7 but the infra we are deploying to. EG: gearman servers and other external factors.
14:28:49 <derekh> fungi: yup, we should do something similar
14:29:01 <shardy> Ok, lets work together after the meeting and try to figure it out
14:29:24 <shardy> pabelanger: ack - I think it was the timing which made us want to revert the centos7 change, e.g it merged and everything broke around the same time
14:29:31 <pabelanger> What dprince mentioned is correct.  If we can all iterate on this today, we can likely fix centos-7 with a short amount of work
14:29:34 <derekh> pabelanger: shardy was the gear.NoConnectedServersError: error happening on fedora slaves also?
14:29:44 <pabelanger> derekh: it is now
14:29:52 <pabelanger> derekh: see the log I just linked
14:30:09 <derekh> pabelanger: ack, will take a look after this meeting
14:30:14 <dprince> pabelanger: yep, thanks for your work on this. fast tracking reverts is healthy too though :)
14:30:51 <shardy> Ok, shall we move on and continue discussion after the meeting?
14:30:59 <pabelanger> Sure, we are close. Just want to get us over the line
14:31:12 <shardy> #topic Specs
14:32:09 <shardy> So, in terms of features, we've got about 5 weeks until n-2
14:32:16 <shardy> #link https://launchpad.net/tripleo/+milestone/newton-2
14:32:36 <shardy> we've got 10 blueprints targeted, and we landed zero for n-1
14:32:52 <dprince> shardy: I would like to land remote execution in this timeline. I think it is useful by itself... and also will be helpful to bootstrap Ansible validations
14:33:14 <shardy> I'm wondering if we need to break some of them down, or if we're confident e.g. we can get composable services, custom roles and the Mistral API landed in the next few weeks
14:33:50 <shardy> I've had some complaints that some (large) blueprints slipped into n-2 so we probably need to show incremental progress
14:33:58 <dprince> shardy: regarding the composable side of things. I'd like to see us focus on composable services first
14:34:14 <dprince> shardy: the composable roles bits can come later I think
14:34:48 <shardy> dprince: Yeah, although I think it's quite related - e.g we need to sync steps on all roles, or you can't move an e.g "step 4" service from say controller to compute
14:34:51 <EmilienM> dprince++
14:35:11 <dprince> shardy: for the Mistral workflow stuff I would very much like to see us cut over and start using Mistral for workflows via python-tripleoclient ASAP
14:35:14 <shardy> there are a few ways to do that, but one way is to template in all the steps for the roles via jinja, which would be an initial step towards composable roles
14:35:41 <dprince> shardy: no need to take baby steps and maintain backwards compat via "python library" code I think
14:36:01 <dprince> shardy: in fact having two code paths for python-tripleoclient to choose between is actually worse I think.
14:36:11 <shardy> dprince: agreed, I'm just wondering if we should have a couple of mini-sprints where we just push and get folks to concentrate on landing one feature for a few days
14:36:39 <dprince> shardy: well, I think these are different groups of people. So they can co-exist
14:37:14 <shardy> dprince: Yeah, I'm just trying to avoid fragmentation of effort meaning $many of the 10 blueprints slipping into n-3
14:37:15 <dprince> shardy: composable services is coming along slowly but surely I think
14:37:35 <EmilienM> slowly mainly because of CI issues
14:37:45 <EmilienM> a lot of work is already done or WIP
14:37:58 <dprince> yes, CI is slowing down composable work a bit
14:38:05 <shardy> Yeah, composable services is definitely progressing
14:38:30 <dprince> For Mistral I think we've got a nice spec. But we have just recently worked out some of the finer interface arrangements around the zaqar queue, etc.
14:38:37 <dprince> so there is still some design work ongoing there
14:39:01 <shardy> jistr: https://launchpad.net/tripleo/+milestone/newton-2 has three high priority BPs with no assignee
14:39:14 <shardy> can you get folks to assign themselves and set the delivery status?
14:39:33 <dprince> furthermore, I would like to see us focus on "parity" with the initial workflows... we are actually getting lots of good feedback from guys like dtansur for the baremetal.yaml workflows for example
14:39:36 <beagles> fwiw, I was able to spin up an l3 agent and metadata agent (required for DVR) with the existing merged code and a simple environment file .. but we need that step sync thing to keep that working
14:39:37 <shardy> I'll start deferring anything not obviously started fairly soon to avoid last-minute slips
14:39:49 * beagles blurts sometimes ;)
14:40:13 <shardy> beagles: you mean sync the steps between roles?
14:40:23 <dprince> feedback for future improvements I think. Rather than implement all of the improvements for workflows, I would suggest capturing the new ideas in bugs and moving on
14:40:25 <jistr> shardy: will try to figure it out. Personally i'm presently tasked 100% downstream, will try to push to at least carve out time for reviews.
14:40:29 <marios> shardy: i thought it was already assigned to me so grabbing https://blueprints.launchpad.net/tripleo/+spec/overcloud-upgrades-workflow-mitaka-to-newton since am working on it
14:40:37 <marios> shardy: is for n2
14:41:04 <shardy> jistr: Ok, that's fine, anything not assigned will just get deferred to n-3 so it's clear it's at-risk
14:41:13 <beagles> shardy: I think so .. basically that the l3 and metadata agent roles do their thing at the proper step...
14:41:17 <dprince> beagles: once we decompose all the roles I think we can rework how -post.yaml's work to better accommodate syncing the steps
14:41:36 <beagles> shardy: or maybe I misunderstood what you meant
14:41:45 <dprince> beagles: we probably need to decompose it all first though so we can move things around more capably in t-h-t
14:41:51 <beagles> dprince: ack
14:42:13 <dprince> beagles: part of the vision I think... we just have to finish composable services first to get there
14:42:18 <beagles> I was just pretty jazzed to try it and see it kind of work this early in the game
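A rough sketch of the kind of environment file beagles describes, wiring extra composable neutron agent services into a deployment; the resource names and template paths here are assumptions for illustration, not the exact merged interface:

```yaml
# Illustrative only: register additional composable service templates
resource_registry:
  OS::TripleO::Services::NeutronL3Agent: puppet/services/neutron-l3.yaml
  OS::TripleO::Services::NeutronMetadataAgent: puppet/services/neutron-metadata.yaml
```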
14:42:20 <shardy> dprince: Yeah, I think we actually have to move away from -post per-role completely, because you can never sync resources between different nested stacks
14:42:30 <dprince> shardy: exactly, this was the idea
14:42:35 <jokke_> I have cycles, I just will need quite a bit guidance
14:42:57 <shardy> dprince: I'm trying an approach which might work: jinja-templating all the steps, for each role, in overcloud.yaml
14:42:58 <jokke_> if it's storage related I have really easy to justify time for it
14:43:00 <dprince> shardy: I think this will be a top level overcloud.yaml architecture thing once we finish
14:43:07 <shardy> that would work, and fits well with the custom roles stuff I think
14:43:16 <shardy> dprince: +1
14:43:27 <shardy> good, sounds like we've reached the same conclusion :)
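Roughly the shape of the jinja approach being discussed: overcloud.yaml loops over roles and steps so every role gets the same step resources, and each step depends on the previous step of every role, keeping the steps in sync across roles. The names and resource types below are a sketch under those assumptions, not the final templates:

```yaml
# Sketch: generate per-role deployment steps in overcloud.yaml via jinja (illustrative)
{% for role in roles %}
{% for step in range(1, 6) %}
  {{role.name}}Deployment_Step{{step}}:
    type: OS::Heat::StructuredDeploymentGroup
{% if step > 1 %}
    depends_on:
      # step N on any role waits for step N-1 on *all* roles
{% for dep in roles %}
      - {{dep.name}}Deployment_Step{{step - 1}}
{% endfor %}
{% endif %}
    properties:
      servers: {get_param: [servers, {{role.name}}]}
      config: {get_resource: {{role.name}}Config_Step{{step}}}  # per-role, per-step config
{% endfor %}
{% endfor %}
```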
14:43:50 <shardy> Ok, we've kinda derailed from specs somewhat
14:43:52 <dprince> shardy: I'm less convinced the core team agrees on the Mistral workflow conclusions BTW
14:43:59 <shardy> anything else to raise re specs/features?
14:44:08 <dprince> shardy: seems to be some ambiguity about which code we are moving over to tripleo-common
14:44:27 * dprince wants the "library" concept for tripleo-common to be kept at a minimum
14:44:37 <shardy> dprince: that was why I mentioned a mini-sprint, I'm not sure there's lack of agreement, I just think many folks haven't even looked at it
14:45:01 <shardy> dprince: rbrady was talking about a tripleo-workflows repo to contain all the mistral pieces
14:45:09 <d0ugal> I think there is ambiguity about tripleo-common in general :)
14:45:14 <dprince> shardy: yep, I'm very much a fan of the separate repo too
14:45:16 <shardy> and I agree we should stop (ab)using tripleo-common to put all the things in
14:45:17 <trown> indeed
14:45:27 <trown> wrt ambiguity of tripleo-common
14:45:37 <shardy> I always thought it was weird we decided to put API stuff in there
14:45:38 <dprince> shardy: that said, I'd rather see us cut over and actually use a Mistral workflow for something sooner rather than later
14:45:52 <dprince> shardy: all this talk... and we still aren't actually using it yet
14:46:11 <shardy> dprince: agreed, Ok lets try to get some eyes on it then
14:46:37 <shardy> #topic open discussion
14:46:51 <shardy> I suppose we've already started open discussion, but anything else folks want to raise?
14:47:08 <trown> just something semi-related to the above
14:47:21 <trown> do we intend to present image building through a mistral workflow?
14:47:27 <slagle> just wanted to mention i pushed some patches for multinode ci jobs via nodepool
14:47:33 <trown> or is that a case where we use tripleo-common as a library
14:47:37 <slagle> i'm trying to move forward with that and see how it goes
14:47:38 <d0ugal> trown: I don't think that is a candidate for the first pass
14:47:41 <EmilienM> slagle: w00t, links?
14:47:44 <dprince> trown: meh, I mean... that is a weird one I think
14:47:57 <shardy> slagle: nice!
14:47:59 <trown> d0ugal: cool, I think it shouldn't be a mistral workflow fwiw
14:48:03 <dprince> trown: if the UI requires it... then maybe. But I'd rather see the UI just require it
14:48:04 <slagle> EmilienM: https://review.openstack.org/#/q/topic:tripleo-multinode
14:48:08 <shardy> trown: IMO it's not something we should consider initially
14:48:13 <d0ugal> trown: Yeah, I wasn't sure but I've not really thought about it yet.
14:48:14 <dprince> trown: rather the UI just require images exist
14:48:19 <shardy> lets just get the basic deployment workflow working first
14:48:23 <d0ugal> dprince: +1
14:48:25 <dprince> trown: building images via Mistral would be an abuse
14:48:42 <trown> dprince: ya that was my thought as well, just wanted to check, because tripleo-common is pretty ambiguous :)
14:49:17 <EmilienM> slagle: just an FYI, https://review.openstack.org/#/c/326095/ needs to be rebased
14:49:30 <jokke_> just a comment from fresh eyes trying to wrap my head around tripleO ... the current number of repos is already quite a handful to get hold of
14:49:31 <dprince> shardy: nice on the multinode work
14:49:48 <jokke_> Just saying, please don't introduce too many new ones just because we can
14:49:57 <trown> context: there are a couple folks looking at updating tripleoclient to use the yaml based image build library in tripleo-common, and I didn't want to steer them wrong
14:50:09 <dprince> jokke_: ack, I think the goal is to eventually drop some... like perhaps tripleo-image-elements
14:50:21 <slagle> EmilienM: yes there's actually other issues with it since nodepool is installed from pypi in the jenkins jobs, so my needed depends-on isn't pulled in
14:50:27 <EmilienM> slagle: nice work anyway, thanks
14:50:39 <dprince> trown: I'm okay for the image building YAML's to live in tripleo-common for now
14:50:41 <slagle> EmilienM: so i'll have to rework it a bit to get around that initially
14:50:53 <dprince> trown: perhaps not where we'll have it long term but it is okay
14:51:12 <EmilienM> pabelanger: just an FYI to put this in your radar: https://review.openstack.org/#/q/topic:tripleo-multinode
14:51:34 <shardy> talking of reducing repos, it'd be good to figure out what needs to be moved so we can retire tripleo-incubator
14:52:08 <shardy> The outdated README in that comes up above tripleo-docs in a google search :(
14:52:11 <trown> shardy: is anything still using it?
14:53:01 <shardy> trown: I thought a couple of the scripts were still used in CI but I may be wrong
14:53:11 <dprince> shardy: did we ever land your patch to switch to t-h-t driving os-net-config?
14:53:14 <shardy> perhaps I'll push a patch deleting everything and see what happens ;)
14:53:16 <dprince> shardy: via software deployments?
14:53:24 <shardy> dprince: No, I need to rebase it
14:53:30 <shardy> landing that would be good though
14:53:35 <shardy> I'll rebase it today
14:53:40 <dprince> shardy: I gave you some -1's for nits but I very much like that idea
14:53:52 <bnemec> +1
14:53:54 <pabelanger> EmilienM: thanks, just left a comment on 326095
14:53:57 <shardy> dprince: ack, thanks, I'll address those and hopefully we can get it passing CI
14:54:42 <bnemec> shardy: Note that there was just a patch posted related to the os-net-config script: https://review.openstack.org/#/c/326511
14:54:52 <bnemec> Looks like a valid bug, so we should make sure we don't lose the fix.
14:55:04 <EmilienM> slagle: please let me know if you need help with this work & bindep stuffs
14:55:17 <shardy> bnemec: thanks, I'll rebase on top of that
14:55:25 <bnemec> Cool
14:55:58 <bnemec> Oh, we're still tracking ci-admins in incubator too.
14:56:27 <shardy> bnemec: Yeah I thought there was some stuff like that
14:56:41 <shardy> it'd be good to move it to tripleo-ci then retire -incubator IMO
14:56:48 <slagle> pabelanger: thanks, i'll have a look at the bindep/jjb stuff
14:56:49 <bnemec> +1000
14:57:39 <shardy> Ok, anything else or shall we declare EOM?
14:58:04 <dprince> shardy: thanks for hosting
14:58:04 <shardy> Thanks all!
14:58:13 <shardy> #endmeeting