14:00:15 <shardy> #startmeeting tripleo
14:00:16 <openstack> Meeting started Tue Aug  2 14:00:15 2016 UTC and is due to finish in 60 minutes.  The chair is shardy. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:17 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:20 <openstack> The meeting name has been set to 'tripleo'
14:00:21 <shardy> #topic rollcall
14:00:25 <dprince> hi
14:00:26 <EmilienM> o/
14:00:28 <shadower> hey
14:00:28 <beagles> o/
14:00:29 <skramaja> hello
14:00:30 <shardy> Hi all, who's around?
14:00:35 <marios> o/ here for the first half, have a call later
14:00:47 <d0ugal> -o/
14:00:47 <ccamacho> Hello guys! o/
14:00:48 <jtomasek> o/
14:00:50 <saneax> o/
14:01:13 <pabelanger> morning
14:01:16 <weshay> o/
14:01:19 <thrash> o/
14:01:21 <jpich> o/
14:01:24 <adarazs> o/
14:01:26 <bandini> o/
14:01:31 <sshnaidm> o/
14:01:34 <radek_> o/
14:02:02 <shardy> Ok then, let's get started :)
14:02:03 <bswartz> ¬_¬
14:02:07 <shardy> #topic agenda
14:02:09 <panda> \o
14:02:16 <shardy> #link https://etherpad.openstack.org/p/tripleo-meeting-items
14:02:28 <shardy> * one-off agenda items
14:02:28 <shardy> * bugs
14:02:28 <shardy> * Projects releases or stable backports
14:02:28 <shardy> * CI
14:02:28 <shardy> * Specs
14:02:30 <shardy> * open discussion
14:02:38 <shardy> Anyone have anything to add to the one-off items?
14:03:15 <shardy> Ok, we now have two (thanks bandini!), feel free to add more and we can start
14:03:22 <bandini> :)
14:03:26 <shardy> #info  deprecating ServiceNetMap keys?
14:03:40 <sshnaidm> shardy, added about promotion script as well
14:03:56 <shardy> So, I've been experimenting with ways to generate things like the $service_ips list, ref https://review.openstack.org/#/c/348974/
14:04:01 <shardy> sshnaidm: ack, thanks
14:04:26 <shardy> And one of the barriers is that the keys for ServiceNetMap do not align with the service_name in the service templates
14:04:59 <shardy> I've got a scary yaql hack in there atm which semi-works around it, but ideally I'd like to deprecate the current keys, e.g. RabbitMqNetwork
14:05:04 <shardy> and instead have rabbitmq_network
14:05:14 <shardy> where the key must align with service_name
14:05:22 <shardy> What are folks' thoughts on that?
14:05:38 <Jokke_> o/
14:05:44 <shardy> clearly, we'll need to maintain a translation (either in mistral or t-h-t) to support the old keys for a transitional period
14:05:50 <shardy> since this is a very user-visible interface
14:06:24 <shardy> I would prefer to do the translation in mistral to simplify t-h-t, but there are upgrade implications we'll need to address there
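(A minimal sketch of the key alignment being discussed, assuming the deprecation lands roughly as described; the alias-map name below is illustrative, not taken from the review:)

    # today's user-facing key (CamelCase, not aligned with service_name)
    ServiceNetMap:
      RabbitMqNetwork: internal_api
    # proposed key, aligned with the template's service_name ("rabbitmq")
    ServiceNetMap:
      rabbitmq_network: internal_api
    # transitional translation of old keys to new ones, kept in mistral or t-h-t
    service_net_map_deprecated_mapping:
      RabbitMqNetwork: rabbitmq_network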
14:06:50 <dprince> shardy: I'm okay with the approach. When we named the ServiceNetMap keys we didn't realize these would need to be standardized
14:07:36 <shardy> dprince: Ok, cool - yeah clearly it was unforeseen, I'm just trying to figure out the best path forward
14:07:47 <hrybacki> o/
14:08:24 <shardy> Ok, well unless folks have strong objections, I'll implement a mistral patch and we can continue discussion on the review
14:08:31 <dprince> shardy: sounds good to me
14:08:35 <d0ugal> +1
14:08:43 <shardy> it will depend on getting the mistral based deployments working, which I've been debugging today
14:09:01 <marios> shardy: thanks for highlighting that review, i haven't seen it before so it's hard to comment immediately, but i'd like to look at it tomorrow. i'm not clear where service_ips is coming from now, but i haven't looked at it long
14:09:12 <shardy> #info (bandini) Will just give a short update on HA NG
14:09:19 <bandini> shardy: from a high-level POV do we plan to implement the upgrades via a mistral workflow (starting from M->N)?
14:09:48 <marios> bandini: you mean overcloud upgrade?
14:09:53 <shardy> marios: it's a way to auto-generate this mapping https://github.com/openstack/tripleo-heat-templates/blob/master/overcloud.yaml#L911
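(For illustration, the kind of yaql lookup that aligned keys would enable: a simplified, hypothetical heat snippet using the yaql intrinsic function, not the actual content of https://review.openstack.org/#/c/348974/:)

    # hypothetical: resolve the network for one service from ServiceNetMap
    rabbitmq_network:
      yaql:
        expression: $.data.net_map.get($.data.service + '_network', 'ctlplane')
        data:
          net_map: {get_param: ServiceNetMap}
          service: rabbitmq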
14:09:55 <bandini> marios: yes
14:10:22 <marios> bandini: not that i've heard of, at least not unless i missed it. the major expected change is the rework for composable upgrades, but that is newton to ocata
14:10:26 <dprince> bandini: I would like to see us consider this as an option and make it composable
14:10:32 <bandini> or are we keeping the current multiple manual steps with specialized heat templates around?
14:10:50 <shardy> I think we're likely to run out of time for composable upgrades this cycle
14:10:50 <bandini> dprince, marios: ack, was just curious if anything was decided there already. thanks
14:10:57 <shardy> my assumption is we'll do it during mitaka
14:11:06 <shardy> sorry ocata :)
14:11:07 <marios> bandini: right, my understanding is: mitaka to newton uses the current workflow. newton is where all the things are composable. so newton to ocata is when we have composable upgrades
14:11:09 <bandini> ocata ;)
14:11:15 <shardy> -ENOCOFFEE
14:11:15 <marios> shardy: heh...
14:11:20 <bandini> ack got it thanks
14:11:40 <shardy> bandini: so, NG HA is pretty much ready based on your ML thread?
14:11:47 <shardy> we just need reviews and testing?
14:11:55 <marios> shardy: i think that gives us the summit as a good place to discuss that.. to finally have that discussion i mean, about the composable upgrades and how/what/ansible/mistral etc
14:11:59 <marios> but i digress
14:12:04 <bandini> so I sent a short HA NG update to the ML the other night. Basically after aodh goes in we should be good to go (patch is extremely tiny at this point)
14:12:18 <shardy> marios: Yeah, it'll give us some more time to do prototyping and consider the best approach
14:12:35 <bandini> we had some help with testing from the QE HA folks so in terms of failovers things look fairly solid so far
14:12:46 <shardy> bandini: Ok, sounds good, thanks for the update!
14:13:02 <shardy> what state is the aodh patch in?
14:13:03 <bandini> so yeah I'd love to merge it sooner rather than later so we have time to work out any kinks we have not seen
14:13:27 <bandini> the aodh stuff is looking better, we still need to fix one last issue
14:13:34 <bandini> pradk and I are working on that
14:13:40 <bandini> that's all from my side ;)
14:13:52 <shardy> bandini: Ok, thanks, well ping us when it's ready and passing CI, then we can hopefully get it landed
14:14:02 <bandini> shardy: yes, sir! thanks
14:14:15 <shardy> #info does promotion script work? (sshnaidm)
14:14:27 <shardy> so, slagle or dprince maybe you can help with this
14:14:41 <shardy> it seems the periodic jobs passed yesterday, but no promote happened for current-tripleo
14:14:53 <shardy> can we see the cron job logs to understand why?
14:15:00 <slagle> yes, i'll check
14:15:05 <slagle> this happened last week too
14:15:19 <slagle> turned out the mirror server had run out of space, so derekh had to clean some things out
14:15:32 <slagle> we may have forgotten to re-enable the cron job after that
14:15:33 <shardy> Ok, hopefully it's another similar issue
14:15:38 <shardy> lol :)
14:16:00 <shardy> Ok, thanks, well we can follow up after the meeting on that
14:16:01 <sshnaidm> slagle, where does it run?
14:16:12 <shardy> sshnaidm: thanks for highlighting the issue
14:16:16 <slagle> sshnaidm: the mirror server in rh2
14:16:33 <bnemec> Looks like it's commented out in crontab
14:16:42 <slagle> if you do a nova list, you will see a server called mirror-server
14:16:52 <slagle> bnemec: ok, so yea, derekh probably forgot to remove that :)
14:17:03 <shardy> k, sounds like an easy fix :)
14:17:24 <shardy> Any other one-off items before we move on?
14:18:21 <shardy> #topic bugs
14:18:27 <shardy> #link https://bugs.launchpad.net/tripleo/+milestone/newton-3
14:18:40 <shardy> So, we've got under a month until n-3
14:19:00 <shardy> so I wanted to ask, can everyone please prioritize reviews to help burn down the 40+ open bugs we've got targeted
14:19:05 <shardy> as well as the features, obviously
14:19:49 <shardy> I was going to ask gfidente about the CI-impacting ceph bug, but it looks like he's not here
14:19:58 <shardy> anyone else have an update on that?
14:20:46 <slagle> shardy: perhaps we could mark some additional bp's Essential, if they truly are
14:20:49 <slagle> to help prioritize
14:21:45 <shardy> slagle: Yeah, perhaps - tbh I think we really need to deliver all of the high ones
14:22:20 <shardy> marios: we discussed breaking down the upgrades one into bugs, vs that one essential blueprint
14:22:23 <shardy> and tagging the bugs
14:22:26 <shardy> did that happen?
14:23:52 <shardy> Ok, well lets follow up on that later then I guess
14:24:08 <marios> shardy: sry too many windows
14:24:45 <marios> shardy: i don't recall the tagging bit though for the main n3 blueprint. there have been some delays
14:25:04 <shardy> marios: that's the only BP we have marked essential, which means we can't release without it
14:25:06 <marios> with people pulled for mitaka work
14:25:23 <marios> ok i will revisit discuss with you offline
14:25:25 <shardy> so we have to figure out the status of that, and either break it into bugs, or deliver it
14:25:30 <shardy> marios: ack, thanks!
14:25:33 <shardy> #topic Projects releases or stable backports
14:25:44 <shardy> So, we already discussed n3
14:26:17 <shardy> It was noticed that we've not done any stable branch releases for a while, so I was going to do that this week
14:26:25 <shardy> based on a recent periodic job pass
14:26:45 <shardy> does anyone have any suggestions re stable release cadence, should we be looking to release more often?
14:27:04 <shardy> I think that may help folks consuming TripleO, e.g. via RDO
14:27:19 * beagles nods
14:27:39 <beagles> as to what is a reasonable cadence... err
14:27:57 * beagles concedes not at all helpful
14:27:59 <shardy> I'll write a script then we can do it more regularly with minimal overhead
14:28:11 <shardy> Ok anything re releases or backports to raise?
14:28:25 <EmilienM> shardy: in puppet modules, we try to do stable releases every month or so
14:28:29 <Jokke_> shardy: I personally like the general idea that every patch worth backporting should be worth releasing. So stable releases should be done quite fast after we backport fixes there (obviously if there are multiple things in flight it makes sense not to release for each of them)
14:29:15 <shardy> Jokke_: That would be reasonable if we automate it, but if it's human driven probably a periodic release will be easier
14:29:27 <shardy> EmilienM: ack, maybe we can start doing the same
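(One possible shape for a regular cadence, hedged: if TripleO stable releases were requested through the openstack/releases repo the way the puppet modules do it, each release would just be a small deliverable entry like the hypothetical one below; version, hash, and file layout are illustrative only:)

    # deliverables/mitaka/tripleo-heat-templates.yaml (hypothetical entry)
    launchpad: tripleo
    releases:
      - version: 2.0.1
        projects:
          - repo: openstack/tripleo-heat-templates
            hash: 0000000000000000000000000000000000000000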
14:29:41 <shardy> #topic CI
14:29:58 <shardy> We talked about the promote problems, does anyone have an update re RH1?
14:29:59 <Jokke_> shardy: once you get the hang of the process, requesting a release is really not that big of a job to do ;)
14:30:04 <Jokke_> even manually
14:30:20 <shardy> I know derekh is on PTO, can anyone else give a status update?
14:30:42 <shardy> Jokke_: Ya, it's still not something I want to do every day by hand ;)
14:30:51 <bnemec> Jobs are running on rh1
14:31:03 <beagles> oh, I thought that was Jokke_ volunteering
14:31:07 <bnemec> With the Ceph issue I don't imagine any have passed though.
14:31:14 <pabelanger> yes
14:31:18 <pabelanger> tripleo-test-cloud-rh1 is online
14:31:26 <pabelanger> with 2 max-servers at the moment
14:32:03 <shardy> So what issues remain before we can increase the capacity?
14:32:10 <pabelanger> So, if jobs are passing under tripleo-test-cloud-rh1, I'll increase the number of servers and start to shut off tripleo-test-cloud-rh2
14:32:21 <pabelanger> I just need somebody to say move the jobs
14:32:48 <shardy> sounds like we need to fix the ceph issue and get some green jobs before moving anything
14:32:51 <bnemec> Can we leave rh2 online?  Based on some discussion yesterday it sounds like we may be keeping that a little longer.
14:33:16 <slagle> also wanted to mention it looks like my fix for https://bugs.launchpad.net/tripleo/+bug/1608226 did not work in rh2
14:33:16 <openstack> Launchpad bug 1608226 in tripleo "Ci: new ntp-wait command in devstack-gate hangs for 100 minutes which breaks jobs" [Critical,New] - Assigned to Sagi (Sergey) Shnaidman (sshnaidm)
14:33:20 <gfidente> shardy the ceph issue should be sorted, we had the master branch passing in a previous patchset but were hitting a different issue on the liberty/mitaka branches
14:33:34 <pabelanger> looking at the stuff derekh added, we upped the limit to 50 servers in tripleo-test-cloud-rh1
14:33:36 <slagle> so all CI jobs on rh2 are stalling for 10 minutes waiting for ntp
14:33:38 <gfidente> so I've updated one of the submissions and we're now waiting for the new results
14:33:46 <shardy> slagle: ouch :(
14:34:07 <shardy> gfidente: ack, thanks for the update
14:34:14 <pabelanger> right, tripleo-test-cloud-rh1 doesn't have the ntp-wait issue
14:34:29 <pabelanger> appears we don't have something upstream filtering firewalls
14:35:55 <shardy> Ok, so more investigation required then
14:35:58 <shardy> thanks for the update all
14:36:20 <shardy> Anything else re CI - weshay got anything to mention re the 3rd party jobs?
14:37:26 <weshay> shardy, I sent an update.. we have upgrade 3rd party and basic jobs enabled, we've been watching those to make sure they are running consistently.
14:37:53 <weshay> I believe apetrich is hitting issues w/ the liberty -> mitaka upgrade and has been pushing issues along
14:37:53 <shardy> weshay: Ok, what's the status of the upgrade job, did we get it passing?
14:38:24 <shardy> Ok, can we run mitaka->newton upgrades on upstream patches via an experimental job?
14:38:35 <weshay> liberty -> mitaka fails; apetrich, do you have a bug? mitaka -> mitaka is working, mitaka -> master is failing on the overcloud
14:38:36 <apetrich> shardy, nope.
14:38:55 <shardy> apetrich: I mean, can we run it, let it fail and see why
14:39:01 <weshay> adarazs, has not yet started on a RHEL based third party job afaik
14:39:15 <apetrich> shardy, https://bugs.launchpad.net/tripleo/+bug/1608867
14:39:15 <openstack> Launchpad bug 1608867 in tripleo "Upgrade liberty to mitaka fails" [Undecided,New]
14:39:57 <apetrich> shardy, bandini was helping me to see if we can find anything.
14:39:59 <shardy> Ok, the reason for my question is we just said mitaka->master upgrades was our #1 priority for this cycle
14:40:01 <adarazs> weshay: no, I haven't done anything on the RHEL job yet, we need to solve the log collection.
14:40:06 <shardy> so we need a way to test it asap :)
14:40:27 <EmilienM> just a note about CI, I'm working on a CI job to test undercloud upgrade from mitaka -> newton
14:40:35 <EmilienM> you can follow the work here: https://review.openstack.org/346995
14:40:45 <shardy> EmilienM: ack, that will be useful as a first step, thanks
14:40:52 <bnemec> On that note: https://review.openstack.org/349737
14:41:02 <bnemec> We need to decide _how_ we're going to handle undercloud upgrades.
14:41:12 <EmilienM> bnemec: oh wow
14:41:29 <EmilienM> bnemec: I'll review it today
14:41:34 <shardy> #topic specs
14:41:42 * bnemec got started early
14:41:44 <shardy> since we're now talking about specs :)
14:41:55 <shardy> I wanted to point out, we've only landed one spec for newton
14:42:00 <shardy> and it's nearly the end of the cycle
14:42:18 <shardy> how can we make the spec process work better for us and encourage more reviews?
14:42:30 <shardy> Or, should we instead adopt e.g spec-lite bugs from ocata?
14:43:05 <bnemec> Is the problem the spec process or the fact that we had some huge pieces of work this cycle that are taking basically the whole six months and are eating up a ton of core reviewer time?
14:43:08 <shardy> I'd like to see us iterate much faster and not get bogged down for months in detailed review when a spec is supposed to set direction, not be a detailed design document
14:44:01 <shardy> bnemec: the entire team has been overloaded I think, I'm not complaining, just recognising the current process isn't working
14:44:33 <shardy> is it realistic to expect more core reviewer time for specs next cycle, or is this a recurring problem?
14:45:36 <bnemec> I would think next cycle will be somewhat less of a scramble, with composable stuff and the API implemented.
14:45:43 <bnemec> Of course, I say this every cycle. :-)
14:45:46 <shardy> lol
14:46:00 * shardy will believe it when it happens :D
14:46:05 <shardy> Ok, well lets see how it goes
14:46:12 <bnemec> At least this cycle everyone was swamped with new things, not backporting to old releases.
14:46:27 <dprince> bnemec: oh we've got other ideas that'll keep us busy I think
14:46:32 <slagle> i'm not sure i fully grasp the spec-lite process
14:46:34 <shardy> for now, please everybody review specs so we can land those we expect to deliver before the release branches :)
14:46:39 <slagle> it seems it's just specs with less detail
14:46:44 <slagle> which i'd be fine with
14:46:56 <slagle> just put fewer implementation details into specs... just define a direction
14:47:04 <shardy> slagle: basically you just raise a bug, with a description of the change, then tag it spec-lite
14:47:22 <bnemec> And basically we don't review it?
14:47:31 <bnemec> Bugs are not a good medium for design collaboration.
14:47:43 <shardy> slagle: Yeah, I think it's basically a spec without the chance for 4 months of review iterations, but with a means to comment
14:47:50 <slagle> shardy: ok, i thought glance was who had defined this. and you still have to propose a "lite" spec to glance-specs
14:48:17 <shardy> #link http://specs.openstack.org/openstack/glance-specs/specs/lite-specs.html
14:48:29 <shardy> slagle: Ah, that's true, I evidently misinterpreted it somewhat
14:48:46 <bnemec> Personally I'd rather use full specs and just call people out if they're stuck in the implementation detail weeds.
14:49:01 <shardy> http://docs.openstack.org/developer/glance/contributing/blueprints.html
14:49:19 <shardy> bnemec: Ok, I'm fine with sticking with them, but we must iterate faster
14:49:42 <shardy> e.g. it's pointless if we're still in the nitpick review phase when the implementation is landing or landed
14:50:02 <paramite> ok, regarding the conversation here, it seems that there is no chance the availability monitoring spec will get some attention, right?
14:50:17 <Jokke_> so we moved away from spec bugs in glance
14:50:22 <slagle> paramite: i've given it attention
14:50:28 <Jokke_> might be lack of doc update
14:50:40 <shardy> paramite: well, we're trying to figure out how to give it (and others) attention
14:50:44 <paramite> slagle, ah nice, thanks!
14:51:08 <shardy> Jokke_: Ok, what was the reason, they didn't work out?
14:51:38 <shardy> I personally would prefer all features to be blueprints, but not if we always stall at the blueprint review phase
14:51:50 <shardy> makes it easier from a tracking perspective
14:52:09 <bnemec> To a large extent I think this is a case of "software design is hard".
14:52:20 <bnemec> And knowing how detailed to make your design is also hard.
14:52:23 <slagle> yea, i definitely like seeing a bp for all features
14:52:35 <slagle> but for some reason you can't really review bp's
14:52:38 <slagle> no way to comment, etc
14:52:48 <bnemec> That's why the spec process came about.
14:52:52 <shardy> bnemec: Yeah, but I guess the expectation I'm trying to set is that we don't need to do detailed design in specs
14:53:01 <shardy> that's what code review iterations are for
14:53:08 <slagle> yes, so let's just have less detail in specs :)
14:53:20 <shardy> Ok then, lets do that :)
14:53:55 <bnemec> Maybe we could do a spec retrospective?
14:53:57 <dprince> the composable services spec was a good example: a too-large blueprint that never landed because it never made everyone happy with the details
14:54:18 <bnemec> Look back at the specs we worked on this cycle and consider whether the scope was too large, too small, or just right.
14:54:44 <shardy> bnemec: that's a good idea, and during next cycle we could have spec review sprints (e.g of a couple of hours) periodically?
14:54:49 <bnemec> It would be good if we had some examples of specs that were written at the appropriate level to point people at as an example.
14:55:07 <Jokke_> shardy: we effectively wanted all the new features documented in a single place
14:55:10 <shardy> #topic open discussion
14:55:15 <bnemec> shardy: Yeah, review sprints can be helpful if you can get people to participate.
14:55:33 <shardy> Jokke_: Ok, that's interesting - similar to the preferences expressed here for all features to be blueprints
14:55:53 <shardy> sounds like we have some good ideas for optimizations going into next cycle, thanks all
14:56:08 <shardy> bnemec: maybe I should be offering free beer ;)
14:56:33 <shardy> Anyone have anything else to raise this week?
14:56:44 <shardy> We set the topic for the deep dive:
14:56:54 <shardy> https://etherpad.openstack.org/p/tripleo-deep-dive-topics
14:56:57 <Jokke_> shardy: having bugs as blueprints did not work out well for reviews, release documentation, etc. It was just too confusing
14:57:05 <shardy> slagle and I will cover some "undercloud under the hood" content
14:57:19 * Jokke_ has horrible lag on the connection atm. sorry for that
14:57:19 <shardy> Jokke_: ack, thanks
14:57:36 <shardy> Actually that reminds me, we should consider using reno
14:57:42 <bnemec> +1
14:57:48 <bnemec> It's been on my todo list for a while to set that up.
14:57:48 <Jokke_> shardy: +1
14:58:16 <shardy> Ok, we could add the reno output into t-h-t as a starting point
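(For reference, reno notes are small YAML files created with "reno new <slug>" under releasenotes/notes/; a minimal hypothetical note for the ServiceNetMap change discussed earlier might look like:)

    # releasenotes/notes/deprecate-servicenetmap-keys-abc123.yaml (hypothetical)
    ---
    deprecations:
      - The CamelCase ServiceNetMap keys (for example RabbitMqNetwork) are
        deprecated in favour of keys aligned with service_name, for example
        rabbitmq_network.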
14:58:21 <Jokke_> again, experience from the Glance world: it needs proactivity, a lot of it
14:58:53 <Jokke_> and if reviewers don't drop their -1s there, those renos will just not be made
14:59:05 <pabelanger> I just pushed up a patch to start removing private infrastructure from tripleo-ci, moving image uploads to tarballs.o.o: https://review.openstack.org/#/c/350061/
14:59:10 <dprince> I'm game for reno
14:59:17 <pabelanger> wouldn't mind some help in that effort
14:59:19 * bnemec is always happy to -1 ;-)
15:00:07 <shardy> Ok, out of time, thanks all!
15:00:10 <shardy> #endmeeting