14:02:37 <shardy> #startmeeting #tripleo
14:02:37 <openstack> Meeting started Tue Apr 19 14:02:37 2016 UTC and is due to finish in 60 minutes.  The chair is shardy. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:02:38 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:02:40 <openstack> The meeting name has been set to '_tripleo'
14:02:43 <EmilienM> o/
14:02:44 <adarazs> o/
14:02:45 <shardy> #topic rollcall
14:02:47 <jdob> o/
14:02:48 <michchap_> o/
14:02:49 <shadower> hey
14:02:50 <shardy> thanks for the reminder EmilienM ;)
14:02:54 <beagles> o.
14:03:07 <EmilienM> shardy: calendar as a service !
14:03:08 <derekh> o/
14:03:12 <trown> o/
14:03:15 <shardy> #link https://wiki.openstack.org/wiki/Meetings/TripleO
14:03:18 <tzumainn> hiya
14:03:29 <slagle> hi
14:03:54 <bnemec> o/
14:03:57 <dprince> hello
14:04:21 <shardy> #topic agenda
14:04:21 <shardy> * one off agenda items
14:04:21 <shardy> * bugs
14:04:22 <shardy> * Projects releases or stable backports
14:04:22 <shardy> * CI
14:04:24 <shardy> * Specs
14:04:28 <shardy> * open discussion
14:04:43 <shardy> Anyone have any additional one-off items other than the one dprince added?
14:04:52 <michchap_> I have a Q about hiera.yaml composition
14:04:56 <michchap_> if we have time
14:05:06 <shardy> michchap_: kk, can we do that in open discussion?
14:05:12 <michchap_> shardy: sure
14:05:17 <shardy> Ok, thanks
14:05:27 <shardy> #topic one off agenda items
14:05:35 <shardy> * CI status report broke. Potential fix here: https://review.openstack.org/#/c/307532/
14:05:48 <shardy> dprince: this is due to the firewalling right?
14:05:57 <EmilienM> yes ^
14:05:59 <dprince> shardy: yes
14:06:20 <dprince> shardy: ccamacho and I are working on a patch to generate a new report
14:06:35 <trown> cool
14:06:35 <dprince> derekh: on that note, do we want to keep the old tripleo-jobs.py as-is?
14:06:41 * bnemec subscribes
14:06:42 <shardy> Sounds good, would be nice to get cistatus working again
14:06:44 <dprince> derekh: in case the firewall opens up?
14:06:53 <ccamacho> o/
14:06:59 <trown> ya I think jenkins restriction is temporary
14:07:02 <dprince> derekh: like maybe our patch should be a tripleo-jobs-gerrit.py or something
14:07:08 <trown> though probably a few weeks
14:07:45 <derekh> dprince: we could revert it again, or keep it, either is fine with me
14:07:45 <dprince> trown: there is a possibility... in the long term that infra could move away from Jenkins I think and just use a zuul worker directly
14:08:00 <dprince> trown: whereas... I don't think gerrit is going away long term
14:08:14 <trown> and even after it is unfirewalled, we will still likely need to patch tripleo-jobs.py to account for CSRF protections
14:08:25 <dprince> derekh: I may keep them both side,by side
14:08:37 <trown> dprince: ya, the only problem with gerrit is no ability to check periodic
14:08:50 <dprince> trown: yep, we lose that
14:08:54 <trown> I am pro side-by-side approach
14:09:06 <derekh> dprince: the gerrit version won't get us the periodic jobs, best I can come up with is traversing the log server http://paste.openstack.org/show/494640/
14:09:46 <derekh> trown: best solution I can come up with for periodic jobs ^ it's a WIP
14:09:48 <dprince> derekh: lets do it man.
14:10:01 <dprince> are we starting an unofficial "screenscrapers" team here guys?
14:10:19 <trown> :)
14:10:35 <derekh> dprince: can't think of an alternative ;-(
14:10:39 <shardy> hehe - well it sounds like we've got a workable interim plan then
14:11:02 <shardy> shall we move on, and hopefully find out more about the duration of the firewalling etc next week?
14:11:09 <dprince> shardy: yep, we are done I think
14:11:18 <shardy> #topic bugs
14:11:38 <shardy> #link https://bugs.launchpad.net/tripleo
14:12:07 <shardy> So https://bugs.launchpad.net/tripleo/+bug/1571708 is one I was looking at with EmilienM and chem yesterday
14:12:08 <openstack> Launchpad bug 1571708 in tripleo "undercloud user role assignments deleted by undercloud install" [Undecided,New]
14:12:33 <shardy> it's an issue where we delete the _member_ role assignments when upgrading from quite old environments
14:12:59 <shardy> https://review.openstack.org/#/c/307352/ is a proposed fix, which is a little ugly but hopefully workable
14:13:02 <shardy> feedback welcome
14:13:11 <shardy> any other bugs to highlight this week?
14:13:19 <EmilienM> shardy: I like your approach
14:13:32 <slagle> shardy: do you confirm that's a problem using all upstream bits?
14:14:02 <shardy> slagle: It's a problem if you installed your undercloud then kept updating it for sufficiently long
14:14:17 <shardy> the change was keystone used to create _member_ as part of the bootstrap of the DB
14:14:17 <slagle> there were some patches carried downstream that were reworked before they ever landed upstream, related to recreating users at the end of the install
14:14:25 <slagle> ok
14:14:42 <slagle> just wanted to make sure the right root cause was identified
14:15:33 <shardy> slagle: sanity check welcome, I was under the impression the role assignments would have been created by keystone along with the _member_ role, but we can rework if that's not the case
14:16:20 <slagle> shardy: i'd need to go back and refresh my memory
14:16:31 <shardy> kk, any feedback welcome
14:16:35 <slagle> but based on your simple reproducer in the bug, it's probably not related to what i was thinking of
14:17:15 <shardy> Well the reproducer actually isn't fixed by my patch - if an operator adds a role to any of the pre-created users, puppet will erase the role assignments every undercloud install
14:17:24 <shardy> but it demonstrates the same issue
14:18:02 <shardy> Was https://bugs.launchpad.net/tripleo/+bug/1571384 fixed by the revised package build?
14:18:03 <openstack> Launchpad bug 1571384 in tripleo "CI periodic jobs: undercloud keystone fails to start" [Undecided,New]
14:18:27 <dprince> shardy: I think so
14:18:45 <dprince> shardy: 3-4 packaging fixes later and we are good :)
14:18:52 <trown> lol
14:18:58 <trown> ya that was a mess
14:18:58 <shardy> dprince: Ok, sounds like we can close that one off then :)
14:20:54 <shardy> #topic Projects releases or stable backports
14:21:34 <shardy> So I think stable CI was broken which should be fixed when we land https://review.openstack.org/#/c/306524/
14:22:00 <shardy> The other thing I wanted to raise (which we can discuss in more detail next week) was the release schedule we plan to observe for Newton
14:22:17 <shardy> until now, TripleO has kind of done its own thing under the "independent" release model
14:22:20 <derekh> Yup the liberty job is accidentally using the cached instack master image ... sorry .. this patch fixes it (in addition to moving some stuff into a function) https://review.openstack.org/#/c/306524/
14:22:35 <shardy> but IMO it would be a good thing if we aligned more closely with the coordinated release
14:22:46 <shardy> given that we're maintaining branches etc now
14:23:00 <trown> +1
14:23:07 <shardy> #link https://github.com/openstack/releases/blob/master/doc/source/newton/schedule.rst
14:23:23 <shardy> There are basically two possible approaches
14:23:46 <shardy> we either align with the main Newton release, with some number of intermediate milestones
14:24:06 <shardy> Or we adopt the new "trailing" release which allows a 2 week window after the main release
14:24:11 <EmilienM> I think tripleo is a good candidate for https://review.openstack.org/#/c/306037/
14:24:32 <EmilienM> +1 for trailing release
14:24:39 <trown> I think trailing makes sense if we have an actual hard date still
14:24:46 <trown> EmilienM: is puppet also on that?
14:24:47 <EmilienM> fwiw, that's what puppet openstack + fuel + kolla + ansible are going to follow
14:25:03 <bnemec> I mean, deployment systems are explicitly called out as the intended use case.
14:25:04 <beagles> +1 for trailing release
14:25:06 <EmilienM> trown: yes, probably
14:25:06 <trown> I think we are pretty tied to what puppet does
14:25:31 <marios> o/ sorry am late
14:25:33 <shardy> Yeah, and also there's a natural lag in that we are dependent on RDO branches existing for a release etc
14:25:33 <EmilienM> trown: we'll do our best to release at the official date though
14:25:46 <EmilienM> trown: like we did for Mitaka, we released 1 week earlier
14:25:57 <rhallisey> hi (missed role call)
14:26:04 <shardy> EmilienM: Ok, if puppet is going with that tag, I think we'll have to do likewise
14:26:11 <slagle> +1
14:26:15 <shardy> but I like the idea of aiming for sooner-than-two-weeks
14:26:27 <EmilienM> right
14:26:39 <EmilienM> for sure we (puppet) won't wait 2 weeks
14:27:03 <shardy> Ok then, it sounds like we have consensus but I'll follow up on the ML and folks can vote or raise concerns there
14:27:04 <trown> ya, "no later than 2 weeks" is good
14:27:14 <EmilienM> shardy: ++
14:27:27 <shardy> Anything else re releases to raise?
14:27:42 <trown> note, that the trailing release model still involves milestones
14:27:51 <EmilienM> that are not required afaik
14:28:14 <trown> it is in the requirements on https://review.openstack.org/#/c/306037/2/reference/tags/release_cycle-trailing.rst
14:28:25 <trown> though not very specifically defined
14:28:33 <EmilienM> trown: yeah, I read it too, but iiuc when talking with doug it was not required
14:28:40 <trown> k
14:28:46 <EmilienM> maybe we can ask dhellmann to confirm (later)
14:28:48 <shardy> trown: Ya, I think it'll be good for us to consider some intermediate releases
14:29:04 <shardy> we'll have to decide if we follow the milestone or intermediary model tho
14:29:09 <EmilienM> shardy: yes, release more often would really help to iterate
14:30:04 <shardy> Ok I'll try to summarise a plan on the ML and folks can reply there until we reach agreement
14:30:15 <shardy> #topic CI
14:30:32 <derekh> One of our RH1 compute nodes went offline today, it was the compute node that hosted our squid and geard, so all jobs failed
14:30:38 <derekh> it's back running now, we should know soon if everything is ok, can't check jenkins anymore to see progress ;-(
14:30:41 <shardy> Anyone like to give us a summary of the CI status, obviously been another fairly rough week
14:30:50 <derekh> We successfully moved the tripleo pin yesterday to a repo from friday, we are now caching the IPA image,
14:30:59 <derekh> this patch should allow us to use it https://review.openstack.org/#/c/301699/
14:31:05 <derekh> and this patch will start us caching the overcloud image https://review.openstack.org/#/c/306499/ I'll work on a patch to use it once it's cached
14:31:22 <slagle> #info we are testing stack-updates to the tripleo-heat-templates patch being tested in the upgrades job now
14:31:24 <shardy> derekh: did the automated promotion etc all work when the periodic job passed?
14:31:43 <shardy> slagle: nice :)
14:31:45 <bnemec> I wonder if that's why my undercloud upgrade job started failing yesterday...
14:31:47 <EmilienM> slagle: w00t
14:32:03 <shardy> I believe therve posted a fix to the ResourceChain update issue also, has that been proven?
14:32:18 <EmilienM> do we deploy newton already? (puppet CI is failing at that currently, I'm working on it)
14:32:30 <derekh> shardy: yes, it was an automatic promotion, but it was a fake periodic job (didn't want to wait another 24 hrs https://review.openstack.org/#/c/229789/)
14:32:35 <trown> EmilienM: ya, but only with pingtest
14:32:58 <EmilienM> trown: it's very cool, I'll investigate all my tempest failures when our CI is back
14:33:13 <trown> EmilienM: but I have had success manually testing tempest, with only ceilo notification and keystone v3 failures
14:34:15 <derekh> I've been trying to make the toci_ scripts usable by more people to reproduce problems, would appreciate some eyes here to help out the cause https://review.openstack.org/#/c/306506/
14:34:40 <derekh> also
14:34:45 <derekh> I've been asked when would be a good time for the new RAM to be installed on the tripleo hosts, my suggestion was Friday 29th, this would leave CI offline from friday until the following monday when we bring it back up again.
14:34:50 <derekh> Most people will be traveling back from the summit so disruption shouldn't be too bad
14:34:55 <derekh> how does this sound to people?
14:35:01 <trown> +1
14:35:05 <shardy> derekh: +1
14:35:05 <EmilienM> ++
14:35:09 <derekh> This is assuming we have the RAM in time of course, if not we'll have to pick some other time, we could wait until the Rack moves to a new DC but that's the beginning of July
14:35:20 <shardy> good to hear we've got a definite timeline on those upgrades :)
14:35:39 <shardy> derekh: would be nice to not wait that long if possible
14:35:59 <derekh> shardy: the PO is sent, so we should have it available once the supplier sends them to us
14:36:12 <shardy> One CI related thing, I did a bunch of memory profiling last week, resulting in https://bugs.launchpad.net/heat/+bug/1570983 and https://bugs.launchpad.net/heat/+bug/1570974
14:36:14 <openstack> Launchpad bug 1570983 in heat "raw_template files duplication wastes DB space and memory" [High,In progress] - Assigned to Crag Wolfe (cwolfe)
14:36:15 <openstack> Launchpad bug 1570974 in heat "Possible reference loops lead to high memory usage when idle" [Undecided,New]
14:36:37 <shardy> tl;dr - there are some issues which we may be able to fix to improve the heat-engine memory usage somewhat
14:37:08 <shardy> Anything else CI related?
14:37:30 <rhallisey> container job was working for a bit, it is no longer
14:37:53 <shardy> rhallisey: what was the cause, tripleo code or the images?
14:37:55 <marios> shardy: looks like 1570983 is addressed already, and if i understood correctly will make significant improvement in memory use
14:38:03 <rhallisey> even if I fix it though, it still won't turn up for the composable container stuff until I do all the roles
14:38:06 <rhallisey> since we run on atomic
14:38:24 <shardy> marios: Yeah the patch isn't ready yet but it may make a big difference when done
14:38:37 <rhallisey> shardy, it was working for a while, but about week ago it stopped
14:38:54 <EmilienM> should we enable it as voting?^
14:38:59 <rhallisey> shardy, I think we should hold off on that CI job until composable container stuff is all set
14:40:34 <rhallisey> EmilienM, it's not ready yet
14:40:50 <shardy> #topic Specs
14:40:57 <shardy> #link https://review.openstack.org/#/q/project:openstack/tripleo-specs+status:open
14:40:57 <rhallisey> the compute node is going to undergo a lot of change, so gating on anything there won't really do much
14:41:12 <shardy> There's the lightweight HA one which could use feedback
14:41:33 <shardy> should we leave other spec discussions until during/after summit?
14:41:44 <shardy> or does anyone have anything to raise now?
14:41:53 <shadower> can we have people look at the validation one again?
14:42:09 <shardy> shadower: ah, yes sorry I meant to mention that one too
14:42:10 <shadower> its prerequisite (the mistral api) has landed and I don't see any negative feedback
14:42:33 <shadower> but both mandre and myself will be at the summit so we can talk abuot it there, too
14:42:46 <shadower> no worries shardy :-)
14:42:50 <marios> shadower: thanks for reminder i want to revisit. will this include things like 'is my pcs cluster ok?'
14:43:27 <shadower> marios: we're focusing on validating the predeployment stuff because it can shave a ton of deployment time. But we do have some postdeployment validations
14:43:38 <shadower> not 100% sure they belong there -- with sensu being a thing and all
14:43:46 <marios> shadower: ack thx and sorry, I see we'll bump discussion/take it offline. just for something I was looking at today
14:44:07 <shadower> yeah, np
14:44:17 <shadower> that's a question i'd like to resolve soonish anyway
14:44:42 <marios> shadower: link to the spec would be good in minutes
14:44:53 <shadower> https://review.openstack.org/#/c/255792/
14:45:00 <shadower> sorry
14:45:07 <shardy> #link https://review.openstack.org/#/c/255792
14:45:24 <shardy> #topic Open Discussion
14:45:36 <shardy> So, one thing before we address michchap_'s question
14:45:39 <tzumainn> have the topics for summit been announced yet?
14:45:54 <shardy> #link http://www.openstack.org/summit/austin-2016/summit-schedule/global-search?t=TripleO
14:45:58 <slagle> #link https://etherpad.openstack.org/p/newton-tripleo-sessions
14:46:09 <shardy> tzumainn: hehe
14:46:20 <tzumainn> shardy, lol, thanks!
14:46:28 <shardy> I'll create the etherpads per-topic and link them into the main page later today
14:46:49 <shardy> There are a few conflicts with Heat sessions in the afternoon unfortunately, but otherwise all seems to be in good shape
14:47:11 <shardy> let me know if you see anything that needs adjustment
14:48:02 <shardy> I'll also chat to folks in #tripleo and we can spread the load of leading sessions around a bit
14:48:24 <shardy> any questions/comments re summit to add?
14:49:14 <shardy> Ok, michchap_ - your question?
14:49:20 <michchap_> I will be quick
14:49:30 <michchap_> I was looking at extraconfig things for doing numa support as an optional component and noticed the hiera.yaml on compute nodes has a set of hardcoded files for each neutron plugin. I was thinking the composable services stuff might be a good time to look at making hiera.yaml itself composable, or perhaps merging things into existing yaml files.
14:50:04 <michchap_> I just wanted to raise that as a thing and ask if anyone had already looked at doing it
14:50:22 <shardy> michchap_: Yeah, the history there is until recently heat didn't have any way to join two (or more) lists
14:50:42 <shardy> I fixed that in mitaka, so we can probably make that list composable now
14:50:51 <shardy> although how we handle the ordering could be interesting
14:50:55 <michchap_> shardy: Ah, but we might be in a good spot to do it now.
14:51:11 <michchap_> shardy: ordering is going to be difficult.
14:51:18 <shardy> michchap_: Yeah, I think it should be do-able as part of the composable services work
14:51:27 <shardy> dprince: Have you looked into that yet at all?
14:52:12 <dprince> shardy: not specifically, but it sounds doable
14:52:26 <shardy> kk, thanks for mentioning it michchap_
14:52:31 <dprince> yep, good idea
14:52:33 <michchap_> thanks for your time
14:52:42 <shardy> I'll see if I can throw an initial patch up which illustrates how we might do it
14:52:47 <michchap_> I might try to make a patch but heat isn't my strong suit.
14:52:49 <shardy> and we can take it from there
14:52:54 <michchap_> ^ happy to review
14:53:20 <shardy> Anyone have any other topics to raise before we wrap up?
14:54:19 <shardy> Ok then, thanks all, see you in Austin!
14:54:33 <shardy> #endmeeting