14:00:27 #startmeeting tripleo
14:00:28 Meeting started Tue May 10 14:00:27 2016 UTC and is due to finish in 60 minutes. The chair is shardy. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:29 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:31 The meeting name has been set to 'tripleo'
14:00:39 #topic rollcall
14:00:41 o/
14:00:47 Hey all, who's around?
14:00:57 hi!
14:01:05 hi
14:01:14 o/
14:01:56 o/
14:02:09 o/
14:02:22 Ok then, let's get started :)
14:02:32 #topic agenda
14:02:32 * one off agenda items
14:02:32 * bugs
14:02:32 * Projects releases or stable backports
14:02:32 * CI
14:02:35 * Specs
14:02:35 hi
14:02:39 * open discussion
14:02:47 Anyone have anything to add to the one-off items?
14:02:57 there's one from me and one from beagles atm:
14:03:05 #link https://wiki.openstack.org/wiki/Meetings/TripleO#One-off_agenda_items
14:03:21 hey (sorry for being late)
14:03:34 hey shadower, np
14:03:50 #topic one off agenda items
14:04:04 beagles: Hey, do you want to cover your item first here?
14:04:10 shardy: sure
14:04:25 #info bug tag for partial bug fixes
14:04:40 I've been going through some of the older bugs especially the ones that are "in progress"
14:05:06 o/
14:05:10 there have been patches landed as workarounds, but it's not clear that they are needed any longer so I would like to tag bugs like this for now
14:05:22 so that we can go through and clean them as time and resources permit
14:05:37 I like the idea, especially for things related to our CI
14:06:13 there is already a CI tag, but +1 on having a possibly-obsolete-hack-workaround tag (naming to be determined)
14:06:38 ++
14:06:47 anything that allows us to divide and conquer so we start purging old and no longer valid bugs gets my +1 :)
14:06:54 beagles: thanks for helping out here!
14:07:04 cool... I'm open to suggestions on the name. Naming is always the hardest part
14:07:21 shardy: np .. it's good stuff. Nice way to get some historical context on things
14:08:06 temp-workaround might be reasonably descriptive and not be as scary as "hack"
14:08:31 anyways, that's that in a nutshell, we can bikeshed on names later
14:08:33 beagles: +1. or just "workaround" would be OK I guess
14:08:46 both wfm
14:08:49 k
14:08:51 we can decide the name in #tripleo later, thanks!
14:09:05 Ok next topic was midcycle plans
14:09:28 this was mentioned a couple of times recently, and I'd not considered organizing any f2f meetup this time
14:09:36 +1
14:09:49 what do folks think, should we aim for some sort of virtual hackfest/meetup around the middle of the cycle?
14:10:00 +1 on virtual meetup
14:10:05 I could arrange a series of focussed topic video calls or something
14:10:13 shardy: sounds good
14:10:14 +1 on using #openstack-sprint + videoconf if needed
14:10:21 yea better than traveling
14:10:23 +1 virtual meetup
14:10:38 obviously we can do that at any time, but it'd maybe be good to encourage some high-bandwidth discussion via some google hangouts or whatever
14:10:46 or you can come to Quebec and I can cook French cooking
14:11:03 :)
14:11:16 +1 to virtual
14:12:14 Ok, sounds like vague consensus, I'll start an ML thread where we can decide the date and agenda
14:12:29 shardy: I can help you to start the agenda
14:12:37 #info agreed to aim for virtual mid-cycle meetup, ML thread pending
14:12:45 I'm happy to go virtual if bnemec turns on his cam so we can see what kind of gestures he is making at the screen ;-)
14:13:01 Any other one-off items before we continue?
14:13:06 derekh: :-)
14:13:36 #topic bugs
14:13:56 #link https://bugs.launchpad.net/tripleo/
14:14:23 Current bug on trunk https://bugs.launchpad.net/tripleo/+bug/1580076
14:14:24 Launchpad bug 1580076 in tripleo "Upgrades job failing pingtest with "Message: No valid host was found."" [Critical,Triaged]
14:14:37 causing all upgrades jobs to fail
14:15:25 hmm, is that happening before or after the stack-update?
14:15:26 weird, I swear I saw pingtest working last night
14:15:27 There's also https://bugs.launchpad.net/tripleo/+bug/1580170 which looks like a puppet module version mismatch on liberty->mitaka upgrade, possibly need to get more info on that
14:15:28 Launchpad bug 1580170 in tripleo "overcloud upgrade liberty to mitaka failed" [Undecided,New]
14:15:41 i would assume after since other jobs are passing the pingtest
14:15:42 (I saw pingtest working *after* stack update last night)
14:15:48 i wonder if we have a real upgrades bug?
14:15:56 derekh, but it not only affects upgrades, also (sometimes) creating the overcloud
14:16:14 slagle: after, I have a hunch this patch started the problem https://review.openstack.org/#/c/312300/1 but it's just currently a hunch
14:16:27 I'm trying to reproduce it creating the overcloud in my CI
14:16:44 ccamacho: ok
14:17:24 slagle: yup, we may possibly have found a real upgrades bug
14:17:29 derekh: why?
14:17:38 why 312300 ? do you have logs?
14:18:17 derekh: have you tested a puppet pin that uses keystone prior to that yet?
14:18:52 EmilienM: dunno, don't worry about it being that patch until I run some tests, it's just a hunch based on the auth problem we're seeing, when the problem started and the timing of that patch
14:18:57 dprince: it's nova
14:19:12 derekh: 5th may iirc
14:19:40 derekh, agreed on the starting time for the issues
14:19:55 derekh: it can't be this one, upgrade job passed at 2016-05-05 23:52:20
14:20:06 and the puppet-nova patch merged at May 5 6:23 PM
14:20:18 source: http://tripleo.org/cistatus.html and https://review.openstack.org/#/c/312300/
14:20:31 EmilienM: okay, either way I'd say let's test them both to see
14:20:41 * EmilienM checks in tripleo logs to check we had the commit
14:21:17 Ok, so it sounds like we've got enough eyes on this issue, are there any other bugs folks want to highlight?
14:21:30 EmilienM: ok, btw I'm not suggesting we jump into reverting it or anything, just letting people know what my current train of thought was
14:21:46 derekh: wait, I checked in the logs, and this is the commit in puppet-nova that worked: https://github.com/openstack/puppet-nova/commits/b108a7c36bbc733b3aa90786540e978f5c0ec059
14:21:55 and we don't have the one you mentioned, so it's still a possibility
14:22:17 dprince: will try a temprevert, kind of tried something similar here but it didn't even get to the ping test https://review.openstack.org/#/c/314510/
14:22:22 EmilienM: ack
14:22:32 Just a reminder to please target bugs when you triage them, e.g. if it's an actual bug in TripleO pieces vs a CI fix
14:22:35 https://launchpad.net/tripleo/+milestone/newton-1
14:22:56 Then we can burn down the open list for the milestone and know when it's a good time to release
14:22:56 derekh: I see. I should know to always check the "nothing to see" patches first ;)
14:23:15 dprince: you should have learned by now ;-)
14:23:22 derekh: well, 314510 has same effect as temprevert... and it fails :(
14:23:47 EmilienM: ya but it didn't get as far as the ping test,
14:23:51 EmilienM: going to recheck now
14:23:58 derekh: let's recheck that one I think
14:24:09 and also ref the cleanup beagles has been helping with - please review the list of bugs raised by you, and close any ye-olde ones which are no longer valid
14:24:25 dprince: done
14:25:20 Ok shall we continue and defer further discussion re bug #1580076 to after the meeting?
14:25:21 bug 1580076 in tripleo "Upgrades job failing pingtest with "Message: No valid host was found."" [Critical,Triaged] https://launchpad.net/bugs/1580076
14:25:29 ++
14:25:40 +1
14:25:42 #topic Projects releases or stable backports
14:26:12 #link http://releases.openstack.org/newton/schedule.html
14:26:26 So, I wanted to run a plan past you all for the n-1 milestone
14:27:04 I was thinking it'd be good to do a coordinated release of all the tripleo pieces, based on a passing periodic CI job, around the time of the n-1 milestone (e.g. in about 3 weeks' time)
14:27:32 I'll probably write a script that can scrape the latest periodic CI pass and propose all-the-things to openstack/releases
14:27:46 then at any time, we can tag a release for a combination of things we know to work
14:28:02 just an FYI about puppet modules, we might produce a first newton release by end of this month
14:28:17 shardy: would be cool to display that on tripleo.org perhaps too. Maybe in the CI status page or something
14:28:41 shardy: once the releases were done, how would people consume them?
14:29:19 shardy: your script can look at this file http://trunk.rdoproject.org/centos7/current-tripleo/versions.csv to see what version of each project is included (just FYI in case you didn't know)
14:29:33 just wondering how we'd be able to definitively install a n-1
14:29:34 slagle: that's a good question - I've not yet figured that out - I was hoping we could wire up tripleo-quickstart to enable easy deployment of a given release
14:29:49 obviously we can also publish the delorean hash for the passing CI run
14:30:06 but I was hoping we could get the tagged repos more directly consumable
14:30:26 * shardy looks around for trown
14:31:21 shardy: haven't seen trown around, I guess his wife had the baby, it could be a while before we see him
14:31:36 slagle: I guess I was focussing on the first step, which is to define a point-in-time release which we expect to work, and has a known bunch of features/bug-fixes in it
14:31:50 slagle: you're right, we need to then define and document how folks consume it
14:31:55 derekh: ah, cool
14:32:24 shardy: k, i think we could come up with something
14:32:45 If folks are OK with the idea of milestone related releases, we can do that and figure out the consuming of it
14:33:18 shardy: I always use trunk, but whatever :)
14:34:18 Ok, let's table the release discussion and work it out over the next couple of weeks
14:34:24 anything else release related?
14:34:52 https://review.openstack.org/#/c/308236/ has some discussion re our stable branches FYI
14:35:11 there's some resistance to our application for the follows-stable tag
14:36:33 feel free to pitch in there if you have an opinion - I may start a ML thread on the same topic
14:36:56 #topic CI
14:37:09 So, other than the upgrades job, any CI news to discuss?
14:37:25 yes
14:37:31 the newly discovered step removal aka "turbo" option? :)
14:37:34 shardy: Emilien is chopping some of the steps from deployment
14:37:37 yesterday I released puppet-ceph
14:37:43 all jobs now 7 minutes faster!
14:37:44 that should speed things up I think
14:37:48 upgrades job now 14 minutes faster!
14:37:55 and we need to bump puppet-ceph to stable/hammer
14:37:58 please review https://review.openstack.org/#/c/314311/
14:38:05 I'm not sure this patch does what I want
14:38:32 Cool, well nice work on the optimization! :)
14:38:45 EmilienM: we aren't using stable for the other modules though?
14:38:47 and yeah, I'm also working on step6 removal https://review.openstack.org/314253
14:38:51 It's a tripleo improvement though not just ci, I wouldn't like people thinking we just fixed something in CI
14:38:52 EmilienM: why puppet-ceph?
14:38:53 dprince: yeah but ceph is special
14:39:04 I've been in discussion with zaneb and other heat folks, and it sounds like the heat memory usage issues are likely to be improved soon
14:39:16 dprince: not a lot of people are working on this module, and we found out tripleo can't deploy Jewel *right now*, so better to pin
14:39:36 I'd like you to consider tempest for periodic nonha jobs, please: https://review.openstack.org/#/c/297038/
14:39:46 https://review.openstack.org/#/c/311837/ is the first step if you'd like to follow it
14:40:20 EmilienM: https://review.openstack.org/#/c/184844/ was an idea too
14:40:47 shardy: was wondering if we should setup a periodic job to test with convergence?
14:41:08 shardy: what do you think our plans should be around convergence? do you think we could switch in newton?
14:41:09 dprince: should I update my patch or does mine work too?
14:41:19 slagle: Probably not yet - I tested it locally quite recently, it makes the memory usage and DB utilization/timeout issues much, much worse atm
14:41:28 slagle: yeah, hearing about the memory usage of convergence at Austin was a bit concerning
14:41:29 i see, ok
14:42:02 slagle: I think we should probably wait for things to get optimized and configure for non-convergence at least until later in Newton
14:42:37 Heat may switch the default relatively soon, but IMHO the benefits don't yet outweigh the performance issues for the TripleO use-case
14:42:43 EmilienM: I'd just like to clarify what we are doing where. If we do it in your patch at the very least a comment explaining why. Initially I thought we'd have a separate element for this though. That was my point.
14:42:55 dprince: commit message is not enough?
14:43:14 EmilienM: #comment
14:43:21 kk
14:43:40 slagle: we could set up a periodic job tho I suppose, or maybe an experimental job that we can try
14:44:18 shardy: the non-convergence path would eventually get removed though right?
14:44:35 shardy: so we'd be on borrowed time to fully migrate over once heat switches?
14:44:40 dprince: eventually I guess, but there's no discussion of that happening anytime soon
14:45:12 it's not even deprecated yet, they'll switch the default, then we'd need *at least* two full cycles before anything got removed, I anticipate it being longer than that
14:45:26 shardy: what is the progress on dropping OPM rpm in stable jobs and use git?
14:46:03 EmilienM: I proposed an approach, but dprince had a preference for a different implementation - I've not yet had time to revisit, so it's still TODO
14:46:20 if anyone wants to pick that up, feel free
14:46:29 ack
14:46:55 we actually need to switch master to using the per-module delorean packages too ref trown's item last week
14:47:09 Ok, anything else re CI before we continue?
14:47:19 Main thing I got to report is that it looks like we're going ahead with the HW upgrade tomorrow
14:47:25 I'll likely take the cloud down around 1pm UTC, and expect it down for about 12 hours
14:47:36 sshnaidm: once that's done, assuming it speeds things up even more I think we can start thinking about adding tempest to the periodic job
14:47:48 derekh: hope it goes well
14:47:48 derekh, cool, thanks
14:48:00 derekh: nice - do you need any help or do you have the bringup post-upgrade covered?
14:48:05 I've got one thing that has to happen before the 14th as well. Very important
14:48:14 dprince: certs ?
14:48:14 We need a new SSL cert :/
14:48:23 derekh: yep :)
14:48:29 * derekh meant to remind dprince last week, oops
14:48:38 heh, good reminder :)
14:48:41 derekh: I got my own reminder. So it's on the list
14:48:47 Who needs certs when your cloud is down for upgrades? :-)
14:48:57 OK #topic Specs
14:48:59 another topic - please comment on my elastic-recheck related mail on the mailing list if you're interested
14:49:02 sorry :)
14:49:03 bnemec: well, if it comes back up we'll need em
14:49:06 #topic Specs
14:49:17 shardy: I think I've got it handled, will ping people if I need extra hands
14:49:26 So, there's a ML message about two specs related to opnfv
14:49:32 would be good to get some eyes on those
14:49:59 #link https://review.openstack.org/#/c/313871/
14:50:12 #link https://review.openstack.org/#/c/313872/
14:50:49 I also added https://blueprints.launchpad.net/tripleo/+spec/custom-roles to track the fully-composable model we discussed at summit, which those two will probably require
14:51:18 #link http://lists.openstack.org/pipermail/openstack-dev/2016-May/094287.html
14:52:21 fwiw, I don't think they actually know how the dpdk one is going to work out just yet. The SR-IOV one is "closer" in terms of reality at the moment
14:52:44 there are some non-tripleo related things to work out with respect to dpdk
14:53:03 beagles: cool, please comment on the specs
14:53:10 yup
14:53:18 there's a lot of detail in both, but not that much clarity on the actual implementation AFAICT
14:53:39 Anything else spec related?
14:54:05 https://launchpad.net/tripleo/+milestone/newton-1
14:54:30 we only have two features on the n-1 list, so we should add anything we expect to land in the next 2-3 weeks
14:54:46 #topic open discussion
14:54:58 Sorry, only 5 mins left - anything to raise?
14:55:34 shardy: I added a new spec too
14:55:37 * derekh runs out the door a little early
14:55:38 #link https://blueprints.launchpad.net/tripleo/+spec/remote-execution
14:56:02 The power there is really in the Mistral workflow bits... but the CLI work shows it nicely I think
14:56:42 I like it, but I wondered about security here, and the capacity of running malicious software remotely
14:57:01 can we try and link etherpads tracking patches/reviews into the blueprints?
14:57:09 EmilienM: I'm using the same mechanism we use for Heat. So nothing new really I think
14:57:13 dprince: Nice, I saw that, looks good - it'd be interesting to see how that aligns with operator requests re e.g. running ansible against a dynamic inventory generated by TripleO
14:57:14 if someone gets admin credentials, it's easy to run malicious software remotely
14:57:15 i noticed there were ones for composable services and mistral
14:57:19 etherpads, that is
14:57:22 but no one can find them
14:57:31 EmilienM: it's already easy to do that if you have credentials
14:57:42 EmilienM: if you already have access to the pvt keys then you can do whatever you want anyway (e.g. from undercloud)
14:57:45 ok, I'm just highlighting, just in case.
14:58:09 slagle: https://etherpad.openstack.org/p/tripleo-composable-roles-work ?
14:58:23 slagle: good point, let's link them from the whiteboards on the blueprints
14:58:28 EmilienM: yes, pads like that
14:58:41 the link isn't discoverable unless you already know it
14:58:41 composable services is here: https://etherpad.openstack.org/p/tripleo-composable-services
14:59:00 lol, or we end up with 2 etherpads :)
14:59:13
14:59:16 qed
14:59:20 Ok, we're out of time - thanks all!
14:59:24 #endmeeting