14:00:36 <mwhahaha> #startmeeting tripleo
14:00:36 <mwhahaha> #topic agenda
14:00:36 <mwhahaha> * Review past action items
14:00:36 <mwhahaha> * One off agenda items
14:00:36 <mwhahaha> * Squad status
14:00:36 <mwhahaha> * Bugs & Blueprints
14:00:36 <mwhahaha> * Projects releases or stable backports
14:00:37 <slagle> sdoran: the context is that we are running ansible as a user that has no home dir, and there are some localhost tasks, so we need to override remote_tmp
14:00:37 <openstack> Meeting started Tue Nov  7 14:00:36 2017 UTC and is due to finish in 60 minutes.  The chair is mwhahaha. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:37 <mwhahaha> * Specs
14:00:37 <mwhahaha> * open discussion
14:00:38 <mwhahaha> Anyone can use the #link, #action and #info commands, not just the moderator!
14:00:38 <mwhahaha> Hi everyone! who is around today?
14:00:38 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:41 <openstack> The meeting name has been set to 'tripleo'
14:01:08 <beagles> o/
14:01:09 * mwhahaha watches as everyone disappears
14:01:14 <panda|ruck> o/
14:01:17 <ccamacho> hola!
14:01:18 <fultonj> o/
14:01:27 <jkilpatr> o/
14:01:33 <abishop> o/
14:01:51 <jpich> o/
14:01:58 <jkilpatr> should I bring up quickstart commits here or wait for the TripleO CI meeting?
14:01:58 <oidgar> o/
14:02:13 <slagle> hi
14:02:14 <mwhahaha> jkilpatr: wait until the open discussion in the meeting plz
14:02:37 <jkilpatr> sure
14:02:55 <owalsh> o/
14:03:04 <jfrancoa> o/
14:03:10 <mwhahaha> ok lets get started
14:03:15 <mwhahaha> #topic review past action items
14:03:21 <mwhahaha> EmilienM to prepare an etherpad for tripleo onboarding session in Sydney - DONE
14:03:29 <mwhahaha> well i assume it's done since he's in Sydney
14:03:37 <mwhahaha> marios, matbu, chem provide doc/status of upgrade workflow
14:04:10 <mwhahaha> marios, matbu, chem - any update?
14:04:20 <chem> mwhahaha: yes, there is a review, hold on
14:04:40 <openstackgerrit> Sagi Shnaidman proposed openstack/tripleo-quickstart-extras master: Fix devmode by right order of playbooks  https://review.openstack.org/518336
14:04:52 <chem> mwhahaha: https://review.openstack.org/#/c/517916/
14:04:59 <chem> mwhahaha: this is the skeleton
14:05:05 <mwhahaha> cool
14:05:09 <marios> mwhahaha: o/ thought there wouldn't be a meeting /me late ... not that i know of but sounds like chem knows more
14:05:31 <mwhahaha> #action team to review upgrades developer docs https://review.openstack.org/#/c/517916/
14:05:40 <mwhahaha> gfidente put together issues around multiple service instances
14:05:44 <chem> mwhahaha: we plan on filling this as we go, maybe adding TODO and merging this one, not sure how to proceed
14:05:46 <jrist> o/
14:05:59 <mwhahaha> chem: yea let's get the skeleton merged and iterate
14:06:20 <chem> mwhahaha: ack
14:06:27 <gfidente> mwhahaha I added just two lines into the integration squad etherpad https://etherpad.openstack.org/p/tripleo-integration-squad-status
14:06:28 <jfrancoa> mwhahaha: chem: yes, that's the best way to proceed I think
14:06:33 <chem> jfrancoa: ^
14:06:48 <mwhahaha> gfidente: ok thanks
14:06:53 <mwhahaha> mwhahaha to move medium bugs to queens-3 - DONE
14:07:01 <gfidente> mwhahaha but I'd like to get some feedback from people about those and how to approach it
14:07:10 <mwhahaha> i moved all the unstarted medium bugs to queens-3
14:07:19 <gfidente> https://etherpad.openstack.org/p/tripleo-integration-squad-status lines 10 > 12
14:07:25 <mwhahaha> gfidente: ok probably wouldn't hurt to solicit feedback via the ML
14:07:30 <gfidente> mwhahaha ack
14:08:11 <mwhahaha> #action gfidente to send a note requesting feedback on the ML about multiple service instances issues
14:08:15 <mwhahaha> ci squad to start gathering gate failure metrics and information
14:08:31 <mwhahaha> weshay, adarazs|rover, panda|ruck: any updates on the metrics?
14:09:12 <matbu> o/
14:09:25 <panda|ruck> mwhahaha: sova has a dedicated page for the gate jobs now
14:09:44 <panda|ruck> mwhahaha: we have an aggregate RSS to look at the failures too
14:10:06 <mwhahaha> panda|ruck: cool, please make sure to communicate this information so others can follow
14:10:15 <ooolpbot> URGENT TRIPLEO TASKS NEED ATTENTION
14:10:15 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1727406
14:10:17 <openstack> Launchpad bug 1727406 in tripleo "Zaqar subscriptions failed to report deployment error" [Critical,Triaged] - Assigned to Thomas Herve (therve)
14:10:17 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1729253
14:10:17 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1729586
14:10:18 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1730111
14:10:19 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1730477
14:10:19 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1730671
14:10:20 <openstack> Launchpad bug 1729253 in tripleo "CI issue: Failed to run cinder task ScheduleCreateVolumeTask, No valid backend was found" [Critical,In progress] - Assigned to Martin André (mandre)
14:10:21 <openstack> Launchpad bug 1729586 in tripleo "CI: rdocloud node randomly going offline during jobs" [Critical,Triaged] - Assigned to Gabriele Cerami (gcerami)
14:10:22 <openstack> Launchpad bug 1730111 in tripleo "Volume service hostgroup@tripleo_iscsi failed to start.: CappedVersionUnknown: Unrecoverable Error" [Critical,Triaged]
14:10:23 <openstack> Launchpad bug 1730477 in tripleo "legacy-puppet-syntax-3 job missing on instack-undercloud stable/newton branch" [Critical,Triaged]
14:10:24 <openstack> Launchpad bug 1730671 in tripleo "overcloud installation times out without useful errors in the logs" [Critical,Triaged]
14:10:34 <panda|ruck> mwhahaha: ok, I'll send an email
14:10:41 <mwhahaha> panda|ruck: thanks
14:10:50 <mwhahaha> #topic one off agenda items
14:10:50 <mwhahaha> #link https://etherpad.openstack.org/p/tripleo-meeting-items
14:11:06 <mwhahaha> the agenda is empty, so unless anyone has anything they wish to bring up now I'll move on
14:11:35 <jkilpatr> I just wanted to bring up a quickstart patch.
14:11:50 <mwhahaha> jkilpatr: sure which one
14:11:52 <jkilpatr> https://review.openstack.org/#/c/497950/20
14:12:12 <jkilpatr> I'm not really sure where it belongs. But it needs to live somewhere.
14:12:25 <jkilpatr> if that has to be my own repo fine, if it's in extras great. Just want a verdict on that.
14:12:42 <mwhahaha> jkilpatr: personally i think quickstart-extras is the correct place
14:13:18 <mwhahaha> let's get some more folks to weigh in but it seems like something useful for ci/developers
14:13:57 <jkilpatr> yup, trying to get a nonvoting job running so it would be great if I didn't have to cherry pick it in.
14:14:10 <fultonj> where else could it go?
14:14:11 <panda|ruck> will this be confined to CI or will it be useful for customers too ?
14:14:11 <trown> seems like there are 2 things in that patch
14:14:29 <jkilpatr> fultonj, in theory you can pull in arbitrary repos containing extra roles for quickstart.
14:14:33 <mwhahaha> it's a dev/qe thing for the most part
14:14:51 <trown> there is an update role, and a disruption role...
14:15:04 <mwhahaha> customers want no disruptions but we need to be able to test that is true
14:15:09 <jkilpatr> trown, yes there was no role for trying the various stack settings change possibilities. I can split that out if you like?
14:15:12 <jkilpatr> it's a semantics issue.
14:15:28 <mwhahaha> trown: panda|ruck, please take the comments to the review
14:15:53 <trown> sure
14:15:55 <panda|ruck> ok
14:16:05 <mwhahaha> ok moving on to squad status
14:16:09 <trown> seems like a good idea... we just dont have wall time for even upgrades atm
14:16:18 <trown> let alone multi upgrades with special code
14:16:28 <mwhahaha> it could be a periodic
14:16:34 <mwhahaha> but anyway
14:16:37 <mwhahaha> #topic Squad status
14:16:37 <mwhahaha> ci
14:16:37 <mwhahaha> #link https://etherpad.openstack.org/p/tripleo-ci-squad-scrum
14:16:37 <mwhahaha> upgrade
14:16:38 <mwhahaha> #link https://etherpad.openstack.org/p/tripleo-upgrade-squad-status
14:16:38 <mwhahaha> containers
14:16:38 <mwhahaha> #link https://etherpad.openstack.org/p/tripleo-containers-squad-status
14:16:38 <mwhahaha> integration
14:16:38 <mwhahaha> #link https://etherpad.openstack.org/p/tripleo-integration-squad-status
14:16:39 <mwhahaha> ui/cli
14:16:39 <mwhahaha> #link https://etherpad.openstack.org/p/tripleo-ui-cli-squad-status
14:16:40 <mwhahaha> validations
14:16:40 <mwhahaha> #link https://etherpad.openstack.org/p/tripleo-validations-squad-status
14:16:41 <mwhahaha> networking
14:16:41 <mwhahaha> #link https://etherpad.openstack.org/p/tripleo-networking-squad-status
14:17:10 <mwhahaha> jrist: ui/cli status missing updates
14:17:26 <jrist> sorry, will update
14:17:56 <mwhahaha> workflows - thrash|biab please provide a status when you have a chance
14:18:15 <mwhahaha> everyone else it looks like statuses have been updated, please take a look and review
14:18:48 <openstackgerrit> Sagi Shnaidman proposed openstack/tripleo-quickstart-extras master: Send ARA statistics to Graphite server  https://review.openstack.org/479882
14:18:57 <mwhahaha> moving on
14:19:00 <mwhahaha> #topic bugs & blueprints
14:19:00 <mwhahaha> #link https://launchpad.net/tripleo/+milestone/queens-2
14:19:00 <mwhahaha> For Queens we currently have 70 (+1) blueprints and about 521 (+24) open bugs. 254 queens-2 and 267 queens-3.
14:19:16 <mwhahaha> so it seems last week hasn't been a good week for bugs as we're +24
14:19:37 <mwhahaha> also CI has been hosed for some time
14:20:29 <mwhahaha> the queue keeps failing with timeouts so we need to work on critical bugs
14:20:46 <mwhahaha> any other bug related items?
14:21:41 <mwhahaha> sounds like nope
14:21:42 <mwhahaha> #topic projects releases or stable backports
14:21:56 <mwhahaha> Any stable backports people need eyes on?
14:22:11 <mwhahaha> I think EmilienM is working through some release issues if they haven't already been resolved
14:22:30 <mwhahaha> I believe we'll be lining up another release for stable stuff next week depending on the status of previous ones
14:24:08 <mwhahaha> moving on
14:24:12 <mwhahaha> #topic specs
14:24:13 <mwhahaha> #link https://review.openstack.org/#/q/project:openstack/tripleo-specs+status:open
14:25:37 <mwhahaha> please take some time to review the open specs and if you have an open one with comments, please update it
14:26:11 <mwhahaha> dsneddon, lhinds, ccamacho: looks like you folks have some open specs with -1s
14:27:01 <mwhahaha> moving on
14:27:03 <mwhahaha> #topic open discussion
14:27:05 <ccamacho> mwhahaha yeah Ill ask you some questions after the meeting thanks
14:27:41 <mwhahaha> so CI...
14:28:13 <mwhahaha> adarazs|rover, panda|ruck: do we have updates on where we're at with CI and why we're running into so many timeouts?
14:28:47 <adarazs|rover> mwhahaha: nope. I just opened the bug for it, as I didn't find anything more enlightening after looking around in the logs.
14:29:02 <adarazs|rover> here's the bug: https://bugs.launchpad.net/tripleo/+bug/1730671
14:29:03 <openstack> Launchpad bug 1730671 in tripleo "overcloud installation times out without useful errors in the logs" [Critical,Triaged]
14:29:21 <mwhahaha> ok so the gate is 24+ hours behind and it seems that we're hitting a timeout consistently on some jobs
14:30:11 <mwhahaha> i'll take a look at that bug and see if i can add some more information
14:30:43 <adarazs|rover> mwhahaha: thanks!
14:31:07 <panda|ruck> I really need to understand why the zuul queue can grow indefinitely. Chasing gate failures is something we certainly have to do, but it looks like a bug that we add and add changes to a queue, which grows the possibility of a job failing and resetting everything again
14:31:08 <mwhahaha> so at this point we're pretty much blocked again in the gate unless we can figure out what's timing out
14:31:51 <mwhahaha> panda|ruck: it grows because people are approving stuff or rechecking. and when a reset occurs that's another 2+ hours of not advancing
14:32:19 <mwhahaha> we probably need a stop order on all approvals until we figure out what is causing the timeouts
14:32:46 <mwhahaha> unless a patch is going to fix a known blocker bug I think we need to -2 for now
14:32:47 <panda|ruck> mwhahaha: well, zuul could just say "i'll take the first five and change only them together, so the risk of getting a reset is lower"
14:33:15 <Tengu> hmm there's also the random issue with the volumes
14:33:37 <panda|ruck> this kind of optimization zuul is doing, doesn't work very well with our rate of failures
14:33:51 <mwhahaha> well the argument is that we shouldn't have this rate of failures
14:33:54 <mwhahaha> cause we shouldn't
14:34:27 <mwhahaha> so anyway i'm going to send a note about not approving anything else and we may have to clear the queue to right this
14:34:31 <adarazs|rover> I mean if you have a 99% chance of passing and you try to run it 20 times you're already at only an 80% chance of passing.
14:34:41 <adarazs|rover> and 99% is quite ideal.
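The arithmetic behind the estimate above can be sketched as follows, assuming every run passes independently with the same probability (an assumption made for illustration; real gate failures are often correlated). The function name `chance_all_pass` is hypothetical and not part of any TripleO or Zuul tooling.

```python
def chance_all_pass(per_run_pass_rate: float, runs: int) -> float:
    """Probability that `runs` independent runs all pass.

    Assumes each run passes with the same probability and that
    runs do not influence one another.
    """
    if not 0.0 <= per_run_pass_rate <= 1.0:
        raise ValueError("per_run_pass_rate must be between 0 and 1")
    if runs < 0:
        raise ValueError("runs must be non-negative")
    return per_run_pass_rate ** runs


if __name__ == "__main__":
    # Show how quickly the combined pass chance drops as runs pile up.
    for runs in (1, 5, 20, 50):
        print(f"{runs:>3} runs at 99%: {chance_all_pass(0.99, runs):.1%}")
```

For 20 runs at a 99% pass rate this gives roughly 82%, in line with the 80% figure quoted above.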
14:35:08 <mwhahaha> #action mwhahaha send a note about CI to ML and proposing no more merging of items not specifically critical CI bugs
14:35:08 <panda|ruck> we can keep the rate low, putting a lot of effort, but I don't think the situation is going to improve in the long run
14:35:09 <jaosorior> Tengu: got logs for it?
14:35:26 <Tengu> jaosorior: there's two issues about that already. 2s, getting logs.
14:36:15 <Tengu> jaosorior: Launchpad bug 1729253 in tripleo "CI issue: Failed to run cinder task ScheduleCreateVolumeTask, No valid backend was found" and "Launchpad bug 1730111 in tripleo "Volume service hostgroup@tripleo_iscsi failed to start.: CappedVersionUnknown: Unrecoverable Error"" - I regularly hit the first one on my apache review.
14:36:17 <openstack> Launchpad bug 1729253 in tripleo "CI issue: Failed to run cinder task ScheduleCreateVolumeTask, No valid backend was found" [Critical,In progress] https://launchpad.net/bugs/1729253 - Assigned to Martin André (mandre)
14:36:18 <openstack> Launchpad bug 1730111 in tripleo "Volume service hostgroup@tripleo_iscsi failed to start.: CappedVersionUnknown: Unrecoverable Error" [Critical,Triaged] https://launchpad.net/bugs/1730111
14:36:41 <ccamacho> mwhahaha what about patches/bug fixes already +2 but waiting for merge?
14:36:52 <ccamacho> I mean +A
14:36:59 <mwhahaha> ccamacho: unless it's a CI bug no we shouldn't recheck
14:37:03 <ccamacho> ack
14:37:29 <mwhahaha> ok anything else?
14:40:34 <mwhahaha> sounds like nope
14:40:36 <mwhahaha> thanks everyone
14:40:38 <mwhahaha> #endmeeting