14:00:11 <mwhahaha> #startmeeting tripleo
14:00:11 <mwhahaha> #topic agenda
14:00:12 <mwhahaha> * Review past action items
14:00:12 <mwhahaha> * One off agenda items
14:00:12 <mwhahaha> * Squad status
14:00:12 <mwhahaha> * Bugs & Blueprints
14:00:12 <openstack> Meeting started Tue Oct 31 14:00:11 2017 UTC and is due to finish in 60 minutes.  The chair is mwhahaha. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:12 <mwhahaha> * Projects releases or stable backports
14:00:13 <mwhahaha> * Specs
14:00:13 <mwhahaha> * open discussion
14:00:13 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:14 <mwhahaha> Anyone can use the #link, #action and #info commands, not just the moderator!
14:00:14 <mwhahaha> Hi everyone! who is around today?
14:00:16 <openstack> The meeting name has been set to 'tripleo'
14:00:19 <fultonj> o/
14:00:32 <akrivoka> o/
14:00:41 <ccamacho> \o/
14:00:47 <rtailor> shardy, even rabbitmq is not running.
14:00:54 <jtomasek> o/
14:00:55 <beagles> o-
14:00:58 <atoth> o/
14:01:02 <EmilienM> o/
14:01:05 <dpeacock> o/
14:01:06 <jfrancoa> o/
14:01:15 <d0ugal> o/
14:01:17 <panda|ruck> o/
14:01:18 <owalsh> o/
14:01:47 <trown> o/
14:01:53 <matbu> o/
14:01:58 <gfidente> o/
14:02:09 <chem> o/
14:02:34 <jpich> o/
14:02:48 <saneax> o/
14:02:54 <mwhahaha> #topic review past action items
14:03:00 <mwhahaha> EmilienM to prepare an etherpad for tripleo onboarding session in Sydney
14:03:08 <EmilienM> oops
14:03:14 <EmilienM> let me prepare it today
14:03:15 <mwhahaha> #action EmilienM to prepare an etherpad for tripleo onboarding session in Sydney
14:03:15 <sshnaidm> o/
14:03:26 <mwhahaha> (postponed) merge https://review.openstack.org/#/c/512569 (gfidente)
14:03:30 <jrist> o/
14:03:34 <mwhahaha> not merged yet
14:03:41 <EmilienM> #link https://etherpad.openstack.org/p/SYD-forum-tripleo-onboarding
14:03:51 <mwhahaha> but +A so it should merge today, i'll keep an eye on it
14:03:59 <gfidente> mwhahaha thanks
14:04:01 <gfidente> me too
14:04:02 <mwhahaha> team to review https://review.openstack.org/#/c/511509/ asap
14:04:13 <mwhahaha> is pending merge
14:04:20 <mwhahaha> (still)
14:04:42 <mwhahaha> ugh the ci gate queue is 20 hours+
14:04:45 * mwhahaha makes note for later
14:04:46 <rtailor> owalsh, you around ?
14:04:52 <mwhahaha> (postponed) marios and matbu to provide an upgrade workflow status
14:05:01 <mwhahaha> marios, matbu: update?
14:05:30 <shardy> o/ sorry late
14:05:33 <marios> mwhahaha: o/ no, we discussed the overall workflow last week
14:05:36 <matbu> mwhahaha: hey, so we were at an upgrade meetup last week
14:06:01 <owalsh> rtailor: yea, what's up?
14:06:06 <matbu> mwhahaha: we have a lot of action items to be addressed, i think next week we can have some good input to share with the team
14:06:10 <marios> mwhahaha: to be honest i'm not sure what the task is for.. is this for tracking major upgrade workflow tasks (in which case this is a good start https://bugs.launchpad.net/tripleo/+bug/1728905 )
14:06:11 <openstack> Launchpad bug 1728905 in tripleo "RFE: Feature parity for CLI and UI via mistral/tripleo-common for Queens major upgrade" [High,Triaged] - Assigned to Marios Andreou (marios-b)
14:06:53 <openstackgerrit> Jose Luis Franco proposed openstack/tripleo-heat-templates master: Gate test - do not merged  https://review.openstack.org/494593
14:06:54 <mwhahaha> marios: the original ask was to provide current status of workflow since we wanted to make sure that it's established for folks to start helping out with ASAP
14:07:08 <rtailor> owalsh, The tls stuff, I couldn't deploy the overcloud with my changes, it's failing every time due to a timeout issue, and I couldn't get any service running on overcloud-controller
14:07:26 <chem> mwhahaha: we have some documentation coming about the expected workflow, a dev doc kind of thing
14:07:41 <rtailor> owalsh, rabbitmq is also not running, and on the overcloud compute I could see rabbitmq not reachable error messages.
14:07:41 <mwhahaha> k so maybe next week provide a summary for everyone
14:07:54 <marios> mwhahaha: ack ok this makes sense... and as others are pointing out it was one of the main things that came up last week too so we'll have action on it soon. will likely be a combination of things, docs for starters
14:08:17 <chem> mwhahaha: how to for main course
14:08:23 <chem> mwhahaha: and ci for dessert
14:08:26 <mwhahaha> #action marios, matbu, chem provide doc/status of upgrade workflow
14:08:30 <marios> chem: :D
14:08:30 <rtailor> owalsh, If you want to look at my setup, which I tried to deploy today.
14:08:32 <mwhahaha> k we'll revisit next week :)
14:08:44 <matbu> mwhahaha: thx :)
14:08:46 <mwhahaha> (postponed) document that upgrades on baremetal aren't tested in Pike and Queens - I think done in https://review.openstack.org/#/c/496223/
14:08:48 <owalsh> rtailor: yea, I can take a look
14:09:24 <mwhahaha> ok that's all on the past action items
14:09:32 <mwhahaha> #topic one off agenda items
14:09:32 <mwhahaha> #link https://etherpad.openstack.org/p/tripleo-meeting-items
14:09:38 <mwhahaha> (gfidente) is anybody else interested into deploying multiple instances of a service configured differently on different nodes?
14:09:44 <mwhahaha> #link https://blueprints.launchpad.net/tripleo/+spec/deploy-multiple-ceph-clusters
14:09:55 <gfidente> mwhahaha so let me introduce a bit better the topic
14:10:00 <shardy> we discussed this earlier, and there are similarities to the requirements for spine/leaf deployments
14:10:08 <EmilienM> ah, we had this discussion when we started containers
14:10:41 <mwhahaha> I ran across a similar request for multiple rabbitmq clusters
14:10:48 <gfidente> for ceph this ends up being a way of deploying multiple instances of the ceph services, with different configurations
14:10:58 <mwhahaha> ie for notification/messaging
14:11:04 <shardy> Yeah, also e.g per service DB instances to enable scaling beyond a single DB cluster
14:11:05 <gfidente> I would like to team up with somebody else so that we can come up with a better implementation and idea
14:11:10 <EmilienM> can ceph read configuration from something other than a config file? (etcd, etc)
14:11:11 <shardy> there are a few use-cases I think
14:11:28 <gfidente> EmilienM I don't think so
14:11:47 <EmilienM> I also think we can pass multiple --config-file arguments to the OpenStack services
14:11:53 <EmilienM> and there must be some sort of override
14:12:06 <EmilienM> but that's kind of a hack
14:12:06 <gfidente> EmilienM yah that's possible with ceph as well
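(Editor's illustration, not part of the log: the "last config file wins" merging that EmilienM describes for multiple --config-file arguments can be sketched with plain INI parsing; the file contents and option names below are invented, and this mimics the merge semantics rather than using oslo.config itself.)

```python
# Sketch of "last config file wins" override semantics, similar in spirit
# to passing several --config-file arguments to an OpenStack service.
# File contents and option names here are hypothetical.
import configparser
import os
import tempfile

def load_merged(paths):
    """Read INI files in order; options in later files override earlier ones."""
    cfg = configparser.ConfigParser()
    cfg.read(paths)  # configparser applies files left to right, last wins
    return cfg

# Build two throwaway config files to demonstrate the override.
base = tempfile.NamedTemporaryFile("w", suffix=".conf", delete=False)
base.write("[DEFAULT]\ndebug = false\nbind_ip = 10.0.0.1\n")
base.close()
override = tempfile.NamedTemporaryFile("w", suffix=".conf", delete=False)
override.write("[DEFAULT]\nbind_ip = 10.0.0.2\n")
override.close()

cfg = load_merged([base.name, override.name])
print(cfg["DEFAULT"]["debug"])    # kept from the base file
print(cfg["DEFAULT"]["bind_ip"])  # overridden by the second file
os.unlink(base.name)
os.unlink(override.name)
```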
14:12:09 <shardy> well I think role specific parameters already solves part of this
14:12:29 <shardy> but we're missing part for overriding per-node parameters (e.g things like the bind IPs)
14:12:44 <gfidente> shardy and also we make up the list of monitors or osds out of the service name
14:12:54 <gfidente> but in this case, for each cluster, the list should be made up from the node with a given role
14:13:08 <gfidente> not just those hosting the service
14:13:18 <shardy> gfidente: yup, but I think we agreed roles_data tags could be used to subcategorize where a service spans multiple roles?
14:13:41 <gfidente> shardy yeah I haven't looked into how to use those yet, but yes, tags are a good idea
14:13:46 <shardy> but yeah there's a few things we'll need to address, probably need a spec or at least some more prototyping
14:14:12 <shardy> I like the idea of using the dynamic inventory to group the nodes
14:14:17 <gfidente> so I wonder if it might be useful to organize a weekly call with interested people?
14:14:30 <shardy> which should be possible if we go ahead with the split stack model and drive all configuration via ansible
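(Editor's illustration, not part of the log: the dynamic-inventory grouping shardy suggests can be sketched as a script that emits the Ansible dynamic-inventory JSON shape, with one group per role; the node names and role names below are invented.)

```python
# Hypothetical sketch of a dynamic inventory grouping nodes by role,
# in the spirit of driving all configuration via ansible.
# Node and role names are invented for illustration.
import json

nodes = {
    "overcloud-controller-0": "Controller",
    "overcloud-cephstorage-0": "CephStorageA",
    "overcloud-cephstorage-1": "CephStorageA",
    "overcloud-cephstorage-2": "CephStorageB",
}

def build_inventory(nodes):
    """Return an Ansible dynamic-inventory dict with one group per role."""
    inventory = {"_meta": {"hostvars": {}}}
    for host, role in nodes.items():
        inventory.setdefault(role, {"hosts": []})["hosts"].append(host)
    return inventory

inv = build_inventory(nodes)
print(json.dumps(inv, indent=2))
```

A playbook could then target a single cluster's nodes with `hosts: CephStorageA`, keeping each service instance's configuration scoped to its own group.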
14:15:06 <EmilienM> ++
14:15:17 <shardy> gfidente: should we start with an etherpad to enumerate the issues, and then set up a call when we know what there is to discuss?
14:15:27 <shardy> also we can link WIP patches etc for each issue
14:15:29 <gfidente> shardy ok great yes, will do
14:15:32 <EmilienM> shardy: good idea
14:16:15 <fultonj> +1
14:16:15 <gfidente> maybe I can share that during next meeting with some drafts
14:17:42 <mwhahaha> gfidente: do you want an action item reminder or will you do another one-off agenda item?
14:17:50 <gfidente> action item
14:18:18 <mwhahaha> #action gfidente put together issues around multiple service instances
14:18:21 <mwhahaha> k done
14:18:31 <owalsh> gfidente: /me interested too
14:18:33 <mwhahaha> any other one off items anyone want to bring up?
14:18:39 <gfidente> owalsh great
14:19:22 <mwhahaha> moving on to status
14:19:39 <mwhahaha> #topic Squad status
14:19:39 <mwhahaha> ci
14:19:39 <mwhahaha> #link https://etherpad.openstack.org/p/tripleo-ci-squad-scrum
14:19:40 <mwhahaha> upgrade
14:19:40 <mwhahaha> #link https://etherpad.openstack.org/p/tripleo-upgrade-squad-status
14:19:40 <mwhahaha> containers
14:19:40 <mwhahaha> #link https://etherpad.openstack.org/p/tripleo-containers-squad-status
14:19:40 <mwhahaha> integration
14:19:40 <mwhahaha> #link https://etherpad.openstack.org/p/tripleo-integration-squad-status
14:19:41 <mwhahaha> ui/cli
14:19:41 <mwhahaha> #link https://etherpad.openstack.org/p/tripleo-ui-cli-squad-status
14:19:42 <mwhahaha> validations
14:19:42 <mwhahaha> #link https://etherpad.openstack.org/p/tripleo-validations-squad-status
14:19:43 <mwhahaha> networking
14:19:43 <mwhahaha> #link https://etherpad.openstack.org/p/tripleo-networking-squad-status
14:20:12 <mwhahaha> mandre, dprince: looks like containers status is not updated for this week
14:20:26 <mwhahaha> gfidente: integration status is also stale
14:20:40 <EmilienM> to which squad is the config-download work being assigned?
14:20:41 <mandre> jeez, screw the DST
14:20:48 <gfidente> mwhahaha yeah we didn't do much work this week
14:20:50 <mandre> mwhahaha: will do it ASAP
14:20:53 <mwhahaha> gfidente: k
14:20:56 <mwhahaha> mandre: thanks
14:21:12 <mwhahaha> workflows: thrash any pressing issues or status related items?
14:21:33 <gfidente> (we converted scenario004 to ceph-ansible though)
14:21:34 <dprince> mwhahaha: I can send an update to the list today
14:21:38 <dprince> would that work?
14:21:45 <gfidente> (so we now have rgw and mds via ceph-ansible tested in CI)
14:21:46 <mwhahaha> dprince: that or just update the etherpad
14:22:00 <mwhahaha> dprince: your preference :)
14:22:00 <dprince> mwhahaha: ack, perhaps both
14:23:14 <openstackgerrit> Merged openstack/tripleo-docs master: Stop using deprecated options for the prepare cmd  https://review.openstack.org/516388
14:23:40 <mwhahaha> ok moving on
14:23:49 <mwhahaha> #topic bugs & blueprints
14:23:49 <mwhahaha> #link https://launchpad.net/tripleo/+milestone/queens-2
14:23:49 <mwhahaha> For Queens we currently have 69 (+0) blueprints and about 497 (+0) open bugs. 490 queens-2 and 7 queens-3.
14:24:01 <mwhahaha> #action mwhahaha to move medium bugs to queens-3
14:24:09 <mwhahaha> forgot to do that before i went on pto, i'll get to that today
14:24:11 <EmilienM> (I'll follow-up with my unanswered question later)
14:24:26 <EmilienM> mwhahaha: I moved all blueprints to queens-2 FYI (last week)
14:24:27 <mwhahaha> EmilienM: oh that's probably integration
14:24:48 <EmilienM> sounds good
14:25:30 <mwhahaha> any bugs/blueprints that people would like to talk about or highlight?
14:26:28 <mwhahaha> sounds like nope
14:26:30 <mwhahaha> #topic projects releases or stable backports
14:26:35 <EmilienM> so
14:26:48 <EmilienM> we were (and still are?) stuck by some zuul v3 jobs
14:26:51 <EmilienM> to produce tarballs
14:26:58 <EmilienM> it should be sorted out today or tomorrow
14:27:08 <EmilienM> once we have that, I'll re-propose a release for newton/ocata/pike
14:27:19 <EmilienM> queens-1 has been sorted out, and pushed
14:27:32 <EmilienM> so we hope to have new stable releases by end of this week
14:27:41 <EmilienM> second update:
14:28:12 <EmilienM> the topic we discussed about extending the stable policy for installers is paused for now, we'll talk about it during the Summit and try to make progress in person
14:28:20 <EmilienM> I'll keep everyone posted on the mailing list
14:28:25 <EmilienM> mwhahaha: done
14:28:32 <mwhahaha> EmilienM: thanks
14:28:42 <mwhahaha> #topic specs
14:28:42 <mwhahaha> #link https://review.openstack.org/#/q/project:openstack/tripleo-specs+status:open
14:29:03 <mwhahaha> reminder, we should try and get all the specs merged by the end of queens-2
14:29:09 <mwhahaha> I sent a ML note about that a while back
14:29:16 <mwhahaha> please take some time to review the open specs
14:29:34 <mwhahaha> besides that, does anyone want to talk about a specific spec?
14:30:14 <mwhahaha> sounds like nope
14:30:15 <mwhahaha> #topic open discussion
14:30:22 <mwhahaha> soooo where are we at on master promotions
14:30:30 <jrist> thanks everyone with your hard work on ci and promotions
14:30:37 <EmilienM> afaiu we have some issues with the container build job
14:30:44 <EmilienM> (in RDO CI)
14:30:55 <EmilienM> but other than that, all jobs seem to pass well
14:31:15 <mwhahaha> do we have bugs created for the container build failures?
14:31:18 <EmilienM> adarazs|rover, panda|ruck can give more details
14:31:31 <EmilienM> amoralej and mandre as well
14:32:00 <amoralej> EmilienM, https://bugs.launchpad.net/kolla/+bug/1728915
14:32:01 <openstack> Launchpad bug 1728915 in kolla "kolla tries to install python2-cffi but latest version is python-cffi in base" [Undecided,In progress] - Assigned to Alfredo Moralejo (amoralej)
14:32:11 <mandre> review at https://review.openstack.org/#/c/516643/
14:32:21 <mandre> it's about to merge
14:32:39 <amoralej> mandre, oracle jobs timed out, i had to recheck
14:32:40 <jfrancoa> EmilienM: I just wanted to raise a concern, as the upgrade jobs seem to be passing but that's because they are not running the upgrade step https://bugs.launchpad.net/tripleo/+bug/1728917
14:32:41 <openstack> Launchpad bug 1728917 in tripleo "Upgrades CI jobs in master not performing upgrade" [Undecided,In progress] - Assigned to Jose Luis Franco (jfrancoa)
14:32:43 <EmilienM> mandre: nice. Is that the only known blocker?
14:32:54 <EmilienM> jfrancoa: on master?
14:32:58 <jfrancoa> EmilienM: yes
14:32:58 <EmilienM> jfrancoa: I wouldn't look at them now
14:33:00 <amoralej> EmilienM, yes, the only known one
14:33:05 <EmilienM> jfrancoa: containers are from Pike
14:33:13 <EmilienM> jfrancoa: we don't even have Queens containers yet
14:33:17 <amoralej> but one of the ovb jobs in the last periodic run seems to hang in deploy overcloud
14:33:24 <EmilienM> jfrancoa: I would wait until we have a promotion and then try out the upgrades again
14:33:25 <jfrancoa> EmilienM: ok, so it's not so worrying then
14:33:30 <EmilienM> jfrancoa: nope.
14:33:42 <EmilienM> amoralej: any logs?
14:33:57 <amoralej> EmilienM, still running https://review.rdoproject.org/jenkins/job/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-master/282/console
14:34:12 <amoralej> it's more than 3hr in that step
14:34:19 <EmilienM> mhh ok
14:34:28 <amoralej> the other two ovb passed
14:34:34 <EmilienM> could it be i/o in the VM?
14:34:36 <amoralej> which looks like a good signal
14:35:06 <amoralej> and there is an intermittent issue with pushing containers
14:35:28 <amoralej> last containers build job in pike failed with it, and we've hit it in master in previous runs also
14:36:49 <mwhahaha> ok so sounds like we're just waiting on some CI and we'll keep an eye on that
14:36:53 <EmilienM> amoralej: the one where it fails to push on localhost?
14:37:13 <amoralej> yeah
14:37:22 <EmilienM> have we checked i/o on the machine?
14:37:41 <EmilienM> pushing on a container on localhost shouldn't be a problem really
14:37:48 <EmilienM> unless the VM is just dying
14:38:23 <amoralej> let me check if we have metrics
14:38:34 <EmilienM> we can check that offline
14:39:05 <EmilienM> on #rdo probably
14:39:09 <mwhahaha> so the other thing to talk about, what is the current status of the gate pipeline? do we know why we consistently keep resetting the gate?
14:39:46 * EmilienM hasn't checked yet this morning - looks at adarazs|rover / panda|ruck
14:40:01 <openstackgerrit> Jose Luis Franco proposed openstack/tripleo-heat-templates stable/pike: Gate test - do not merge  https://review.openstack.org/516700
14:40:06 <adarazs|rover> mwhahaha: I was wondering about that too, I was looking at one gate yesterday and it doesn't make much sense what zuul does with the gate jobs, killing them after 2 hours and starting them again.
14:40:07 <mwhahaha> it was 20hours earlier in the meeting now everything says 0 so i wonder if they reset zuul
14:40:23 <mwhahaha> adarazs|rover: usually that happens if a job fails in the queue, the queue gets reset
14:40:29 <EmilienM> adarazs|rover: yes it does make sense, we talked about it last week
14:40:44 <mwhahaha> do we have a way to track & research all gate failures?
14:40:45 <EmilienM> zuul makes sure patches about to merge don't break another patch also about to merge
14:41:04 <EmilienM> that's why we have the shared queue
14:41:11 <EmilienM> and that's why we need to look at every failure
14:41:24 <EmilienM> 1) make sure they are reported in Launchpad
14:41:26 <mwhahaha> i've asked infra and they don't have anything, but do we have that information available to us via the ci status pages or the extend status tool?
14:41:29 <EmilienM> 2) make sure they are in elastic recheck
14:41:53 <adarazs|rover> EmilienM: ah, okay, I didn't follow that discussion.
14:41:56 <EmilienM> we should really do 1) and 2) before any recheck
14:42:05 <EmilienM> adarazs|rover: it's how zuul works
14:42:05 <openstackgerrit> Gael Chamoulaud proposed openstack/tripleo-quickstart-extras master: Add pre-deployment negative tests for validations  https://review.openstack.org/488495
14:42:10 <EmilienM> adarazs|rover: I can find the doc if you need
14:43:31 <EmilienM> adarazs|rover: https://docs.openstack.org/infra/zuul/user/gating.html
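(Editor's illustration, not part of the log: the gate behavior being discussed, where each change is tested on top of the changes ahead of it and a failure restarts the jobs behind it, can be modeled with a toy queue; this is a simplified sketch of a dependent pipeline, not Zuul code.)

```python
# Toy model of a dependent (gate) pipeline: each change is tested on top of
# the changes ahead of it, and a failing change resets the jobs behind it.
# Editor's simplification of the behavior discussed above, not Zuul itself.

def gate(queue, passes):
    """queue: ordered change ids; passes: change -> bool (its jobs succeed).
    Returns (merged, resets), where resets counts restarted test runs."""
    merged, resets = [], 0
    i = 0
    while i < len(queue):
        change = queue[i]
        if passes(change):
            merged.append(change)
            i += 1
        else:
            # The failing change is evicted; everything behind it was tested
            # against a state that will never merge, so those jobs restart.
            resets += len(queue) - i - 1
            queue.pop(i)
    return merged, resets

merged, resets = gate(["A", "B", "C", "D"], lambda c: c != "B")
print(merged, resets)  # B is evicted; C and D have to restart their jobs
```

This is why one flaky failure deep in a long shared queue is so expensive: every change queued behind it re-runs its full job set, which is how a 20-hour gate backlog builds up.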
14:43:46 <mwhahaha> adarazs|rover, panda|ruck: so one of the things I would like to see from the CI squad is the review/metrics around these gate failures. Do you think you folks can look into getting this info?
14:44:09 <adarazs|rover> EmilienM: I think I just haven't looked closely at how it works until now. :)
14:44:14 <adarazs|rover> EmilienM: thanks for the docs.
14:44:23 <EmilienM> cool!
14:45:25 <openstackgerrit> Gael Chamoulaud proposed openstack/tripleo-quickstart-extras master: Add post-deployment negative tests for validations  https://review.openstack.org/504014
14:46:30 <openstackgerrit> Gael Chamoulaud proposed openstack/tripleo-quickstart-extras master: Add timestamper_cmd for tripleo-validations  https://review.openstack.org/490013
14:46:35 <adarazs|rover> mwhahaha: hm, so you want some stats on why gate jobs fail; I don't think we have anything like that as they are the same jobs as the check jobs, so I think their stats get merged in with the rest of the jobs on SOVA.
14:46:57 <adarazs|rover> mwhahaha: we would need to name them differently to discern the failures of gates vs. check jobs.
14:47:13 <mwhahaha> adarazs|rover: I assume this info is in SOVA but yes we need to surface the gate failures
14:47:16 <openstackgerrit> Gael Chamoulaud proposed openstack/tripleo-quickstart-extras master: Fix ansible-lint.sh to check playbooks  https://review.openstack.org/446525
14:47:21 <EmilienM> adarazs|rover: I think what is really needed *now* is to track failures in *gate* and report them correctly in Launchpad and elastic recheck (queries)
14:47:23 <mwhahaha> adarazs|rover: it's very important that we actively look at those immediately
14:47:53 <mwhahaha> #action ci squad to start gathering gate failure metrics and information
14:49:11 <mwhahaha> any other topics or things folks would like to discuss?
14:50:27 <mwhahaha> sounds like nope
14:50:30 <mwhahaha> thanks everyone
14:50:31 <mwhahaha> #endmeeting