14:00:11 #startmeeting tripleo
14:00:11 #topic agenda
14:00:12 * Review past action items
14:00:12 * One off agenda items
14:00:12 * Squad status
14:00:12 * Bugs & Blueprints
14:00:12 Meeting started Tue Oct 31 14:00:11 2017 UTC and is due to finish in 60 minutes. The chair is mwhahaha. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:12 * Projects releases or stable backports
14:00:13 * Specs
14:00:13 * open discussion
14:00:13 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:14 Anyone can use the #link, #action and #info commands, not just the moderator!
14:00:14 Hi everyone! who is around today?
14:00:16 The meeting name has been set to 'tripleo'
14:00:19 o/
14:00:32 o/
14:00:41 \o/
14:00:47 shardy, even rabbitmq is also not running.
14:00:54 o/
14:00:55 o-
14:00:58 o/
14:01:02 o/
14:01:05 o/
14:01:06 o/
14:01:15 o/
14:01:17 o/
14:01:18 o/
14:01:47 o/
14:01:53 o/
14:01:58 o/
14:02:09 o/
14:02:34 o/
14:02:48 o/
14:02:54 #topic review past action items
14:03:00 EmilienM to prepare an etherpad for tripleo onboarding session in Sydney
14:03:08 oops
14:03:14 let me prepare it today
14:03:15 #action EmilienM to prepare an etherpad for tripleo onboarding session in Sydney
14:03:15 o/
14:03:26 (postponed) merge https://review.openstack.org/#/c/512569 (gfidente)
14:03:30 o/
14:03:34 not merged yet
14:03:41 #link https://etherpad.openstack.org/p/SYD-forum-tripleo-onboarding
14:03:51 but +A so it should merge today, i'll keep an eye on it
14:03:59 mwhahaha thanks
14:04:01 me too
14:04:02 team to review https://review.openstack.org/#/c/511509/ asap
14:04:13 is pending merge
14:04:20 (still)
14:04:42 ugh the ci gate queue is 20 hours+
14:04:45 * mwhahaha makes note for later
14:04:46 owalsh, you around ?
14:04:52 (postponed) marios and matbu to provide an upgrade workflow status
14:05:01 marios, matbu: update?
14:05:30 o/ sorry late
14:05:33 mwhahaha: o/ no, we discussed the overall workflow last week
14:05:36 mwhahaha: hey, so we have been in an upgrade meetup last week
14:06:01 rtailor: yea, what's up?
14:06:06 mwhahaha: we have a lot of action items to address, i think the next week we can have some good input to share with the team
14:06:10 mwhahaha: to be honest i'm not sure what the task is for.. is this for tracking major upgrade workflow tasks (in which case this is a good start https://bugs.launchpad.net/tripleo/+bug/1728905 )
14:06:11 Launchpad bug 1728905 in tripleo "RFE: Feature parity for CLI and UI via mistral/tripleo-common for Queens major upgrade" [High,Triaged] - Assigned to Marios Andreou (marios-b)
14:06:53 Jose Luis Franco proposed openstack/tripleo-heat-templates master: Gate test - do not merged https://review.openstack.org/494593
14:06:54 marios: the original ask was to provide current status of the workflow since we wanted to make sure that it's established for folks to start helping out with ASAP
14:07:08 owalsh, The tls stuff, I couldn't deploy overcloud with my changes, it's failing every time due to a timeout issue, and I couldn't even get any service running on overcloud-controller
14:07:26 mwhahaha: we have some documentation coming about the expected workflow, a dev doc kind of thing
14:07:41 owalsh, rabbitmq is also not running, and on overcloud compute I could see rabbitmq not reachable error messages.
14:07:41 k so maybe next week provide a summary for everyone
14:07:54 mwhahaha: ack ok this makes sense... and as others are pointing out it was one of the main things that came up last week too so we'll have action on it soon. will likely be a combination of things, docs for starters
14:08:17 mwhahaha: how-to for main course
14:08:23 mwhahaha: and ci for dessert
14:08:26 #action marios, matbu, chem provide doc/status of upgrade workflow
14:08:30 chem: :D
14:08:30 owalsh, If you want, you can look at my setup, which I tried to deploy today.
14:08:32 k we'll revisit next week :)
14:08:44 mwhahaha: thx :)
14:08:46 (postponed) document that upgrades on baremetal aren't tested in Pike and Queens - I think done in https://review.openstack.org/#/c/496223/
14:08:48 rtailor: yea, I can take a look
14:09:24 ok that's all on the past action items
14:09:32 #topic one off agenda items
14:09:32 #link https://etherpad.openstack.org/p/tripleo-meeting-items
14:09:38 (gfidente) is anybody else interested in deploying multiple instances of a service configured differently on different nodes?
14:09:44 #link https://blueprints.launchpad.net/tripleo/+spec/deploy-multiple-ceph-clusters
14:09:55 mwhahaha so let me introduce the topic a bit better
14:10:00 we discussed this earlier, and there are similarities to the requirements for spine/leaf deployments
14:10:08 ah, we had this discussion when we started containers
14:10:41 I ran across a similar request for multiple rabbitmq clusters
14:10:48 for ceph this ends up being a way of deploying multiple instances of the ceph services, with different configurations
14:10:58 ie for notification/messaging
14:11:04 Yeah, also e.g. per-service DB instances to enable scaling beyond a single DB cluster
14:11:05 I would like to team up with somebody else so that we can come up with a better implementation and idea
14:11:10 can ceph run configuration from something else than a config file? (etcd, etc)
14:11:11 there are a few use-cases I think
14:11:28 EmilienM I don't think so
14:11:47 I also think we can pass multiple --config-file arguments to the OpenStack services
14:11:53 and there must be some sort of override
14:12:06 but that's kind of a hack
14:12:06 EmilienM yah that's possible with ceph as well
14:12:09 well I think role-specific parameters already solve part of this
14:12:29 but we're missing the part for overriding per-node parameters (e.g. things like the bind IPs)
14:12:44 shardy and also we make up the list of monitors or osds out of the service name
14:12:54 but in this case, for each cluster, the list should be made up from the nodes with a given role
14:13:08 not just those hosting the service
14:13:18 gfidente: yup, but I think we agreed roles_data tags could be used to subcategorize where a service spans multiple roles?
14:13:41 shardy yeah I haven't looked into how to use those yet, but yes, tags are a good idea
14:13:46 but yeah there are a few things we'll need to address, probably need a spec or at least some more prototyping
14:14:12 I like the idea of using the dynamic inventory to group the nodes
14:14:17 so I wonder if it might be useful to organize a weekly call with interested people?
14:14:30 which should be possible if we go ahead with the split stack model and drive all configuration via ansible
14:15:06 ++
14:15:17 gfidente: should we start with an etherpad to enumerate the issues, and then set up a call when we know what there is to discuss?
14:15:27 also we can link WIP patches etc for each issue
14:15:29 shardy ok great yes, will do
14:15:32 shardy: good idea
14:16:15 +1
14:16:21 maybe I can share that during next meeting with some drafts
14:17:42 gfidente: do you want an action item reminder or will you do another one-off agenda item?
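An aside on the multiple `--config-file` idea discussed above: OpenStack services built on oslo.config accept the flag repeatedly, with values from later files overriding earlier ones. A minimal sketch of that last-file-wins semantics, using the standard-library configparser as a stand-in (the file contents here are purely illustrative, not real TripleO configuration):

```python
# Sketch of last-file-wins config layering, as with repeated
# --config-file arguments: a base file plus a per-node override.
import configparser
import os
import tempfile

# Hypothetical base config shared by all nodes of a role.
base = tempfile.NamedTemporaryFile("w", suffix=".conf", delete=False)
base.write("[DEFAULT]\ndebug = False\nbind_ip = 10.0.0.1\n")
base.close()

# Hypothetical per-node override (e.g. the bind IP mentioned above).
override = tempfile.NamedTemporaryFile("w", suffix=".conf", delete=False)
override.write("[DEFAULT]\nbind_ip = 10.0.0.2\n")
override.close()

conf = configparser.ConfigParser()
conf.read([base.name, override.name])  # later files override earlier ones

debug = conf.get("DEFAULT", "debug")      # "False" (only set in base)
bind_ip = conf.get("DEFAULT", "bind_ip")  # "10.0.0.2" (override wins)

os.unlink(base.name)
os.unlink(override.name)
```

The override file only needs to carry the keys that differ per node, which is what makes this attractive (if hacky) for per-node parameters.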
14:17:50 action item
14:18:18 #action gfidente put together issues around multiple service instances
14:18:21 k done
14:18:31 gfidente: /me interested too
14:18:33 any other one off items anyone want to bring up?
14:18:39 owalsh great
14:19:22 moving on to status
14:19:39 #topic Squad status
14:19:39 ci
14:19:39 #link https://etherpad.openstack.org/p/tripleo-ci-squad-scrum
14:19:40 upgrade
14:19:40 #link https://etherpad.openstack.org/p/tripleo-upgrade-squad-status
14:19:40 containers
14:19:40 #link https://etherpad.openstack.org/p/tripleo-containers-squad-status
14:19:40 integration
14:19:40 #link https://etherpad.openstack.org/p/tripleo-integration-squad-status
14:19:41 ui/cli
14:19:41 #link https://etherpad.openstack.org/p/tripleo-ui-cli-squad-status
14:19:42 validations
14:19:42 #link https://etherpad.openstack.org/p/tripleo-validations-squad-status
14:19:43 networking
14:19:43 #link https://etherpad.openstack.org/p/tripleo-networking-squad-status
14:20:12 mandre, dprince: looks like containers status is not updated for this week
14:20:26 gfidente: integration status is also stale
14:20:40 to which squad is the config-download work being assigned?
14:20:41 jeez, screw the DST
14:20:48 mwhahaha yeah we didn't do much work this week
14:20:50 mwhahaha: will do it ASAP
14:20:53 gfidente: k
14:20:56 mandre: thanks
14:21:12 workflows: thrash any pressing issues or status related items?
14:21:33 (we converted scenario004 to ceph-ansible though)
14:21:34 mwhahaha: I can send an update to the list today
14:21:38 would that work?
14:21:45 (so we now have rgw and mds via ceph-ansible tested in CI)
14:21:46 dprince: that or just update the etherpad
14:22:00 dprince: your preference :)
14:22:00 mwhahaha: ack, perhaps both
14:23:14 Merged openstack/tripleo-docs master: Stop using deprecated options for the prepare cmd https://review.openstack.org/516388
14:23:40 ok moving on
14:23:49 #topic bugs & blueprints
14:23:49 #link https://launchpad.net/tripleo/+milestone/queens-2
14:23:49 For Queens we currently have 69 (+0) blueprints and about 497 (+0) open bugs. 490 queens-2 and 7 queens-3.
14:24:01 #action mwhahaha to move medium bugs to queens-3
14:24:09 forgot to do that before i went on pto, i'll get to that today
14:24:11 (I'll follow-up with my unanswered question later)
14:24:26 mwhahaha: I moved all blueprints to queens-2 FYI (last week)
14:24:27 EmilienM: oh that's probably integration
14:24:48 sounds good
14:25:30 any bugs/blueprints that people would like to talk about or highlight?
14:26:28 sounds like nope
14:26:30 #topic projects releases or stable backports
14:26:35 so
14:26:48 we were (and still are?) stuck by some zuul v3 jobs
14:26:51 to produce tarballs
14:26:58 it should be sorted out today or tomorrow
14:27:08 once we have that, I'll re-propose a release for newton/ocata/pike
14:27:19 queens-1 has been sorted out, and pushed
14:27:32 so we hope to have new stable releases by end of this week
14:27:41 second update:
14:28:12 the topic we've discussed about extending the stable-policy for installers is paused now, we'll discuss that during the Summit and try to make progress in person
14:28:20 I'll keep everyone posted on the mailing list
14:28:25 mwhahaha: done
14:28:32 EmilienM: thanks
14:28:42 #topic specs
14:28:42 #link https://review.openstack.org/#/q/project:openstack/tripleo-specs+status:open
14:29:03 reminder, we should try and get all the specs merged by the end of queens-2
14:29:09 I sent a ML note about that a while back
14:29:16 please take some time to review the open specs
14:29:34 besides that, does anyone want to talk about a specific spec?
14:30:14 sounds like nope
14:30:15 #topic open discussion
14:30:22 soooo where are we at on master promotions
14:30:30 thanks everyone for your hard work on ci and promotions
14:30:37 afaiu we have some issues with the build container job
14:30:44 (in RDO CI)
14:30:55 but other than that, all jobs seem to pass well
14:31:15 do we have bugs created for the container build failures?
14:31:18 adarazs|rover, panda|ruck can give more details
14:31:31 amoralej and mandre as well
14:32:00 EmilienM, https://bugs.launchpad.net/kolla/+bug/1728915
14:32:01 Launchpad bug 1728915 in kolla "kolla tries to install python2-cffi but latest version is python-cffi in base" [Undecided,In progress] - Assigned to Alfredo Moralejo (amoralej)
14:32:11 review at https://review.openstack.org/#/c/516643/
14:32:21 it's about to merge
14:32:39 mandre, oracle jobs timed out, i had to recheck
14:32:40 EmilienM: I just wanted to raise a concern, as the upgrade jobs seem to be passing but that's because they are not running the upgrade step https://bugs.launchpad.net/tripleo/+bug/1728917
14:32:41 Launchpad bug 1728917 in tripleo "Upgrades CI jobs in master not performing upgrade" [Undecided,In progress] - Assigned to Jose Luis Franco (jfrancoa)
14:32:43 mandre: nice. Is that the only known blocker?
14:32:54 jfrancoa: on master?
14:32:58 EmilienM: yes
14:32:58 jfrancoa: I wouldn't look at them now
14:33:00 EmilienM, yes, the only known one
14:33:05 jfrancoa: containers are from Pike
14:33:13 jfrancoa: we don't even have Queens containers yet
14:33:17 but one of the ovb jobs in the last periodic run seems to hang in deploy overcloud
14:33:24 jfrancoa: I would wait until we have a promotion and then try out the upgrades again
14:33:25 EmilienM: ok, so it's not so worrying then
14:33:30 jfrancoa: nope.
14:33:42 amoralej: any logs?
14:33:57 EmilienM, still running https://review.rdoproject.org/jenkins/job/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-master/282/console
14:34:12 it's more than 3hr in that step
14:34:19 mhh ok
14:34:28 the other two ovb passed
14:34:34 could it be i/o in the VM?
14:34:36 which looks like a good signal
14:35:06 and there is an intermittent issue with pushing containers
14:35:28 the last containers build job in pike failed with it, and we've hit it in master in previous runs also
14:36:49 ok so sounds like we're just waiting on some CI and we'll keep an eye on that
14:36:53 amoralej: the one where it fails to push on localhost?
14:37:13 yeah
14:37:22 have we checked i/o on the machine?
14:37:41 pushing a container on localhost shouldn't be a problem really
14:37:48 unless the VM is just dying
14:38:23 let me check if we have metrics
14:38:34 we can check that offline
14:39:05 on #rdo probably
14:39:09 so the other thing to talk about, what is the current status of the gate pipeline? do we know why we consistently keep resetting the gate?
14:39:46 * EmilienM hasn't checked yet this morning - looks at adarazs|rover / panda|ruck
14:40:01 Jose Luis Franco proposed openstack/tripleo-heat-templates stable/pike: Gate test - do not merge https://review.openstack.org/516700
14:40:06 mwhahaha: I was wondering about that too, I was looking at one gate yesterday and it doesn't make much sense what zuul does with the gate jobs, killing them after 2 hours and starting them again.
14:40:07 it was 20 hours earlier in the meeting, now everything says 0, so i wonder if they reset zuul
14:40:23 adarazs|rover: usually that happens if a job fails in the queue, the queue gets reset
14:40:29 adarazs|rover: yes it does make sense, we talked about it last week
14:40:44 do we have a way to track & research all gate failures?
14:40:45 zuul makes sure patches about to merge don't break another patch also about to merge
14:41:04 that's why we have the shared queue
14:41:11 and that's why we need to look at every failure
14:41:24 1) make sure they are reported in Launchpad
14:41:26 i've asked infra and they don't have anything, but do we have that information available to us via the ci status pages or the extend status tool?
14:41:29 2) make sure they are in elastic recheck
14:41:53 EmilienM: ah, okay, I didn't follow that discussion.
14:41:56 we should really do 1) and 2) before any recheck
14:42:05 adarazs|rover: it's how zuul works
14:42:05 Gael Chamoulaud proposed openstack/tripleo-quickstart-extras master: Add pre-deployment negative tests for validations https://review.openstack.org/488495
14:42:10 adarazs|rover: I can find the doc if you need
14:43:31 adarazs|rover: https://docs.openstack.org/infra/zuul/user/gating.html
14:43:46 adarazs|rover, panda|ruck: so one of the things I would like to see from the CI squad is the review/metrics around these gate failures. Do you think you folks can look into getting this info
14:44:09 EmilienM: I think I just haven't looked closely at how it works until now. :)
14:44:14 EmilienM: thanks for the docs.
14:44:23 cool!
14:45:25 Gael Chamoulaud proposed openstack/tripleo-quickstart-extras master: Add post-deployment negative tests for validations https://review.openstack.org/504014
14:46:30 Gael Chamoulaud proposed openstack/tripleo-quickstart-extras master: Add timestamper_cmd for tripleo-validations https://review.openstack.org/490013
14:46:35 mwhahaha: hm, so you want some stats on why gate jobs fail; I don't think we have anything like that, as they are the same jobs as the check jobs, so I think their stats get merged in with the rest of the jobs on SOVA.
14:46:57 mwhahaha: we would need to name them differently to discern the failures of gates vs. check jobs.
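For reference, the gate behavior described above (see the linked Zuul gating doc for the authoritative version) is that each queued patch is tested on top of every patch ahead of it in the shared queue; when one fails, it is evicted and everything behind it restarts its jobs, which is what a "gate reset" means. A toy simulation of that mechanism, with made-up names and not Zuul's actual implementation:

```python
# Toy model of a dependent ("gate") pipeline: a failing patch is
# evicted and every patch behind it restarts its jobs (gate reset).
def gate_queue(patches, passes):
    """Return (merged patches, number of job restarts).

    patches: ordered list of patch ids in the shared queue.
    passes:  patch id -> whether its jobs pass when run on top of the
             patches still ahead of it (simplified to a fixed outcome).
    """
    merged, restarts = [], 0
    queue = list(patches)
    while queue:
        head = queue[0]
        if passes[head]:
            merged.append(queue.pop(0))  # head merges, queue advances
        else:
            queue.pop(0)                 # failing patch is evicted...
            restarts += len(queue)       # ...and all patches behind it
                                         # restart their jobs
    return merged, restarts

merged, restarts = gate_queue(
    ["A", "B", "C", "D"],
    {"A": True, "B": False, "C": True, "D": True},
)
# B fails, so C and D restart their jobs before merging behind A.
```

With long job runtimes, each reset multiplies the wall-clock time for everything behind the failure, which is why a 20-hour gate queue follows from even a modest gate failure rate.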
14:47:13 adarazs|rover: I assume this info is in SOVA but yes we need to surface the gate failures
14:47:16 Gael Chamoulaud proposed openstack/tripleo-quickstart-extras master: Fix ansible-lint.sh to check playbooks https://review.openstack.org/446525
14:47:21 adarazs|rover: I think what is really needed *now* is to track failures in *gate* and report them correctly in Launchpad and elastic recheck (queries)
14:47:23 adarazs|rover: it's very important that we actively look at those immediately
14:47:53 #action ci squad to start gathering gate failure metrics and information
14:49:11 any other topics or things folks would like to discuss?
14:50:27 sounds like nope
14:50:30 thanks everyone
14:50:31 #endmeeting