14:01:16 #startmeeting tripleo
14:01:16 #topic agenda
14:01:16 * Review past action items
14:01:16 * One off agenda items
14:01:16 * Squad status
14:01:17 * Bugs & Blueprints
14:01:17 * Projects releases or stable backports
14:01:18 * Specs
14:01:18 * open discussion
14:01:19 Anyone can use the #link, #action and #info commands, not just the moderator!
14:01:19 Hi everyone! who is around today?
14:01:21 shardy: mind commenting the symlink vs copy saga? :)
14:01:22 Meeting started Tue Feb 13 14:01:16 2018 UTC and is due to finish in 60 minutes. The chair is mwhahaha. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:23 o/
14:01:23 o/
14:01:24 o/
14:01:24 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:27 The meeting name has been set to 'tripleo'
14:01:27 o/
14:01:28 o/
14:01:32 o/
14:01:32 o/
14:01:33 hi
14:01:51 o/
14:01:55 o/
14:02:01 hi
14:02:09 o/
14:02:57 o/
14:03:57 ok lets get started
14:04:00 #topic review past action items
14:04:00 tbarron or EmilienM to send a note to ML to start discussion on python3 testing
14:04:16 I don't think we have done that
14:04:33 nope
14:04:37 #action tbarron or EmilienM to send a note to ML to start discussion on python3 testing
14:04:48 o/
14:04:49 EmilienM to help jpich with member role for horizon
14:04:56 Done! Thanks :)
14:04:59 merged!
14:05:03 cool
14:05:06 did we backport it?
14:05:18 I think it was only in not-yet-branches projects
14:05:21 but I'll double check
14:05:25 *branched
14:05:55 jpich: do we need the fix in queens only?
14:06:00 #link https://review.openstack.org/#/c/541300/
14:06:14 I think queens is highest priority for that fix
14:06:18 o/
14:06:28 but do we need it Pike, ocata and newton?
14:06:35 EmilienM: Yes, I believe so (since the role creation was only removed in Keystone in Queens)
14:06:52 I don't think so, _member_ should still get created normally there
14:07:22 jpich: we removed it long time ago, isn't?
14:07:49 we'll verify it, no backport for no
14:07:51 now*
14:07:59 EmilienM: Keystone stopped creating _member_ as part of https://review.openstack.org/#/c/522461/ and that wasn't backported, so AFAIU we're good
14:08:09 ok good!
14:08:35 mwhahaha: next
14:08:45 #topic one off agenda items
14:08:46 #link https://etherpad.openstack.org/p/tripleo-meeting-items
14:08:53 (slagle) new squad: config-download
14:09:11 i guess we can replace containers squad with config-download?
14:09:15 hi, just wanted to bring to everyone's attention that i'm starting a new squad around config-download
14:09:23 we'll be working on https://blueprints.launchpad.net/tripleo/+spec/non-config-download-deprecate and its deps
14:09:26 w00t
14:09:31 so far gfidente and beagles have agreed to do some work
14:09:36 let me know if you want to join my squad
14:09:48 do we have a tee-shirt?
14:09:49 we're going to have a pub meetup with free beers in dublin
14:09:57 just fyi
14:10:05 slagle: plz update the tripleo-docs as well with the new squad
14:10:08 if you haven't already
14:10:10 mwhahaha: I think there's still some containers related activity around the undercloud and ultimately using the same approach for a single node installer
14:10:17 * dtantsur heard free beers
14:10:22 mwhahaha: is there stuff in docs?
14:10:26 i did do: https://review.openstack.org/#/c/543631/
14:10:36 but I guess the original containers effort has completed so the squad could be renamed or reorganized
14:10:36 slagle, would you mind to do deep dive about this feature and its debug?
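
[Editor's note: for readers unfamiliar with the feature under discussion, config-download generates Ansible playbooks from the deployment plan and applies the overcloud configuration with ansible-playbook instead of Heat-pushed deployments. A rough sketch of how it was typically exercised around this time follows; the flag, environment file and playbook names are from memory and may differ per release.]

    # Deploy with config-download enabled (environment file and flag names assumed here):
    openstack overcloud deploy --templates \
      -e /usr/share/openstack-tripleo-heat-templates/environments/config-download-environment.yaml \
      --config-download

    # Or fetch the generated playbooks from an existing plan and run them by hand,
    # using the TripleO dynamic inventory script:
    openstack overcloud config download --name overcloud --config-dir ~/config-download
    ansible-playbook -i /usr/bin/tripleo-ansible-inventory \
      ~/config-download/deploy_steps_playbook.yaml
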
14:10:37 i'll check the docs, but i thought it was policy
14:10:44 slagle: yea that
14:10:52 slagle: thought it was in docs
14:10:54 sshnaidm|ruck: yea, i plan to
14:11:09 ++ really good idea
14:11:11 thanks, would be great!
14:11:13 shardy: ok. i'm ok with keeping it if we have someone to step up and drive the containers squad
14:11:13 #action slagle to schedule a deep-dive about config-download
14:11:16 slagle I am in at least for the ceph stuff
14:11:27 +1
14:11:38 dprince: ^^ did you want to repurpose the containers squad for the single-node use-case or start something new?
14:11:39 slagle but how different is that from external_deploy_task ?
14:11:50 slagle or is it the same thing we're talking?
14:11:54 gfidente: that's what it is. same thing
14:11:59 slagle ok cool
14:12:04 that is all from me
14:12:41 slagle: thanks
14:12:43 slagle: i'm interested... esp. around upgrades and aligning the way config download is used (we have some discussion here for anyone interested https://review.openstack.org/#/c/526656/4/doc/source/install/developer/upgrades/fast_fw_upgrade.rst )
14:12:55 slagle: (and i'll take your beer thanks)
14:13:05 :)
14:13:09 (sshnaidm) about policy of merging patches to tripleo-quickstart and tripleo-quickstart-extras - it could be done by tripleo CI cores only
14:13:19 shardy: yes, we can prototype the single node stuff
14:13:26 mwhahaha, yeah, I'd like to propose it
14:13:27 sshnaidm|ruck: for what it is worth, I'm not sure I agree
14:13:36 if things keep breaking, I believe we lack proper coverage
14:13:36 shardy: I would like to treat it separate from 'containerized undercloud' as I think it is a unique case
14:13:45 or proper voting
14:13:46 o/
14:13:50 I would push forward that a +2 from the CI team is sufficient
14:14:02 I have 3 reasons:
14:14:05 1) tripleo-quickstart is used in a few CI systems, not only in upstream, tripleo-ci cores are usually aware of it and can check the consequences
14:14:05 2) tripleo-quickstart-extras has common roles which are used in all jobs, and even a small functional change there can affect a lot of jobs
14:14:05 3) we have some strategy, code conventions and good practices in quickstart, patches should be aligned with that and tripleo-ci cores can address it in reviews
14:14:12 I think sshnaidm|ruck refers to the containerized undercloud patch that broke some jobs
14:14:16 slagle: is it possible to use config-download on a pre-deployed overcloud? does it work with updates?
14:14:20 EmilienM, it still breaks
14:14:24 sshnaidm|ruck: 1) needs coverage
14:14:29 2) needs coverage
14:14:40 3) needs better communication of these conventions and not relying on just a few people
14:14:41 jaosorior: yes
14:14:42 sshnaidm|ruck: I know, sorry for that
14:14:50 slagle: a colleague was trying to test it out and got something like this: Deployment NetworkDeployment with group:os-apply-config not supported with config-download.
14:14:58 jaosorior: please sync with me post-meeting
14:15:22 (3) should probably be enforced via some pep style job that ensures consistent style
14:15:45 sshnaidm|ruck: i agree that the CI team should review the things. But we need to be relying on automation to prevent breakages
14:15:45 although IMHO we have a lot of room for improvement in *quickstart due to all the j2 templated bash...
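
[Editor's note: on the suggestion above to enforce quickstart code conventions through a style job rather than a small reviewer group, the checks such a job would run look roughly like the following; the linters, paths and playbook name are illustrative assumptions, not what tripleo-quickstart actually ships.]

    # Hypothetical local run of the checks a style/lint gate could enforce:
    pip install yamllint ansible-lint
    yamllint -s roles playbooks            # strict YAML style across roles and playbooks
    ansible-lint playbooks/quickstart.yml  # Ansible best-practice checks on the entry playbook
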
14:15:46 shardy, not really trivial
14:16:15 FWIW, ovb jobs are going to be voting this week probably: https://review.openstack.org/543654
14:16:33 EmilienM, that's really great
14:16:56 sshnaidm|ruck: I would also mention that the CI team has also been responsible for breakages merging their own reviewed code. So we need something that doesn't rely on people
14:17:03 but until we have all coverage (if we are capable of having it at all) it could be a valid step
14:17:27 It would be better to spend the time to RCA the breakages and investigate automated methods of mitigation
14:17:49 instead of reducing the scope of +2 in CI projects, I would rather increase education, documentation and trust.
14:18:03 restricting the project's ability to move forward based on a small group of people is not ideal
14:18:09 EmilienM: +1, and coverage to reduce the risk of landing patches
14:18:16 as mwhahaha has already said
14:18:17 and branching...
14:18:27 should we branch quickstart-extras?
14:18:34 +1 branching
14:18:41 that's what I meant, holser
14:18:41 I think we have active ML about branching
14:18:45 thanks for supporting the idea
14:18:47 we are hurting ourselves having to support the world in a single (or two) repositories
14:18:53 +1 for branching
14:18:56 I think the quickstart review velocity has improved a lot in recent months, I personally would be sorry to see it go backwards in that regard
14:19:23 branching is not ideal and has its own disadvantages, it doesn't solve all problems
14:19:24 shardy - What's your opinion about branching quickstart ?
14:19:24 at one time landing patches was very difficult and quite a demotivator for contributors
14:19:50 Dougal Matthews proposed openstack/instack-undercloud master: Catch APIException exceptions raised by Mistral https://review.openstack.org/543916
14:20:07 +1 for branching (at least extras, maybe not quickstart itself?)
14:20:08 shardy: correct, it's getting much better now.
14:20:09 holser: I think it makes sense for -extras, but I'd like -quickstart to just provide a release agnostic bootstrap of the VM test environment
14:20:17 +1
14:20:20 for extras only
14:20:21 sshnaidm|ruck: I think this is an important topic but I don't agree with the proposed solution. Perhaps we can move it to the ML and discuss other possible solutions. I think there are several possible solutions
14:20:26 Alan Bishop proposed openstack/puppet-tripleo master: Avoid hard-coded settings in Cinder HA containers https://review.openstack.org/542424
14:20:27 we could do changes affecting master much more easily, not having to worry what we're breaking in stable jobs
14:20:33 mwhahaha, ok, let's move on
14:21:00 #action sshnaidm to raise ML post about reducing risk with quickstart and quickstart-extras patches
14:21:00 jistr: holser bogdando +1 for branching
14:21:06 (sshnaidm) I'd prefer to revert: https://review.openstack.org/#/c/517444/ we still can't fix all issues and nobody knows how many of them and how many techdebts we created by this patch
14:21:33 btw, nobody in CI team except weshay_PTO knows what happens there
14:21:45 +1 for revert
14:21:47 including me
14:21:58 I would say that patch has been around for a long time, so it's not like it's been done in secret
14:22:11 that being said revert if it's breaking
14:22:16 yeah we started it in November
14:22:30 what job is still broken now?
14:22:39 mwhahaha, people could be added as reviewers at least..
14:23:03 because so far I see green jobs this morning
14:23:04 EmilienM, undercloud containers run in ALL jobs, which cause timeouts
14:23:04 sshnaidm|ruck: and the CI team could pay attention to the open reviews
14:23:13 sshnaidm|ruck: come on, we have been discussing this work on IRC, ML, Gerrit
14:23:16 sshnaidm|ruck: the onus is not just on the people proposing the patch
14:23:16 mwhahaha, we have priorities
14:23:32 sshnaidm|ruck: which should include paying attention to reviews for the projects you want to own
14:23:51 undercloud containers causing timeouts?
14:24:00 mwhahaha, it sounds like blaming the team for merging this patch
14:24:12 EmilienM, yeah, it takes time
14:24:18 sshnaidm|ruck: no i'm not, but you're claiming ignorance on something that wasn't done in secret
14:24:27 show me the patches where the job timed out
14:24:43 mwhahaha, we have really a lot of ways to increase visibility of patches, so please..
14:24:45 sshnaidm|ruck: i said revert it if it's breaking things, that's fine. just don't claim you didn't know. you are equally responsible for reviewing
14:25:09 mwhahaha, I have my priority list and this patch wasn't there
14:25:22 neither in priority list of a team
14:25:50 sshnaidm|ruck: that seems to be an issue with priorities of the team then. You can't require that only the CI team merge things if the CI team doesn't properly prioritize reviews
14:25:59 sshnaidm|ruck: so please review your priorities
14:26:05 note, it'd require reverting 445 as well. All of the related patches may be found here https://trello.com/c/3HxQkb0t/4-move-tripleo-ci-centos-7-undercloud-containers-to-gate
14:26:06 and let's revert that patch if it's still causing problems
14:26:21 mwhahaha, join the tripleo ci squad meeting and let's discuss our priorities
14:26:41 btw, looking into https://review.openstack.org/#/c/543861/ I can see only minor/major jobs red
14:26:48 mwhahaha, we are always open to feedback
14:26:49 and shown red in jenkins for quite a long time
14:26:51 sshnaidm|ruck: I'd be happy to have a separate meeting, unfortunately that one conflicts with my ability to be a parent
14:27:06 anyway let's move on
14:27:07 if we talk about unknown bugs, we can't be sure, unless we have coverage for them...
14:27:35 did we agree to revert it?
14:27:36 #action sshnaidm, bogdando to figure out how to handle the fallout of https://review.openstack.org/#/c/517444/
14:27:45 show me jobs that timeout, but so far I see some green today in our ci
14:28:08 EmilienM, we had timeouts in gates
14:28:09 EmilienM: yes, please. As I noted, https://review.openstack.org/#/c/543861/ is almost all green
14:28:21 sshnaidm|ruck: the containerized undercloud job doesn't run in gate
14:28:24 so totally unrelated
14:28:35 EmilienM, I'm not talking about this job
14:28:43 EmilienM, I'm talking about ALL jobs
14:28:54 so our work causes timeouts now?
14:28:58 show me the logs
14:29:07 please report a bug and figure it out after the meeting
14:29:07 thanks
14:29:15 EmilienM, this patch enabled containerized undercloud for all jobs
14:29:24 EmilienM, let's take it offline
14:29:27 (chem) Need help testing P->M and FFU, as check rdo experimental seems to not be working for https://review.openstack.org/525686 and https://review.openstack.org/543440
14:29:35 chem: p->m? or p->q
14:29:50 it's for p->m with tripleo-upgrade role and ffu
14:30:01 m as in master?
14:30:08 beagles: yeap
14:30:12 oh not clear
14:30:14 ah okay
14:30:17 so p->q :D
14:30:25 mwhahaha: ack
14:30:46 so basically need help from ci folk to help me run those jobs on public server somehow
14:31:03 check rdo experimental failed on me
14:31:12 chem - What about P-M and O-P upgrade jobs?
14:31:24 We need to let them vote :D
14:31:49 holser: O->P can be run easily
14:31:51 holser: i don't think p-m is ready for voting :)
14:32:26 but we need to gate tripleo-upgrade role and so we need green jobs
14:32:43 but hard to test right now
14:33:05 so who could help
14:33:06 help
14:33:28 chem: so lets land this in the meantime https://review.openstack.org/526006 so we can at least run it with the rdo experimental (until the role based upgrade lands)
14:34:02 chem: wdyt? i know it will be replaced but i don't see the harm in having _something_ in the meantime
14:34:44 marios: ack, np :) sshnaidm|ruck could you have a look at https://review.openstack.org/#/c/526006/ :)
14:34:46 i can help when we're in better shape with updates job, but honestly i'd prefer to focus on P->M (or even Q->M when we branch)
14:35:04 i think we should focus on the latest first always
14:35:07 we won't branch before 2 weeks I think
14:35:16 (rough estimate)
14:35:50 stable jobs don't bring as much value (fewer patches landing there), and focusing on "history" makes us always trail behind the present
14:36:37 jistr: oki, P->M and FFU are the future :)
14:37:31 anything else?
14:37:32 oki, I just wanted to bring that up. Currently stuck, if anybody can help, that's cool, and anyway I'll find a way
14:37:39 thanks chem
14:37:53 (beagles) stopping neutron agent containers breaks dataplane functionality (email just sent, subject: [tripleo] [neutron] Current containerized neutron agents introduce a significant regression in the dataplane) While there are related bugs on namespaces etc., I'll file a new one that is specific to this
14:38:42 so, this came up late last week - the neutron agent containers probably need some reworking
14:39:00 myself and some neutron devs are going to need some assistance in sorting this out the "right way"
14:39:26 unless I'm missing some obvious container black magic, I don't think we can resolve this for queens
14:39:51 can you link the bug report please?
14:39:53 won't that be a blocker for queens upgrades where all neutron things will run in containers?
14:40:39 EmilienM, there are some bug reports like https://launchpad.net/bugs/1748658 but they sort of confuse the issue by talking about network namespaces and this is bigger than that
14:40:40 Launchpad bug 1748658 in tripleo "Restarting Neutron containers which make use of network namespaces doesn't work" [High,In progress] - Assigned to Brent Eagles (beagles)
14:41:02 Sagi Shnaidman proposed openstack-infra/tripleo-ci master: [WIP] Add multinode p->m jobs in experimental. https://review.openstack.org/543440
14:41:07 so I think I need to file a new one
14:41:22 yeah shardy this is actually pretty bad in that respect
14:41:43 beagles: ack, Ok +1 on raising a bug so we can discuss options
14:42:56 I'll work on arranging a meeting for that too.. one of the complicating factors is neutron manages processes but it is not container aware at the moment so it's going to probably be a neutron+tripleo effort
14:43:32 have we looked at how Kolla solved the issue ?
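
[Editor's note: to make the regression beagles describes concrete, the DHCP/L3 agents fork long-lived helper processes (dnsmasq, keepalived, haproxy) which, once the agent runs inside a container, share that container's fate. A hedged illustration follows; the container name and commands are assumptions for the example, not the exact TripleO setup.]

    # dnsmasq spawned by the DHCP agent lives inside the agent's container:
    docker exec neutron_dhcp pgrep -af dnsmasq

    # restarting (or stopping) the agent container therefore kills dnsmasq too,
    # so existing tenant networks lose DHCP service until the agent comes back
    # and resyncs -- the dataplane impact being discussed here:
    docker restart neutron_dhcp
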
14:43:43 Dan linked an example
14:43:45 (we're probably not alone)
14:43:49 they mount run:shared
14:44:14 but that doesn't necessarily solve what beagles is talking about I think
14:44:16 EmilienM, you're falling into the trap of the current bug reports ;)
14:44:25 the issue is the agents launching processes
14:44:30 http://git.openstack.org/cgit/openstack/kolla-ansible/tree/ansible/roles/neutron/defaults/main.yml#n92
14:44:32 Kolla might not even be solving this
14:44:45 the real issue is that the processes go away when the containers do and that breaks expected behavior
14:44:48 taken from https://review.openstack.org/#/c/542858/
14:45:26 Marios Andreou proposed openstack/tripleo-quickstart-extras master: Wire up pike .. queens upgrade workflow (controller only for now) https://review.openstack.org/526006
14:45:28 #action beagles to raise bug around neutron issues with containers
14:45:36 * mwhahaha plays pin the action item on beagles
14:45:37 ack thanks mwhahaha
14:45:48 :)
14:45:55 alright anything else?
14:46:17 I was wondering are we planning to formally deprecate baremetal e.g. puppet/services/* support for queens?
14:46:42 I didn't spot a release note, I guess it's somewhat dependent on the outcome of the issues beagles just raised tho
14:47:29 I think deprecating things at that time of the cycle might be too late
14:47:32 shardy: good question as I don't think we have a docker by default anywhere
14:47:41 At some point if we can remove those and flatten the docker/services/* templates to reduce nesting, I think we'll see a significant improvement in performance
14:47:51 and also we can reduce the CI test matrix
14:48:06 mwhahaha: yeah I think it's been discussed but I wasn't sure if I missed a final decision
14:48:20 I sent an ML note back on 9/18
14:48:22 no one objected
14:48:33 decision made then ;)
14:48:43 [tripleo] Making containerized service deployment the default
14:48:51 (works for me, AFAIK all our CI jobs are containerized)
14:48:55 I don't know if we ever officially made it the default from a code perspective
14:49:17 we should include docker and docker-ha by default
14:49:18 I'll at least create a release note in THT
14:49:27 the only thing I remember was Tengu who wasn't deploying on containers yet
14:49:36 not include...the mappings should just be the default
14:49:38 mwhahaha: well testing wise upgrades squad focused on upgrading to containerized Pike
14:49:41 Yeah I was thinking we should flip the default and add a release note before we branch, then we can look at when we ultimately remove the puppet/services/* and rework the remaining templates
14:49:55 we can't remove the puppet services cause I think we use them
14:50:01 in the docker bits
14:50:06 in some places
14:50:10 mwhahaha: so if you wanted to upgrade to baremetal, there would be at least some things needed in terms of puppet/services/ upgrade_tasks (just as a reminder/data point to this discussion)
14:50:13 so that would be an effort for Rocky
14:50:26 mwhahaha: yeah that's what I'm saying, the way we do that is a performance problem, heat creates like 300 stacks instead of ~50
14:50:27 Yurii Prokulevych proposed openstack/tripleo-upgrade stable/pike: [UPDATES] Validated oc containers/images.
https://review.openstack.org/543957
14:50:42 that's why the ResourceChain takes 5mins to create on a deploy
14:50:43 +1 to communicate the deprecation in Queens + make it default and +1 to change the templates to have one YAML / service in Rocky
14:50:57 mwhahaha: i expect the same will be the case for P..Queens (even though we will have to deal with environments starting as baremetal as well as starting from containerized P)
14:50:58 shardy: sounds like we should get a blueprint to realign the puppet/services and docker/services for Rocky
14:51:26 I can take care of that one
14:51:39 mwhahaha: ack I can raise one if we're agreed that puppet/services will be removed - we can probably script "flattening" the templates and do a one-time conversion
14:51:50 ++
14:52:18 the other option I was considering is a heat feature to internally flatten the templates, but that looks much more difficult
14:52:27 it's YAML after all :)
14:52:39 good ol' yaml
14:52:49 ok we need to move on we only have 8 mins
14:52:57 ack thanks
14:53:09 #action EmilienM to open a blueprint to realign puppet/services and docker/services for Rocky
14:53:19 #topic Squad status
14:53:20 ci
14:53:20 #link https://etherpad.openstack.org/p/tripleo-ci-squad-meeting
14:53:20 upgrade
14:53:20 #link https://etherpad.openstack.org/p/tripleo-upgrade-squad-status
14:53:20 containers
14:53:20 #link https://etherpad.openstack.org/p/tripleo-containers-squad-status
14:53:21 config-download
14:53:21 #link https://etherpad.openstack.org/p/tripleo-config-download-squad-status
14:53:22 integration
14:53:22 #link https://etherpad.openstack.org/p/tripleo-integration-squad-status
14:53:23 ui/cli
14:53:23 #link https://etherpad.openstack.org/p/tripleo-ui-cli-squad-status
14:53:24 validations
14:53:24 #link https://etherpad.openstack.org/p/tripleo-validations-squad-status
14:53:25 networking
14:53:25 #link https://etherpad.openstack.org/p/tripleo-networking-squad-status
14:53:26 workflows
14:53:26 #link https://etherpad.openstack.org/p/tripleo-workflows-squad-status
14:53:40 an action to deprecate in Queens as well? (and another one to make docker the default in queens?)
14:54:13 EmilienM: ok
14:54:17 I think it's one action, a patch to change the default and add a deprecation release note
14:54:20 I can do it
14:54:25 shardy: ok
14:54:43 #action shardy to change the default deploy to include docker and add deprecation notice
14:54:49 any other status stuff?
14:55:14 I wanted to mention the networking squad have made good progress on https://review.openstack.org/#/c/523638/
14:55:28 it's a huge change but we need to decide if we're willing to land it for queens
14:55:34 dsneddon: ^^
14:55:48 I don't really want to land it for queens
14:55:57 reminds me of the composable networks we landed late in Pike
14:56:17 unless we're really sure it's not going to break anything
14:56:53 mwhahaha: ack - one observation is we have zero coverage of any nic configs except one in CI, so whenever we land this it's hard to be sure of the risk
14:56:53 it's not passing gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035-master (ipv6)
14:57:26 maybe we can look at ways to improve that but I'm not sure how feasible that is
14:57:46 sure
14:58:01 we only have 3 mins left
14:58:05 so moving on real quick
14:58:10 #topic bugs & blueprints
14:58:10 #link https://launchpad.net/tripleo/+milestone/queens-rc1
14:58:10 For Queens we currently have 33 (+0) blueprints and about 606 (-15) open bugs. 571 queens-rc1, 33 rocky-1, 1 rocky-2, 1 rocky-3.
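
[Editor's note: circling back to shardy's action item above (flip the default to containerized services and add a deprecation notice in tripleo-heat-templates), such a notice would normally land as a reno release note, roughly along these lines; the wording is illustrative, not the note that was actually merged.]

    ---
    deprecations:
      - |
        Deploying services directly on baremetal via the puppet/services
        templates is deprecated. Containerized deployment (the docker/services
        templates, previously opted into via environments/docker.yaml) is now
        the default.
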
14:58:24 as a reminder, rc1 is coming up next week
14:58:28 I would rather have the feature in queens, even if the risk is high - but we have folks around to fix it and we can always backport
14:58:35 breaking that nic-config patch up would be a significant improvement
14:58:45 so please pay attention to the critical bugs
14:58:49 Athlan-Guyot sofer proposed openstack/tripleo-quickstart master: [WIP] Use tripleo-upgrade role for p->m job. https://review.openstack.org/540072
14:59:20 #topic projects releases or stable backports
14:59:32 EmilienM: do we have any stable releases coming up?
14:59:38 I did ocata/pike yesterday, it's done
14:59:43 k thanks
14:59:47 #topic specs
14:59:47 #link https://review.openstack.org/#/q/project:openstack/tripleo-specs+status:open
14:59:50 jschlueter: ^
14:59:58 rocky is coming up, don't forget to propose specs prior to the PTG
15:00:11 #topic open discussion
15:00:11 Reminder to propose topics for the PTG
15:00:11 #link https://etherpad.openstack.org/p/tripleo-ptg-rocky
15:00:19 and we're out of time
15:00:26 thanks everyone
15:00:29 #endmeeting