13:59:37 #startmeeting tripleo
13:59:37 #topic agenda
13:59:37 * Review past action items
13:59:37 * One off agenda items
13:59:37 * Squad status
13:59:37 * Bugs & Blueprints
13:59:37 Meeting started Tue Oct 3 13:59:37 2017 UTC and is due to finish in 60 minutes. The chair is mwhahaha. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:59:37 * Projects releases or stable backports
13:59:38 * Specs
13:59:38 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:59:38 * open discussion
13:59:39 Anyone can use the #link, #action and #info commands, not just the moderator!
13:59:41 The meeting name has been set to 'tripleo'
13:59:55 o/
13:59:57 o/
13:59:59 hi
14:00:02 o/
14:00:06 o/
14:00:07 o/
14:00:12 o/
14:00:17 \o
14:00:19 o/
14:00:20 o/
14:00:22 o/
14:00:22 o/
14:00:29 o/
14:00:44 o/
14:00:47 o/
14:00:49 \o
14:01:26 alright let's do this
14:01:27 #topic review past action items
14:01:33 shardy to look at how to reduce # of services deployed on ovb (continued)
14:01:45 shardy, did you get a chance to do this? not that it matters because CI is hosed
14:02:38 o/
14:03:06 I'll take that as a no, i'll follow up with steve later
14:03:15 o/
14:03:25 #action mwhahaha to follow up with shardy about services and ovb jobs
14:03:27 review newton backports in gerrit
14:03:39 I think we did it
14:03:43 as a reminder upstream eol is soon
14:03:53 but some of them are still under review
14:04:05 still trying to figure out what that exactly means for tripleo but make sure to take a look at newton stuff
14:04:06 and it won't make progress until this zuul v3 stops killing us
14:04:24 yup
14:04:31 regarding newton i have a bug i am working on i hope to backport to newton but it has to go further https://bugs.launchpad.net/tripleo/+bug/1720787
14:04:32 Launchpad bug 1720787 in tripleo "TripleO deploys ceph client keyring with 644 permissions" [High,In progress] - Assigned to John Fulton (jfulton-org)
14:04:33 mwhahaha: Hey sorry, running late, no progress re ovb jobs yet, as you say CI has been down
14:04:34 just an fyi
14:05:06 mwhahaha: still plan to take a look, but pre-zuul3 issues the OVB jobs weren't timing out as much so I deprioritized reducing the services
14:05:11 fultonj: it will be - in theory we release newton / ocata / pike every 2 weeks
14:05:44 shardy: ok. yea i think it's still a good idea to evaluate what we're checking to also help with excessive coverage, etc
14:05:49 EmilienM: ack
14:06:01 Tim Rozet proposed openstack/tripleo-heat-templates master: Dynamically renders network isolation resource registry https://review.openstack.org/509190
14:06:18 mwhahaha: Yeah agreed
14:06:36 ok moving on to the next stuff
14:06:37 #topic one off agenda items
14:06:42 #link https://etherpad.openstack.org/p/tripleo-meeting-items
14:06:59 looks empty, anyone have anything they wish to address?
14:07:01 one-off...
14:07:07 https://etherpad.openstack.org/p/tripleo-integration-squad-status
14:07:10 TripleO Integration Squad Status
14:07:20 fultonj: we have a section for squad status (next) :D
14:07:24 whoops
14:07:26 but thanks i'll just link that
14:07:35 fultonj: nice etherpad, thanks!
14:07:42 thanks
14:07:53 thanks gfidente
14:08:11 ok moving on to squad status since we have some
14:08:12 #topic Squad status
14:08:20 integration
14:08:20 #link https://etherpad.openstack.org/p/tripleo-integration-squad-status
14:08:38 we talked about these items at PTG
14:08:55 fultonj: do we target them all for Queens?
14:08:59 no progress on multiple ceph clusters (would be expansion of composable roles capability)
14:09:13 only multiple ceph pools and luminous
14:09:15 2 out of the 3
14:09:35 we hope to land changes for multiple ceph clusters now, but not fully deliver the feature until post-queens
14:09:45 mwhahaha: i just put this down (we spoke about it in dfg upgrades, not sure if someone got round to it already) but https://etherpad.openstack.org/p/tripleo-upgrade-squad-status
14:09:46 now --> during the queens cycle
14:09:58 marios: thanks!
14:10:08 matbu_: can you update with the minor updates remaining https://etherpad.openstack.org/p/tripleo-upgrade-squad-status
14:10:13 URGENT TRIPLEO TASKS NEED ATTENTION
14:10:14 https://bugs.launchpad.net/tripleo/+bug/1718387
14:10:15 https://bugs.launchpad.net/tripleo/+bug/1719123
14:10:15 https://bugs.launchpad.net/tripleo/+bug/1720220
14:10:16 https://bugs.launchpad.net/tripleo/+bug/1720458
14:10:16 Launchpad bug 1718387 in tripleo "ping test is periodically failing for the gate-tripleo-ci-centos-7-nonha-multinode-oooq " [Critical,Triaged] - Assigned to Sofer Athlan-Guyot (sofer-athlan-guyot)
14:10:16 https://bugs.launchpad.net/tripleo/+bug/1720556
14:10:17 https://bugs.launchpad.net/tripleo/+bug/1720721
14:10:18 mwhahaha: i'll ask folks to update it
14:10:18 https://bugs.launchpad.net/tripleo/+bug/1720918
14:10:18 https://bugs.launchpad.net/tripleo/+bug/1720973
14:10:19 Launchpad bug 1719123 in tripleo "tempest fails on overcloud keystone admin tenant error" [Critical,Triaged] - Assigned to Arx Cruz (arxcruz)
14:10:20 Launchpad bug 1720220 in tripleo "CI: Most/all legacy-tripleo jobs failing on gate" [Critical,Triaged]
14:10:21 Launchpad bug 1720458 in tripleo "Lastest delorean pip package causes error in autodoc" [Critical,In progress] - Assigned to wes hayutin (weshayutin)
14:10:22 Launchpad bug 1720556 in tripleo "tracker: upstream zuul is not triggering periodic jobs" [Critical,Triaged] - Assigned to wes hayutin (weshayutin)
14:10:23 Launchpad bug 1720721 in tripleo "CI: OVB jobs fail because can't install XStatic from PyPI mirror on rh1 cloud" [Critical,Triaged] - Assigned to Paul Belanger (pabelanger)
14:10:24 Launchpad bug 1720918 in tripleo "puppet-firewall changed the ipv6-icmp rule type name" [Critical,In progress] - Assigned to Ben Nemec (bnemec)
14:10:26 Launchpad bug 1720973 in tripleo "CI: infras cirros image has wrong permission" [Critical,Triaged] - Assigned to Paul Belanger (pabelanger)
14:10:30 marios: sounds good, i'll use that to refer to status during the meetings going forward
14:10:34 marios: yep /me tries to click on the link
14:10:34 fultonj: do you have upgrades working from ocata - puppet-ceph managed to pike ceph-ansible managed? (I know CI is down now, but I would like to see some CI job voting here once it's back)
14:11:04 from ocata working, but no ci (must add ci)
14:11:27 fultonj: maybe you can add it in the etherpad, I would like it to be a priority
14:11:37 EmilienM: yes, will do
14:11:40 thanks
14:11:41 #actionitem
14:11:46 EmilienM: I think that may be covered in the CI work we do in the containers squad
14:11:58 #action ceph-upgrade ci
14:12:10 mandre: most probably, I just want to make sure this is done at some point
14:12:12 talking about the container squad, we also prepared an etherpad :) https://etherpad.openstack.org/p/tripleo-containers-squad-status
14:12:29 mandre: thanks!
i'll add that to the list for next time as well
14:12:52 marios: done
14:13:04 ci - weshay trown sshnaidm|mtg - any status that you can share around CI?
14:13:29 ui/cli - jrist - any status you can share around UI? (or validations as well)
14:13:39 mwhahaha: other than it is totally borked due to zuulv3 migration, not a ton this week
14:13:44 mwhahaha: status is here https://etherpad.openstack.org/p/tripleo-ci-squad-meeting
14:13:57 ty matbu_
14:13:58 EmilienM: yea but they are getting rid of the squad meeting so is that still valid?
14:13:59 nothing in particular except that we are beginning to work on some ci and upstream automation
14:14:00 mwhahaha, we're putting on the finishing touches to tripleo promotion jobs in rdo software factory
14:14:10 mwhahaha: or here even https://etherpad.openstack.org/p/tripleo-ci-squad-scrum
14:14:12 we are working on a new process for organizing our squad and will have a more formal status etherpad going forward
14:14:43 trown: sounds good, just let me know so I can add it to the list
14:15:04 workflows - thrash - any status you can share?
14:15:12 networking - beagles - any status you can share?
14:15:19 tripleo-ci-squad sprint details can be found https://www.redhat.com/archives/rdo-list/2017-September/msg00068.html
14:16:39 mwhahaha: working on some rfe's for ui
14:16:54 thrash: sounds good, thanks. let us know if you need anything (reviews/etc)
14:17:00 mwhahaha: ack
14:17:39 ok thanks everyone for the status. let's move on
14:17:49 #topic bugs & blueprints
14:17:50 #link https://launchpad.net/tripleo/+milestone/queens-1
14:18:03 We currently have 62 blueprints and about 476 open bugs. Please take some time to review your blueprint status and make sure it is properly up to date.
14:18:14 please be aware that queens-1 is in about 2 weeks
14:18:26 so please move stuff that you aren't going to land by queens-1 out to queens-2
14:19:00 anyone have any blueprints or bugs they want to raise for visibility?
14:19:39 i filed a blueprint for the ansible work
14:19:42 just fyi
14:19:44 I hope we can merge something before the 2 weeks ^^
14:20:14 ok
14:20:52 mwhahaha: it's already in upgrades squad status, but for visibility, spec for the Q upgrades here https://review.openstack.org/507620
14:21:07 mwhahaha: captures what we discussed in ptg
14:21:12 marios: sounds good
14:21:17 mwhahaha: https://blueprints.launchpad.net/tripleo/+spec/ansible-config-download-ci and https://blueprints.launchpad.net/tripleo/+spec/ansible-config-download
14:22:07 slagle: do you think those will be landed in the next two weeks?
14:22:23 it's possible
14:22:24 mwhahaha: i think it's possible
14:22:30 yea, we'll go with "possible" :)
14:22:54 provided ci stops being on fire :D
14:22:58 it's also possible not a single patch will get landed in the next 2 weeks :)
14:22:59 we already tested the bits on our envs, it works fine (modulo some changes) - if our CI is back this week...
14:23:23 we'll move it to queens-2 otherwise, but we want to get ci coverage asap for this feature
14:23:31 makes sense
14:23:40 Ok any other bugs/blueprints?
14:23:41 476 open bugs - I wonder what we can do about that - I think the trend is upwards so perhaps we can prune/prioritize or de-duplicate better there?
14:23:57 shardy: i've started to go through and clean some up
14:24:04 shardy: we're about +6 for the week
14:24:18 mwhahaha: ack OK I'll see if I can spend some time helping
14:24:32 we could target 450 by queens-1, and maybe 400 by queens-2
14:24:36 there's a bunch of old stuff that is no longer valid i'm sure
14:24:44 mwhahaha: I find it useful to have a clearly defined (smaller) set of priority things for a milestone, then it's easier to know which reviews to prioritize
14:25:10 so we could move all <= medium to queens-2 for visibility
14:25:22 mwhahaha: yeah something like that might be good
14:25:39 #action mwhahaha to move bugs <= medium to queens-2 and review > medium for validity
14:25:43 i'll do that this week
14:25:45 mwhahaha: +1
14:25:56 moving on
14:25:57 #topic projects releases or stable backports
14:26:00 mwhahaha: I can do it with the script
14:26:09 EmilienM: i'll get it
14:26:13 k
14:26:40 so we have pending stable releases but there's some issues around the stable-policy stuff
14:27:31 Given the status of CI there isn't much point in talking about backport patches
14:27:34 so let's move on
14:27:38 #topic specs
14:27:38 #link https://review.openstack.org/#/q/project:openstack/tripleo-specs+status:open
14:27:58 given that queens-1 is in ~2 weeks, we need to make sure any queens specs get merged asap
14:28:04 please take a look at the open ones and review
14:28:18 as marios pointed out the upgrade spec is here:
14:28:20 #link https://review.openstack.org/507620
14:28:33 can we merge specs? lol
14:28:50 well it starts with reviewing them :D
14:28:51 ah yeah, jobs pass, at least something that works
14:28:57 mwhahaha: thanks, sorry for the noise before (I didn't read the context, i thought you were asking for updates on Q stuff, not blueprints/bugs targeting, apologies)
14:29:12 marios: it's all good, it's better to have more info than none at all :D
14:30:06 Ok so on to the open discussion
14:30:07 #topic open discussion
14:30:30 Anything else that folks would like to talk about?
14:30:38 we haven't talked much about CI
14:30:43 is there any super rough/high level eta for fixing zuulv3?
14:30:44 it's unclear to me where we are now
14:30:53 dmsimard, pabelanger: any updates on CI
14:30:56 right, i was wary about asking... do we know anything more about it :) seems others share the sentiment
14:31:01 bandini: that's been asked on the ML but I don't see any reply yet
14:31:10 we've been very patient until now but things are getting bad now
14:31:35 shardy: yeah saw the mail, thought maybe some tripleo ci overlords had some additional thoughts :)
14:31:37 pabelanger, dmsimard: this CI downtime is having a critical effect on TripleO delivery, fyi
14:31:52 mwhahaha, EmilienM: sshnaidm|mtg gave me an update earlier, it looks like we're almost clear and hitting some sort of timeout right now
14:32:08 let me pull up the review, hang on.
14:32:18 shardy: which email?
14:32:24 [openstack-dev] [all] Update on Zuul v3 Migration - and what to do about issues
14:32:27 this one?
14:32:28 [openstack-dev] [all] Update on Zuul v3 Migration - and what to do about issues
14:32:30 mwhahaha, EmilienM: https://review.openstack.org/#/c/508660/
14:32:39 dmsimard: we are hitting a timeout because the connection to the nodepool node drops
14:32:47 EmilienM, but if we pin puppet-firewall, we'll be unable to merge the fix, right?
14:33:07 yeah that one, dansmith asked about mitigation as did sdague yesterday
14:33:09 True
14:33:10 trown: yes, I am looking into it right now
14:33:13 amoralej: let's pin it anyway for now, so we remove one issue
14:33:17 ok
14:33:59 dmsimard, pabelanger: so back to the initial question, any ETA?
14:34:11 mwhahaha, EmilienM: there are different ongoing discussions about considering a rollback but it's not a simple task because some projects (tripleo included) have had to introduce changes in their projects to support zuul v3 and doing a rollback would mean breaking those projects again
14:34:30 why haven't we tested things before?
14:35:03 it's a paradox that the project in charge of testing wasn't tested against tripleo before
14:35:13 yeah it's kinda surprising there wasn't some sort of parallel migration strategy so jobs could be switched over gradually
14:35:34 EmilienM: well it sounds like it's not only TripleO
14:36:02 the problem is that some of the fixes are stuck in the queue, but they will land
14:36:20 I'm not sure that rolling back now wouldn't cause even more disruption
14:36:32 we are having scale issues, which are harder to test than the migration process
14:36:44 tosky: yeah, it'd just be nice to have better visibility of the status/progress I guess
14:37:15 EmilienM: no ETA, suggest watching https://etherpad.openstack.org/p/zuulv3-issues and #openstack-infra.
14:37:27 anything we can help with?
14:37:42 EmilienM: I heavily recommended a gradual opt-in of selected projects, especially deployment projects such as puppet-openstack, openstack-ansible and tripleo, but the decision ended otherwise
14:38:10 EmilienM: right now we are working with https://review.openstack.org/#/c/508660/
14:38:24 and I am looking into what seems to be a timeout related issue
14:38:38 just patience and whatever needs done on the etherpad
14:38:52 seeing as the jobs ran for over 2 hours, I'm hopeful there are no more issues related to tripleo or the jobs themselves
14:39:07 pabelanger: patience ✓ already ;-)
14:39:32 last patch merged in tripleo was Sep 28th
14:39:55 do we have a failover plan if by end of week it's not fixed?
14:40:15 we can't stop merging code during more than a week, we have to find a plan b
14:40:29 EmilienM: there's ongoing discussion around that topic and I anticipate an infra core to chime in on the openstack-dev thread
14:40:30 weshay: said there might be an option with software factory
14:40:49 I don't think they have enough resources tbh
14:41:22 time to figure out what would qualify as a critical minimal subset of CI testing
14:41:57 do we have someone looking at that^ ?
14:41:59 weshay: ^
14:42:21 mwhahaha: anyway, we can move on I guess
14:42:23 we did that once before, disabled a bunch of jobs and merged code with a subset - it took weeks to fix all the regressions after
14:42:33 so IMO it'd be best avoided if possible
14:42:35 shardy: no way we do that again
14:43:34 * mwhahaha shrugs
14:43:42 it's an option if we can't get anything going
14:43:46 it seems jobs are timing out on a specific play:
14:43:48 http://logs.openstack.org/60/508660/13/check/legacy-tripleo-ci-centos-7-scenario001-multinode-oooq-puppet/c92e76d/job-output.txt.gz#_2017-10-03_10_59_26_310226
14:44:01 anyone familiar would know why?
14:44:19 we can look after that
14:44:26 mwhahaha: yeah, cool, just saying probably should be a last resort :)
14:44:32 on "Add virthost to inventory" ?
14:44:50 maybe it's not a specific thing, another job timed out on a different task http://logs.openstack.org/60/508660/13/check/legacy-tripleo-ci-centos-7-undercloud-oooq/5dc7681/job-output.txt.gz#_2017-10-03_10_46_17_231725
14:44:53 ok so it sounds like we need to actively work on CI, so let's close out the meeting and go focus on CI
14:44:59 thanks everyone
14:45:01 #endmeeting
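
During the bugs & blueprints topic above, mwhahaha took an action to move every bug at or below Medium importance out of queens-1 into queens-2, and EmilienM offered to "do it with the script". The sketch below is a hypothetical version of that kind of bulk retarget using launchpadlib; it is not the script EmilienM referred to, the milestone names and the "Medium or lower" cut-off are taken from the meeting discussion, and the exact launchpadlib calls (getMilestone, the searchTasks filters) should be double-checked against the Launchpad API docs before use.

    #!/usr/bin/env python
    # Hypothetical bulk-retarget sketch: move open tripleo bugs at or below
    # Medium importance from queens-1 to queens-2 (assumption based on the
    # #action recorded in the meeting, not an existing tripleo tool).
    from launchpadlib.launchpad import Launchpad

    lp = Launchpad.login_with('tripleo-triage', 'production')
    project = lp.projects['tripleo']

    queens_1 = project.getMilestone(name='queens-1')
    queens_2 = project.getMilestone(name='queens-2')

    # Open statuses plus the "<= medium" importances discussed in the meeting.
    open_statuses = ['New', 'Confirmed', 'Triaged', 'In Progress', 'Incomplete']
    low_importances = ['Undecided', 'Wishlist', 'Low', 'Medium']

    for task in project.searchTasks(status=open_statuses,
                                    importance=low_importances,
                                    milestone=queens_1):
        print('Retargeting bug %s (%s)' % (task.bug.id, task.importance))
        task.milestone = queens_2
        task.lp_save()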
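
The open discussion closes with two job-output.txt links and the question of which Ansible play the timed-out jobs were stuck on. A small, hypothetical helper along these lines (not an existing tripleo-ci tool) can print the last task banners from such a log; it assumes the log server serves the .txt.gz with Content-Encoding: gzip so requests sees decompressed text.

    #!/usr/bin/env python
    # Hypothetical helper: fetch a Zuul v3 job-output.txt like the ones linked
    # above and show the last few Ansible TASK banners, i.e. where a timed-out
    # job got stuck.
    import re
    import sys

    import requests

    # Example URL from the meeting; pass any job-output.txt URL as argv[1].
    DEFAULT_URL = ('http://logs.openstack.org/60/508660/13/check/'
                   'legacy-tripleo-ci-centos-7-undercloud-oooq/5dc7681/'
                   'job-output.txt.gz')

    url = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_URL
    # Assumes the server sets Content-Encoding: gzip, so .text is plain text.
    text = requests.get(url).text

    # Ansible prints 'TASK [role : name]' banners; keep only those lines.
    tasks = [line for line in text.splitlines() if re.search(r'TASK \[', line)]
    print('\n'.join(tasks[-10:]) if tasks else 'no TASK banners found')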