14:00:22 #startmeeting tripleo
14:00:23 Meeting started Tue Aug 15 14:00:22 2017 UTC and is due to finish in 60 minutes. The chair is EmilienM. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:24 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:26 The meeting name has been set to 'tripleo'
14:00:27 #topic agenda
14:00:34 * review past action items
14:00:36 * one off agenda items
14:00:38 * bugs
14:00:40 * Projects releases or stable backports
14:00:42 * CI
14:00:44 * Specs
14:00:46 * open discussion
14:00:48 Anyone can use the #link, #action and #info commands, not just the moderator!
14:00:50 Hi everyone! Who is around today?
14:00:54 o/
14:00:56 o/
14:00:58 o/
14:01:02 o/
14:01:03 o/
14:01:03 o/
14:01:06 hi
14:01:11 hi2u
14:01:23 #topic review past action items
14:01:27 o/
14:01:37 o/
14:01:38 o/
14:01:44 EmilienM to switch master to run new upgrade jobs and not old ones anymore (done)
14:01:54 o/
14:02:06 jaosorior and abishop to talk together about plans for queens relating to barbican backends (prepare ptg session if needed + discuss about migration tool) (postponed) - not sure about the status
14:02:29 to clarify, my involvement is making sure existing deployments using the legacy encryption key manager work
14:02:29 and that includes future migration from the legacy key manager to barbican
14:02:29 cinder guy (eharney) is hoping key manager migration can be accomplished within cinder
14:02:29 so, no immediate OOO action required (i.e. for Denver PTG)
14:02:29 I'll continue to monitor, and will re-raise the issue if OOO changes are needed
14:02:52 ok good to know
14:02:57 abishop: thanks
14:03:02 gfidente to send an ML note about moving ceph rgw from scenario004 to 001
14:03:05 o/
14:03:16 not sure Giulio is around, we can postpone this topic unless someone has thoughts
14:03:35 EmilienM: so yeah, action was taken :D
14:03:36 o/
14:03:42 o/
14:03:46 o\
14:03:49 EmilienM: I don't think he is around but maybe he'll see this later
14:03:53 #topic one off agenda items
14:03:58 #link https://etherpad.openstack.org/p/tripleo-meeting-items
14:04:09 sshnaidm: floor is yours
14:04:14 yeah
14:04:15 Honza Pokorny proposed openstack/tripleo-ui master: Download logs interface https://review.openstack.org/473933
14:04:47 so clarkb and fungi suggest we manage a whitelist of /etc configurations to collect on the logs server
14:05:01 "strongly recommend" I would say
14:05:03 Andy Smith proposed openstack/tripleo-heat-templates master: WIP OpenStack containerized qpid-dispatch-router service https://review.openstack.org/479049
14:05:11 will that really save much? I think we're already filtering out the bigger items
14:05:14 I did some calculations
14:05:15 yes, for months, I think
14:05:25 In general, multinode job /etc folders take about 8MB out of 33MB of total logs. From that 8MB we actually need about 5.5MB and don't need 2.5MB (it's about 7% of the logs).
14:05:59 So we can save about 7% of the space; I'm not really sure it's worth the work...
14:06:19 Therefore I'd like to bring it up for discussion here
14:06:20 doesn't sound like it. but we can definitely exclude some more files if it's straightforward.
14:06:38 7% is a lot if you think about the number of jobs we actually run
14:06:42 that's not trivial
14:06:51 on the other hand, infra provides us free resources and gently asks us to help save them
14:06:57 adarazs, there was an argument that a new release of centos could include big files and break the logs server, like it did in centos 7.3 with java
14:06:57 Ya, at our scale, 1% is worth it
14:07:03 i don't know if a whitelist is the best way to do it, but we do need to be better about excluding more
14:07:07 we collect far too much
14:07:41 keep in mind, we are also re-writing devstack-gate for zuulv3 in ansible, so I'm pretty sure we'll likely write a generic role for jobs to use to collect this, whitelisted
14:07:46 if a whitelist is too much work, then improve the exclude list
14:07:52 so we have 2 options right now: a whitelist and a bigger exclude list
14:08:16 I'm against a whitelist because it will require maintenance
14:08:23 * adarazs is just a bit wary of a maintained whitelist and the constant "why don't we collect X" requests :/
14:08:25 all new services we'd need to add to it
14:08:27 Marios Andreou proposed openstack/tripleo-heat-templates master: Adds PostUpgradeConfigStepsDeployment to drive post config ansible https://review.openstack.org/493878
14:08:39 manually
14:08:40 sshnaidm: yep, exactly.
14:09:05 for context, here is an example of a fully loaded etc dir that we log http://logs.openstack.org/28/493728/3/check/gate-tripleo-ci-centos-7-containers-multinode/d49033b/logs/undercloud/etc/
14:09:14 more aggressive exclusions I'm okay with.
14:09:30 do we really need the skel, udev, rc* dirs, etc?
14:09:44 mwhahaha, no, but it's about 7%
14:09:59 So I'd suggest starting with a big exclude list and seeing if that's enough
14:10:04 wdyt?
14:10:17 URGENT TRIPLEO TASKS NEED ATTENTION
14:10:17 https://bugs.launchpad.net/tripleo/+bug/1709327
14:10:18 Launchpad bug 1709327 in tripleo "CI: extremely long times of overcloud deploy in multinode jobs" [Critical,Triaged]
14:10:18 https://bugs.launchpad.net/tripleo/+bug/1710533
14:10:19 https://bugs.launchpad.net/tripleo/+bug/1710773
14:10:19 Launchpad bug 1710533 in tripleo "docker client failed to download container from docker.io" [Critical,In progress] - Assigned to wes hayutin (weshayutin)
14:10:20 Launchpad bug 1710773 in tripleo "scenario001 and 004 fails when Glance with rbd backend is containerized but not Ceph" [Critical,Triaged]
14:10:22 yes i think we need to at least do that
14:10:27 I'm not sure I agree that a whitelist is more work, we do that for devstack-gate / logstash.o.o today. It's not like we are inundated with requests every day to add more things. Sure, it will take a bit to build up the whitelist, just like it will take a while to exclude things
14:10:28 if we can keep reducing the size of logs, let's try that
14:10:29 pabelanger, ?
14:10:38 and, whitelisting is much nicer to logs.o.o
14:10:58 pabelanger, it's much more work than a big exclude list
14:11:14 why is it much more work?
14:11:17 the thing is we're not like a usual project in openstack, we don't have a well defined set of config files to collect, or rather it's a very big set that's constantly changing.
14:11:19 we're talking about /etc only right now right?
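For illustration, here is a minimal Python sketch of the two options being debated (a bigger exclude list vs. an explicit whitelist). The patterns, paths and helper names are purely hypothetical examples, not the actual tripleo-ci log-collection configuration:

    # Minimal sketch only: patterns and paths are hypothetical, not the
    # real tripleo-ci lists.
    import fnmatch
    import shutil

    # Option 1: bigger exclude list -- copy /etc for upload but skip
    # directories we never need (skel, udev, rc* dirs, etc.).
    EXCLUDES = shutil.ignore_patterns('skel', 'udev', 'rc?.d', 'terminfo', 'selinux')

    def copy_etc_with_excludes(src='/etc', dst='logs/etc'):
        """Copy /etc for log upload, dropping the excluded names."""
        shutil.copytree(src, dst, ignore=EXCLUDES)

    # Option 2: explicit whitelist -- only collect config for services we
    # actually deploy; every new service has to be added here by hand.
    WHITELIST = ['nova/*', 'neutron/*', 'keystone/*', 'heat/*', 'puppet/*', 'docker/*']

    def wanted(relpath):
        """True if a path relative to /etc matches the whitelist."""
        return any(fnmatch.fnmatch(relpath, pattern) for pattern in WHITELIST)

Either way the trade-off is the one discussed in the meeting: the exclude list is cheap to start with but keeps growing, while the whitelist needs a manual entry for every new service.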
14:11:26 pabelanger, because it requires manual maintenance
14:11:33 EmilienM, yes
14:11:47 sshnaidm: look at d-g, and how we handle /etc today
14:11:50 you would do the same
14:11:58 pabelanger, we are different
14:12:02 I think a whitelist for /etc isn't too bad - we know what services we deploy (or plan to deploy)
14:12:14 but I might be missing something
14:12:42 EmilienM, the only difference is manual or automatic maintenance
14:12:51 so i think this is pointing out the use of CI for debugging
14:12:59 if you need something, you should spin up an env locally
14:13:10 and add it to the whitelist later
14:13:33 either way we're capturing too much
14:13:41 and we've been asked to fix it for a while
14:13:54 so to start we can do a bigger exclude list
14:14:05 see how it works
14:14:05 but a whitelist probably makes sense longer term
14:14:10 yes
14:14:19 if we can't get it down with an exclude list we must switch to a whitelist
14:14:23 manual work or not
14:14:27 wes hayutin proposed openstack/tripleo-quickstart-extras master: Use AFS mirrors to download containers instead of docker.io https://review.openstack.org/493728
14:14:33 mwhahaha: +1
14:14:44 mwhahaha, if we want a whitelist, no need to make an exclude list then..
14:14:46 so can we get a larger exclude list for next week?
14:14:56 I'm fine with an exclude list, just not with an explicit whitelist.
14:15:02 mwhahaha, let's choose one way
14:15:08 it's about making incremental progress, right now we're not doing anything but arguing
14:15:18 infra asked for a whitelist
14:15:28 if we don't want to do that, then PoC an exclude list and let's go
14:15:42 but progress needs to be made like now
14:15:50 this has been a topic for far too long
14:15:53 mwhahaha: as far as I understand, the topic is infra wanting an explicit whitelist and sshnaidm not thinking it's a good idea.
14:16:20 ok, I'll prepare both and let's see who wins
14:16:25 :)
14:16:35 * sshnaidm done
14:16:45 Dmitry Tantsur proposed openstack/instack-undercloud master: [WIP] Switch to scheduling based on resource classes https://review.openstack.org/490851
14:16:54 k, can you have something by next week maybe?
14:17:00 I don't think we need to spend time on both now, we probably have other things to do as well
14:17:08 mwhahaha, even today
14:17:20 * EmilienM thinks sshnaidm is a machine
14:17:37 * sshnaidm not sure
14:17:52 #action sshnaidm to prepare log exclusion/whitelist patches for review
14:17:53 honza: dumb question, what's the difference between registering and enrolling nodes?
14:18:10 moving on :D
14:18:12 sshnaidm, make the patch specific to the upstream env
14:18:12 anything else for one-off items?
14:18:20 akrivoka: none, afaik
14:18:32 #topic bugs
14:18:43 #link https://launchpad.net/tripleo/+milestone/pike-rc1
14:19:10 besides the CI issues that we're already working on, do we have outstanding bugs that we need to get fixed in Pike RC1?
14:19:30 honza: is there any reason to introduce new terminology (enroll) when we have existing (register)? (https://review.openstack.org/#/c/488526/)
14:20:02 if I don't hear anything from anyone, I'll propose TripleO Pike RC1 by Friday morning.
14:20:16 EmilienM: there are some upgrade-related things, i am looking at 2 personally, https://bugs.launchpad.net/tripleo/+bug/1706951 and https://bugs.launchpad.net/tripleo/+bug/1708115 - not sure we'll get everything but we'll try
14:20:17 Launchpad bug 1706951 in tripleo "Ocata to Pike upgrade fails when cinder-volume runs on host because cinder-manage db sync runs when galera is unavailable" [Critical,In progress] - Assigned to Marios Andreou (marios-b)
14:20:18 Launchpad bug 1708115 in tripleo "Ensure non-controller are usable after upgrade and before converge." [Critical,Triaged]
14:20:26 EmilienM: i'm investigating a swift issue currently
14:20:26 akrivoka: it's an ironic thing, i guess https://github.com/openstack/tripleo-common/blob/master/workbooks/baremetal.yaml#L1034
14:20:29 EmilienM: due to the gate issues, I suspect some of the FFE things will slip into an RC2
14:20:32 but sounds good
14:20:41 marios: ok, upgrade patches are backportable in any case, don't worry
14:20:42 EmilienM: https://bugs.launchpad.net/tripleo/+bug/1710606. but it's not limited to upgrades afaict
14:20:43 Launchpad bug 1710606 in tripleo "O -> P - Upgrade: swift_object_expirer, swift_container_replicator, swift_object_replicator, swift_rsync, swift_account_replicator, swift_proxy containers are restarting after upgrade" [Critical,In progress] - Assigned to Carlos Camacho (ccamacho)
14:21:00 but it's already aligned against rc1
14:21:01 slagle: oh this one :( ok
14:21:01 akrivoka: i was reusing the terminology from tripleo-common
14:21:02 EmilienM: ack thanks
14:21:11 shardy: yes I think RC2 will happen
14:21:12 EmilienM: yes. on a new deploy, i'm seeing the same thing
14:21:13 akrivoka: but i'm open to changing that!
14:21:38 shardy: do we automatically move all FFEs to RC2? or just some of them?
14:22:37 I'll look at the remaining ffes at the end of this week
14:23:07 slagle: it's weird we don't hit that in the CI, or do we?
14:23:45 EmilienM: maybe we should review the status and decide if any should be deferred to queens?
14:23:57 EmilienM: i don't know. do we have verification of swift in the overcloud?
14:23:59 same with bugs, we probably need to start reducing the number of things we're tracking?
14:24:05 shardy: yeah, probably...
14:24:15 sorry for appearing out of the blue, but I'm solving an ironic-related upgrade complication. just want you to be aware of it.
14:24:16 EmilienM: but given the gate issues we probably should be flexible if patches are posted
14:24:21 slagle: yes, I guess, with the pingtest, it uploads an image to glance with the swift backend
14:24:33 this is https://bugs.launchpad.net/tripleo/+bug/1708653
14:24:34 Launchpad bug 1708653 in tripleo "Need to set resource_class on Ironic nodes after upgrade to Pike" [High,In progress] - Assigned to Dmitry Tantsur (divius)
14:24:39 This is a fix for a regression that should probably get merged in rc1: https://review.openstack.org/#/c/482979/
14:24:49 (tripleo-validations)
14:24:55 florianf: no bug report?
14:25:04 but ok
14:25:09 EmilienM: Let me create one
14:25:11 hello guys, as I understood by default 1 compute and 1 control node will be deployed, how do I deploy the compute and controller node all in one?
14:25:16 ok moving on
14:25:22 #topic projects releases or stable backports
14:25:28 EmilienM: also we need a bug to track the remaining pieces that enable minor updates with containers
14:25:46 shardy: I haven't seen a blueprint for that :(
14:25:48 there's a couple of update-related bugs targeted to rc1 already, so I'll re-title one
14:25:55 it's part of the Container support blueprint, I guess
14:26:05 EmilienM: well it's a bug, minor updates without downtime
14:26:09 ok
14:26:23 * shardy thinks there's one for that already, just not specific to containers
14:26:34 shardy: no problem for this one
14:26:42 so we'll see how it goes but
14:26:58 #action EmilienM to prepare tripleo pike rc1 by friday if things go right
14:27:15 stable/ocata upgrade jobs seem to be timing out a lot :/
14:27:23 it's been that way for months now
14:27:27 if things don't go right, we'll probably defer to next week
14:27:35 jaosorior: yes it's not new
14:27:37 stable/ocata is effectively blocked on the upgrade jobs
14:27:49 ah
14:27:52 well crap
14:28:05 they used to work ~ fine
14:28:16 but indeed for ~2 months (I think) they've been timing out a lot
14:28:17 #topic CI
14:28:21 jaosorior, marios: Thanks! ;-)
14:28:48 the last time I checked, it was upgrade tasks taking time and making the job time out on some infra clouds
14:28:48 np :)
14:29:20 I posted https://bugs.launchpad.net/tripleo/+bug/1702955
14:29:21 Launchpad bug 1702955 in tripleo "tripleo upgrade jobs timeout when running in RAX cloud" [Critical,Triaged]
14:29:38 it'd be good to figure out which upgrade_tasks, chances are it's stuck downloading the new packages?
14:29:48 that's where most time goes, particularly without a local mirror
14:30:04 well it should be using the local mirror now
14:30:11 I added an alert on the bug and hopefully it gets some attention
14:30:17 yeah I just wonder if that's working as expected
14:30:17 but it just requires someone to go look into it in depth
14:30:34 shardy: the local mirror works fine, afaik, but I can double check
14:30:56 I'll look at it if no one has time
14:31:12 do we have anything else about CI?
14:31:18 I didn't see upgrade jobs ever passing..
14:31:28 sshnaidm: on stable/ocata, they do pass
14:31:43 weshay: did you do the CI squad meeting last week?
14:31:45 we need to merge the docker proxy today
14:31:51 if possible
14:31:53 EmilienM, yep, only there
14:32:02 yes.. I need to send the notes
14:32:06 otherwise tomorrow we're going to end up with a 24-hour+ gate
14:32:10 mwhahaha: what's the docker proxy review?
14:32:12 for that and the rdo mtg
14:32:20 #action CI / URGENT: review https://review.openstack.org/#/c/493728 and https://review.openstack.org/#/c/493726/
14:32:29 -^
14:32:47 I think mandre isn't around but his -1 can be ignored
14:32:56 hello guys, as I understood by default 1 compute and 1 control node will be deployed, how can I deploy the compute and controller node all in one?
14:33:06 mwhahaha, did you see my patch? I wonder if it will be enough https://review.openstack.org/#/c/491923/
14:33:07 does the undercloud configure a proxy for docker?
14:33:14 lvdombrkr: hey, we're in the weekly meeting, and we're almost done
14:33:50 EmilienM: sorry guys ))
14:34:00 sshnaidm: possibly, so we need to decide between weshay's patches and yours
14:34:33 sshnaidm: are you sure you can get NODEPOOL_DOCKER_REGISTRY_PROXY without sourcing the env on the image?
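For context on the docker proxy being discussed: a minimal sketch, assuming NODEPOOL_DOCKER_REGISTRY_PROXY is exported in the job environment. The "registry-mirrors" key is the standard docker daemon.json setting; the helper itself is illustrative only and is not the actual patch under review:

    # Minimal sketch, not the actual CI change: point the docker daemon at
    # a registry mirror taken from the (assumed) nodepool environment.
    import json
    import os

    def write_docker_mirror(path='/etc/docker/daemon.json'):
        mirror = os.environ.get('NODEPOOL_DOCKER_REGISTRY_PROXY')
        if not mirror:
            return  # no mirror available, keep pulling from docker.io
        config = {}
        if os.path.exists(path):
            with open(path) as f:
                config = json.load(f)
        config['registry-mirrors'] = [mirror]
        with open(path, 'w') as f:
            json.dump(config, f, indent=2)
        # the docker daemon needs a restart/reload to pick up the new mirror

The open question raised just above still applies: this only helps if the variable is actually present without sourcing the nodepool environment file on the image.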
14:34:38 Merged openstack/tripleo-heat-templates master: Fix Heat condition for RHEL registration yum update https://review.openstack.org/492632
14:34:41 sshnaidm, the patch worked as well http://logs.openstack.org/23/491923/2/check/gate-tripleo-ci-centos-7-scenario002-multinode-oooq-container/891461e/logs/undercloud/etc/docker/daemon.json.txt.gz
14:35:00 EmilienM, not sure I understand, which image..?
14:35:29 ok, let's talk after the mtg maybe
14:35:33 yeah
14:35:44 #topic specs
14:35:49 do we have anything specs-related this week?
14:35:55 #link https://review.openstack.org/#/q/project:openstack/tripleo-specs+status:open
14:36:20 #topic open discussion
14:36:30 quick reminder about the PTG, next month
14:36:32 #link https://etherpad.openstack.org/p/tripleo-ptg-queens
14:36:40 feel free to propose topics
14:37:05 we'll work on the agenda in the following weeks
14:37:17 does anyone have anything before we close the meeting and go back to normal work?
14:37:32 thanks folks
14:37:34 #endmeeting