14:00:13 <mwhahaha> #startmeeting tripleo
14:00:14 <openstack> Meeting started Tue Sep  5 14:00:13 2017 UTC and is due to finish in 60 minutes.  The chair is mwhahaha. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:15 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:17 <openstack> The meeting name has been set to 'tripleo'
14:00:22 <EmilienM> o/
14:00:25 <mwhahaha> #topic agenda
14:00:28 <thrash> o/
14:00:34 <cdearborn> o/
14:00:38 <mwhahaha> * review past action items
14:00:38 <mwhahaha> * CI status
14:00:38 <mwhahaha> * bugs
14:00:38 <mwhahaha> * Projects releases or stable backports
14:00:38 <mwhahaha> * Specs / blueprints
14:00:38 <mwhahaha> * one off agenda items
14:00:38 <mwhahaha> * open discussion
14:00:39 <mwhahaha> Anyone can use the #link, #action and #info commands, not just the moderator!
14:00:39 <mwhahaha> Hi everyone! who is around today?
14:00:42 <beagles> o/
14:00:48 <EmilienM> hola
14:00:53 <jrist> o/
14:00:53 <lyarwood> o/
14:00:54 <larsks> o/
14:00:54 <marios> o/
14:00:54 <ccamacho> o/
14:00:55 <shardy> o/
14:01:01 <rbrady> o/
14:01:05 <fultonj> o/
14:01:13 <gcerami> \o
14:01:24 <gfidente> o/
14:01:25 <gfidente> :D
14:01:35 <sshnaidm> 0/
14:01:35 <jaosorior> o/
14:01:40 <jpich> o/
14:01:46 <adarazs> ö/
14:01:47 <trown> o/
14:02:04 <mwhahaha> #topic review past action items
14:02:27 <mwhahaha> team to help with reviewing https://review.openstack.org/#/q/topic:bug/1691403 (CI alert fix) - done
14:02:33 <mwhahaha> EmilienM sends a note about Queens blueprints / specs on ML
14:02:40 <EmilienM> done
14:02:52 <jfrancoa> o/
14:02:55 <mwhahaha> ok that's all we have from the last meeting
14:03:07 <mwhahaha> moving on
14:03:08 <mwhahaha> #topic CI status
14:03:15 <EmilienM> pretty bad :D
14:03:28 <janki> o/
14:03:33 <mwhahaha> so where are we at in terms of CI, it appears a whole bunch is broken at the moment
14:03:38 <EmilienM> I think there are some alerts that we can remove and treat them as "normal" bugs
14:03:39 <mandre> o/
14:04:01 <gfidente> I just moved https://bugs.launchpad.net/tripleo/+bug/1713659 back into 'Confirmed'/'Critical'
14:04:02 <openstack> Launchpad bug 1713659 in tripleo "Rabbitmq class expects ports as integers, not strings" [Critical,Confirmed] - Assigned to Juan Antonio Osorio Robles (juan-osorio-robles)
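For readers skimming the log: the type mismatch behind bug 1713659 boils down to the rabbitmq puppet class rejecting a port delivered as a string. A minimal hieradata sketch of the failing vs. working shape (the exact hiera key touched by the eventual fix may differ):

    # fails: puppet-rabbitmq validates this parameter as an Integer
    rabbitmq::port: "5672"
    # works: plain integer, as the class expects
    rabbitmq::port: 5672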
14:04:03 <EmilienM> we need an alert for OVB jobs for sure, they look all red right now
14:04:41 <owalsh> o/
14:04:45 <mwhahaha> https://bugs.launchpad.net/tripleo/+bug/1713832
14:04:47 <openstack> Launchpad bug 1713832 in tripleo "Object PUT failed for zaqar_subscription" [Critical,In progress] - Assigned to Marios Andreou (marios-b)
14:05:01 <mwhahaha> marios: do we have an update on this issue?
14:05:10 <EmilienM> I think we can remove alert on this one
14:05:11 <mwhahaha> or is anyone looking at it?
14:05:12 <mwhahaha> k
14:05:29 <EmilienM> mwhahaha: not sure we have made progress on this one
14:05:32 <marios> mwhahaha: not since i discussed it last night with EmilienM ... it is still happening http://status.openstack.org/elastic-recheck/#1713832
14:05:54 <EmilienM> gah
14:06:01 <mwhahaha> ok sounds like we need some more investigation from zaqar folks
14:06:05 <EmilienM> 1 failure in 24 hrs isn't terrible
14:06:10 <EmilienM> (we have seen worse lately)
14:06:12 <marios> mwhahaha: so whatever it is it is ongoing ... i went through recent zaqar/swift commits but can't see any obvious thing there
14:06:33 <mwhahaha> no it's not, let's drop the alert and poke the storage/zaqar folks to look into it
14:06:36 <EmilienM> mwhahaha: we'll need to discuss marios's change that he wants to restore
14:06:37 <marios> EmilienM: well it seems to fail at about the same rate since the revert as we discussed already (that is another issue, let's talk about it separately)
14:06:41 <EmilienM> and eventually backport
14:06:42 <sshnaidm> EmilienM, where do you see ovb jobs failing?
14:06:53 <EmilienM> sshnaidm: tripleo.org/cistatus.html
14:07:08 <mwhahaha> let's go through the alert bugs first
14:07:09 <marios> mwhahaha: EmilienM for context/logs the revert we are talking about is discussed here http://lists.openstack.org/pipermail/openstack-dev/2017-September/121722.html
14:07:14 <mwhahaha> then we can talk about ovb
14:07:23 <EmilienM> ok
14:07:54 <EmilienM> the patch that jaosorior sent isn't passing CI jobs
14:08:01 <openstackgerrit> yolanda.robla proposed openstack/tripleo-image-elements master: WIP: Use volumes for security hardened images  https://review.openstack.org/499588
14:08:03 <mwhahaha> #action drop alert on Bug 1713832 and ping storage/zaqar people to investigate
14:08:04 <openstack> bug 1713832 in tripleo "Object PUT failed for zaqar_subscription" [Critical,In progress] https://launchpad.net/bugs/1713832 - Assigned to Marios Andreou (marios-b)
14:08:22 <mwhahaha> https://bugs.launchpad.net/tripleo/+bug/1714361 apetrich any update?
14:08:23 <openstack> Launchpad bug 1714361 in tripleo "mistral on gates seems old and does not have the required patchs" [Critical,Triaged] - Assigned to Adriano Petrich (apetrich)
14:08:24 <jaosorior> EmilienM: trying to reproduce it locally to fix it up.
14:09:02 <shardy> jpena: does the latest dlrn have a workaround for the lack of master branch releases? ^^
14:09:31 <shardy> mwhahaha: when I looked earlier there is a pending patch for mistral to release the bugfix, but AFAICT it's a general issue so we need to fix it for everything?
14:09:59 <mwhahaha> are these the missing semver patches to get versions bumped for queens?
14:10:05 <jpena> shardy: it depends on the situation, more info needed :)
14:10:08 * jrist really really wants this fixed
14:10:15 <ooolpbot> URGENT TRIPLEO TASKS NEED ATTENTION
14:10:15 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1713659
14:10:17 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1713832
14:10:18 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1714361
14:10:19 <ooolpbot> https://bugs.launchpad.net/tripleo/+bug/1714905
14:10:20 <openstack> Launchpad bug 1713659 in tripleo "Rabbitmq class expects ports as integers, not strings" [Critical,Triaged] - Assigned to Juan Antonio Osorio Robles (juan-osorio-robles)
14:10:20 <shardy> mwhahaha: yeah AFAIK that's the issue here
14:10:21 <openstack> Launchpad bug 1713832 in tripleo "Object PUT failed for zaqar_subscription" [Critical,In progress] - Assigned to Marios Andreou (marios-b)
14:10:22 <jrist> #link https://review.openstack.org/#/c/469608/
14:10:22 <shardy> and I don't think it's restricted to mistral
14:10:22 <openstack> Launchpad bug 1714361 in tripleo "mistral on gates seems old and does not have the required patchs" [Critical,Triaged] - Assigned to Adriano Petrich (apetrich)
14:10:24 <openstack> Launchpad bug 1714905 in tripleo "Composable scenarios BM -> Containers Upgrade jobs never deploy on BM" [Critical,In progress] - Assigned to Martin André (mandre)
14:10:38 <mwhahaha> the puppet modules we patched for the next version, so those are good
14:10:52 <mwhahaha> this would be all the openstack services that need a semver patch to bump versions for the next cycle
14:10:57 <shardy> jpena: apevec mentioned there may be a dlrn workaround for https://bugs.launchpad.net/tripleo/+bug/1714361 - there's a ML thread linked from it that has more context
14:11:02 <mwhahaha> if i recall this is always a struggle at the start of the cycle
14:11:02 <jpena> shardy: oh, I think I know what you mean. Having newer tags in stable/pike than master causes trouble for upgrades
14:11:07 <EmilienM> see https://review.openstack.org/#/q/owner:%22Emilien+Macchi+%253Cemilien%2540redhat.com%253E%22+semver
14:11:10 <shardy> basically we end up installing the old version from delorean-pike-testing
14:11:12 <EmilienM> that's what I did the last time to fix this up
14:11:18 <shardy> jpena: yeah
14:11:20 <EmilienM> I can do it again if you want but it's a fight :D
14:12:01 <shardy> mwhahaha: yeah, it'd be good to discuss again with the release team
14:12:03 <jpena> so apevec had a (partial) fix for review, but it was never complete and he has to revamp it
14:12:03 <mwhahaha> or should we make upgrade jobs non voting until m1
14:12:20 <shardy> but is it only upgrade jobs?
14:12:35 <mwhahaha> in theory it should be
14:12:40 <EmilienM> yes
14:12:43 <EmilienM> I think?
14:12:45 <oidgar> o/
14:12:58 <mwhahaha> or is it a stable/pike issue
14:12:59 <shardy> Not sure it is, the way we layer the repos means that I think all jobs are affected
14:13:02 <EmilienM> mwhahaha: upgrade jobs won't vote on master for now, we have no process to upgrade containers to containers
14:13:08 <shardy> perhaps we need to double check that though
14:13:16 <EmilienM> but we might want to enable voting upgrade jobs on stable/pike once they work
14:13:19 <shardy> like, how did that puppet-rabbitmq patch get promoted?
14:13:23 <mwhahaha> ok sounds like we need more investigation on the exact impact of the versions
14:13:38 <mwhahaha> anyone want to volunteer for that
* mwhahaha watches as everyone disappears
14:14:05 <trown> shardy: puppet modules we pull from master
14:14:13 <shardy> the logs I linked from the bug show the undercloud job using wrong versions
14:14:18 <gfidente> non-openstack only I suppose
14:14:19 <trown> shardy: they are treated like tripleo projects in that regard
14:14:50 <shardy> trown: Ah yeah I forgot it's not only puppet-openstack modules
14:14:55 <gfidente> eg. puppet-ceph was tested against tripleo
14:15:02 <gfidente> but was more an exception
14:15:27 <mwhahaha> ok so we need to move on, can anyone devote some time to investigate this more?
14:15:40 <trown> though we probably want to rethink that... I think that among packages that have not passed a promote, we should only include the ones we have a gate on
14:16:00 <EmilienM> trown: indeed
14:16:01 <shardy> I'm happy to help, but am currently unclear if we're going after a dlrn fix or a "release all the things" fight
14:16:22 <Tengu> jaosorior: small question: in order to get Public Endpoint TLS, I only have to include the following env: openstack-tripleo-heat-templates/environments/ssl/tls-endpoints-public-dns.yaml and openstack-tripleo-heat-templates/environments/enable-tls.yaml  right? Nothing more, nothing less ?
14:16:24 <shardy> the other approach is a whitelist for delorean-pike-testing
14:16:44 <mwhahaha> i think we need to figure out what is actually happening in the jobs before trying to come up with the fix. there seem to be several possible problems
14:17:10 <shardy> mwhahaha: Ok, well I already added some analysis to the bug but will do another pass looking at the different types of job
14:17:30 <mwhahaha> #action shardy to look into version impacts in ci jobs related to Bug 1714361
14:17:31 <openstack> bug 1714361 in tripleo "mistral on gates seems old and does not have the required patchs" [Critical,Triaged] https://launchpad.net/bugs/1714361 - Assigned to Adriano Petrich (apetrich)
14:17:40 <mwhahaha> https://bugs.launchpad.net/tripleo/+bug/1714905
14:17:41 <openstack> Launchpad bug 1714905 in tripleo "Composable scenarios BM -> Containers Upgrade jobs never deploy on BM" [Critical,In progress] - Assigned to Martin André (mandre)
14:17:48 <mwhahaha> mandre: any update?
14:17:58 <EmilienM> I've been working on this one last night
14:18:10 <Tengu> jaosorior: sorry, tripleo-heat-templates/environments/ssl/enable-tls.yaml - the other one is deprecated - I've updated earlier and it didn't activate TLS :(
14:18:16 <EmilienM> i've pushed https://review.openstack.org/#/c/500671/ (which is partially related)
14:18:27 <EmilienM> sshnaidm: might need your help on this one if you can
14:18:33 <mandre> I need to sync up with EmilienM on this one, as it may be a few bugs mixed in together
14:18:48 <jaosorior> Tengu: depends; if you have to get the nodes to trust a CA, then you need to add another environment.
14:19:04 <EmilienM> on the bright side, I saw an upgrade job passing from ocata to pike on stable/pike
14:19:10 <mwhahaha> ok sounds like progress
14:19:11 <EmilienM> (the only one I think) 2 days ago
14:19:16 <sshnaidm> EmilienM, sure
14:19:18 <EmilienM> yes little progress
14:19:19 <mwhahaha> moving on
14:19:21 <gfidente> EmilienM curious if you got pingtest to pass scenario004/containers after moving manila-share to pacemaker?
14:19:24 <EmilienM> sshnaidm: let's sync after meeting
14:19:27 <sshnaidm> ok
14:19:32 <mwhahaha> https://bugs.launchpad.net/tripleo/+bug/1708832
14:19:34 <openstack> Launchpad bug 1708832 in tripleo "DLRN build failures in gate" [High,In progress] - Assigned to wes hayutin (weshayutin)
14:19:34 <EmilienM> gfidente: yes and tempest as well
14:19:38 <gfidente> whoohooo
14:19:40 <openstackgerrit> Juan Antonio Osorio Robles proposed openstack/instack-undercloud master: Use integer for rabbitmq port and specify management IP  https://review.openstack.org/500798
14:19:41 <mwhahaha> anyone driving that since weshay_PTO is not around
14:19:46 <Tengu> jaosorior: not needed, public TLS will be signed by LE.
14:19:53 <jaosorior> Tengu: then that's it.
14:19:56 <Tengu> hmm.
14:19:59 <Tengu> weird.
14:20:10 <EmilienM> this one should be closed
14:20:10 <mwhahaha> looks like dmsimard|afk had a patch that was merged, is it still open?
14:20:11 <adarazs> mwhahaha: that should be fixed.
14:20:15 <EmilienM> the fix was https://review.openstack.org/#/c/498074/
14:20:26 <EmilienM> and folks forgot to update launchpad :/
14:20:32 <mwhahaha> ok looks like we need to mark that bug as Fix Released then
14:20:46 <mwhahaha> https://bugs.launchpad.net/tripleo/+bug/1713127
14:20:47 <openstack> Launchpad bug 1713127 in tripleo "tripleo fails to deploy in ci : Failed to call refresh: /usr/bin/clustercheck" [High,Triaged]
14:20:54 <mwhahaha> bandini: any thoughts -^
14:21:13 <Tengu> jaosorior: though I have an idea: I suspect some recipe checks for the existence of SSLCertificate and SSLKey - I didn't set them up, as the certificate will be generated on the (now unique) controller (I've overridden the OS::TripleO::NodeTLSData resource)
14:21:29 <EmilienM> mwhahaha: 1708832 closed
14:21:45 <jaosorior> Tengu: you didn't?
14:22:08 <Tengu> jaosorior: nope. because the certificate isn't loaded like that :).
14:22:28 <EmilienM> mwhahaha: let's move forward
14:23:08 <mwhahaha> k i'm not sure that bug needs an alert
14:23:16 <mwhahaha> looking at it, it seemed to be package downloads
14:23:19 <mwhahaha> will look at it more later
14:23:21 <mwhahaha> ok moving on
14:23:32 <mwhahaha> do we have a bug for the ovb failures (possibly related to rabbitmq)
14:23:35 <jaosorior> Tengu: you'll have issues: we explicitly check if it's set https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/haproxy.yaml#L83    and set the necessary value if it is https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/haproxy.yaml#L135
14:23:36 <EmilienM> mwhahaha: no alert is needed
14:23:55 <mwhahaha> "no, alert is needed" or "no alert is needed"?
14:23:57 <EmilienM> mwhahaha: https://bugs.launchpad.net/tripleo/+bug/1713659
14:23:58 <openstack> Launchpad bug 1713659 in tripleo "Rabbitmq class expects ports as integers, not strings" [Critical,In progress] - Assigned to Juan Antonio Osorio Robles (juan-osorio-robles)
14:24:05 <jaosorior> Tengu: you can work around it by creating something similar to this: https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/haproxy-public-tls-certmonger.yaml   but that merely just sets the path for tripleo::haproxy::service_certificate
14:24:10 <EmilienM> mwhahaha: for the clustercheck, no alert is needed
14:24:14 <mwhahaha> k
14:24:35 <EmilienM> https://bugs.launchpad.net/tripleo/+bugs?field.tag=alert
14:24:50 <EmilienM> we have 3 alerts now, how does it look?
14:25:02 <mwhahaha> better
14:25:03 <EmilienM> I'm not sure https://bugs.launchpad.net/tripleo/+bug/1714361 deserves an alert
14:25:04 <openstack> Launchpad bug 1714361 in tripleo "mistral on gates seems old and does not have the required patchs" [Critical,Triaged] - Assigned to Adriano Petrich (apetrich)
14:25:14 <EmilienM> it's a known issue and someone has to take some actions on it
14:25:31 <EmilienM> it doesn't block any gate, afaik?
14:25:48 <mwhahaha> shardy is going to take a look at it, we can discuss more after the meeting
14:25:52 <shardy> well it might mean we're not testing anything properly for master branches
14:25:52 <mwhahaha> any other CI status items?
14:26:03 <shardy> but we can remove the alert and discuss as we investigate
14:26:24 <Tengu> jaosorior: so if I set tripleo::haproxy::service_certificate for example in hieradata, it will work as expected? The path is the very same one the "original" script provides
14:26:47 <mwhahaha> k moving on
14:26:52 <mwhahaha> #topic bugs
14:26:53 <mwhahaha> #link https://launchpad.net/tripleo/+milestone/pike-rc2
14:27:14 <EmilienM> I think we covered critical bugs
14:27:22 <EmilienM> (or the most criticals)
14:27:26 <mwhahaha> any other bugs to discuss?
14:27:28 <EmilienM> did we miss something?
14:27:42 <jaosorior> Tengu: correct.
14:27:43 <larsks> I just want to ask about https://bugs.launchpad.net/tripleo/+bug/1713240. We have a fix available with one +2, but could use another set of eyes...
14:27:45 <openstack> Launchpad bug 1713240 in tripleo "Fluentd configuration not correctly written to disk" [High,In progress] - Assigned to Lars Kellogg-Stedman (larsks)
14:27:53 <shardy> we have 200 bugs targeted to rc2
14:28:07 <shardy> can we start deferring things that aren't release blockers, to help focus review attention?
14:28:34 <Tengu> jaosorior: good to hear :). I can do that in the puppet-stack-config-fix.yaml I already have for some workarounds.
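For anyone following this side thread: the workaround jaosorior describes amounts to pointing puppet-tripleo's haproxy class at a certificate bundle that already exists on the node, via hieradata, instead of the usual TripleO-injected one. A hedged sketch, using ControllerExtraConfig as one possible delivery mechanism and an illustrative PEM path:

    parameter_defaults:
      ControllerExtraConfig:
        # path to a cert+key bundle already present on the controller
        # (here produced out-of-band, e.g. by Let's Encrypt, rather than injected by TripleO)
        tripleo::haproxy::service_certificate: /etc/pki/tls/private/overcloud_endpoint.pem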
14:28:40 <mwhahaha> shardy: yea makes sense
14:28:44 <mwhahaha> i'll take a look this week
14:28:51 <mwhahaha> i was already planning on doing some bug work
14:29:03 <mwhahaha> #action mwhahaha to retarget rc2 bugs if not release critical
14:29:31 <shardy> ack, and I guess for all folks doing triage or reporting bugs: please use queens-1 unless it's a blocker
14:29:38 <EmilienM> mwhahaha: I'll share my commands with you :D I use openstack release tools
14:29:42 <mwhahaha> any other bug related items?
14:29:58 <EmilienM> maybe we can move everything to Queens-1 with the script and manually move back to pike-rc2 what we actually want to solve this week
14:30:08 <larsks> mwhahaha: well, there's my question, if someone could commit to that or tell me to bug off or something...
14:30:16 <larsks> Any response, really.
14:30:25 <mwhahaha> larsks: we'll take a look afterwards, it looks ok and you already have 2 +2s
14:30:32 <mwhahaha> larsks: but ci is hosed so probably not a +A right now
14:30:55 <larsks> Ah, looks like that second +2 just landed. Thanks jaosorior!
14:31:09 <mwhahaha> #topic projects releases or stable backports
14:31:10 <jaosorior> larsks: it looked reasonable. But yeah, no +A because of the CI situation
14:31:18 <larsks> No worries, thanks.
14:31:30 <EmilienM> so pabelanger mentioned the zuul v3 upgrade at the end of this week
14:31:38 <EmilienM> and suggested to release pike-rc2 by Thursday
14:31:53 <EmilienM> tbh, I don't think zuul upgrade really affects pike-rc2 release, it's just a tag we push
14:32:04 <mwhahaha> ok so it sounds like we need to get everything landed and CI fixed like today
14:32:05 <EmilienM> what we don't want is our CI more hosed I guess :D
14:32:12 <mwhahaha> well zuul upgrades usually break CI
14:32:42 <EmilienM> Sep 11 - Sep 15 (R+2) is the official limit for us to release final Pike
14:33:13 <EmilienM> so we have 9 days to release this final pike
14:33:21 <EmilienM> but next week is PTG so probably not the best time
14:33:41 <EmilienM> let's target this week and do our best to commit to it
14:33:42 <mwhahaha> indeed
14:33:52 <EmilienM> but let's keep this window of releasing next week open, just in case
14:33:58 <shardy> I think there's only 1 feature pending, so we could land that, release, then backport bugfixes and do an additional stable release after the GA?
14:34:17 <mwhahaha> which feature is still pending?
14:34:17 <shardy> unless we can identify release-blocker bugs of course
14:34:20 <EmilienM> shardy: yes, that's what we do usually. Good idea to do it again
14:34:32 <EmilienM> https://blueprints.launchpad.net/tripleo/+spec/websocket-logging
14:35:02 <EmilienM> there is one patch in instack-undercloud iiuc
14:35:06 <mwhahaha> k
14:35:13 <mwhahaha> https://review.openstack.org/#/c/469608/
14:35:30 <mwhahaha> honza: are you going to be able to get that patch fixed up today or tomorrow?
14:35:32 <shardy> yeah honza can confirm but I think it was blocked on the old mistral in CI issue
14:36:03 <jpich> Yes, it's blocked due to one of the CI issues mentioned earlier IIUC, the one apetrich is working on
14:36:31 <mwhahaha> https://bugs.launchpad.net/tripleo/+bug/1714361
14:36:32 <openstack> Launchpad bug 1714361 in tripleo "mistral on gates seems old and does not have the required patchs" [Critical,Triaged] - Assigned to Adriano Petrich (apetrich)
14:36:40 <mwhahaha> so it looks like we need to figure that one out then
14:37:06 <mwhahaha> any other stable backports or release issues?
14:37:39 <mwhahaha> moving on
14:37:43 <mwhahaha> #topic specs / blueprints
14:37:43 <mwhahaha> #link https://review.openstack.org/#/q/project:openstack/tripleo-specs+status:open
14:38:07 <mwhahaha> ptg is next week so a reminder: get your specs out there so we can discuss them next week
14:38:20 <EmilienM> I think there is one topic related to specs in today's items
14:38:30 <mwhahaha> yup, so let's move on to that
14:38:35 <EmilienM> lyarwood: ^
14:38:36 <mwhahaha> #topic one off agenda items
14:38:36 <mwhahaha> #link https://etherpad.openstack.org/p/tripleo-meeting-items
14:38:53 <mwhahaha> lyarwood: the floor is yours
14:38:59 <lyarwood> WIP Skip level upgrade spec underway, reviews & comments welcome, also looking for a slot to discuss this at the PTG next week.
14:39:02 <lyarwood> https://review.openstack.org/#/c/497257/
14:39:13 <lyarwood> ^ just added that to the agenda as a heads up ahead of the PTG next week
14:39:21 <EmilienM> let's look at the agenda https://etherpad.openstack.org/p/tripleo-ptg-queens
14:39:42 <EmilienM> Thursday, after the "Ceph integration future" session
14:39:56 <lyarwood> EmilienM: if there's time that would be great
14:40:04 <EmilienM> after 3.30pm
14:40:14 <EmilienM> there is no session after
14:40:19 <EmilienM> lyarwood: 40 min is good?
14:40:20 <EmilienM> 1h?
14:40:34 <lyarwood> EmilienM: 1h, I think the ceph session is until 16:30 no?
14:41:04 <EmilienM> https://calendar.google.com/calendar/embed?src=c1g5npdrsd3p37ods24s19gg0g%40group.calendar.google.com&ctz=America/Vancouver
14:41:22 <EmilienM> ah it changed, ok
14:41:40 <fultonj> 15.30 to 16.30 as per https://etherpad.openstack.org/p/tripleo-ptg-queens line 132
14:41:40 <EmilienM> lyarwood: 16.30 to 17.15 is good?
14:41:50 <lyarwood> EmilienM: yup that would work
14:41:55 <EmilienM> ah, my calendar is in PST
14:41:58 <EmilienM> sorry
14:42:25 <EmilienM> I created your session, we're good
14:42:31 <lyarwood> EmilienM: thanks!
14:42:42 <mwhahaha> cool, any other topics?
14:43:00 <sshnaidm> yes
14:43:06 <sshnaidm> can we define which bugs deserve alerts?
14:43:14 <sshnaidm> because if many of them are alerting, it turns into noise which people ignore
14:43:35 <sshnaidm> I know to set alerts on CI gate and promotion blockers
14:43:52 <sshnaidm> do we have any other criteria?
14:44:25 <ioggstream> gfidente: happy birthday!
14:44:27 <mwhahaha> I believe that to be the criteria
14:44:33 <EmilienM> sshnaidm: I agree it has been quite verbose this time but I also found it useful to make visible what's blocking the production chain
14:45:07 <gfidente> ioggstream++ :D love
14:45:50 <sshnaidm> EmilienM, ok, so production chain blockers too?
14:46:34 <mwhahaha> so production chain blockers include ci gate/promotion issues
14:46:42 <EmilienM> yeah
14:46:47 <EmilienM> the problem is that people do blind rechecks
14:46:58 <EmilienM> it wastes CI resources and it doesn't actually solve any problem
14:47:06 <EmilienM> pabelanger did a huge amount of work setting up logstash queries
14:47:12 <EmilienM> but now it's our turn to do it
14:47:26 <EmilienM> i've set alert tags on some bugs where we had a huge amount of hits in logstash
14:47:38 <sshnaidm> EmilienM, I doubt logstash helps to avoid rechecks..
14:47:43 <EmilienM> because I think we want people to stop doing recheck and start helping more in our CI system
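For anyone who would rather add a fingerprint than blind-recheck: elastic-recheck queries are small YAML files, one per Launchpad bug, holding a logstash query. A rough sketch for a bug like 1713832 (the message string and tags below are illustrative and would need to match what actually appears in the failing job logs):

    # queries/1713832.yaml in the elastic-recheck repo (filename follows the bug number)
    query: >
      message:"Object PUT failed for zaqar_subscription" AND
      tags:"console"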
14:48:08 <mwhahaha> those bugs affect the ci gates which is why it would be a warranted alert
14:48:31 <mwhahaha> generally if it's alerting folks need to be looking at it and it is not noise
14:48:32 <therve> EmilienM, FWIW I made some progress on https://bugs.launchpad.net/tripleo/+bug/1713832
14:48:33 <openstack> Launchpad bug 1713832 in tripleo "Object PUT failed for zaqar_subscription" [Critical,In progress] - Assigned to Marios Andreou (marios-b)
14:48:50 <mwhahaha> if you're ignoring the alerting bugs you're not helping
14:48:51 <EmilienM> therve: great :)
14:48:51 <therve> I don't know if it's blocking stuff, but I know of a possible quick fix if that helps
14:49:33 <EmilienM> therve: awesome
14:49:51 <shardy> there are still some generic issues causing lots of rechecks, like all the OVB jobs are really close to timing out
14:50:07 <EmilienM> therve: should we stop using swift backend?
14:50:15 <shardy> We'll have to look at ways to speed them up, or we'll be forced to reduce the coverage
14:50:21 <EmilienM> therve: is it safe to release final pike with swift backend for zaqar if it's racy?
14:50:27 <therve> EmilienM, I don't think that's possible at that point :/
14:50:32 <shardy> e.g now we have scenarios, perhaps we don't need the ovb jobs to deploy so many services?
14:50:38 <shardy> that could save some time
14:50:42 <therve> We don't have mongo anymore, and I don't think redis is there either
14:50:44 <sshnaidm> right, when I get a timeout message from elastic-recheck, it doesn't help me at all - it could be a bug, an infra problem or just bad luck
14:51:08 <EmilienM> shardy: yes, we already said OVB jobs should deploy minimal services (nova glance keystone neutron)
14:51:30 <sshnaidm> shardy, when moving ovb jobs to 3rd party we don't need to limit them to 180 minutes
14:51:41 <shardy> Yeah either would work I guess
14:52:10 <EmilienM> someone needs to spend time on looking at what ovb is deploying today and try to reduce it
14:52:12 <dprince> sshnaidm: do you really want a CI job that takes longer than 180 minutes though?
14:52:19 <shardy> but there's many examples like https://review.openstack.org/#/c/450708/ where we've rechecked for $weeks due to slow jobs timing out
14:52:23 <EmilienM> anyone can take this work? ^
14:52:40 <EmilienM> dprince: I agree, we don't want ovb more than 180 min for sure
14:52:47 <marios> abishop: o/ thanks for checking i just revoted at https://review.openstack.org/#/c/496921/3
14:53:04 <sshnaidm> dprince, it's better to take 240 mins than to do a recheck and waste 180*2=360 mins
14:53:17 <shardy> EmilienM: I'll try to push a wip patch with a smaller ControllerServices list, at least for the HA job
14:53:35 <shardy> I guess we don't really need many of the services deployed there at all, now that few are managed by pacemaker
14:53:39 <EmilienM> shardy: thanks
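To make shardy's proposal concrete: trimming the OVB jobs would mean an environment file that overrides ControllerServices with a reduced list. An illustrative, non-exhaustive sketch (service names as used in Pike-era roles_data.yaml; the actual WIP patch may cut a different set):

    parameter_defaults:
      ControllerServices:
        - OS::TripleO::Services::Keystone
        - OS::TripleO::Services::GlanceApi
        - OS::TripleO::Services::NovaApi
        - OS::TripleO::Services::NovaConductor
        - OS::TripleO::Services::NovaScheduler
        - OS::TripleO::Services::NeutronApi
        - OS::TripleO::Services::MySQL
        - OS::TripleO::Services::RabbitMQ
        - OS::TripleO::Services::HAproxy
        - OS::TripleO::Services::Keepalived
        - OS::TripleO::Services::Memcached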
14:53:41 <therve> dprince, Yeah the "C" in "CI" is not super compatible with 3h jobs. Maybe Perpetual Integration instead
14:54:41 <dprince> yeah, with jobs longer than two hours it really means you have 2 or 3 changes per day, which gets kind of sad... and ends up "camping out" on limited resources too long
14:54:57 <jfrancoa> EmilienM: I was also doing some work disabling some unused services (and I ran into deeper problems in fact) https://review.openstack.org/#/c/499182/, but I can try to help in making the CI jobs lighter
14:54:58 <mwhahaha> ok we have about 5 mins left, do we have anything else to talk about?
14:55:21 <larsks> I have a quick question...
14:55:28 <EmilienM> jfrancoa: you can probably pair with shardy
14:55:36 <mwhahaha> larsks: what's up?
14:55:40 <larsks> I want to clean up the fluentd service implementation, because it predates service_config_settings and is unnecessarily invasive because of that.
14:55:48 <abishop> marios: many thx!
14:55:52 <EmilienM> #action shardy to look at how to reduce # of services deployed on ovb
14:55:58 <sshnaidm> mwhahaha, maybe add an action item about testing "limited" ovb jobs
14:56:10 <sshnaidm> EmilienM, you were first)
14:56:13 <larsks> This is obviously a queens only thing. Would it be appropriate to submit these changes now (ish)?
14:56:17 <EmilienM> anyone can take actions, btw
14:56:21 <sshnaidm> ok
14:56:27 <mwhahaha> larsks: yea you can
14:56:34 <EmilienM> larsks: yes you can but no backports
14:56:44 <larsks> EmilienM: right, hence "queens only thing" :)
14:56:51 <EmilienM> the queens cycle is open!
14:57:00 <EmilienM> though any help to release pike is more than welcome
14:57:08 <larsks> Just wanted to make sure. I know that folks are busy right now with the pike release.
14:57:18 <EmilienM> kinda
14:57:19 <shardy> larsks: it'd be good to land any bugfixes before the refactor, but otherwise sounds good to me
14:57:24 <mwhahaha> larsks: do try and consistently tag it with a bug or something so we can keep an eye on the changes together. Since they are invasive they probably won't be merged for a while
14:57:40 <mwhahaha> but it would be good to get them out early at least for initial reviews
14:57:43 <EmilienM> we should probably target m1 though
14:57:46 <EmilienM> since it's invasive
14:57:46 <larsks> Okay, thanks all.
14:58:00 <larsks> Many of the cleanup changes won't be.  Just the stuff that pulls things out of common/services.yml.
14:58:19 <mwhahaha> ok anything else?
14:58:41 <shardy> larsks: ah, I was wondering about removing some of those, will be happy to help with reviews etc when you're ready
14:59:05 <mwhahaha> ok thanks everyone
14:59:08 <mwhahaha> #endmeeting