15:03:53 <bnemec> #startmeeting oslo
15:03:53 <bnemec> Courtesy ping for bnemec, smcginnis, moguimar, johnsom, stephenfin, bcafarel, kgiusti, jungleboyj
15:03:53 <bnemec> #link https://wiki.openstack.org/wiki/Meetings/Oslo#Agenda_for_Next_Meeting
15:03:54 <openstack> Meeting started Mon Aug 10 15:03:53 2020 UTC and is due to finish in 60 minutes.  The chair is bnemec. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:03:55 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:03:58 <openstack> The meeting name has been set to 'oslo'
15:04:11 <johnsom> o/
15:07:43 <bnemec> #topic Red flags for/from liaisons
15:07:56 <johnsom> Nothing from Octavia
15:08:07 <bnemec> I was out last week so I have no idea what's going on. Hopefully someone else can fill us in. :-)
15:08:59 <moguimar> nothing from Barbican
15:09:08 <moguimar> we didn't have a meeting last week
15:09:39 <moguimar> now Hervé is also on PTO
15:10:14 <bnemec> Might be a quick meeting then.
15:10:15 <moguimar> and we need to come to a decision about kevko_'s patch
15:10:24 <bnemec> Which is okay since I have a ton of emails to get through. :-)
15:10:35 <moguimar> this: https://review.opendev.org/#/c/742193/
15:11:56 <bnemec> I've added it to the agenda.
15:12:22 <bnemec> #topic Releases
15:12:37 <bnemec> I'll try to take care of these this week since Herve is out.
15:13:41 <bnemec> I guess that's all I have on this topic.
15:13:45 <bnemec> #topic Action items from last meeting
15:14:07 <bnemec> "kgiusti to retire devstack-plugin-zmq"
15:14:41 <kgiusti> in progress
15:14:51 <bnemec> Cool, thanks.
15:14:59 <bnemec> "hberaud to sync oslo-cookiecutter contributing template with main cookiecutter one"
15:15:08 <bnemec> Pretty sure I voted on this patch.
15:15:40 <bnemec> Yep.
15:15:41 <bnemec> #link https://review.opendev.org/#/c/743939/
15:15:50 <bnemec> It's blocked on ci.
15:16:22 <bnemec> Which is fixed by https://review.opendev.org/#/c/745304
15:16:49 <bnemec> So, all in progress, which is good.
15:17:09 <bnemec> #topic  zuulv3 migration
15:17:22 <bnemec> The zmq retirement is related to this.
15:17:40 <bnemec> I thought I saw something about migrating grenade jobs too.
15:17:48 <tosky> yep
15:18:01 <tosky> line 213: https://etherpad.opendev.org/p/goal-victoria-native-zuulv3-migration
15:18:04 <kgiusti> Yeah - that's one of the "retirement" tasks
15:18:21 <kgiusti> https://review.opendev.org/#/q/status:open+project:openstack/project-config+branch:master+topic:retire-devstack-plugin-zmq
15:18:25 <tosky> - the openstack/devstack-plugin-zmq  jobs are covered by repository retirement
15:18:40 <openstackgerrit> Sean McGinnis proposed openstack/oslo-cookiecutter master: sync oslo-cookiecutter contributing template  https://review.opendev.org/743939
15:18:42 <tosky> - oslo.versionedobjects is fixed by https://review.opendev.org/745183
15:19:02 <tosky> - and clarkb provided patches to port the pbr jobs (https://review.opendev.org/745171, https://review.opendev.org/745189, https://review.opendev.org/745192 )
15:19:10 <bnemec> \o/
15:19:47 <bnemec> So basically we have changes in flight to address all of the remaining oslo jobs.
15:20:03 <tosky> correct
15:20:34 <bnemec> stephenfin: See the above about the pbr jobs. I know you had looked at that too.
15:21:16 <bnemec> Okay, we're on track for this goal.
15:21:20 <bnemec> Thanks for the updates!
15:21:25 <bnemec> And all the patches!
15:21:55 <bnemec> #topic oslo.cache flush patch
15:22:02 <bnemec> #link https://review.opendev.org/#/c/742193/
15:22:19 <bnemec> moguimar: kevko_: You're up!
15:22:39 <bnemec> cc lbragstad since he had thoughts on this too.
15:22:53 <moguimar> I think the patch is pretty much solid
15:23:11 <lbragstad> i need to look at it again
15:23:18 <moguimar> but I'm concerned about Keystone expecting the default behavior to be True and us flipping it to False
15:24:16 <bnemec> If we do go ahead with the patch we must have a way for keystone to default that back to true, IMHO.
15:24:40 <lbragstad> imo - it seems like they need to scale up their memcached deployment
15:24:50 <bnemec> And since Keystone is one of the main consumers of oslo.cache I'm unclear how much it will help to turn it off only other places.
15:25:04 <lbragstad> because it appears the root of the issue is that a network event causes memcached to spiral into an unrecoverable error
15:26:01 <lbragstad> i need to stand up an environment with caching configured to debug the issue where you don't flush, because i'm suspicious that stale authorization data will be returned
15:26:39 <lbragstad> (e.g., when memcached is unreachable, the user revokes their token or changes their password, but their tokens are still in memcached)
15:27:04 <moguimar> what if the default value was True instead
15:27:25 <bnemec> I think it's just one server going down in the pool, then the token getting revoked on a different one, then the original server coming back up that is the problem.
15:27:39 <bnemec> IIUC it can result in a bad cached value for the server that disconnected.
15:27:50 <openstackgerrit> Merged openstack/oslo-cookiecutter master: Add ensure-tox support.  https://review.opendev.org/745304
15:27:54 <lbragstad> right - you could have inconsistent data across servers
15:28:01 <lbragstad> and we don't really handle that in keystone code
15:28:14 * bnemec proposes that we just rm -rf memcache_pool
15:28:52 <lbragstad> well - that's essentially what we assume since we flush all memcached data (valid and invalid) when the client reconnects
15:29:22 <lbragstad> (we're not sure what happened when you were gone, but rebuild the source of truth)
15:29:56 <lbragstad> rebuild from keystone's database, which is the source of truth *
15:32:31 <lbragstad> i need to dig into this more, but i haven't had the time
15:32:50 <lbragstad> so i don't want to hold things up if it's in a reasonable place (where keystone can opt into the behavior we currently have today)
15:34:13 <moguimar> the patch does two things, turns it into a config option and flips the default behavior
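    (For reference, a minimal sketch of what a config option like this could look like in oslo.config terms; the option name, group, default, and help text below are illustrative assumptions, not necessarily what the patch under review uses.)

        from oslo_config import cfg

        # Hypothetical option; the actual patch at
        # https://review.opendev.org/#/c/742193/ may use a different
        # name and default.
        flush_opts = [
            cfg.BoolOpt('memcache_pool_flush_on_reconnect',
                        default=False,
                        help='Flush all cached data when a memcached server '
                             'in the pool reconnects after an outage. '
                             'Keystone would likely need to override this '
                             'back to True to avoid serving stale '
                             'authorization data.'),
        ]

        CONF = cfg.CONF
        CONF.register_opts(flush_opts, group='cache')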
15:39:36 <bnemec> I'm curious what happens in the affected cluster if they just restart all of their services. Doesn't it trigger the same overload?
15:39:46 <bnemec> Maybe on a rolling restart it's spread out enough to not cause a problem?
15:41:07 <openstackgerrit> Moisés Guimarães proposed openstack/oslo.cache master: Bump dogpile.cache's version for Memcached TLS support  https://review.opendev.org/745509
15:42:20 <bnemec> Okay, I've left a review that reflects our discussion here. Let me know if I misrepresented anything.
15:43:17 <openstackgerrit> Merged openstack/oslo-cookiecutter master: sync oslo-cookiecutter contributing template  https://review.opendev.org/743939
15:43:29 <bnemec> #topic  enable oslo.messaging heartbeat fix by default?
15:43:44 <bnemec> This came up the week before I left.
15:43:59 <kgiusti> seems like a safe bet at this point
15:44:08 <bnemec> Related to the oslo.messaging ping endpoint change.
15:44:34 <kgiusti> yeah, that change I'm not so thrilled about.
15:45:11 <kgiusti> I was thinking of -2'ing that change, but wanted to discuss it here first.
15:45:24 <kgiusti> too bad herve is off having a life :)
15:45:38 <bnemec> Yeah, related only insofar as it came up in the discussion as an issue with checking liveness of services.
15:45:43 <kgiusti> I wanted his opinion
15:45:55 <kgiusti> bnemec: +1
15:46:11 <bnemec> We can probably wait until next week. This option has been around for quite a while now so it's not critical that we do it immediately.
15:46:28 <bnemec> I'm not aware of anyone reporting issues with it though.
15:46:46 <kgiusti> neither am I
15:47:10 <kgiusti> but I think we do need to make a final decision on that ping patch
15:47:14 <bnemec> Okay, I'll just leave it on the agenda for next week.
15:47:21 <kgiusti> https://review.opendev.org/#/c/735385/
15:47:31 <bnemec> Was there more discussion on that after I logged off?
15:47:46 * bnemec has not been through openstack-discuss yet
15:48:01 <kgiusti> Lemme check...
15:48:44 <kgiusti> http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016229.html
15:49:17 <kgiusti> and the start of the discussion: http://lists.openstack.org/pipermail/openstack-discuss/2020-July/016097.html
15:49:45 <bnemec> I think it's in the same place as when I left. :-/
15:50:05 <kgiusti> KK
15:50:23 <bnemec> It feels like a bug in Nova if a compute node can stop responding to messaging traffic and still be seen as "up".
15:50:59 <kgiusti> Agreed.  Seems like that proposed feature is out of scope for the oslo.messaging project IMHO.
15:51:43 <kgiusti> Other than internal state monitoring, o.m. isn't intended to be a healthcheck solution
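    (For context, a rough sketch of what probing a service over the RPC bus could look like from the caller's side, assuming the server exposed a ping endpoint as the proposed patch intends; the endpoint name 'ping' and the topic/server values here are illustrative assumptions.)

        import oslo_messaging
        from oslo_config import cfg

        CONF = cfg.CONF
        transport = oslo_messaging.get_rpc_transport(CONF)

        # Address a specific service instance, e.g. a compute agent on host1.
        target = oslo_messaging.Target(topic='compute', server='host1')
        client = oslo_messaging.RPCClient(transport, target)

        # Blocks until the server answers or the timeout expires; a timeout
        # suggests the service is not consuming messages on the bus.
        client.prepare(timeout=10).call({}, 'ping')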
15:52:03 <bnemec> I feel weird arguing against this when I've been advocating for good-enough healthchecks in the api layer though. :-/
15:52:36 <kgiusti> heck I _wanted_ this, but for my own selfish "don't blame me" reasons :)
15:53:14 <kgiusti> Having dan's opinion made me rethink that from a more user-driven perspective.
15:54:36 <kgiusti> Anyhow, that's where we stand at the moment.
15:55:12 <kgiusti> I was wondering if any folks in Oslo felt differently.
15:55:30 <bnemec> Unfortunately we're a bit short on Oslo folks today.
15:56:16 <bnemec> I'm going to reply to the thread and ask if fixing the service status on the Nova side would address the concern here. That seems like a better fix than adding a bunch of extra ping traffic on the rabbit bus (which is already a bottleneck in most deployments).
15:56:38 <kgiusti> +1
15:57:04 <bnemec> #action bnemec to reply to rpc ping thread with results of meeting discussion
15:57:12 <kgiusti> thanks bnemec
15:58:43 <bnemec> Okay, we're basically at time now so I'm going to skip the wayward review and open discussion.
15:59:07 <bnemec> I think we had some good discussions this week though, so it was a productive meeting.
15:59:29 <bnemec> If there's anything else we need to discuss, feel free to add it to the agenda for next week or bring it up in regular IRC.
15:59:42 <bnemec> Thanks for joining everyone!
15:59:45 <moguimar> not on my end
15:59:58 <moguimar> o/
16:00:00 <bnemec> #endmeeting