16:00:24 #startmeeting keystone 16:00:24 Meeting started Tue May 8 16:00:24 2018 UTC and is due to finish in 60 minutes. The chair is lbragstad. Information about MeetBot at http://wiki.debian.org/MeetBot. 16:00:26 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 16:00:28 o/ 16:00:28 The meeting name has been set to 'keystone' 16:00:32 o/ 16:00:33 o/ 16:00:36 o/ 16:00:41 o/ 16:00:52 o/ 16:00:54 #link https://etherpad.openstack.org/p/keystone-weekly-meeting 16:00:58 agenda ^ 16:01:32 O/ 16:02:02 we have a good mix of folks here - so we can go ahead and get started 16:02:10 #topic New direction with default roles 16:02:24 thanks to jroll we got a bunch of good information on the openstack-specs process last week 16:02:31 #link http://lists.openstack.org/pipermail/openstack-dev/2018-May/130207.html 16:02:38 summary ^ 16:02:39 #link https://review.openstack.org/#/c/566377/ 16:02:42 new specification ^ 16:03:21 the TL;DR is that openstack-specs isn't really reviewed anymore, and sounds like it has been unofficially superseded by community goals 16:03:43 * hrybacki nods 16:04:01 dhellmann: and mnaser had some really good input on the process too, and actually recommended that we propose the default roles thing as a community goal 16:04:09 (which i was pretty hesitant about) 16:04:20 just because of the volume of work 16:04:37 in the meantime, they suggested building out a solid template using keystone and a couple other services before we formally propose it as a community goal 16:04:48 which is what hrybacki has done with the new specification 16:05:04 We are hoping for Keystone, Barbican, and Octavia at a minimum 16:05:53 Regardless, I'd like to get a jump on the Keystone work sooner rather than later 16:06:01 i thought the encouragement for proposing this as a community goal was a good sign 16:06:06 +1 16:06:38 so - i guess what we need to decide here is whether or not to allow the specification proposal after the spec proposal freeze
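[editor's note: for readers new to the default-roles proposal discussed above, the spec proposes standard reader/member/admin roles where each role implies the ones below it. The sketch below is an illustrative stand-in for that hierarchy; keystone actually models this with "implied roles" in its database, and none of the names here come from the meeting itself.]

```python
# Illustrative stand-in for the reader/member/admin default-role hierarchy
# the spec proposes. Keystone implements this via "implied roles"; this
# sketch only shows the transitive expansion idea, not keystone's data model.

# role -> roles it directly implies (assumed hierarchy from the spec)
IMPLIED_ROLES = {
    "admin": {"member"},
    "member": {"reader"},
    "reader": set(),
}

def effective_roles(role):
    """Return the role plus everything it transitively implies."""
    seen = set()
    stack = [role]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(IMPLIED_ROLES.get(current, set()))
    return seen

# A token with "admin" effectively carries all three roles.
print(sorted(effective_roles("admin")))  # ['admin', 'member', 'reader']
```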
16:07:34 I am ok with this one post freeze (tentatively, based upon cmnty feedback on community goals) 16:07:45 This is pretty well-defined scope-wise. 16:08:06 we also proposed it to openstack-specs a long time ago, so technically it's not a "new" proposal 16:08:06 ++ 16:08:12 but... going through the formalities 16:08:29 dot the i's, cross the t's 16:09:01 keep in mind that before it can be accepted as a goal, we would need to see at least one project adopting it and general support for it across the other projects 16:09:31 the formalities are supposed to help us with planning work, moving the spec to a different repo isn't really changing anything for us 16:09:43 ++ cmurphy 16:09:52 we were planning on doing this work regardless i guess 16:09:59 Exactly. 16:10:14 cool - so no objections then, and it sounds like everyone is on the same page 16:10:22 i'll link to this discussion on the mailing list 16:10:32 and we can go through the review process for the specification 16:11:02 +1 thanks lbragstad 16:11:16 dhellmann: brings up a good point, because we'll have the opportunity to really make the template solid before proposing the community goal 16:11:22 so we could work in the system_scope stuff too 16:11:52 and hopefully provide a really clear picture of how the two concepts work together 16:12:33 #topic getting patrole in CI 16:12:56 hrybacki: was this your topic? 16:13:02 aye 16:13:22 Okay, so this goes along with the default roles work. tl;dr we need a way to test it and CI is slow, slow, slow to bring in new features 16:13:33 #link https://review.openstack.org/#/c/559137/ 16:13:53 and, now knowing that openstack-specs isn't really monitored, the original review in keystone-specs 16:13:55 #link https://review.openstack.org/#/c/464678/ 16:14:08 would this also be a good fit for a "community goal"? 16:14:20 gagehugo: yes.
I think that these two go hand-in-hand 16:14:24 have some role testing in multiple projects 16:14:37 and testing it out in Keystone/Barbican would be a great place to start and work out the kinks 16:14:44 felipemonteiro__ ^ 16:16:12 I'd like to see the original review picked back up, targeting a non-voting job against keystone changes. No way of knowing how long that will take tbh so getting the ball rolling now is preferable imo 16:16:46 adding a playbook to keystone is pretty simple now too 16:16:49 I don't see how this would pigeonhole us (I need to do more research on Patrole) but felipemonteiro__ is quite responsive 16:16:59 gagehugo: +1 16:17:50 I know that ayoung was opposed to the original keystone-spec when we were targeting RBAC in middleware work. I'm hoping he'd be willing to remove it now, however 16:18:21 lbragstad: as we are past spec proposal freeze I can see how this might not be ideal for Rocky but is it possible to work on a PoC in the meantime? 16:18:30 which one? 16:18:36 ayoung: https://review.openstack.org/#/c/464678/ 16:18:36 hrybacki: hi. i don't recall if i mentioned this in the spec, but i noticed keystone (particularly lbragstad) and others were committing only to openstack-specs, so i followed suit... i can backtrack as well and update the one in keystone-specs. 16:19:16 felipemonteiro__: ack, we thought that was the current standard but it's not. Community Goals are where it's at but require adoption and support across several projects first 16:19:21 yeah I don't think openstack-specs is actively reviewed 16:19:21 My issue with patrole is that I am afraid they are going to lock us in to the current, broken Bug 968696 stuff 16:19:23 bug 968696 in OpenStack Identity (keystone) ""admin"-ness not properly scoped" [High,In progress] https://launchpad.net/bugs/968696 - Assigned to Adam Young (ayoung) 16:19:38 I would be fine with it going in afterwards, or if there was a plan to mitigate 16:19:48 ayoung: how would it do that?
16:19:53 non-voting* 16:19:57 we could have a requirement to only add test cases for patrole once an API works in the new default roles and has scope_types set 16:19:58 I've not gotten a sense from the Patrole team that they take the problem seriously enough 16:20:37 TBH, if they care about policy, Patrole should be a distant second in effort 16:20:53 getting the changes in for scope should be full court press 16:21:32 felipemonteiro__, you are the Patrole Point of contact? 16:21:38 ayoung: folks are working on that. I'm wanting to work with the AT&T folks to lay out the legwork for Patrole. I'd like to have a testing strategy ready to go before submitting a Community Goal for the default roles work 16:23:05 I don't presume that goal will be accepted until T release 16:23:11 Patrole must come after new role work 16:23:28 ayoung: yes, as while others are active downstream, i work upstream too 16:23:31 I keep hearing that but I don't understand why 16:23:36 felipemonteiro__, why do you care? 16:23:43 We can't lock in on the broken role setup we have now. 16:23:54 felipemonteiro__, why did you build patrole? What was the impetus? 16:24:07 kmalloc: I don't get why it would do that 16:24:10 Gating on it locks us/wedges us deeper into where we are today, and that is seeing a lot of work to dig out of the hole 16:24:36 all the testing would require setup classes and infrastructure to assert scopes and things like that 16:24:37 i didn't build it, i helped contribute to it. i care because at&t cares about validating any rbac changes since our rbac customization is rather complex compared to most folks'.
16:24:38 non-voting jobs won't lock us in but they will give us a baseline to look back upon 16:25:09 felipemonteiro__, that is the problem 16:25:10 kmalloc I would hope we wouldn't make it voting until the broken roles are fixed 16:25:15 we need to change RBAC, radically 16:25:24 because how it exists today is broken 16:25:25 I'm not asking us to put in an entire suite of tests -- I just want to ensure that the infrastructure is there for when we are ready to do just that (after role work is in) 16:25:33 hrybacki: try changing something tempest is locked in on. It at least extends the timeframe by 2-3 cycles, and may mean we are even longer pushing out for generalized acceptance/gating of the new way 16:25:34 and if we test based on how it is today, we will not be able to fix it 16:25:39 and I am firm on this 16:25:56 patrole, without a fix for 968696 is dangerous 16:26:03 and I have been burnt by this pattern before 16:26:13 we had tests in tempest that assumed broken behavior 16:26:16 In short, it must come (anything that locks/gates/tests with voting) after new role work. 16:26:26 is there something preventing us from doing this incrementally? 16:26:34 and we could not fix in Keystone because then the tests would fail in tempest, and not pass gate etc 16:26:40 but how do we confirm role work is not broken without testing it? 16:26:54 also again non-voting** 16:27:02 hrybacki, we know it IS broken today 16:27:06 if we add the default roles to an API in keystone, we can add a test to patrole afterwards to assert the correct behavior along with unit tests, can we not? 16:27:08 lbragstad: current roles are so narrow, if we test + vote now, we aren't going to be able to move the needle 16:27:08 once we fix it, yes, patrole 16:27:10 not until 16:27:29 So, we should setup patrole to test with the new defaults out the gate.
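[editor's note: the "only add test cases once an API ... has scope_types set" requirement above refers to oslo.policy's ability to tag each policy rule with the token scopes it accepts. The sketch below is a minimal stdlib-only stand-in for that pattern, not oslo.policy itself; the rule names and role/scope pairings are assumptions for illustration.]

```python
# Minimal stand-in for the oslo.policy scope_types idea: each rule declares
# which roles may call it and under which token scope. This mimics the
# pattern discussed in the meeting; it is not keystone's real policy engine.

RULES = {
    # rule name -> (allowed roles, allowed scopes) -- hypothetical values
    "identity:list_users": ({"reader", "member", "admin"}, {"system"}),
    "identity:create_user": ({"admin"}, {"system"}),
}

def enforce(rule, roles, scope):
    """Return True if any of the caller's roles may invoke `rule` in `scope`."""
    allowed_roles, allowed_scopes = RULES[rule]
    return scope in allowed_scopes and bool(allowed_roles & set(roles))

print(enforce("identity:list_users", ["reader"], "system"))   # True
print(enforce("identity:create_user", ["reader"], "system"))  # False: role too weak
print(enforce("identity:list_users", ["admin"], "project"))   # False: wrong scope
```

A patrole-style test would assert exactly these outcomes against a live API, which is why the room wants the new defaults and scope_types in place first: otherwise the tests would encode today's broken behavior.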
16:27:36 felipemonteiro__, help out on the 968696 work and you will get my full support 16:27:40 Just not on the current broken RBAC setup. 16:27:49 ayoung: yeah you're right, we've had to seek out workarounds to said bug internally, including scoping adminness via sub-admin roles and the like, but as a result of which we had to have a means of validation. 16:27:52 do we have scope changes for all services? 16:28:00 kmalloc: that is precisely what I'm shooting to do 16:28:20 felipemonteiro__, here is what I would accept 16:28:24 i think we've all said the same thing, just different ways 16:28:31 a version of Patrole that fails due to bug 968696 16:28:33 bug 968696 in OpenStack Identity (keystone) ""admin"-ness not properly scoped" [High,In progress] https://launchpad.net/bugs/968696 - Assigned to Adam Young (ayoung) 16:28:41 and that passes when we show changes go in that fix it 16:28:49 hrybacki: good, what ayoung and I are saying is we absolutely don't want to test current state of the world. 16:28:59 We're on the same page. 16:29:36 okay cool -- so tl;dr ensure the spec clearly indicates the non-voting* gate will be in place and what it is testing against 16:29:51 hrybacki, ++ 16:29:52 hrybacki ++ 16:29:57 As long as we keep that in mind we are good. Standing patrole up for the new defaults is how we confirm we have fixed 968696 16:30:02 And don't regress. 16:30:13 this way we have infra in place, that won't interfere with fixes, and doesn't stick us into existing behaviors 16:30:19 ++ 16:30:22 Exactly. 16:30:24 excellent, thanks all :) 16:30:30 lbragstad: you're right 16:30:31 it sounds like the patrole spec can be collapsed into the testing section of the default roles one 16:30:31 lbragstad, do we have changes in place for, say cinder, neutron, glance, as well as nova?
16:30:49 ayoung: i've been working on patches for nova, but i haven't gotten everything done yet 16:30:59 lbragstad: I'm not sure I want to tightly couple them but don't have solid reasons to justify that intuition yet 16:31:02 waiting on the ksm patch and the oslo.context patch 16:31:13 keep them separate, but link them at the related specs section 16:31:26 hrybacki: ayoung ok - wfm 16:31:42 are we okay to un-abandon https://review.openstack.org/#/c/464678/ in this case? 16:31:59 ayoung: i can certainly try to assist in changing things w.r.t. things like system scoping but it's not my full time job, so any change that isn't forthcoming isn't because of a lack of agreement or consensus 16:32:00 I can work with felipemonteiro__ to bring it up-to-date given today's discussion 16:32:05 It's just a -1 16:32:30 hrybacki, TYVM 16:33:55 thanks for popping in felipemonteiro__ -- lbragstad that is all I have on that topic 16:34:02 cool 16:34:08 #topic multisite support 16:34:14 ayoung: i think this one was you?
16:34:17 hey this is me 16:34:25 hey zzzeek 16:34:39 i got alerted to a blueprint that ayoung is working on which is about something i've already been working on for months 16:34:55 #link https://review.openstack.org/#/c/566448/ 16:34:57 which is being able to have multiple overclouds coordinate against a single keystone database 16:35:19 we've already worked out a deployment plan for this as well as solved a few thorny issues in making it work with very minimal tripleo changes 16:35:22 the pattern we are seeing is that multi-site is not necessarily planned up front 16:35:54 instead, a large organization might have several openstack deployments, and they realize that they need to be run as multiple regions of a single deployment 16:36:02 right...so i call it a "stretch" cluster, where you take the two overclouds and then you add a galera database that is "stretched" over the two or more overclouds 16:36:10 #link https://github.com/zzzeek/stretch_cluster/tree/standard_tripleo_version 16:36:21 so that's a working POC 16:36:42 Sounds like something icky to setup, but a good approach in general. 16:36:43 For people not up-to-speed with Tripleo, overcloud is the term for the openstack cluster we care about 16:36:44 which does what it has to in order to make the two overclouds talk, so the plan is, make it nicer and make it not break with upgrades 16:36:57 there is an undercloud using ironic but those are site specific 16:37:17 right so here we are assuming: tripleo, pacemaker / HA, docker containers, galera 16:37:41 The only sticky point is divergence of keystone data, afaict. 16:37:51 zzzeek, and for the overcloud, we are supporting Keystone, nova, neutron, glance, swift, cinder, Sahara? 16:38:06 kmalloc, it is a major sticky point, yes 16:38:12 Between the two sites (assuming these are existing clouds) 16:38:21 ayoung: keystone is the only service that actually communicates over the two overclouds.
the rest of the services remain local to their overcloud 16:38:28 If it is strictly a new deployment, less icky. 16:38:31 so for the merging of data, O 16:38:46 I've been assuming that the overclouds are built up independently but plan to be merged immediately 16:39:03 upgrades should be able to use the Keystone zero-downtime upgrade mechanism, 16:39:06 so the keystone DBs are built up individually using distinct region names 16:39:14 That is less problematic to address. 16:39:22 role IDs should be synced 16:39:23 then the DBs are merged where all region-specific records are added 16:39:33 Long running clouds would be hard (tm) 16:39:43 at the moment I just use SQL scripts inside of an ansible playbook to do this 16:39:52 long running would require db conversion scripts, I would think 16:40:02 not to be lightly undertaken, but not impossible 16:40:05 ayoung: yes. And it gets very sticky. 16:40:12 just...don't try to do it while running the cloud 16:40:27 zzzeek: we could wrap that into a keystone-manage (vs ansible) 16:40:32 And I'd be ok with that. 16:40:34 i.e. stop keystone, run script, start keystone 16:40:36 ++ 16:40:59 kmalloc: yes it would be a keystone command 16:41:06 the biggest problem I would expect would be the Default domain 16:41:23 ayoung: id is "default" 16:41:26 everyone has one, and they have the same ID and Name, but then there are all the projects under 16:41:30 So, not a big issue. 16:41:47 all of the projects under it will conflict, tho 16:41:53 kmalloc: but additional work here that is outside of keystone involves deploying the galera cluster across datacenters. since ayoung's spec is against tripleo, that would be part of it right? 16:41:56 This is not long running clouds, so manageable.
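[editor's note: zzzeek's merge step above copies region-specific records from one keystone DB into the shared one. The sketch below illustrates that idea with sqlite3 so it runs anywhere; the real work targets Galera/MySQL and the actual keystone schema, and the simplified `region` table layout here is an assumption.]

```python
# Runnable sketch of the region-record merge described above, using sqlite3
# as a stand-in for the shared Galera cluster. The table layout is
# simplified; keystone's real region/endpoint tables have more columns.
import sqlite3

def merge_regions(conn, incoming):
    """Idempotently copy region rows from another deployment's keystone DB.

    INSERT OR IGNORE makes re-running the merge a no-op for rows already
    present (MySQL would use INSERT IGNORE or ON DUPLICATE KEY UPDATE).
    """
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO region (id, description) VALUES (?, ?)",
            incoming,
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE region (id TEXT PRIMARY KEY, description TEXT)")
conn.execute("INSERT INTO region VALUES ('regionOne', 'site A')")

rows = [("regionOne", "site A"), ("regionTwo", "site B")]
merge_regions(conn, rows)
merge_regions(conn, rows)  # second run changes nothing: idempotent

print(conn.execute("SELECT id FROM region ORDER BY id").fetchall())
# [('regionOne',), ('regionTwo',)]
```

This is also why idempotency comes up a few lines later: expressed this way, replaying the merge any number of times leaves the database in the same state.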
16:42:14 so if you have 2 clouds, and each has a Production project, in default domain, they will have same names, different IDs 16:42:24 right, 16:42:26 zzzeek: yes, with a spec/something for keystone to track against 16:42:48 for long running clouds, you would say "get everything off the default domain" 16:42:57 before you could even consider something like this 16:43:02 But our side is pretty minimal, if you want to move it out of ansible.. non-existent if ansible is a-ok to continue with. 16:43:19 kmalloc: the ansible thing i have right now is not production grade since it is hardcoding SQL in it 16:43:31 would that be something we could do in a migration? 16:43:36 is it a run once type thing? 16:43:43 ayoung: unlikely. 16:43:53 ayoung: not really, it doesn't change the structure of the database, it's about data coming into the DB 16:43:55 But we can explore it. 16:44:09 ayoung: it's like, here's a keystone DB, here's a bunch of new services etc. from a new region 16:44:17 that can happen any number of times 16:44:39 and it must be idempotent, right? 16:44:52 Yeah, I would expect it to be idempotent 16:45:15 lbragstad: well if it's a keystone-manage command, it's always nice if it is but i don't know it's 100% necessary 16:45:42 I think idempotent is a requirement. Risk of breaking things if it isn't is high 16:45:54 kmalloc: sure. so keystone might need to have some knowledge of this 16:46:00 And up-arrow+enter could ruin a cloud. 16:46:02 e.g. the keystone db 16:46:20 kmalloc: yes if you're guarding against arrow-enter then yes, you need some kind of record that shows something already happened 16:46:27 Yep. 16:46:45 so i didn't manage to get "domains" to do this, keystone seemed pretty intent on "domain" being "default" 16:46:51 it seemed like "regions" were the correct concept 16:46:53 And keystone-manage commands are mostly safe to run anytime(tm) 16:46:53 is that not correct ?
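[editor's note: the "record that shows something already happened" guard against up-arrow+enter can be sketched as follows. The marker table and command name are hypothetical; this is one way a keystone-manage-style command could protect itself, not an existing keystone feature.]

```python
# Sketch of a run-once guard for a hypothetical keystone-manage-style merge
# command: a marker row records that the command already completed, so
# accidentally re-running it (up-arrow+enter) is harmless.
import sqlite3

def run_once(conn, command, fn):
    """Run fn() only if `command` has not already been recorded as done."""
    done = conn.execute(
        "SELECT 1 FROM manage_marker WHERE command = ?", (command,)
    ).fetchone()
    if done:
        return False  # already ran; do nothing
    fn()
    with conn:
        conn.execute("INSERT INTO manage_marker (command) VALUES (?)", (command,))
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE manage_marker (command TEXT PRIMARY KEY)")

calls = []
print(run_once(conn, "merge_region_data", lambda: calls.append(1)))  # True
print(run_once(conn, "merge_region_data", lambda: calls.append(1)))  # False
print(len(calls))  # 1: the merge body only ever ran once
```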
16:47:23 so...from keystone-manage, we would want utilities to rename domains and regions. Anything else? 16:47:38 This sounds like regions to me. Domains are containers.. you'll need to replicate them, but that is fine. 16:47:51 kmalloc: here's the code right now: https://github.com/zzzeek/stretch_cluster/blob/standard_tripleo_version/roles/setup-keystone-db/tasks/main.yml#L29 16:48:00 kmalloc: which got me a basic nova hello world going 16:48:16 Cool. 16:48:33 kmalloc: there's a copy of each keystone DB on the shared cluster, keystone1 and keystone2, and it just copies region and endpoint in and that's it 16:48:35 Yeah that should ultimately be doable via keystone-manage. 16:48:42 Some of that is Tripleo specific, I think. If you install Keystone, you don't get any service catalog by default 16:48:44 kmalloc: seemed a little too simple 16:48:57 So, you're going to run into some issues with roles, projects, etc 16:49:18 kmalloc: so the assumption i made was that, you've just deployed the two overclouds assuming this case so they have the same passwords, the same projects and roles, etc. 16:49:19 ah...right...role IDs should be sync-able, too 16:49:27 Basically that doesn't solve authn/authz across the clouds consistently 16:49:34 and that will be an expensive database update 16:49:44 Project IDs are uuid4 generated 16:49:47 kmalloc: for our initial version we were going to keep it simple that the two overclouds can fold into each other 16:49:54 So, if any are created, IDs won't be the same. 16:49:56 projects should be left alone 16:50:17 the problem is if two clusters have the same project name in the same domain. New install, not an issue 16:50:20 Same with users, etc. Passwords are bcrypt, so salted... same but different hashed strings. 16:50:35 kmalloc: yeah i actually deployed the overclouds w/ identical passwords to solve that :) 16:50:36 can we sync the salt? 16:50:52 We would need to sync the hash. 16:51:07 Which covers the salt.
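[editor's note: kmalloc's point that syncing the hash "covers the salt" is because salted password hashes store the salt inside the stored string. The sketch below illustrates that with stdlib PBKDF2; keystone actually hashes passwords with bcrypt/scrypt via passlib, and the `salt$digest` format here is an assumption for illustration.]

```python
# Why syncing the full stored hash also syncs the salt: the salt is embedded
# in the stored string, so copying that string between keystone databases
# lets the same password verify on either cloud. PBKDF2 is used here as a
# stdlib-only stand-in for keystone's real bcrypt/scrypt hashing.
import hashlib
import os

def hash_password(password, salt=None):
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt.hex() + "$" + digest.hex()  # the salt travels with the hash

def verify(password, stored):
    salt_hex, _ = stored.split("$")
    return hash_password(password, bytes.fromhex(salt_hex)) == stored

stored_on_cloud_a = hash_password("hunter2")
# "Syncing the hash" = copying the whole stored string into cloud B's DB.
print(verify("hunter2", stored_on_cloud_a))  # True on either cloud
print(verify("wrong", stored_on_cloud_a))    # False
```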
16:51:33 We can't force a salt within keystone 16:52:14 i would say, pick a winner, the operator has to know the passwords are going to change if they aren't the same already 16:52:18 zzzeek: so, you will see drift of data between the keystones if we're only galera-ing the regions/catalog. 16:52:44 kmalloc: right. initial use case is that you have a brand new keystone DB in both datacenters 16:52:44 Right, you should setup a "master cloud" that populates the new cloud 16:53:07 And overwrites things as needed. 16:53:28 so, I think we need a spec for the keystone-manage changes we want to support this, right? 16:53:38 Yeah, I think we do 16:53:58 This isn't a bad concept, or even anything outlandish 16:54:02 OK so from my perspective, I'm looking at the full view, which is, you have two tripleo overclouds, you have deployed such that keystones are talking to a separate galera DB that is local to their environment, the "stretch" operation then pulls those galeras into a single cross-datacenter cluster and the keystone data is merged 16:54:06 Just need to have it clearly written. 16:54:12 we'll need a proposal exception as well 16:54:15 I did start a tripleo spec, but I think I want a separate one for the Keystone options 16:54:47 also since nobody asked, the big trick here is that I modified the pacemaker resource agent to allow multiple pacemakers to coordinate a single galera cluster 16:54:51 zzzeek: that sounds reasonable. Then pivot the keystone to read/write from galera 16:54:54 I'll submit a placeholder 16:55:10 Once the new db is in place. 16:55:30 just a heads up, we're at 5 minutes remaining 16:55:37 ayoung: this may need to be a "utility" separate from keystone-manage 16:55:44 what also is nice is that the keystone services never change the DB they talk to, it's a galera cluster that just suddenly has new nodes as part of it 16:56:27 ayoung: let's get a spec and go from there.
Make sure to add clear use-case descriptions 16:56:30 kmalloc, if so, we'll get that in the spec 16:56:50 So we don't need to refer to the meeting log to remember it all :) 16:58:33 i look forward to reading the spec, it'll be clearer to me on paper i think 16:58:43 #action ayoung to write up spec for stretch overcloud with zzzeek 16:59:16 any other comments on this topic? 17:00:00 thanks zzzeek 17:00:07 #topic open discussion 17:00:12 last minute - 17:00:26 wxy|: has a new version of the unified limit stuff up 17:00:29 and it looks really good 17:00:36 everyone here should go read 17:00:38 :) 17:00:40 #link https://review.openstack.org/#/c/540803/10 17:00:54 lemme leave a tab open for it 17:01:03 * hrybacki same 17:01:06 thanks for the time everyone! 17:01:10 #endmeeting