18:00:32 #startmeeting keystone
18:00:33 Meeting started Tue Aug 30 18:00:32 2016 UTC and is due to finish in 60 minutes. The chair is stevemar. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:34 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:00:36 The meeting name has been set to 'keystone'
18:00:40 heyo
18:00:50 breton: it's 16 weeks
18:00:58 o/
18:00:59 o/
18:01:06 o/
18:01:09 o\
18:01:10 hi
18:01:41 o/
18:02:07 short agenda again!
18:02:07 https://etherpad.openstack.org/p/keystone-weekly-meeting
18:02:23 stevemar: thanks, sounds good
18:02:42 o/
18:03:03 if anyone has anything to add to the agenda, do so now
18:03:09 or wait til open discussion
18:03:45 ping list?
18:03:56 ayoung: yes sir!
18:03:58 ping ajayaa, amakarov, ayoung, breton, browne, crinkle, claudiub, davechen, david8hu, dolphm, dstanek, edmondsw, gagehugo, gyee, henrynash, hogepodge, htruta, jamielennox, jaugustine, joesavak, jorge_munoz, knikolla, lbragstad, MaxPC, morgan, nkinder, notmorgan, raildo, rodrigods, rderose, roxanaghe, samleon, samueldmq, shaleh, stevemar, tsymanczyk, topol, vivekd, wanghong, xek, nishaYadav
18:04:05 o/
18:04:20 o/
18:04:34 hi all
18:04:38 o/
18:04:53 o/
18:05:02 #topic milestone-3 status
18:05:03 \o
18:05:46 dstanek's fix for caching merged, that was a huge win
18:06:10 does anyone know if it also fixes https://bugs.launchpad.net/bugs/1600393 and https://bugs.launchpad.net/bugs/1600394 ?
18:06:10 Launchpad bug 1600393 in OpenStack Identity (keystone) "v2.0 catalog seen in v3 token" [Critical,Confirmed] - Assigned to David Stanek (dstanek)
18:06:10 dstanek: now how about backporting it? ))
18:06:10 stevemar: you mean caching works now :-P
18:06:12 Launchpad bug 1600394 in OpenStack Identity (keystone) "memcache raising "too many values to unpack"" [Critical,Confirmed] - Assigned to David Stanek (dstanek)
18:06:16 hehe
18:06:37 amakarov: sure, i can do that after this last pass at lbragstad's review
18:06:59 breton: i think you mentioned it also solved one of those bugs?
18:07:35 i suppose we could mark them as incomplete and ask the originator to check with the master branch if it's still an issue
18:07:35 I don't see how this would fix either of those bugs.
18:07:59 \o
18:08:00 the problems there aren't because the cache wasn't flushed correctly
18:09:17 bknudson: for both bugs or are you referring to one in particular?
18:09:43 bknudson: ++, those do not seem related to me either
18:09:43 stevemar: is my oslo.cache solution still required? Does somebody else use memcache besides keystone?
18:10:03 stevemar: dstanek: has anybody else managed to reproduce those bugs?
18:10:05 stevemar: both of those bugs.
18:10:14 * notmorgan is lurking at best
18:10:15 although I've never seen the v2.0 catalog seen in v3 token issue
18:10:49 amakarov: apparently there are other users of oslo.cache besides keystone.
18:10:50 notmorgan: do you know what projects use oslo.cache too?
18:10:54 bknudson: if flushing the cache solves the issue, then it sounds like we don't need a keystone fix?
18:11:11 amakarov: none in the way we do. most use the in-mem old compat memcache dict thing
18:11:43 i don't think my fix could possibly fix some of the weird bugs we are seeing
18:12:06 stevemar: flushing the cache won't fix the "too many values to unpack" bug.
18:12:17 bknudson: that is the weirdest bug
18:12:30 because the error comes from the memcache lib
18:12:36 so weird.
18:12:38 notmorgan: still looking at it.
I'm going to add some tracing to python-memcached
18:12:42 dstanek: you'd become a silver bullet author in that case ))
18:12:50 bknudson: ++
18:13:07 notmorgan: should be able to see exactly what it's reading from the memcache socket
18:13:09 amakarov: one could only wish
18:13:20 well at any rate, it sounds like neither will be solved for milestone 3
18:13:35 i couldn't replicate the others in my local environment to debug
18:13:42 considering it will be tagged tomorrow...
18:14:10 stevemar: we can tag and work in a fix -> backport
18:14:19 yep
18:14:32 okay, i'll bump those 2
18:15:14 notmorgan: bknudson what about https://bugs.launchpad.net/bugs/1609566 and https://review.openstack.org/#/c/358872/ ?
18:15:14 Launchpad bug 1609566 in OpenStack Identity (keystone) "500 error from revocation event deserialize" [Medium,In progress] - Assigned to Morgan Fainberg (mdrnstm)
18:15:23 I think those are all the same bug
18:15:33 bknudson: nope.
18:15:35 same cause
18:15:41 bknudson: not the deserialize one
18:15:47 yep, 1609566 is unrelated
18:15:56 right
18:16:01 that one is actually a different issue, looks like it's getting non-strings back out
18:16:14 dstanek: you had a -1 on it
18:16:17 stevemar: a rebase should be all that patch needed, since it's partial-bug
18:16:52 there is longer term stuff to be done there but it will squash the issue more directly and give us feedback
18:16:58 stevemar: which one?
18:17:05 dstanek: https://review.openstack.org/#/c/358872/
18:17:42 i don't think we can only serialize the keys we want atm, no guarantee that is correct. if anything we also need the filter on the deserialize+logging
18:17:48 ah, yeah. i don't see why we store it in memcache if we can't serialize it. i'd rather fix it on the flip side
18:18:24 we're serializing __dict__ which is bad
18:18:56 dstanek: what worries me is that this suddenly started being an issue, I don't know what refactor broke it.
18:19:09 rebased it
18:19:10 notmorgan: is it reproducible?
18:19:28 Speaking of 1609566, we've run into an issue when the revocation tree grows larger than 1M and gets declined by memcache. It doesn't produce an error though - just returns NO_VALUE on the next GET operation
18:19:30 dstanek: it has been in isolated environments.
18:19:39 dstanek: it's just another weird edge case
18:19:41 I can probably recreate https://bugs.launchpad.net/bugs/1609566
18:19:41 Launchpad bug 1609566 in OpenStack Identity (keystone) "500 error from revocation event deserialize" [Medium,In progress] - Assigned to Steve Martinelli (stevemar)
18:19:42 amakarov: welcome to memcache
18:19:54 but I get other errors too when I try to recreate anything
18:20:06 amakarov: that has been a long running issue with using memcache and lists of objects
18:20:26 amakarov: yep, that's the expected behavior :-(
18:20:27 amakarov: if you're hitting that, restart memcache with a larger slab size =/
18:20:52 notmorgan: yes, we had to do that
18:20:55 is someone working on the issue with querying revocation events?
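Before the discussion moves on to the revocation-event query, the caching thread above is easier to follow with the underlying pattern in view. Below is a minimal sketch of the dogpile.cache usage that keystone builds on through oslo.cache; the backend URL, function name, and key are illustrative, not keystone's actual code.

    # A minimal sketch of dogpile.cache with a memcached backend; the
    # backend URL, function name, and key are illustrative only.
    from dogpile.cache import make_region

    region = make_region().configure(
        'dogpile.cache.memcached',
        expiration_time=600,                  # entries go stale after 10 minutes
        arguments={'url': '127.0.0.1:11211'},
    )

    def _expensive_lookup(token_id):
        ...  # stand-in for the real database/backend work

    @region.cache_on_arguments()
    def get_token_data(token_id):
        # First call runs _expensive_lookup; later calls are served from cache.
        return _expensive_lookup(token_id)

    # On revocation or delete, the cached entry must be dropped explicitly,
    # otherwise a stale entry keeps being served until it expires:
    get_token_data.invalidate('some-token-id')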
18:21:07 bknudson ravelar is working on that a bit
18:21:08 I think there's a review in progress but it's a POC
18:21:15 notmorgan: amakarov: but first do some performance analysis to make sure the change doesn't negatively impact what you can cache
18:21:16 specifically on optimizing the sql query
18:21:36 I think it would help everyone if that was completed
18:21:37 also I think we're doing something wrong if we end up tossing cicterns of data here and there
18:21:43 dstanek: larger than 1M slab size is a horrible thing to do to memcache and will absolutely make for bad performance (relative)
18:21:48 s/cicterns/cisterns/
18:22:00 speaking of ravelar1
18:22:02 there he is
18:22:09 lbragstad: has ravelar posted anything already?
18:22:11 amakarov: this is really a known issue.
18:22:26 samueldmq: i believe there is a wip
18:22:27 ravelar1 I believe you have a review up for your work on the revocation event sql query, right?
18:22:33 on my system I see huge slowdowns with only a few revocation events.
18:22:35 amakarov: and one of the only places we are storing a list of elements in cache.
18:22:42 dstanek: so you'll still be -1 on this? rather fix it than ignore the errors? (mind you it's just a partial fix)
18:22:48 amakarov: there is a reason we don't cache other "list" methods
18:22:57 samueldmq: btw my token pre-cache patch is really useful in that case ^^
18:23:00 and in our staging envs we're getting a lot of complaints about the revocation event slowdown.
18:23:06 bknudson same here... i notice performance degradation after just a few tempest runs (~1500 events)
18:23:29 stevemar: i'd have to go over it again at +1 as a temp fix. my concern is that we could throw out an important key
18:23:31 i hate to admit it, but the horrible tree code at least wasn't broken all over the place
18:23:37 i'd have to look to see if that is possible
18:23:57 even to build the tree you need to fetch every row
18:24:17 dstanek: we could, it is better to fail securely and reject more tokens than to pass invalid tokens (if we maintain revocations as a thing)
18:24:25 bknudson: what's the story with the in-database revocation check?
18:24:25 bknudson: yes. i mean the associated bugs.
18:24:26 seems like we should be able to take advantage of the "since" part of the query (if we kept the list in mem)
18:24:27 ayoung: ^
18:24:30 There are a few issues I have been running into with trying to change the current is_revoked and matches to query but they mainly stem from tox tests that solely test based on premade event dictionaries
18:24:31 i don't understand how we get a non-string key to begin with
18:24:48 Kill it.
18:24:58 dstanek: something in one of the refactors to revocations and serialization broke it.
18:25:09 dstanek: and i can't point a finger at why/where.
18:25:41 here's the review: https://review.openstack.org/#/c/359371/
18:25:49 dstanek: i've now spent days working through the code. so in short, i just opted for something that breaks less but makes a lot of noise so we get feedback
18:26:05 for making the revocation check an sql query
18:26:17 bknudson: ++ i would like to see that if we keep revocations.
18:26:29 Clint Byrum was interested. Wanted to see indexes on the columns.
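For context on that index request, here is a hedged SQLAlchemy sketch of the kind of schema the discussion implies: an index on the event timestamp so the "events since X" query can avoid a full table scan. The column set is trimmed and illustrative, not keystone's actual model.

    # Hedged sketch: an indexed timestamp column plus the "since" query
    # mentioned above. Table and column names are illustrative.
    import datetime

    from sqlalchemy import Column, DateTime, Integer, String, create_engine
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker

    Base = declarative_base()

    class RevocationEvent(Base):
        __tablename__ = 'revocation_event'
        id = Column(Integer, primary_key=True)
        user_id = Column(String(64))
        revoked_at = Column(DateTime, nullable=False, index=True)  # the index in question

    engine = create_engine('sqlite://')  # in-memory DB just for the sketch
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()

    # Fetch only events newer than the last poll instead of every row:
    since = datetime.datetime.utcnow() - datetime.timedelta(minutes=5)
    recent = (session.query(RevocationEvent)
              .filter(RevocationEvent.revoked_at >= since)
              .all())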
18:26:38 and he is correct
18:26:42 we should index the columns
18:26:49 we should drop the whole damn thing
18:26:50 i think i said the same thing early on in these discussions
18:26:56 y, otherwise you do a table scan anyways
18:27:07 sounds like this bug can make the cut, at least the partial fix can
18:27:15 is there any evidence that the issue is the python code?
18:27:16 is SpamapS around?
18:27:17 Newton Release Note: Tokens can no longer be revoked.
18:27:18 Done.
18:27:19 ayoung: convince stevemar, i'm fine with it but convince stevemar
18:27:24 ayoung: since he's PTL
18:27:40 i think the ops community will be in an uproar
18:28:00 not the right time, i just want to get newton-3 done
18:28:01 because there is no way to lock out tokens without a timeout in some cases.
18:28:02 yes! 5 minute tokens ftw!
18:28:05 notmorgan: the only thing i can think of as being a possible scenario is somehow __dict__ is getting directly manipulated
18:28:36 dstanek: possibly, i am actually thinking of making the rev event not an object anymore
18:28:42 dstanek: and make it a primitive if we keep it
18:28:45 lbragstad: I'm around. what's up?
18:28:53 dstanek: then we drop the custom msgpack serializer
18:28:59 dstanek: trivial
18:28:59 notmorgan, ?
18:29:01 SpamapS you were looking to have a discussion around indexes here - https://review.openstack.org/#/c/359371/ ?
18:29:01 if you want to set up a dev environment where you can reproduce these problems I can provide instructions.
18:29:10 lbragstad: I'll look right now.
18:29:19 revocation as a primitive...?
18:29:22 SpamapS we have ravelar1 here, too
18:29:27 ayoung: make it a dict (eeeeuuwwW) and it'll be less janky serializing [if we don't drop it]
18:29:29 Oh cool
18:29:32 notmorgan, ++
18:29:36 ok I already looked at that yesterday, yay. :)
18:29:42 SpamapS: yeah we were saying we agree, indexes
18:29:45 notmorgan, let's make it a dict. Nothing wrong with that.
18:29:45 SpamapS: :)
18:29:57 I just want to see an assertion of which existing index, or new index, will service the query.
18:30:02 ayoung: ok i'll roll that today then. but it'll be a backport for RC
18:30:06 it is just a data transfer object. If we say "any fields but these are ignored" we should be OK
18:30:09 or wait
18:30:11 uh
18:30:13 whatever
18:30:16 it'll be post n-3
18:30:19 I trust mysql's optimizer to sometimes pick better ones than we _think_ will be used.
18:30:24 and "all fields must be serializable"
18:30:26 notmorgan: yes, that's the one ;)
18:30:33 But we should at least think about which ones we think should make it go fast.
18:30:34 bknudson: yes, please
18:30:41 SpamapS: oh i 100% agree here.
18:31:08 can i move on to the blueprints?
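To make the "just make it a dict" idea settled above concrete (and to anticipate the datetime caveat raised next), a hedged sketch: with string keys only and datetimes converted explicitly, a plain dict round-trips through msgpack without a custom serializer for the event object itself. Field names are illustrative.

    # Hedged sketch of a revocation event as a primitive dict: string keys
    # only, datetime encoded as ISO 8601 text. Field names are illustrative.
    import datetime

    import msgpack

    def _default(obj):
        # msgpack has no generic datetime handling; encode it as text.
        if isinstance(obj, datetime.datetime):
            return obj.isoformat()
        raise TypeError('cannot serialize %r' % (obj,))

    event = {
        'user_id': 'abc123',
        'revoked_at': datetime.datetime.utcnow(),
    }

    packed = msgpack.packb(event, default=_default, use_bin_type=True)
    unpacked = msgpack.unpackb(packed, raw=False)
    # unpacked == {'user_id': 'abc123', 'revoked_at': '2016-08-30T18:30:00...'}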
18:31:14 ayoung: we won't be able to drop msgpack because datetime doesn't serialize the way we need it in json
18:31:23 dstanek: here's the instructions: https://review.portbleu.com/gitweb?p=pyrrrat-ci.git;a=blob;f=docs/deploy-vagrant.rst;h=0cbe2a128c17b57753b168cc5990512712e834f9;hb=HEAD
18:31:24 ayoung: but we can just make rev event a dict and call it a day
18:31:39 notmorgan, if that works, I am fine with it
18:32:14 ravelar1 + bknudson to fix revocation :)
18:32:25 tempted to #action that one
18:33:04 #topic blueprints still tagged for newton-3
18:33:34 bknudson: thx
18:33:35 we've got https://blueprints.launchpad.net/keystone/+spec/credential-encryption and https://blueprints.launchpad.net/keystone/+spec/pre-cache-tokens still tagged for newton-3
18:33:53 precache tokens first for amakarov
18:34:27 the spec was approved, the code is up https://review.openstack.org/#/c/309146/
18:34:44 dstanek: you're -1 on it, and so is haneef
18:35:18 after talking to amakarov i think i'll take the -1 away
18:35:18 notmorgan: opinion on this one? your views are always appreciated
18:35:22 notmorgan: which column is consistent enough to index?
18:35:26 samueldmq, dstanek, haneef: In the discussion above the point was that validation gets slow if the revocation tree grows large
18:35:41 that's the value of the patch
18:35:48 (sneaks in the back door)
18:35:51 i wasn't completely sure of the overall value since i reuse tokens. this would only really help those that never reuse tokens
18:36:11 * dstanek welcomes henrynash to the party
18:36:21 back from bank holiday
18:36:22 henrynash: you carried an entire door with you?
18:36:40 stevemar, it was a doggy door.
18:36:51 a man's gotta do what a man's gotta do
18:36:57 amakarov: my biggest concern is the extra invalidations. not sure what the impact would be for those
18:37:31 dstanek: we cache tokens in KMW anyway
18:38:37 dstanek: oh, didn't get your point: you mean changes that cause the cache region to die
18:38:50 amakarov: but you're pre-caching in keystone so i'm wondering how often one of those operations happens that invalidates *all* the tokens
18:38:55 what if there was a config option to enable pre-caching of tokens?
18:39:19 bknudson: would that make you sleep better at night?
18:39:21 then if you think it'll help you then go ahead and enable it
18:39:24 bknudson: 'caching=true' ))
18:39:42 bknudson: options are usually a good thing
18:39:45 amakarov: i don't think bknudson's ask is too much
18:39:56 amakarov: think you can whip it up?
18:40:07 stevemar: I have no problem with making that configurable
18:40:27 amakarov: is there any way to say what the negative impact might be from real production cloud data?
18:40:33 stevemar: iiuc the question is the value of the whole thing
18:41:07 you can propose removing the config option next release
18:41:11 dstanek: our scale team agreed this won't do any harm
18:41:41 amakarov: so not many of those actions actually happen?
18:41:44 I suspect it will be fine to do and we'll enable it but I'd want to see the performance results first.
18:42:29 #action amakarov to make precaching configurable
18:42:32 bknudson: shall we discuss a test case you'd like to see?
18:42:48 out of the meeting probably
18:43:21 amakarov: what happens when a trust is deleted for instance
18:43:25 dstanek: approximately as many as those causing token revocations
18:43:51 dstanek: ..it happens :) regions get invalidated
18:43:57 I'm not going to have the time to try it myself until I get through a bunch of other issues.
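A hedged sketch of the #action item above: gating pre-caching behind a configuration option. The option name, group, and helper function here are hypothetical, not what actually merged.

    # Hypothetical oslo.config option gating token pre-caching; the option
    # name ('precache_tokens'), its group, and the _cache_token helper are
    # made up for illustration.
    from oslo_config import cfg

    CONF = cfg.CONF

    CONF.register_opts([
        cfg.BoolOpt('precache_tokens',
                    default=False,
                    help='If true, write issued tokens to the cache right '
                         'away so the first validation is a cache hit. '
                         'Deployments that rarely reuse tokens mostly see '
                         'extra cache churn from this.'),
    ], group='token')

    def _cache_token(token):
        ...  # stand-in for writing the token to the cache region

    def issue_token(token):
        if CONF.token.precache_tokens:
            _cache_token(token)
        return token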
18:44:19 nobody here is complaining about this since we're hitting the issue with revocation events
18:44:30 on to credential encryption...
18:44:38 #link https://review.openstack.org/#/c/355618/
18:44:42 lbragstad: https://blueprints.launchpad.net/keystone/+spec/credential-encryption
18:44:45 USE TLS
18:44:54 And X509 auth
18:45:00 ^ that change is dependent on a couple of devstack changes
18:45:07 (which are linked in the commit message)
18:45:15 ayoung: it's more so admins can't do a database dump and look at all your stuff
18:45:17 but it's passing - and I'm working on a migration guide now
18:45:35 I'm about half done and expect to have it completed by the end of the day
18:45:46 amakarov: with my patch merged we'll actually start invalidating regions for real, which is part of my concern
18:45:49 nothing formal, just a gist showing what the upgrade looks like, along with the new upgrade process
18:45:51 stevemar, Huh? I thought it was for config files.
18:45:58 ayoung: nope
18:46:06 but - with that, I think the biggest hangup is now on triggers...
18:46:10 ayoung: it's for the credentials backend
18:46:16 dstanek: your patch will save a lot of headache!
18:46:17 ah
18:46:23 ok...
18:46:33 * ayoung goes back to LDAP land
18:46:41 hehe
18:47:16 I'm working on the guide here - https://gist.github.com/lbragstad/ddfb10f9f9048414d1f781ba006e95d1
18:47:20 #link https://gist.github.com/lbragstad/ddfb10f9f9048414d1f781ba006e95d1
18:47:41 which right now is pretty basic, I'm adding the credential specific stuff in a bit
18:48:02 I'll be sure to drop it into -keystone when it's finished, and link it in the implementation review
18:49:04 lbragstad: still seems like a lot of moving parts to land before tomorrow EOD
18:49:23 lbragstad: must it land in newton?
18:50:04 stevemar well - it was a requirement for the mfa stuff
18:50:24 true
18:52:06 we could make the MFA work dependent on credential encryption (in O)
18:52:31 lbragstad: i won't bump it for now, but i won't cry if it doesn't make the cut :)
18:52:31 stevemar i think we need to have the trigger discussion
18:52:48 #topic sql triggers
18:52:55 because that is going to be a big factor in whether this lands.
18:53:12 stevemar I doubt I'll have the time to rework the implementation to not use triggers by tomorrow
18:53:40 mailing list for the record: http://lists.openstack.org/pipermail/openstack-dev/2016-August/102293.html
18:53:55 lbragstad: i thought bayer said triggers were OK to use in this instance
18:54:08 stevemar agree - he did
18:54:16 lbragstad: so what's the issue?
18:54:21 stevemar it sounds like others still have issues with it though
18:54:22 i say we use them
18:54:35 ok
18:54:45 me too :) ++
18:54:48 stevemar: ++ (even though we hopefully don't need them for the created_at fix)
18:55:09 stevemar in that case - i think the current implementation works well
18:55:30 we test the triggers against MySQL and postgresql
18:56:10 lbragstad: ...and I think we are saying... we don't support sqlite for upgrades...
18:56:22 henrynash: yes
18:56:32 henrynash yeah - zzzeek was on board with that too
18:56:36 i doubt we've ever supported that
18:57:04 stevemar: well, we have to be a bit careful... you do an offline "upgrade" just to stand up a new version of openstack
18:57:15 I think I got disconnected
18:57:20 stevemar: i.e. we do a db_sync and run through all the migrations
18:57:41 lbragstad: henrynash are there special cases for sqlite too?
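Since the trigger decision carries over into the exchange below, a hedged sketch of the pattern being debated: a database-side trigger installed by a migration so the old and new columns stay in sync during a rolling upgrade. The table, columns, and MySQL-only syntax are illustrative, not keystone's actual migration.

    # Hedged sketch of a migration-installed trigger (sqlalchemy-migrate
    # style upgrade() signature); table/column names are illustrative and
    # the trigger syntax shown is MySQL's. There is deliberately no sqlite
    # branch, matching the "we don't support sqlite for upgrades" position.
    import sqlalchemy as sa

    MYSQL_SYNC_TRIGGER = sa.DDL("""
        CREATE TRIGGER credential_insert_sync BEFORE INSERT ON credential
        FOR EACH ROW
        BEGIN
            SET NEW.encrypted_blob = NEW.blob;
        END;
    """)

    def upgrade(migrate_engine):
        # Old code keeps writing the legacy column; the trigger mirrors the
        # value into the new column until the contract phase drops it.
        if migrate_engine.name == 'mysql':
            migrate_engine.execute(MYSQL_SYNC_TRIGGER)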
18:58:07 lbragstad: bumping credential encryption also means bumping the triggers decision to O :)
18:58:16 since we won't have any
18:58:32 stevemar that is ultimately bumping R/W upgrades as well
18:58:36 stevemar: that's gotta work... or nobody can use sqlite at all (which we would secretly like, but not sure we can legislate against immediately)
18:58:43 stevemar: seems a sane decision
18:58:57 any reason we should rush with that?
18:59:02 lbragstad: I really object to that... I would rather have RW upgrades than mfa
18:59:26 stevemar we also currently support TOTP
18:59:35 which leverages the credential backend
18:59:39 lbragstad: and those aren't encrypted either
18:59:45 from a security perspective - they aren't encrypted
18:59:46 right
19:00:02 lbragstad: the need for keystone no-downtime upgrades must, imho, come before mfa (if the mfa solution isn't ready for RW upgrades)
19:00:10 let's jump to -keystone
19:00:14 #endmeeting