16:00:10 #startmeeting Cinder
16:00:11 Meeting started Wed Feb 1 16:00:10 2017 UTC and is due to finish in 60 minutes. The chair is smcginnis. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:12 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:14 The meeting name has been set to 'cinder'
16:00:19 hi
16:00:23 o/
16:00:24 o/
16:00:33 ping dulek duncant eharney geguileo winston-d e0ne jungleboyj jgriffith thingee smcginnis hemna xyang1 tbarron scottda erlon rhedlind jbernard _alastor_ bluex karthikp_ patrickeast dongwenjuan JaniceLee cFouts Thelo vivekd adrianofr mtanino yuriy_n17 karlamrhein diablo_rojo jay.xu jgregor baumann rajinir wilson-l reduxio wanghao thrawn01 chris_morrell stevemar watanabe.isao, tommylikehu
16:00:39 mdovgal ildikov wxy viks ketonne
16:00:40 hey
16:00:45 yough
16:00:45 Hi!
16:00:46 hi
16:00:47 hi
16:00:48 * bswartz wanders into the room
16:01:02 Hey
16:01:20 o/
16:01:37 #topic Announcements
16:02:01 hi
16:02:05 RC-1 is tomorrow. We should try to wrap up any important bugs today if at all possible.
16:02:25 After RC-1 we will really need to limit what, if anything, we allow in.
16:02:35 Critical only, ideally.
16:02:36 hi
16:03:16 I need to start reworking the focus etherpad to start prepping for Pike, so I won't link that here for now.
16:03:22 smcginnis: master opens for Pike once RC-1 is tagged, right?
16:03:54 dulek: Correct. I need to branch stable/ocata when we tag RC-1. So once that is done, that means master will now be Pike.
16:04:24 So just hold off on anything until you see that stable/ocata is actually created. ;)
16:04:42 Hi
16:04:45 #link http://www.openstack.org/ptg PTG info and registration
16:04:57 hi
16:04:58 hi
16:05:03 I think it was mentioned there are only 18 spots left for the PTG as of this morning.
16:05:26 If you've been waiting on that, better get moving.
16:05:47 #link https://etherpad.openstack.org/p/ATL-cinder-ptg-planning PTG topic planning
16:05:58 * dulek finally got his flight tickets for PTG. Wheee! :)
16:06:00 Add any ideas to scottda's list of topics. ^^
16:06:04 dulek: Awesome!
16:06:05 ha
16:06:09 :)
16:06:41 I'll try to start arranging proposed topics into a logical-ish order soon so we can have a list to work through at the PTG.
16:07:17 #link https://www.openstack.org/summit/boston-2017/call-for-presentations/ Summit CFP
16:07:35 Less than a week left to submit talk proposals for the Summit.
16:09:02 I guess one final announcement item - PTL nomination period closed and I was the only one. So y'all stuck with me again. :)
16:09:24 Congratulations, Great Leader!
16:09:33 Hah, thanks.
16:09:41 #topic Consistent Versioned Endpoints in the Service Catalog
16:09:49 scottda: OK, all yours.
16:10:08 So, we'll be talking about this Mon or Tues at the PTG
16:10:21 Ops have complained about the service catalog being inconsistent...
16:10:28 and cinder is a good example.
16:10:43 I.e. that we have 'volume', 'volumev2', 'volumev3'
16:10:50 #link http://lists.openstack.org/pipermail/openstack-dev/2017-January/110043.html
16:10:50 smcginnis, Thanks for continuing to be our fearless leader!
16:11:18 I'm not sure anything can be done to change things. That might break a lot of code and scripts.
16:11:39 But we'll be talking about it. So either attend, or let me know opinions so I can represent them.
16:11:54 scottda: Would be great if we could just have 'volume' and figure out from there where to go.
16:12:02 smcginnis: +10000
16:12:08 smcginnis: +1
16:12:10 Yes, volume could be the naked endpoint:
16:12:19 http://<host>:<port>/
16:12:26 and you get version info from there.
16:12:34 do we have working version discovery in cinderclient?
16:12:37 And I think that ops and API WG folks like that idea.
16:12:37 smcginnis: +1
16:12:50 e0ne: yes
16:13:01 scottda: cool
16:13:04 both CLI and a static method that takes a URL and no auth
16:13:24 #link https://review.openstack.org/#/c/420119
16:13:46 But, I'm sure there will be debate about how to change things, deprecation, etc.
16:13:58 Please attend, or voice strong opinions somehow.
16:14:21 Otherwise, they'll get my opinion.
16:14:27 That's it for this topic.
16:14:53 scottda: OK, thanks!
16:15:06 I think if we can work through migration issues, it would get us in a better place.
16:15:11 scottda: I'll be there and full of opinions
16:15:25 bswartz: Good to hear. Maybe after the meeting we can chat.
16:15:47 I'd love to revisit removing v1 at some point too. But that's another big discussion for another time.
16:16:03 #topic Storing cinder version info and client caching in nova
16:16:06 smcginnis: Yes, but we'll likely touch on that...
16:16:12 OK, next topic...
16:16:18 #link https://review.openstack.org/#/c/420201/
16:16:42 I've a POC for how Nova can get cinder server (and client) info, and use it for new APIs for attach/detach
16:17:10 One question is: Do we store that info, or get it dynamically for each call to the cinderclient?
16:17:32 Another question is: Do we really need to instantiate a client for each call from nova -> cinder?
16:17:40 We currently do this 4 times for a volume attach.
16:17:43 scottda, where would the client even store it?
16:17:57 scottda: Seems inefficient.
16:18:00 hemna: See my patch. It's clunky, but uses a global
16:18:14 stored in nova/cinder.py, not on the client
16:18:17 between invocations?
16:18:22 nova gets the version info, and puts it into a global
16:18:31 oh. ew ok.
16:18:32 Once that is set, Nova can use that in manager.py code
16:18:34 scottda, Wow, 4 times seems excessive.
16:18:50 Issue there would be what if the service gets upgraded. So Nova would have to get restarted to recognize Cinder changed.
16:18:59 Other choice is to query the cinder server for each nova call that goes through cinderclient to cinder...
16:19:07 instantiating an object should be trivially cheap though -- what is the expensive part?
16:19:13 smcginnis: Yes, that's the issue. But how big of an issue is it?
16:19:31 bswartz: Expensive would be pinging the cinder server each call to get version info
16:19:38 why would we do that?
16:19:59 just send the API request with the highest version you know...
16:20:05 bswartz: https://review.openstack.org/#/c/420201/7/nova/compute/manager.py
16:20:16 If the Cinder microversion was reported in the service catalog, it would be better - we're calling Keystone anyway.
16:20:20 because we've new APIs for nova to use with cinder for volume attach...
16:20:25 this is the whole point I was trying to make about version APIs
16:20:31 But nova code won't know at run time which version of cinder exists.
16:20:42 you never query the server for versions until after something has failed
16:20:58 you send the request, expect success, and deal with failure by negotiating down
16:21:07 bswartz: maybe
16:21:13 not sure Nova wants to do it that way
16:21:22 as an optimization, you cache the server version after negotiating down to avoid repeat failures
16:21:23 Maybe it should be in nova.conf? It's the admin who knows what Cinder version is in the deployment.
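
(For context on the "naked endpoint" idea above: the sketch below shows roughly what unauthenticated version discovery against an unversioned volume endpoint could look like. It assumes the standard OpenStack root-versions document format; it is only an illustration, not the cinderclient change referenced in review 420119, and the function names are made up.)

    # Illustrative sketch only (not the cinderclient implementation): query the
    # bare, unversioned endpoint with no auth and read the advertised versions.
    import requests

    def discover_volume_versions(endpoint):
        """Return the API versions advertised at the endpoint root."""
        resp = requests.get(endpoint.rstrip('/') + '/', timeout=10)
        return resp.json().get('versions', [])

    def highest_current_version(endpoint):
        """Pick the version marked CURRENT, if the service reports one."""
        versions = discover_volume_versions(endpoint)
        current = [v for v in versions if v.get('status') == 'CURRENT']
        return max(current, key=lambda v: v.get('id', '')) if current else None

(With a single bare 'volume' entry in the catalog, a client could call highest_current_version('http://<host>:<port>/') and build its versioned URL from the result.)
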
16:21:31 bswartz: +1 on such a solution
16:21:54 but caching the version is dangerous because the server could upgrade without telling you
16:21:55 Kind of more "pythonic" to try and fail, then fall back to the old way I guess.
16:22:04 bswartz: if you're going to cache the server version, why not just get it upon first query and use it?
16:22:14 scottda: because it's slow as hell
16:22:16 I'm not personally married to any particular solution...
16:22:34 and the negotiate down case should be rare
16:22:39 bswartz: Not sure it's any more round trips than the try-fail-try again
16:22:46 you always want the common case to be fast and the rare case to be slow
16:23:10 it's the same in the fail case but more round trips in the happy case
16:23:24 bswartz: Yeah, it's maybe the best. Part of this conversation is to have another conversation about the same subject with Nova
16:23:33 and in this specific case you're worried about it happening 4 times, which is easily avoidable
16:23:37 FYI, cinder's patch "Make Nova API version configurable" https://review.openstack.org/#/c/302177/
16:23:38 Since they will ultimately decide what goes in their code.
16:24:42 OK, well, I'm wanting to solicit ideas... We'll likely discuss at the next nova-cinder api meeting...
16:25:02 I'm worried that microversions are getting a bad name because people are deciding to use them wrong
16:25:04 tomorrow, Thurs, at 1700 UTC
16:25:28 bswartz: Well, I think they have a bad name. And it's a matter of education and information.
16:25:34 +1
16:25:40 indeed
16:25:47 If you just try..except for everything, why do you need microversions?
16:25:54 I can only educate and inform so much
16:26:04 just see if the feature exists, and then fall back to the older way.
16:26:17 I think in this case though, it's not limited to just one call and failing back. It kind of dictates the whole workflow being done.
16:26:36 smcginnis: That's right. There will be a lot of code in Nova for just attach and detach...
16:26:41 So there probably should be some caching or something to know for the whole "transaction".
16:26:56 And we might be adding more in the future for special cases like migration, shelve offload, etc.
16:27:40 OK, well, I want to solicit opinions, and welcome people to join in.
16:27:52 ildikov: Did I get the meeting time right for nova-cinder api talk?
16:28:10 ildikov: Thursdays, same time, right?
16:28:17 what channel is that meeting
16:28:38 scottda: smcginnis: yes, Thursdays, 1700 UTC, #openstack-meeting-cp
16:28:45 ildikov: Thanks!
16:28:57 cool. That's it for me.
16:29:01 scottda: Thanks again.
16:29:06 scottda: smcginnis: it's also registered now on eavesdrop, so it's official :)
16:29:12 I'll shut up now
16:29:14 Oooh!
16:29:15 ildikov: Ooh, nice.
16:29:17 scottda: Hah
16:29:25 #topic Open Discussion
16:29:34 That was it on the agenda. Anything else today?
16:29:35 ildikov, Nice. Now the issues will get resolved then, right? ;-)
16:29:37 scottda: smcginnis: :)
16:29:48 jungleboyj: what issues? ;)
16:29:52 Or we can get 30 minutes to finalize bug fixes before the RC. :)
16:30:01 smcginnis, I wanted to raise the question of marking drivers unsupported.
16:30:04 ildikov, Indeed. ;-)
16:30:16 hemna: Oh, good topic I guess. Especially given the timing.
16:30:26 I'm re-running the latest report right now
16:30:37 And here goes the next 30 minutes.
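
(A minimal sketch of the pattern bswartz describes in the topic above: send the newest request first, negotiate down on failure, and cache the result so the rare failure path is only hit once. Everything here is hypothetical glue; the exception type, client factory, and operation callable are stand-ins, not Nova or cinderclient APIs.)

    # Hypothetical sketch of try-newest-then-negotiate-down with caching.
    MAX_KNOWN = '3.27'   # highest microversion this caller understands (assumed value)
    FALLBACK = '3.0'     # baseline version assumed to always be available

    _negotiated = None   # module-level cache, like the global discussed above


    class VersionNotSupported(Exception):
        """Stand-in for whatever error a too-new request would raise."""


    def call_with_negotiation(make_client, operation, *args, **kwargs):
        """Try the highest known version; on failure, drop down and remember it."""
        global _negotiated
        version = _negotiated or MAX_KNOWN
        try:
            return operation(make_client(version), *args, **kwargs)
        except VersionNotSupported:
            # Rare path: the server is older than hoped. Cache the fallback so
            # later calls (e.g. the 4 calls per attach) skip the failed attempt.
            _negotiated = FALLBACK
            return operation(make_client(FALLBACK), *args, **kwargs)

(As noted in the discussion, the cache only goes stale in one direction: if Cinder upgrades underneath, the caller keeps using the older, still-supported path until restart.)
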
16:31:00 I've spent some time reworking the lastcomment.py tool to output some more information that gives us a better idea of the jobs for each CI
16:31:24 Last one hemna ran until he gets current results: http://paste.openstack.org/show/597113/
16:31:41 http://paste.openstack.org/show/597227/
16:31:44 that one just finished
16:32:10 it details every job in the CI and if the success rate is <=60% it shows the last success run for the job as well
16:32:31 do we care at this point to mark failing drivers as unsupported?
16:32:42 some are low%
16:32:59 hemna: I think with timing I would rather mark them unsupported as soon as Pike opens up.
16:33:01 36%
16:33:16 some are 0%
16:33:27 Though it can be argued we are going out with drivers supported that do not meet CI requirements at this point. :{
16:33:39 yah, that's why I wanted to raise that now
16:33:48 clearly some CI's are not working
16:34:31 *crickets*
16:34:35 hemna: whatever we decide, I think we should send out an email to the mailing list and describe exactly what the criteria are for marking a driver unsupported
16:34:44 I'm not as concerned about passing percentage (at least at this point) as the ones that haven't even reported for weeks.
16:34:50 xyang2 we've done that every time we've met
16:35:05 hemna: give driver maintainers the tools you used so they can run them themselves
16:35:11 xyang2: We have and we've published it on our wiki. I think we said two weeks?
16:35:18 I'm kinda of the mindset that we have been the nice guy for a while now to get people used to keeping the CI up
16:35:28 smcginnis: right, two weeks
16:35:37 smcginnis, I think we should talk about what the acceptable percentage is at the PTG and clearly state that before doing anything right now.
16:35:41 we either care about the CI failing and use the unsupported flag or...not
16:35:47 smcginnis, hemna: I don't remember we published something based on percentage
16:36:08 jungleboyj: Yes, before we base anything on percentage we should discuss and publish it.
16:36:12 we said 50% passing in a 2 week period at the last mid cycle
16:36:19 Obviously if it isn't running that means it is unsupported.
16:36:36 hemna, Did we announce that?
16:36:43 hemna: put that on the wiki then
16:36:48 yup, I'm sure smcginnis did
16:36:52 hemna: it was not there when I checked last time
16:37:09 anyway, we have many CI's far less than 50% now
16:37:31 hemna, smcginnis: I don't remember we agreed on an exact percentage before
16:37:48 Seems we need to talk about a way to capture agreements from meetups... that's a topic for the PTG.
16:37:51 https://wiki.openstack.org/wiki/Cinder/tested-3rdParty-drivers#Non-Compliance_Policy
16:38:02 Not percentage. Just time based.
16:38:36 Doh.
16:38:53 ok it's not published, but I do remember smcginnis saying 50% in 2 weeks at the mid cycle
16:38:59 Goodhart's law: "When a measure becomes a target, it ceases to be a good measure."
16:39:03 we have some that are 4%, 14%
16:39:12 We do have the line "Other issues are found but failed to be addressed in a timely manner"
16:39:18 I don't consider those as working
16:39:31 I think 4% passing is an "other issue".
16:39:51 The problem now is, if we mark them unsupported today, that gives zero time for issues to be addressed.
16:40:10 I don't really want to cause issues. I just want a strong incentive to keep CI running.
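
(To make the pasted report above easier to interpret: the per-job numbers are just a success percentage plus the age of the last passing run. The sketch below shows that calculation on generic, already-collected results; it is not the actual lastcomment.py code, and the input layout is assumed.)

    # Assumed input: an iterable of (job_name, passed, finished_at) tuples
    # gathered from CI review comments. Not the real lastcomment.py code.
    from collections import defaultdict
    from datetime import datetime, timezone

    def summarize(results, threshold=60):
        now = datetime.now(timezone.utc)
        by_job = defaultdict(list)
        for job, passed, finished_at in results:
            by_job[job].append((passed, finished_at))
        for job, runs in sorted(by_job.items()):
            passes = sum(1 for ok, _ in runs if ok)
            rate = 100.0 * passes / len(runs)
            line = "%s: %.0f%% success over %d runs" % (job, rate, len(runs))
            if rate <= threshold:
                # Only low-rate jobs get the "last success was N days ago" detail.
                last_pass = max((t for ok, t in runs if ok), default=None)
                line += ", last success %s" % (
                    "%d days ago" % (now - last_pass).days if last_pass else "never")
            print(line)
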
16:40:29 that's what I thought the unsupported flag was
16:40:34 instead of removing them from tree
16:40:47 hemna: +1
16:40:51 here is a good example
16:40:52 too late for Ocata, already passed O-3, given that we didn't publish the percentage
16:40:55 11% success rate
16:41:00 last success was 42 days ago
16:41:07 Yeah, but it does have a negative impact on end users.
16:41:08 and yet we are worried about upsetting them?
16:41:11 I don't get it
16:41:21 More concerned about the folks stuck using their gear.
16:42:06 so
16:42:16 Let's talk about this at the PTG. I think we need to definitely decide on a passing percentage and timeframes for things.
16:42:17 can we backport a bug fix to remove the unsupported flag?
16:42:19 Better they know they aren't being supported now?
16:42:26 just because cinder team declares a driver unsupported doesn't mean that an end user can't obtain support from their distro for that driver
16:42:36 that's between them and the distro
16:42:38 bswartz +1
16:42:38 bswartz: Right.
16:42:42 bswartz, +1
16:42:58 OK, what do folks think. Should we flag these as unsupported now?
16:43:12 I disagree
16:43:20 Seems like there's not much warning.
16:43:26 I personally thing the time periods are too short
16:43:31 Yea... Let's wait
16:43:34 s/thing/think/
16:43:34 Well, for future discussion I think we need to plan at what point we are going to do the last check for adding the unsupported flag.
16:43:35 If on the wiki, it clearly says there's a percentage, then that is different
16:43:37 Ocata probably isn't the release to do this with.
16:43:42 Short cycle and all.
16:43:44 jungleboyj: +1
16:43:49 but if we agreed to them we should change the agreement or enforce it
16:43:51 this criteria is not clearly published
16:43:52 That can go with the percentage discussion.
16:43:54 fwiw every time we discuss this everyone waffles on it.
16:44:09 this was the entire point of creating the unsupported flag vs. removal.
16:44:19 hemna, Everyone becomes terrified and thinks back to the last time their CI broke for a month.
16:44:20 now we will ship drivers that haven't had a working CI in 42+ days.
16:44:20 I also think percentages are the wrong measure
16:44:20 hemna, But we have successfully moved forward over time.
16:44:23 hemna: We do have a few marked unsupported that I plan on removing in Pike. I'm just concerned about the timing right now.
16:44:36 but I agree with hemna, we have to enforce what we have, or decide to change the rules
16:44:40 smcginnis, ++ Timing ...
16:45:02 yes we should enforce the rule, but the rule did not specify a percentage
16:45:07 And out of the handful that we marked unsupported, at least half came back and didn't realize things were failing and fixed it. But it took a few weeks for that to happen.
16:45:08 But before hemna's latest scripts, did we have a good way of getting data on CIs running?
16:45:09 so it is not a written rule
16:45:11 I'd argue that the timing is perfect for it because the CI's are broken
16:45:13 bswartz. hemna: +1
16:45:16 That needs to be consistent and reliable
16:45:18 and that's the point of showing that they are unsupported.
16:45:24 making rules and ignoring them is just bad
16:45:30 Of course if a CI is failing at the point we are releasing a product doesn't that generally mean it doesn't work?
16:45:36 and that they haven't been around working on it, haven't been around telling us that the CI is broken because of X
16:45:39 and not participating.
16:46:03 bswartz: there is a rule written in the wiki currently but that rule does not say percentage
16:46:10 xyang2: +1
16:46:12 I go by what it says on the wiki
16:46:21 yah it doesn't say it
16:46:32 but do you think we should ship a driver that hasn't had a working CI result in 42 days?
16:46:39 hemna +1
16:46:44 hemna: +1
16:46:54 or a CI that has a 30% success rate in the same time period?
16:47:08 I think we keep getting better at our CI policy. Now that we have a good tool to get the data we need, I think at the PTG we can iterate again and make our policy more clear and be in a better position to enforce it.
16:47:40 if everyone is ok with shipping drivers marked as supported when they are broken, that's cool.
16:47:40 we ship it, but give it a red mark of shame -- that's the point of "unsupported" right?
16:47:45 The incremental improvement has been the key.
16:47:47 bswartz, yes
16:47:53 bswartz, +1
16:47:56 * smcginnis looks through the latest output again...
16:48:13 hemna: Do you know the list of which ones would be affected if we did it now?
16:48:30 Does it matter who?
16:48:38 I'd have to call them out by looking at the results
16:48:41 I didn't want to do that here.
16:48:41 Just getting an idea of how many are impacted.
16:48:45 Swanson: +1
16:48:49 * jungleboyj pictures smcginnis pulling out his red marker.
16:48:59 http://paste.openstack.org/show/597227/
16:48:59 are there any drivers not covered by this output?
16:48:59 Blockbridge has been 63 days. That should probably get flagged.
16:49:01 the data is there.
16:49:45 xyang2 http://paste.openstack.org/show/597233/
16:49:52 that's the list of CI's I used to run the report
16:50:03 I think some of them are dead now
16:50:10 How about we set the bar low for ocata and then wiki up a higher bar for pike?
16:50:21 Swanson, +1
16:50:29 Swanson: That's kind of what I was thinking.
16:50:29 Swanson +1
16:50:34 Swanson: +1
16:50:35 hemna: that list might have missed some drivers
16:50:41 hemna: nice job computing results on a per-jenkins-job basis
16:50:50 Swanson: +1
16:50:52 Swanson: +1
16:50:55 can we backport a bug fix to remove the flag after O ships?
16:50:59 But a few of these are pretty bad, so I would be open to flagging a couple of these that are really bad. Two months is a bit extreme.
16:51:15 smcginnis, that's why I wanted to raise this. some are really bad.
16:51:59 smcginnis, What about the question of backporting removal of the flag?
16:52:26 hemna: To be fair, some of these are already flagged, so the list isn't really that big.
16:52:29 jungleboyj: that seems a bad move IMO
16:52:34 cool
16:52:39 Or at least not as big as I originally thought.
16:52:39 Or you could just mark them all unsupported and then, if people contact you, take it on a case by case basis. 2 months isn't extreme for someone to get hold of you.
16:52:40 that's a good thing :)
16:52:46 jungleboyj: I would be fine if that's before the release. After the release that's a bad practice.
16:53:14 erlon, That was my thought too, just curious what people thought.
16:53:21 jungleboyj: Yeah, I think I'd want to leave that to folks like RH and Mirantis to decide to "unmark" them after the release.
16:53:22 jungleboyj: Distros would have different drivers supported depending on the minor version of the stable release they're based on.
16:54:03 dulek, Ew, that sounds messy.
16:55:14 Just based on the discussion here, I would feel better if we don't do anything right now and improve our policy at the PTG.
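
(Background on the flag being discussed, for anyone reading along: marking a driver unsupported is a small in-tree change on the driver class, with a config option letting operators keep loading the driver anyway. The sketch below is a from-memory approximation of that convention; the attribute and option names are not verified against the Cinder tree and should be treated as approximate.)

    # From-memory approximation of the "unsupported flag" convention; names are
    # illustrative stand-ins, not verified Cinder code.
    class BaseVolumeDriver(object):
        SUPPORTED = True          # drivers are considered supported by default


    class VendorISCSIDriver(BaseVolumeDriver):
        # The "mark unsupported" patch is essentially this one-line change; the
        # backport question above is about deleting it again on stable/ocata.
        SUPPORTED = False


    def can_load(driver_cls, enable_unsupported_driver=False):
        # Operators can override via a cinder.conf option (name approximate).
        return driver_cls.SUPPORTED or enable_unsupported_driver
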
16:55:14 I would vote for flagging the drivers that are really below the bar and set tight bars in the next release
16:55:28 but once flagged, it's flagged
16:55:38 so 30%?
16:55:40 or less?
16:55:44 erlon: I'll filter through this list and see.
16:55:48 smcginnis,
16:55:51 cool
16:55:55 No percentage at this point, just time since last reporting success.
16:56:19 hemna: jenkins -15%?
16:56:38 smcginnis, I think that is safest for now.
16:56:39 erlon, that's an interesting but valid measure I'd say
16:56:42 That's it, I'm kicking Jenkins out. :D
16:56:45 :)
16:56:59 haha
16:57:12 :-)
16:57:12 Hitachi is consolidated under one CI now, right?
16:57:20 the interesting thing is that some of the jobs in CIs are mostly good, but one, and it causes the overall % to drop
16:57:34 smcginnis: we have 2 accounts, 1 for HNAS and another for VSP/HBSD
16:57:54 erlon: OK. Is the third party wiki page up to date?
16:58:26 smcginnis: I believe yes, have to double check
16:58:43 erlon: OK, thanks.
16:58:52 2 minutes. Anything else?
16:58:55 smcginnis WEAK ON CI! SAD! SOGGY!
16:58:59 smcginnis remember that there's a CI that was marked unsupported inadvertently--can we fix that today?
16:59:06 Swanson: :)
16:59:17 pots: What?
16:59:51 the hpmsa driver inherited the dothill driver's unsupported flag, but the hpmsa CI is running fine
17:00:09 pots: I thought you or someone was going to submit a patch to unflag that one?
17:00:10 Swanson, And he is our PTL again. What have we done?
17:00:20 jungleboyj: Hah!
17:00:31 Time's up, let's continue in #openstack-cinder.
17:00:37 #endmeeting