16:00:01 #startmeeting Cinder
16:00:01 Meeting started Wed Jun 17 16:00:01 2015 UTC and is due to finish in 60 minutes. The chair is thingee. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:02 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:05 The meeting name has been set to 'cinder'
16:00:06 hi all!
16:00:09 hi
16:00:11 hi
16:00:12 hi!
16:00:12 hi
16:00:17 Hi
16:00:17 hi
16:00:18 hi o/
16:00:23 hello
16:00:24 o/
16:00:25 o/
16:00:26 hello
16:00:33 hi
16:00:42 #topic announcements
16:00:51 hi
16:00:53 So Liberty-1 is coming to a close
16:00:54 o/
16:00:58 important dates
16:01:16 #link https://launchpad.net/cinder/+milestone/liberty-1
16:01:35 #info new drivers need to be merged by June 19th
16:01:51 #link https://etherpad.openstack.org/p/cinder-liberty-drivers
16:01:57 those are the drivers that have a chance
16:02:09 note that this list unfortunately was updated again =/
16:02:13 I want to ask if driver updates can merge in L-2
16:02:18 hi!
16:02:26 thingee, I signed up for 2 of those
16:02:27 zongliang: driver updates have nothing to do with new drivers.
16:02:47 o/
16:03:15 o/
16:03:23 for your convenience, here's a list of reviews needed that are not driver related:
16:03:25 #link https://etherpad.openstack.org/p/cinder-liberty-1-reviews
16:03:51 drivers have priority this milestone though, so once that list is done, please take a look at the blueprints and bugs for L-1
16:04:37 #info Cinder L-1 may close 6-25 or sooner
16:05:02 I've already taken the liberty (heh) of moving stuff to L-2 https://launchpad.net/cinder/+milestone/liberty-2 that doesn't stand a chance
16:05:15 any questions?
16:05:52 alright, let's get started!
16:05:59 agenda for today:
16:06:01 #link https://wiki.openstack.org/wiki/CinderMeetings#Next_meeting
16:06:20 thingee: We've established the deadline for TaskFlow refactoring for L-1. Now the BP is L-2, are we moving the deadline?
16:06:32 #topic Multiattach volumes
16:06:34 hemna: hi
16:06:36 hey
16:06:38 #link https://etherpad.openstack.org/p/liberty-volume-multiattach
16:06:42 dulek_home: let's talk later
16:06:47 or set up an agenda item
16:06:48 :)
16:06:50 so the question with this one is how to report the capability
16:07:10 I modified the 3PAR driver to report multiattach = True in get_volume_stats
16:07:16 and had a patch against LVM as well
16:07:35 but there were a few objections saying that we should just silently default the capability to True
16:07:47 hemna: can we do that for now, but have it in our best interest to that changing for the standard capability stuff?
16:07:48 so I just wanted to iron this out here.
16:07:59 to changing that*
16:08:01 hemna: does multiattach mean that e.g. multiple nova nodes access the same volume via iSCSI?
16:08:09 I personally like the explicit declaration of capabilities
16:08:13 I am in favor of keeping it explicit, like in that patch
16:08:15 so they aren't hidden or inferred.
16:08:24 hemna: ++
16:08:29 hemna: how many drivers would be broken if we set multiattach=true?
16:08:33 because that would mean that they need some kind of DLM etc.
16:08:39 e0ne: Was wondering that as well.
16:08:51 flip214, it means that the driver and its backend support attaching a single volume to more than one instance/host
16:08:56 My opinion - either default it to false so those drivers that don't support it aren't impacted, or default to true and update each driver to report false.
16:09:24 hemna: okay. what about volumes stored on multiple hosts, like Ceph or DRBD?
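As a rough illustration of the capability reporting being discussed here (a minimal sketch, not the actual 3PAR or LVM patch; the class name, backend name, and numbers are made up), a driver can advertise multiattach explicitly in the stats it returns from get_volume_stats(), and anything consuming the stats treats a missing key as False:

```python
# Sketch of explicit multiattach reporting; all names below are illustrative.
class ExampleISCSIDriver(object):
    """Illustrative stand-in for a cinder.volume.driver.VolumeDriver subclass."""

    def __init__(self):
        self._stats = {}

    def get_volume_stats(self, refresh=False):
        """Return backend stats, refreshing the cached copy on demand."""
        if refresh or not self._stats:
            self._stats = {
                'volume_backend_name': 'example_backend',
                'vendor_name': 'Example Vendor',
                'driver_version': '1.0.0',
                'storage_protocol': 'iSCSI',
                'total_capacity_gb': 100,
                'free_capacity_gb': 100,
                # Explicit capability declaration; drivers that omit this key
                # are treated as not supporting multiattach.
                'multiattach': True,
            }
        return self._stats


# Consumer side: default to False when a driver says nothing.
stats = ExampleISCSIDriver().get_volume_stats(refresh=True)
supports_multiattach = stats.get('multiattach', False)
```

This matches the consensus reached below: the capability defaults to False unless a driver explicitly reports True.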
16:09:26 I don't know all of the backends and what they support or don't
16:09:36 that means that all the Nova nodes need access to all the storage nodes.
16:09:42 which is why I think it's safer to explicitly report the capability
16:09:42 smcginnis: That sounds like a safer approach.
16:09:50 seems like the safest is to default to false, right?
16:09:51 flip214, that's not multiattach
16:09:51 and that means IMO that there has to be a way to *check* that
16:10:05 jungleboyj, smcginnis the former or latter?
16:10:16 flip214, multiattach as supported in Cinder means a single cinder volume can be attached to more than 1 host/instance
16:10:23 smcginnis: I suggest default to False to be safe
16:10:31 xyang__: +1
16:10:31 it's up to the user to make sure it has the right filesystem on that volume to actually work.
16:10:35 i'd also rather default to false
16:10:36 xyang__ : +1
16:10:39 xyang__: ++
16:10:40 thingee: To default to false and let those who support it report true.
16:10:44 hemna: understood.
16:10:45 xyang__: +1
16:10:46 thingee: xyang__ +1
16:10:47 xyang__: +1
16:10:50 hemna: seems we have consensus
16:10:53 ok cool.
16:10:54 xyang__: +1
16:11:03 xyang__: +1
16:11:05 sounds good to me. thanks guys.
16:11:09 +1 to make the default False
16:11:09 o/
16:11:10 hemna: take note of my earlier comment though
16:11:19 hemna: I can take care of that though
16:11:29 thingee, coolio
16:11:44 Good talk, good talk. :-)
16:12:05 with capabilities in cinder from the summit, we already agreed the manager by default would report false on capabilities, unless the driver explicitly says true
16:12:17 ok I'll also take the LVM patch out of WIP.
16:12:25 https://review.openstack.org/#/c/190725/
16:12:30 that's the capability reporting for LVM
16:13:32 I'll push up a new patch on the lvm driver that removes the 'WIP' in the commit message and cleans up the message itself.
16:13:35 thanks guys.
16:13:43 thanks
16:13:49 hemna: ++
16:13:55 #topic Add support to return request_id of last request (python-cinderclient)
16:14:00 hi
16:14:03 abhishekk: hi
16:14:04 #link https://blueprints.launchpad.net/python-cinderclient/+spec/return-req-id
16:14:14 blueprint ^
16:14:14 So, todo for the rest of us: if we support multi-attach, add it to our capabilities reporting.
16:14:19 cross project spec:
16:14:21 #link https://review.openstack.org/#/c/156508/8
16:14:31 gerrit patch for cinderclient:
16:14:34 #link https://review.openstack.org/#/c/173199/
16:14:37 Doug Hellmann suggested to use thread local storage
16:14:52 for storing the last request id in the client
16:15:02 which will not break compatibility
16:15:41 I would like to hear the opinions of community members on this approach
16:16:27 the general idea sounds reasonable to me
16:16:32 IMO this is a plus point for the end-user to know the request id of his request, which will help him understand the cause of failure
16:16:48 sounds good to me
16:17:19 what can a client do with the request ID once he has it?
16:17:22 I would also like to complete it within the Liberty time frame
16:17:37 it will be used for logging
16:17:48 abhishekk: Yeah, this is an idea I have been in support of.
16:17:51 bswartz: ^
16:17:58 cinder has no client-visible logs yet though, right?
16:18:09 jungleboyj: thanks
16:18:16 bswartz, not that I know of
16:18:26 so this only benefits clients who are also administrators, or clients working with administrators to debug an issue
16:18:36 abhishekk, thingee: can't we get the request_id from the context?
16:18:38 https://github.com/openstack/cinder/blob/master/cinder/context.py#L65
16:18:56 But if you report an issue to IT and you have the request ID, everything is easier to track
16:18:59 bswartz: it's really useful when someone calls support
16:19:05 or is it not applicable in this case?
16:19:05 so what do the other clients do wrt the optional return_req_id
16:19:07 vilobhmm: this is the request-id of the caller service
16:19:10 yeah I'm just trying to understand the use case
16:19:14 we should be consistent with others that do this same thing.
16:19:15 how about having cinderclient log to syslog by default? command used, and resulting req-id
16:19:15 it makes sense
16:19:27 then everybody could look it up
16:19:28 abhishekk: whatever happened to the 'transaction_id' concept?
16:19:32 abhishekk : ok
16:19:37 and it doesn't have to be stored anywhere explicitly
16:20:08 or, perhaps, just append to ~/.cinderclient/requests.log by default...
16:20:28 flip214: the basic idea is if nova calls cinder, then it should have both request id's
16:20:30 and if that isn't possible, put a warning to STDERR: "can't append logfile ..., last req-id is ..."
16:21:01 abhishekk: okay. I guess I've misunderstood the issue at hand (again), so I'll just shut up.
16:21:04 flip214: these request id's will be logged on the same line so that it will be useful for tracing
16:21:14 ameade: would this help/relate to the async error reporting you've been working on?
16:21:35 erlon: it would definitely be something i would want in the payload
16:21:57 guys, fwiw, there still is no agreement on the cross project spec itself
16:21:59 https://review.openstack.org/#/c/156508/
16:22:06 hemna: thanks for the link!
16:22:18 shouldn't we at least wait or comment on that first?
16:22:31 +1
16:22:36 hemna: Thanks, was wondering about that. I am surprised at the contention around this. It seems logical.
16:22:47 hemna: yes, I am talking with them as well, but they suggested to talk with individual teams
16:22:52 also, there is still the issue of the optional command line arg for cinderclient
16:23:03 we should be consistent with what other clients do wrt this
16:23:11 +1
16:23:27 other than that, I think this is a good idea
16:23:36 hemna: I am proposing this in all clients (neutron, glance, cinder, heat)
16:23:52 abhishekk, so the optional arg is the same for all?
16:24:22 hemna: still working on that
16:24:56 I have code ready for cinderclient and need early feedback from the community members
16:25:06 ok, that needs ironing out then IMHO
16:25:16 #link https://review.openstack.org/#/c/173199/
16:25:58 please give your opinions on the same
16:26:32 abhishekk, ok I don't see any changes to shell.py to add the optional return id
16:26:38 abhishekk: I will definitely review that spec, thanks for bringing this up
16:26:56 ameade: thank you
16:27:08 probably should mark that as WIP, until it's complete and the return arg is ironed out.
16:27:43 agree
16:27:44 hemna: I will set WIP
16:27:58 ok coolio.
16:28:39 abhishekk: we'll look out for consensus on the cross project
16:28:51 thingee: ok
16:28:56 abhishekk, thanks for the work!
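For the thread-local approach mentioned above, here is a minimal sketch (not the actual python-cinderclient patch; function and variable names are illustrative) of how a client's HTTP layer could record the last request id without changing any existing method signatures:

```python
# Minimal thread-local "last request id" sketch; all names are illustrative.
import threading

_local = threading.local()


def _record_request_id(resp_headers):
    # Called by the client's HTTP layer after every response; OpenStack
    # services return the id in the x-openstack-request-id header.
    _local.last_request_id = resp_headers.get('x-openstack-request-id')


def get_last_request_id():
    # Safe to call even if no request has been made on this thread yet.
    return getattr(_local, 'last_request_id', None)


# Usage (illustrative): after a client call, the caller can log the id.
_record_request_id({'x-openstack-request-id': 'req-example-id'})
print(get_last_request_id())
```

Because the id lives in thread-local storage, existing callers are unaffected and concurrent threads do not overwrite each other's values, which is the compatibility point raised in the discussion.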
I think it's great fwiw
16:29:08 hemna: thank you
16:29:24 #topic open discussion
16:29:29 I will update about the output of the other meetings (glance and nova)
16:29:56 thank you all
16:30:50 sorrison, any chance I can get some other eyes on the os-brick patches?
16:30:52 so
16:30:54 I want to know when third-party CI needs to be stable
16:30:54 https://review.openstack.org/#/q/status:open+project:openstack/os-brick,n,z
16:31:05 hemna: I'll take a look.
16:31:07 hemna: Sure.
16:31:13 the HGST connector looks like it's good to go
16:31:26 zongliang: for which driver?
16:31:28 and it looks like there have been a few new ones added while I was out
16:31:42 Huawei driver
16:31:57 zongliang: I haven't really figured that out yet.
16:32:05 guess I could ask folks here.
16:32:06 thingee: About TaskFlow refactoring - we've agreed on a 6 week deadline. That's the beginning of L-2, so I think targeting the BPs for L-2 is fine.
16:32:22 s/BPs/BP
16:32:24 hemna, thingee: we need to add os-brick to our review dashboard
16:32:41 e0ne, +1
16:32:45 dulek_home: ok, also your first patch should be landing https://review.openstack.org/#/q/project:openstack/cinder+topic:+bp/taskflow-refactoring,n,z
16:32:48 We hit some network problems because of government blocking
16:32:55 #action thingee to add os-brick to dashboard
16:33:06 #action thingee to propose cinder dashboard
16:33:21 alright, let's discuss CI stability
16:33:26 #topic Cinder CI stability
16:33:42 thingee: Thanks, that's actually a second patch, I've done some work in the scheduler flow too. Now the hardest part - manager. ;)
16:33:50 #link http://lists.openstack.org/pipermail/third-party-announce/2015-June/thread.html
16:34:09 I've been all over the place on here... there are a lot of CIs that need help
16:34:25 Yes
16:34:39 Some CIs have not been running for 72 days... and the maintainers have been inactive.
16:34:41 thingee: 2c from my side: there are some CIs that report after the patch is merged
16:34:54 doh
16:35:07 e0ne: yeah I think that's a separate issue, being able to scale with patches coming in. Something we haven't touched on yet
16:35:10 We hit some problems because of network issues
16:35:26 and something I don't think we can touch on yet until we can regularly post comments from CI
16:35:34 e0ne, scaling is a problem for some folks. the amount of Cinder patches that kick off CI is a non-trivial thing.
16:35:58 hemna: agree, fair enough
16:36:00 hemna: scaling is a problem for us too
16:36:17 so I think we talked about this in another meeting, of how long we give CIs until we kick the driver
16:36:29 I kinda think that 3rd party CI shouldn't kick off until after the standard checks have +1'd (jenkins)
16:36:34 that might help some.
16:36:52 hemna: That's what I do.
16:36:58 same here, the queue has always been bigger than what our slaves can handle :/
16:37:02 this is not going to be where we set X amount of successes versus X amount of failures... I think this should be on a case by case basis
16:37:02 hemna: +1, we do that as well
16:37:10 hemna: I think we already do that in a few CIs
16:37:14 Maybe that would help us
16:37:55 thingee: Good point. I think the first metric is whether they run on every patch. The second metric is whether they are actually passing when they should.
16:38:03 thingee, so what if a CI system isn't working w/in a release milestone, then it's up for getting booted?
16:38:13 In the past I think we said two weeks. I mean two weeks where there is not really progress on the issue. Progress is defined as something other than "we're working on it"...
16:38:20 I think we should have some sort of 'standard'
16:38:35 for reporting and success, then we can point to that when someone doesn't meet it.
16:38:38 even if the bar is low
16:38:39 hemna: Within a milestone sounds reasonable.
16:38:49 I'm so sick of hearing "we're working on it" ... it doesn't help me understand what progress is being made.
16:38:50 Two week warning, four weeks and you're in trouble?
16:39:15 if your stuff is broken for a milestone, then you get a warning that it needs to be fixed. if it's not fixed in the next milestone..., revert driver patch?
16:39:16 dunno
16:39:22 just throwing that out in the wind.
16:39:41 Are the warnings for this automated?
16:39:46 hemna: I don't think it's that simple. I have found some where they're actively trying to resolve an issue, say in Nova.... so they end up skipping some tests until
16:39:47 well, from *my* contributor side I'd be happy to explain in detail what I've been doing, even if it would still fail...
16:39:59 hemna: and that can take time
16:40:03 of course, that doesn't help in the big picture, but it would be more information than "working on it"
16:40:03 thingee, yah I think that's ok
16:40:38 that's why I'm saying we can't set a number of passes versus failures... (also note some failures are correct) ... I think they have to be reviewed on a case by case basis
16:40:39 thingee, I guess I'm thinking more of the really low bar case, where someone's CI is broken and there is no contact or no work being done.
16:41:01 at least there is a 'standard' that we can all point to as a community and say, see, you didn't even meet this low bar. it's documented.
16:41:09 hemna: Yeah, that was what I was thinking.
16:41:28 kinda a C.Y.A.
16:41:40 Obviously there are the cases where we are seeing progress and the issue is being worked. Those are handled differently.
16:41:47 jungleboyj, +1
16:41:51 hemna: +1. If it's not working and no one has stepped up to say they are working on it, then that is pretty straightforward.
16:42:01 smcginnis: ++
16:42:13 If someone is working on it, then I do think that's more of a subjective evaluation.
16:42:20 smcginnis, but at least document the rules, and we can point to them when they get the wake up call that their driver was removed.
16:42:23 +1
16:42:28 hemna: +2
16:42:30 smcginnis: We let thingee work those out. :-)
16:42:34 hemna: +1
16:42:37 hemna: ++
16:42:45 hemna: it must be documented
16:42:45 The Enforcer. :)
16:42:48 so where do we track which ones are being worked on, unmaintained, need work, etc?
16:43:05 hemna: what do you propose on standards?
16:43:16 patrickeast: is jgriffith working on some chart?
16:43:19 I mentioned it earlier
16:43:23 smcginnis: https://s.yimg.com/fz/api/res/1.2/lG9j5MlWP6mbIullhk30Wg--/YXBwaWQ9c3JjaGRkO2g9MTA4MDtxPTk1O3c9MTkyMA--/http://cf2.imgobject.com/t/p/original/tAeUxFqTZcwVGuvUJMqNUcyojGF.jpg
16:43:41 hemna: I remember you mentioned setting a bar, but I don't know what that is
16:43:45 basically that if your CI is borked, and you aren't fixing it/working on it, you get 1 more milestone to fix it, or a revert driver patch is submitted.
16:43:47 it's a low bar
16:43:55 there's already a third-party ci status page
16:44:07 we should require that it stays up-to-date
16:44:13 there are multiple of them iirc
16:44:15 jungleboyj: Wish my gimp skills were up to putting blue hair on that. :)
16:44:15 akerr: the wiki?
16:44:15 akerr, +1
16:44:24 hemna: that's sort of going away from our previous proposal
16:44:29 but that doesn't help if the ci owners are not updating it
16:44:31 * jungleboyj is laughing Nice smcginnis
16:44:34 hemna: that we agreed on
16:44:36 https://wiki.openstack.org/wiki/ThirdPartySystems
16:44:40 thingee, ok which is?
16:44:42 hemna: previously we said two weeks.
16:44:46 thingee, refresh my grey matter :)
16:44:56 akerr: that one is up to the driver maintainer to update
16:44:58 thingee, ok I think that's fine then as well.
16:45:11 I think we need some auto generated graph
16:45:11 just as long as it's documented.
16:45:20 and we can point to it when they bitch that their driver is gone. :)
16:45:32 akerr: I've been finding people aren't updating that =/
16:45:34 +1
16:45:36 that's really all I wanted.
16:45:36 xyang__: yes, and if the maintainers are active and their system is having issues, this page should be up-to-date with that. If they can't even do this then it's more weight to remove them
16:45:51 akerr: there is one driver in particular I've already proposed for removal for this reason
16:46:05 i just don't think "yet another" page to keep updated is the right approach
16:46:12 patrickeast: Has a start at a dashboard. I have been using that.
16:46:27 hemna: all my discussions are on the third party list.
16:46:35 hemna: some are still unanswered from last week
16:46:41 hemna: I also cc the maintainer
16:46:43 xyang__: there are some people working on gathering all CI jobs and results in one page
16:46:56 akerr: I agree. no new page please.
16:47:03 akerr: we already don't use the existing one
16:47:05 yea so there is this thing I made http://ec2-54-67-102-119.us-west-1.compute.amazonaws.com:5000/?project=openstack%2Fcinder&user=&timeframe=24
16:47:07 erlon: that will be nice
16:47:33 but it won't show missing systems that stopped reporting very easily
16:47:54 thingee, can we mention the upkeep 'rules' on this https://wiki.openstack.org/wiki/Cinder/tested-3rdParty-drivers ?
16:47:57 patrickeast: that looks like a dell firewall error page.
16:48:06 patrickeast: nice, I think that would be a good place to gather info to be able to set a bar
16:48:07 haha
16:48:13 hemna: sure
16:48:14 I don't see a discussion on what happens if your CI starts to fail after your driver is in.
16:48:31 thingee, that's all I was trying to propose: that the rules, whatever they are, are documented.
16:48:44 patrickeast: and that page isn't 100% yet. I've noticed several instances where at least our driver is shown as having no results when it did vote, or having failures when it really was successful. Not a dig on the page, just saying it still has work
16:49:04 #action thingee to add CI upkeep on Cinder wiki
16:49:04 yep
16:49:12 thingee, thank you
16:49:13 :)
16:49:41 thingee: Thanks.
16:49:42 patrickeast: that reminds me, Microsoft needs to be looked at
16:49:48 thingee: a document will be great. I can just send the link to people instead of writing explanations myself
16:50:14 thingee, Also, I'll work with asselin to add a section on how to CI for os-brick connectors and why it's important. I think that's also needed on that wiki page
16:50:17 xyang__: ++
16:50:28 hemna: +1
16:50:43 hemna: good point
16:51:02 thingee: What is the "third party list"?
16:51:18 dguryanov: http://lists.openstack.org/cgi-bin/mailman/listinfo/third-party-announce
16:51:25 dguryanov: http://lists.openstack.org/cgi-bin/mailman/listinfo/third-party-announce
16:51:25 thingee: the Microsoft CI is down due to a patch that merged last night
16:51:38 thanks
16:51:49 the patch is: https://review.openstack.org/#/c/182985/
16:51:51 dguryanov, documented here: http://docs.openstack.org/infra/system-config/third_party.html#creating-a-service-account
16:51:59 Note you should also subscribe to the third-party-announce list to keep on top of announcements there, which can include account disablement notices.
16:53:05 Introduce Guru Meditation Reports. What a title.
16:53:16 jungleboyj, :)
16:53:19 thingee: that patch did not include a fix for Windows that alexpilotti submitted one year ago in oslo
16:54:13 ?
16:54:21 Man, who's slacking on our oslo support?
16:54:26 thingee: https://review.openstack.org/#/c/77596/ fixes the issue of using hardcoded SIGUSR1
16:54:28 asselin: thanks!
16:54:33 smcginnis: yeaaa, I need to talk to jungleboyj
16:54:51 ;)
16:54:54 smcginnis, does he have wifi working yet?
16:55:02 smcginnis: Thanks for throwing me under that bus.
16:55:08 hemna: Ah, that must be why we were behind.
16:55:15 * smcginnis hides
16:55:35 thingee: Yeah, I am hoping to lock myself in my office and work on that.
16:55:47 thingee: Do you have specific issues at the moment?
16:56:32 jungleboyj: https://twitter.com/bradtopol/status/601584343404785664
16:56:50 alright, anything else?
16:56:52 thingee: Really....
16:57:07 thingee: Still haven't gotten him to tell me what you two talked about.
16:57:16 thingee: jungleboyj alexpilotti just submitted https://review.openstack.org/#/c/192616/ that should add the same fix
16:57:26 ociuhandu: thanks
16:57:33 thanks everyone
16:57:35 #endmeeting
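Related to the CI-tracking discussion above, the following is a rough sketch of how the "is this CI still reporting?" question could be answered automatically via Gerrit's public REST change API on review.openstack.org. It assumes that API is reachable and the CI account name ("Example Storage CI") is made up; it is only an illustration, not the dashboard mentioned in the meeting.

```python
# Sketch: find the most recent review comment from a given third-party CI
# account on openstack/cinder changes. Assumes Gerrit's REST API; the CI
# account name is hypothetical.
import json
import requests

GERRIT = "https://review.openstack.org"
CI_ACCOUNT = "Example Storage CI"  # hypothetical CI account name


def last_ci_comment(ci_name, limit=100):
    """Return the date string of the newest comment by ci_name, or None."""
    url = "%s/changes/?q=project:openstack/cinder&n=%d&o=MESSAGES" % (GERRIT, limit)
    resp = requests.get(url)
    resp.raise_for_status()
    # Gerrit prefixes JSON responses with ")]}'" to guard against XSSI.
    changes = json.loads(resp.text.split("\n", 1)[1])
    dates = [msg["date"]
             for change in changes
             for msg in change.get("messages", [])
             if msg.get("author", {}).get("name") == ci_name]
    # Gerrit dates are "YYYY-MM-DD HH:MM:SS...", so a string max() is newest.
    return max(dates) if dates else None


if __name__ == "__main__":
    print(last_ci_comment(CI_ACCOUNT))
```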