17:02:06 #startmeeting Ironic 17:02:06 #chair devananda 17:02:07 Welcome everyone to the Ironic meeting. 17:02:07 Meeting started Mon Jan 26 17:02:06 2015 UTC and is due to finish in 60 minutes. The chair is NobodyCam. Information about MeetBot at http://wiki.debian.org/MeetBot. 17:02:08 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 17:02:11 The meeting name has been set to 'ironic' 17:02:12 Current chairs: NobodyCam devananda 17:02:19 Of course the agenda can be found at: 17:02:20 #link https://wiki.openstack.org/wiki/Meetings/Ironic#Agenda_for_next_meeting 17:02:55 status of course can be found on the white board 17:03:11 #link https://etherpad.openstack.org/p/IronicWhiteBoard 17:03:40 morning, all 17:03:47 morning devananda :) 17:03:53 I'm sort of here (actually in a car right now, but not driving) 17:03:54 you are already a chair 17:03:55 good morning 17:04:23 was just about to jump into announcements 17:04:30 carry on then :) 17:04:39 #topic announcements: 17:05:08 Lots of mid cycles and EE comming up 17:05:20 only annoucement from me is about the midcycle sprints -- just that we're having them and such 17:05:24 what is EE? 17:05:32 * naohirot EE? 17:05:33 FF 17:05:47 FF? 17:05:51 feature freeze 17:05:52 :D 17:05:53 feature freeze 17:05:54 yeah 17:06:07 yep 17:06:10 :-p 17:06:10 lol 17:06:20 feature freeze or feature spec freeze? 17:06:21 whenis ff? 17:06:27 March 5th is the date when most projects, Ironic included, will stop accepting new features 17:06:28 march 5th 17:06:37 it's all up on https://wiki.openstack.org/wiki/Kilo_Release_Schedule 17:06:46 not really an announcement ... 17:06:55 however that does not give the coders a whole lot of time to land the code 17:07:15 but well cover that in a bit 17:07:21 NobodyCam: Is march 5th deadline of submitting a new bp? 17:07:30 naohirot: yes 17:07:55 lets move on to the status check 17:08:09 #topic Status updates 17:08:10 #topic subteam status reports 17:08:14 lol 17:08:32 NobodyCam: I think it's very difficult to go through review from Match 5th, right? 17:08:44 naohirot: yes 17:08:56 any questions on the status 17:08:56 NobodyCam: Okay 17:09:29 (As of Mon, 26 Jan 11:45 UTC) Open: 122 (0). 4 new (0), 32 in progress (+1), 1 critical (+1), 16 high (+2) and 6 incomplete 17:09:58 our bug list is growing 17:10:01 looks like there is progress on both ilo and irmc drivers, but more reviews are still needed 17:10:28 I'm not sure all 32 bugs are _actually_ in progress. probably I should start poking people again... 17:10:30 wrt oslo. Doesn't seem like we've gotten anything there. It'd be useful to know 1. what oslo changes could be done, even if they aren't done yet. 17:10:45 I'd like to also remind folks that, when reviewing code for third-party drivers, we need to trust the authors / maintainers of those drivers 17:10:53 yes. pls continue to review ilo specs. 17:10:55 did we ever got to have our bug scrub? 17:11:05 I havent seen any updates from GheRivero on the oslo stuff in a while 17:11:12 and afaik he is not coming to our sprint next week 17:11:29 do we know if he'll make the SF meetup? 17:11:31 NobodyCam: yah. several folks worked on bugs for a day (I had to miss it, unfortunatey) 17:11:35 NobodyCam: he will not 17:11:39 :( 17:11:49 NobodyCam, we did :) 17:11:58 great I also missed it :( 17:12:06 it's worth repeating from time to time anyway :) 17:12:06 devananda: "we need to trust the authors / maintainers of those drivers" what do you mean by this? 17:12:42 devananda: do you want to save that for the review section 17:12:58 i spoke with dhellmann last week briefly about the oslo.objects refactoring. it was/is bocked on something else in oslo, and at this point, doesn't seem like a priority for this cycle for anyone 17:13:18 :/ 17:13:21 jroll: let's come back to that in the next section ... 17:13:37 ok 17:13:53 anything else on staus updates? 17:14:14 not from me 17:14:28 if not I'm going to shuffel the oder listed on the agenda, for time. 17:14:35 NobodyCam: do it :) 17:14:41 #topic IPMI retry timeout value 17:14:53 trown: that you 17:14:56 yep 17:14:58 :) 17:15:11 so, I picked up this bug as it was marked "low hanging fruit" 17:15:36 the change is pretty trivial, however, I need some help with determining what the "best" value is 17:15:36 #link https://review.openstack.org/#/c/131296/ 17:15:55 * jroll is checking... 17:16:01 is there operator opinion/consensus on this? 17:16:12 yeah it would be good to get an operator opnion on that 17:16:24 I'm not sure that a single default can cover all hardware 17:16:37 as different hardware may have problems at different retry timings 17:16:50 the current default is overly conservative IMO, though 17:17:06 could we half it to 30? 17:17:12 10? 17:17:24 right, also maybe as part of that bug we could put a NOTE at the deployer's doc 17:17:29 we're using 60 at rackspace, it's been fine, though we're also vigilant at tracking down bad BMCs and putting them in maintenance mode 17:17:32 I'd like some input from the tripleo rack as well, as they were the folks who originally filed / fixed that issue 17:17:38 to call attention to that option 17:17:43 how would one go about getting operator opinion? wouldn't that be a good first step? 17:17:44 "Too low -> we can inadvertantly knock over some (fragile) BMCs and require a hard-reset of the BMC (not the node) to unbrick it. " <--- how low does this happen at? 17:17:45 we also haven't played with different values 17:18:55 trown: that totally depends on the hardware, unfortunately 17:19:08 trown: I've definitely seen reports of that happening around 5 seconds 17:19:28 ah ok 17:19:36 with older HP hardware 17:19:48 so 60 is really conservative then 17:19:53 and I believe victor_lowther had pointed out his experiences on some hardware with that, a while back as well 17:20:04 https://review.openstack.org/#/c/82668/ changed it from 10 to 60. 17:20:06 fwiw, I'd rather have my power sync loop be slow than brick a bmc 17:20:19 jroll: exactly 17:20:24 yes 17:20:27 right 17:20:35 watch for slow bmcs and maintenance them as needed 17:21:02 so how about 30 and if that causes not issues than half it again to 15 17:21:20 so should we mark the bug as invalid? Or would be updating the deployer docs to call more attention to that option a valid "fix" 17:21:31 this is a default value, so every time we change it, we could be breaking backwards compat. 17:21:46 https://review.openstack.org/#/c/96558/ added the minimum time interval 17:21:57 I like the "update the deployer docs" approach 17:22:09 conservative defaults are not a bad thing 17:22:12 in resposne to https://bugs.launchpad.net/ironic/+bug/1320513 filed by tripleo team 17:22:35 I also like the documentation fix, maybe add more notes on it to the actual config optino as well 17:22:47 jroll: ++ 17:22:49 yeah, since there's no best default 17:22:54 right 17:22:58 so 10 and doc changes? 17:23:01 that would work for me 17:23:01 no 17:23:07 I think leave it at 60 17:23:10 I would say leave as-is 17:23:13 ah, ok 17:23:15 ya ++ 17:23:15 to be conservative, but update the docs 17:23:20 sounds good 17:23:25 yea! 17:23:28 fwiw, when we deployed initial production 17:23:30 ++ 17:23:32 put a NOTE or even a WARNING there 17:23:32 who's going to do the doc patch? 17:23:40 we didn't notice slow BMCs slowing things down until around 5-600 nodes 17:23:47 devananda: I can do the doc patch 17:23:48 trown, ? since he's the one assigned to the bug? 17:23:54 if he wants 17:23:55 ah ok 17:24:11 #agreed keep the current default, but improve documentation 17:24:27 :) 17:24:28 #action trown to add or improve documentation around ipmi retry timing options 17:24:57 in code (conf file) and wiki right? 17:25:31 any thing else on timeout 17:25:34 NobodyCam: that seems good 17:25:43 ok then 17:25:49 #topic Prioritize Spec and code reviews 17:25:53 oh thats me 17:26:28 Ok we are approching FF and still have a bunch of specs and code out that needs reviewing 17:26:49 . 17:27:02 I've seen a trend of our reviews esp. on spec to be getting more and more nit picky 17:27:16 specs need to be approved by FF? 17:27:23 * devananda has seen this too 17:27:36 and we already have many specs approved and blocked on other work 17:27:37 I think we all have trust in each other to be able to do the work we say we will 17:28:00 and code reviews are taking an increasing amount of time, even for essential things (ie, patches that block feature work) 17:28:14 Overall review bandwidth seems to be down over the last month 17:28:16 I am suggesting that we lossen the review criteria 17:28:18 if you look at statistics 17:28:18 NobodyCam: that's where my comment earlier about trusting driver authors comes in 17:28:25 JayF, yeah noticed that too 17:28:26 do you notice this more with cores or non-cores or both? 17:28:27 JayF: indeed it is 17:28:47 jroll: both. but cores should be setting the better example 17:28:50 loosen review criteria for both specs and code? 17:28:51 both, even I am guilty of this 17:29:05 I'm sure I am too 17:29:13 a few suggestions for us all (myself, too) 17:29:13 rloo: mainly for specs 17:29:29 but code needs to land faster 17:29:39 - for code nits, just add a follow on patch yuorself,a nd +2/+A it if you would other wise +A the original patch 17:30:14 like, if I spot a spelling ereror in a comment string, I will start +2/+A'ing my own patch, just so the original patch author doesn't have to do anothe rrev 17:30:20 (and we dont have to wait for it) 17:30:43 I like that Idea 17:30:51 ++ 17:30:54 - for driver changes, if its from the driver maintainer, and you don't know that hardware, please give them teh benefit of the doubt 17:31:26 :) 17:31:53 as a little background I started the thread because I was reviewing a spec for a vendor passthru method 17:32:53 NobodyCam, as you know I don't agree on _this_ particular one 17:32:54 If I recall the Vendor passthru is a area where vendors can add functions that they can perform with their hardware that the generic drivers can not do 17:33:13 dtantsur: thats why I wanted to chat about 17:33:14 it 17:33:39 all code has to go thru code review 17:33:56 vendor passthru is an API, however we call it. that's why I think we should not be too easy about it... 17:33:59 so I would expect any glaring coding errors would be cought there 17:34:04 * lucasagomes I think I reviewed that spec... looks again 17:34:38 I though the biggest point of the spec process was to agree on important things like API 17:34:48 when we came up with teh VP aPI it was to allow vendors to be able to dothere own thing 17:34:48 * devananda goes afk to walk into the nova meeting 17:34:53 because once published, they're hard to change 17:35:15 I mean, people weren't -1'ing the spec because there might be coding errors... 17:35:24 #link https://review.openstack.org/#/c/149383/ 17:35:26 lucasagomes: ^ 17:35:49 yeah... looking again, didn't recheck the replied 17:35:51 replies* 17:35:54 I recall us talking about reviewing the VP methods and if we found several drivers doing the same thing then we would look at moving that code to a standard api for all the drivers 17:36:06 if we make exception about vendor passthru's, namely 1. allow to break backward compatibilty without warning, 2. allow it to screw everything including breaking hardware, we have to communicate it very clearly. 17:36:39 dtantsur: nothing about that spec does either of those 17:36:43 most concerning point for me about this spec is that it allows deploy to happen in the middle of bios update 17:36:51 oh? 17:36:54 jroll, how does it prevent ^^^ 17:36:58 dtantsur: my thought behind this was that it is in the vendors best intress to ensure their code works 17:36:59 I mean 17:37:02 it could lock the node 17:37:04 pretty simple 17:37:15 I would assume it would lock the node while performing the firmware update 17:37:34 dtantsur: would that have been cought in a review of the code? 17:38:02 maybe it's only me, but it's hard for me to watch both high-level and low-level things while looking at the code 17:38:09 NobodyCam: not necessarily (caught in code review) 17:38:21 NobodyCam, but again, why we have spec process at all? everything can be caught during code review :D 17:38:37 I think spec is a chance to think about such corner cases 17:38:46 right, replies on that spec looks good to me. But idk... I don't think the reviews were nitpicky in that case 17:38:52 they were more suggestions and ideas 17:39:09 sure, but the comments on that spec is "let's just do real zapping instead". when zapping has been bumped to L. 17:39:23 * dtantsur didn't know zapping was bumped 17:39:25 my goal is to imporve our volisity on reviews both code and spec 17:39:26 so "just make zapping steps" doesn't solve the problem 17:39:42 well, hardware capability stuff was bumped, which zapping kind of depends on AIUI 17:39:52 + I understand VP doens't need to be super "consistent" but, we depend on VP methods as today to deploy any machine with PXE for example 17:40:26 jroll, I guess it should be an explicit decision, I didn't think we can't make zapping states _at all_ without capabilities 17:40:30 yeah i think zapping will have to be bumped 17:40:36 like in this case: we could call it zapping 17:40:53 my understanding is zapping did get bumped 17:41:02 but I don't insist, I really didn't realize that it's pretty much decided, sorry for that folks :) 17:41:13 when was it mentioned to the group that zapping got bumped? 17:41:31 on the nova side I think 17:41:33 dtantsur: sure, we should talk about it, idk, maybe it isn't officially bumped 17:41:38 * devananda is back 17:41:52 devananda, was zapping _officially_ bumped to L? 17:41:54 dtantsur, it's all good. I mean, we should say it if we think it's important... and you can have ur -1 there even if the spec is merged 17:41:56 rloo: i don't think it's officially bumped yet. 17:42:15 I didn'tknow zapping is bumped. 17:42:22 * lucasagomes neither do I 17:42:23 i could write and land most of the code, but doing zapping you can't schedule against seems...less useful 17:42:28 we need raid config 17:43:23 JoshNang, the spec in question is a use case, raid stuff as well 17:43:36 scheduling does not depends on capability spec 17:43:42 i admit that I don't always know when something is a nit or not. I like it when dtantsur and lucasagomes comment, so I would hate to dissuade them from doing so (or others). maybe gentle reminders or something? eg ping me if you disagree? 17:43:44 dtantsur: I do not recall that zapping was bumped 17:44:10 so look at it this way: do we actually think we can land a zapping spec and implementation, in time that victor_lowther could build the drac raid stuff on top? 17:44:17 some how I thought it was 17:44:26 jroll: nope 17:44:30 right. 17:44:35 and zapping requires tne cleaning implementation 17:44:39 rloo, yeah, well I'm always in the channel and ppl can ping me to discuss whatever 17:44:39 the VP api is for vendors to build things that aren't generic yet 17:44:40 so without going into all the detail about specific state transitions 17:44:44 zapping/raid is not generic 17:44:46 which I'm the only one working on actually implementing 17:44:46 yet 17:44:48 we seem to be digressing... 17:44:57 and which took a lot of time to get anyone to review 17:44:58 rloo++ 17:45:00 and yes i'm frustrated about that 17:45:04 we have totally gone off topic 17:45:16 which was -- how do we improve OVERALL velocity 17:45:17 yep 17:45:19 by not nit picking 17:45:30 and then dtantsur brought up why we have specs at all 17:45:46 which I'd like to decompose a bit 17:46:01 i think the example (spec) given about 'nit picking' wasn't considered nit picking by others 17:46:03 15 minuts btw 17:46:06 dtantsur: do you feel that specs provide you, as a code reviewer, with any meaningful context? 17:46:45 I'm just wondering, in order to accelerate review process, can't we set a deadline for each spec review? 17:46:47 devananda, I feel like it allows me to become acknowledged with high-level idea, before diving into (possibly a lot of) code 17:46:51 rloo: NobodyCam also brought up the vendor_passthru question, which I agree, doesn't sound like a nit. but it IS related to how deeply we scrutinize drivers 17:46:52 devananda, so yes 17:46:55 * Shrews forgot about the meeting... apologizes 17:47:18 so how deeply do we scrutinize drivers? 17:47:33 I've heard this often recently: "as a code reviewer, does this spec give you enough information to tell you if the implementation is done" and I don't think that should be the intention of specs because "tell you if it's done" means different things to different people 17:47:49 I was assuming that API (and other use visible) changes are the most important to agree and be clear on 17:48:23 dtantsur: cool. I'm glad. that's what I beleive specs should do (help to flesh out a high-level idea enough to agree on the direction) 17:48:40 also, yes, changes to API (both external and internal) need more detailed review than other things 17:48:47 when they are part of the core APIs 17:48:56 vendor_passthru is vendor specific 17:49:07 it's supposed to be an area for vendors to do new things 17:50:14 I have been thinking along the lines of "if the change impacts a common area or more then one driver." 17:50:14 let me ask: what's teh impact to the project if a driver decides to break their vendor_passthru's backwards compatibility? 17:50:46 depend on the driver 17:50:51 it's clearly a higher impact if that is IPA or the pxe/iscsi driver ... beacuse these are common drivers we all rely on 17:50:52 for the continue_deploy VP of the PXE driver 17:50:58 it can be problematic 17:50:58 I guess it depends on the passthru api 17:51:02 devananda, people coming to a channel asking us to fix it :) 17:51:07 but iLO? DRAC? iRMC? -- if they break an experimental feature in vendor_passthru 17:51:11 devananda: let me ask you, are we OK if a driver breaks their vendor_pasthru's backwards compatibility? cuz I personally don't care if it is a third-party driver, but... what would our users think? 17:51:33 devananda: didn't nova-bm teach us that nothing is experimental? :) 17:51:48 jroll: indeed. except when it is. 17:51:55 I remember us arguing about backward compatibility of an ILO driver_info option between commits in the middle of cycle 17:52:08 i.e. ilo_username vs ipmi_username vs username 17:52:22 and we decided that we don't wanna break even this compatibility 17:52:24 dtantsur: that's not venor passthru 17:52:38 that is the basic interface for node.driver_info 17:52:43 changing that would break the whole driver 17:52:57 not just vendor_passthru/method=my_experimental_feature 17:53:03 that's what people has much less chances to be broken by. now we're talking about API's 17:53:13 rloo: most of the vendor bmc's can also fall back to generic ipmi if the vendor driver was not working 17:53:13 yeah it's complicated... I think that we can be more soft on vendor passthru. But again, I'm looking at the comments on that spec, and I think that suggestions are valid. That's spec is all about adding VP methods 17:53:14 devananda: so you are saying that vendor_passthru == vendor specific and as such, 'experimental' 17:53:16 sorry, I don't recall us having notion of 'experimental' API 17:53:27 if we can't suggest, if it's totally up to the vendor then we don't need a spec 17:53:29 as NobodyCam commented 17:53:48 perhaps we should have a separate /experimental section? 17:53:56 e.g. in discoverd I have API's that are only enabled by a configuration option, saying that it's experimental 17:54:12 devananda, or just have the code itself adding methods to the VP methods 17:54:16 devananda, ++ for anything clearly telling people: this can eat your laundry :D 17:54:18 devananda: or verdor contrib area 17:54:29 ++ 17:54:31 if it's not touching any core parts, it's not impacting on DB or core API etc 17:54:36 vendor_passthru is vendor contributed ... 17:54:46 lucasagomes: that's exactly what vendor passthru should be doing ... 17:54:50 yeah 17:54:51 true 17:54:59 so maybe not requiring specs for vendor passthru bits? 17:55:03 just send code 17:55:05 ++ 17:55:12 with a good commit message 17:55:24 in the code we can decide whether things simple like a return code for a vendor does make sense or not 17:55:37 and suggest something, but that's imlementation so it's even easier to do when reviewing the code 17:55:39 not the spec 17:55:40 how does that sound ^^^^ 17:55:47 vote? 17:55:55 * devananda feels like this still strayed far afield from the original topic of review velocity 17:56:04 how did we prioritize the spec review? 17:56:25 why can't we set a deadline for each review? 17:56:50 naohirot: our reviewers are also very busy 17:56:51 devananda: it seems we've managed to eliminate one source of slowness, at least 17:56:57 review velocity == both reviewers (core or not) and committer doing their part. (apart from nits if we can clarify what nits are). 17:57:05 NobodyCam, lucasagomes, -0 until we have way to distinguish experimental (might break and be broken) and normal ones... 17:57:28 naohirot: what happens if deadlines are not met? pay cuts? :P 17:57:32 * rloo gets tired of asking why a comment in a previous revision wasn't addressed. 17:57:39 rloo, ++ 17:57:41 rloo: me too 17:57:45 yep 17:57:46 dtantsur: I like the invers of that.. tag drivers as production ready 17:58:00 NobodyCam: who decides what's production ready? 17:58:07 NobodyCam: the TC is getting out of that business ... 17:58:09 In order to proceed next step, a little by little, we need a little bit fine, small granularity schedule, I think. 17:58:13 ahh 17:58:30 two minutes 17:58:39 #topic open discussion 17:58:40 maybe we could get feedback from people submitting patches, if they have suggestions to speed things up. 17:59:20 rloo: or perhaps rather than gathering more feedback, we could just all do code reviews, based on the project's priorities 17:59:26 anyting for OD? 17:59:46 and if the priorities aren't clear, then that's my fault, and I'll put more effort into communicating them 17:59:51 * rloo been trying but /me wonders how to improve patches etc so fewer iterations 17:59:55 continue in channel? 18:00:13 devananda: that TODO.rst helps, but I'd love if we got together and decided priorities for the rest of this cycle 18:00:15 yeah channel is it 18:00:17 thanks, all. see (some of) you next week in Grenoble 18:00:18 yeah 18:00:21 * jroll reposts there 18:00:22 htank you all 18:00:24 jroll, +1! 18:00:24 jroll: If review didn't ended by the deadline, then we have to go to the next step. 18:00:25 #endmeeting