04:59:52 #startmeeting Ironic
04:59:53 #chair devananda
04:59:54 Meeting started Tue Mar 31 04:59:52 2015 UTC and is due to finish in 60 minutes. The chair is NobodyCam. Information about MeetBot at http://wiki.debian.org/MeetBot.
04:59:54 Welcome everyone to the Ironic meeting.
04:59:55 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
04:59:57 The meeting name has been set to 'ironic'
04:59:58 Current chairs: NobodyCam devananda
05:00:13 Of course the agenda can be found at:
05:00:13 #link https://wiki.openstack.org/wiki/Meetings/Ironic#Agenda_for_next_meeting
05:00:22 #topic Greetings, roll-call and announcements
05:00:24 Roll-call: Who's here for the Ironic Meeting?
05:00:30 o/
05:00:30 hiya.
05:00:32 o/
05:00:33 o/
05:00:34 o/
05:00:40 hola
05:00:57 hey hey "morning" all :)
05:01:08 heh
05:01:19 it's late here and way past my bed time so I may be slow
05:01:24 sorry in adv.
05:01:42 #topic announcements:
05:02:23 devananda: is in meeting and will most likely not make it to the meeting
05:02:43 Specs for L will be opening after RC1
05:03:15 Feature, Dep, and string freeze are all in effect
05:03:46 our goal to cur rc one is April 9
05:03:53 s/cur/cut/
05:04:42 apparently dmitry wanted to announce: DevStack plugin for ironic-discoverd for those interested: https://etherpad.openstack.org/p/DiscoverdDevStack
05:04:48 #link https://etherpad.openstack.org/p/DiscoverdDevStack
05:05:04 Ty jroll was just typing that
05:05:10 hehehe
05:05:13 np
05:05:15 review day pad has been updated with bugs and docs for rc1
05:05:16 :)
05:05:31 #link https://etherpad.openstack.org/p/IronicReviewDay
05:06:26 ty BadCub
05:06:26 may be this needs discussion, but it's difficult (and dangerous) to fix https://bugs.launchpad.net/ironic/+bug/1422723 for rc1
05:06:27 Launchpad bug 1422723 in Ironic "Ironic sets instance as active even if bootloader installation fails when boot_option is 'local'" [Critical,Confirmed]
05:06:53 jroll: I have try with discoverddevstack,
looks like dhcp not working, will check with dmitry, maybe it is my env issue
05:07:05 rameshg87: let's cover that in the rc1 section
05:07:11 NobodyCam: okay
05:07:20 #topic SubTeam: status report
05:07:40 looks like the white board didn't get updated
05:08:35 anything that needs covering for subteams?
05:08:43 NobodyCam: I updated irmc section :)
05:08:49 I've nothing
05:09:21 naohirot: awesome Ty and good news too :)
05:09:35 jroll: :)
05:09:40 NobodyCam: :)
05:09:59 ok then we can jump into one of the hot topics
05:10:10 #topic RC1 status check.
05:10:25 looks like the bug list has growen
05:10:33 grown even
05:11:07 Blueprints: 1 Implemented Bugs: 6 Confirmed, 5 Triaged, 7 In Progress, 10 Fix Committed
05:11:13 #link https://launchpad.net/ironic/+milestone/kilo-rc1
05:11:55 time to discuss about critical one - Ironic sets instance as active even if bootloader installation fails when boot_option is 'local' ?
05:12:04 rameshg87: look what the #1 critical bug is :0 heheheh
05:12:20 i filed this bug
05:12:34 pxe_ipmitool supports both dib ramdisk and ipa ramdisk now
05:12:50 rameshg87: can you explain this a bit more?
it's not clear to me if the grub installation fails, or if the instance fails to boot due to grub not being present in the image
05:12:52 when localboot is used - ipa ramdisk can report back the status of bootloader installation, whereas dib ramdisk cannot
05:13:11 #link https://bugs.launchpad.net/ironic/+bug/1422723
05:13:12 Launchpad bug 1422723 in Ironic "Ironic sets instance as active even if bootloader installation fails when boot_option is 'local'" [Critical,Confirmed]
05:13:27 since we don't wait for status of boot loader installation from dib ramdisk, we mark the instance as active
05:13:58 rameshg87: yes, after 2nd reboot, the baremetal is out of control from nova and ironic I think
05:14:04 code in this block: https://github.com/openstack/ironic/blob/master/ironic/drivers/modules/pxe.py#L556-L571
05:14:19 oh, ouch, this is nasty
05:14:35 Haomeng: yes, node never boots up if boot loader installation failed, and nova says instance is active
05:14:37 #link https://github.com/openstack/ironic/blob/master/ironic/drivers/modules/pxe.py#L556-L571
05:15:04 Haomeng: this is a deploy failure that we do not catch
05:15:09 rameshg87: yes, this is *control gap* I think, but it is difficult to get connection after 2nd reboot
05:15:19 jroll: yes
05:15:24 Haomeng: first reboot, this is in deploy image
05:15:31 now an initial idea which lucasagomes and myself discussed was introduction of a second vendor-passthru for communicating status of bootloader installation
05:15:45 Haomeng: https://github.com/openstack/diskimage-builder/blob/master/elements/deploy-ironic/init.d/80-deploy-ironic#L69-138
05:16:00 jroll: thk, got
05:16:12 and ironic waiting till dib ramdisk calls back before turning the instance active
05:16:36 but that requires changes in both dib and ironic - and might be too much for rc1 in my opinion
05:16:40 rameshg87: what are the other options?
that seems sane to me (but lots of code)
05:16:43 Haomeng: so need more communications during bootloader installing
05:16:44 unless someone else have better ideas
05:17:07 rameshg87: I think the basic idea is good one
05:17:16 * jroll thinks we need to ditch the bash ramdisk sooner than later
05:17:23 jroll: nothing else come to my mind other than a vendor passthru (there is no other way dib ramdisk to talk to ironic)
05:17:39 * rameshg87 is +1 with jroll
05:17:41 ya i can't think of a better way right now
05:17:47 rameshg87: right, I tend to think that's fine
05:18:13 ditto
05:18:27 rameshg87: is that something you've started on?
05:18:29 jroll: +1
05:18:47 NobodyCam: no haven't started anything on it
05:19:14 can you?
05:19:34 or more correctly can we assign the bug to you?
05:19:48 NobodyCam: yes i can, may be the same i discussed above i can give it a shot and see how it works
05:20:16 NobodyCam: yeah, i can take it up
05:20:21 but with 9 days left, i might miss it for rc1 :(
05:20:22 rameshg87: I think we agree it's a valid approach
05:20:45 rameshg87: ack. Thank you!
05:20:49 i mean i can get code ready, but there a lot more things to consider :(
05:20:51 can we just move 'nc -l -p 10000' to latest line, and capture &raise install bootloader exception if failed
05:21:09 like node should move to wait-call-back state to wait for dib ramdisk to come back again, and such things
05:21:14 i will give it a try NobodyCam
05:21:42 Haomeng: that really should be the last line (almost)
05:21:50 awesome TY rameshg87 :)
05:22:08 morning, folks. apologies for being late, it's something like 5am here.
05:22:17 ouch :D
05:22:17 morning devananda
05:22:22 morning devananda :)
05:22:24 morning devananda
05:22:24 morning devananda
05:22:30 I was hoping you'd still be asleep
05:22:41 NobodyCam: I mean after "if [ "$IRONIC_BOOT_OPTION" = "local" ]; then" block, maybe I missed something here
05:22:45 hey devananda vgridnev
05:22:45 so that approach may work, but requires coordinating a change with diskimage-builder
05:22:50 devananda: morning:)
05:22:50 devananda: have you read the scroll back?
05:23:12 devananda: yes
05:23:15 and it breaks compatibility with a ramdisk built by previous versions because they will not send the second call-back. right?
05:23:28 devananda: it's only for localboot
05:23:40 devananda: i mean 2nd call back will come only when boot_option == 'local'
05:23:42 which they don't support
05:23:56 devananda: and we will wait for second call back only when boot_option == 'local'
05:23:56 sure. but if I upgrade Ironic from juno to kilo, suddenly I can't use the same deploy ramdisk??? that's ... not right
05:23:56 which what doesn't support?
05:24:11 devananda: ah yes
05:24:31 devananda: but localboot is a new feature which needs a newer ramdisk, and we should recommend a new ramdisk :)
05:24:32 previous versions of dib ramdisks
05:24:40 rameshg87: does it need a newer ramdisk?
05:25:02 devananda: i mean a ramdisk built recently which supports the feature
05:25:11 (this still can break people deploying trunk)
05:25:13 rameshg87: what change in DIB enabled this?
05:25:31 devananda: https://github.com/openstack/diskimage-builder/commit/9880d16d47aad92b625e543ede2590c058bcea14
05:25:40 jan 9
05:25:52 devananda: a second call back after installing boot loader in https://github.com/openstack/diskimage-builder/blob/master/elements/deploy-ironic/init.d/80-deploy-ironic#L73-L137
05:26:20 devananda: oh NobodyCam has correct link :)
05:26:49 I see
05:27:57 so yes, we need to get a fix into dib, then get a release of it done, and recommend that version as a minimum for kilo support
05:29:03 rameshg87: can you update the bug with the plan?
05:29:04 okay, i will try this approach and see how it goes on, will give an update asap
05:29:19 yeah NobodyCam will do that
05:29:26 rameshg87: great thank you :)
05:29:38 ok next bug https://bugs.launchpad.net/ironic/+bug/1405131
05:29:40 Launchpad bug 1405131 in OpenStack Compute (nova) "Ports cannot be mapped to networks" [Low,In progress] - Assigned to Mark Goddard (mgoddard)
05:29:50 low
05:29:52 huh
05:30:31 that's just nova priority
05:30:36 ok
05:30:42 yep
05:30:52 i thought I pasted the wrong bug
05:30:58 so, that looks like a fairly large patch for nova at this point in time
05:31:06 this one seems to be "hardware deployed by ironic that has >1 NIC behaves poorly with networking"
05:32:04 it doesn't look like anyone's actively working on it, though
05:32:06 devananda: are you seeing this in testing bifrost?
05:32:17 NobodyCam: we've only got one NIC, essentially, so no
05:32:22 devananda: >1 physical network, it seems
05:32:24 NobodyCam: and we're not using Nova in Bifrost
05:32:32 jroll: correct
05:32:34 :)
05:32:51 "physical".
05:33:10 do we need someone on this bug?
05:33:18 jroll: i'd wonder if you guys are hitting this though
05:33:18 devananda: it doesn't seem to be behaving poorly
05:33:20 it doesn't break me, but I've observed that this is a thing
05:33:23 devananda: ^
05:33:26 "2014-12-23 05:03:20.877 227052 ERROR ironic.conductor.manager [-] Timeout reached when waiting callback for node 49f8b733-525d-44a2-ae19-596b19aa5d1a"
05:33:30 jroll: how so?
05:33:33 devananda: because we only register the two NICs we put in a bond
05:33:41 indicates deployment is never done
05:33:41 ahh
05:34:31 devananda: we assign two networks but both are just vlans on the bond (I hope that makes sense, I'm tired)
05:34:47 rameshg87: "behaving poorly" == "randomly failing" in my 5am-i-havent-had-coffee voice :)
05:34:55 devananda: oh, okay
05:34:59 :)
05:35:06 but yes, this is certainly a bug, I agree. not sure the best way to fix
05:35:33 jroll: so there's a patch from mark, but a) it needs work, b) it's sizeable and late in the cycle, c) it's all in nova
05:35:52 devananda: 4) and might need more work in ironic too
05:35:53 if we want to fix it, i'd say someone should work with mark or just take it over to clean up
05:35:58 rameshg87: indeed
05:36:07 devananda: d) it changes the driver API, best I can tell
05:36:19 erm :(
05:36:29 see the returns here: https://review.openstack.org/#/c/153230/2/nova/virt/driver.py
05:37:04 oh yah. see https://review.openstack.org/#/c/153230/2/nova/network/neutronv2/api.py,cm
05:37:44 heh, line 326 on the old version of the diff is telling
05:37:47 yea, that's ... going to take some discussion with the Nova folks
05:38:07 indeed
05:38:09 jroll: but bug says get physical network id from ironic ports metadata - "tag the ironic ports with some metadata, such as the provider:physical_network"
05:38:11 devananda: do we need to keep this as high?
05:38:23 jroll: might be queryable with our current api itself from nova driver i think
05:38:46 rameshg87: sure, I just don't see this getting fixed in 9 days
05:38:56 jroll: +1 :)
05:39:00 jroll: +1
05:39:03 rameshg87: I think it is bigger than the driver after reading deva's link
05:39:30 so the tldr here is that ironic does not support >1 NIC with physically different networks and static network config (ie, non-openflow)
05:39:53 ** s/ironic/openstack's bare metal service/
05:40:03 :)
05:40:09 because this is a limitation currently in the interaction between ironic + nova + neutron
05:40:26 yep
05:40:47 fixing this in Nova requires storing additional metadata in Ironic and additional processes done by the user (ie, create the right networks in Neutron first, store the right metadata in Ironic)
05:41:23 sounds like a lot more work than we're going to do right now. AND I'd like us to actually discuss the design a bit
05:41:35 I'm sad, but let's accept and document the limitation for Kilo
05:41:49 we want to overhaul networking stuff in L anyway
05:41:54 yep
05:41:57 :)
05:42:04 ++
05:42:18 who's going to follow up to make sure this is documented appropriately?
05:42:28 BadCub: ^ ?
05:42:37 * rameshg87 heard from dtantsur that discoverd creates ironic ports only for mac which pxe booted discoverd ramdisk, not sure
05:43:11 are you looking for tracking or writing the docs?
05:43:19 rameshg87: that sounds like a bug :(
05:43:25 BadCub: either or both
05:43:32 lol
05:43:48 devananda: might not be the case, that's why i am not sure. i will check with dtantsur
05:43:51 :)
05:44:06 Am happy to stab at it
05:44:25 awesome!
05:44:36 ok how about https://bugs.launchpad.net/ironic/+bug/1432761
05:44:37 Launchpad bug 1432761 in Ironic "Cannot perform get_cleaning_steps on powered off node."
[High,Confirmed]
05:45:21 devananda: i don't think we have /nodes//clean_steps api
05:45:38 devananda: i think we bumped that to L
05:46:25 * rameshg87 gets the link
05:46:36 https://review.openstack.org/#/c/159322/
05:46:48 #link https://review.openstack.org/#/c/159322/
05:47:06 then why is there a bug filed
05:47:08 this is a bug for non-merged code in fact :)
05:47:11 by JoshNang of all people
05:47:16 haha
05:47:20 hahah
05:47:46 BadCub: i think 1432761 need to be untargeted for rc-1
05:47:47 In any rate, the get_cleaning_steps api should allow to take place while node is powered off for oob implementation
05:48:03 kk
05:48:07 wanyen: yes I would think so
05:48:38 NobodyCam: I'm marking it not-a-bug
05:48:53 about 15 minutes left ... stay on bugs or???
05:49:20 removed from review day pad as well
05:50:25 I'd like to ask - are there other bugs which are not targeted to RC1
05:50:29 that folks are hitting?
05:50:47 this is the time to be testing, filing bugs, raising awareness of the issues
05:51:02 devananda: https://launchpad.net/bugs/1431929
05:51:03 Launchpad bug 1431929 in Ironic "Ironic failing to gracefully handle: ipmi error "insufficient resources for session"" [Undecided,In progress] - Assigned to Julia Kreger (juliaashleykreger)
05:51:29 eww
05:51:32 devananda: this one too: https://bugs.launchpad.net/ironic/+bug/1437549
05:51:33 Launchpad bug 1437549 in Ironic "ilo inspection fails when inspect_ports is an integer" [Undecided,In progress] - Assigned to Ramakrishnan G (rameshg87)
05:51:46 devananda: so I have a weird issue with a conductor flapping (according to driver-list, going by the logs everything seems fine), but haven't determined if it's environmental yet
05:52:03 just a heads up, idk if it's a hashring bug or what
05:52:28 * NobodyCam saw jroll's comment about could be config :-p
05:52:33 hehehe
05:52:40 *ducks*
05:52:44 jroll: what do you mean by flapping?
05:53:01 devananda: sometimes it's in the response, sometimes it isn't
05:53:11 don't panic yet, just pointing it out :)
05:53:19 * rameshg87 hits the issue nova delete takes too long time to complete when we do "nova boot" for two instances and tries to delete one of them
05:53:35 don't know if someone else is hitting this
05:53:47 rameshg87: delete is while instance is active or deploying?
05:53:53 wait-call-back
05:53:59 node is in wait-call-back
05:54:07 instance in nova is spawning
05:54:18 *instances in nova are spawning
05:54:25 rameshg87: oh, I think set-provision-state deleted calls might get a 409 during that, but not sure
05:54:47 jroll: response to what?
05:54:52 jroll: i hit this yesterday, will try to see what is happening and then file a bug if i see something odd
05:54:53 oh. gah. yes, i get it
05:55:01 devananda: "ironic driver-list"
05:55:03 jroll: that suggests the heartbeat has timed out
05:55:23 jroll: ie, the periodic task in the conductor hasn't "checked in" recently enough
05:55:23 devananda: right, but it's quicker than that
05:55:39 like I said, I need to dig, it's quite odd
05:55:40 huh
05:55:45 * 5 minute bell *
05:55:47 yea, that's odd
05:55:49 it will go online for like 5 seconds
05:56:59 rameshg87: on https://bugs.launchpad.net/ironic/+bug/1437549 why would inspect_ports be an integer? :/
05:57:01 Launchpad bug 1437549 in Ironic "ilo inspection fails when inspect_ports is an integer" [Undecided,In progress] - Assigned to Ramakrishnan G (rameshg87)
05:57:20 the error could be better but seems like that shouldn't happen much
05:57:40 jroll: because of https://github.com/openstack/python-ironicclient/blob/master/ironicclient/common/utils.py#L96-L101
05:57:45 jroll: of our python-ironicclient
05:58:08 jroll: as mentioned on comment https://review.openstack.org/#/c/168776/1/ironic/tests/drivers/ilo/test_inspect.py L83 (my reply)
05:58:17 rameshg87: and why would inspect_ports be 'none' ?
05:58:36 rameshg87: isn't inspect_ports supposed to be a uuid or mac address or something?
05:58:45 devananda: when operator doesn't want ironic to inspect ports (for some reason)
05:58:56 oh wait
05:58:58 what
05:59:00 whaaaat
05:59:08 jroll: https://github.com/openstack/ironic/blob/master/ironic/drivers/modules/ilo/common.py#L79-L85
05:59:22 * minute warning *
05:59:29 rameshg87: what if I don't use python-ironicclient and I pass some other value like 'true', 'True', or 'zebra' ?
05:59:58 rameshg87: urgh.
06:00:10 can we take this back into channel?
06:00:11 devananda: we will catch it here: https://github.com/openstack/ironic/blob/master/ironic/drivers/modules/ilo/inspect.py#L307-L314
06:00:15 NobodyCam: okay
06:00:16 I don't even know what that means
06:00:18 yeah
06:00:21 awesome thank you
06:00:31 rameshg87: or if I unset that field by calling ironic node-update NNN delete driver_info/inspect_ports
06:00:33 Great meeting everyone ... we are out of time
06:00:49 good night folks
06:00:53 thank you all
06:01:13 #endmeeting
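Note on the inspect_ports discussion (bug 1437549): the following is only a rough sketch of how the integer case might arise and how a driver could tolerate it. It assumes the client JSON-decodes each key=value argument, and that valid values are 'all', 'none', or a comma-separated list of port numbers; neither is confirmed in the log above. `deserialize_value` and `normalize_inspect_ports` are hypothetical names, not Ironic or python-ironicclient functions.

```python
import json


def deserialize_value(raw):
    """Hypothetical stand-in for a client that JSON-decodes CLI values."""
    try:
        return json.loads(raw)  # "1" becomes the integer 1
    except ValueError:
        return raw  # "1,2" or "none" stay plain strings


def normalize_inspect_ports(value):
    """Return 'all', 'none', or a sorted list of port numbers.

    Assumed value format; raises ValueError for anything else.
    """
    # Reject booleans, None and other types outright.
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        raise ValueError("invalid inspect_ports: %r" % (value,))
    text = str(value).strip().lower()  # tolerate a bare integer such as 1
    if text in ("all", "none"):
        return text
    try:
        ports = sorted({int(part) for part in text.split(",")})
    except ValueError:
        raise ValueError("invalid inspect_ports: %r" % (value,))
    if ports[0] < 1:
        raise ValueError("invalid inspect_ports: %r" % (value,))
    return ports
```

With this shape, a value that arrives as the integer 1 and one that arrives as the string "1" are handled the same way, while 'true', 'zebra', or an unset (None) field are rejected with an explicit error rather than failing later in the inspection code.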