04:59:52 <NobodyCam> #startmeeting Ironic
04:59:53 <NobodyCam> #chair devananda
04:59:54 <openstack> Meeting started Tue Mar 31 04:59:52 2015 UTC and is due to finish in 60 minutes.  The chair is NobodyCam. Information about MeetBot at http://wiki.debian.org/MeetBot.
04:59:54 <NobodyCam> Welcome everyone to the Ironic meeting.
04:59:55 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
04:59:57 <openstack> The meeting name has been set to 'ironic'
04:59:58 <openstack> Current chairs: NobodyCam devananda
05:00:13 <NobodyCam> Of course the agenda can be found at:
05:00:13 <NobodyCam> #link https://wiki.openstack.org/wiki/Meetings/Ironic#Agenda_for_next_meeting
05:00:22 <NobodyCam> #topic Greetings, roll-call and announcements
05:00:24 <NobodyCam> Roll-call: Who's here for the Ironic Meeting?
05:00:30 <naohirot> o/
05:00:30 <jroll> hiya.
05:00:32 <Haomeng> o/
05:00:33 <mrda> o/
05:00:34 <ramineni> o/
05:00:40 <BadCub> hola
05:00:57 <NobodyCam> hey hey "morning" all :)
05:01:08 <jroll> heh
05:01:19 <NobodyCam> it's late here and way past my bed time so I may be slow
05:01:24 <NobodyCam> sorry in adv.
05:01:42 <NobodyCam> #topic announcements:
05:02:23 <NobodyCam> devananda is in a meeting and will most likely not make it to this one
05:02:43 <BadCub> Specs for L will be opening after RC1
05:03:15 <NobodyCam> Feature, dependency, and string freezes are all in effect
05:03:46 <NobodyCam> our goal to cur rc one is April 9
05:03:53 <NobodyCam> s/cur/cut/
05:04:42 <jroll> apparently dmitry wanted to announce: DevStack plugin for ironic-discoverd for those interested: https://etherpad.openstack.org/p/DiscoverdDevStack
05:04:48 <jroll> #link https://etherpad.openstack.org/p/DiscoverdDevStack
05:05:04 <NobodyCam> Ty jroll, was just typing that
05:05:10 <jroll> hehehe
05:05:13 <jroll> np
05:05:15 <BadCub> review day pad has been updated with bugs and docs for rc1
05:05:16 <NobodyCam> :)
05:05:31 <BadCub> #link https://etherpad.openstack.org/p/IronicReviewDay
05:06:26 <NobodyCam> ty BadCub
05:06:26 <rameshg87> maybe this needs discussion, but it's difficult (and dangerous) to fix https://bugs.launchpad.net/ironic/+bug/1422723 for RC1
05:06:27 <openstack> Launchpad bug 1422723 in Ironic "Ironic sets instance as active even if bootloader installation fails when boot_option is 'local'" [Critical,Confirmed]
05:06:53 <Haomeng> jroll: I have tried the discoverd DevStack plugin; it looks like DHCP is not working. Will check with dmitry, maybe it is an issue with my env
05:07:05 <NobodyCam> rameshg87: let's cover that in the RC1 section
05:07:11 <rameshg87> NobodyCam: okay
05:07:20 <NobodyCam> #topic SubTeam: status report
05:07:40 <NobodyCam> looks like the white board didn't get updated
05:08:35 <NobodyCam> anything that needs covering for subteams?
05:08:43 <naohirot> NobodyCam: I updated irmc section :)
05:08:49 <jroll> I've nothing
05:09:21 <NobodyCam> naohirot: awesome Ty and good news too :)
05:09:35 <NobodyCam> jroll: :)
05:09:40 <naohirot> NobodyCam: :)
05:09:59 <NobodyCam> ok then we can jump into one of the hot topics
05:10:10 <NobodyCam> #topic RC1 status check.
05:10:25 <NobodyCam> looks like the bug list has growen
05:10:33 <NobodyCam> grown even
05:11:07 <BadCub> Blueprints: 1 Implemented; Bugs: 6 Confirmed, 5 Triaged, 7 In Progress, 10 Fix Committed
05:11:13 <BadCub> #link https://launchpad.net/ironic/+milestone/kilo-rc1
05:11:55 <rameshg87> time to discuss the critical one - Ironic sets instance as active even if bootloader installation fails when boot_option is 'local'?
05:12:04 <NobodyCam> rameshg87: look what the #1 critical bug is :0 heheheh
05:12:20 <rameshg87> i filed this bug
05:12:34 <rameshg87> pxe_ipmitool supports both dib ramdisk and ipa ramdisk now
05:12:50 <jroll> rameshg87: can you explain this a bit more? it's not clear to me if the grub installation fails, or if the instance fails to boot due to grub not being present in the image
05:12:52 <rameshg87> when localboot is used - ipa ramdisk can report back the status of bootloader installation, whereas dib ramdisk cannot
05:13:11 <NobodyCam> #link https://bugs.launchpad.net/ironic/+bug/1422723
05:13:12 <openstack> Launchpad bug 1422723 in Ironic "Ironic sets instance as active even if bootloader installation fails when boot_option is 'local'" [Critical,Confirmed]
05:13:27 <rameshg87> since we don't wait for status of boot loader installation from dib ramdisk, we mark the instance as active
05:13:58 <Haomeng> rameshg87: yes, after the 2nd reboot the bare metal node is out of the control of nova and ironic, I think
05:14:04 <rameshg87> code in this block: https://github.com/openstack/ironic/blob/master/ironic/drivers/modules/pxe.py#L556-L571
05:14:19 <jroll> oh, ouch, this is nasty
05:14:35 <rameshg87> Haomeng: yes, node never boots up if boot loader installation failed, and nova says instance is active
05:14:37 <NobodyCam> #link https://github.com/openstack/ironic/blob/master/ironic/drivers/modules/pxe.py#L556-L571
05:15:04 <jroll> Haomeng: this is a deploy failure that we do not catch
05:15:09 <Haomeng> rameshg87: yes, this is a *control gap* I think, but it is difficult to get a connection after the 2nd reboot
05:15:19 <Haomeng> jroll: yes
05:15:24 <jroll> Haomeng: first reboot, this is in deploy image
05:15:31 <rameshg87> now an initial idea which lucasagomes and myself discussed was introduction of a second vendor-passthru for communicating status of bootloader installation
05:15:45 <jroll> Haomeng: https://github.com/openstack/diskimage-builder/blob/master/elements/deploy-ironic/init.d/80-deploy-ironic#L69-138
05:16:00 <Haomeng> jroll: thx, got it
05:16:12 <rameshg87> and ironic waiting till dib ramdisk calls back before turning the instance active
05:16:36 <rameshg87> but that requires changes in both dib and ironic - and might be too much for rc1 in my opinion
05:16:40 <jroll> rameshg87: what are the other options? that seems sane to me (but lots of code)
05:16:43 <Haomeng> Haomeng: so we need more communication during bootloader installation
05:16:44 <rameshg87> unless someone else has better ideas
05:17:07 <Haomeng> rameshg87: I think the basic idea is good one
05:17:16 * jroll thinks we need to ditch the bash ramdisk sooner than later
05:17:23 <rameshg87> jroll: nothing else comes to mind other than a vendor passthru (there is no other way for the dib ramdisk to talk to ironic)
05:17:39 * rameshg87 is +1 with jroll
05:17:41 <NobodyCam> ya I can't think of a better way right now
05:17:47 <jroll> rameshg87: right, I tend to think that's fine
05:18:13 <pshige> ditto
05:18:27 <NobodyCam> rameshg87: is that something you've started on?
05:18:29 <Haomeng> jroll: +1
05:18:47 <rameshg87> NobodyCam: no haven't started anything on it
05:19:14 <NobodyCam> can you?
05:19:34 <NobodyCam> or more correctly, can we assign the bug to you?
05:19:48 <rameshg87> NobodyCam: yes I can; maybe I can give the approach I discussed above a shot and see how it works
05:20:16 <rameshg87> NobodyCam: yeah, i can take it up
05:20:21 <rameshg87> but with 9 days left, i might miss it for rc1 :(
05:20:22 <NobodyCam> rameshg87: I think we agree it's a valid approach
05:20:45 <NobodyCam> rameshg87: ack. Thank you!
05:20:49 <rameshg87> i mean i can get code ready, but there are a lot more things to consider :(
05:20:51 <Haomeng> can we just move 'nc -l -p 10000' to the last line, and capture & raise a bootloader installation exception if it fails?
05:21:09 <rameshg87> like node should move to wait-call-back state to wait for dib ramdisk to come back again, and such things
05:21:14 <rameshg87> i will give it a try NobodyCam
05:21:42 <NobodyCam> Haomeng: that really should be the last line (almost)
05:21:50 <NobodyCam> awesome TY rameshg87 :)
05:22:08 <devananda> morning, folks. apologies for being late, it's something like 5am here.
05:22:17 <rameshg87> ouch :D
05:22:17 <NobodyCam> morning devananda
05:22:22 <jroll> morning devananda :)
05:22:24 <BadCub> morning devananda
05:22:24 <rameshg87> morning devananda
05:22:30 <NobodyCam> I was hoping you'd still be asleep
05:22:41 <Haomeng> NobodyCam: I mean after "if [ "$IRONIC_BOOT_OPTION" = "local" ]; then" block, maybe I missed something here
05:22:45 <mrda> hey devananda vgridnev
05:22:45 <devananda> so that approach may work, but requires coordinating a change with diskimage-builder
05:22:50 <Haomeng> devananda: morning:)
05:22:50 <NobodyCam> devananda: have you read the scroll back?
05:23:12 <rameshg87> devananda: yes
05:23:15 <devananda> and it breaks compatibility with a ramdisk built by previous versions because they will not send the second call-back. right?
05:23:28 <rameshg87> devananda: it's only for localboot
05:23:40 <rameshg87> devananda: i mean 2nd call back will come only when boot_option == 'local'
05:23:42 <NobodyCam> which they don't support
05:23:56 <rameshg87> devananda: and we will wait for second call back only when boot_option == 'local'
05:23:56 <devananda> sure. but if I upgrade Ironic from juno to kilo, suddenly I can't use the same deploy ramdisk??? that's ... not right
05:23:56 <jroll> which what doesn't support?
05:24:11 <rameshg87> devananda: ah yes
05:24:31 <rameshg87> devananda: but localboot is a new feature which needs a newer ramdisk, and we should recommend a new ramdisk :)
05:24:32 <NobodyCam> previous versions of dib ramdisks
05:24:40 <devananda> rameshg87: does it need a newer ramdisk?
05:25:02 <rameshg87> devananda: i mean a ramdisk built recently which supports the feature
05:25:11 <jroll> (this still can break people deploying trunk)
05:25:13 <devananda> rameshg87: what change in DIB enabled this?
05:25:31 <NobodyCam> devananda: https://github.com/openstack/diskimage-builder/commit/9880d16d47aad92b625e543ede2590c058bcea14
05:25:40 <NobodyCam> jan 9
05:25:52 <rameshg87> devananda: a second call back after installing boot loader in https://github.com/openstack/diskimage-builder/blob/master/elements/deploy-ironic/init.d/80-deploy-ironic#L73-L137
05:26:20 <rameshg87> devananda: oh NobodyCam has correct link :)
05:26:49 <devananda> I see
05:27:57 <devananda> so yes, we need to get a fix into dib, then get a release of it done, and recommend that version as a minimum for kilo support
05:29:03 <NobodyCam> rameshg87: can you update the bug with the plan?
05:29:04 <rameshg87> okay, i will try this approach and see how it goes on, will give an update asap
05:29:19 <rameshg87> yeah NobodyCam will do that
05:29:26 <NobodyCam> rameshg87: great thank you :)
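The plan agreed above (when boot_option is 'local', wait for a second callback from the DIB ramdisk reporting bootloader installation status before marking the instance active) can be sketched as a small state transition. This is a minimal sketch, not Ironic code: the state strings only mirror Ironic's provision-state names, and `Node`, `on_deploy_callback`, and `on_bootloader_callback` are hypothetical.

```python
"""Hypothetical sketch: only mark a local-boot node active after the
ramdisk reports the result of bootloader installation."""

# State names mirror Ironic's provision states; treat them as assumptions.
DEPLOYWAIT = "wait call-back"
ACTIVE = "active"
DEPLOYFAIL = "deploy failed"


class Node:
    """Hypothetical stand-in for a node record."""

    def __init__(self, boot_option):
        self.boot_option = boot_option
        self.provision_state = DEPLOYWAIT
        self.last_error = None


def on_deploy_callback(node):
    """First callback: the image has been written to disk."""
    if node.boot_option == "local":
        # Stay in wait-call-back until the ramdisk reports bootloader status.
        node.provision_state = DEPLOYWAIT
    else:
        # Netboot: there is no bootloader installation to wait for.
        node.provision_state = ACTIVE


def on_bootloader_callback(node, success, error=None):
    """Hypothetical second callback, sent only when boot_option == 'local'."""
    if node.provision_state != DEPLOYWAIT or node.boot_option != "local":
        raise RuntimeError("unexpected bootloader callback")
    if success:
        node.provision_state = ACTIVE
    else:
        node.provision_state = DEPLOYFAIL
        node.last_error = error or "bootloader installation failed"
```

With this shape, a failed bootloader install ends in a failed deploy instead of an "active" instance that never boots. A ramdisk built before the change would never send the second callback, so a local-boot node would sit in wait-call-back until the conductor's existing callback timeout fires. That is the compatibility concern devananda raises, and the reason a minimum DIB release was proposed.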
05:29:38 <NobodyCam> ok next bug https://bugs.launchpad.net/ironic/+bug/1405131
05:29:40 <openstack> Launchpad bug 1405131 in OpenStack Compute (nova) "Ports cannot be mapped to networks" [Low,In progress] - Assigned to Mark Goddard (mgoddard)
05:29:50 <NobodyCam> low
05:29:52 <NobodyCam> huh
05:30:31 <jroll> that's just nova priority
05:30:36 <NobodyCam> ok
05:30:42 <BadCub> yep
05:30:52 <NobodyCam> i thought I pasted the wrong bug
05:30:58 <jroll> so, that looks like a fairly large patch for nova at this point in time
05:31:06 <devananda> this one seems to be "hardware deployed by ironic that has >1 NIC behaves poorly with networking"
05:32:04 <devananda> it doesn't look like anyone's actively working on it, though
05:32:06 <NobodyCam> devananda: are you seeing this in testing bifrost?
05:32:17 <devananda> NobodyCam: we've only got one NIC, essentially, so no
05:32:22 <jroll> devananda: >1 physical network, it seems
05:32:24 <devananda> NobodyCam: and we're not using Nova in Bifrost
05:32:32 <devananda> jroll: correct
05:32:34 <NobodyCam> :)
05:32:51 <jroll> "physical".
05:33:10 <NobodyCam> do we need some one on this bug?
05:33:18 <devananda> jroll: i'd wonder if you guys are hitting this though
05:33:18 <rameshg87> devananda: it doesn't seem to be behaving poorly
05:33:20 <jroll> it doesn't break me, but I've observed that this is a thing
05:33:23 <jroll> devananda: ^
05:33:26 <rameshg87> "2014-12-23 05:03:20.877 227052 ERROR ironic.conductor.manager [-] Timeout reached when waiting callback for node 49f8b733-525d-44a2-ae19-596b19aa5d1a"
05:33:30 <devananda> jroll: how so?
05:33:33 <jroll> devananda: because we only register the two NICs we put in a bond
05:33:41 <rameshg87> indicates deployment is never done
05:33:41 <devananda> ahh
05:34:31 <jroll> devananda: we assign two networks but both are just vlans on the bond (I hope that makes sense, I'm tired)
05:34:47 <devananda> rameshg87: "behaving poorly" == "randomly failing" in my 5am-i-havent-had-coffee voice :)
05:34:55 <rameshg87> devananda: oh, okay
05:34:59 <NobodyCam> :)
05:35:06 <jroll> but yes, this is certainly a bug, I agree. not sure the best way to fix
05:35:33 <devananda> jroll: so there's a patch from mark, but a) it needs work, b) it's sizeable and late in the cycle, c) it's all in nova
05:35:52 <rameshg87> devananda: 4) and it might need more work in ironic too
05:35:53 <devananda> if we want to fix it, i'd say someone should work with mark or just take it over to clean up
05:35:58 <devananda> rameshg87: indeed
05:36:07 <jroll> devananda: d) it changes the driver API, best I can tell
05:36:19 <devananda> erm :(
05:36:29 <jroll> see the returns here: https://review.openstack.org/#/c/153230/2/nova/virt/driver.py
05:37:04 <devananda> oh yah. see https://review.openstack.org/#/c/153230/2/nova/network/neutronv2/api.py,cm
05:37:44 <jroll> heh, line 326 on the old version of the diff is telling
05:37:47 <devananda> yea, that's ... going to take some discussion with the Nova folks
05:38:07 <devananda> indeed
05:38:09 <rameshg87> jroll: but bug says get physical network id from ironic ports metadata - "tag the ironic ports with some metadata, such as the provider:physical_network"
05:38:11 <NobodyCam> devananda: do we need to keep this as high?
05:38:23 <rameshg87> jroll: might be queryable with our current api itself from the nova driver, i think
05:38:46 <jroll> rameshg87: sure, I just don't see this getting fixed in 9 days
05:38:56 <rameshg87> jroll: +1 :)
05:39:00 <NobodyCam> jroll: +1
05:39:03 <jroll> rameshg87: I think it is bigger than the driver after reading deva's link
05:39:30 <devananda> so the tldr here is that ironic does not support >1 NIC with physically different networks and static network config (ie, non-openflow)
05:39:53 <devananda> ** s/ironic/openstack's bare metal service/
05:40:03 <rameshg87> :)
05:40:09 <devananda> because this is a limitation currently in the interaction between ironic + nova + neutron
05:40:26 <jroll> yep
05:40:47 <devananda> fixing this in Nova requires storing additional metadata in Ironic and additional processes done by the user (ie, create the right networks in Neutron first, store the right metadata in Ironic)
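The metadata approach described in the bug and here (tag each Ironic port with the physical network it is cabled to, then match it against the Neutron network being requested) comes down to a lookup like the one below. This is a minimal sketch, assuming port and network records are plain dicts carrying a `physical_network` key; neither that key layout nor `map_ports_to_networks` comes from Ironic, Nova, or Neutron.

```python
def map_ports_to_networks(ports, networks):
    """Hypothetical: pair each requested network with an unused port on the
    same physical network.

    ports: list of dicts like {"address": mac, "physical_network": name}
    networks: list of dicts like {"id": net_id, "physical_network": name}
    Returns {net_id: mac}; raises LookupError if a network cannot be placed.
    """
    free = list(ports)
    mapping = {}
    for net in networks:
        for port in free:
            if port.get("physical_network") == net["physical_network"]:
                mapping[net["id"]] = port["address"]
                free.remove(port)  # safe: we break out of the loop right away
                break
        else:
            raise LookupError(
                "no free port on physical network %r" % net["physical_network"]
            )
    return mapping
```

The sketch only works if operators have already created matching networks in Neutron and recorded the tag on every port, which is the extra user-side process devananda points out.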
05:41:23 <devananda> sounds like a lot more work than we're going to do right now. AND I'd like us to actually discuss the design a bit
05:41:35 <devananda> I'm sad, but let's accept and document the limitation for Kilo
05:41:49 <jroll> we want to overhaul networking stuff in L anyway
05:41:54 <devananda> yep
05:41:57 <NobodyCam> :)
05:42:04 <NobodyCam> ++
05:42:18 <devananda> who's going to follow up to make sure this is documented appropriately?
05:42:28 <devananda> BadCub: ^ ?
05:42:37 * rameshg87 heard from dtantsur that discoverd creates ironic ports only for the MAC that PXE booted the discoverd ramdisk, not sure
05:43:11 <BadCub> are you looking for tracking or writing the docs?
05:43:19 <devananda> rameshg87: that sounds like a bug :(
05:43:25 <devananda> BadCub: either or both
05:43:32 <BadCub> lol
05:43:48 <rameshg87> devananda: might not be the case, that's why i am not sure. i will check with dtantsur
05:43:51 <rameshg87> :)
05:44:06 <BadCub> Am happy to stab at it
05:44:25 <NobodyCam> awesome!
05:44:36 <NobodyCam> ok how about https://bugs.launchpad.net/ironic/+bug/1432761
05:44:37 <openstack> Launchpad bug 1432761 in Ironic "Cannot perform get_cleaning_steps on powered off node." [High,Confirmed]
05:45:21 <rameshg87> devananda: i don't think we have /nodes/<node>/clean_steps api
05:45:38 <rameshg87> devananda: i think we bumped that to L
05:46:25 * rameshg87 gets the link
05:46:36 <rameshg87> https://review.openstack.org/#/c/159322/
05:46:48 <NobodyCam> #link https://review.openstack.org/#/c/159322/
05:47:06 <jroll> then why is there a bug filed
05:47:08 <rameshg87> this is a bug for non-merged code in fact :)
05:47:11 <jroll> by JoshNang of all people
05:47:16 <devananda> haha
05:47:20 <NobodyCam> hahah
05:47:46 <NobodyCam> BadCub: i think 1432761 needs to be untargeted for RC1
05:47:47 <wanyen> At any rate, the get_cleaning_steps API should be allowed while the node is powered off, for OOB implementations
05:48:03 <BadCub> kk
05:48:07 <NobodyCam> wanyen: yes I would think so
05:48:38 <devananda> NobodyCam: I'm marking it not-a-bug
05:48:53 <NobodyCam> about 15 minutes left ... stay on bugs or???
05:49:20 <BadCub> removed from review day pad as well
05:50:25 <devananda> I'd like to ask - are there other bugs which are not targeted to RC1
05:50:29 <devananda> that folks are hitting?
05:50:47 <devananda> this is the time to be testing, filing bugs, raising awareness of the issues
05:51:02 <NobodyCam> devananda: https://launchpad.net/bugs/1431929
05:51:03 <openstack> Launchpad bug 1431929 in Ironic "Ironic failing to gracefully handle: ipmi error "insufficient resources for session"" [Undecided,In progress] - Assigned to Julia Kreger (juliaashleykreger)
05:51:29 <devananda> eww
05:51:32 <rameshg87> devananda: this one too: https://bugs.launchpad.net/ironic/+bug/1437549
05:51:33 <openstack> Launchpad bug 1437549 in Ironic "ilo inspection fails when inspect_ports is an integer" [Undecided,In progress] - Assigned to Ramakrishnan G (rameshg87)
05:51:46 <jroll> devananda: so I have a weird issue with a conductor flapping (according to driver-list, going by the logs everything seems fine), but haven't determined if it's environmental yet
05:52:03 <jroll> just a heads up, idk if it's a hashring bug or what
05:52:28 * NobodyCam saw jroll's comment about could be config :-p
05:52:33 <NobodyCam> hehehe
05:52:40 <NobodyCam> *ducks*
05:52:44 <devananda> jroll: what do you mean by flapping?
05:53:01 <jroll> devananda: sometimes it's in the response, sometimes it isn't
05:53:11 <jroll> don't panic yet, just pointing it out :)
05:53:19 * rameshg87 hits an issue where nova delete takes too long to complete when we do "nova boot" for two instances and try to delete one of them
05:53:35 <rameshg87> don't know if someone else is hitting this
05:53:47 <jroll> rameshg87: delete is while instance is active or deploying?
05:53:53 <rameshg87> wait-call-back
05:53:59 <rameshg87> node is in wait-call-back
05:54:07 <rameshg87> instance in nova is spawning
05:54:18 <rameshg87> *instances in nova are spawning
05:54:25 <jroll> rameshg87: oh, I think set-provision-state deleted calls might get a 409 during that, but not sure
05:54:47 <devananda> jroll: response to what?
05:54:52 <rameshg87> jroll: i hit this yesterday, will try to see what is happening and then file a bug if i see something odd
05:54:53 <devananda> oh. gah. yes, i get it
05:55:01 <jroll> devananda: "ironic driver-list"
05:55:03 <devananda> jroll: that suggests the heartbeat has timed out
05:55:23 <devananda> jroll: ie, the periodic task in the conductor hasn't "checked in" recently enough
05:55:23 <jroll> devananda: right, but it's quicker than that
05:55:39 <jroll> like I said, I need to dig, it's quite odd
05:55:40 <devananda> huh
05:55:45 <NobodyCam> * 5 minute bell *
05:55:47 <devananda> yea, that's odd
05:55:49 <jroll> it will go online for like 5 seconds
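devananda's explanation (a conductor drops out of `ironic driver-list` when its periodic heartbeat has not checked in recently enough) amounts to a freshness check like the one below. A minimal sketch, assuming one last-heartbeat timestamp per conductor and a fixed timeout; `is_conductor_alive`, `active_conductors`, and the 60-second default are illustrative assumptions, not values taken from this log.

```python
from datetime import datetime, timedelta, timezone


def is_conductor_alive(last_heartbeat, now, timeout_seconds=60):
    """Return True if the conductor checked in within the timeout window.

    timeout_seconds is an assumed value for illustration only.
    """
    return now - last_heartbeat <= timedelta(seconds=timeout_seconds)


def active_conductors(heartbeats, now, timeout_seconds=60):
    """Filter a {hostname: last_heartbeat} mapping down to live conductors."""
    return sorted(
        host
        for host, seen in heartbeats.items()
        if is_conductor_alive(seen, now, timeout_seconds)
    )
```

Because the check compares a stored timestamp against the current time, anything that skews either value (for example, clocks that disagree between hosts) could make a conductor appear and disappear. This is one possibility to rule out, not a diagnosis of jroll's issue.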
05:56:59 <jroll> rameshg87: on https://bugs.launchpad.net/ironic/+bug/1437549 why would inspect_ports be an integer? :/
05:57:01 <openstack> Launchpad bug 1437549 in Ironic "ilo inspection fails when inspect_ports is an integer" [Undecided,In progress] - Assigned to Ramakrishnan G (rameshg87)
05:57:20 <jroll> the error could be better but seems like that shouldn't happen much
05:57:40 <rameshg87> jroll: because of https://github.com/openstack/python-ironicclient/blob/master/ironicclient/common/utils.py#L96-L101
05:57:45 <rameshg87> jroll: of our python-ironicclient
05:58:08 <rameshg87> jroll: as mentioned on comment https://review.openstack.org/#/c/168776/1/ironic/tests/drivers/ilo/test_inspect.py L83 (my reply)
05:58:17 <devananda> rameshg87: and why would inspect_ports be 'none' ?
05:58:36 <jroll> rameshg87: isn't inspect_ports supposed to be a uuid or mac address or something?
05:58:45 <rameshg87> devananda: when operator doesn't want ironic to inspect ports (for some reason)
05:58:56 <jroll> oh wait
05:58:58 <jroll> what
05:59:00 <jroll> whaaaat
05:59:08 <rameshg87> jroll: https://github.com/openstack/ironic/blob/master/ironic/drivers/modules/ilo/common.py#L79-L85
05:59:22 <NobodyCam> * minute warning *
05:59:29 <devananda> rameshg87: what if I don't use python-ironicclient and I pass some other value like 'true', 'True', or 'zebra' ?
05:59:58 <jroll> rameshg87: urgh.
06:00:10 <NobodyCam> can we take this back into channel?
06:00:11 <rameshg87> devananda: we will catch it here: https://github.com/openstack/ironic/blob/master/ironic/drivers/modules/ilo/inspect.py#L307-L314
06:00:15 <rameshg87> NobodyCam: okay
06:00:16 <jroll> I don't even know what that means
06:00:18 <jroll> yeah
06:00:21 <NobodyCam> awesome thank you
06:00:31 <devananda> rameshg87: or if I unset that field by calling ironic node-update NNN delete driver_info/inspect_ports
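The inspect_ports thread covers three kinds of input: an integer (because, per rameshg87, python-ironicclient decodes values such as `inspect_ports=1` into a number), arbitrary strings such as 'zebra' or 'true', and an unset field. A minimal sketch of defensive parsing follows. `parse_inspect_ports` is hypothetical, and the accepted forms ('all', 'none', or comma-separated port numbers) are assumed from the discussion rather than copied from the iLO driver.

```python
import json


def parse_inspect_ports(value):
    """Hypothetical parser for an inspect_ports-style driver_info value.

    Assumed accepted forms: 'all', 'none', or comma-separated port numbers.
    Returns 'all', 'none', or a list of ints; raises ValueError otherwise.
    """
    if value is None:
        # Unset field; treating it as an error is an assumption of this sketch.
        raise ValueError("inspect_ports is not set")
    # A client that JSON-decodes CLI values turns "1" into the int 1,
    # so normalise everything to a string before validating.
    text = str(value).strip().lower()
    if text in ("all", "none"):
        return text
    ports = []
    for item in text.split(","):
        item = item.strip()
        if not item.isdigit():
            raise ValueError("invalid inspect_ports value: %r" % (value,))
        ports.append(int(item))
    return ports


# Why an integer can arrive at all: JSON-decoding the CLI string "1"
# yields the int 1, not the string "1".
decoded = json.loads("1")
```

Normalising to a string first means the integer case and the string case go through the same validation, so values like 'zebra', 'true', or an unset field are rejected explicitly instead of failing somewhere deeper in inspection.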
06:00:33 <NobodyCam> Great meeting everyone ... we are out of time
06:00:49 <BadCub> good night folks
06:00:53 <NobodyCam> thank you all
06:01:13 <NobodyCam> #endmeeting