17:00:08 <TheJulia> #startmeeting ironic
17:00:09 <openstack> Meeting started Mon Feb 12 17:00:08 2018 UTC and is due to finish in 60 minutes.  The chair is TheJulia. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:10 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:11 <TheJulia> o/
17:00:12 <dtantsur> o/
17:00:12 <openstack> The meeting name has been set to 'ironic'
17:00:17 <etingof> o/
17:00:19 <mjturek> o/
17:00:21 <jroll> \o
17:00:23 <stendulker> o/
17:00:34 <TheJulia> Our meeting agenda can be found on the wiki, as always!
17:00:37 <hshiina> o/
17:00:39 <TheJulia> #link https://wiki.openstack.org/wiki/Meetings/Ironic
17:01:03 <rloo> o/
17:01:05 <mgoddard_> o/
17:01:07 <TheJulia> #topic Announcements / Reminder
17:01:29 <TheJulia> First off, Thank you dtantsur for your hard work!
17:01:35 <jroll> ++
17:01:43 <dtantsur> :)
17:01:55 <rloo> +++
17:01:57 <dtantsur> and congrats TheJulia for taking on this hard work ;)
17:02:00 <etingof> dtantsur++
17:02:05 <jlvillal> o/
17:02:11 <jlvillal> +1 :)
17:02:12 <rloo> +++ congrats!
17:02:16 <TheJulia> I hope to meet everyone's expectations. Please remember that I too am human, as I take on the role of fearless leader.
17:02:22 <stendulker> Congrats Julia !!
17:02:26 <rloo> and THANK YOU TheJulia for volunteering!
17:02:30 <etingof> ++TheJulia
17:02:43 * jlvillal had expectations of an inhuman leader :P
17:02:51 <TheJulia> Anyway, time for the remaining announcements
17:02:59 <TheJulia> jlvillal: only if I get a flying aircraft carrier ;)
17:03:11 <jlvillal> :)
17:03:18 * dtantsur announces that his visa to ireland was finally approved
17:03:36 <TheJulia> The PTG is coming up. Please update the PTG Planning etherpad today and over this week.
17:03:39 <TheJulia> #link https://etherpad.openstack.org/p/ironic-rocky-ptg
17:04:00 * jlvillal thought we had a new core reviewer
17:04:07 <TheJulia> I will break the etherpad up and generate a schedule for Wednesday/Thursday on this coming friday.
17:04:30 <TheJulia> jlvillal: thanks for the reminder!
17:04:46 <TheJulia> #info hshiina is now a member of ironic-core, congrats!
17:04:50 <jroll> \o/
17:04:53 <jlvillal> Nice! :)
17:05:06 <hshiina> thanks, everyone
17:05:13 <stendulker> Congrats hshiina !!
17:05:24 <TheJulia> I think that is about it for announcements, does anyone else have anything to announce?
17:05:30 <rloo> congrats hshiina, welcome to moar reviews! :)
17:05:35 * jlvillal checks time in Japan and sees it is 2:05AM. Yowzer!
17:05:37 <rloo> ptg get together poll: https://doodle.com/poll/d4ff6m9hxg887n9q
17:05:38 <dtantsur> when is queens final?
17:05:56 <TheJulia> #link https://doodle.com/poll/d4ff6m9hxg887n9q
17:06:01 <sambetts> o/
17:06:13 <TheJulia> Please respond to the doodle so we can schedule an evening gathering at the PTG.
17:06:34 <rloo> if you're interested in joining the ptg get-together, please indicate your availability via the poll. i had indicated a feb 16 deadline, but i think i'd like to book a place sooner rather than later, so please sign up by tomorrow. i'll send out email too. thx.
17:06:51 <dtantsur> ++ let's not wait till the last moment with booking
17:06:52 <TheJulia> #info Queens final releases are slated for the week of the 19th-23rd.
17:07:08 <TheJulia> rloo: thanks!
17:07:20 <dtantsur> oh, so one more week to fix all the bugs
17:07:31 * rloo wonders, what bugs? impossible...
17:07:35 <TheJulia> and tests... and gates
17:07:45 <dtantsur> rloo: 3 critical bugs..
17:07:47 <TheJulia> Anyway, we should move on
17:07:58 <dtantsur> move on \o/
17:07:59 <TheJulia> dtantsur: is there a list on the whiteboard?
17:08:01 * dtantsur flies to the woods
17:08:04 <dtantsur> TheJulia: there is
17:08:09 <TheJulia> #topic Review action items from previous meeting
17:09:04 <TheJulia> Looks like our only action item was to review/triage and work on bugs last week.
17:09:56 <TheJulia> I think we can just move on since this week should be the same. Any disagreements?
17:10:21 <dtantsur> +++
17:10:44 <TheJulia> Moving on then!
17:11:12 <TheJulia> #topic Subteam status reports
17:11:16 <TheJulia> #link https://etherpad.openstack.org/p/IronicWhiteBoard
17:11:20 <TheJulia> Starting at Line 202
17:12:40 <rloo> dtantsur: are you going to work on classic driver deprecation this week? (doc needs updating?)
17:12:46 <TheJulia> FYI, for those that don't see it, dtantsur has put the list of critical Queens bugs that need to land and be backported this week under the bugs section.
17:12:49 <dtantsur> rloo: very likely so
17:13:02 <dtantsur> rloo: after solving the API issue we talked about
17:13:10 <rloo> dtantsur: we should ping vendors that need to update their docs wrt classic driver deprecations
17:13:12 * dtantsur finally has a devstack environment to test things
17:13:23 <dtantsur> rloo: well, I did a call on the ML, and at least 2 vendors proposed patches
17:13:34 <rloo> dtantsur: ok good.
17:13:37 * dtantsur hands TheJulia a loooong stick to poke people
17:13:56 <jroll> the critical bugs are at line 215, for anyone else that also can't read today
17:14:04 <rloo> dtantsur: also the TODOs wrt migrating CI to hardware types. who's going to do all those?
17:14:05 <TheJulia> dtantsur: is the end sharpened ?
17:14:19 <dtantsur> TheJulia: just enough to make it annoying
17:14:26 <TheJulia> dtantsur: awesome!
17:14:30 <dtantsur> rloo: I suspect me, unless somebody wants to help
17:14:50 <rloo> just a heads up in case you miss it, traits is almost done but we forgot one thing, it'll need to be backported (L285)
17:15:07 <TheJulia> mgoddard: re: traits, is any further action absolutely required for this release?
17:15:17 * TheJulia looks at 285
17:15:26 <jroll> should just be that one
17:16:08 <TheJulia> https://review.openstack.org/#/c/543461/
17:16:08 <patchbot> patch 543461 - ironic - Validate instance_info.traits against node traits
17:16:20 <rloo> it isn't clear to me what we did wrt routed network support :) L314+
17:16:52 * TheJulia starts a list
17:16:55 <dtantsur> hjensas: mind cleaning it up please ^^^?
17:17:18 <dtantsur> to make it clear what is to be done for queens, what is a follow-up for rocky, etc
17:17:42 <hjensas> o/ Will do.
17:17:55 <rloo> i am deleting the 'split away tempest plugin', meant to do that before this meeting (L423)
17:18:16 <TheJulia> Looks like the ansible docs need a revision, and we should likely try to land/backport
17:18:23 <dtantsur> oh, before I forget: TheJulia we need to document creating queens jobs for the tempest plugin in our releasing documentation
17:18:32 <dtantsur> and, well, create queens jobs :)
17:18:50 <mgoddard_> TheJulia: that patch is borderline required IMO, but would be nice to get it in. I have just found a small issue with the nova virt driver that will need fixing
17:19:06 <rloo> TheJulia: wrt bifrost L431 -- that's the bug we're fixing now, right?
17:19:23 <TheJulia> dtantsur: is it done for this cycle?
17:19:39 <dtantsur> TheJulia: nope, just remembered
17:19:48 <dtantsur> I can do it while we talk
17:20:21 <TheJulia> rloo: no, different but not really a big deal, just needs to be gotten to "soon" since it is all the way off in keystoneauth1
17:20:34 <rloo> TheJulia: :-(
17:20:50 <TheJulia> Yeah :\
17:21:10 <TheJulia> Anyway, I think we've looked most everything over for subteams
17:21:18 <TheJulia> Are we ready to move on?
17:21:52 <rloo> + moving on. do we want to continue with these subteam statuses until the PTG, or put them on hold until after the PTG?
17:22:34 <rloo> although i guess we aren't quite done with queens so maybe continue...
17:22:36 <openstackgerrit> Dmitry Tantsur proposed openstack/ironic-tempest-plugin master: Add jobs for stable/queens  https://review.openstack.org/543555
17:22:36 <dtantsur> TheJulia: ^^^
17:22:38 <TheJulia> We ought to hold off for next week; I think next week will be all discussion
17:22:41 <TheJulia> dtantsur: thanks!
17:23:06 <TheJulia> #topic Priorities for this week
17:23:18 <TheJulia> I'm going to remove the list of things from last week that are struck out
17:24:51 <TheJulia> dtantsur: do you think we should explicitly add the list of bugs to the priority list?
17:25:06 <rloo> TheJulia: dtantsur's doc patch for classic drivers dep: https://review.openstack.org/#/c/537959/
17:25:07 <patchbot> patch 537959 - ironic - Switch contributor documentation to hardware types
17:25:17 <dtantsur> TheJulia: probably won't hurt
17:25:29 <TheJulia> dtantsur: can you perform that copy/paste?
17:25:41 <dtantsur> yep
17:25:44 <openstackgerrit> Dmitry Tantsur proposed openstack/ironic master: releasing docs: document stable jobs for the tempest plugin  https://review.openstack.org/543558
17:26:48 <TheJulia> I think that works order wise
17:26:52 <TheJulia> thoughts/objections?
17:27:21 <dtantsur> LGTM
17:27:32 <TheJulia> Are we ready to move on?
17:27:39 <rloo> i guess it is implicit that the traits patch is a weekly priority?
17:27:54 <dtantsur> it's in "Required backports"
17:28:04 <rloo> dtantsur: hence 'implicit' :)
17:28:11 <dtantsur> :)
17:29:19 <TheJulia> One thing to keep in mind, if anyone becomes aware of something that must be backported, please raise visibility as soon as possible.
17:29:33 <TheJulia> Time to move on :)
17:29:46 <dtantsur> I think https://review.openstack.org/542214 is nice to have
17:29:46 <patchbot> patch 542214 - ironic-inspector - Only set switch_id in local_link_connection if it ...
17:30:02 <TheJulia> I agree
17:30:35 <TheJulia> #topic Bug Triaging for the week
17:30:54 <TheJulia> Same as last week?
17:31:01 <dtantsur> ++
17:31:29 <TheJulia> #action Everyone to triage/review bugs in preparation for final Queens release.
17:31:47 <TheJulia> Moving on!
17:31:52 <TheJulia> #topic Discussion
17:32:07 <TheJulia> First, and only topic it looks like, is what to do with grenade.
17:32:14 <rloo> fix it :)
17:32:27 <TheJulia> The problem is there is no fixing it as-is...
17:32:47 <rloo> so it seems like grenade framework doesn't work for us/rolling upgrades
17:33:14 <rloo> have we had discussion with the grenade folks in the past about it? cuz we're now trying to continue to hack to get it to work for us
17:33:34 <rloo> and we hack something but then don't follow up. and then something breaks :-(
17:33:56 <TheJulia> Yeah, we can't complete the nova upgrade without a nova patch in-place to handle version negotiation either.
17:34:05 <jroll> what's the latest problem with grenade?
17:34:06 <rloo> which doesn't mean that we shouldn't hack something now but ...
17:34:34 <TheJulia> jroll: tl;dr sqlalchemy gets upgraded, and old nova is incompatible with newer sqlalchemy
17:34:40 <TheJulia> *boom*
17:35:02 <rloo> and in our rolling upgrades scenario, we don't upgrade nova, just ironic
17:35:11 * jroll thinks he needs more time than we have to fully understand the thing
17:35:23 <rloo> cuz the order of upgrading is ironic first, then nova
17:35:27 <jroll> are we back to the segfault problem, is my actual question
17:35:32 <TheJulia> And since we don't upgrade ironic-api either, we can't actually upgrade nova
17:35:41 <TheJulia> jroll: We are! :)
17:35:55 <jroll> TheJulia: that seems to me like a critical bug to be fixed, likely by the nova team
17:36:08 * jroll recalls dansmith saying similar, and then the bug disappeared for a while
17:36:12 <TheJulia> A critical bug in Pike?
17:36:19 <jroll> yes
17:36:26 <rloo> but should old s/w be expected to work with new packages?
17:36:49 <jroll> running software should not be expected to segfault after an apt-get upgrade.
17:37:00 <jroll> ever, that's a bug, flat-out.
17:37:06 <rloo> jroll: ok, in that case, it is a nova bug.
17:37:39 <dtantsur> we don't upper-cap sqlalchemy in requirements, so we're expected to work with newer versions
17:38:11 <dansmith> rloo: a segv after a package upgrade would be a bug in some library
17:38:13 <jroll> I'm totally open to a conversation about whether grenade is the right tool for the job here, but it seems to me we've been doing a lot to hack around this bug, and then complaining that grenade makes those hacks hard :)
17:38:16 <TheJulia> So then it is a nova bug
17:38:25 <dansmith> rloo: there should be nothing you can do from python land to segv yourself
17:38:42 * jroll isn't sure it's a nova bug, but it's a bug with how nova interacts with the system, yes
17:38:45 <rloo> dansmith: that is good to know!
17:39:00 <dansmith> jroll: you might even argue that grenade is the right tool since it's poking something that needs fixing :)
17:39:10 <TheJulia> jroll: I think at the same time, we have an unrealistic scenario that we're executing with grenade
17:39:22 <jroll> dansmith: yeah, I should have finished with "so that's a separate conversation" :)
17:39:27 <dansmith> jroll: aye :)
17:39:35 <dansmith> TheJulia: what's unrealistic about it?
17:39:51 <dansmith> aside from the fact that nobody would deploy any of this from devstack anyway
17:39:57 <TheJulia> dansmith: Upgrade everything but nova on the same machine without isolation of underlying shared packages
17:40:08 <TheJulia> which we do because we can't run newer nova with older ironic
17:40:12 <dansmith> TheJulia: I don't think that's unrealistic
17:40:36 <dansmith> it's unideal for sure,
17:40:54 <dansmith> but if the package versions don't prohibit it, I think people would expect that should work
17:41:18 <jroll> would and do, unfortunately
17:41:25 <dansmith> right
17:41:41 <TheJulia> so will we actually get traction for nova to fix it in stable/pike?
17:41:55 <TheJulia> Well, for a fix to land
17:42:09 <dansmith> if there's something nova has to do, then sure, but I can't imagine what that is
17:42:48 <rloo> i think we may need to work with nova to help pinpoint where/how it is failing... seems like if we take ironic out of the picture, nova should still segv?
17:42:48 <TheJulia> If dtantsur's assertion that projects must be compatible with future sqlalchemy versions holds, then there is an extra kwarg that needs to be removed that is currently ignored, I believe
17:43:21 <TheJulia> if the underlying bytecode that the python runtime is using is removed, does it recompile the bytecode?
17:43:33 <jroll> dansmith: I think it's less that nova needs to do something and more us begging for help because we've cumulatively put hundreds of people-hours into trying to track this down and/or fix it :(
17:43:45 <dansmith> jroll: I hear ya
17:44:47 <dtantsur> TheJulia: IIRC yes
17:44:48 <dansmith> TheJulia: that shouldn't cause a segv, otherwise that'd be a python bug
17:44:54 <rloo> doesn't nova have a rolling upgrades/grenade job? I'd think it would have barfed there too?
17:45:06 <dansmith> rloo: several of them yeah
17:45:20 <jroll> rloo: nova-conductor is upgraded in that job (which is the service that is segfaulting)
17:45:40 <jroll> that's why it isn't seen there
17:45:41 <TheJulia> but do those upgrades not actually upgrade nova?
17:45:44 <rloo> jroll: really, i thought in our job, we didn't upgrade nova. let's take it offline
17:46:17 <jroll> rloo: correct, we do not upgrade nova. nova's grenade jobs do. nova-conductor only breaks when not upgraded.
17:46:18 <dansmith> TheJulia: we upgrade pieces of nova in the partial job, but conductor always gets upgraded (i.e. restarted)
17:46:25 <TheJulia> dansmith: ok
17:46:43 <rloo> jroll: ah got it. so can we change their test to not upgrade and see if it barfs?
17:46:57 <dansmith> rloo: no, the whole point of our grenade test is to upgrade conductor :)
17:47:01 <TheJulia> The take-away I'm getting is that we don't try to change the grenade scenario; instead we hunt down and try to fix the root cause of the segfault?
17:47:24 <dansmith> has anyone tried to reproduce this locally?
17:47:39 <jroll> TheJulia: that's my opinion, yes
17:47:42 <rloo> TheJulia: yup, we should fix root cause
17:47:46 <dansmith> because doing that would let us get core files more easily and dig into what was going on when the segv is triggered
17:47:50 <TheJulia> dansmith: I'm fairly sure I did so last week
17:47:52 * jroll has not tried locally
17:48:08 <dansmith> um okay :)
17:48:18 <TheJulia> I wiped the machine out though
17:48:27 <dansmith> TheJulia: does that mean you're fairly sure you reproduced it? or fairly sure you tried?
17:48:39 * rloo wonders why the segv appeared, then disappeared, then appeared again...
17:48:52 <dansmith> rloo: that's usually the nature of such things
17:49:04 <TheJulia> dansmith: I really don't remember at this point :(
17:49:11 <rloo> dansmith: that explains it then!
17:49:22 <TheJulia> I think that I did, but last week was a blur
17:49:24 <dansmith> they can be deterministic, but often not, due to ordering and timing
17:50:00 <rloo> dansmith: so it might be hard to reproduce. great.
17:50:43 <rloo> although zuul is having great luck reproducing
17:51:19 <TheJulia> unless it is broken updated bytecode that is causing the segfault... I seem to remember that the first time we ran into this, we got lsof output from a running system where the conductor was crashing, and it had some sqlalchemy files open but not all of them...
17:51:59 <TheJulia> I'll continue to work on it this week, but with the constraint of not changing the job or scenario
17:52:01 <dansmith> AFAIK, python only opens those files whilst loading them the first time, not continually
17:52:25 <dansmith> and I don't think it ever purges them and has to reload them
17:52:43 <jroll> I would imagine it's more about some shared library underneath getting upgraded
17:52:47 <dansmith> yes
17:52:51 <dansmith> I would bet on it
17:53:06 <TheJulia> The case is the same for shared libraries
17:53:15 <TheJulia> Open file handles don't change
17:53:32 <dansmith> TheJulia: but shared libraries can be opened and closed,
17:53:33 <TheJulia> it would have to be opening a new file/library/thing that is often accessed
17:53:39 <jroll> even when the process is forked?
17:53:51 <dansmith> jroll: yes if it's just a fork
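For context on the bytecode question raised above: CPython only reads a module's .py/.pyc files when the module is first imported; after that the module object lives in the in-memory sys.modules cache and the interpreter does not go back to disk on its own, so an upgrade that replaces or deletes those files does not by itself make the running process reload anything. A minimal sketch of that behaviour, using a throwaway module written to a temp directory purely for illustration (none of this is ironic or nova code):

    import importlib
    import os
    import sys
    import tempfile

    # Write a tiny throwaway module to disk and import it once.
    tmpdir = tempfile.mkdtemp()
    with open(os.path.join(tmpdir, "demo_mod.py"), "w") as f:
        f.write("VALUE = 1\n")
    sys.path.insert(0, tmpdir)

    import demo_mod                 # first import: read from disk, cached in sys.modules

    # Simulate a package upgrade removing/replacing the file under the running process.
    os.remove(os.path.join(tmpdir, "demo_mod.py"))
    importlib.invalidate_caches()

    print(demo_mod.VALUE)           # still 1: served from the in-memory module object
    import demo_mod                 # no disk access; this just hits the sys.modules cache

    # Only if the cached entry is evicted does Python go back to disk (and now fail).
    del sys.modules["demo_mod"]
    try:
        import demo_mod
    except ImportError as exc:
        print("re-import after cache eviction fails:", exc)

So a segfault in an unrestarted conductor is more consistent with a C extension or shared library changing underneath the process, which is where the discussion above lands.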
17:54:11 <TheJulia> We're running out of time today
17:54:19 <TheJulia> rloo: It doesn't look like we're going to get to RFEs at this point
17:54:28 <rloo> no worries
17:54:39 <rloo> i might poke people about them later. or not :)
17:54:50 <TheJulia> rloo: I believe that is reasonable
17:55:33 <TheJulia> #action TheJulia to try and reproduce the fun grenade crash situation locally and use that to try and collect data
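One low-effort way to collect that data while reproducing, sketched here as an assumption rather than anything the team agreed on: the stdlib faulthandler module (Python 3; a PyPI backport provides the same API on Python 2) can dump the Python-level tracebacks of all threads when the process receives SIGSEGV, complementing a proper core file (ulimit -c unlimited plus gdb). The log path and the SIGUSR1 hook below are illustrative only, not existing nova-conductor configuration:

    # Sketch: enable faulthandler early in process startup so a SIGSEGV dumps
    # Python tracebacks for all threads before the process dies.
    import faulthandler
    import signal

    crash_log = open("/tmp/conductor-faulthandler.log", "w")   # illustrative path

    # Installs handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL.
    faulthandler.enable(file=crash_log, all_threads=True)

    # Optional: also dump on demand while the process is still alive,
    # e.g. `kill -USR1 <pid>` from another terminal.
    faulthandler.register(signal.SIGUSR1, file=crash_log, all_threads=True)

The same effect is available without code changes by exporting PYTHONFAULTHANDLER=1 in the service's environment (or running the interpreter with -X faulthandler), which writes the dump to stderr and may be easier when the process is launched by devstack.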
17:56:01 <TheJulia> Since we have only 4 minutes left, does anyone have anything else that needs to be discussed today?
17:56:52 * TheJulia queries crickets as a service
17:57:12 * jroll has nothing
17:57:18 <rloo> crickets
17:57:27 * dtantsur too
17:57:28 <TheJulia> Okay, thanks everyone!
17:57:32 <TheJulia> Have a wonderful week!
17:57:35 <dtantsur> thanks TheJulia and congrats again
17:57:43 <TheJulia> #endmeeting