17:00:02 <jroll> #startmeeting ironic
17:00:03 <openstack> Meeting started Mon Jan 18 17:00:02 2016 UTC and is due to finish in 60 minutes.  The chair is jroll. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:04 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:04 <jroll> ohai.
17:00:07 <openstack> The meeting name has been set to 'ironic'
17:00:09 <devananda> o/
17:00:12 <lucasagomes> o/
17:00:12 <TheJulia> o/
17:00:13 <zhenguo> o/
17:00:14 <jroll> as always, the agenda is here:
17:00:14 <cdearborn> o/
17:00:16 <stendulker> o/
17:00:17 <jroll> #link https://wiki.openstack.org/wiki/Meetings/Ironic#Agenda_for_next_meeting
17:00:17 <rpioso> o/
17:00:31 <jroll> US holiday for some folks today, so attendance may be light
17:00:34 * jroll jumps right in
17:00:40 <jroll> #topic announcements and reminders
17:00:43 <zer0c00l> o/
17:00:47 <Nisha> o/
17:01:16 <rloo> o/
17:01:17 <dtantsur> o/
17:01:31 <jroll> #info it appears that february 16-18 is the best date for our virtual midcycle, so I'm going to roll with that
17:01:33 <mjturek1> \o
17:01:43 <lucasagomes> nice, wfm
17:01:44 <jroll> more info to come, I'll email on that as well
17:01:57 <rloo> thx jroll
17:02:05 <jroll> additionally, mitaka 2 milestone is this week for the rest of openstack
17:02:10 <jroll> as such, I'd like to cut a release very soon
17:02:24 <jroll> hoping we can land the manual cleaning work before doing so
17:02:31 <jroll> so please jump in and help with that :)
17:02:35 <sambetts> o/
17:02:44 <jroll> also needing reviews on the neutron stuff
17:02:55 <jroll> last but of course not least, our gate is currently down due to a devstack bug
17:03:09 <jroll> dtantsur has a patch up to devstack to fix
17:03:32 <jroll> and in general, our gate is looking pretty sorry, I'd really like folks to help out with working on that where they can
17:03:39 <lucasagomes> :-(
17:04:05 * devananda has the sad
17:04:05 <lucasagomes> sambetts, any news on tiny IPA? Last time that PoC patch passed, right?
17:04:05 <jroll> any other announcements?
17:04:16 <rloo> devstack patch: https://review.openstack.org/#/c/268960/, but are there any qa cores working today?
17:04:28 <jroll> unclear at this time
17:04:57 <sambetts> lucasagomes: Yup tinyipa has passed in the gate, I've done some optimisations too, but can't get the RAM requirement below 384MB
17:05:23 <lucasagomes> sambetts, nice! We can probably talk about tinyipa later on the meeting
17:05:32 <sambetts> Ok :D
17:05:34 <jroll> yeah, let's discuss that in open discussion
17:05:35 <lucasagomes> but even if it's 384MB, the startup time is very quick
17:05:38 <lucasagomes> we should def use it
17:06:07 * jroll moves on
17:06:09 <jroll> #topic subteam status reports
17:06:13 <jroll> as always, these are here:
17:06:15 <jroll> #link https://etherpad.openstack.org/p/IronicWhiteBoard
17:06:21 <jroll> I'll give folks a few to review
17:07:13 <rloo> lucasagomes, lintan, wrt live upgrades, is there a list of what needs to be done to get live upgrade working?
17:07:38 <jroll> rloo: the biggest thing for that is getting it testing in the gate
17:07:46 <lucasagomes> rloo, I gotta look, but I thought it just needed some documentation
17:07:52 <rloo> jroll: it needs more code.
17:07:52 <lucasagomes> yeah and gate testing
17:07:56 <jroll> jlvillal has started on making grenade work, and then we can do a grenade partial
17:08:00 <jroll> rloo: does it?
17:08:12 <rloo> jroll: i know of at least one, let me find it. the one that clamps down the rpc version.
17:08:22 <jroll> mmm
17:08:24 <jroll> ok
17:08:40 <jroll> honestly I won't believe it works until the gate says so
17:08:47 <rloo> https://review.openstack.org/#/c/253355/
17:09:09 <rloo> i was wondering if there were more patches needed besides that one.
17:09:10 <lucasagomes> lintan is not around for the meeting (due to TZ probably) but I will catch up with him
17:09:19 <jroll> rloo: right, I'm not sure that's required but rather a nice to have
17:09:20 <devananda> wow, osprofiler cross project spec landed after about 18 months of work. /me reads
17:09:29 <jroll> I could be wrong
17:09:43 * lucasagomes adds to his review list
17:09:57 <rloo> jroll: you need that, otherwise you would be upgrading to a new version while you still have an old version that doesn't support the new version/objects.
17:10:34 <rloo> jroll: anyway, lets get more info from lintan. i wanted an idea of what was left to do there :)
17:11:17 <jroll> rloo: yeah, I think you're right, agree
17:11:45 <rloo> jroll, devananda: wrt node filtering API etc, that is low priority so should i remove it from subteam report until N* cycle?
17:12:12 <rloo> oh, i gues that pertains to multiple compute hosts too.
17:12:32 <jroll> rloo: we still want to work on it this cycle, just not until the other priorities are more done
17:12:43 <devananda> rloo: I think lucas has continued to work on that. I've had to put it on my back burner as I'm focusing on the neutron integration
17:13:10 <rloo> jroll, devananda: OK
17:13:17 <lucasagomes> yeah, I can drop it and work on other stuff. But i would like to move it at least a little this cycle
17:13:31 <lucasagomes> otherwise I think it will be hard to complete on the next cycle alone
17:13:36 <rloo> lucasagomes: i think we should try to get the properties/capabilities into a separate table.
17:13:53 <lucasagomes> yeah, that's a small step forward
17:14:00 <devananda> lucasagomes: I agree. do you think some of the internal refactoring can be done w/o any api changes this cycle?
17:14:38 <lucasagomes> devananda, yes we can architect the database without exposing it in the API if needed
17:15:03 <lucasagomes> the API syntax itself for filtering will be a complex one, I would suggest we have a spec only for it
17:15:12 <rloo> devananda: why don't you want any api change related to that? we have microversions to the rescue! :)
17:15:12 <jroll> +1
17:15:38 * dtantsur is looking forward to a new (micro)versioning flame war ^_^
17:15:54 * lucasagomes no war!
17:16:01 <TheJulia> indeed dtantsur
17:16:09 <devananda> rloo: because if we can't do internal refactoring *without* API changes, it's harder to make that backwards compatible (iow, downgrade the API version)
17:16:38 <devananda> rloo: I didn't mean to imply that we should not change the API
17:16:50 <devananda> but that the db schema changes and API changes should be done separately
17:17:01 <devananda> which was a point lucasagomes and I were discussing (I think it was) last week
17:17:05 <mgould> lucasagomes, we're still going for flattened-JSON-in-strings because we have to support SQLite, right?
17:17:17 <rloo> devananda: ok. we can discuss in the spec itself.
17:17:43 <rloo> devananda: but if we have a  new node.capabilities -- that means a microversion bump right there.
17:17:56 <dtantsur> mgould, does mysql have native JSON support? I'm only aware of PostgreSQL's
17:18:08 <devananda> rloo: indeed it does
17:18:17 <devananda> dtantsur: not until 5.7, which isn't packaged widely yet
17:18:19 <lucasagomes> mgould, it's in the air, but I think we tend to use strings instead of native json
17:18:27 <lucasagomes> due to the database versions needed to support it
17:18:28 <dtantsur> rloo, it's not even a node.capabilities, it's a separate table IIUC
17:18:53 <dtantsur> or am I wrong?
17:18:56 <jroll> the capabilities move *can* still be done without api changes
17:19:04 <mgould> lucasagomes :-(
17:19:07 <lucasagomes> mgould, the main point now is about having those strings indexed
17:19:12 <mgould> yep
17:19:14 <rloo> dtantsur: yeah, a separate table, but represented via node.capabilities. i suppose we could just have the separate table but use it to populate node.properties['capabilities'] instead.
17:19:21 <jroll> but let's not architect these things right now
17:19:27 <lucasagomes> mgould, which we may want to do by limiting the size of the key/value pairs
17:19:28 <devananda> my point was, I think that ^ could be done this cycle, since it sounds like the API changes need more time to bake
17:19:28 <jroll> to the spec!
17:19:32 <devananda> jroll: ++
17:19:38 <rloo> jroll: ++ (sorry for sidetracking it)
17:19:39 <lucasagomes> but for that we need to move capabilities out of properties (and root device hints)
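The flattened-JSON-in-strings approach mgould refers to can be sketched as below (the field names and capability values here are illustrative assumptions, not Ironic's actual schema): the whole properties dict is serialized into a TEXT column, so the database sees only an opaque string and cannot index or filter on individual capabilities, which is the motivation for moving them into a separate table.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Flattened-JSON-in-strings: the dict lives in a TEXT column, which works
# on SQLite and older MySQL, but cannot be indexed or filtered server-side.
cur.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, properties TEXT)")
caps = {"capabilities": {"boot_mode": "uefi", "raid_level": "1"}}
cur.execute("INSERT INTO nodes (id, properties) VALUES (?, ?)",
            (1, json.dumps(caps)))

# Filtering requires deserializing in Python; SQL sees one opaque string.
row = cur.execute("SELECT properties FROM nodes WHERE id = 1").fetchone()
props = json.loads(row[0])
print(props["capabilities"]["boot_mode"])  # -> uefi
```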
17:19:42 <jroll> any other subteam things here?
17:20:24 <jroll> k, moving on
17:20:39 <jroll> #topic Node's tags optimization should we do it or not ?
17:20:43 <jroll> #link https://review.openstack.org/#/c/253748/
17:20:46 <jroll> lucasagomes: this is you
17:21:04 <lucasagomes> hi, so there's this update to the nodes tag spec
17:21:26 <lucasagomes> it's introducing an optimization so we can reuse tag names across multiple nodes (or even resources AFAIUI)
17:21:51 <lucasagomes> but I think this brings some extra complexity when updating/deleting those tags
17:22:16 <lucasagomes> as I have a limited database knowledge I added it to the agenda so more people can take a look at it
17:22:16 <jroll> right, so I see the delete complexity
17:22:26 <jroll> however we could just not delete the unused tags
17:22:33 <lucasagomes> that's a way to do it too
17:22:35 <jroll> if too many show up, ops can remove them
17:22:56 <devananda> jroll: ++
17:23:03 <rloo> jroll: do we have an API for ops to remove them?
17:23:11 <devananda> I dont think this needs an API
17:23:19 <jroll> then again, I'm not terribly convinced that we need this optimization at all
17:23:23 <lucasagomes> the spec mentions deletion, so it would need to be updated. As-is (incl. deletion), I think we should keep the simplicity
17:23:30 <jroll> I agree this doesn't need an API, it's DB pruning
17:23:32 <rloo> how does ops remove them then?
17:23:53 <devananda> rloo: mysql "delete from ... left join ... where col is null"
17:23:54 <lucasagomes> rloo, ops can delete them from the nodes (node_tags table)
17:23:54 <jroll> delete from tags where (something crazy to find unused tags);
17:24:03 <lucasagomes> but the tag name would still live in the "tags" table
17:24:31 <rloo> ahhhh.... i don't think we want to encourage ops to issue db calls directly. is that what you're thinking?
17:24:51 <jroll> in normal operation, we don't
17:24:56 <devananda> jroll, lucasagomes: so this optimization was my idea. If it's going to stall the work, we can punt / just not do it. the gains in performance will be measurable, but they won't be huge, and on a beefy database server, it'll be negligible
17:24:58 <dtantsur> we also need an "upsert" logic to *add* tags, right?
17:24:59 * rloo doesn't recall the details of tags
17:25:01 <jroll> but db pruning... I don't see an issue
17:25:29 <jroll> devananda: yeah, I'm not sure this is necessary - if someone raises it as an actual issue they're running into, I'm inclined to refactor
17:25:31 <devananda> ops do db maintenance periodically for many projects. especially Nova
17:25:35 <jroll> did the existing schema land yet?
17:25:38 <lucasagomes> devananda, I don't think it's going to stall the work. If we agree on not deleting that from tags the problem is solved
17:25:42 <jroll> (I believe it did)
17:26:01 <dtantsur> yeah, not deleting also solves problem with reliable insert
17:26:07 <rloo> jroll: i could be wrong, but i seem to recall some mention downstream about not wanting ops to do db calls directly, if pruning is needed, then it seems like ironic should provide some way to do that pruning.
17:26:27 <devananda> rloo: we could trivially add a cmdline option to dbsync
17:26:27 * mgould would be *astonished* if the tags table grew large enough to be a problem
17:26:31 <jroll> rloo: we could provide a script or something, sure
17:26:35 <devananda> but I agree with mgould
17:26:37 <mgould> the node_tags table, sure
17:26:42 <rloo> devananda, jroll: Ok, a script is ok too.
17:27:14 <devananda> it is, at most, about 2kb per row
17:27:16 <devananda> including indexes
17:27:58 <jroll> ok, so I'd vote to punt on this for now
17:28:05 <dtantsur> we can recommend people not to create millions of tags :D
17:28:07 <mgould> devananda, and maybe 1e4 rows?
17:28:08 <jroll> does anyone vehemently disagree?
17:28:14 <mgould> jroll, works for me
17:28:19 <lucasagomes> jroll, wfm
17:28:28 <rloo> so if someone had 10s of thousands of nodes, we're good?
17:28:39 <dtantsur> punt = do not care about deletion? +1
17:28:56 <jroll> if they have thousands of tags across 10k nodes, it may become 10ms instead of 1ms
17:29:00 <jroll> dtantsur: punt on the whole thing
17:29:08 <dtantsur> also +1 :)
17:29:18 <lucasagomes> dtantsur, I think it's punt on the optimization, leaving it as-is for now (duplicating tag names across nodes)
17:29:18 <jroll> (also those are totally random numbers, I could be way off)
17:29:30 <mgould> what's the current state? A string field on the nodes table containing a list of tags?
17:29:48 <devananda> jroll: my back of the envelope says at scale, the current table could become ~ 1GB
17:29:58 <jroll> mgould: many to many table tags:nodes
17:30:06 <mgould> jroll, gotcha
17:30:10 <jroll> tag_string:node_id
17:30:12 <jroll> that is
17:30:15 <devananda> which is still perfectly reasonable for any modern server to handle
17:30:17 <jroll> I think
17:30:42 <jroll> mgould: yeah, sorry, current table is tag_str:node_id, one row per tag/node combo
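The two layouts under discussion, and the "DB pruning" query ops (or a script) would run against the proposed one, can be sketched as follows (table and column names are illustrative assumptions for this sketch, not Ironic's actual migration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Current layout: one row per node/tag combination, tag stored as a string.
cur.execute("CREATE TABLE node_tags (node_id INTEGER, tag TEXT)")

# Proposed optimization: tag names deduplicated into a shared "tags" table,
# with a join table linking them to nodes.
cur.execute("CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
cur.execute("CREATE TABLE node_tag_links (node_id INTEGER, tag_id INTEGER)")

cur.executemany("INSERT INTO tags (id, name) VALUES (?, ?)",
                [(1, "rack-a"), (2, "ssd"), (3, "orphaned")])
cur.executemany("INSERT INTO node_tag_links VALUES (?, ?)",
                [(101, 1), (101, 2), (102, 1)])

# Pruning: delete tag names that no node references anymore. This is the
# complexity that goes away entirely if the shared table is punted.
cur.execute("DELETE FROM tags WHERE id NOT IN "
            "(SELECT DISTINCT tag_id FROM node_tag_links)")
conn.commit()

remaining = [row[0] for row in cur.execute("SELECT name FROM tags ORDER BY id")]
print(remaining)  # -> ['rack-a', 'ssd']
```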
17:31:01 <jroll> devananda: can you ask to punt this on the spec then? :)
17:31:09 <devananda> punt ++
17:31:15 <mgould> punt ++
17:31:32 <TheJulia> 
17:31:45 <TheJulia> doh
17:32:31 <jroll> alright, shall we move on then?
17:32:42 <lucasagomes> jroll, ++
17:32:47 <jroll> #agreed punt on tags optimizations as they probably aren't needed
17:32:54 <jroll> #topic open discussion
17:32:59 <jroll> so I wanted to talk about tinyipa
17:33:21 <jroll> I'm all about it, in that it improves gate times and in theory, success rate
17:33:41 <jroll> however, would we recommend that people use it in production? and if not, are we comfortable testing with that?
17:34:04 <jroll> and if we do switch our tests to use tinyipa, do we continue building/publishing the coreos ramdisk? do we gate on that?
17:34:32 <rloo> we should test *something* that we think can be used in production
17:34:39 <lucasagomes> I would say we should still incentivize people to use another ramdisk in production
17:34:43 <devananda> I may need to be caught up -- why wouldn't someone use tinyipa in production?
17:35:03 <jroll> devananda: I'm not sure, that's why I ask
17:35:07 <devananda> ok :)
17:35:14 <lucasagomes> since we boot VMs, the kernel/base OS we use is not that important in the gate; with tinyipa we are testing the ironic-python-agent code
17:35:26 <lucasagomes> so it sounds like a good thing to do (due to the limited resources we have)
17:35:33 <jroll> that said, I'll always use the coreos one because debuggability and customization stories are far better
17:35:41 <lucasagomes> ++
17:36:10 <jroll> oh did sambetts miss all of that? :(
17:36:21 <sambetts> :( my session just blew up
17:36:24 <jroll> sambetts: https://gist.github.com/jimrollenhagen/43f3f41a2b6dd45f1cd3
17:36:55 <zer0c00l> Where do i find more info on tinyipa
17:36:55 <zer0c00l> ?
17:37:10 <sambetts> https://review.openstack.org/#/c/234902/
17:37:14 <jroll> zer0c00l: I'd show you but gerrit seems to be down :)
17:37:15 <sambetts> zer0c00l: ^
17:37:17 <jroll> oh there you go
17:37:18 <devananda> jroll: debugging is easier because the coreos ipa ramdisk has additional tools in it (like sshd)? or is there another reason?
17:37:36 <jroll> devananda: yes, that's the primary reason, easier to bake in ssh keys as well
17:37:49 <jroll> devananda: and easy to customize to add additional tools, like say, tcpdump
17:38:00 <jroll> especially useful if your agents don't have internet access :)
17:38:06 <devananda> jroll: I would say the same for the DIB-built IPA image
17:38:31 <jroll> devananda: I disagree, adding a hardware manager or a set of ssh keys would require writing a new DIB element
17:38:32 <lucasagomes> devananda, ++ for the key injection thing, there's an element called dynamic-login in DIB
17:38:38 <sambetts> jroll: making it easy to add additional tools is just down to making the build script better right?
17:38:40 <lucasagomes> jroll, I've changed that
17:38:46 <jroll> lucasagomes: can it add an arbitrary list of keys?
17:38:51 <lucasagomes> #link https://github.com/openstack/diskimage-builder/tree/master/elements/dynamic-login
17:38:54 <Nisha> jroll, tinyipa is for bios and uefi?
17:38:58 <jroll> sambetts: yeah maybe
17:39:02 <lucasagomes> jroll, well not that advanced, but you can configure key or password at boot time
17:39:06 <jroll> Nisha: I'm not sure
17:39:11 <jroll> lucasagomes: yeah, that doesn't help in production
17:39:11 <lucasagomes> via kernel cmdline if the image is built with that element (dynamic-login)
17:39:11 <devananda> jroll: so I think your question is: (a) should we recommend using tinyipa downstream? (b) should we use it in the gate at all? (c) should we use it in the gate exclusively?
17:39:30 <devananda> I think (b) is obviously yes, since making boot times faster in the gate => more stability
17:39:32 <jroll> devananda: yes, ty
17:40:07 <devananda> I also think (a) determines (c)
17:40:27 <devananda> if we don't recommend it downstream (say, cause we still recommend the coreos ipa image) then we need to also test that in the gate
17:40:30 <lucasagomes> (b) is a yes for me as well... especially for things like grenade, requiring 384MB vs 1GB is a good improvement
17:40:40 <Nisha> jroll, AIUI it is building the deploy ramdisk using coreos
17:40:50 <jroll> Nisha: it is not
17:41:04 <jroll> devananda: I agree - but then the question for (c) is do we keep the jobs running on other ramdisks voting?
17:41:44 <jroll> Nisha: and right now we're only discussing the gate so UEFI is irrelevant IMO
17:41:58 <Nisha> jroll, ok
17:41:59 <zer0c00l> i was looking at coreos ramdisk boot times other day, looks like it can be improved as well
17:42:02 <zer0c00l> http://fpaste.org/312052/53138904/
17:42:12 <devananda> jroll: OT: do you want to start testing UEFI in the gate? :)
17:42:22 <zer0c00l> look at ldconfig and systemd-hwdb-update they are not really needed
17:42:32 <zer0c00l> takes almost 20 sec
17:42:32 <jroll> devananda: of course :)
17:43:10 <Nisha> devananda, ++
17:43:15 <lucasagomes> jroll, I think we can test other ramdisks in gate if we limit the scope of it (just test deployment, no clean, no rebuild etc)
17:43:24 <lucasagomes> because tinyipa already tests the IPA code
17:43:39 <jroll> lucasagomes: yeah, not a bad idea
17:43:45 <lucasagomes> other ramdisks should be tested to see if they boot correctly (base OS + service files)
17:43:51 <lucasagomes> if things are getting started correctly and so on
17:44:03 <devananda> Nisha: it might be worthwhile to add UEFI VM support to devstack and the ssh (or libvirt) driver, so we can exercise those interfaces in the upstream gate
17:44:20 <devananda> lucasagomes: ++
17:44:41 <Nisha> devananda, noted :)
17:44:51 <zer0c00l> another important aspect of a ramdisk and kernel is their hardware enablement, does tinycorelinux gets updated often?
17:45:05 <devananda> zer0c00l: good point
17:45:11 <lucasagomes> zer0c00l, less often than others I believe... but since in the gate we test on VMs
17:45:19 <lucasagomes> tinyipa seems ok
17:45:23 <devananda> that probably has the most impact on whether we recommend this downstream -- and the tinyipa image build tools
17:45:28 <lucasagomes> 3rd party CI can use other ramdisks if they want
17:45:39 <zer0c00l> okay
17:45:57 <devananda> sounds like we can't remove other images from the gate, because we've just recognized that downstream folks will still have need of them
17:45:57 <Nisha> devananda, but we cannot use coreos for uefi as it doesn't support building a uefi deploy image as of now.
17:46:22 <Nisha> devananda, anyway noted for enhancement :)
17:46:25 <pas-ha> btw, re using a smaller bootstrap image and having better extensibility - please check out the proposed ansible-deploy driver https://review.openstack.org/#/c/238183/
17:46:45 <devananda> jroll: on another topic, also related to testing, have you seen the libvirt driver proposal?
17:46:49 <jroll> pas-ha: that won't help with the gate though :(
17:46:57 <jroll> devananda: yes, I'm apathetic
17:47:26 <lucasagomes> (off-topic) but QEMU is adding IPMI support https://github.com/qemu/qemu/commit/23076bb34b049f5908fefae19266d3f25f55fd3e
17:47:35 <jroll> devananda: I don't see enough benefit there to spend time on it right now with all these other things lagging behind
17:47:37 <jroll> I mean
17:47:39 * TheJulia blinks
17:47:41 <devananda> jroll: in as much as it may make it easier to test things like the management interface for boot device config, gathering "sensor data", etc
17:47:41 <jroll> RAID has been done forever
17:47:48 <jroll> but we can't use it
17:47:52 <devananda> jroll: I am interested in it -- but also don't feel any priority for it
17:47:55 <jroll> because manual cleaning is still hanging around
17:47:58 <devananda> yah
17:48:14 <jroll> and I really wish folks would focus on important things like that, rather than re-writing a testing driver
17:48:25 <devananda> jroll: preaching to the choir ...
17:48:29 <jroll> :)
17:49:46 <jroll> ok so going back to tinyipa;
17:49:50 <Nisha> jroll, RAID CLI and doc is in progress...sorry couldnt spend much time on upstream
17:49:55 <mgould> jroll, is there a description of the manual cleaning problem somewhere?
17:50:04 <Nisha> should be usable once manual cleaning in in
17:50:11 <jroll> I say we get that code in, and enable a non-voting job to use it
17:50:18 <jroll> and go from there
17:50:20 <jroll> sound good?
17:50:29 <devananda> jroll: ++
17:50:31 <jroll> mgould: it lacks sufficient code review.
17:50:54 <rloo> mgould: it is one of the subteam reports, take a look at the etherpad
17:50:57 <Nisha> jroll ++
17:51:16 <mgould> rloo, thanks
17:51:30 <jroll> sambetts: can you hammer out a project-config patch that makes a non-voting tinyipa job, and I'll do my best to make sure that code gets in soonish?
17:51:53 <lucasagomes> ++ for -nv tinyipa
17:51:54 <sambetts> jroll: Will do :)
17:51:58 <jroll> thanks much
17:52:01 <lucasagomes> or even experimental for now
17:52:04 <jroll> anything else for open discussion?
17:52:12 <jroll> lucasagomes: meh, fast-tracking :)
17:52:21 <lucasagomes> heh just throwing the idea out
17:53:08 <jroll> ok I'll give people another 60 seconds to throw anything else out
17:53:15 <jroll> only 6 minutes left anyway
17:54:06 <jroll> thanks all :)
17:54:09 <jroll> #endmeeting