17:00:02 #startmeeting ironic
17:00:03 Meeting started Mon Jan 18 17:00:02 2016 UTC and is due to finish in 60 minutes. The chair is jroll. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:04 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:04 ohai.
17:00:07 The meeting name has been set to 'ironic'
17:00:09 o/
17:00:12 o/
17:00:12 o/
17:00:13 o/
17:00:14 as always, the agenda is here:
17:00:14 o/
17:00:16 o/
17:00:17 #link https://wiki.openstack.org/wiki/Meetings/Ironic#Agenda_for_next_meeting
17:00:17 o/
17:00:31 US holiday for some folks today, so attendance may be light
17:00:34 * jroll jumps right in
17:00:40 #topic announcements and reminders
17:00:43 o/
17:00:47 o/
17:01:16 o/
17:01:17 o/
17:01:31 #info it appears that february 16-18 is the best date for our virtual midcycle, so I'm going to roll with that
17:01:33 \o
17:01:43 nice, wfm
17:01:44 more info to come, I'll email on that as well
17:01:57 thx jroll
17:02:05 additionally, mitaka 2 milestone is this week for the rest of openstack
17:02:10 as such, I'd like to cut a release very soon
17:02:24 hoping we can land the manual cleaning work before doing so
17:02:31 so please jump in and help with that :)
17:02:35 o/
17:02:44 also needing reviews on the neutron stuff
17:02:55 last but of course not least, our gate is currently down due to a devstack bug
17:03:09 dtantsur has a patch up to devstack to fix
17:03:32 and in general, our gate is looking pretty sorry, I'd really like folks to help out with working on that where they can
17:03:39 :-(
17:04:05 * devananda has the sad
17:04:05 sambetts, any news on tiny IPA? Last time that PoC patch passed, right?
17:04:05 any other announcements?
17:04:16 devstack patch: https://review.openstack.org/#/c/268960/, but are there any qa cores working today?
17:04:28 unclear at this time
17:04:57 lucasagomes: Yup tinyipa has passed in the gate, I've done some optimisations too, but can't get the RAM requirement below 384MB
17:05:23 sambetts, nice! We can probably talk about tinyipa later in the meeting
17:05:32 Ok :D
17:05:34 yeah, let's discuss that in open discussion
17:05:35 but even if it's 384MB, startup time is very quick
17:05:38 we should def use it
17:06:07 * jroll moves on
17:06:09 #topic subteam status reports
17:06:13 as always, these are here:
17:06:15 #link https://etherpad.openstack.org/p/IronicWhiteBoard
17:06:21 I'll give folks a few to review
17:07:13 lucasagomes, lintan, wrt live upgrades, is there a list of what needs to be done to get live upgrade working?
17:07:38 rloo: the biggest thing for that is getting it tested in the gate
17:07:46 rloo, I gotta look, but i thought it just needed some documentation
17:07:52 jroll: it needs more code.
17:07:52 yeah and gate testing
17:07:56 jlvillal has started on making grenade work, and then we can do a grenade partial
17:08:00 rloo: does it?
17:08:12 jroll: i know of at least one, let me find it. the one that clamps down the rpc version.
17:08:22 mmm
17:08:24 ok
17:08:40 honestly I won't believe it works until the gate says so
17:08:47 https://review.openstack.org/#/c/253355/
17:09:09 i was wondering if there were more patches needed besides that one.
17:09:10 lintan is not around the meeting (due to TZ probably) but I will catch up with him
17:09:19 rloo: right, I'm not sure that's required but rather a nice to have
17:09:20 wow, osprofiler cross project spec landed after about 18 months of work.
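(For context on the version-pinning patch linked above: the general idea is to cap outgoing RPC message versions during a rolling upgrade so that services still running the previous release can decode them. The sketch below uses oslo.messaging to illustrate that idea only; it is not the contents of change 253355, and the option name `pin_rpc_version` and the version strings are made up.)

```python
# Hypothetical sketch of RPC version pinning for rolling upgrades.
# Not the actual patch (https://review.openstack.org/#/c/253355/);
# the option name and version numbers are illustrative only.
import oslo_messaging as messaging
from oslo_config import cfg

CONF = cfg.CONF
CONF.register_opts([
    cfg.StrOpt('pin_rpc_version',
               help='Cap outgoing RPC messages at this version so that '
                    'services still running the previous release can '
                    'decode them during a rolling upgrade.'),
])

LATEST_RPC_VERSION = '1.33'  # whatever the current release speaks


def get_conductor_client(transport):
    # While old conductors are still running, operators set the pin to
    # the previous release's RPC version; once every service has been
    # upgraded, the pin is removed and a restart lifts the cap.
    cap = CONF.pin_rpc_version or LATEST_RPC_VERSION
    target = messaging.Target(topic='ironic.conductor_manager',
                              version=LATEST_RPC_VERSION)
    return messaging.RPCClient(transport, target, version_cap=cap)
```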
/me reads
17:09:29 I could be wrong
17:09:43 * lucasagomes adds to his review list
17:09:57 jroll: you need that, otherwise you would be upgrading to a new version when you have an old version that doesn't support the new version/objects.
17:10:34 jroll: anyway, let's get more info from lintan. i wanted an idea of what was left to do there :)
17:11:17 rloo: yeah, I think you're right, agree
17:11:45 jroll, devananda: wrt node filtering API etc, that is low priority so should i remove it from subteam report until N* cycle?
17:12:12 oh, i guess that pertains to multiple compute hosts too.
17:12:32 rloo: we still want to work on it this cycle, just not until the other priorities are more done
17:12:43 rloo: I think lucas has continued to work on that. I've had to put it on my back burner as I'm focusing on the neutron integration
17:13:10 jroll, devananda: OK
17:13:17 yeah, I can drop it and work on other stuff. But i would like to move it at least a little this cycle
17:13:31 otherwise I think it will be hard to complete on the next cycle alone
17:13:36 lucasagomes: i think we should try to get the properties/capabilities into a separate table.
17:13:53 yeah, that's a small step forward
17:14:00 lucasagomes: I agree. do you think some of the internal refactoring can be done w/o any api changes this cycle?
17:14:38 devananda, yes we can architect the database without exposing it in the API if needed
17:15:03 the API syntax itself for filtering will be a complex one, I would suggest we have a spec only for it
17:15:12 devananda: why don't you want any api change related to that? we have microversions to the rescue! :)
17:15:12 +1
17:15:38 * dtantsur is looking forward to a new (micro)versioning flame war ^_^
17:15:54 * lucasagomes no war!
17:16:01 indeed dtantsur
17:16:09 rloo: because if we can't do internal refactoring *without* API changes, it's harder to make that backwards compatible (iow, downgrade the API version)
17:16:38 rloo: I didn't mean to imply that we should not change the API
17:16:50 but that the db schema changes and API changes should be done separately
17:17:01 which was a point lucasagomes and I were discussing (I think it was) last week
17:17:05 lucasagomes, we're still going for flattened-JSON-in-strings because we have to support SQLite, right?
17:17:17 devananda: ok. we can discuss in the spec itself.
17:17:43 devananda: but if we have a new node.capabilities -- that means a microversion bump right there.
17:17:56 mgould, does mysql have native JSON support? I'm only aware of PostgreSQL's
17:18:08 rloo: indeed it does
17:18:17 dtantsur: not until 5.7, which isn't packaged widely yet
17:18:19 mgould, it's in the air, but I think we tend to use strings instead of native json
17:18:27 due to the database versions needed to support it
17:18:28 rloo, it's not even a node.capabilities, it's a separate table IIUC
17:18:53 or am I wrong?
17:18:56 the capabilities move *can* still be done without api changes
17:19:04 lucasagomes :-(
17:19:07 mgould, the main point now is about having those strings indexed
17:19:12 yep
17:19:14 dtantsur: yeah, a separate table, but represented via node.capabilities. i suppose we could just have the separate table but use it to populate node.properties['capabilities'] instead.
17:19:21 but let's not architect these things right now
17:19:27 mgould, which we may want to do by limiting the size of the key/value pairs
17:19:28 my point was, I think that ^ could be done this cycle, since it sounds like the API changes need more time to bake
17:19:28 to the spec!
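(A rough illustration of the "separate table" idea being deferred to the spec here: one possible shape for a dedicated capabilities table follows. This is not the spec's schema; the table and column names are hypothetical, and the bounded, indexed string columns reflect the point above about keeping filtering indexable without relying on native JSON, which MySQL only gained in 5.7 and SQLite lacks.)

```python
# Hypothetical sketch, not the spec: capabilities moved out of
# node.properties into their own table, with sizes capped so the
# columns stay indexable on MySQL < 5.7 and SQLite.
from sqlalchemy import (Column, ForeignKey, Integer, String,
                        UniqueConstraint)
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class NodeCapability(Base):
    __tablename__ = 'node_capabilities'
    __table_args__ = (
        UniqueConstraint('node_id', 'key', name='uniq_node_capability'),
    )
    id = Column(Integer, primary_key=True)
    node_id = Column(Integer, ForeignKey('nodes.id'), nullable=False,
                     index=True)
    # Bounded strings instead of flattened JSON, so filter queries can
    # hit an index rather than scanning a serialized blob.
    key = Column(String(255), nullable=False, index=True)
    value = Column(String(255), nullable=False)
```

(This kind of change could stay internal at first: the API could keep populating node.properties['capabilities'] from these rows, as suggested above, so no microversion bump is needed until the filtering API itself is specced.)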
17:19:32 jroll: ++
17:19:38 jroll: ++ (sorry for sidetracking it)
17:19:39 but for that we need to move capabilities out of properties (and root device hints)
17:19:42 any other subteam things here?
17:20:24 k, moving on
17:20:39 #topic Node's tags optimization should we do it or not ?
17:20:43 #link https://review.openstack.org/#/c/253748/
17:20:46 lucasagomes: this is you
17:21:04 hi, so there's this update to the nodes tag spec
17:21:26 it's introducing some optimization so we can reuse the tag names across multiple nodes (or even resources, AFAIUI)
17:21:51 but I think this brings some extra complexity when updating/deleting those tags
17:22:16 as I have limited database knowledge I added it to the agenda so more people can take a look at it
17:22:16 right, so I see the delete complexity
17:22:26 however we could just not delete the unused tags
17:22:33 that's a way to do it too
17:22:35 if too many show up, ops can remove them
17:22:56 jroll: ++
17:23:03 jroll: do we have an API for ops to remove them?
17:23:11 I don't think this needs an API
17:23:19 then again, I'm not terribly convinced that we need this optimization at all
17:23:23 the spec mentions deletion, so it would need to be updated. As-is (incl. deletion) I think that we should keep the simplicity
17:23:30 I agree this doesn't need an API, it's DB pruning
17:23:32 how do ops remove them then?
17:23:53 rloo: mysql "delete from ... left join ... where col is null"
17:23:54 rloo, ops can delete them from the nodes (node_tags table)
17:23:54 delete from tags where (something crazy to find unused tags);
17:24:03 but the tag name would still live in the "tags" table
17:24:31 ahhhh.... i don't think we want to encourage ops to issue db calls directly. is that what you're thinking?
17:24:51 in normal operation, we don't
17:24:56 jroll, lucasagomes: so this optimization was my idea. If it's going to stall the work, we can punt / just not do it. the gains in performance will be measurable, but they won't be huge, and on a beefy database server, it'll be negligible
17:24:58 we also need an "upsert" logic to *add* tags, right?
17:24:59 * rloo doesn't recall the details of tags
17:25:01 but db pruning... I don't see an issue
17:25:29 devananda: yeah, I'm not sure this is necessary - if someone raises it as an actual issue they're running into, I'm inclined to refactor
17:25:31 ops do db maintenance periodically for many projects. especially Nova
17:25:35 did the existing schema land yet?
17:25:38 devananda, I don't think it's going to stall the work. If we agree on not deleting that from tags the problem is solved
17:25:42 (I believe it did)
17:26:01 yeah, not deleting also solves the problem with reliable insert
17:26:07 jroll: i could be wrong, but i seem to recall some mention downstream about not wanting ops to do db calls directly. if pruning is needed, then it seems like ironic should provide some way to do that pruning.
17:26:27 rloo: we could trivially add a cmdline option to dbsync
17:26:27 * mgould would be *astonished* if the tags table grew large enough to be a problem
17:26:31 rloo: we could provide a script or something, sure
17:26:35 but I agree with mgould
17:26:37 the node_tags table, sure
17:26:42 devananda, jroll: Ok, a script is ok too.
17:27:14 it is, at most, about 2kb per row
17:27:16 including indexes
17:27:58 ok, so I'd vote to punt on this for now
17:28:05 we can recommend people not to create millions of tags :D
17:28:07 devananda, and maybe 1e4 rows?
17:28:08 does anyone vehemently disagree?
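(To make the pruning suggestion above concrete: if the normalized two-table design had gone ahead, the "delete from ... left join ... where col is null" cleanup could live behind a small dbsync-style helper rather than being run against the database by operators directly. A sketch follows under that assumption; the table and column names are hypothetical, not from the spec, and the DELETE uses MySQL's multi-table join syntax as indicated above.)

```python
# Hypothetical sketch of a dbsync-style pruning helper, assuming a
# shared "tags" table referenced by "node_tags" via tag_id. The design
# was ultimately punted, so none of these names are from the spec.
from sqlalchemy import create_engine, text

# MySQL multi-table DELETE: drop tag names no node references anymore.
PRUNE_UNUSED_TAGS = text("""
    DELETE tags FROM tags
    LEFT JOIN node_tags ON node_tags.tag_id = tags.id
    WHERE node_tags.tag_id IS NULL
""")


def prune_unused_tags(db_url):
    """Delete orphaned tag names; return how many rows were removed."""
    engine = create_engine(db_url)
    with engine.begin() as conn:  # runs inside a transaction
        result = conn.execute(PRUNE_UNUSED_TAGS)
        return result.rowcount
```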
17:28:14 jroll, works for me
17:28:19 jroll, wfm
17:28:28 so if someone had 10s of thousands of nodes, we're good?
17:28:39 punt = do not care about deletion? +1
17:28:56 if they have thousands of tags across 10k nodes, it may become 10ms instead of 1ms
17:29:00 dtantsur: punt on the whole thing
17:29:08 also +1 :)
17:29:18 dtantsur, I think it's punt and leave as-is for now (duplicating tag names across nodes)
17:29:18 (also those are totally random numbers, I could be way off)
17:29:30 what's the current state? A string field on the nodes table containing a list of tags?
17:29:48 jroll: my back of the envelope says at scale, the current table could become ~ 1GB
17:29:58 mgould: many to many table tags:nodes
17:30:06 jroll, gotcha
17:30:10 tag_string:node_id
17:30:12 that is
17:30:15 which is still perfectly reasonable for any modern server to handle
17:30:17 I think
17:30:42 mgould: yeah, sorry, current table is tag_str:node_id, one row per tag/node combo
17:31:01 devananda: can you ask to punt this on the spec then? :)
17:31:09 punt ++
17:31:15 punt ++
17:31:32
17:31:45 doh
17:32:31 alright, shall we move on then?
17:32:42 jroll, ++
17:32:47 #agreed punt on tags optimizations as they probably aren't needed
17:32:54 #topic open discussion
17:32:59 so I wanted to talk about tinyipa
17:33:21 I'm all about it, in that it improves gate times and in theory, success rate
17:33:41 however, would we recommend that people use it in production? and if not, are we comfortable testing with that?
17:34:04 and if we do switch our tests to use tinyipa, do we continue building/publishing the coreos ramdisk? do we gate on that?
17:34:32 we should test *something* that we think can be used in production
17:34:39 I would say we should still encourage people to use another ramdisk in production
17:34:43 I may need to be caught up -- why wouldn't someone use tinyipa in production?
17:35:03 devananda: I'm not sure, that's why I ask
17:35:07 ok :)
17:35:14 since we boot VMs the kernel/base OS we use is not that important in gate, with tinyipa we are testing the ironic-python-agent code
17:35:26 so sounds like a good thing to do (due to the limited resources we have)
17:35:33 that said, I'll always use the coreos one because the debuggability and customization stories are far better
17:35:41 ++
17:36:10 oh did sambetts miss all of that? :(
17:36:21 :( my session just blew up
17:36:24 sambetts: https://gist.github.com/jimrollenhagen/43f3f41a2b6dd45f1cd3
17:36:55 Where do i find more info on tinyipa
17:36:55 ?
17:37:10 https://review.openstack.org/#/c/234902/
17:37:14 zer0c00l: I'd show you but gerrit seems to be down :)
17:37:15 zer0c00l: ^
17:37:17 oh there you go
17:37:18 jroll: debugging is easier because the coreos ipa ramdisk has additional tools in it (like sshd)? or is there another reason?
17:37:36 devananda: yes, that's the primary reason, easier to bake in ssh keys as well
17:37:49 devananda: and easy to customize to add additional tools, like say, tcpdump
17:38:00 especially useful if your agents don't have internet access :)
17:38:06 jroll: I would say the same for the DIB-built IPA image
17:38:31 devananda: I disagree, adding a hardware manager or a set of ssh keys would require writing a new DIB element
17:38:32 devananda, ++ for the key injection thing, there's an element called dynamic-login in DIB
17:38:38 jroll: making it easy to add additional tools is just down to making the build script better right?
17:38:40 jroll, I've changed that
17:38:46 lucasagomes: can it add an arbitrary list of keys?
17:38:51 #link https://github.com/openstack/diskimage-builder/tree/master/elements/dynamic-login
17:38:54 jroll, tinyipa is for bios and uefi?
17:38:58 sambetts: yeah maybe
17:39:02 jroll, well not that advanced, but you can configure a key or password at boot time
17:39:06 Nisha: I'm not sure
17:39:11 lucasagomes: yeah, that doesn't help in production
17:39:11 via kernel cmdline if the image is built with that element (dynamic-login)
17:39:11 jroll: so I think your question is: (a) should we recommend using tinyipa downstream? (b) should we use it in the gate at all? (c) should we use it in the gate exclusively?
17:39:30 I think (b) is obviously yes, since making boot times faster in the gate => more stability
17:39:32 devananda: yes, ty
17:40:07 I also think (a) determines (c)
17:40:27 if we don't recommend it downstream (say, cause we still recommend the coreos ipa image) then we need to also test that in the gate
17:40:30 (b) is a yes for me as well... especially for things like grenade, requiring 384MB vs 1GB is a good improvement
17:40:40 jroll, AIUI it is building the deploy ramdisk using coreos
17:40:50 Nisha: it is not
17:41:04 devananda: I agree - but then the question for (c) is do we keep the jobs running on other ramdisks voting?
17:41:44 Nisha: and right now we're only discussing the gate so UEFI is irrelevant IMO
17:41:58 jroll, ok
17:41:59 i was looking at coreos ramdisk boot times the other day, looks like it can be improved as well
17:42:02 http://fpaste.org/312052/53138904/
17:42:12 jroll: OT: do you want to start testing UEFI in the gate? :)
17:42:22 look at ldconfig and systemd-hwdb-update, they are not really needed
17:42:32 takes almost 20 sec
17:42:32 devananda: of course :)
17:43:10 devananda, ++
17:43:15 jroll, I think we can test other ramdisks in gate if we limit the scope of it (just test deployment, no clean, no rebuild etc)
17:43:24 because tinyipa already tests the ipa code
17:43:39 lucasagomes: yeah, not a bad idea
17:43:45 other ramdisks should be tested to see if they boot correctly (base OS + service files)
17:43:51 if things are getting started correctly and so on
17:44:03 Nisha: it might be worthwhile to add UEFI VM support to devstack and the ssh (or libvirt) driver, so we can exercise those interfaces in the upstream gate
17:44:20 lucasagomes: ++
17:44:41 devananda, noted :)
17:44:51 another important aspect of a ramdisk and kernel is their hardware enablement, does tinycorelinux get updated often?
17:45:05 zer0c00l: good point
17:45:11 zer0c00l, less often than others, I believe... but since in gate we test on VMs
17:45:19 tinyipa seems ok
17:45:23 that probably has the most impact on whether we recommend this downstream -- and the tinyipa image build tools
17:45:28 3rd party CI can use other ramdisks if they want
17:45:39 okay
17:45:57 sounds like we can't remove other images from the gate, because we've just recognized that downstream folks will still have need of them
17:45:57 devananda, but we cannot use coreos for uefi as it doesn't support building a uefi deploy image as of now.
17:46:22 devananda, anyway noted for enhancement :)
17:46:25 btw, re using a smaller bootstrap image and having better extensibility - please check out the proposed ansible-deploy driver https://review.openstack.org/#/c/238183/
17:46:45 jroll: on another topic, also related to testing, have you seen the libvirt driver proposal?
17:46:49 pas-ha: that won't help with the gate though :(
17:46:57 devananda: yes, I'm apathetic
17:47:26 (off-topic) but QEMU is adding IPMI support https://github.com/qemu/qemu/commit/23076bb34b049f5908fefae19266d3f25f55fd3e
17:47:35 devananda: I don't see enough benefit there to spend time on it right now with all these other things lagging behind
17:47:37 I mean
17:47:39 * TheJulia blinks
17:47:41 jroll: in as much as it may make it easier to test things like the management interface for boot device config, gathering "sensor data", etc
17:47:41 RAID has been done forever
17:47:48 but we can't use it
17:47:52 jroll: I am interested in it -- but also don't feel any priority for it
17:47:55 because manual cleaning is still hanging around
17:47:58 yah
17:48:14 and I really wish folks would focus on important things like that, rather than re-writing a testing driver
17:48:25 jroll: preaching to the choir ...
17:48:29 :)
17:49:46 ok so going back to tinyipa;
17:49:50 jroll, RAID CLI and doc are in progress... sorry, couldn't spend much time on upstream
17:49:55 jroll, is there a description of the manual cleaning problem somewhere?
17:50:04 should be usable once manual cleaning is in
17:50:11 I say we get that code in, and enable a non-voting job to use it
17:50:18 and go from there
17:50:20 sound good?
17:50:29 jroll: ++
17:50:31 mgould: it lacks sufficient code review.
17:50:54 mgould: it is one of the subteam reports, take a look at the etherpad
17:50:57 jroll ++
17:51:16 rloo, thanks
17:51:30 sambetts: can you hammer out a project-config patch that makes a non-voting tinyipa job, and I'll do my best to make sure that code gets in soonish?
17:51:53 ++ for -nv tinyipa
17:51:54 jroll: Will do :)
17:51:58 thanks much
17:52:01 or even experimental for now
17:52:04 anything else for open discussion?
17:52:12 lucasagomes: meh, fast-tracking :)
17:52:21 heh just throwing the idea out
17:53:08 ok I'll give people another 60 seconds to throw anything else out
17:53:15 only 6 minutes left anyway
17:54:06 thanks all :)
17:54:09 #endmeeting