15:00:03 #startmeeting ironic
15:00:04 Meeting started Mon Jan 28 15:00:03 2019 UTC and is due to finish in 60 minutes. The chair is dtantsur. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:05 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:07 The meeting name has been set to 'ironic'
15:00:18 Hi all! Who is here for the most ironic meeting in the world? :)
15:00:24 o/
15:00:26 o/
15:00:27 \o
15:00:29 o/
15:00:30 o/
15:00:30 o/
15:00:34 o/
15:00:36 \o
15:00:48 \o
15:00:54 o/
15:01:05 Welcome everyone! Our agenda is as usual here:
15:01:08 #link https://wiki.openstack.org/wiki/Meetings/Ironic
15:01:37 #topic Announcements / Reminder
15:01:50 #info TheJulia is traveling for on-site meetings this week
15:01:58 this was from last week's agenda, but I guess it still holds
15:02:06 o/
15:02:09 o/
15:02:13 #info Successful midcycle last week: https://etherpad.openstack.org/p/ironic-stein-midcycle
15:02:26 please review the notes and do not forget any action items you took :)
15:02:45 anything else to announce or remind of?
15:03:26 #topic Review action items from previous meeting
15:03:33 #link http://eavesdrop.openstack.org/meetings/ironic/2019/ironic.2019-01-14-15.00.html
15:03:42 No action items here, so moving on
15:03:51 #topic Review subteam status reports (capped at ten minutes)
15:04:01 o/
15:04:07 #link https://etherpad.openstack.org/p/IronicWhiteBoard around line 233
15:05:55 hjensas: how is neutron event processing progressing?
15:06:53 dtantsur: progress, but slow. I started looking at the event processor last week. I will continue this week. API patch also needs some more work.
15:07:10 does it make sense to put it on the priority list this week?
15:08:12 mgoddard: was hesitant about merging the API version without also actually doing something with the events. I.e. avoid introducing the changed behaviour without an api version change later.
15:08:21 dtantsur: ^
15:09:06 well, "do nothing" will be supported behavior even afterwards, with the noop network interface
15:09:07 there are things we could do about that, such as not bumping the API version yet, or adding a second API version bump when we support an event
15:09:28 I think we don't change API versions when drivers/interfaces start/stop supporting something
15:11:05 I guess I'm not a hard -1 based on that, just seems a little odd to change behaviour without an API bump
15:11:54 I suppose it's unavoidable sometimes
15:13:07 o/
15:14:58 yeah, we do it quite often
15:15:09 anyway, let's bring it back to the patch
15:15:14 anything on the statuses?
15:15:48 zuulv3 status is in https://etherpad.openstack.org/p/zuulv3_ironicprojects_legacyjobs
15:15:55 yeah, I added the link
15:15:57 almost finished =)
15:16:08 #link https://etherpad.openstack.org/p/zuulv3_ironicprojects_legacyjobs zuulv3 migration status
15:16:31 okay, moving on?
15:17:02 #topic Deciding on priorities for the coming week
15:17:13 let me remove the finished things
15:17:52 hjensas, mgoddard, should we add the neutron events work to the priorities?
15:18:54 hjensas: it needs an update right now, right?
15:19:28 The API change needs an update, to improve the data validation stuff.
15:19:40 hjensas: will you have time to keep updating it this week?
15:19:52 I will work on it this week.
15:19:55 awesome
15:20:43 how's the list looking to everyone?
15:21:42 no objections. :)
15:22:01 looks good.
I'll aim to get deploy templates to a place where it could be on that list next week
15:22:30 or at least some of it
15:22:43 that would be really good
15:22:58 okay, moving to the discussion?
15:23:39 #topic Bikolla
15:23:44 mgoddard: the mic is yours
15:23:55 thanks
15:24:15 I've been working on a little project unimaginatively called bikolla
15:24:32 it uses kolla-ansible to deploy a standalone ironic cluster
15:24:50 \o/
15:24:51 and parts of bifrost to build an image & deploy nodes
15:25:08 it's really just a proof of concept
15:25:49 the idea being that we get good support for standalone ironic in kolla-ansible, and potentially take pressure off of the ironic team with bifrost
15:26:12 yeah, I think the installation bits in bifrost kind of duplicate $many_other_installers
15:26:30 at the moment I have it working in a CentOS VM, using Tenks to create virtual bare metal
15:26:30 and switching to kolla sounds natural to me
15:27:02 so really this is an invitation to anyone who's interested in this, or Tenks, to give it a try
15:27:03 https://github.com/markgoddard/bikolla
15:27:40 #link https://github.com/markgoddard/bikolla prototype of kolla-ansible + bifrost
15:27:47 I think that's all I have to say for now, any questions/comments?
15:27:49 thanks mgoddard, this is curious
15:28:11 a dumb question, does this involve containers?
15:28:20 mgoddard, that looks very interesting
15:28:33 kaifeng: not a dumb question! It uses the kolla containers, deployed via kolla-ansible
15:29:09 if you check the README, there is a dump of 'docker ps'
15:29:22 11 containers :)
15:29:31 woa
15:29:32 oh yeah, i feel kolla is doing containerized deployment, but never took a look at it :)
15:29:40 mgoddard: I may have something to remove one of the containers as the next topic ;)
15:30:03 dtantsur: kill it!
15:30:06 * dtantsur also wonders what iscsid is doing there
15:30:13 poor little rabbit
15:30:15 hehe
15:30:20 iscsid is for iscsi deploys
15:30:26 dtantsur, why hating rabbits so much? :D
15:30:28 yeah, but why a separate container?
15:30:32 why not?
15:30:34 I'm pretty sure we don't have it in tripleo
15:30:42 possibly not
15:30:49 you could run it on the host
15:30:52 well, that's an argument :) but ironic does not start the server on the conductor side, the server is on IPA
15:31:14 isn't the server tgtd?
15:31:23 client uses iscsid?
15:31:46 mgoddard: maybe? still a bit weird to have it as a separate container. I would assume it's for Cinder.
15:32:13 https://docs.openstack.org/kolla-ansible/4.0.0/cinder-guide.html#cinder-lvm2-back-end-with-iscsi
15:32:13 kolla puts everything in a container
15:32:26 it can also be used for cinder
15:32:32 yeah, but I doubt ironic needs iscsid
15:33:01 turns out I'm using the direct deploy interface by default (like bifrost), so won't use it anyway
15:33:02 maybe I don't know something about it
15:33:08 heh
15:33:39 anyways, thanks for listening, happy to help anyone wanting to use it
15:34:04 mgoddard++
15:34:13 #topic RFE review
15:34:47 #link https://storyboard.openstack.org/#!/story/2004874 Support JSON-RPC as an alternative for oslo.messaging
15:35:03 #link https://review.openstack.org/633052 PoC patch
15:35:04 patch 633052 - ironic - [PROTOTYPE] Use JSON-RPC instead of oslo.messaging - 8 patch sets
15:35:23 it actually passed all devstack jobs at one point (I changed it to remove the Flask dependency after that)
15:35:43 I think it's pretty cool for standalone usage like in bikolla/bifrost
15:36:05 I don't suggest we approve the RFE right now, but your comments are welcome :)
15:36:42 do you think it's suitable for non-standalone?
15:37:04 mgoddard: I don't see why not
15:37:17 but the non-standalone case will have rabbitmq anyway (for nova and other services)
15:37:30 unless we can persuade them :)
15:37:53 I was told some of the projects actually use messaging queue features of oslo.msg
15:37:56 avoiding a middle-man seems like a good thing
15:38:09 any downsides?
15:38:21 resilience to conductor restarts?
15:38:39 yeah, a request will get aborted if a conductor fails mid-way
15:39:01 but since oslo.msg only implements "at most once" semantics, I think it can happen with it as well
15:39:02 lots of connections required if I run a million conductors?
15:39:04 hmm, actually that applies to rabbitmq too
15:39:27 mgoddard: if you have a million conductors, each of them will talk to rabbit
15:39:44 rabbitmq got retry ability, how about json-rpc?
15:40:00 kaifeng: it's just HTTP, you can use retries, https, etc
15:40:25 I don't even use a special client in my PoC patch, just plain 'requests' lib
15:40:38 dtantsur: true, although it puts the high fanout in one place (for better or worse)
15:40:51 seems like an interesting PoC
15:41:24 I guess I'll have to provide some kind of authentication for it before we can really land it
15:41:30 and HTTPS support
15:41:32 +1
15:41:59 but early reviews and suggestions are welcome
15:42:19 on the large conductor count question, it might affect connection reuse
15:42:33 would need to be tested
15:42:42 how many conductors do people practically have?
15:43:00 ask oath :)
15:43:05 I don't think a million is anywhere near a realistic estimate :)
15:43:11 with 1700 nodes we have 3 conductors
15:43:17 right
15:43:29 I'd bet a few dozen is enough for every practical case
15:43:32 yeah, not really expecting a million
15:43:39 should expect so
15:44:37 #topic Open discussion
15:44:45 the floor is open
15:44:52 I have a small issue
15:45:03 I’d like some input on https://review.openstack.org/#/c/632774
15:45:05 patch 632774 - ironic - Preserve BIOS boot order upon deployment - 4 patch sets
15:45:27 this is a patch to always preserve the BIOS boot order
15:45:40 to make it configurable, to be precise
15:46:13 while our use case is for IPMI, there were comments on whether this should be applied to other h/w types as well
15:46:29 arne_wiebalck_: I'd call the new option "allow_persistent_boot_device" or something like that.
and maybe have it in driver_info per node in addition to the config.
15:46:49 dtantsur: I think I did now
15:47:20 it looks like you only use the config option: https://review.openstack.org/#/c/632774/4/ironic/drivers/modules/pxe.py
15:47:21 patch 632774 - ironic - Preserve BIOS boot order upon deployment - 4 patch sets
15:47:43 afaik, it's not just persistent, it's the device as the admin manually set on the node
15:48:04 dtantsur: yes, sorry, I mis-read
15:48:16 hmm, yeah, I guess the current name makes sense as well
15:48:27 also I don't think it belongs in [agent] section, since it's not IPA-specific
15:48:47 dtantsur: right kaifeng pointed this out as well
15:48:50 and I wonder if we should handle it on some top level, so that we don't have to put it in every boot interface
15:49:33 dtantsur: so you think it’s should be available across all hardware types?
15:49:56 yeah, I think this behavior should not change if you switch the driver
15:49:57 it shouldn’t harm, just wasn’t sure if that will be useful to anyone but us
15:50:09 dtantsur: that’s alos a point, yes
15:50:13 I think I see similar requests from customers from time to time
15:50:15 s/alos/also/
15:50:38 ok, that would mean updating all h/w types
15:51:41 this is why I wonder if we can avoid doing it by putting this logic somewhere
15:52:01 ah, right
15:52:57 I can have a look if that is possible
15:53:13 otherwise, the change (as done for ipmi) is pretty simple
15:53:22 and easy to understand
15:53:45 yeah
15:53:50 cool, thx!
15:53:50 thanks arne_wiebalck_
15:53:59 anyone has anything else?
15:54:43 I wonder if anyone awares something about inband instance monitoring?
15:55:09 we generally try to avoid touching anything in running instances
15:56:12 this is/was also discussed in the context of a cmdb-like functionality
15:56:25 if it is not possible to get data OOB
15:56:41 well, it's originated from need of customers, just want know if there is any mature design
15:57:48 there is need to collecting stats from bm instances, but it appears to me that the only way is to have a public ip and establish a monitoring server there.
15:57:54 kaifeng: we typically use monasca
15:58:09 allows for collecting control plane and user logs and metrics
15:58:28 users need to run agents on their instances
15:58:45 the nice thing is it's multi-tenant aware
15:58:59 it's quite complex though
15:59:11 this is along the lines of the cmdb discussion, there was sth from rackspace at some point I think
15:59:14 it works for tenant network too?
15:59:26 kaifeng: http://www.stackhpc.com/monasca-comes-to-kolla.html
16:00:16 thanks mgoddard, logged will take a look
16:00:25 kaifeng: you need to make the monasca API available to tenants
16:01:09 kaifeng: ironic can collect stats via IPMI and send them as notifications via rabbitmq
16:01:15 oh, I have no idea of monasca
16:01:32 kaifeng: (that part is separate from monasca)
16:01:32 so it's oob
16:01:51 my experience with monasca i only say one word pain XD
16:01:57 monasca is usually in-band, via an agent. the ironic monitoring is OOB
16:02:10 not sure if is better now
16:02:15 iurygregory_wfh: yeah, it can be difficult
16:02:35 we put a lot of work into deploying it via kolla-ansible, so hopefully a bit easier to deploy now
16:02:38 main problem was memory XD
16:02:39 thanks anyway, I think I need to take a look at the monasca first :)
16:02:44 okay, let's wrap it up
16:02:48 #endmeeting
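
A note on the JSON-RPC RFE discussed above: the transport is plain HTTP, so no dedicated client library is needed. Below is a minimal illustrative sketch of such a call using the 'requests' library; the endpoint URL, port, method name and parameters are hypothetical placeholders and are not taken from the PoC patch 633052.

    # Hypothetical sketch of a JSON-RPC 2.0 call over plain HTTP via 'requests'.
    # The URL, port, method name and parameters are illustrative only, not the
    # actual interface implemented in the ironic PoC patch.
    import requests

    payload = {
        "jsonrpc": "2.0",
        "method": "do_node_deploy",          # hypothetical conductor method
        "params": {"node_id": "node-0001"},  # hypothetical parameters
        "id": 1,
    }

    # Retries, timeouts and HTTPS come from the HTTP layer itself, which is
    # one of the points made in the discussion above.
    response = requests.post("http://conductor.example.com:8089/",
                             json=payload, timeout=60)
    response.raise_for_status()
    print(response.json())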
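
On the BIOS boot order patch discussed above, a minimal sketch of the behaviour being debated: an opt-out via a config option with a per-node driver_info override, checked in one shared place rather than in every boot interface. This is not patch 632774; the option name, config group and helper function are assumptions for illustration only.

    # Illustrative sketch only -- not the actual patch 632774. A shared helper
    # like this would let any boot interface skip forcing the boot device when
    # the operator wants to preserve the existing BIOS boot order.
    from oslo_config import cfg

    CONF = cfg.CONF
    CONF.register_opts(
        [cfg.BoolOpt('preserve_boot_order',       # hypothetical option name
                     default=False,
                     help='Do not change the boot device on deployment.')],
        group='deploy')                           # hypothetical config group


    def should_preserve_boot_order(node):
        """Per-node driver_info overrides the global config default."""
        return node.driver_info.get('preserve_boot_order',
                                    CONF.deploy.preserve_boot_order)

Keeping the check in a common helper matches the "handle it on some top level" suggestion from the log, so individual hardware types would not each need updating.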