10:59:59 #startmeeting scientific-sig
11:00:00 Meeting started Wed Apr 10 10:59:59 2019 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:01 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:03 The meeting name has been set to 'scientific_sig'
11:00:13 Greetings
11:00:25 o/
11:00:26 #link Agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_April_10th_2019
11:00:37 Hey arne_wiebalck, thanks for joining
11:00:45 Hello
11:00:53 Hi priteau
11:01:59 o/
11:02:01 \o
11:02:10 Hi dtantsur mgoddard - welcome
11:02:22 :)
11:02:50 hi, thanks
11:02:58 OK, let's kick off with the agenda
11:03:12 #topic Ironic and external DCIM
11:03:35 Thanks Arne for bringing the discussion this way.
11:04:01 That's a long-standing issue here at CERN.
11:04:07 Can you summarise what your needs are?
11:04:41 Sure.
11:05:02 We have compiled a doc with all details: https://docs.google.com/document/d/144g8E_fzGD4WZzMvswkeowzcILL4hxg4QDS-46KYCcQ/edit?usp=sharing
11:05:14 In short, we need to keep track of the servers in our DC.
11:05:31 Life cycle, but also for campaigns.
11:05:46 Think f/w upgrades.
11:06:07 We have a DB for this, but keeping it up to date is difficult.
11:06:35 ironic provides some means to gather data about hosts, so we were wondering whether this could be leveraged.
11:07:15 is it enough to have a link in ironic to some external system? or do you mean inspection integration?
11:07:17 For now, we were wondering how other sites deal with this issue and what tools they use.
11:07:34 the issue with inspection is that it happens only once
11:07:45 It's interesting because there's a clear precedent for this problem.
11:07:52 ironic would interface with some external system
11:07:59 to store the data
11:08:14 In other sites I'm aware of, there's openDCIM (used by the DC guys) and XCAT for the infrastructure provisioning (non-OpenStack)
11:08:15 main idea here is that others have already thought about a schema, for instance
11:08:31 Now those sites also have Ironic to deal with as a third source of truth.
11:08:53 yeah, different sources of truth are also an issue
11:09:42 I have a list of 8 or so tools that we will have a look at.
11:10:06 is monitoring baremetal node health of interest in this context, or do you do this using a different tool?
11:10:14 The main idea being to use these as potential backends to which ironic (or some additional tool) could send data.
11:10:28 monitoring is different, I'd say
11:10:46 In this model, the nodes and ports would no longer be stored in a local database? Do you think everything could be delegated?
11:10:48 this is more like #cpus, disks, ... f/w version, ...
11:11:21 oneswig: "local" you mean ironic's DB?
11:11:29 eys
11:11:31 yes
11:11:34 no
11:11:50 I think this would be more like the sensor data ironic is gathering and sending to ceilometer
11:11:56 it sounds like you need to post results of introspection to some external database for reconciliation?
11:12:05 and maybe do it periodically out-of-band?
11:12:08 dtantsur: yes
11:12:14 yes to both?
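For reference, the periodic sensor data collection mentioned at 11:11:50 is driven by conductor options in ironic.conf; a minimal sketch follows, assuming a recent release (option names should be checked against the deployed version's configuration reference):

    [conductor]
    # Collect sensor data from the BMC and emit it on the notification bus
    # (consumed by e.g. ceilometer)
    send_sensor_data = true
    # Collection interval in seconds
    send_sensor_data_interval = 600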
11:12:14 dtantsur: that sums it up
11:12:19 dtantsur: yes
11:12:38 fun fact: we've just discussed downstream a possibility of periodic out-of-band mini-inspection
11:12:40 there is a fine line here between monitoring and inspection
11:12:50 periodically, out-of-band, external system
11:13:07 now, with ironic-inspector it's easy: we have processing hooks that can post data anywhere
11:13:18 dtantsur: yes
11:13:19 we'll probably want the results of out of band 'inspection' to also be pushed to a monitoring system
11:13:46 what we don't have is 1. out-of-band inspection hooks to post data anywhere, 2. a way to run out-of-band inspection periodically
11:13:54 it's just a question of which parts of the data they're interested in, and how frequently
11:14:06 this sounds like two pretty orthogonal RFEs to me, I'm fine with both personally
11:14:07 mgoddard: what kind of monitoring?
11:14:15 dtantsur: concerning 2. what about the sensor data collection, isn't that similar?
11:14:31 arne_wiebalck: it's similar, except that inventory is not quite sensor data
11:14:39 oneswig: the equivalent of IPMI sensor data collection, I assume something similar exists for redfish
11:14:57 the question is whether we want to represent inventory as part of sensor data or separately
11:14:58 power and environmentals via BMC channels?
11:15:05 dtantsur: agreed, I was more thinking about the machinery/framework
11:15:36 dtantsur: the sensor data contains what?
11:15:44 I've heard downstream that people want to collect sensor data much more (an order of magnitude?) often than inventory
11:15:54 arne_wiebalck: thermal data, etc
11:15:58 I can try finding an example
11:15:58 I am also very interested in baremetal health monitoring via out-of-band but agreed that's not necessarily something for a DCIM system
11:16:03 the line between these two things might be more of a site policy question than anything?
11:16:30 It does sound separable, if not completely separate.
11:16:36 agreed
11:16:45 yeah, it may be a similar mechanism, but logically separate
11:16:57 Hardware inventory is a piece that could be focused on first
11:16:58 DCIM doesn't need to send notifications, it needs to answer queries
11:17:12 Although mgoddard we certainly know people who want the monitoring
11:17:13 DCIM may be more than what I had in mind initially (it'd add switches, racks, cables, ...)
11:17:14 monitoring is sort of the opposite in that sense
11:18:03 arne_wiebalck: looking at our example from IPMI I see power, voltage, POST errors, temperature readings, etc
11:18:06 I guess there is also a thin line
11:18:08 this is for sensors ^^
11:18:22 dtantsur: thanks
11:18:43 dtantsur: that is handled by our monitoring (and hence separate for us :-D)
11:19:07 well, yeah. we need it from ironic :) but for IPMI it's been an existing feature for some time.
11:19:58 I'm putting some hope into redfish for OOB inspection
11:20:03 I can help with the ironic side of this feature (well, two related features), we have interest in it
11:20:19 dtantsur: excellent! :)
11:20:25 dtantsur: with inspection data & extra data, is that posted to Swift by Ironic? Could that be made modular?
11:20:39 oneswig: swift or a db
11:20:42 oneswig: it's only for ironic-inspector (in-band inspection)
11:20:51 and it is pluggable since Stein
11:21:02 we also have pluggable hooks that run on inspection
11:21:11 but again, it's all in-band. you cannot run it periodically on a live instance.
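For reference, a minimal sketch of the kind of ironic-inspector processing hook discussed above, forwarding the collected inventory to an external DCIM. The DCIM endpoint and payload shape are hypothetical, and the ProcessingHook interface should be checked against the installed ironic-inspector release:

    import requests

    from ironic_inspector.plugins import base

    # Hypothetical DCIM inventory endpoint; replace with the real API.
    DCIM_URL = 'https://dcim.example.org/api/v1/nodes'


    class DcimExportHook(base.ProcessingHook):
        """Forward the inspected hardware inventory to an external DCIM."""

        def before_update(self, introspection_data, node_info, **kwargs):
            inventory = introspection_data.get('inventory', {})
            payload = {
                'node_uuid': node_info.uuid,
                'system_vendor': inventory.get('system_vendor'),
                'cpu': inventory.get('cpu'),
                'memory': inventory.get('memory'),
                'disks': inventory.get('disks'),
            }
            try:
                # Best effort: inspection should not fail if the DCIM is down.
                requests.post(DCIM_URL, json=payload, timeout=10)
            except requests.RequestException:
                pass

Such a hook would be registered as an ironic_inspector.hooks.processing entry point and added to the processing_hooks option in the [processing] section of inspector.conf; as noted above, this covers in-band inspection only, not periodic out-of-band collection.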
11:21:23 (well, unless you write an in-instance agent)
11:21:33 the alternative would be to require a h/w daemon on each instance
11:21:40 dtantsur: :-D
11:22:03 right, this is something that we (as in ironic) are not very fond of doing, to be honest
11:22:16 are you aware of anyone using this (an agent inside the baremetal) in the wild?
11:22:27 we've been trying to avoid touching running instances. doing that would require us to talk to neutron to understand networking, etc.
11:22:31 I think Rackspace had sth like this.
11:22:35 sounds crazy but maybe it's not that crazy
11:22:53 it's not crazy at all, it just doesn't seem to fit in ironic very well
11:23:10 well, that would be an ironic-independent approach
11:23:11 you don't need ironic's help to do that. you can make such a daemon just post data wherever you need
11:23:14 What inventory data can be extracted from out-of-band inspection, using generic Redfish objects?
11:23:45 oneswig: essentially everything in theory
11:23:46 having an optional agent gathering utilisation metrics and health information would be pretty cool
11:23:55 the practice depends on a specific vendor, unfortunately
11:23:58 for now running on baremetal means bye-bye metrics
11:24:00 redfish is a very loose standard
11:24:21 or - could this be done out of band as well? I guess that'd be vendor specific
11:24:28 oneswig: dtantsur : but in reality probably very different things
11:24:36 of course :(
11:24:37 dtantsur: I think of it like SNMP, in that there is a common protocol but with useful data in vendor extensions.
11:24:57 arne_wiebalck, oneswig, talk to etingof on #openstack-ironic, he's our local Redfish guru :)
11:25:13 oneswig: it's not THAT bad usually
11:25:21 :)
11:25:22 it's just that literally everything in Redfish is optional
11:25:25 literally
11:25:35 dtantsur: thanks :-)
11:25:35 you can have a compliant implementation with zero features
11:25:57 but vendors do catch up, so it's rarely that bad in practice
11:26:09 arne_wiebalck: you have a shortlist already, what's on it?
11:26:26 a short list for things to read out, you mean?
11:26:57 external dcims
11:27:05 ah
11:27:12 * arne_wiebalck getting the list
11:27:22 check e.g. this DMTF mockup of a Redfish server: https://redfish.dmtf.org/redfish/mockups/v1/863
11:27:28 Seems like a reference implementation is readily achievable, which is great
11:27:48 (go into the available System, check e.g. Processors)
11:28:06 #link https://redfish.dmtf.org/redfish/mockups/v1/863 a redfish mockup to play with
11:28:15 i-doit, CMDBuild, SnipeIT, SysAid, Spiceworks, GLPI, NetBox, xcat
11:28:16 cool!
11:28:35 completely unfiltered ^
11:28:57 oneswig: you mentioned openDCIM earlier. Is that worth adding?
11:29:31 I think so but I don't know how it works and what it can do for supporting integration
11:29:40 we'll start by having a look at these and see which of these make it onto a short list by criteria such as licensing, maintenance status, storage needs, ...
11:29:57 ... APIs ...
11:30:09 well, that's the plan for now
11:30:27 openDCIM is OK. Simple to set up
11:30:43 hearnsj: added to my list, thx
11:30:45 I'm not an expert with commercial DCIM suites though
11:31:07 Hello hearnsj, welcome!
11:31:08 I have worked with openDCIM, ping me offline: hearnsj@gmail.com
11:31:22 hearnsj: thx!
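For reference, a minimal sketch of walking the DMTF Redfish mockup linked above with plain HTTP, assuming the mockup serves the standard /redfish/v1 resource tree as JSON under its base URL; against real hardware the same paths would be fetched from the BMC with authentication (e.g. via the sushy library):

    import requests

    # Base URL of the DMTF mockup linked above (assumption: it serves the
    # standard Redfish tree as plain JSON under this prefix).
    BASE = 'https://redfish.dmtf.org/redfish/mockups/v1/863'

    def get(path):
        """Fetch a Redfish resource by its @odata.id path."""
        return requests.get(BASE + path, timeout=10).json()

    systems = get('/redfish/v1/Systems')
    for member in systems.get('Members', []):
        system = get(member['@odata.id'])
        print('System:', system.get('Model'), system.get('SerialNumber'))
        # Follow the Processors link and print per-CPU inventory.
        processors = get(system['Processors']['@odata.id'])
        for link in processors.get('Members', []):
            cpu = get(link['@odata.id'])
            print('  CPU:', cpu.get('Model'), cpu.get('TotalCores'), 'cores')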
11:32:17 APIs are probably a big differentiator, given the implementors may not assume a machine-to-machine interface
11:32:34 no API - no integration :)
11:32:53 yeah, probably a good thing to check early on
11:33:24 I don't know how much useful lifecycle data XCat stores - that's not really its mission AFAIK.
11:33:42 Way back there was a project to integrate it with Ironic as a power driver...
11:34:39 waaaay back :)
11:34:43 I think that died a horrible death, didn't it?
11:35:14 I never quite got the concept - was it more of an image-deployment driver?
11:35:32 power-wise, not sure what xcat would add to the picture?
11:35:42 I barely remember it
11:36:02 I'm not sure either...
11:37:04 Were there other candidates to suggest for arne_wiebalck ?
11:37:21 if you have suggestions or input, please get in touch :)
11:38:22 Quick response from UCam - "Hi Stig, we've experimented with both of those, but still use Xcat mostly for a pseudo DCIM. But we're playing with netbox from DO now as a replacement to make Xcat less central, and also I know Sanger use Infoblox as their DHCP/DNS/IPAM solution."
11:38:55 Infoblox.. that's widely used in shops around town here
11:39:45 some of these tools have quite an emphasis on the networking ...
11:40:04 .. not necessarily what we're looking for in our use case.
11:40:28 OK, we should conclude this topic
11:40:32 Add Infoblox IPAM agent image. The agent will listen for neutron and nova events and automatically manage DNS entries within an Infoblox appliance. The agent will be started when ‘infoblox’ is selected as the neutron_ipam_driver.
11:40:37 any action items from the ironic side?
11:40:39 that's right
11:41:04 arne_wiebalck: will you follow up with your research?
11:41:13 definitely :)
11:41:27 dtantsur: I would be very much interested in the out-of-band monitoring angle
11:41:32 dtantsur: sounds like your hard work is already done :-) Thanks for that!
11:41:56 arne_wiebalck: is there anything planned for forum or PTG on this?
11:42:05 no
11:42:05 maybe we should sync off-channel re next ironic action items? or indeed at the forum/PTG?
11:42:14 there was a session two PTGs ago
11:42:17 +1 for PTG
11:42:32 I'll put this on the agenda again next week in the US timezone and try to gather a few more data points.
11:42:36 dtantsur: sounds good!
11:42:50 oneswig: thanks!
11:42:50 arne_wiebalck: wanna drive this process?
11:42:57 dtantsur: yes
11:43:14 great, thanks!
11:43:19 dtantsur: won't be at the ptg, though
11:43:26 what about the forum?
11:43:36 dtantsur: neither
11:43:37 :(
11:43:56 okay, asynchronous ways work just as well
11:44:02 lol, sorry
11:44:25 no need to worry. I actually think it's better.
11:44:38 Great, thanks arne_wiebalck dtantsur and all, let's move on
11:44:46 thx oneswig !
11:44:55 #topic HPC containers CFP
11:45:15 Martial sends his apologies today but is running an event at ISC in Frankfurt
11:45:24 #link HPC containers workshop https://easychair.org/cfp/hpcw2019
11:46:14 He has a few panelists involved and from them I think he'll be looking at a good range of technologies
11:46:34 I think it was just the announcement for that one.
11:46:59 #topic Help most needed - SIG functional gaps
11:47:34 #link Forum session on SIG-identified functional gaps https://www.openstack.org/summit/denver-2019/summit-schedule/events/23612/help-most-needed-for-sigs-and-wgs
11:47:52 Rico Lin got in touch to ask for participation last week
11:48:01 I see 2 bare metal items, but I think I may have a conflict at this time..
11:48:09 We started an etherpad
11:48:17 #link SIG functional gaps etherpad https://etherpad.openstack.org/p/scientific-sig-denver2019-gaps
11:48:54 dtantsur: I think there'll be plenty of opportunities to respond. A good one might be the SIG session at the PTG.
11:49:09 oh, it's not critical for me, but rather for mgoddard :D
11:49:13 I mean, the conflict
11:49:41 mgoddard is packing a suitcase full of hats to wear for his many roles in Denver...
11:49:52 I didn't even check, Tuesday is full :)
11:50:41 mgoddard: yeah, it's the slot of your talk
11:50:47 Thursday morning I hope will be a better chance (haven't seen the SIG slots for the PTG yet though)
11:51:45 I'm hoping we can add some federation experiences to this etherpad in due course
11:52:38 I am inclined to add the baremetal metrics in there
11:52:55 is this of interest to you as well?
11:53:04 janders: good idea.
11:53:24 For example, SKA has concerns about heat management.
11:53:38 Given their data centre sites in South Africa and Western Australia...
11:54:20 janders: Is SuperCloud in Canberra or WA?
11:54:38 Canberra.. for now ;)
11:55:02 Regarding heat/power - people are also looking at power budget for compute runs, not just time
11:55:40 immersive cooling could help there - and there's some really interesting work on that in WA
11:55:52 hearnsj: did you ever evaluate DK Panda's energy-efficient MPI?
11:56:26 a small part of a solution for a big problem
11:57:00 * dtantsur has to drop for lunch, sorry. was great to talk to you!
11:57:02 janders: think you've got some interest in seeing that raised
11:57:06 thanks dtantsur!
11:57:28 #topic Denver Summit
11:57:40 #link SIG main session https://www.openstack.org/summit/denver-2019/summit-schedule/events/23741/scientific-sig-bof-and-lightning-talks
11:57:50 It's Monday afternoon this time
11:58:29 * arne_wiebalck has to run to another meeting o/
11:58:31 We'll use the same format, which means half the session will be lightning talks on SIG topics
11:58:34 thanks arne_wiebalck!
11:58:55 oneswig: Never did, I'm afraid
11:59:07 #link lightning talks etherpad https://etherpad.openstack.org/p/scientific-sig-denver19-bof
11:59:16 Regarding immersive cooling - my company has a boat load of it. Pun intended.
11:59:27 I see janders is already there to defend his win last time :-)
11:59:37 It's public knowledge - our Houston datacentre has immersive cooling.
11:59:59 hearnsj: interesting
12:00:00 Is there any relevance to OpenStack though - we don't run OpenStack there.
12:00:10 hearnsj: maybe you can give a lightning talk about it? :)
12:00:24 We are out of time
12:00:32 thanks guys!
12:00:34 Final mentions?
12:00:54 OK, have to close it there. Thank you all
12:00:57 #endmeeting