10:59:59 <oneswig> #startmeeting scientific-sig
11:00:00 <openstack> Meeting started Wed Apr 10 10:59:59 2019 UTC and is due to finish in 60 minutes.  The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:01 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:03 <openstack> The meeting name has been set to 'scientific_sig'
11:00:13 <oneswig> Greetings
11:00:25 <arne_wiebalck> o/
11:00:26 <oneswig> #link Agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_April_10th_2019
11:00:37 <oneswig> Hey arne_wiebalck, thanks for joining
11:00:45 <priteau> Hello
11:00:53 <oneswig> Hi priteau
11:01:59 <dtantsur> o/
11:02:01 <mgoddard> \o
11:02:10 <oneswig> Hi dtantsur mgoddard - welcome
11:02:22 <dtantsur> :)
11:02:50 <mgoddard> hi, thanks
11:02:58 <oneswig> OK, let's kick off with the agenda
11:03:12 <oneswig> #topic Ironic and external DCIM
11:03:35 <oneswig> Thanks Arne for bringing the discussion this way.
11:04:01 <arne_wiebalck> That's a long standing issue here at CERN.
11:04:07 <oneswig> Can you summarise what your needs are?
11:04:41 <arne_wiebalck> Sure.
11:05:02 <arne_wiebalck> We have compiled a doc with all details: https://docs.google.com/document/d/144g8E_fzGD4WZzMvswkeowzcILL4hxg4QDS-46KYCcQ/edit?usp=sharing
11:05:14 <arne_wiebalck> In short, we need to keep track of the servers in our DC.
11:05:31 <arne_wiebalck> Life cycle, but also for campaigns.
11:05:46 <arne_wiebalck> Think f/w upgrades.
11:06:07 <arne_wiebalck> We have a DB for this, but keeping it up to date is difficult.
11:06:35 <arne_wiebalck> ironic provides some means to gather data about hosts, so we were wondering whether this could be leveraged.
11:07:15 <dtantsur> is it enough to have a link in ironic to some external system? or do you mean inspection integration?
11:07:17 <arne_wiebalck> For now, we were wondering how other sites deal with this issue and what tools they use.
11:07:34 <arne_wiebalck> the issue with inspection is that it happens only once
11:07:45 <oneswig> It's interesting because there's a clear precedent for this problem.
11:07:52 <arne_wiebalck> ironic would interface with some external system
11:07:59 <arne_wiebalck> to store the data
11:08:14 <oneswig> In other sites I'm aware of, there's openDCIM (used by the DC guys) and XCAT for the infrastructure provisioning (non-OpenStack)
11:08:15 <arne_wiebalck> main idea here is that others have already thought about a schema, for instance
11:08:31 <oneswig> Now those sites also have Ironic to deal with as a third source of truth.
11:08:53 <arne_wiebalck> yeah, different sources of truth is also an issue
11:09:42 <arne_wiebalck> I have a list of 8 or so tools that we will have a look at.
11:10:06 <janders> is monitoring baremetal node health of interest in this context, or do you do this using a different tool?
11:10:14 <arne_wiebalck> The main idea being to use these as potential backends to which ironic (or some additional tool) could send data.
11:10:28 <arne_wiebalck> monitoring is different, I'd say
11:10:46 <oneswig> In this model, the nodes and ports would no longer be stored in a local database?  Do you think everything could be delegated?
11:10:48 <arne_wiebalck> this is more like #cpus, disks, ... f/w version, ...
11:11:21 <arne_wiebalck> oneswig: "local" you mean ironic's DB?
11:11:31 <oneswig> yes
11:11:34 <arne_wiebalck> no
11:11:50 <arne_wiebalck> I think this would be more like the sensor data ironic is gathering and sending to ceilometer
11:11:56 <dtantsur> it sounds like you need to post results of introspection to some external database for reconciliation?
11:12:05 <dtantsur> and maybe do it periodically out-of-band?
11:12:08 <arne_wiebalck> dtantsur: yes
11:12:14 <dtantsur> yes to both?
11:12:14 <arne_wiebalck> dtantsur: that sums it up
11:12:19 <arne_wiebalck> dtantsur: yes
11:12:38 <dtantsur> fun fact: we've just discussed downstream a possibility of periodic out-of-band mini-inspection
11:12:40 <mgoddard> there is a fine line here between monitoring and inspection
11:12:50 <arne_wiebalck> periodically, out-of-band, external system
11:13:07 <dtantsur> now, with ironic-inspector it's easy: we have processing hooks that can post data anywhere
11:13:18 <arne_wiebalck> dtantsur: yes
11:13:19 <mgoddard> we'll probably want the results of out of band 'inspection' to also be pushed to a monitoring system
11:13:46 <dtantsur> what we don't have is 1. out-of-band inspection hooks to post data anywhere, 2. a way to run out-of-band inspection periodically
11:13:54 <mgoddard> it's just a question of which parts of the data they're interested in, and how frequently
11:14:06 <dtantsur> this sounds like two pretty orthogonal RFEs to me, I'm fine with both personally
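[Editor's note: the in-band path dtantsur describes — inspector processing hooks that can post data anywhere — might look roughly like this. A minimal sketch in plain Python, assuming a hypothetical DCIM endpoint URL and the inventory layout that ironic-python-agent reports; a real deployment would implement this as an ironic-inspector processing hook rather than standalone functions.]

```python
import json
import urllib.request

# Hypothetical DCIM endpoint -- replace with your site's inventory API.
DCIM_URL = "https://dcim.example.org/api/nodes"


def extract_inventory(introspection_data):
    """Pull the hardware facts a DCIM typically wants (serial, #cpus,
    memory, disks) out of inspector-style introspection data.
    Every access is guarded, since fields may be missing."""
    inv = introspection_data.get("inventory", {})
    return {
        "serial": inv.get("system_vendor", {}).get("serial_number"),
        "cpus": inv.get("cpu", {}).get("count"),
        "memory_mb": inv.get("memory", {}).get("physical_mb"),
        "disks": [d.get("name") for d in inv.get("disks", [])],
    }


def post_to_dcim(record, url=DCIM_URL):
    """POST one inventory record as JSON to the external system."""
    req = urllib.request.Request(
        url,
        data=json.dumps(record).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Running this periodically out-of-band is exactly the piece dtantsur notes is missing; in-band, a hook can call `post_to_dcim(extract_inventory(data))` on each inspection.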
11:14:07 <oneswig> mgoddard: what kind of monitoring?
11:14:15 <arne_wiebalck> dtantsur: concerning 2. what about the sensor data collection, isn't that similar?
11:14:31 <dtantsur> arne_wiebalck: it's similar, except that inventory is not quite sensor data
11:14:39 <mgoddard> oneswig: the equivalent of IPMI sensor data collection, I assume something similar exists for redfish
11:14:57 <dtantsur> the question is whether we want to represent inventory as part of sensor data or separately
11:14:58 <oneswig> power and environmentals via BMC channels?
11:15:05 <arne_wiebalck> dtantsur: agreed, I was more thinking about the machinery/framework
11:15:36 <arne_wiebalck> dtantsur: the sensor data contains what?
11:15:44 <dtantsur> I've heard downstream that people want to collect sensor data much more often (an order of magnitude?) than inventory
11:15:54 <dtantsur> arne_wiebalck: thermal data, etc
11:15:58 <dtantsur> I can try finding an example
11:15:58 <janders> I am also very interested in baremetal health monitoring via out-of-band but agreed that's not necessarily something for a DCIM system
11:16:03 <mgoddard> the line between these two things might be more of a site policy question than anything?
11:16:30 <oneswig> It does sound separable, if not completely separate.
11:16:36 <janders> agreed
11:16:45 <dtantsur> yeah, it may be a similar mechanism, but logically separate
11:16:57 <oneswig> Hardware inventory is a piece that could be focused on first
11:16:58 <janders> DCIM doesn't need to send notifications, it needs to answer queries
11:17:12 <oneswig> Although mgoddard we certainly know people who want the monitoring
11:17:13 <arne_wiebalck> DCIM may be more than what I had in mind initially (it'd add switches, racks, cables, ...)
11:17:14 <janders> monitoring is sort of the opposite in that sense
11:18:03 <dtantsur> arne_wiebalck: looking at our example from IPMI I see power, voltage, POST errors, temperature readings, etc
11:18:06 <arne_wiebalck> I guess there is also a thin line
11:18:08 <dtantsur> this is for sensors ^^
11:18:22 <arne_wiebalck> dtantsur: thanks
11:18:43 <arne_wiebalck> dtantsur: that is handled by our monitoring (and hence separate for us :-D)
11:19:07 <dtantsur> well, yeah. we need it from ironic :) but for IPMI it's an existing feature for some time.
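[Editor's note: the sensor categories dtantsur lists (power, voltage, POST errors, temperature) are what the existing IPMI feature reads out. As a hedged illustration, a minimal parser for the default three-column, pipe-separated `ipmitool sdr` text format; real deployments get this via ironic's sensor-data notifications rather than parsing CLI output.]

```python
def parse_sdr(output):
    """Parse `ipmitool sdr` text into name/reading/status records.
    Assumes the default 'NAME | READING | STATUS' layout; lines that
    do not have exactly three pipe-separated fields are skipped."""
    records = []
    for line in output.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            name, reading, status = parts
            records.append({"name": name,
                            "reading": reading,
                            "status": status})
    return records
```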
11:19:58 <arne_wiebalck> I'm putting some hope into redfish for OOB inspection
11:20:03 <dtantsur> I can help with the ironic side of this feature (well, two related features), we have interest in it
11:20:19 <janders> dtantsur: excellent! :)
11:20:25 <oneswig> dtantsur: with inspection data & extra data, is that posted to Swift by Ironic?  Could that be made modular?
11:20:39 <arne_wiebalck> oneswig: swift or a db
11:20:42 <dtantsur> oneswig: it's only for ironic-inspector (in-band inspection)
11:20:51 <dtantsur> and it is pluggable since Stein
11:21:02 <dtantsur> we also have pluggable hooks that run on inspection
11:21:11 <dtantsur> but again, it's all in-band. you cannot run it periodically on a live instance.
11:21:23 <dtantsur> (well, unless you write an in-instance agent)
11:21:33 <arne_wiebalck> the alternative would be to require a h/w daemon on each instance
11:21:40 <arne_wiebalck> dtantsur: :-D
11:22:03 <dtantsur> right, this is something that we (as in ironic) are not very fond of doing, to be honest
11:22:16 <janders> are you aware of anyone using this (an agent inside the baremetal) in the wild?
11:22:27 <dtantsur> we've been trying to avoid touching running instances. doing that will need us to talk to neutron to understand networking, etc.
11:22:31 <arne_wiebalck> I think Rackspace had something like this.
11:22:35 <janders> sounds crazy but maybe it's not that crazy
11:22:53 <dtantsur> it's not crazy at all, it just doesn't seem to fit in ironic very well
11:23:10 <arne_wiebalck> well, that would be an ironic-independent approach
11:23:11 <dtantsur> you don't need ironic's help to do that. you can make such daemon just post data wherever you need
11:23:14 <oneswig> What inventory data can be extracted from out-of-band inspection, using generic Redfish objects?
11:23:45 <dtantsur> oneswig: essentially everything in theory
11:23:46 <janders> having an optional agent gathering utilisation metrics and health information would be pretty cool
11:23:55 <dtantsur> the practice depends on a specific vendor, unfortunately
11:23:58 <janders> for now, running on bare metal means bye-bye metrics
11:24:00 <dtantsur> redfish is a very loose standard
11:24:21 <janders> or - could this be done out of band as well? I guess that'd be vendor specific
11:24:28 <arne_wiebalck> oneswig: dtantsur : but in reality probably very different things
11:24:36 <dtantsur> of course :(
11:24:37 <oneswig> dtantsur: I think of it like SNMP, in that there is a common protocol but with useful data in vendor extensions.
11:24:57 <dtantsur> arne_wiebalck, oneswig, talk to etingof on #openstack-ironic, he's our local Redfish guru :)
11:25:13 <dtantsur> oneswig: it's not THAT bad usually
11:25:21 <arne_wiebalck> :)
11:25:22 <dtantsur> it's just that literally everything in Redfish is optional
11:25:25 <dtantsur> literally
11:25:35 <oneswig> dtantsur: thanks :-)
11:25:35 <dtantsur> you can have a compliant implementation with zero features
11:25:57 <dtantsur> but vendors do catch up, so it's rarely that bad in practice
11:26:09 <oneswig> arne_wiebalck: you have a shortlist already, what's on it?
11:26:26 <arne_wiebalck> a short list for things to read out, you mean?
11:26:57 <oneswig> external dcims
11:27:05 <arne_wiebalck> ah
11:27:12 * arne_wiebalck getting the list
11:27:22 <dtantsur> check e.g. this DMTF mockup of a Redfish server: https://redfish.dmtf.org/redfish/mockups/v1/863
11:27:28 <oneswig> Seems like a reference implementation is readily achievable, which is great
11:27:48 <dtantsur> (go into the available System, check e.g. Processors)
11:28:06 <dtantsur> #link https://redfish.dmtf.org/redfish/mockups/v1/863 a redfish mockup to play with
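[Editor's note: since, as dtantsur says, literally everything in Redfish is optional, any consumer of a ComputerSystem resource has to guard every field. A minimal sketch of summarizing the JSON you'd get back from e.g. the DMTF mockup's System resource; the property names (`ProcessorSummary`, `MemorySummary`, etc.) are standard Redfish schema fields, but none are guaranteed to be present.]

```python
def summarize_system(system):
    """Summarize a Redfish ComputerSystem resource (already decoded
    from JSON). Every field is optional in the standard, so each
    access uses .get() and missing data comes back as None."""
    return {
        "model": system.get("Model"),
        "serial": system.get("SerialNumber"),
        "power_state": system.get("PowerState"),
        "cpu_count": system.get("ProcessorSummary", {}).get("Count"),
        "memory_gib": system.get("MemorySummary",
                                 {}).get("TotalSystemMemoryGiB"),
    }
```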
11:28:15 <arne_wiebalck> i-doit, CMDBuild, SnipeIT, SysAid, Spiceworks, GLPI, NetBox, xcat
11:28:16 <oneswig> cool!
11:28:35 <arne_wiebalck> completely unfiltered ^
11:28:57 <mgoddard> oneswig: you mentioned openDCIM earlier. Is that worth adding?
11:29:31 <oneswig> I think so but I don't know how it works and what it can do for supporting integration
11:29:40 <arne_wiebalck> we'll start by having a look at these and see which of these make it on a short list by criteria such as licensing, maintained, storage needs, ...
11:29:57 <arne_wiebalck> ... APIs ...
11:30:09 <arne_wiebalck> well, that's the plan for now
11:30:27 <hearnsj> openDCIM is OK. Simple to set up
11:30:43 <arne_wiebalck> hearnsj: added to my list, thx
11:30:45 <hearnsj> I'm not an expert with commercial DCIM suites though
11:31:07 <oneswig> Hello hearnsj, welcome!
11:31:08 <hearnsj> I have worked with openDCIM   ping me offline hearnsj@gamil.com
11:31:22 <arne_wiebalck> hearnsj: thx!
11:32:17 <oneswig> APIs are probably a big differentiator, given the implementors may not have assumed a machine-to-machine interface
11:32:34 <dtantsur> no API - no integration :)
11:32:53 <arne_wiebalck> yeah, probably a good thing to check early on
11:33:24 <oneswig> I don't know how much useful lifecycle data XCat stores - that's not really its mission AFAIK.
11:33:42 <oneswig> Way back there was a project to integrate it with Ironic as a power driver...
11:34:39 <dtantsur> waaaay back :)
11:34:43 <janders> I think that died a horrible death, didn't it?
11:35:14 <janders> I never quite got the concept - was it more of an image-deployment driver?
11:35:32 <janders> power-wise, not sure what xcat would add to the picture?
11:35:42 <dtantsur> I barely remember it
11:36:02 <oneswig> I'm not sure either...
11:37:04 <oneswig> Were there other candidates to suggest for arne_wiebalck ?
11:37:21 <arne_wiebalck> if you have suggestions or input, please get in touch :)
11:38:22 <oneswig> Quick response from UCam - "Hi Stig, we've experimented with both of those, but still use Xcat mostly for a pseudo DCIM. But we're playing with netbox from DO now as a replacement to make Xcat less central, and also I know Sanger use Infoblox as their DHCP/DNS/IPAM solution."
11:38:55 <janders> Infoblox.. that's widely used in shops around town here
11:39:45 <arne_wiebalck> some of these tools have quite an emphasis on the networking ...
11:40:04 <arne_wiebalck> .. not necessarily what we look for for our use case.
11:40:28 <oneswig> OK, we should conclude this topic
11:40:32 <janders> Add Infoblox IPAM agent image. The agent will listen for neutron and nova events and automatically manage DNS entries within an Infoblox appliance. The agent will be started when ‘infoblox’ is selected as the neutron_ipam_driver.
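[Editor's note: the pasted Infoblox description refers to selecting 'infoblox' as the `neutron_ipam_driver`. As a hedged illustration only — the variable name is taken from the quoted text and not verified against any particular kolla-ansible release — the selection might look like this fragment:]

```yaml
# Hypothetical deployment-config fragment (e.g. kolla-ansible
# globals.yml); verify the exact variable name for your release.
neutron_ipam_driver: "infoblox"
```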
11:40:37 <dtantsur> any action items from ironic side?
11:40:39 <janders> that's right
11:41:04 <oneswig> arne_wiebalck: will you follow up with your research?
11:41:13 <arne_wiebalck> definitely :)
11:41:27 <janders> dtantsur: I would be very much interested in the out-of-band monitoring angle
11:41:32 <oneswig> dtantsur: sounds like your hard work is already done :-) Thanks for that!
11:41:56 <oneswig> arne_wiebalck: is there anything planned for forum or PTG on this?
11:42:05 <arne_wiebalck> no
11:42:05 <dtantsur> maybe we should sync off-channel re next ironic action items? or indeed at the forum/PTG?
11:42:14 <arne_wiebalck> there was a session two PTGs ago
11:42:17 <janders> +1 for PTG
11:42:32 <oneswig> I'll put this on the agenda again next week in the US timezone and try to gather a few more data points.
11:42:36 <arne_wiebalck> dtantsur: sounds good!
11:42:50 <arne_wiebalck> oneswig: thanks!
11:42:50 <dtantsur> arne_wiebalck: wanna drive this process?
11:42:57 <arne_wiebalck> dtantsur: yes
11:43:14 <dtantsur> great, thanks!
11:43:19 <arne_wiebalck> dtantsur: won't be at the ptg, though
11:43:26 <dtantsur> what about the forum?
11:43:36 <arne_wiebalck> dtantsur: neither
11:43:37 <arne_wiebalck> :(
11:43:56 <dtantsur> okay, asynchronous ways work just as well
11:44:02 <arne_wiebalck> lol, sorry
11:44:25 <dtantsur> no need to worry. I actually think it's better.
11:44:38 <oneswig> Great, thanks arne_wiebalck dtantsur and all, let's move on
11:44:46 <arne_wiebalck> thx oneswig !
11:44:55 <oneswig> #topic HPC containers CFP
11:45:15 <oneswig> Martial sends his apologies today but is running an event at ISC in Frankfurt
11:45:24 <oneswig> #link HPC containers workshop https://easychair.org/cfp/hpcw2019
11:46:14 <oneswig> He has a few panelists involved and from them I think he'll be looking at a good range of technologies
11:46:34 <oneswig> I think it was just the announcement for that one.
11:46:59 <oneswig> #topic Help most needed - SIG functional gaps
11:47:34 <oneswig> #link Forum session on SIG-identified functional gaps https://www.openstack.org/summit/denver-2019/summit-schedule/events/23612/help-most-needed-for-sigs-and-wgs
11:47:52 <oneswig> Rico Lin got in touch to ask for participation last week
11:48:01 <dtantsur> I see 2 bare metal items, but I think I may have a conflict at this time..
11:48:09 <oneswig> We started an etherpad
11:48:17 <oneswig> #link SIG functional gaps etherpad https://etherpad.openstack.org/p/scientific-sig-denver2019-gaps
11:48:54 <oneswig> dtantsur: I think there'll be plenty of opportunities to respond.  A good one might be the SIG session at the PTG.
11:49:09 <dtantsur> oh, it's not critical for me, but rather for mgoddard :D
11:49:13 <dtantsur> I mean, the conflict
11:49:41 <oneswig> mgoddard is packing a suitcase full of hats to wear for his many roles in Denver...
11:49:52 <mgoddard> I didn't even check, tuesday is full :)
11:50:41 <dtantsur> mgoddard: yeah, it's the slot of your talk
11:50:47 <oneswig> Thursday morning I hope will be a better chance (haven't seen the SIG slots for the PTG yet though)
11:51:45 <oneswig> I'm hoping we can add some federation experiences to this etherpad in due course
11:52:38 <janders> I am inclined to add the baremetal metrics in there
11:52:55 <janders> is this of interest to you as well?
11:53:04 <oneswig> janders: good idea.
11:53:24 <oneswig> For example SKA has concerns on heat management.
11:53:38 <oneswig> Given their data centre sites in South Africa and Western Australia...
11:54:20 <oneswig> janders: Is SuperCloud in Canberra or WA?
11:54:38 <janders> Canberra.. for now ;)
11:55:02 <hearnsj> Regarding heat/power - people also looking at power budget for compute runs, not just time
11:55:40 <janders> immersive cooling could help there - and there's some really interesting work on that in WA
11:55:52 <oneswig> hearnsj: did you ever evaluate DK Panda's energy-efficient MPI?
11:56:26 <oneswig> a small part of a solution for a big problem
11:57:00 * dtantsur has to drop for lunch, sorry. was great to talk to you!
11:57:02 <oneswig> janders: think you've got some interest in seeing that raised
11:57:06 <oneswig> thanks dtantsur!
11:57:28 <oneswig> #topic Denver Summit
11:57:40 <oneswig> #link SIG main session https://www.openstack.org/summit/denver-2019/summit-schedule/events/23741/scientific-sig-bof-and-lightning-talks
11:57:50 <oneswig> It's Monday afternoon this time
11:58:29 * arne_wiebalck has to run to another meeting o/
11:58:31 <oneswig> We'll use the same format, which means half the session will be lightning talks on SIG topics
11:58:34 <oneswig> thanks arne_wiebalck!
11:58:55 <hearnsj> oneswig Never did I'm afraid
11:59:07 <oneswig> #link lightning talks etherpad https://etherpad.openstack.org/p/scientific-sig-denver19-bof
11:59:16 <hearnsj> Regarding immersive cooling - my company has a boat load of it. Pun intended.
11:59:27 <oneswig> I see janders is already there to defend his win last time :-)
11:59:37 <hearnsj> It's public knowledge - our Houston datacentre has immersive cooling.
11:59:59 <oneswig> hearnsj: interesting
12:00:00 <hearnsj> Is there any relevance to OpenStack though - we don't run OpenStack there.
12:00:10 <janders> hearnsj: maybe you can give a lightning talk about it? :)
12:00:24 <oneswig> We are out of time
12:00:32 <janders> thanks guys!
12:00:34 <oneswig> Final mentions?
12:00:54 <oneswig> OK, have to close it there.  Thank you all
12:00:57 <oneswig> #endmeeting