21:00:22 <oneswig> #startmeeting scientific-sig
21:00:23 <openstack> Meeting started Tue Mar  6 21:00:22 2018 UTC and is due to finish in 60 minutes.  The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:25 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:27 <openstack> The meeting name has been set to 'scientific_sig'
21:00:32 <oneswig> ahoy there
21:00:48 <oneswig> #link agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_March_6th_2018
21:02:04 <oneswig> #topic SIG roundup from the PTG
21:02:17 <rbudden> hello
21:02:27 <oneswig> Hi Bob
21:02:43 <oneswig> How's Bridges?
21:03:05 <rbudden> Doing good
21:03:21 <rbudden> Keeping us all busy!
21:03:30 <oneswig> We had some discussion around Ironic deploy steps, ramdisk boot and kexec - I think you were interested in this?
21:03:37 <rbudden> indeed
21:04:05 <oneswig> Seems like the deploy steps concept is right for you then :-)
21:04:20 <b1airo> Morning oneswig
21:04:25 <oneswig> It was lucky, we were scheduled at a quiet time when not much was going on
21:04:27 <oneswig> Hey Blair
21:04:31 <oneswig> #chair b1airo
21:04:32 <openstack> Current chairs: b1airo oneswig
21:04:49 <b1airo> I'm wrangling the kids to school so one eye on this
21:04:52 <oneswig> as a result, we had good attendance by a number of key people
21:05:06 <rbudden> awesome
21:05:11 <rbudden> i’m checking out the etherpad now
21:05:15 <oneswig> b1airo: I got my kids to school hours ago...
21:06:08 <oneswig> Julia Kreger seemed particularly comfortable with the idea of supporting ramdisk boot as a proven technique
21:06:16 <b1airo> Bugger, I must have overslept!
21:07:15 <oneswig> That was at the tail end of a couple of hours of discussion though.  We had an in-depth update on preemptible instances from ttsiouts at CERN
21:08:09 <TheJulia> extremely comfortable given what I've read where it has been done in various deployments
21:08:14 <martial> Hello all
21:08:18 <oneswig> Hi TheJulia!
21:08:21 <oneswig> thanks for joining
21:08:23 <oneswig> Hi martial
21:08:28 <oneswig> #chair martial
21:08:29 <openstack> Current chairs: b1airo martial oneswig
21:09:11 <oneswig> I was just recapping the discussion (although running backwards)
21:09:44 <oneswig> rbudden: one action we took was to document more clearly our use cases for non-conventional Ironic deployment steps
21:10:54 <rbudden> sounds good
21:10:55 <oneswig> TheJulia: can you remind me of the best way to make use cases available to the Ironic team?
21:11:31 <TheJulia> oneswig: a new use case to support a new thing, or an existing usecase that we already support?
21:11:48 <oneswig> New thing - eg ramdisk boot
21:12:11 <oneswig> Storyboard / launchpad?
21:12:12 <TheJulia> oneswig: Create a bug tagged with [RFE] in the subject on ironic's launchpad
21:12:38 <TheJulia> at least, until we migrate to storyboard. I have to find a spare network cable before I can run a test migration to storyboard
21:13:00 <oneswig> OK, launchpad will work for now
21:13:56 <oneswig> If I create a bug and sketch out the need, rbudden can you add details specific to what you'd like for Bridges?  I'll circulate to Pierre and Tim as well
21:14:05 <rbudden> sure
21:14:18 <rbudden> I think Trandles has the largest use case for ramdisk boot
21:14:40 <rbudden> we could use that as well for our 12TB nodes on Bridges, but we only have a handful of them
21:14:44 <oneswig> Sounds like he's up to something interesting
21:14:53 <rbudden> I think boot from Cinder Vol would fix us up as well
21:15:12 <rbudden> obviously we like kexec to avoid multiple reboots
21:16:03 <oneswig> rbudden: there was some discussion on multi-attach for large scale cinder volume boot, I think it needs some testing at scale (which we may try in a couple of months)
21:16:14 <oneswig> rbudden: how long to reboot a node with 12 TB RAM?
21:16:53 <rbudden> i don’t have the exact number off the top of my head, but i recall when we PXE booted it from the show floor at SC it was at least 30 min :(
21:17:16 <rbudden> we ended up pulling blades to debug and cut back the boot time since it was a demo ;)
21:17:25 <TheJulia> oneswig: if you do, we would love to know the details behind any testing since some systems have architectural limits.
21:17:41 <rbudden> that was years ago though, so i’m unsure if there have been improvements to disable things like ram check, etc. at the iLO level
21:18:37 <oneswig> TheJulia: johnthetubaguy and mgoddard are likely to be leading it - I'm sure they'll keep you updated.  The rough scale is deploying to ~600 ironic nodes.
21:18:57 <TheJulia> Awesome, thanks!
21:19:29 <oneswig> Should be a lot of fun :-)
21:21:39 <oneswig> There was a good deal of interesting discussion on preemptible instances, including how they might interact with reservations in Blazar.  I think that was one of the highlights of the session
21:23:03 <oneswig> That discussion gained some user input from the Scientific SIG and went on to a Nova group discussion on the Friday afternoon.
21:23:42 <oneswig> It was a bit difficult to focus by that time given everyone had just had their flights cancelled but I think the Nova team soldiered on.
21:24:37 <oneswig> One of the nuances was on whether to perform the preempting action (ie, killing an instance) upon the final "NoValidHost" event, or to attempt to do it slightly before then based on (eg) 95% utilisation.
21:25:04 <oneswig> I think the CERN team want the former to get maximum utilisation
21:25:49 <oneswig> The latter might feasibly be a role performed by a process like Watcher.
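A minimal sketch of the two trigger points discussed above, reactive preemption on NoValidHost versus proactive preemption at a utilisation threshold; this is purely illustrative and not the CERN or Nova implementation, and every name and the 95% figure are assumptions for the example.

    # Hypothetical sketch of the two preemption triggers discussed above:
    # reactive (act on NoValidHost) vs proactive (act at a utilisation threshold).
    # Not an actual Nova/CERN implementation; all names are illustrative.

    REACTIVE = "on_no_valid_host"   # preempt only after scheduling fails
    PROACTIVE = "on_threshold"      # preempt early, e.g. at 95% utilisation

    def should_preempt(strategy, scheduling_failed, utilisation, threshold=0.95):
        """Decide whether to reclaim capacity held by preemptible instances."""
        if strategy == REACTIVE:
            # Maximises utilisation: only act once a request has hit NoValidHost.
            return scheduling_failed
        # The proactive policy trades some utilisation for lower scheduling
        # latency and could plausibly live in an external service such as Watcher.
        return utilisation >= threshold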
21:27:06 <oneswig> There was also some discussion on a new strategy for resolving quotas across nested project hierarchies.
21:27:08 <b1airo> Would be nice to have that option integrated given how close they are
21:28:08 <oneswig> b1airo: right - seems like it, although having many concurrent actors could make a complex system chaotic.  Perhaps one strategy will win out.
21:29:58 <b1airo> Figuring out what 95% is could be a difficult problem in real deployments
21:29:59 <oneswig> The quota issue may be resolved in the long term through managing support for quotas through a new Oslo library, tasked with managing resource quotas across a subtree of projects
21:31:05 <oneswig> There were some interesting issues raised on how to count resource consumption when (eg) mixing virtualised and bare metal compute, given the custom resource classes of baremetal.
21:31:56 <b1airo> My natural instinct is that they should be separate quotas
21:32:16 <oneswig> b1airo: does it all come back to the placement service in the end? On your previous comment
21:32:43 <b1airo> I suspect it has to
21:36:06 <oneswig> There was also some interesting discussion in the Ironic sessions on complex deploys - multi-partition, RAID, etc.
21:37:18 <oneswig> We also briefly talked about setting BIOS config during deploy steps.  This raises a question on how to undo in cleaning all that was done in deployment.
21:38:14 <rbudden> i’m not sure if it currently exists, but a way to plug in certain cleaning steps would be nice
21:38:28 <rbudden> specifically for us it would be for puppet cert cleanup before a redeploy
21:38:50 <martial> (catching up on the typed text, why were flights canceled?)
21:39:18 <oneswig> martial: it snowed a freakish amount for Ireland.
21:40:33 <oneswig> https://twitter.com/DublinAirport/status/969368265662267393
21:41:44 <oneswig> I got home after ~36 hours; the airport had only just reopened by then.  This part of Europe isn't geared to handle weather like that, everything shuts down...
21:43:04 <b1airo> No plows lining the runways like in Chicago
21:43:31 <TheJulia> They had plows at the airport... but yeah
21:45:05 <TheJulia> oneswig: with regards to undo settings applied, I'm fairly sure ironic may only need to undo the boot node, but I've not had much time to think about it.. nor ability to brain after the two day trek home.
21:45:47 <oneswig> rbudden: I think you can already create custom clean steps, but perhaps you'd need to roll your sleeves up - https://docs.openstack.org/ironic/pike/admin/cleaning.html
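For the puppet cert cleanup rbudden mentions, the documented route is a custom clean step in an ironic-python-agent hardware manager, roughly as sketched below; the class name, step name, and step body are assumptions, and the manager would need to be packaged into the deploy ramdisk via the ironic_python_agent.hardware_managers entry point.

    # Sketch of a custom IPA hardware manager exposing one clean step
    # (names and behaviour are illustrative, not an existing implementation).
    from ironic_python_agent import hardware

    class PuppetCertCleanupHardwareManager(hardware.HardwareManager):
        HARDWARE_MANAGER_NAME = 'PuppetCertCleanupHardwareManager'
        HARDWARE_MANAGER_VERSION = '1.0'

        def evaluate_hardware_support(self):
            # Claim support on any node; adjust if the step is hardware-specific.
            return hardware.HardwareSupport.SERVICE_PROVIDER

        def get_clean_steps(self, node, ports):
            return [{
                'step': 'erase_puppet_certs',
                'priority': 0,            # 0 = runs only in manual cleaning
                'interface': 'deploy',
                'reboot_requested': False,
                'abortable': True,
            }]

        def erase_puppet_certs(self, node, ports):
            # Placeholder: remove stale Puppet certificates ahead of a redeploy,
            # e.g. by wiping the relevant SSL directory on local disk.
            pass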
21:45:48 <TheJulia> oneswig: a distinct use case where we would need to peel things back that is not RAID would be appreciated, if you're aware of one
21:46:28 <rbudden> oneswig: thanks, i’ll check that out. i haven’t played with cleaning much, but always find simple cleanup steps that would be awesome to just automate
21:46:30 <oneswig> TheJulia: aside from RAID our main use cases are hyperthreading and power profile.
21:46:56 <oneswig> I guess hyperthreading is the one you'd notice immediately
21:47:15 <TheJulia> I think those could all be done upon next deployment if we get deploy steps sorted with the bios interface work
21:48:01 <oneswig> TheJulia: if it could be done in one hit, that would be good - avoiding another reset...
21:48:24 <rbudden> hyperthreading is a good one, we occasionally get requests for this as well
21:49:30 <TheJulia> I suspect it would almost be better to always try to assert desired state upfront. The only thing I can really think of is needing special firmware, but that is.... yeah.
21:50:01 <oneswig> TheJulia: careful what you wish for! :-)
21:50:43 <TheJulia> I'm sure that would make some operators happy
21:51:13 <oneswig> It would mean a comprehensive picture of default settings, to totally define hardware state upon deployment
21:51:54 <oneswig> I think that was all I had on the PTG - TheJulia was there anything the scientific SIG would really like from the Ironic sessions?
21:52:08 <b1airo> Virtualisation features would be another common toggle
21:54:03 <oneswig> b1airo: agreed.
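Once the BIOS interface work TheJulia mentions has landed, toggles like hyperthreading and virtualisation could plausibly be asserted during a manual clean with the bios interface's apply_configuration step. The sketch below uses python-ironicclient against an assumed cloud; the endpoint, credentials, and setting names (which are vendor-specific) are all illustrative.

    # Hypothetical manual clean applying BIOS settings via the bios interface
    # (post-Queens feature); credentials, endpoint and setting names assumed.
    from ironicclient import client
    from keystoneauth1 import loading, session

    loader = loading.get_plugin_loader('password')
    auth = loader.load_from_options(
        auth_url='http://controller:5000/v3',
        username='admin', password='secret', project_name='admin',
        user_domain_name='Default', project_domain_name='Default')
    sess = session.Session(auth=auth)

    ironic = client.get_client(1, session=sess)

    clean_steps = [{
        'interface': 'bios',
        'step': 'apply_configuration',
        'args': {'settings': [
            {'name': 'LogicalProc', 'value': 'Disabled'},        # hyperthreading off
            {'name': 'ProcVirtualization', 'value': 'Enabled'},  # VT on
        ]},
    }]

    # The node must be in the 'manageable' state for manual cleaning.
    ironic.node.set_provision_state('NODE_UUID', 'clean', cleansteps=clean_steps)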
21:54:58 <oneswig> BTW have you seen this project from Dell - https://github.com/dsp-jetpack/JetPack
21:55:23 <oneswig> The missing piece from python-dracclient (NIC config) is found here.
21:55:31 <TheJulia> oneswig: I'm still typing up everything. We did briefly discuss firmware management but there are many different ways we can approach that.
21:56:00 <oneswig> Thanks TheJulia, I'll follow that (probably indirectly via Mark and John)
21:57:19 <oneswig> We are nearly out of time...
21:57:23 <oneswig> #topic AOB
21:57:38 <oneswig> Queens is imminent!
21:57:59 <oneswig> Mark did a test deploy to shake out some things in Kolla and Bifrost
21:59:00 <oneswig> One other announcement - https://github.com/openstack/kayobe - one step closer
21:59:31 <b1airo> I saw mikal praising Bifrost on Twitter :-)
22:00:18 <oneswig> it's the future of deployment! :-)
22:00:26 <oneswig> On that happy note, final comments?
22:00:42 <b1airo> He'll be on to Kayobe next
22:00:47 <martial> Our P2302/ORCA meeting is coming soon ( March 20-21) ... details at federatedcloud.eventbrite.com
22:00:56 <oneswig> Won't we all b1airo :-)
22:00:57 <martial> (final comments shameless plug ;) )
22:01:07 <oneswig> thanks martial
22:01:13 <oneswig> good reminder!
22:01:19 <oneswig> OK, we are out of time
22:01:21 <oneswig> #endmeeting