21:01:12 <martial> #startmeeting Scientific-sig
21:01:15 <openstack> Meeting started Tue Feb  5 21:01:12 2019 UTC and is due to finish in 60 minutes.  The chair is martial. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:16 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:18 <openstack> The meeting name has been set to 'scientific_sig'
21:01:30 <martial> good day everyone
21:01:45 <martial> welcome to the next edition of the Scientific SIG meeting
21:01:57 <b1airo> hi martial
21:02:03 <martial> #chair b1airo
21:02:04 <openstack> Current chairs: b1airo martial
21:02:08 <martial> Hi Blair
21:02:35 <b1airo> i was about to type startmeeting when my laptop suddenly powered off, so if i go quiet i may be having computer issues...
21:03:11 <martial> A couple announcements first:
21:03:19 <martial> (rgr b1airo)
21:03:41 <martial> PEARC19 CFP is out
21:03:59 <martial> #link https://www.pearc19.pearc.org/
21:04:03 <b1airo> i was thinking about going to PEARC this year, have heard good things...
21:04:31 <martial> it is turning into a good mini SuperComputing in truth
21:05:30 <martial> and ISC HPC 19 is going to have a workshop specific to containers in HPC
21:05:40 <martial> Christian Kniep from Docker is the chair
21:06:04 <b1airo> interesting...
21:06:27 <martial> https://www.irccloud.com/pastebin/kDWaSrVb/
21:06:32 <martial> In conjunction with ISC HIGH PERFORMANCE 2019
21:07:13 <martial> I am on the technical committee so I will post details as we get the CFP finalized
21:07:19 <martial> but FYI
21:07:52 <martial> if you are coming to PEARC I will add you to this version of the HPC Infrastructure panel
21:08:04 <martial> (need to finalize it)
21:08:24 <janders> g'day all. Sorry - hardware issues!
21:08:42 <b1airo> hi janders
21:08:47 <martial> welcome janders
21:09:21 <martial> adding to that list ... the "Open Infrastructure Summit" is coming soon :)
21:09:38 <martial> and we are looking for lightning talk presenters for the SIG BoF at the Summit ... as usual :)
21:10:38 <martial> so if you are doing something using an Open Infrastructure (not just OpenStack) related to HPC or scientific workflow and are willing to tell us about it (and are coming to the summit) ...
21:10:56 <martial> the call is out :)
21:11:10 <b1airo> we should probably create an etherpad as usual...
21:11:16 <b1airo> doing so now
21:12:07 <b1airo> #link https://etherpad.openstack.org/p/scientific-sig-denver19-bof
21:12:07 <janders> in regards to the Summit - do you guys know the approximate timeline for presentation selection?
21:12:34 <b1airo> let me check latest emails from the organising committee, janders...
21:12:45 <janders> thank you b1airo
21:13:29 <martial> Beyond these little tidbits, on to the usual conversation here :)
21:13:43 <martial> oh and I know Blair is keen to collect HPC container war stories
21:13:50 <b1airo> current plan appears to be that "The Foundation staff will send the speaker notifications the week of February 19"
21:14:06 <janders> I could probably do a storage-related talk - I've got BeeGFS, GPFS and long-range storage projects going on, at least one should be presentation-worthy by the Denver timeframe
21:14:17 <martial> do it!
21:15:42 <janders> at this stage I'd say I'm 75% likely to be in Denver, whether my presentation gets in or not will determine how easy it will be to convince my bosses to fund the trip
21:16:33 <b1airo> i'm not sure there's a fingers-crossed ascii emoji, is there?
21:16:45 <b1airo> 🤞
21:17:09 <janders> 19 Feb notifications - does this mean the voting is on?
21:17:23 <martial> it is
21:17:28 <martial> (vote for me :) )
21:18:23 <b1airo> i think voting may have actually closed already
21:19:04 <janders> oops - looks like it has
21:19:44 <martial> really?
21:20:24 <janders> this feels like it was an unusually short window..
21:20:53 <janders> submissions only closed like a fortnight ago right?
21:21:07 <b1airo> yeah i'm always surprised at how quickly this seems to roll around relative to the last Summit
21:21:56 <b1airo> it does feel like the track finalisation could be pushed at least a couple of weeks closer to the actual Summit, but i'm no events manager (probably a good thing!)
21:22:24 <b1airo> janders: looks like there was a lot of community interest in your talk ;-)
21:22:48 <janders> thank you b1airo! :) good to hear
21:23:28 <janders> one of my RH contacts works with the organisers a fair bit, I will ask around about the motivation behind accelerating things
21:23:49 <janders> I think the Summit is reinventing itself a fair bit, so it's quite likely they will be changing their ways across the board
21:28:22 <b1airo> this is kind of off topic but janders you might have a pointer i can pass on - Mike Lowe was asking in a separate chat about tracking GPU utilisation against Slurm jobs, i.e., how to tell how much GPU a GPU job has actually used... i'm no Slurm guru but it doesn't look obvious
21:29:06 <b1airo> looks like it might require a job epilogue that uses nvidia-smi's accounting functionality
21:29:13 <janders> I don't have an answer off the top of my head but I will ask my Slurm guys and get back to you
21:29:41 <martial> I have a couple colleagues at NIST (where I am right now) that are interested in doing a lightning talk on "the use of OpenStack for hardware-specific ML evaluations"
21:30:06 <b1airo> throw it in the Etherpad martial
21:30:17 <janders> we use Bright and I believe they introduced some functionality to assist with GPU utilisation monitoring. Previously we had a homebrew solution based on nvidia-smi - that I know
21:30:22 <b1airo> thanks janders !
21:30:37 <janders> no worries
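As a rough illustration of the epilogue approach b1airo mentions above, here is a minimal Python sketch of a Slurm epilog that dumps nvidia-smi's per-process accounting stats tagged with the job ID. It assumes accounting mode was already enabled on each GPU ("nvidia-smi -am 1", which needs root); the log path is an assumption.

```python
#!/usr/bin/env python3
# Hypothetical Slurm epilog sketch: record per-process GPU accounting stats
# at job end. Assumes accounting mode is already on ("nvidia-smi -am 1").
import os
import subprocess

job_id = os.environ.get("SLURM_JOB_ID", "unknown")

# These query fields are real nvidia-smi accounting fields; they report
# utilisation and peak memory per accounted process since the counters
# were last cleared.
result = subprocess.run(
    ["nvidia-smi",
     "--query-accounted-apps=pid,gpu_utilization,mem_utilization,max_memory_usage,time",
     "--format=csv,noheader"],
    capture_output=True, text=True)

# /var/log/slurm/gpu_accounting.log is an assumed, epilog-writable path.
with open("/var/log/slurm/gpu_accounting.log", "a") as log:
    for line in result.stdout.splitlines():
        log.write(f"job={job_id} {line}\n")
```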
21:30:38 <martial> having them post it right now
21:31:41 <martial> stepping out for a few minutes
21:33:27 <b1airo> i think we are more-or-less done for today. couple of interesting things in the pipeline for upcoming meetings, but not much to discuss this week without a bigger crew
21:33:45 <janders> sounds good!
21:34:08 <janders> I put a generic "RDMA storage for OpenStack" talk proposal into the Etherpad
21:34:26 <janders> it will crystallise into something more concrete closer to the Summit
21:34:32 <b1airo> 👍
21:35:18 <janders> We're playing around with GPFS, I will have a look at BeeGFS too, and we've done some interesting long-range work that I should probably brief you on at some point - maybe at the Summit :)
21:35:26 <b1airo> i might be able to throw a higher-level talk together too, will see how things develop here...
21:36:09 <janders> great!
21:37:22 <b1airo> very interested in GPFS multi-tenancy. at this point i can't see a low-risk way of supporting sensitive-data research in a typical large shared/multi-user HPC environment, better to create dynamic clusters per data-set/project/group as needed
21:38:16 <janders> I was thinking along the lines of what Stig and his team did for BeeGFS
21:38:36 <janders> they allow access to the RDMA filesystem by sharing the private network that storage is connected to with "trusted" tenants
21:38:51 <janders> and the deal is - if you have native RDMA storage, you don't have root
21:38:59 <janders> (or at least this is my understanding)
21:39:36 <janders> However, having said that, BeeGFS-OND could offer some interesting ways of tackling this for Slurm running on bare metal OpenStack
21:40:03 <janders> and OND also means a brand new approach to resiliency
21:40:56 <b1airo> indeed - real scratch
21:41:00 <janders> (while we're quite deep into the resiliency discussion for a single 1PB scratch, balancing performance:capacity:uptime, the Cambridge guys just don't have this problem, running on-demand) :)
21:41:38 <janders> I'm not so much worried about losing data on scratch, more concerned that with no redundancy one NVMe dies and a good chunk (if not all) of HPC grinds to a halt..
21:41:42 <janders> that's expensive downtime
21:42:06 <b1airo> good point
21:42:11 <janders> now if each job has its own storage and a couple of per-job ephemeral filesystems die, who cares
21:42:13 <janders> resubmit jobs
21:42:14 <janders> sorry!
21:42:31 <janders> if it's 15x faster than the current storage the users will happily accept this
21:42:56 <janders> but - having said this - if you have a significant existing investment in GPFS that's less applicable
21:43:20 <janders> we're starting fresh in this particular field so can do any filesystem or a combination of a couple different ones
21:43:58 <janders> having the ability to provision high performance storage through OpenStack really helps with trying different things
21:44:33 <b1airo> yes, that's something i'm missing somewhat at the moment
21:46:16 <janders> ok - I don't have anything more to add - shall we wrap up?
21:48:57 <martial> (back)
21:49:01 <b1airo> yes, thanks for the chat janders . catch you next time!
21:49:06 <martial> sounds like we are ready to wrap
21:49:11 <b1airo> seems so
21:49:28 <martial> and entry added to the etherpad :)
21:49:29 <b1airo> i have a cake to bake (public holiday day off here)
21:49:50 <janders> Happy Waitangi Day! :)
21:50:05 <b1airo> thanks!
21:50:38 <b1airo> #endmeeting