21:00:21 <oneswig> #startmeeting scientific-sig
21:00:22 <openstack> Meeting started Tue Jul 24 21:00:21 2018 UTC and is due to finish in 60 minutes.  The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:23 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:26 <openstack> The meeting name has been set to 'scientific_sig'
21:00:34 <oneswig> I even spelled it right
21:00:41 <janders> g'day all! :)
21:00:46 <oneswig> #link Today's agenda https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_July_24th_2018
21:00:52 <oneswig> Hey janders!
21:01:00 <trandles> o/
21:01:09 <oneswig> Tim - you made it - bravo
21:01:14 <oneswig> How is PEARC18?
21:01:33 <trandles> it's good...I'm only here today :P
21:01:42 <oneswig> you were in the UK last week, right?
21:01:43 <trandles> Mike and Martial should make it too
21:01:47 <oneswig> How was that?
21:01:47 <trandles> yeah
21:01:56 <trandles> busy and tiring
21:01:58 <trandles> long days
21:02:08 <oneswig> It's ruddy hot over here right now, you'd have fitted right in.
21:02:09 <janders> thanks for the voting reminder! :)
21:02:32 <oneswig> Would have been just like that time you had no AC :-)
21:02:44 <oneswig> janders: ah, right, lets get onto that.
21:02:52 <oneswig> #topic Voting closes thursday
21:02:55 <trandles> I used to think we over-air conditioned in the US...no more
21:03:41 <oneswig> trandles: I'm melting over here...
21:04:06 <oneswig> We visited CERN last week.  Big surprise there was that the offices (in Geneva) have no AC.
21:04:25 <trandles> I'm impressed with the beers from Tiny Rebel btw.  Wish we could get them in the states.
21:04:49 <oneswig> Tiny rebel?  I'll look out for it.  Whereabouts was this?
21:05:20 <janders> whoa... it was close to 40°C last time I visited. Would be painful to work in offices without AC if that were to last for days..
21:05:35 <trandles_> sorry, wifi here is really dodgy
21:05:51 <oneswig> trandles_: where's the conference?
21:06:00 <trandles_> Pittsburgh
21:06:04 <martial__> I am seriously waiting for IRC to add 5 ___ to my nick at some point :)
21:06:11 <martial__> hey Stig
21:06:13 <oneswig> Hey martial__, welcome
21:06:16 <trandles_> expect jmlowe, martial__, me to have connection problems
21:06:25 <oneswig> #chair martial__
21:06:26 <openstack> Current chairs: martial__ oneswig
21:06:29 <jmlowe> Not me, different hotel
21:06:32 <martial__> well Mr Randles, long time no see :)
21:06:37 <trandles_> lol
21:06:42 <oneswig> Hi jmlowe
21:06:47 <jmlowe> Hey Stig
21:06:57 <oneswig> How was the panel - is this filmed?
21:06:59 <martial__> Mike: too easy
21:07:04 <martial__> not filmed
21:07:09 <martial__> I think it went well
21:07:16 <trandles_> oneswig: Tiny Rebel is just north of Cardiff I think
21:07:21 <oneswig> tough crowd?
21:07:35 <oneswig> trandles_: not too far at all then.  I'll keep an eye out for it.  Thanks
21:07:40 <martial__> #link https://etherpad.openstack.org/p/pearc18-panel
21:07:52 <martial__> here are the questions we went through
21:07:59 <martial__> (well most of them anyhow)
21:08:00 <trandles_> ah, it's in Newport
21:08:05 <janders> stig: I see you've got some _really_ cool presos submitted
21:08:14 <martial__> yep reused the Etherpad method ... worked well in truth
21:08:17 <oneswig> I know those people :-)
21:08:40 <oneswig> janders: thanks!  Have you got a link to yours?
21:09:05 <janders> https://www.openstack.org/summit/berlin-2018/vote-for-speakers#/22223
21:09:18 <janders> https://www.openstack.org/summit/berlin-2018/vote-for-speakers#/22219
21:09:32 <janders> https://www.openstack.org/summit/berlin-2018/vote-for-speakers#/22164
21:09:33 <oneswig> I'm guessing most of the NSF folks are doing SC instead of Berlin, so this is of passing interest, but voting for sessions closes Thursday
21:10:11 <martial__> 3x really good "I would love to see topics" from me
21:10:43 <martial__> but no favoritism ... send your proposal in :)
21:10:46 <oneswig> Erez and Moshe up with you again, eh?  Like the 3 Musketeers :-)
21:10:58 <janders> yes :)
21:11:29 <janders> though for the nova-compute on Ironic we team up with RHAT for a change
21:12:05 <oneswig> janders: It sounds like this trick is in use at CERN also - talk to Arne Wiebalck about it
21:12:29 <janders> speaking of.. sorry, I was really busy and never followed up with John on the compute/ironic bit. Are you back in the office, oneswig?
21:12:43 <oneswig> janders: did you get it doing everything you wanted, and what are the limitations?  Don't keep us hanging until November!
21:12:51 <oneswig> Right now I'm in Cambridge, but close enough
21:13:14 <oneswig> Here's the talks from our team
21:13:19 <janders> :) I will try to restart that email thread this week or early next
21:13:21 <oneswig> Doug: https://www.openstack.org/summit/berlin-2018/vote-for-speakers/#/22078
21:13:21 <oneswig> Stig: https://www.openstack.org/summit/berlin-2018/vote-for-speakers/#/22446
21:13:21 <oneswig> Stig: https://www.openstack.org/summit/berlin-2018/vote-for-speakers/#/22233
21:13:23 <oneswig> Mark: https://www.openstack.org/summit/berlin-2018/vote-for-speakers/#/22454
21:13:25 <oneswig> Mark: https://www.openstack.org/summit/berlin-2018/vote-for-speakers/#/22579
21:13:27 <oneswig> John: https://www.openstack.org/summit/berlin-2018/vote-for-speakers/#/22438
21:14:40 <oneswig> There's also some interesting developments on various fronts around preemptible instances
21:15:03 <oneswig> priteau pointed out a talk from the Blazar core on delivering these within that context.
21:15:27 <oneswig> janders: please do, I'm sure john would be interested.
21:15:40 <janders> oneswig: that's some really excellent stuff. It's all great: Monasca, preemptible instances and container networking are supercool
21:15:52 <martial__> +3'ed all
21:16:00 <martial__> next? :)
21:16:01 <oneswig> ... in each case, when they work! :-)
21:16:08 <oneswig> Good plan, thanks martial__
21:16:15 <oneswig> #topic Ansible+OpenHPC
21:16:26 <oneswig> A topic of longstanding interest over here.
21:16:38 <janders> yes! :)
21:17:01 <oneswig> You guys in Pittsburgh, right?  How about calling in at Santa Clara on your way home...
21:17:17 <oneswig> #link OpenHPC event https://lists.openhpc.community/g/main/message/4
21:17:50 <oneswig> BTW that link's not an error.  This was apparently the 4th message this year on the OpenHPC mailing list
21:19:06 <martial__> there was a presentation on OpenHPC yesterday I think
21:19:07 <oneswig> There's some playbooks online already: https://github.com/Linaro/ansible-playbook-for-ohpc
21:19:22 <oneswig> Apparently following the Warewulf-style deploy process.
21:20:05 <oneswig> What I'm hoping is that once the infrastructure is deployed, there's a natural point where OpenStack-provisioned infrastructure could dovetail into the flow
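[A minimal sketch of the dovetail described above, assuming Ansible's stock OpenStack dynamic inventory script and the Linaro playbooks linked above; the playbook and group names are placeholders rather than the repository's actual layout:]

    # Clone the OpenHPC playbooks, then point them at nodes OpenStack has just provisioned,
    # using a dynamic inventory instead of a static hosts file.
    git clone https://github.com/Linaro/ansible-playbook-for-ohpc
    cd ansible-playbook-for-ohpc
    ansible-playbook -i /path/to/openstack_inventory.py site.yml --limit compute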
21:20:16 <martial__> Tim, Mike: did Evan not speak about this in our panel?
21:20:41 <trandles> um...yeah kinda
21:21:24 <oneswig> They are using OpenHPC at Minnesota?
21:21:25 <jmlowe> in passing
21:21:43 <jmlowe> I think he went to the talk yesterday
21:22:05 <martial__> was looking for it in the agenda, not sure it is there
21:22:13 <oneswig> A talk on OpenHPC?
21:22:17 <jmlowe> yes
21:22:21 <trandles> tbh I'm still not sure what OpenHPC provides of value other than a collection of things you might want on your cluster
21:23:35 <oneswig> They even have this: http://build.openhpc.community/OpenHPC:/1.3:/Update6:/Factory/CentOS_7/x86_64/charliecloud-ohpc-0.9.0-4.1.ohpc.1.3.6.x86_64.rpm - what more could you want?
21:23:57 <trandles> lol, touché
21:24:16 <trandles> but you can get that via "git clone https://github.com/hpc/charliecloud"
21:24:28 <martial__> I'll +3 this one too :)
21:24:30 <oneswig> trandles: we like it because it's a lot easier to automate the deploy and configure
21:24:52 <oneswig> someone's gone to the trouble of building and packaging. Maybe even testing
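[For reference, the packaged route alongside the git clone trandles mentions above; this assumes the OpenHPC repository from the link earlier is already enabled on the node:]

    # OpenHPC's build of Charliecloud, per the RPM linked earlier
    yum -y install charliecloud-ohpc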
21:25:49 <trandles> I'm leaning more and more to the side of containerize the world, stop provisioning clusters, use Spack to manage build and runtime complexities
21:25:56 <janders> oneswig: trandles: I don't have much hands-on with OHPC but my hope is to simplify things - using Bright Cluster Manager is cool, but creates a network of dependencies from hell and at times creates almost as many problems as it solves..
21:28:00 <oneswig> trandles: we've got some nodes running Fedora Atomic.  There's basically no other way with those nodes.  We are looking at containerising BeeGFS OSS etc. to create something hyperconverged
21:28:53 <trandles> feels like a sane way to manage a cluster in my opinion
21:29:21 <oneswig> I'd prefer to be all-in on one approach or the other.
21:29:32 <oneswig> but I don't disagree with you.
21:30:16 <trandles> looking at the tools available today, especially the cloud stuff for both hardware management and runtime portability, it seems like we can do a better job managing our HPC clusters
21:30:42 <trandles> or maybe I'm just jetlagged and tired
21:31:10 <oneswig> trandles: on a related note, there was some recent interest here in running OpenMPI jobs in Kubernetes.  There's some prior work on this that passed our way.  Ever seen this work?
21:31:35 <trandles> we haven't looked at k8s for much more than deploying services
21:31:47 <trandles> we have looked a lot at OpenMPI + Linux namespaces
21:32:00 <oneswig> I'm hoping to understand the capabilities - and limitations
21:32:11 <trandles> OpenMPI has a large amount of internal voodoo
21:32:17 <oneswig> (especially if my talk goes through...)
21:32:26 <oneswig> trandles: it's not small!
21:32:42 <janders> oneswig: does your talk touch on RDMA in containers?
21:32:58 <trandles> it plays games in the name of efficiency and those games break with certain namespace constraints
21:33:14 <trandles> but we've run 1000+ node, 10000+ rank OpenMPI jobs using Charliecloud
21:33:19 <oneswig> janders: We need that.  We need to understand what's hackery-bodgery vs what's designed for this purpose
21:33:43 <trandles> RDMA in containers is no different than RDMA on bare metal, is it?
21:33:53 <janders> oneswig: +1
21:34:06 <oneswig> trandles: How does Charliecloud work with (eg) orted?  Does that run on the host, outside the container?
21:34:29 <trandles> depends on how everything is built
21:34:35 <janders> trandles: last time I checked (which was a few months back) there were some issues with running multiple RDMA-enabled containers on one bare-metal node
21:34:46 <oneswig> trandles: I'm not sure how it works with the /dev objects that get opened for RDMA access
21:34:49 <trandles> the easiest is building your resource manager with PMI support, then no orted is required
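[A minimal sketch of that PMI route, assuming Slurm built with PMI2 support and a Charliecloud image tarball already staged on the nodes; the image and binary names are made up:]

    # Unpack the container image (a tarball exported from e.g. Docker) into a plain directory tree
    ch-tar2dir ./openmpi-app.tar.gz /var/tmp
    # One rank per task; Slurm's PMI2 plugin wires up Open MPI directly, so no orted is launched
    srun --mpi=pmi2 -N 4 -n 144 ch-run /var/tmp/openmpi-app -- /usr/local/bin/mpi_hello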
21:35:14 <trandles> janders: probably depends on the container runtime
21:35:16 <janders> oneswig: exactly... the /dev challenge..
21:35:28 <trandles> Charliecloud bind mounts /dev into your namespace
21:35:48 <trandles> docker does the wrong thing for HPC (IMO) by creating its own /dev entries
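[For contrast, a sketch of what RDMA access in a stock Docker container usually takes: passing the verbs device nodes and the memlock capability through explicitly. Device names vary by HCA, so treat these as illustrative:]

    # Hand the InfiniBand verbs and RDMA-CM devices to the container and allow pinned memory
    docker run --rm -it \
      --device=/dev/infiniband/uverbs0 \
      --device=/dev/infiniband/rdma_cm \
      --cap-add=IPC_LOCK --ulimit memlock=-1 \
      centos:7 bash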
21:35:51 <oneswig> trandles: thanks.  Good to know.  I'm wondering how K8S + OpenMPI achieves this
21:35:56 <oneswig> (ie, without slurm)
21:36:13 <oneswig> trandles: is that the private devices thing?
21:36:29 <trandles> we can launch using mpirun but you start to have OpenMPI version mismatch issues inside vs. outside
21:36:56 <janders> trandles: which runtime is better than docker for RDMA in containers?
21:37:13 <trandles> our position is, for the HPC use case, you want as little abstraction/isolation as possible
21:37:38 <trandles> Shifter and Charliecloud both have full RDMA support
21:37:55 <trandles> I assume Singularity does too, but I haven't looked
21:37:55 <janders> trandles: great, thank you
21:38:21 <trandles> at one point Singularity was playing games to pass things between the container and the bare metal, but Shifter and Charliecloud do not
21:38:41 <oneswig> trandles: Can we pin you down to a date to talk this over in great depth?
21:38:59 <trandles> oneswig: would like that, yes
21:39:08 <janders> +1!!!
21:39:32 <trandles> August 7 works
21:39:49 <trandles> (no travel in August!!! :) )
21:40:33 <oneswig> Excellent - although I'll be unable to join you that day (holidays)
21:40:50 <trandles> 21st works too
21:41:02 <oneswig> Perfect for me :-)
21:41:07 <trandles> or I could do a separate WebEx for StackHPC ;)
21:41:57 <oneswig> trandles: we really need to get this installed in our OpenHPC environment before we can ask useful questions
21:42:38 <martial__> Aug 21, will publicize it
21:42:56 <oneswig> On the Ansible+OpenHPC side, I'm hoping to gather a few interested sites together
21:43:00 <martial__> maybe we can do a video meetup
21:43:15 <oneswig> janders: sounds like you're in?
21:43:19 <janders> sounds great!
21:43:27 <janders> I can organise a Google Meet if that is of interest
21:43:44 <oneswig> martial__: could be good.  Then Tim could end by singing the Charliecloud Song
21:43:48 <martial__> I can host a goto meeting too
21:44:05 <martial__> 101 users enough you think?
21:44:15 <trandles> <damn, need to write a song>
21:44:34 <martial__> (hey isn't that your plan with Mike in a few minutes? :) )
21:45:34 <oneswig> martial__: trandles: jmlowe: if you see Evan Bollig later, can you gauge his interest in Ansible+OpenHPC?
21:45:45 <jmlowe> Will do
21:46:20 <oneswig> Thanks!
21:46:42 <oneswig> OK, I'll get that going in the next few days.  Good stuff.
21:47:12 <oneswig> Let's move on.
21:47:20 <oneswig> #topic Ceph days Berlin
21:47:42 <oneswig> Hooray, if you were worried that a 3-day OpenStack summit was too short, I have the solution for you
21:48:14 <janders> excellent. I will aim to rock up early for this event :)
21:48:16 <trandles> \o/
21:48:19 <oneswig> #link Ceph day Berlin, Monday 12 November https://ceph.com/cephdays/ceph-day-berlin/
21:48:41 <oneswig> janders: What better antidote to jet lag?
21:48:42 <jmlowe> Side note: I'm starting to really like the CephFS Manila Ganesha NFS setup I have going now.
21:48:45 <janders> ...and hopefully good RDMA support will arrive even before me... :)
21:49:29 <oneswig> jmlowe: Ganesha is the piece we've not used - our clients are given slightly more trust and access CephFS directly
21:49:41 <janders> jmlowe: very interesting! can you tell us a bit more about it? (use cases, performance, security)
21:49:47 <oneswig> I'm noting your positive experience...
21:50:05 <janders> sounds like another great topic for a presentation here? :)
21:50:28 <oneswig> janders: +1 from me
21:50:46 <martial__> janders: so you are offering to do it, very well I will remember this ;)
21:51:05 <jmlowe> security is IP address in exports, ebtables prevents eavesdropping, random UUID for export path as a kind of shared secret
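[On the client side, that model is just a plain NFS mount of the per-share export location Manila reports; the address and UUID-style path here are made up for illustration:]

    # Mount the Ganesha export inside the guest, using the share's reported export location
    mount -t nfs 10.0.0.5:/volumes/_nogroup/2b6e1d3a-8f55-4d9e-9d6e-1a2b3c4d5e6f /mnt/share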
21:51:53 <janders> martial: unfortunately I don't have much to say on this topic for the time being
21:51:58 <jmlowe> performance-wise, in relatively limited testing I can max out my 10GigE
21:52:28 <oneswig> jmlowe: that's better than I'd expected, nice job.
21:53:06 <oneswig> Ganesha is running locally on the hypervisor, right?  What's the route between Ganesha and the client VM - is it that virtio-vsock thing?
21:53:19 <jmlowe> I was pleasantly surprised, metadata on an NVMe-backed pool
21:53:37 <janders> jmlowe: maxing out 10GE - is that over multiple clients, or can a single client reach that?
21:53:43 <jmlowe> 264 spinning-rust BlueStore OSDs with Mimic
21:53:48 <jmlowe> single client
21:53:55 <janders> impressive!
21:54:19 <oneswig> jmlowe: I've been cursing mimic recently due to some weirdness with ceph-volume failing to deploy new OSDs
21:54:36 <janders> +1 to oneswig's VIRTIO_SOCK question
21:54:53 <jmlowe> !!!! I'm waiting on 13.2.1 so I can get a failed disk back in
21:54:55 <openstack> jmlowe: Error: "!!!" is not a valid command.
21:55:39 <oneswig> silly old openstack...
21:55:54 <oneswig> OK, we should wrap up...
21:55:58 <oneswig> #topic AOB
21:56:05 <oneswig> So what else is new?
21:56:35 <janders> UFM6 is out, seems to work for SDN/IB
21:56:55 <janders> I need to track down some dependency issues but it's most likely not UFM's fault
21:56:58 <oneswig> I talked about Ceph RDMA with the team at CERN, and also MeerKAT in South Africa.  There's plenty of interest in this, in the right places
21:57:10 <janders> excellent to hear!
21:57:15 <trandles> janders: I haven't forgotten about IB + SDN, I've just been too busy with travel
21:57:19 <oneswig> janders: what changes did you need to apply for new UFM?
21:57:33 <janders> oneswig: none. Drop-in replacement.
21:57:38 <oneswig> phew
21:57:49 <janders> but more detailed testing to follow
21:58:02 <jmlowe> sysbench, 16 threads, random: written 13560.71 MiB/s, read 14416.96 MiB/s
21:58:02 <oneswig> That chain of services is long enough without being cranky along with it
21:58:03 <janders> I think 6 has some resiliency enhancements
21:58:28 <janders> though I don't think the uppercase/lowercase thing made it into UFM6 GA
21:59:14 <janders> had some issues with uppercase GUIDs in the past, but that's probably not worth discussing here
21:59:15 <oneswig> janders: thanks for the heads up on that, seems like we (luckily) dodged a howler on our setup
21:59:32 <oneswig> jmlowe: > 14 GB/s?
21:59:33 <janders> trandles: no worries, feel free to reach out any time
21:59:50 <jmlowe> there's some caching in there
22:00:01 <janders> oneswig: our blade chassis only speak uppercase for MACs/GUIDs hence we hit this at full speed. It hurt :)
22:00:02 <jmlowe> too short of a test to fill them
22:00:17 <oneswig> Ah, thanks
22:00:23 <jmlowe> 'sysbench --test=fileio --file-fsync-freq=0 --file-test-mode=rndrd  --num-threads=16 run'
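[For anyone reproducing that number: the fileio test needs a prepare step to lay down its working files first, and the file set should exceed client RAM to keep the page cache honest. A sketch using the same legacy CLI syntax as the command above, with an illustrative size:]

    # Create the test file set (make --file-total-size larger than RAM to dodge caching)
    sysbench --test=fileio --file-total-size=128G prepare
    # Random-read run with 16 threads and fsync disabled, matching the flags above
    sysbench --test=fileio --file-total-size=128G --file-test-mode=rndrd --file-fsync-freq=0 --num-threads=16 run
    # Remove the test files afterwards
    sysbench --test=fileio --file-total-size=128G cleanup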
22:00:27 <oneswig> Sorry y'all, we are at time
22:00:43 <janders> Thank you all! Don't forget to cast your votes
22:00:45 <oneswig> PEARC18 folks, have a good conference
22:00:51 <janders> I will put in mine as soon as I get into the office
22:00:52 <martial__> thanks Stig
22:00:56 <oneswig> #endmeeting