11:00:47 <oneswig> #startmeeting scientific-sig
11:00:49 <openstack> Meeting started Wed Jun 19 11:00:47 2019 UTC and is due to finish in 60 minutes.  The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:50 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:52 <openstack> The meeting name has been set to 'scientific_sig'
11:01:00 <oneswig> Hello
11:01:13 <verdurin> Hello.
11:01:16 <oneswig> #link Agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_June_19th_2019
11:01:19 <janders> g'day all!
11:01:37 <oneswig> hi verdurin janders, glad you could make it
11:02:00 <oneswig> janders: have you seen martial at ISC?
11:02:15 <oneswig> #topic ISC roundup
11:02:29 <janders> oneswig: I haven't attended this year unfortunately
11:02:41 <oneswig> martial: guten tag :-)
11:02:42 <martial> Hello
11:02:48 <oneswig> #chair martial
11:02:49 <openstack> Current chairs: martial oneswig
11:02:55 <oneswig> janders: ah, for some reason I thought you were going
11:02:59 <oneswig> my mistake
11:03:06 <janders> few of my colleagues were there though
11:03:12 <janders> we scored a decent IO500 score :)
11:03:25 <oneswig> as did we :-)
11:03:36 <martial> janders: coming to Shanghai btw?
11:04:01 <oneswig> janders: bracewell?
11:04:09 <martial> Actually talked to the organizers by chance last night
11:04:15 <martial> Mentioned the BoF
11:04:27 <martial> Was told it is committee driven
11:04:47 <janders> that's right! :)
11:04:52 <oneswig> The Bof from ISC?
11:04:57 <martial> Yep
11:05:33 <janders> still thinking whether I should aim for Shanghai or not. The political climate right now is so so. Good destination though!
11:06:22 <martial> The Chinese teams here are interesting, they have a demo from Sugon too
11:06:36 <martial> Really nice
11:07:00 <oneswig> martial: what are they demonstrating?
11:07:25 <martial> Mostly their different data centers
11:07:58 <martial> And a sales team explains how you can use it in your setup
11:08:10 <janders> looking at IO500 - you have the fastest Lustre, while we have the fastest BeeGFS :)
11:08:28 <oneswig> Calling that evens janders?
11:08:37 <martial> Very interesting hardware
11:08:55 <martial> Very popular beegfs here
11:09:17 <janders> the 10 node challenge was close to even :)
11:09:55 <janders> on the main list, a different story though. Very well done!
11:09:56 <oneswig> I think a lot of the tuning at Cambridge was done on striping for large server configurations.
11:10:31 <oneswig> Thanks janders, although I don't think we were involved in the performance tuning at all, just the dynamic orchestration
11:10:36 <janders> Nearly double the score of the second place. That's something! :)
11:11:10 <oneswig> It was about half the score of the (then) top place in the last list, which goes to show what tuning can do for Lustre!
11:11:27 <janders> your good advice helped us out with making some early architectural decisions - so thank you! :)
11:11:30 <janders> indeed
11:11:47 <janders> pretty amazing IOPS numbers too - I didn't know Lustre can do that
11:12:13 <janders> aggregate bandwidth - hell yeah but IOPS? That's something quite new.
11:12:26 <martial> Am going to walk around and read ;)
11:13:14 <oneswig> martial: what's the news on hpc cloud at ISC?
11:13:20 <janders> storage aside, what were the ISC highlights?
11:13:23 <janders> exactly :)
11:14:41 <martial> Trend is toward Arm (very present), AMD's new processors and BeeGFS
11:14:53 <martial> Nvidia is not present which is surprising
11:15:35 <oneswig> not at all?
11:16:15 <martial> A couple of Nvidia people for talks but that is it
11:16:38 <oneswig> That's surprising to me
11:16:52 <martial> Same here
11:16:53 <janders> esp given the mlnx acquisition..
11:17:12 <martial> Mellanox is here
11:17:33 <martial> But no Nvidia "themselves"
11:19:43 <martial> Oracle, IBM, HP, Microsoft, ...
11:20:08 <oneswig> How much public cloud presence is there?
11:20:19 <martial> Google cloud
11:20:37 <martial> Aws
11:20:44 <martial> Microsoft
11:20:55 <oneswig> have Google come up with an HPC offering yet?  It's well overdue
11:21:15 <martial> Actually they had a booth talk about this yesterday
11:22:09 <martial> Am looking for the name
11:22:31 <HPCJohn> JohnH here. Interested in what Google are doing
11:22:32 <oneswig> While you're digging that up, let's shift over to HPCW
11:22:38 <martial> It provisions HPC workloads
11:22:48 <oneswig> Hi HPCJohn :-)
11:23:21 <oneswig> martial: was there anything like the Bof we proposed?
11:23:46 <martial> Nothing comparable
11:24:39 <oneswig> huh.
11:24:41 <janders> google HPC offering = k8s DCaaS? :)
11:25:18 <martial> #link https://cloud.google.com/solutions/hpc/
11:25:33 <martial> In the videos
11:25:51 <oneswig> janders: would love to see your BeeGFS tuning make its way into https://galaxy.ansible.com/stackhpc/beegfs
11:26:06 <martial> Basically it deploys Slurm and other tools and you can use it
11:26:28 <oneswig> martial: it's bobbins unless they've done something to improve the networking
11:26:37 <martial> As a regular cloud with HPC tools pre-installed
11:27:26 <janders> oneswig: nice! :) we're far behind with ansible-isation of BeeGFS but hoping to catch up :)
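(For reference, a minimal sketch of pulling in the stackhpc.beegfs Galaxy role linked above; the playbook and inventory group names are illustrative assumptions - the role's README is the authority for its variables.)

    # install the role from Ansible Galaxy (name taken from the link above)
    ansible-galaxy install stackhpc.beegfs

    # then apply it from a site playbook that assigns the role to the storage hosts;
    # "beegfs.yml" and the "beegfs_servers" group are hypothetical names
    ansible-playbook -i inventory beegfs.yml --limit beegfs_servers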
11:28:02 <martial> Seemed more of a single use case at the demo I saw, but it looks interesting
11:28:22 <martial> As for hpcw it is happening tomorrow
11:28:44 <martial> Quite an interesting program
11:28:54 <oneswig> martial: ah, so my round-up agenda item was premature!
11:28:58 <martial> With 5-minute slots for most presenters
11:29:24 <janders> yeah Thursdays at ISC can be quite interesting :)
11:29:25 <martial> It's okay, I should have access to all the slide decks
11:29:32 <martial> And should be able to share
11:29:33 <oneswig> martial: how many 5-minute presentations?
11:30:04 <martial> #link http://qnib.org/isc/
11:30:14 <martial> That's the program so far
11:30:18 <janders> sounds like a lightning-talk-storm over Frankfurt.. :)
11:30:53 <martial> Maybe I could add a slide on Kayobe
11:31:19 <oneswig> That would be cool martial, let me know if you need resources for that
11:31:30 <oneswig> So what's Christian up to - apart from hosting HPCW?
11:31:39 <HPCJohn> I know Christian. Good guy
11:32:06 <HPCJohn> If it is relevant, the latest version of Singularity came out yesterday. Lots of OCI features.
11:32:24 <HPCJohn> Also you can check a Singularity container directly out of a Docker registry such as Harbor, as I understand
11:33:10 <HPCJohn> The direct quote so I get it right: an oras:// URI allowing you to push and pull SIF files to supported OCI registries!
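(For context, a hedged sketch of the oras:// workflow quoted above; the registry host and image path are placeholders, and the registry must accept OCI artifacts.)

    # push a local SIF image to an OCI registry as an ORAS artifact
    singularity push mpi-app.sif oras://registry.example.com/myproject/mpi-app:1.0

    # pull it back onto another host
    singularity pull mpi-app.sif oras://registry.example.com/myproject/mpi-app:1.0

    # if the registry needs credentials, SINGULARITY_DOCKER_USERNAME and
    # SINGULARITY_DOCKER_PASSWORD can be exported before pushing/pulling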
11:33:33 <oneswig> I was talking to someone about the latest on containers, MPI, kubernetes, etc.
11:34:13 <martial> Christian is starting at Amazon in July I think
11:34:13 <oneswig> #link Latest on MPI configuration with Singularity and K8S https://pmix.org/wp-content/uploads/2019/04/PMIxSUG2019.pdf
11:35:03 <martial> Stig I will follow up on slack
11:35:21 <martial> A one page for Kayobe would be cool indeed
11:35:32 <oneswig> martial: I'll see what I can find for you
11:36:09 <oneswig> OK, so we can't cover HPCW just yet but I was definitely at the Sanger Centre OpenStack day yesterday...
11:36:21 <oneswig> #topic Sanger OpenStack day round-up
11:36:40 <oneswig> Had a good session there, some interesting talks
11:36:56 <verdurin> Missed it again... Eheu.
11:37:13 <oneswig> Gerry Reilly from Health Data Research UK gave a good high-level presentation on secure computing environments
11:37:51 <oneswig> Had very similar thoughts in the abstract to the kind of projects going on across the research computing / Openstack space
11:38:28 <oneswig> There were some interesting updates from the Sanger team themselves on their giant Ceph deployment, and plans for more science and more data on OpenStack
11:38:56 <oneswig> I mentioned you verdurin - in reference to your description of "sidegrades" a couple of years ago when you had your Crick hat on
11:39:01 <verdurin> Was at the HDRUK meeting last week when those environments were being discussed.
11:39:27 <verdurin> Gosh.
11:39:45 <oneswig> verdurin: interesting.  Is there any scope for alignment on these matters?
11:39:48 <martial> Stig sent you a pm on slack
11:40:23 <oneswig> Jani from Basel presented on something similar in Switzerland - SPHN - but with federation involved
11:41:17 <verdurin> There is, though my impression is that there's still a diversity of viewpoints.
11:42:20 <oneswig> Bruno from the Crick presented on eMedlab - as an early pioneer it has plenty of hindsight to offer
11:43:07 <oneswig> verdurin: it would be interesting to hear how those discussions develop.
11:43:41 <HPCJohn> Me too please - hindsight is a wonderful thing.
11:43:55 <HPCJohn> I have followed eMEdlab from early days
11:45:26 <oneswig> I hadn't realised there was 512G RAM in each hypervisor - that's quite something, particularly in a machine that's >3 years old
11:46:56 <oneswig> johnthetubaguy presented on upstream developments for research computing use cases and I presented on some recent Rocky upgrade experiences
11:47:57 <oneswig> It was good to hear an update on the scale of the JASMIN system and their new Mirantis OpenStack deployment too.
11:49:26 <oneswig> #topic AOB
11:49:32 <oneswig> So what else is new?
11:49:40 <HPCJohn> JASMIN is interesting. Scientists not using public cloud, but gathering research grants into a central facility
11:50:08 <oneswig> HPCJohn: It does appear to be a success story.
11:50:21 <HPCJohn> I can see that model being applied more widely - I guess in climate research there is no patient identifiable data!
11:51:08 <oneswig> That niche field intersecting climate science and genomics - seasonal affective disorder perhaps?
11:52:21 <janders> from my side - a little follow up from Denver
11:52:22 <janders> https://bugs.launchpad.net/neutron/+bug/1829449
11:52:23 <openstack> Launchpad bug 1829449 in neutron "Implement consistency check and self-healing for SDN-managed fabrics" [Wishlist,New]
11:52:28 <janders> some progress on the SDN front
11:52:56 <janders> tags: added: rfe-approved
11:53:03 <janders> thanks to those involved! :)
11:53:16 <oneswig> The glacier of development grinds ever onwards
11:54:02 <oneswig> Good work janders, keep it up!
11:54:08 <oneswig> We had an interesting problem using Docker and the overlay storage driver
11:54:38 <oneswig> It turns out that you can exceed 64K hard links to a single inode!
11:55:34 <HPCJohn> Some filesystems hide small files inside the inode
11:55:47 <HPCJohn> You are turning this on its head - hiding a filesystem in the inode
11:55:49 <oneswig> In yum, there is a file called checksum_type in the metadata of each package which is hard-linked to one source - i.e. the hard link count is a function of the number of packages installed
11:56:20 <oneswig> The layers of Docker overlays, and multiple containers using the same base layers will compound this.
11:56:33 <oneswig> Hence we hit 65536 links - the limit for ext4
11:56:40 <oneswig> Solution: XFS allows more links :-)
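(A quick shell sketch for spotting the same problem on a host; the yumdb and overlay2 paths are the usual defaults but may differ, and the 60000 threshold is just an arbitrary margin below the ext4 cap.)

    # report the hard link counts of the per-package checksum_type files oneswig mentions
    stat -c '%h %n' /var/lib/yum/yumdb/*/*/checksum_type 2>/dev/null | sort -n | tail

    # list any file under Docker's overlay storage whose link count is nearing the ext4 limit
    find /var/lib/docker/overlay2 -xdev -type f -links +60000 -printf '%n %p\n'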
11:57:20 <janders> I will have that with triple-nested VMs, thanks :)
11:57:34 <oneswig> The unexpected consequences of creating many virtual systems from one physical one.
11:58:06 <oneswig> We are nearly at the hour, any more to add?
11:58:22 <janders> some time back I was trying to convince mlnx to implement SR-IOV in nested-virt, but they didn't share my enthusiasm :)
11:58:30 <oneswig> janders: we may start work soon on HA support for Mellanox-NEO-UFM SDN
11:58:54 <janders> cool! I'm heading in that direction, too
11:59:08 <oneswig> see you at the first hurdle then :-)
11:59:16 <janders> just hitting some roadblocks lower down the stack with new hardware
11:59:19 <janders> that's right! :)
11:59:23 <janders> no rest for the wicked
11:59:32 <martial> Cool, talk soon team
11:59:40 <oneswig> And on that note, back to it.
11:59:42 <janders> thank you all
11:59:45 <oneswig> Thanks all
11:59:48 <oneswig> #endmeeting