11:00:47 #startmeeting scientific-sig
11:00:49 Meeting started Wed Jun 19 11:00:47 2019 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:50 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:52 The meeting name has been set to 'scientific_sig'
11:01:00 Hello
11:01:13 Hello.
11:01:16 #link Agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_June_19th_2019
11:01:19 g'day all!
11:01:37 hi verdurin janders, glad you could make it
11:02:00 janders: have you seen martial at ISC?
11:02:15 #topic ISC roundup
11:02:29 oneswig: I haven't attended this year unfortunately
11:02:41 martial: guten tag :-)
11:02:42 Hello
11:02:48 #chair martial
11:02:49 Current chairs: martial oneswig
11:02:55 janders: ah, for some reason I thought you were going
11:02:59 my mistake
11:03:06 few of my colleagues were there though
11:03:12 we scored a decent IO500 score :)
11:03:25 as did we :-)
11:03:36 janders: coming to Shanghai btw?
11:04:01 janders: bracewell?
11:04:09 Actually talked to the organizers by chance last night
11:04:15 Mentioned the BoF
11:04:27 Was told it is committee driven
11:04:47 that's right! :)
11:04:52 The BoF from ISC?
11:04:57 Yep
11:05:33 still thinking whether I should aim for Shanghai or not. The political climate right now is so-so. Good destination though!
11:06:22 The Chinese teams here are interesting, they have a demo of Sugon too
11:06:36 Really nice
11:07:00 martial: what are they demonstrating?
11:07:25 Mostly their different data centers
11:07:58 And a sales team explains how you can use it in your setup
11:08:10 looking at IO500 - you have the fastest Lustre, while we have the fastest BeeGFS :)
11:08:28 Calling that evens janders?
11:08:37 Very interesting hardware
11:08:55 BeeGFS is very popular here
11:09:17 the 10-node challenge was close to even :)
11:09:55 on the main list, a different story though. Very well done!
11:09:56 I think a lot of the tuning at Cambridge was done on striping for large server configurations.
11:10:31 Thanks janders, although I don't think we were involved in the performance tuning at all, just the dynamic orchestration
11:10:36 Nearly double the score of second place. That's something! :)
11:11:10 It was about half the score of the (then) top place in the last list, which goes to show what tuning can do for Lustre!
11:11:27 your good advice helped us out with making some early architectural decisions - so thank you! :)
11:11:30 indeed
11:11:47 pretty amazing IOPS numbers too - I didn't know Lustre could do that
11:12:13 aggregate bandwidth - hell yeah, but IOPS? That's something quite new.
11:12:26 Am going to walk around and read ;)
11:13:14 martial: what's the news on HPC cloud at ISC?
11:13:20 storage aside, what were the ISC highlights?
11:13:23 exactly :)
11:14:41 Trend is toward Arm (very present), AMD's new processors and BeeGFS
11:14:53 Nvidia is not present, which is surprising
11:15:35 not at all?
11:16:15 A couple of Nvidia people for talks but that is it
11:16:38 That's surprising to me
11:16:52 Same here
11:16:53 esp given the mlnx acquisition..
11:17:12 Mellanox is here
11:17:33 But no Nvidia "themselves"
11:19:43 Oracle, IBM, HP, Microsoft, ...
11:20:08 How much public cloud presence is there?
11:20:19 Google Cloud
11:20:37 AWS
11:20:44 Microsoft
11:20:55 have Google come up with an HPC offering yet? It's well overdue
11:21:15 Actually they had a booth talk about this yesterday
11:22:09 Am looking for the name
11:22:31 JohnH here. Interested in what Google are doing
11:22:32 While you're digging that up, let's shift over to HPCW
11:22:38 It provisions HPC workloads
11:22:48 Hi HPCJohn :-)
11:23:21 martial: was there anything like the BoF we proposed?
11:23:46 Nothing comparable
11:24:39 huh.
11:24:41 google HPC offering = k8s DCaaS? :)
11:25:18 #link https://cloud.google.com/solutions/hpc/
11:25:33 In the videos
11:25:51 janders: would love to see your BeeGFS tuning make its way into https://galaxy.ansible.com/stackhpc/beegfs
11:26:06 Basically deploys Slurm and other tools and you can use it
11:26:28 martial: it's bobbins unless they've done something to improve the networking
11:26:37 As a regular cloud with HPC tools pre-installed
11:27:26 oneswig: nice! :) we're far behind with ansible-isation of BeeGFS but hoping to catch up :)
11:28:02 Seems more of a use case at the demo I saw, but seems interesting
11:28:22 As for HPCW, it is happening tomorrow
11:28:44 Quite an interesting program
11:28:54 martial: ah, so my round-up agenda item was premature!
11:28:58 With 5-minute slots for most presenters
11:29:24 yeah Thursdays at ISC can be quite interesting :)
11:29:25 It's okay, I should have access to all the slide decks
11:29:32 And should be able to share
11:29:33 martial: how many 5-minute presentations?
11:30:04 #link http://qnib.org/isc/
11:30:14 That's the program so far
11:30:18 sounds like a lightning-talk-storm over Frankfurt.. :)
11:30:53 Maybe I could add a slide on Kayobe
11:31:19 That would be cool martial, let me know if you need resources for that
11:31:30 So what's Christian up to - apart from hosting HPCW?
11:31:39 I know Christian. Good guy
11:32:06 If it is relevant, the latest version of Singularity is out yesterday. Lots of OCI features.
11:32:24 Also you can check a Singularity container directly out of a Docker harbour, as I understand
11:33:10 The direct quote so I get it right: an oras:// URI allowing you to push and pull SIF files to supported OCI registries!
11:33:33 I was talking to someone about the latest on containers, MPI, Kubernetes, etc.
11:34:13 Christian is starting at Amazon in July I think
11:34:13 #link Latest on MPI configuration with Singularity and K8S https://pmix.org/wp-content/uploads/2019/04/PMIxSUG2019.pdf
11:35:03 Stig I will follow up on Slack
11:35:21 A one-pager for Kayobe would be cool indeed
11:35:32 martial: I'll see what I can find for you
11:36:09 OK, so we can't cover HPCW just yet but I was definitely at the Sanger Centre OpenStack day yesterday...
11:36:21 #topic Sanger OpenStack day round-up
11:36:40 Had a good session there, some interesting talks
11:36:56 Missed it again... Eheu.
11:37:13 Gerry Reilly from Health Data Research UK gave a good high-level presentation on secure computing environments
11:37:51 Had very similar thoughts in the abstract to the kind of projects going on across the research computing / OpenStack space
11:38:28 There was some interesting update from the Sanger team themselves on their giant Ceph deployment, and plans for more science and more data on OpenStack
11:38:56 I mentioned you verdurin - in reference to your description of "sidegrades" a couple of years ago when you had your Crick hat on
11:39:01 Was at the HDRUK meeting last week when those environments were being discussed.
11:39:27 Gosh.
11:39:45 verdurin: interesting. Is there any scope for alignment on these matters?
11:39:48 Stig, sent you a PM on Slack
11:40:23 Jani from Basel presented on something similar in Switzerland - SPHN - but with federation involved
11:41:17 There is, my impression is that there's still a diversity of viewpoints.
11:42:20 Bruno from the Crick presented on eMedlab - as an early pioneer it has plenty of hindsight to offer
11:43:07 verdurin: it would be interesting to hear how those discussions develop.
11:43:41 Me too please - hindsight is a wonderful thing.
11:43:55 I have followed eMedlab from the early days
11:45:26 I hadn't realised there was 512G RAM in each hypervisor - that's quite something, particularly in a machine that's >3 years old
11:46:56 johnthetubaguy presented on upstream developments for research computing use cases and I presented on some recent Rocky upgrade experiences
11:47:57 It was good to hear an update on the scale of the JASMIN system and their new Mirantis OpenStack deployment too.
11:49:26 #topic AOB
11:49:32 So what else is new?
11:49:40 JASMIN interesting. Scientists not using public cloud. But gathering research grants into a central facility
11:50:08 HPCJohn: It does appear to be a success story.
11:50:21 I can see that model being applied more widely - I guess in climate research there is no patient-identifiable data!
11:51:08 That niche field intersecting climate science and genomics - seasonally-affected disorder perhaps?
11:52:21 from my side - a little follow-up from Denver
11:52:22 https://bugs.launchpad.net/neutron/+bug/1829449
11:52:23 Launchpad bug 1829449 in neutron "Implement consistency check and self-healing for SDN-managed fabrics" [Wishlist,New]
11:52:28 some progress on the SDN front
11:52:56 tags: added: rfe-approved
11:53:03 thanks to those involved! :)
11:53:16 The glacier of development grinds ever onwards
11:54:02 Good work janders, keep it up!
11:54:08 We had an interesting problem using Docker and the overlay storage driver
11:54:38 It turns out that you can exceed 64K hard links to a single inode!
11:55:34 Some filesystems hide small files inside the inode
11:55:47 You are turning this on its head - hiding a filesystem in the inode
11:55:49 In yum, there is a file called checksum_type in the metadata of each package which is hard-linked to one source, i.e. the hard link count is a function of the number of packages installed
11:56:20 The layers of Docker overlays, and multiple containers using the same base layers, will compound this.
11:56:33 Hence we hit 65536 links - the limit for ext4
11:56:40 Solution: XFS allows more links :-)
11:57:20 I will have that with triple-nested VMs, thanks :)
11:57:34 The unexpected consequences of creating many virtual systems from one physical one.
11:58:06 We are nearly at the hour, any more to add?
11:58:22 some time back I was trying to convince mlnx to implement SR-IOV in nested-virt, but they didn't share my enthusiasm :)
11:58:30 janders: we may start work soon on HA support for Mellanox-NEO-UFM SDN
11:58:54 cool! I'm heading in that direction, too
11:59:08 see you at the first hurdle then :-)
11:59:16 just hitting some roadblocks lower down the stack with new hardware
11:59:19 that's right! :)
11:59:23 no rest for the wicked
11:59:32 Cool, talk soon team
11:59:40 And on that note, back to it.
11:59:42 thank you all
11:59:45 Thanks all
11:59:48 #endmeeting
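
A minimal sketch of the per-inode hard-link ceiling discussed under AOB, assuming only the Python standard library; the file names and temporary directory below are illustrative, not from the meeting. On an ext4 mount the loop should stop with EMLINK around the limit mentioned above, while on XFS the cap is far higher, so expect to interrupt it there.

#!/usr/bin/env python3
# Rough sketch (not from the meeting): probe a filesystem's per-inode
# hard-link limit by linking one file until link() is refused.
# The temporary directory is created under the current working directory
# so the test exercises the filesystem you are actually on.
import errno
import os
import tempfile

def max_hard_links(workdir):
    target = os.path.join(workdir, "target")
    open(target, "w").close()      # the single inode everything links to
    count = 1                      # the original name is the first link
    while True:
        try:
            os.link(target, os.path.join(workdir, "link-%d" % count))
            count += 1
        except OSError as exc:
            if exc.errno == errno.EMLINK:   # per-inode link limit reached
                return count
            raise

if __name__ == "__main__":
    with tempfile.TemporaryDirectory(dir=".") as d:
        print("hard-link limit on this filesystem:", max_hard_links(d))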