21:00:26 #startmeeting scientific-sig
21:00:26 Meeting started Tue Jun 12 21:00:26 2018 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:27 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:30 The meeting name has been set to 'scientific_sig'
21:00:45 Think I got the spelling right this week...
21:00:59 #link Agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_June_12th_2018
21:02:33 g'day everyone
21:02:41 Hello - Jacob?
21:02:53 yes, that's me :)
21:03:03 how are you Stig?
21:03:14 Good, but busy...
21:03:36 Rolling out a deploy of bare metal + IB, as it happens...
21:03:49 (although we are not onto the interesting bits yet)
21:03:50 sounds familiar :) that'd do it to you
21:04:07 How's your work going?
21:04:24 I keep hearing new MOFED with eth_ipoib will be out soon (even if only a beta)
21:04:57 it's good. I've been focusing mostly on the networking lately, running into un-interesting ethernet problems... (solved)
21:04:58 I'm quite pleased we decided not to depend on it.
21:05:23 I read through your solution - clever! :)
21:05:27 All ethernet problems are interesting, in the right audience :-)
21:05:39 the more interesting part is multirail
21:05:42 thanks janders - borne of necessity
21:05:57 I got it working, though there are some "interesting" issues
21:06:06 have you ever tried multirail IB with Ironic?
21:06:23 No. We only have one NIC and we use one port as IB and the other as Ethernet
21:06:32 I need it for 1) storage 2) AI (DGX) systems
21:06:37 What happens?
21:06:50 the issue I'm seeing is - if I have multiple IB ports defined, instance creation fails
21:06:54 "neutron timeout"
21:07:01 (though no errors in neutron)
21:07:05 dug any deeper?
21:07:27 however if I have one ironic port on instance creation and add the other three after - that works and I can attach ports OK
21:07:44 Morning. A bit hectic here, need 10 mins to get kids organised...
21:07:45 I suspect something between neutron server and NEO. I need to chat to my mlnx friends..
21:08:01 good morning Blair
21:08:02 I was wondering if you were getting the wrong "MAC" for DHCP on the wrong physical rail
21:08:04 Hey b1airo
21:08:08 #chair b1airo
21:08:10 Current chairs: b1airo oneswig
21:08:11 Hi Stig, Blair
21:08:19 aha, hi martial_
21:08:23 #chair martial_
21:08:25 Current chairs: b1airo martial_ oneswig
21:08:40 How many DockerCons have you been to this year martial_? :-)
21:08:55 two this year :)
21:09:09 Federal one + the real US one ... not the EU one
21:09:12 Can't get enough of the stuff, eh...
21:09:38 something like that :)
21:10:19 How's the level of interest in GPU/AI/ML (and by some extension, HPC)?
21:11:33 will tell you when it really starts (i.e. tomorrow)
21:11:40 Ah, the calm before the storm.
21:11:51 Has DataMachines got a booth?
21:11:51 today is more see who is out there, what ...
21:11:56 OpenStack has a booth
21:12:01 DMC does not
21:12:15 You're manning the OpenStack booth?
21:12:33 The scientific effort will be in a SIG tomorrow. I will attend
21:12:45 I am not, Chris Hoge and others are
21:13:50 Something I don't know, what's the overlap between Kata and Docker? Lots? None?
21:14:04 this DockerCon SIG meeting sounds interesting - maybe it's something we could add to next week's agenda for this meeting? :)
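As a rough illustration of the workaround janders describes above (boot the bare metal instance with a single Ironic/Neutron port, then attach the remaining IB ports once it is ACTIVE), the steps might look like the following Ansible sketch. All flavor, image, network, server and port names are hypothetical placeholders, and the exact flow will depend on how the node's Ironic ports are mapped.

- name: Boot the bare metal instance with only the first port
  command: >
    openstack server create --flavor bm-dgx --image centos7
    --network net-provision --wait dgx-01

- name: Create the additional IB ports on their respective networks
  command: "openstack port create --network {{ item }} dgx-01-{{ item }}"
  loop:
    - net-ib-storage
    - net-ib-ai

- name: Attach the extra ports once the instance is ACTIVE
  command: "openstack server add port dgx-01 dgx-01-{{ item }}"
  loop:
    - net-ib-storage
    - net-ib-ai
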
21:14:27 some, kata is a container solution native to OpenStack, Docker is still not
21:14:33 I'm guessing the images are different, given the Kata ones are booted rather than started
21:15:07 this reminds me of something from the Vancouver Summit that might be worth mentioning
21:15:12 I will tell you after I talk to the people of the SIG (Christine and Christian are going to be there)
21:15:12 Yes, but you can make a Kata container from a Docker image I believe
21:15:35 some of you have attended DK Panda's talk on the last day
21:15:46 he was talking about optimising overheads in containers
21:15:47 yes, same as CharlieCloud or Singularity (or docker save to tar)
21:15:57 martial_: conversely, thanks to Kolla, OpenStack runs natively well on Docker (once you disable all that networking...)
21:16:20 I had to leave early to attend a meeting so didn't get to ask this question but where did he see these overheads?
21:16:23 networking I suppose?
21:17:00 janders: networking is painful owing to the overlays and bridging on the host side.
21:17:18 indeed
21:17:53 but I wonder if he's hit container overheads in other areas
21:17:56 I'm interested in the scope for VF pass-through, SRIOV style. In a way that doesn't circumvent the overlays - wonder if that is possible
21:18:38 janders: filesystems of many overlays can often be cited - each adding a layer of indirection
21:18:58 re SRIOV/VF - to my best knowledge nothing like this exists today with k8s and kuryr, however it would be of interest to us as well
21:20:00 speaking of filesystems - do you have experience with passing through an RDMA-native filesystem mounted on the "container host" to containers?
21:20:04 I think the Cloud Native foundation will be there as well, I want to talk to them as well
21:20:46 janders: We've used GlusterFS like that I believe and I think it JFWed
21:21:16 martial_: keep an eye out for Michael Jennings - I saw a really great presentation from him in April on Charliecloud
21:22:36 janders: it ought to work because the container's seeing a filesystem, the VFS implementation shouldn't be affected by the encapsulation (in theory)
21:23:27 oneswig: thanks! :) I will likely try a similar approach soon
21:24:04 martial_: be very interested to know if you hear more about https://rootlesscontaine.rs or similar efforts - unprivileged container runtimes
21:24:40 Yeah, I think someone here at Monash tried it on a DGX and it works
21:24:59 janders: for extra points, orchestrate your GlusterFS volume using Manila
21:25:01 interesting indeed
21:25:11 do not know if they are here but I can try to check
21:25:39 Hey there. Sorry, been lurking and working... I can ask Michael Jennings if he's going to be at if someone fills in that blank for me.
21:25:50 Hi Tim! Smoked you out :-)
21:25:55 oneswig: we'll likely use BeeGFS or perhaps GPFS
21:25:56 indeed
21:25:59 DockerCon
21:26:19 Hi Tim, yes DockerCon :)
21:26:27 b1airo: do you happen to run your DGXes with OpenStack?
21:26:28 janders: I have an ansible module under my wing for BeeGFS but it's a little alpha right now.
21:26:46 How are you provisioning the filesystem and mounting the client?
21:27:18 janders: not today, but we are planning to put Ironic under everything eventually
21:27:34 @mej: Nope, no DockerCon for me. Would love to go sometime, though.
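The pattern oneswig and b1airo describe above - mount the RDMA-capable filesystem once on the container host, then expose it to containers as an ordinary bind mount so the container only ever sees a VFS path - can be sketched in Ansible roughly as below. The server, volume, paths and image are assumptions; the mount and docker_container modules ship with the ansible.posix and community.docker collections in newer Ansible releases.

- name: Mount the GlusterFS volume on the container host
  mount:
    src: gluster-server:/scratch     # hypothetical Gluster server and volume
    path: /mnt/scratch
    fstype: glusterfs
    opts: defaults,_netdev
    state: mounted

- name: Run a container that sees the filesystem as a plain directory
  docker_container:
    name: analysis
    image: centos:7
    state: started
    volumes:
      - /mnt/scratch:/scratch:rw     # the container only sees a VFS path

The same bind-mount approach should apply to BeeGFS or GPFS clients mounted on the host, since nothing RDMA-specific is visible from inside the container.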
21:27:53 brb
21:28:29 oneswig: with our BeeGFS, we've finished specing out the hardware and the kit is "in flight" so we haven't done much hands-on work yet, but it is coming soon
21:29:03 janders: might have these roles on Ansible Galaxy by then, will let you know if so...
21:29:33 b1airo: thanks! DGX1 is one of my motivations behind implementing multi-rail IB with Ironic.
21:30:02 oneswig: great! we would be more than happy to work with you on this one
21:30:19 Guess we ought to look at the agenda...
21:30:24 good call
21:30:43 b1airo: martial_: can you guys drive for a sec - family bed-time matters to attend to...
21:31:15 Copy oneswig
21:31:24 sure thing
21:32:02 joined a little after the hour ... need to quickly check the agenda link
21:32:05 #link https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_June_12th_2018
21:32:50 #topic sensitive data processing on OpenStack
21:33:10 mmmh extra agenda item?
21:33:14 how sensitive?
21:33:23 Firstly, we're still collecting input on this and trying to form up what efforts here should look like
21:34:16 so my experience with this is related to private cloud type work
21:34:17 back now. Good topic :-)
21:34:20 I think we can at least extend the book by a chapter or two
21:34:33 agreed, if we can
21:34:39 not a shared cloud
21:34:49 for SC18?
21:34:54 (re book)
21:34:59 martial_: private in the most private sense, i.e. one tenant only?
21:35:15 Maybe, not sure whether we'll have it ready
21:35:49 Ha, well tenant-per-hypervisor isolation is one control we've considered using
21:36:06 But remember, "defence in depth"!
21:36:17 I'm interested in hearing all the hoops people jump through and what each gives them
21:36:50 more with known tenants
21:37:01 otherwise we have to create separate hardware instances
21:37:12 How would you implement per-tenant hypervisor isolation - is there a filter for doing that or would it be about creating host aggregates with restricted visibility (if that's possible)
21:37:21 good news, Ansible makes it easier obviously
21:37:25 #link https://etherpad.openstack.org/p/Scientific-SIG-Controlled-Data-Research
21:37:44 martial: even easier with ansible + ironic
21:37:53 oneswig: yeah there is an isolation filter
21:38:27 But other ways with aggregates too, depends on the overall cloud design I guess
21:38:51 Might be worth a line item under the technologies section b1airo
21:39:07 janders: what are you doing for cleaning - how thorough?
21:39:24 Yeah I still haven't dumped much in the etherpad yet, definitely have more to add
21:39:39 b1airo: the perfect accompaniment to breakfast :-)
21:39:51 I think AggregateMultiTenancyIsolation is one of the keywords
21:40:02 Ha, I'm already smearing honey on my phone so why not!
21:40:06 I personally haven't used it though, just remember it exists
21:41:05 Thanks janders, good to know
21:41:16 oneswig: not cleaning yet, though will be looking at this shortly. Are you thinking storage, firmware or both?
21:41:25 Ok, shall we move on briefly..?
21:41:46 janders: the first spanner is local disks, if you have more than one.
21:42:05 We can follow up offline, we've got some handy tools up on github for when you tackle this
21:42:16 oneswig: ok! thanks
21:42:22 oneswig: do self-encrypting drives help there?
21:42:54 yes - insta-scramble - without that one of our bare metal flavors takes 29 hours to clean. Not much use...
21:43:16 ouch!
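On the cleaning point just raised: a common way to avoid multi-hour full-disk shreds during Ironic automated cleaning is to disable the full erase_devices step and rely on metadata erasure (partition tables, LVM signatures), with drives that support ATA secure erase handled quickly by the agent. A minimal sketch of that ironic.conf change as Ansible tasks follows; the config path, priority values and handler name are assumptions, and whether metadata-only erasure is acceptable depends on your threat model.

- name: Disable the full-disk erase clean step
  ini_file:
    path: /etc/ironic/ironic.conf
    section: deploy
    option: erase_devices_priority
    value: "0"

- name: Erase partition tables and LVM signatures instead
  ini_file:
    path: /etc/ironic/ironic.conf
    section: deploy
    option: erase_devices_metadata_priority
    value: "10"
  notify: restart ironic-conductor   # assumes such a handler exists in the play
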
21:43:30 Lol, yeah I think our users would assume Nova was out to lunch if that happened
21:43:50 Thankfully we only have one of those nodes, but it's too long. What we really want is LVM signatures and partition tables erasing.
21:44:11 That's enough to prevent it breaking the next deployment.
21:44:25 I'm hoping to use some of the smarts in the SSDs to help
21:44:32 anyway, let's move on...
21:44:43 fair enough
21:44:44 #topic workload management
21:45:29 We have a summer intern to work on polishing a slurm-as-a-service packaging
21:46:00 Sounds interesting oneswig, what needs polishing?
21:46:18 And how does that relate to what's in OpenHPC?
21:46:24 Right now, we've got a bag of tools, it really needs some better integration without loss of modularity
21:47:19 There's a pair of Ansible modules on Galaxy for creating heat stacks or magnum clusters and spitting out a structured ansible inventory for further config.
21:47:50 It works pretty well for people who know its innards. Should be easier to use though.
21:48:02 Ok, sounds interesting
21:48:49 #link heat clusters https://galaxy.ansible.com/stackhpc/cluster-infra/
21:48:52 Is that largely about setting up the inventory then or is the idea to go all the way to compete running SLURM
21:49:04 *complete
21:49:11 #link magnum k8s/swarm clusters: https://galaxy.ansible.com/stackhpc/os-container-infra/
21:49:52 b1airo: we follow a policy of only doing the infra in heat because it's like working in a small dark cupboard when you're debugging something
21:50:06 Ansible takes over pretty much as soon as the infra's up.
21:50:21 Using OpenHPC, it's pretty easy to put Slurm on top
21:50:55 Right
21:51:07 That ansible's here: https://github.com/stackhpc/p3-appliances - somewhat ragged round the edges
21:51:33 So SLURM seems to be all the rage but has anyone had any luck getting it to behave nicely in a dynamic environment?
21:51:55 oneswig: ragged around the edges, I can relate!
21:51:56 To make a slurm config file from an ansible inventory: https://github.com/stackhpc/p3-appliances/blob/master/ansible/roles/openhpc_runtime/templates/slurm.conf.j2
21:52:30 It turns out Ansible's facts have just enough data to populate what Slurm requires - phew
21:52:47 oneswig: nice work!
21:52:53 b1airo: by "dynamic environment" do you mean adding/removing compute nodes?
21:52:59 on the fly
21:53:06 The dynamic one's interesting. A lot of people follow the maximum-sized cluster, and work with most of it missing.
21:53:10 trandles: yep
21:53:29 last I looked closely there was functionality around "cloud burst" type stuff
21:53:51 The thorny bit - would love to see a working demo of this - is scaling down
21:54:00 the max cluster stuff oneswig references is another way, you just look like you have a load of down nodes when slurmd isn't running on cloud nodes
21:54:01 Is it really cloud burst or actually thinly disguised power management...
21:54:32 b1airo: I can't answer that, never gave it a serious look
21:54:44 Can you rename nodes without bringing slurmd down?
21:55:23 there are annoying things to deal with... such as telling slurmctld and the slurmd's that they shouldn't care if the config files differ
21:55:25 b1airo: I don't think so. Or perhaps I'm thinking of Ceph OSDs
21:55:42 Another point is, I'm not sure how suitable SLURM really is for handling high-throughput workloads that would be bread and butter for a dynamic cluster
21:55:51 trandles: you can do that? revolutionary. How?
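As a rough illustration of the "Ansible facts are enough to populate slurm.conf" point above (this is not the stackhpc slurm.conf.j2 template linked earlier, just a sketch of the same idea), node definitions can be templated from standard hardware facts. The group name, destination path and partition layout are assumptions, and facts must already have been gathered for the compute group.

- name: Generate Slurm node definitions from gathered hardware facts
  copy:
    dest: /etc/slurm/nodes.conf   # hypothetical file, pulled into slurm.conf via an Include line
    content: |
      {% for host in groups['compute'] %}
      NodeName={{ hostvars[host].ansible_hostname }} Sockets={{ hostvars[host].ansible_processor_count }} CoresPerSocket={{ hostvars[host].ansible_processor_cores }} ThreadsPerCore={{ hostvars[host].ansible_processor_threads_per_core }} RealMemory={{ hostvars[host].ansible_memtotal_mb }} State=UNKNOWN
      {% endfor %}
      PartitionName=compute Nodes=ALL Default=YES State=UP
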
21:56:16 it will be interesting to see when cluster management software vendors will add "native" OpenStack support - as in using OpenStack APIs to create all the infrastructure from scratch
21:56:23 From what I've seen at a distance it is not very good at handling huge numbers of jobs in queue
21:56:25 oneswig: there's a config param in later versions, not sure if it only appeared in 17.X or earlier, to say "don't hash the configs"
21:56:33 I was having a chat about this with Bright recently, they seem to be working on it
21:56:40 we
21:56:54 b1airo: huge as in? Cambridge Slurm, apparently the queue was normally 4 days long
21:56:55 've gone slurm-only at LANL because moab couldn't handle large queues efficiently
21:57:23 slurm can limit queue sizes, both max and by UID
21:57:29 so one user can't hose up the works
21:57:37 janders: Bright's doing some interesting stuff around mixing bare metal clusters and openstack dynamically
21:57:45 Huge as in millions of jobs
21:58:01 that'll do it.
21:58:08 I know we handle 10s of thousands, not sure about millions
21:58:35 b1airo: what's the alternative? HTCondor?
21:58:36 does any HPC WLM handle millions without a 2-hour scheduling cycle?
21:59:18
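For reference on the last few points - elastic ("cloud burst") nodes, overall and per-user queue limits, and the "don't hash the configs" parameter trandles mentions (possibly DebugFlags=NO_CONF_HASH, which stops the daemons complaining when slurm.conf differs between them) - a minimal sketch in Ansible follows. Node names, counts, scripts and the user name are placeholders; check option availability against your Slurm release.

- name: Configure elastic nodes, queue limits and relaxed config checks
  blockinfile:
    path: /etc/slurm/slurm.conf
    marker: "# {mark} ANSIBLE MANAGED - elastic nodes"
    block: |
      # Nodes created/destroyed on demand (e.g. by scripts driving OpenStack APIs)
      NodeName=cloud[001-064] State=CLOUD Weight=100
      ResumeProgram=/usr/local/sbin/slurm-resume     # hypothetical script that boots instances
      SuspendProgram=/usr/local/sbin/slurm-suspend   # hypothetical script that deletes them
      SuspendTime=600
      # Don't complain when slurm.conf differs between slurmctld and slurmd
      DebugFlags=NO_CONF_HASH
      # Cap the total number of jobs in the queue
      MaxJobCount=50000

- name: Limit how many jobs a single user can have queued
  command: sacctmgr -i modify user alice set MaxSubmitJobs=5000   # 'alice' is a placeholder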