21:02:21 <b1airo> #startmeeting scientific-wg
21:02:21 <openstack> Meeting started Tue Dec 13 21:02:21 2016 UTC and is due to finish in 60 minutes.  The chair is b1airo. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:02:23 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:02:25 <openstack> The meeting name has been set to 'scientific_wg'
21:02:28 <ttx> christx2 blairo: sorry for overrunning
21:02:30 <christx2> cool, listening - christoph - london, uk
21:02:34 <b1airo> no probs ttx
21:02:46 <b1airo> hi christx2 !
21:02:58 <b1airo> #chair martial
21:02:59 <openstack> Current chairs: b1airo martial
21:03:02 <trandles> hello all o/
21:03:11 * ildikov is lurking :)
21:03:14 <b1airo> hi there Time
21:03:22 <b1airo> *Tim (sorry!)
21:03:32 <trandles> lol, np
21:04:02 <b1airo> agenda is at #link https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_December_13th_2016
21:04:04 <lizhong> Hi all, I am Lizhong working with Martial from NIST
21:04:18 <b1airo> greetings lizhong, welcome
21:04:34 <b1airo> GPUs and Virtualisation:
21:04:34 <b1airo> Getting set up for virtualising GPU hardware
21:04:34 <b1airo> Performance optimisation and further topics
21:04:34 <b1airo> Monitoring round-up
21:04:34 <b1airo> Review gathered notes on wiki at https://wiki.openstack.org/wiki/ScientificWGMonitoringAndTelemetry
21:04:41 <powerd> evening / morning b1airo et al!
21:04:51 <b1airo> oneswig is an apology for today
21:04:58 <b1airo> howdy powerd
21:05:23 <b1airo> ok let's get into it then
21:05:36 <martial> b1airo: asked Lizhong to join as he is the coder of the Dmoni software so he will be able to comment/describe it if needed
21:05:36 <b1airo> #topic GPUs and Virtualisation
21:06:04 <powerd> ok so i've made a start at the GPUs investigation
21:06:20 <b1airo> thanks martial - i take it lizhong has already seen the etherpad from last couple of weeks?
21:06:33 <powerd> we put a K80 and a K2 in to a system in our lab and have passthrough working now
21:06:52 <powerd> just put a quick blog post together this afternoon - i'll update the wg wiki with the address
21:06:52 <b1airo> powerd, did you end up starting an etherpad or anything yet?
21:06:56 <martial> b1airo: yes, he contributed even
21:07:10 <rbudden> hello
21:07:33 <b1airo> i have just dug up some notes that we had and also some that Joe Topjian from Cybera passed on - i think we could make a pretty good start to some proper docs with these
21:07:44 <b1airo> g'day rbudden o/
21:08:30 <b1airo> powerd, cool - i will take an action to start an etherpad and share
21:08:41 <powerd> not started an etherpad
21:08:47 <b1airo> #action b1airo to start GPUs on OpenStack etherpad
21:09:21 <b1airo> (that should probably be "GPUs in OpenStack" - oh well)
21:09:36 <b1airo> so powerd , did you have any problems with your setup?
21:10:52 <b1airo> we've been doing the GPU passthrough thing for a fairly long time in prod now - ~2.5 years
21:11:56 <jmlowe> from what I remember of our chat the passthrough works great but there is no way currently to virtualize a pci switch?
21:12:11 <b1airo> anyone else doing GPUs with OpenStack ?
21:12:21 <jpr> can I ask a naive question: when you use a GPU within an OpenStack instance, I assume you mean it's seen as a HW accelerator like on the compute side, or is this something about the instance's graphics?
21:12:44 <jmlowe> I'm in the "attempt to wrestle money free to purchase some GPUs" stage
21:12:59 <b1airo> jmlowe, well not strictly true - you can tell qemu to emulate a particular pci/e topology, but i haven't played with it much yet - lots of manual XML-ing required
21:13:09 <jmlowe> I'm aiming for a 1U with 4x p100's
21:13:28 <b1airo> jpr, good question - we should aim to have docs answering this sort of thing up front
21:13:29 <powerd> The starting point of some instructions have been added to: https://wiki.openstack.org/wiki/ScientificWGGPUs
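A rough sketch of what such instructions typically boil down to for KVM/libvirt passthrough in this era of Nova (Liberty/Mitaka): whitelist the device on the compute node, define an alias, enable the PCI scheduler filter, and request the alias from a flavor. The vendor/product IDs, alias and flavor names below are illustrative, and the option names/sections have moved between releases, so treat this as an assumption to check against your release's documentation.

    # nova.conf on the compute node: expose the GPU for passthrough
    # (IDs are examples - check yours with `lspci -nn`)
    pci_passthrough_whitelist = { "vendor_id": "10de", "product_id": "102d" }

    # nova.conf on the controller: define an alias and add PciPassthroughFilter
    # to scheduler_default_filters
    pci_alias = { "vendor_id": "10de", "product_id": "102d", "name": "gpu-k80" }

    # flavor requesting one GPU via the alias
    openstack flavor set gpu.k80 --property "pci_passthrough:alias"="gpu-k80:1"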
21:13:48 <jmlowe> b1airo: ok, might be possible but certainly not baked in
21:14:07 <powerd> No - no real problems setting up. But only starting to look at the perf now.
21:14:29 <powerd> We'll have P100 in the lab by the end of the week so will be giving that a spin too
21:14:38 <jpr> @powerd: thanks. that's clear
21:14:41 <b1airo> jmlowe, correct - and also, i'm not even sure whether the emulated pci topology is the problem for e.g. CUDA P2P
21:14:51 <b1airo> it could actually be PCIe ACS
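For reference, the "manual XML-ing" mentioned above looks roughly like the following libvirt fragment on a q35 machine type: recent libvirt can model PCIe root ports and switch ports, and a passed-through GPU can then be given a guest address behind the emulated switch. Controller indexes and PCI addresses here are illustrative, and whether this is sufficient for CUDA P2P (versus ACS being the blocker) is exactly the open question raised here.

    <!-- fragment of a q35 domain definition: emulated root port + switch;
         a GPU <hostdev> can then target a downstream port -->
    <controller type='pci' index='1' model='pcie-root-port'/>
    <controller type='pci' index='2' model='pcie-switch-upstream-port'/>
    <controller type='pci' index='3' model='pcie-switch-downstream-port'/>
    <controller type='pci' index='4' model='pcie-switch-downstream-port'/>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <!-- host address of the GPU (example) -->
        <address domain='0x0000' bus='0x84' slot='0x00' function='0x0'/>
      </source>
      <!-- guest address: place the device on the downstream port (example) -->
      <address type='pci' bus='0x03' slot='0x00' function='0x0'/>
    </hostdev>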
21:15:13 <lizhong> In our team, someone did passthrough with OpenStack. The GPU appears in the VM; however, it fails when you start a GPU program.
21:15:39 <b1airo> jpr, going back to your question - we use GPU passthrough both for GPU compute and to accelerate graphics, e.g. with VirtualGL
21:17:21 <b1airo> lizhong, we have seen those sort of problems plenty - the two primary issues that cause that appear to be either hypervisor support (i.e. not new enough qemu/kvm) or a driver in the host already bound to the device
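For the second failure mode (a host driver already bound to the device), the usual recipe is to enable the IOMMU and make sure vfio-pci claims the GPU before nouveau/nvidia does; exact steps vary by distro and kernel, and the PCI IDs and address below are examples only.

    # kernel cmdline: intel_iommu=on (or amd_iommu=on)
    # /etc/modprobe.d/vfio.conf - bind the GPU to vfio-pci at boot
    options vfio-pci ids=10de:102d
    blacklist nouveau
    # after reboot the device should show "Kernel driver in use: vfio-pci"
    lspci -nnk -s 84:00.0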
21:17:42 <b1airo> do you know what hypervisor version you were using?
21:19:05 <jpr> @b1airo: ah, so do you use that under the hood of the instance to make it think it has GPU hardware, or within the context of the instance via the VirtualGL support for graphics at the user level?
21:19:06 <lizhong> I don't know the exact version, but I'll let my colleague know.
21:19:11 <b1airo> powerd, what hypervisor platform are you using?
21:19:14 <lizhong> thanks
21:19:25 <leong> fyi.. there is a talk in Barcelona about GPU on openstack.
21:20:02 <leong> #link Barcelona Presentation about GPU: https://www.youtube.com/watch?v=j9DmE_oUo5I
21:20:43 <rbudden> FWIW, we have K80s and P100s currently that I may try and integrate in OpenStack at some point, but no immediate plans since they are under heavy use through interactive/batch/reservation jobs. If there’s some interesting research/tests to be done I could snag some time on the debug node
21:20:57 <b1airo> jpr, from inside the instance. particularly for HPC desktops, we have a cluster focusing on characterisation and visualisation, so accelerated rendering is important for many of those use-cases and with a virtual desktop on the cluster all data on the PFS is local
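As a small usage sketch of that VirtualGL setup (the display number and application are assumptions for illustration): inside the guest, an X server is bound to the passed-through GPU and OpenGL applications in the remote desktop session are wrapped with vglrun, e.g.

    vglrun -d :0 glxinfo | grep -i renderer   # should report the NVIDIA GPU
    vglrun -d :0 paraview                     # e.g. an accelerated visualisation app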
21:22:03 <b1airo> rbudden, cool - we're about to get a rack full of P100 (PCIe) nodes, so far I haven't had any confirmation of anyone using them in pass-through, so crossing fingers at this point!
21:22:14 <powerd> lizhong/b1airo - we saw that hang on workload with previous versions of OS/KVM ourselves also. Using centos 7 with liberty which works fine. Will be updating to Mitaka soon too and will report back.
21:23:01 <b1airo> powerd, i assume you are getting the virt stack from another repo though?
21:23:22 <b1airo> the RHEV equivalent for centos maybe?
21:24:02 <jmlowe> rbudden: b1airo: I'd love to hear how it goes with the p100's
21:24:17 <b1airo> leong, thanks for reminding me about that talk - if i recall correctly that was more about Intel's GPU virtualisation specifically?
21:24:39 <leong> b1airo: yup
21:25:00 <jpr> @b1airo: nice! thanks.
21:25:08 <powerd> virt stack is either epel or standard centos core - i'd need to go double check (or RDO)
21:26:01 <b1airo> leong, what is the target use-case for that? VM graphics acceleration?
21:28:03 <leong> graphics virtualization...
21:28:22 <b1airo> i have seen some work on the libvirt-dev list to add support for virtualisation-able (i.e. device multiplexing) NVIDIA gear too
21:28:34 <leong> can be used for media transcoding or hpc use cases as well...
21:29:23 <b1airo> DNNs? (sorry, just playing buzz-word bingo) ;-)
21:30:11 <jmlowe> virtualizing the gpu's, this is new to me, I also need the buzzwords to google
21:30:17 <b1airo> so one problem that we have not yet solved with using host device based accelerators like GPUs in OpenStack is that it turns nodes into unicorns
21:31:30 <b1airo> e.g. i've got a bunch of hypervisors with 2x K1 or 2x K2, which equates to 4 or 8 usable (passthrough-able) GPUs and thus 4-8 GPU-enabled cloud instances that can run on that node
21:31:50 <leong> there is some related work in the dev community
21:31:50 <martial> so if we can get the GPU passed through we can share it :) and then nvidia themselves created nvidia-docker to split the GPU into multiple sub-systems https://github.com/NVIDIA/nvidia-docker
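Worth hedging that slightly: nvidia-docker (the 1.0 tooling at that link) mounts the driver and GPU device files into containers so that several containers can share a GPU, rather than partitioning it into sub-devices. The usual smoke test looks something like:

    # image tag is illustrative
    nvidia-docker run --rm nvidia/cuda nvidia-smi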
21:32:19 <leong> this one is about adding support to xenapi
21:32:35 <leong> #link XenAPI/GPU: https://review.openstack.org/#/c/280099/
21:33:00 <b1airo> that's all fine until we realise that those GPU instances are often only lightly utilised and we have a bunch of excess compute+memory capacity on the node
21:33:47 <b1airo> the problem is, i have not yet figured out a way with nova-scheduler to always have room for GPU instances but also let regular CPU-only instances onto the node
21:36:29 <b1airo> i'm guessing there are not that many people who have felt that problem yet though
21:36:48 <b1airo> or just don't care because they only have a few GPU nodes anyway
21:37:01 <powerd> another thing I plan to look into is bitfusion for virtualising the GPUs - could be useful for this VDI-like requirement
21:37:20 <b1airo> bitfusion? don't think i've seen that
21:37:22 <goldenfr_> yea the 2nd one for us, only a few GPU nodes so its not a huge issue yet
21:37:24 <lizhong> I did some benchmarking for GPU with baremetal, KVM and Docker. Baremetal and Docker got much better performance than KVM. Having baremetal + GPU on OpenStack would be really nice.
21:37:32 <priteau> b1airo: you want more CPU overcommit on the GPU nodes, but only when there are GPU instances running?
21:37:48 <rbudden> b1airo: agreed. i believe we tend to ignore the CPUs on the GPU nodes
21:38:20 <rbudden> essentially we schedule no CPU jobs during GPU jobs. the GPU jobs get a portion of the CPU based on the number of GPUs assigned.
21:38:35 <powerd> www.bitfusion.io - not opensource but we'll give it a spin anyway.
21:39:15 <b1airo> looks suspiciously magic
21:39:52 <b1airo> lizhong, that's interesting about your benchmarking - the papers i have read all indicate KVM with passthrough is at, or almost at, bare-metal performance
21:39:53 <jmlowe> lizhong: which version of qemu?
21:40:52 <b1airo> of course, you need to have pinning, numa topology, etc. so perhaps that is the difference you saw? we have seen very bad perf degradation for GPU-heavy workloads on KVM without that tuning
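That tuning is mostly expressed as flavor extra specs in this era of Nova; a minimal sketch (the flavor name is an example, and NUMATopologyFilter has to be enabled for the NUMA part to be honoured):

    # pin vCPUs to dedicated host cores and keep the guest in one NUMA node,
    # so guest memory stays local to the socket the GPU hangs off
    openstack flavor set gpu.k80 \
      --property hw:cpu_policy=dedicated \
      --property hw:numa_nodes=1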
21:42:23 <lizhong> b1airo, actually that's possible, if it's a tuned KVM
21:42:38 <b1airo> also, you can have baremetal GPU on OpenStack, you just need an Ironic cloud with GPUs in the nodes - i'm not sure what the hardware discovery and scheduling support is like though
21:43:26 <b1airo> powerd, what perf tests are you planning?
21:44:01 <rbudden> b1airo: true, we have our GPUs controlled through Ironic, although we don’t do anything fancy on the scheduling side since SLURM handles that
21:44:19 <rbudden> we do have a flavor for the GPUs so we can boot independent images if necessary
21:45:21 <powerd> So we are planning to measure host <-> GPU transfer speeds/latencies, run Linpack and a couple of others. anything to add?
21:45:48 <priteau> b1airo: for baremetal you can add extra_specs to your flavor and use that for scheduling on a specific node type
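A sketch of what priteau describes, using the capabilities-style matching of that era (the "gpu" capability name and the node UUID placeholder are assumptions; the flavor property has to mirror whatever is set on the Ironic node, and ComputeCapabilitiesFilter must be enabled):

    # tag the Ironic node, then make the baremetal GPU flavor require it
    ironic node-update <node-uuid> add properties/capabilities="gpu:true"
    openstack flavor set bm.gpu --property capabilities:gpu="true"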
21:46:27 <powerd> we'll add more GPUs and try node to node / GPUDirect (on a single host, then across fabric too)
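For concreteness, the host<->GPU and peer-to-peer numbers mentioned here are commonly gathered with the stock CUDA sample binaries (paths assume the toolkit samples are built; flags shown are the usual ones):

    ./bandwidthTest --memory=pinned      # host <-> device transfer bandwidth
    ./p2pBandwidthLatencyTest            # GPU <-> GPU bandwidth/latency (P2P)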
21:46:56 <b1airo> priteau, ah cool
21:47:25 <martial> b1airo: 13 minutes to go
21:48:10 <b1airo> and to answer your earlier question about mixing GPU-enabled and CPU-only instances on the same hypervisor, i just always want to be able to launch n GPU instances (where n is # of GPUs in the node), but otherwise i'm happy to fill available capacity
21:48:42 <b1airo> (this is for general purpose cloud usage, not specifically HPC, so lots of idle dev instances and light desktop acceleration stuff)
21:49:08 <b1airo> powerd, excellent - that's the problem i'm currently stuck on!
21:49:46 <martial> what is the wiki page where people can add their notes on the GPU work ?
21:49:55 <b1airo> ok, that was useful (for me anyway!). martial lizhong did you want to cover off anything about Dmoni
21:49:56 <powerd> good - guess i'll be stuck there too so soon enough ;)
21:50:06 <b1airo> martial, i will draft an etherpad and share so that everyone can edit
21:50:42 <martial> I see a lot of what is needed to be known is already in https://wiki.openstack.org/wiki/ScientificWGMonitoringAndTelemetry
21:50:51 <b1airo> #topic Monitoring round-up
21:50:56 <martial> lizhong entered this content on the etherpad at the time
21:50:59 <priteau> b1airo: it sounds a bit tricky. There may be a combination of nova-scheduler filters and their config to do it, but I don't know it!
21:51:20 <martial> we have started the discussion on adding the code to github
21:51:35 <martial> so that people can see how it works
21:51:54 <martial> we have also discussed the release of our prototype BDaaS VM that relies on DMoni
21:52:06 <b1airo> priteau, i had some ideas i discussed with Joe Topjian (Cybera) a long while back - will add them to the etherpad
21:52:16 <b1airo> would be good to get your thoughts
21:52:35 <martial> lizhong is the main engineer on this tool, so I will let him comment
21:52:41 <b1airo> martial, sounds good
21:52:56 <b1airo> BDaaS = big data ?
21:53:43 <martial> b1airo: yes
21:54:25 <martial> we started the work to provide a means to run a VM with builtin BD tools within our stack on a sequestered set of data
21:54:30 <lizhong> Dmoni is a cluster monitoring tool targeting specific applications running in a cluster
21:54:40 <lizhong> like Hadoop, Spark, etc.
21:55:09 <martial> and we wanted to know the benchmarking of different algorithms under different paradigms at the "step" level
21:55:10 <jmlowe> from now on I will be pronouncing bdaas as badass
21:57:02 <b1airo> jmlowe, i have a sticker on my laptop that says BADaaS, but it means something else - i don't even know what, it was just fun o_0
21:57:11 <martial> so Dmoni was created to give us additional hooks into the benchmarking layer
21:57:22 <lizhong> Dmoni differs from Ganglia and other cluster monitoring tools, which collect all system info
21:57:27 <b1airo> martial, the BDaaS thing might be interesting in the context of the scientific datasets activity area
21:57:42 <martial> b1airo: happy to discuss it obviously
21:59:33 <b1airo> sorry - just been distracted by a security incident on my cloud
22:00:06 <b1airo> someone spamming from it, the joys of having a research user-base
22:00:16 <martial> b1airo: uncool
22:00:19 <b1airo> time to wrap it up
22:00:26 <martial> yep 5pm
22:00:28 <jmlowe> glad to see it happens to everybody
22:00:31 <b1airo> thanks all!!
22:00:32 <martial> (well here :) )
22:00:39 <b1airo> #endmeeting