11:01:20 <martial__> #startmeeting Scientific-SIG
11:01:21 <openstack> Meeting started Wed Jul 18 11:01:20 2018 UTC and is due to finish in 60 minutes.  The chair is martial__. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:01:22 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:01:24 <openstack> The meeting name has been set to 'scientific_sig'
11:01:54 <martial__> Good day, welcome to a short version of the weekly Scientific SIG meeting
11:02:05 <daveholland> o/
11:02:14 <janders> gday Martial!
11:02:26 <martial__> hello daveholland and janders
11:02:57 <martial__> Both Stig and Blair are otherwise occupied and our agenda is light
11:03:21 <janders> Berlin CFP closed - bit of relief
11:03:49 <daveholland> John G persuaded me to put a proposal in, first time I've submitted one
11:03:50 <priteau> Hi everyone
11:04:14 <martial__> Hi priteau
11:04:24 <martial__> daveholland: cool :)
11:04:32 <janders> daveholland: great, congratulations! :) what is your preso about?
11:06:21 <daveholland> it's intended to be a high-level description of our first 12-18 months running a production OpenStack service (content to be finalized/argued about, if the proposal is accepted)
11:07:08 <martial__> look forward to hearing about it
11:07:11 <janders> very cool!
11:07:13 <janders> me too
11:07:40 <martial__> like I mentioned at the top of the hour, today is light
11:07:46 <martial__> CFP is closed
11:08:22 <janders> martial: have you put any proposals in?
11:08:29 <martial__> just to mention we have a Supercomputing 18 (SC18) Birds of a Feather proposal in the works to discuss OpenStack, Containers and Kubernetes
11:08:41 <janders> very cool!
11:09:01 <martial__> janders: with SC18 at the exact same time as Berlin, sadly I did not, I will be in Dallas
11:09:14 <janders> yeah it's a tough one, isn't it...
11:09:27 <martial__> sadly it is
11:09:38 <janders> I'll almost certainly choose Berlin but I will have a hard time bringing more people with me
11:10:34 <martial__> a lot of our usual HPC operators are at SC18. I have confirmed this with them
11:10:47 <martial__> some are on our BoF as well
11:11:00 <martial__> it is likely Blair will be as well
11:11:20 <martial__> Stig will be in Berlin to run the Scientific SIG
11:11:34 <janders> ACK! That's good to know
11:11:50 <martial__> and that is the entire content of the agenda that I had
11:12:05 <daveholland> hehe. I have a quick NUMA question for AOB
11:12:15 <martial__> like I said a short meeting so
11:12:18 <martial__> #topic AOB
11:12:24 <martial__> daveholland: go
11:12:44 <verdurin> Hello
11:12:48 <daveholland> so, we're using NUMA-aware instances for one particular user/project, with extra_specs like hw:cpu_policy='dedicated', hw:cpu_thread_policy='isolate', hw:numa_nodes='2'
11:13:08 <daveholland> it's successful in that they see a performance benefit. Now we're being asked to enable this more widely.
11:13:30 <daveholland> What are people's experiences with mixing NUMA-aware/non-NUMA-aware instances on the same hypervisor?
11:13:39 <daveholland> (should we go "NUMA only"?)
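(For context, a minimal sketch of how a flavor with those extra specs is typically created; the flavor name and sizes below are hypothetical, while the property keys are the ones daveholland quotes above:)

    # create a dedicated-CPU, NUMA-aware flavor (name and sizes are illustrative)
    openstack flavor create --vcpus 26 --ram 196608 --disk 50 numa.c26
    openstack flavor set numa.c26 \
        --property hw:cpu_policy=dedicated \
        --property hw:cpu_thread_policy=isolate \
        --property hw:numa_nodes=2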
11:13:59 <janders> what's your motivation behind using NUMA aware instances?
11:14:12 <janders> consistent, predictable performance?
11:14:12 <daveholland> "make it go faster"
11:14:29 <janders> I haven't tried this myself, but my gut feel is mixing it will yield random results
11:14:30 <daveholland> this is a CPU/memory access heavy workload
11:14:53 <janders> I did something similar before the NUMA aware days, I had flavors with and without CPU overcommit
11:15:02 <janders> these would be tied to different host aggregates
11:15:06 <janders> worked well
11:15:22 <janders> if I were to implement NUMA I'd probably start with the same
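(A rough sketch of the aggregate-per-flavor pattern janders describes; the aggregate names and the "pinned" metadata key are illustrative, and the AggregateInstanceExtraSpecsFilter has to be enabled in the nova scheduler for the metadata matching to take effect:)

    # hypothetical aggregates separating pinned and overcommitted hosts
    openstack aggregate create agg-dedicated
    openstack aggregate set --property pinned=true agg-dedicated
    openstack aggregate add host agg-dedicated compute-01
    openstack aggregate create agg-shared
    openstack aggregate set --property pinned=false agg-shared
    # tie each flavor to its aggregate via matching metadata
    openstack flavor set numa.c26 --property aggregate_instance_extra_specs:pinned=true
    openstack flavor set m1.large --property aggregate_instance_extra_specs:pinned=false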
11:15:31 <daveholland> I should clarify, this is in an isolated aggregate, we are looking at 26 or 52 VCPU instances (on a 56-CPU host; 28 physical cores with HT enabled)
11:15:48 <verdurin> daveholland: mixing the two sounds like a Brave and Exciting move to me
11:16:01 <daveholland> verdurin: that was my initial reaction too
11:16:31 <daveholland> we are considering enabling it for the biggest flavor only (so it would be the only instance on the hypervisor and couldn't trip over anything else)
11:16:46 <janders> do you overcommit CPUs on NUMA-aware instances?
11:17:09 <martial__> (sorry have to check on kids)
11:17:11 <janders> from your above comment I understand you don't - correct?
11:17:15 <daveholland> janders: not yet. We think it would be a Bad Idea because the cpu pinning + noisy neighbours would make life worse than expected
11:17:50 <janders> ok.. I think I'm getting it more
11:18:17 <janders> so - the concern is that you'll have enough cores to run both instances w/o overlapping cores
11:18:38 <janders> however having a mix of NUMA and non-NUMA could somehow cause both VMs to be scheduled onto the same cores?
11:18:44 <janders> do I get this right?
11:18:47 <daveholland> janders: yes, I think that sums it up
11:18:59 <janders> (I'm using a case with two VMs to make it easier for me to follow)
11:19:47 <janders> if we take NUMA out of the picture for a second - if we had two non-NUMA instances on the hypervisor w/o CPU overcommit
11:20:05 <janders> is there any chance they would hit the same cores? My guess - no.
11:20:24 <janders> I wonder how KVM handles that
11:20:33 <priteau> From http://specs.openstack.org/openstack/nova-specs/specs/mitaka/implemented/virt-driver-cpu-thread-pinning.html, I read: If the host does have an SMT architecture (i.e. one or more cores have “thread siblings”) then each vCPU will be placed on a different physical core and no vCPUs from other guests will be placed on the same core.
11:20:37 <janders> and then question is - how NUMA awareness can affect that
11:20:43 <daveholland> for a non-overcommit flavor/host/aggregate - I think you are correct (we don't cpu-pin for those flavours as no perceived need)
11:20:58 <priteau> It's not clear if that works only when all instances use hw:cpu_thread_policy=isolate
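(The "thread siblings" the spec refers to can be checked directly on the hypervisor; a quick sketch, assuming a Linux compute node:)

    # list logical CPUs with their core/socket/NUMA-node mapping
    lscpu --extended=CPU,CORE,SOCKET,NODE
    # show which logical CPUs share a physical core with CPU 0
    cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list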
11:21:25 <daveholland> priteau: thanks, I hadn't seen that spec
11:21:36 <daveholland> (we are on Pike)
11:22:04 <janders> what OS and OpenStack "distro"?
11:22:06 <priteau> daveholland: It's the first hit I got on Google, but I see pretty much the same text is used on the official doc: https://docs.openstack.org/nova/pike/admin/flavors.html
11:22:10 <daveholland> RHOSP12
11:22:20 <janders> have you looked at real time KVM doco?
11:23:14 <janders> I'm not sure if they go down to that level of detail, but I remember the RHATs using these things for their NFV implementations
11:23:27 <daveholland> no (I thought that was more for SRIOV or NFV, we haven't touched those)
11:23:45 <janders> I'll have a quick look out of curiosity
11:24:04 <janders> I noticed that at my time at RHAT - and now it's on my TODO list for a bit later in the project
11:25:24 <daveholland> heh, Google tells me there are past summit presentations on RT KVM, I will check them out
11:26:28 <martial__> daveholland: keep us updated what you find in a follow up meeting?
11:26:50 <janders> I was hoping that RHAT have a dedicated section on RT KVM in OSP doco, but unfortunately not
11:26:55 <janders> maybe worth a support ticket?
11:27:04 <daveholland> Certainly will. I think we understand most of the machinery (how to pin, what to pin, what thread policy etc) - and have had success with a single instance per hypervisor in a separated aggregate - our uncertainty is mixing this configuration with the vanilla flavors.
11:27:16 <daveholland> +1 to support ticket, thanks
11:27:30 <janders> idea:
11:27:40 <janders> say you have 32 pCPU
11:27:55 <janders> spin up 16x 2vCPU instances, half NUMA aware, half not
11:28:03 <janders> look at xmls to see what cores they landed on
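(A sketch of that check on the hypervisor; the domain name is hypothetical. Pinned guests carry <vcpupin> entries under <cputune> in their libvirt XML, while non-pinned guests just have a plain <vcpu> element:)

    # dump the libvirt definition of one guest and pull out the pinning section
    virsh dumpxml instance-0000abcd | grep -A 10 '<cputune>'
    # or show the live vCPU-to-pCPU affinity directly
    virsh vcpupin instance-0000abcd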
11:28:36 <priteau> daveholland: Quick look at the code that enforces the isolate policy (nova/virt/hardware.py), I *think* that your non-NUMA instances may be forbidden to execute on the core where a NUMA instance is pinned
11:28:53 <janders> using fewer, larger VMs may yield more sensible results, but the chances of running into a scheduling clash are lower
11:29:17 <janders> priteau: good stuff!
11:29:52 <janders> does NUMA aware xml differ much from a vanilla one?
11:30:22 <daveholland> you get to see the CPU mapping AIUI
11:30:27 <daveholland> thanks for the ideas and pointers.
11:30:50 <janders> thanks for an interesting question! :)
11:30:57 <priteau> daveholland: You could enable debug logs and look at what Nova (I assume nova-compute) prints out
11:31:14 <priteau> It should say something like "Selected cores for pinning: …"
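(A sketch of that approach, assuming crudini is available and nova-compute runs directly on the host; containerized RHOSP deployments manage the config file and service differently:)

    # enable debug logging for nova-compute
    crudini --set /etc/nova/nova.conf DEFAULT debug true
    systemctl restart openstack-nova-compute
    # after booting a pinned instance, look for the pinning decision
    grep -i 'pinning' /var/log/nova/nova-compute.log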
11:32:25 <janders> are you using cpu_mode=host-passthrough ?
11:32:25 <priteau> http://git.openstack.org/cgit/openstack/nova/tree/nova/virt/hardware.py?h=stable/pike#n890
11:32:29 <daveholland> OK I think our best bet is to do some experiments too
11:33:26 <daveholland> we have cpu_mode=host-model (all the hypervisors are identical....... currently)
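(For reference, the setting being discussed lives in the [libvirt] section of nova.conf on each compute node; a minimal sketch of switching it, with the same crudini caveat as above:)

    # expose the full host CPU to guests; live migration then only works
    # between identical hosts
    crudini --set /etc/nova/nova.conf libvirt cpu_mode host-passthrough
    systemctl restart openstack-nova-compute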
11:34:02 <janders> regarding experiments - perhaps it's worth running Linpack with just the NUMA-aware instance running
11:34:09 <janders> and then add non-NUMA-aware
11:34:16 <janders> see if there's much fluctuation
11:35:12 <janders> I only tried running Linpack in VMs with CPU passthru though
11:35:52 <janders> I'd think that if there's more overhead in other CPU modes it will be consistent, but it's not like I've tested that..
11:36:02 <daveholland> the future is hazy but if we agreed not to want migration then host-passthrough is worth a look, yes
11:36:40 <janders> I remember needing passthru for nested virt, too
11:36:48 <janders> (OpenStack-on-OpenStack development)
11:37:15 <janders> but spot on, all these optimisations can come back to bite later..
11:43:46 <daveholland> plenty to think about, thanks all
11:44:59 <martial__> thanks daveholland
11:45:20 <martial__> please follow up in the channel for an update on this
11:45:25 <martial__> any other AOB?
11:45:53 <martial__> Otherwise let me bring this meeting to an end (I must go, unfortunately)
11:46:00 <martial__> thanks everybody
11:46:03 <martial__> #endmeeting