09:02:57 <b1airo> #startmeeting scientific-wg
09:02:58 <openstack> Meeting started Wed Feb 15 09:02:57 2017 UTC and is due to finish in 60 minutes.  The chair is b1airo. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:02:59 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:03:00 <dariov> hello folks!
09:03:01 <openstack> The meeting name has been set to 'scientific_wg'
09:03:02 <Ellac> so the meeting is 9:00 in the evening?
09:03:33 <b1airo> Ellac, in Melbourne, yes! ;-)
09:03:43 <StefanPaetowJisc> Ciao Dariov! Evening B1airo. I hope the weather in Melbourne is nice! :-)
09:03:48 <Ellac> got it now. thanks b1airo :)
09:04:00 <b1airo> hi StefanPaetowJisc
09:04:04 <noggin143> Hi from Switzerland
09:04:06 <b1airo> hi dariov
09:04:18 <b1airo> hi there noggin143
09:04:19 <StefanPaetowJisc> I may become less communicative in around 30 mins… I have to join an AARC meeting :-/
09:04:56 <b1airo> yes i may have to disappear in about 30 too, so either martial continues or we may have to cut short
09:05:21 <b1airo> but martial doesn't seem to be here yet, or perhaps can't - i forget whether his TZ suits
09:05:34 <priteau> b1airo: I think it's 4 AM for martial
09:05:36 <b1airo> oneswig is on holiday this week
09:05:42 <verdurin> Morning.
09:05:50 <b1airo> ah thanks priteau - i guess not then!
09:05:55 <zioproto> morning
09:05:55 <priteau> He usually joins the other slot
09:06:03 <b1airo> hi priteau, hi verdurin
09:06:09 <priteau> Hi b1airo!
09:06:12 <b1airo> hello zioproto
09:06:29 <b1airo> ok, let me dig out the agenda...
09:06:53 <b1airo> #link https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_February_15th_2017
09:07:08 <zioproto> noggin143, hi from Zurich :)
09:07:09 <b1airo> #topic Boston Declaration update
09:07:37 <b1airo> i was hoping martial was going to give us an update on this, but as he isn't here, can anyone else?
09:07:53 <noggin143> I was in the call this week.
09:07:56 <b1airo> i have not managed to catch up on it yet so don't want to speculate
09:08:15 <b1airo> thanks noggin143, how is it looking for Boston do you think?
09:08:35 <noggin143> The documents are at https://drive.google.com/drive/u/0/folders/0B4Y7flFgUgf9dElkaFkwbUhKblU
09:08:52 <noggin143> Thursday 11th/Friday 12th confirmed for the dates
09:09:26 <noggin143> looking at a number of working groups
09:09:58 <noggin143> in areas like policy, trust, federation
09:10:27 <b1airo> are there any key things for potential signatories to figure out before the meeting
09:10:50 <noggin143> I think that will be covered in the next call.
09:11:03 <noggin143> there are some questions in the documents to think about
09:11:22 <b1airo> i'm not sure whether there are any key use-cases/stories/scenarios that have been agreed yet?
09:11:52 <noggin143> the calls are every couple of weeks at a US/Europe-friendly time, alternating with a more Asia-friendly time in between
09:12:20 <noggin143> not yet got to the user stories, more organisational at this point
09:12:37 <b1airo> zioproto, i know you have an interest in this, are you feeling positive about how it is progressing for the moment?
09:12:42 <noggin143> one item was to take a multi-cloud approach though (i.e. not just OpenStack)
09:13:21 <b1airo> noggin143, yes that particular discussion did pass via my inbox
09:13:33 <zioproto> mmm progress is a bit slow
09:13:59 <zioproto> I actually got a complaint because https://review.openstack.org/400738 is not very active
09:14:15 <zioproto> I was hoping other people would push more patchsets to the user story
09:14:26 <zioproto> also I will not be able to be in Boston due to visa issues
09:14:30 <b1airo> all for that, though not entirely sure what the people asking that really mean. are they talking about alternative private cloud middleware or (potentially non-openstack) public/commercial clouds?
09:15:55 <b1airo> zioproto, have you directed anyone getting involved in the boston declaration activities to that story?
09:15:56 <noggin143> my understanding is that the proposal is both (i.e. other private clouds and public clouds). We'll have to see how it evolves.
09:16:04 <dariov> b1airo, first time I've heard of this Boston Declaration update, but at least for us multi-cloud also means public/commercial clouds
09:16:33 <dariov> I’ll pass the link up to my boss(es), I’ve got the feeling it’s something we might be interested in engaging with
09:17:23 <b1airo> dariov, yes multicloud certainly means that, i'm just struggling to figure out how it fits in with cloud federation, which is what the boston declaration is about
09:17:46 <dariov> I’ve just scanned through the “next steps” document, and many bells rang for each working group
09:18:11 <noggin143> I'd suggest people have a read of the documents and if interested, join in the next call
09:18:43 <dariov> noggin143, any suggestion where to start from? The slides maybe?
09:18:57 <b1airo> i think multicloud is more-or-less solved by using higher level provisioning/orchestration/broker services (e.g. the likes of cloudforms) ?
09:19:02 <noggin143> the slides are a good intro to understand the initiative
09:19:07 <StefanPaetowJisc> zioproto, am going to read over the patch and comment too
09:19:34 <noggin143> We're certainly happily doing multi-cloud with terraform
09:19:41 <StefanPaetowJisc> zioproto: Are you in Milano? If so, we can touch base there.
09:19:59 <noggin143> there is a lot of work to do on the policy/trust side of identity federation and resource ownership
09:20:00 <dariov> noggin143, same here, terraform is our main “multi-cloud” layer at the moment
09:20:43 <noggin143> ideally, we'd also see some OpenStack blueprints come out of it as well as the documents/agreements.
09:21:06 <b1airo> noggin143, yes that is the work i was envisaging (hence the confusion around non-open clouds)
09:21:10 <zioproto> StefanPaetowJisc, yes I will be in Milano
09:21:32 <StefanPaetowJisc> zioproto: Excellent
09:21:35 <noggin143> we may end up with a reference implementation on OpenStack for example
09:21:55 <zioproto> b1airo, no I did not direct anyone to that story, I mean not a specific single person
09:23:19 <b1airo> zioproto, maybe worth adding a section on related efforts to the document, where you could promote engagement on that patchset for example
09:23:54 <b1airo> anything else to say on boston or shall we toddle on?
09:24:05 <zioproto> ok, thanks for the feedback
09:25:07 <b1airo> #topic Hypervisor tuning update from Blair
09:25:35 <b1airo> i already talked about this last week and we seem to have a bit of attendance overlap so i'll try to keep this brief
09:26:08 <b1airo> will, priteau and zioproto were here last week, maybe that's all
09:27:02 <b1airo> presumably most of you will have heard me talk about one of our HPC deployments by now
09:27:20 <b1airo> an OpenStack+KVM based cluster
09:28:20 <b1airo> we were digging into some performance issues recently and created some nice graphs looking at SMP Linpack and MPI HPL on both bare metal and guests
09:29:03 <b1airo> SMP linpack - #link http://pasteboard.co/vBwRsur2Q.png
09:29:36 <b1airo> MPI HPL - #link http://pasteboard.co/vByv44w1d.png
09:30:17 <b1airo> we had some users complaining about slow jobs compared to another much older cluster we have
09:31:30 <b1airo> so we ran those tests to check cpu performance, and as expected it is quite good, except for a few guests where performance drops off at a matrix size (Ns) of 120k
09:32:36 <b1airo> next suspicion was memory as we have been running with regular page size up till now due to some unresolved issues encountered when we first tried to use hugepages
09:33:10 <simon-AS559> o/ sorry for joining late, had some urgent bureaucracy to do
09:33:22 <b1airo> hello simon-AS559 !
09:33:51 <b1airo> stumbled across the Intel Memory Latency Checker tool, which was useful but didn't seem to be exhaustive enough
09:34:53 <b1airo> one of my colleagues wrote a small C program to randomly index and update a large array per CPU and spit out an updates-per-second number
09:35:28 <b1airo> on the hypervisors that clocked ~6.5 million updates per second (mups)
09:36:10 <b1airo> interestingly when we turned off linux kernel numa_balancing that went up to 9 mups
09:36:48 <b1airo> so with default settings, numa_balancing seems to be quite expensive for memory heavy workloads
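[The program itself wasn't shared in the meeting; a minimal single-copy sketch of the idea, assuming a 4 GiB array and an xorshift PRNG (both illustrative choices - the real tool ran one instance per CPU), might look like the following. numa_balancing can be toggled between runs via sysctl kernel.numa_balancing.]

    /* Sketch of the memory benchmark described above: randomly index
     * and update a large array, then report updates per second.
     * Array size, update count and PRNG are illustrative choices. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    #define ARRAY_BYTES (4UL << 30)            /* 4 GiB, well past the caches */
    #define N_UPDATES   (100UL * 1000 * 1000)

    static uint64_t xorshift64(uint64_t *s)    /* cheap PRNG for random indices */
    {
        *s ^= *s << 13; *s ^= *s >> 7; *s ^= *s << 17;
        return *s;
    }

    int main(void)
    {
        size_t n = ARRAY_BYTES / sizeof(uint64_t);
        uint64_t *a = malloc(ARRAY_BYTES);
        if (!a) { perror("malloc"); return 1; }
        memset(a, 1, ARRAY_BYTES);             /* fault pages in before timing */

        uint64_t seed = 88172645463325252ULL;
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (uint64_t i = 0; i < N_UPDATES; i++)
            a[xorshift64(&seed) % n]++;        /* random read-modify-write */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%.2f million updates/sec\n", N_UPDATES / secs / 1e6);
        return 0;
    }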
09:37:12 <verdurin> b1airo: which kernel?
09:37:19 <noggin143> interesting. Were the vCPUs pinned to a particular numa node?
09:37:21 <b1airo> we also saw very variable numbers in the guests - 2-6 mups
09:37:31 <b1airo> verdurin, 4.4 kernel on the hypervisors
09:37:42 <b1airo> noggin143, yes cpu and numa pinned
09:38:21 <noggin143> we had very good results with huge pages (2M in our case) to avoid thrashing the EPT
09:38:23 <b1airo> unfortunately i don't yet have a number for the guests without numa_balancing as this work is ongoing
09:38:36 <b1airo> noggin143, yes agreed
09:39:04 <b1airo> i was already suspecting this was the problem as we worked through test cases, so i returned to the earlier issue that stopped us using hugepages to begin with
09:39:22 <b1airo> in the end i found a libvirt bug
09:39:51 <b1airo> turns out it works on some of our hypervisors, those with 128GB RAM, but on the larger nodes guests won't start
09:40:03 <b1airo> we are hitting a timeout waiting for the qemu socket to show up
09:40:26 <b1airo> it takes longer than 30s for the qemu process to have its memory allocated when backed by hugetlbfs
09:40:53 <StefanPaetowJisc> :-O
09:40:57 <priteau> nice edge case!
09:41:09 <verdurin> Unfortunate.
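[The qemu/libvirt code path wasn't shown in the meeting, but the cost is easy to demonstrate: faulting in a large hugetlbfs-backed mapping up front is broadly analogous to qemu preallocating guest RAM. A sketch, assuming hugepages have already been reserved via vm.nr_hugepages and using an illustrative 64 GiB size:]

    /* Sketch: time how long faulting in hugepage-backed memory takes,
     * to illustrate why preallocating a large guest's RAM can exceed
     * a 30s startup timeout. Requires reserved hugepages
     * (sysctl vm.nr_hugepages); the size here is illustrative. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>
    #include <time.h>

    #define ALLOC_GIB   64UL
    #define ALLOC_BYTES (ALLOC_GIB << 30)

    int main(void)
    {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        /* MAP_POPULATE forces every page to be faulted in right now,
         * roughly what guest RAM preallocation on hugetlbfs does */
        void *p = mmap(NULL, ALLOC_BYTES, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_POPULATE,
                       -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("faulted %lu GiB of hugepages in %.1f s\n", ALLOC_GIB, secs);
        munmap(p, ALLOC_BYTES);
        return 0;
    }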
09:41:39 <b1airo> a small-sample test with numa_balancing off and hugepage-backed guests reduced the user's code runtime to 1/3 of what it was before
09:43:17 <b1airo> the really odd thing, though, is that the same code was slow on the hypervisor - slower than a system about 4 years older. still haven't figured that one out, but it seems to be something to do with the amount of memory in the system
09:43:21 <noggin143> BTW, did you have numad running on the hypervisor? We found it better to turn that off
09:44:02 <b1airo> noggin143, no numad on ubuntu for the moment - i asked james page about that in austin and he made an icky face
09:45:32 <b1airo> on the face of it, it seems like a nice tool for a performance-focused environment where you still want some over-subscription/consolidation
09:46:42 <b1airo> anyway, that's that for now, hopefully by boston i can share a war-story that has explanations for the things that remain ... weird at the moment
09:47:41 <b1airo> i might skip over the RoCE on ECMP stuff unless anyone wants to talk about it particularly?
09:48:49 <verdurin> b1airo: was that covered last week? If so, I can just read the archive.
09:49:04 <b1airo> ah, Dan Berrange and Michal Privoznik have just responded to me on libvirt-users :-)
09:49:51 <b1airo> verdurin, yes i talked about it there - basically we have just moved to Cumulus Linux for our switch OS, and from an L2 to an L3-based Ethernet fabric
09:50:28 <b1airo> found an issue or two along the way - check the log if you are interested in that
09:50:33 <b1airo> i am liking Cumulus though
09:50:34 <verdurin> b1airo: yes, reading now.
09:50:51 <verdurin> b1airo: we can cover the GPU-related work in a future meeting?
09:51:11 <b1airo> verdurin, yes definitely, but happy to talk about that now if you like
09:51:32 <b1airo> the other parts of the agenda are also repeats
09:52:05 <verdurin> I'd like to hear about your GPU work, but what do others think?
09:52:52 <b1airo> the only other thing is SC17 - oneswig graciously put some workshop proposals in for that
09:53:09 <b1airo> but the calls for BoFs etc are still a little while away
09:53:27 <b1airo> i would definitely like to run another if they'll have us
09:54:05 <b1airo> also quite keen on a high-performance KVM BoF and wondering what others think
09:54:12 <b1airo> #topic AOB
09:55:14 <b1airo> verdurin, regarding GPUs - i started an etherpad a month or so ago with an initial draft of what i'm hoping to push upstream as a guide for creating GPU-enabled openstack clouds
09:55:58 <b1airo> it's still very rough, but we also need to figure out where it should belong in the context of other docs/guides
09:56:32 <b1airo> either in the arch or operators guides i think, possibly spread/duplicated across them in some way
09:57:01 <b1airo> need a few more hours on the draft and then to approach the docs team
09:57:08 <verdurin> b1airo: worth adding a link to https://wiki.openstack.org/wiki/ScientificWGGPUs for now?
09:57:39 <b1airo> ha! i didn't even realise david had created that
09:57:52 <b1airo> yes i will link the etherpad there, thanks
09:58:22 <b1airo> i'm also interested in collecting known hardware and OS+hypervisor combinations
09:58:49 <b1airo> (and known broken boxes, cards, firmware)
09:59:39 <b1airo> so please ping me if you have any experiences, working or not, to share on that
09:59:44 <verdurin> will do
10:00:00 <b1airo> looks like we are out of time
10:00:08 <b1airo> i was supposed to leave already!
10:00:08 <verdurin> Bye everyone.
10:00:19 <b1airo> bye all
10:00:37 <b1airo> thanks noggin143 for the BCD discussions
10:00:39 <simon-AS559> bye!
10:00:45 <priteau> bye!
10:01:03 <b1airo> #endmeeting