09:02:57 #startmeeting scientific-wg
09:02:58 Meeting started Wed Feb 15 09:02:57 2017 UTC and is due to finish in 60 minutes. The chair is b1airo. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:02:59 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:03:00 hello folks!
09:03:01 The meeting name has been set to 'scientific_wg'
09:03:02 so the meeting is 9:00 in the evening?
09:03:33 Ellac, in Melbourne, yes! ;-)
09:03:43 Ciao Dariov! Evening B1airo. I hope the weather in Melbourne is nice! :-)
09:03:48 got it now. thanks blairo :)
09:04:00 hi StefanPaetowJisc
09:04:04 Hi from Switzerland
09:04:06 hi dariov
09:04:18 hi there noggin143
09:04:19 I may become less communicative in around 30 mins… I have to join an AARC meeting :-/
09:04:56 yes, i may have to disappear in about 30 too, so either martial continues or we may have to cut short
09:05:21 but martial doesn't seem to be here yet, or perhaps can't - i forget whether his TZ suits
09:05:34 b1airo: I think it's 4 AM for martial
09:05:36 oneswig is on holiday this week
09:05:42 Morning.
09:05:50 ah thanks priteau - i guess not then!
09:05:55 morning
09:05:55 He usually joins the other slot
09:06:03 hi priteau, hi verdurin
09:06:09 Hi b1airo!
09:06:12 hello zioproto
09:06:29 ok, let me dig out the agenda...
09:06:53 #link https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_February_15th_2017
09:07:08 noggin143, hi from Zurich :)
09:07:09 #topic Boston Declaration update
09:07:37 i was hoping martial was going to give us an update on this, but as he isn't here, can anyone else?
09:07:53 I was in the call this week.
09:07:56 i have not managed to catch up on it yet so don't want to speculate
09:08:15 thanks noggin143, how is it looking for Boston do you think?
09:08:35 The documents are at https://drive.google.com/drive/u/0/folders/0B4Y7flFgUgf9dElkaFkwbUhKblU
09:08:52 Thursday 11th/Friday 12th confirmed for the dates
09:09:26 looking at a number of working groups
09:09:58 in areas like policy, trust, federation
09:10:27 are there any key things for potential signatories to figure out before the meeting?
09:10:50 I think that will be covered in the next call.
09:11:03 there are some questions in the documents to think about
09:11:22 i'm not sure whether there are any key use-cases/stories/scenarios that have been agreed yet?
09:11:52 the calls are every couple of weeks at a US/Europe-friendly time, alternating with a more Asia-friendly time in between
09:12:20 not yet got to the user stories, more organisational at this point
09:12:37 zioproto, i know you have an interest in this, are you feeling positive about how it is progressing for the moment?
09:12:42 one item was to take a multi-cloud approach though (i.e. not just OpenStack)
09:13:21 noggin143, yes that particular discussion did pass via my inbox
09:13:33 mmm progress is a bit slow
09:13:59 I actually got a complaint because https://review.openstack.org/400738 is not very active
09:14:15 I was hoping other people would push more patchsets to the user story
09:14:26 also I will not be able to be in Boston due to visa issues
09:14:30 all for that, though not entirely sure what the people asking that really mean. are they talking about alternative private cloud middleware or (potentially non-OpenStack) public/commercial clouds?
09:15:55 zioproto, have you directed anyone getting involved in the boston declaration activities to that story?
09:15:56 my understanding is that the proposal is both (i.e. other private clouds and public clouds). We'll have to see how it evolves.
09:16:04 b1airo, first time I hear of this boston declaration update, but at least for us multicloud means also public/commercial cloud
09:16:33 I'll pass the link up to my boss(es), I've got the feeling it's something we might be interested in engaging with
09:17:23 dariov, yes multicloud certainly means that, i'm just struggling to figure out how it fits in with cloud federation, which is what the boston declaration is about
09:17:46 I've just scanned through the "next steps" document, and many bells rang for each working group
09:18:11 I'd suggest people have a read of the documents and, if interested, join in the next call
09:18:43 noggin143, any suggestion where to start from? The slides maybe?
09:18:57 i think multicloud is more-or-less solved by using higher level provisioning/orchestration/broker services (e.g. the likes of CloudForms)?
09:19:02 the slides are a good intro to understand the initiative
09:19:07 zioproto, am going to read over the patch and comment too
09:19:34 We're certainly happily doing multi-cloud with terraform
09:19:41 zioproto: Are you in Milano? If so, we can touch base there.
09:19:59 there is a lot of work to do on the policy/trust side of identity federation and resource ownership
09:20:00 noggin143, same here, terraform is our main "multi-cloud" layer at the moment
09:20:43 ideally, we'd also see some OpenStack blueprints come out of it as well as the documents/agreements.
09:21:06 noggin143, yes that is the work i was envisaging (hence the confusion around non-open clouds)
09:21:10 StefanPaetowJisc, yes I will be in Milano
09:21:32 zioproto: Excellent
09:21:35 we may end up with a reference implementation on OpenStack for example
09:21:55 b1airo, no I did not direct anyone to that story, I mean not a specific single person
09:23:19 zioproto, maybe worth adding a section on related efforts into the document, where you could promote engagement on that patchset for example
09:23:54 anything else to say on boston or shall we toddle on?
09:24:05 ok, thanks for the feedback
09:25:07 #topic Hypervisor tuning update from Blair
09:25:35 i already talked about this last week and we seem to have a bit of attendance overlap so i'll try to keep this brief
09:26:08 will, priteau and zioproto were here last week, maybe that's all
09:27:02 presumably most of you will have heard me talk about one of our HPC deployments by now
09:27:20 an OpenStack+KVM based cluster
09:28:20 we were digging into some performance issues recently and created some nice graphs looking at SMP Linpack and MPI HPL on both the bare-metal and guests
09:29:03 SMP Linpack - #link http://pasteboard.co/vBwRsur2Q.png
09:29:36 MPI HPL - #link http://pasteboard.co/vByv44w1d.png
09:30:17 we had some users complaining about slow jobs compared to another much older cluster we have
09:31:30 so we ran those tests to check cpu performance, and as expected that is quite good, except for a few guests where performance drops off at a matrix size (N) of 120k
09:32:36 next suspicion was memory, as we have been running with regular page size up till now due to some unresolved issues encountered when we first tried to use hugepages
09:33:10 o/ sorry for joining late, had some urgent bureaucracy to do
09:33:22 hello simon-AS559 !
09:33:51 stumbled across the Intel Memory Latency Checker tool, which was useful but didn't seem to be exhaustive enough
09:34:53 one of my colleagues wrote a small C program to randomly index and update a large array per CPU and spit out an updates-per-second number
09:35:28 on the hypervisors that clocked ~6.5 million updates per second
09:36:10 interestingly, when we turned off linux kernel numa_balancing that went up to 9 mups
09:36:48 so with default settings, numa_balancing seems to be quite expensive for memory-heavy workloads
09:37:12 b1airo: which kernel?
09:37:19 interesting. Were the vCPUs pinned to a particular numa node?
09:37:21 we also saw very variable numbers in the guests - 2-6 mups
09:37:31 verdurin, 4.4 kernel on the hypervisors
09:37:42 noggin143, yes cpu and numa pinned
09:38:21 we had very good results with huge pages (2M in our case) to avoid thrashing the EPT
09:38:23 unfortunately i don't yet have a number for a guest without numa_balancing as this work is ongoing
09:38:36 noggin143, yes agreed
09:39:04 i was already thinking this was the problem as we were working through test cases, so i returned to the earlier issue we hit when first trying to use hugepages
09:39:22 in the end i found a libvirt bug
09:39:51 turns out it works on some of our hypervisors, those with 128GB RAM, but on the larger nodes guests won't start
09:40:03 we are hitting a timeout waiting for the qemu socket to show up
09:40:26 it takes longer than 30s for the qemu process to have its memory allocated when on hugetlbfs
09:40:53 :-O
09:40:57 nice edge case!
09:41:09 Unfortunate.
09:41:39 a small sample-size test with numa_balancing off and a hugepage-backed guest reduced the user code runtime down to 1/3 of what it was before
09:43:17 the really odd thing though is that the same code was slow on the hypervisor, slower than a system about 4 years older - still haven't figured that one out yet but it seems to be something to do with the amount of memory in the system
09:43:21 BTW, did you have numad running on the hypervisor? We found it better to turn that off
09:44:02 noggin143, no numad on ubuntu for the moment - i asked james page about that in austin and he made an icky face
09:45:32 on the face of it, it seems like a nice tool for a performance-focused environment where you still want some over-subscription/consolidation
09:46:42 anyway, that's that for now, hopefully by boston i can share a war-story that has explanations for the things that remain ... weird at the moment
09:47:41 i might skip over the RoCE on ECMP stuff unless anyone wants to talk about it particularly?
09:48:49 b1airo: was that covered last week? If so, I can just read the archive.
09:49:04 ah, Dan Berrange and Michal Privoznik have just responded to me on libvirt-users :-)
09:49:51 verdurin, yes i talked about it there - basically we have just moved to Cumulus Linux for our switch OS and from an L2 to an L3 based Ethernet fabric
09:50:28 found an issue or two along the way - check the log if you are interested in that
09:50:33 i am liking Cumulus though
09:50:34 b1airo: yes, reading now.
09:50:51 b1airo: we can cover the GPU-related work in a future meeting?
09:51:11 verdurin, yes definitely, but happy to talk about that now if you like
09:51:32 the other parts of the agenda are also repeats
09:52:05 I'd like to hear about your GPU work, but what do others think?
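
For reference, below is a minimal sketch of the kind of per-CPU random-update memory benchmark described at 09:34:53. It is an illustrative reconstruction, not the actual program from the meeting; the array size, update count, thread pinning and xorshift generator are all assumptions.

/*
 * Illustrative per-CPU random memory-update benchmark (sketch only).
 * Each thread is pinned to one CPU, randomly indexes a large array,
 * increments entries, and reports million updates per second (mups).
 *
 * Build:  gcc -O2 -pthread membench.c -o membench
 * Run:    ./membench
 *
 * Compare runs with kernel NUMA balancing on and off, e.g.:
 *     echo 0 > /proc/sys/kernel/numa_balancing
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define ARRAY_BYTES (1UL << 30)              /* 1 GiB per thread (assumed) */
#define UPDATES     (100UL * 1000 * 1000)    /* updates per thread (assumed) */

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

static void *worker(void *arg)
{
    long cpu = (long)arg;
    cpu_set_t set;

    /* Pin this thread to its own CPU. */
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    size_t n = ARRAY_BYTES / sizeof(uint64_t);
    uint64_t *array = calloc(n, sizeof(uint64_t));
    if (!array)
        return NULL;

    /* Simple xorshift PRNG to generate random indices. */
    uint64_t x = 88172645463325252ULL ^ (uint64_t)cpu;
    double start = now_sec();
    for (uint64_t i = 0; i < UPDATES; i++) {
        x ^= x << 13; x ^= x >> 7; x ^= x << 17;
        array[x % n]++;
    }
    double elapsed = now_sec() - start;

    printf("cpu %ld: %.2f million updates/sec\n",
           cpu, UPDATES / elapsed / 1e6);
    free(array);
    return NULL;
}

int main(void)
{
    long ncpus = sysconf(_SC_NPROCESSORS_ONLN);
    pthread_t *threads = calloc(ncpus, sizeof(pthread_t));

    for (long c = 0; c < ncpus; c++)
        pthread_create(&threads[c], NULL, worker, (void *)c);
    for (long c = 0; c < ncpus; c++)
        pthread_join(threads[c], NULL);

    free(threads);
    return 0;
}
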
09:52:52 the only other thing is SC17 - oneswig graciously put some workshop proposals in for that
09:53:09 but the calls for BoFs etc are still a little while away
09:53:27 i would definitely like to run another if they'll have us
09:54:05 also quite keen on a high-performance KVM BoF and wondering what others think
09:54:12 #topic AOB
09:55:14 verdurin, regarding GPUs - i started an etherpad a month or so ago with an initial draft of what i am hoping to push upstream as a guide for creating GPU-enabled openstack clouds
09:55:58 it's still very rough, but we also need to figure out where it should belong in the context of other docs/guides
09:56:32 either in the arch or operators guides i think, possibly spread/duplicated across them in some way
09:57:01 need a few more hours on the draft and then to approach the docs team
09:57:08 b1airo: worth adding a link to https://wiki.openstack.org/wiki/ScientificWGGPUs for now?
09:57:39 ha! i didn't even realise david had created that
09:57:52 yes i will link the etherpad there, thanks
09:58:22 i'm also interested in collecting known hardware and OS+hypervisor combinations
09:58:49 (and know broken boxes, cards, firmwares)
09:58:58 *known
09:59:39 so please ping me if you have any experiences, working or not, to share on that
09:59:44 will do
10:00:00 looks like we are out of time
10:00:08 i was supposed to leave already!
10:00:08 Bye everyone.
10:00:19 bye all
10:00:37 thanks noggin143 for the BCD discussions
10:00:39 bye!
10:00:45 bye!
10:01:03 #endmeeting