09:00:35 <oneswig> #startmeeting scientific-wg
09:00:36 <openstack> Meeting started Wed Apr 26 09:00:35 2017 UTC and is due to finish in 60 minutes.  The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:38 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:40 <openstack> The meeting name has been set to 'scientific_wg'
09:00:50 <oneswig> Hello and good morning
09:01:06 <oneswig> #link Agenda for today https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_April_26th_2017
09:01:52 <priteau> Good morning oneswig
09:01:58 <verdurin> Morning.
09:02:06 <b1airo> Allo
09:02:10 <oneswig> Hi priteau verdurin b1airo
09:02:14 <oneswig> #chair b1airo
09:02:15 <openstack> Current chairs: b1airo oneswig
09:02:22 <oneswig> what's new?
09:02:37 <oneswig> Our bare metal kolla machine is awesome!
09:03:06 <oneswig> very happy with that.  I was looking at chameleon appliances yesterday...
09:03:52 <priteau> I have heard good things about Kolla but never tried it
09:04:12 <oneswig> Think I've fixed a bug in our deployments where biosdevname would occasionally not name a device, which made automated deployments impossible.
09:04:27 <dariov> hello!
09:04:45 <oneswig> Kolla itself seems to work but we've got that "new deployment smell" about it - don't know the dodgy bits yet
09:04:49 <oneswig> Hi dariov
09:05:23 <oneswig> OK should we get started with the agenda
09:05:43 <oneswig> #topic Boston summit sessions
09:06:26 <oneswig> #link planning's in here https://etherpad.openstack.org/p/Scientific-WG-boston
09:06:50 <oneswig> I put a mail out this morning calling for Lightning talk submissions
09:07:14 <oneswig> Do I have any volunteers? :-)
09:08:05 <oneswig> ... ok ... next time ...
09:08:52 <oneswig> I've been asking around again for a prize for the best talk.
09:09:21 <oneswig> Nothing confirmed yet but last summit it was arranged on the day
09:10:06 <zioproto> hello
09:10:21 <oneswig> The next session was the committee meeting - have you seen the agenda on the same etherpad?
09:10:23 <oneswig> Hi zioproto
09:10:50 <oneswig> zioproto: have you seen that the video of your talk at HPCAC has been posted?
09:11:10 <zioproto> oneswig: I have no idea
09:11:41 <oneswig> #link HPCAC videos http://insidehpc.com/video-gallery-switzerland-hpc-conference-2017/
09:11:55 <b1airo> Sorry it's a bad time here
09:12:10 <oneswig> no problem b1airo
09:12:20 <b1airo> Wondering if there would be any objections to moving this an hour later...?
09:13:03 <b1airo> Anything exciting at the insidehpc conference oneswig ?
09:13:13 <oneswig> It would be useful to collect together any recent activity relating to the WG's activity areas
09:13:16 <zioproto> oneswig: looks like I am on air https://www.youtube.com/watch?v=Z7I2WI5Ay1w
09:13:51 <oneswig> zioproto: I'll be looking at your layer-2 work again later!  How is it working out?
09:14:36 <zioproto> oneswig: we are upgrading our production cluster to Newton this Saturday, so we can test the feature on the production hardware :)
09:15:00 <oneswig> Oh cool - good luck
09:15:03 <zioproto> oneswig: now on the staging cluster we can test functionality of the parts but not the performance
09:15:35 <oneswig> zioproto: I'd be very interested to hear your experiences when you start to extend these layer 2 networks at range and at scale
09:15:45 <oneswig> keep us posted!
09:16:19 <zioproto> oneswig: I will !
09:16:25 <oneswig> Which makes me think that hybrid cloud for research computing use cases could make a good activity area
09:17:14 <zioproto> are we following an agenda? I missed the beginning of the meeting, I don't know if there is an agenda link
09:17:51 <oneswig> Agenda is https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_April_26th_2017
09:18:07 <oneswig> Looking at etherpad at https://etherpad.openstack.org/p/Scientific-WG-boston
09:18:35 <b1airo> oneswig: did you get to a happy outcome with your metrics work?
09:19:16 <oneswig> b1airo: It's ongoing.  I can link to some of the things we've contributed, but a sheaf of gerrit reviews isn't going to make for an exciting meeting
09:19:35 <oneswig> I'll look for some key points we've put articles out on.
09:19:45 <oneswig> b1airo: Is the RACmon blog active?
09:20:37 <b1airo> It's out of date, there is a big backlog of posts in the pipeline - no time, not enough hands :-(
09:22:42 <oneswig> Ah, too bad.  Made my day when somebody mentioned they found one of our old blog posts and it helped them.
09:23:29 <oneswig> b1airo: do you have anything on GPUs, or your upcoming presentation?
09:24:12 <oneswig> zioproto: how is the work going at SWITCH on data sets?
09:24:33 <zioproto> b1airo: we identified two open issues
09:24:42 <zioproto> the first issue is the permissions on the objects
09:24:58 <zioproto> using the radosgw with S3 interface
09:25:12 <zioproto> is not scalable with big datasets, because the r/w permissions are handled per object
09:25:25 <zioproto> there is no inheritance of permissions from the bucket
09:25:42 <zioproto> so if you have many objects, you have to touch all of them to grant an additional user a read-only permission
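For illustration only, a minimal boto3 sketch of the per-object ACL update zioproto is describing: because the S3 API offers no inheritance from the bucket, adding one read-only user means touching every object. The endpoint, bucket name, and canonical user ID below are hypothetical placeholders, not SWITCH's actual setup.

    import boto3

    # Hypothetical radosgw S3 endpoint and credentials -- placeholders only.
    s3 = boto3.client(
        "s3",
        endpoint_url="https://radosgw.example.org",
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
    )

    bucket = "big-dataset"                        # hypothetical bucket holding the dataset
    grantee = 'id="readonly-user-canonical-id"'   # hypothetical canonical user ID to add as a reader

    # No bucket-level inheritance: one ACL call per object, so granting a single
    # extra reader costs O(number of objects) API requests.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            s3.put_object_acl(Bucket=bucket, Key=obj["Key"], GrantRead=grantee)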
09:25:50 <b1airo> Yes, presenting at NVIDIA GTC on GPU accelerated OpenStack clouds - will be an overview of how to build one and how OpenStack supports it, then a quick show and tell of our system and some benchmarks
09:26:03 <oneswig> zioproto: that's a nuisance indeed!
09:26:38 <oneswig> b1airo: you mentioned peer-to-peer - is gpu direct working virtualised?
09:26:52 <b1airo> In other news, passthrough works with NVLink
09:27:10 <b1airo> And can do P2P over it
09:27:11 <zioproto> b1airo: the second issue is the support for AWS4 signature by the radosgw. It's been about two weeks now since I last followed the story on that bug
09:27:20 <oneswig> ooh.
09:28:04 <b1airo> zioproto: is that permissions behaviour a bug?
09:28:13 <zioproto> b1airo: no public updates AFAIK http://tracker.ceph.com/issues/19056
09:28:29 <zioproto> b1airo: permission behavior is a design problem of S3
09:29:04 <zioproto> b1airo: the bug is about the AWS4 signature and keystone integration. This is just a software bug. But it is bad because recent S3 clients like Hadoop need the AWS4 signature to be working in the backend
09:29:07 <oneswig> zioproto: bit late for a feature request then
09:29:22 <b1airo> I did open a Red Hat case on the AWS4 thing but the answers so far do not make much sense to me. Basically they were asserting it isn't supported with Keystone, but of course gave no detail as to why
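As a hedged client-side sketch of why this bug bites: botocore lets a client pin its signature version, so older clients can fall back to legacy v2-style signing, while clients that only speak SigV4 (such as recent Hadoop s3a) depend on AWS4 working in the radosgw backend. The endpoint and credentials below are placeholders.

    import boto3
    from botocore.config import Config

    endpoint = "https://radosgw.example.org"       # placeholder radosgw endpoint
    creds = dict(
        aws_access_key_id="EC2_CRED_ACCESS",       # placeholder Keystone EC2-style credential
        aws_secret_access_key="EC2_CRED_SECRET",
    )

    # Legacy v2-style S3 signing -- the path not implicated in the bug discussed above.
    s3_v2 = boto3.client("s3", endpoint_url=endpoint,
                         config=Config(signature_version="s3"), **creds)

    # SigV4 ("AWS4") signing -- the case that recent clients such as Hadoop s3a
    # require, and the one tracked in http://tracker.ceph.com/issues/19056.
    s3_v4 = boto3.client("s3", endpoint_url=endpoint,
                         config=Config(signature_version="s3v4"), **creds)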
09:29:34 <zioproto> oneswig: I guess so. But OpenStack Swift, for example, is able to inherit the permissions from the Swift container to the objects
09:29:37 <oneswig> zioproto: do you maintain a blog of what you're doing at SWITCH?
09:30:18 <zioproto> oneswig: http://cloudblog.switch.ch/
09:30:26 <b1airo> I need to get involved in bedtime wrangling...
09:30:38 <oneswig> zioproto: great, thanks
09:31:52 <oneswig> zioproto: do you think you'd be able to summarise your project on datasets in a blog post?
09:32:10 <oneswig> I recall Sofiane had a detailed page on it
09:33:57 <b1airo> oneswig: do you know if powerd had anything new to report?
09:33:57 <oneswig> #link Did you see there is an eventbrite page for the WG evening social at Boston https://www.eventbrite.com/e/openstack-scientific-working-group-boston-social-tickets-33928219217
09:34:07 <zioproto> oneswig: yes, I will probably have to do it ! I put this in my TODO list for the month of May
09:34:09 <oneswig> b1airo: not that I'm aware of, sorry
09:34:44 <oneswig> BTW I heard more than half the tickets for the social have already gone, and it's only been public a couple of days
09:35:48 <oneswig> OK, move on from the summit?
09:36:58 <oneswig> #topic Cloud congress
09:37:12 <oneswig> If you can attend and you haven't registered yet...
09:37:24 <oneswig> #link here's the link https://www.eventbrite.com/e/boston-open-research-cloud-workshop-tickets-31893256589
09:37:44 <oneswig> Is there any more to add on that?
09:38:23 <oneswig> #topic WG IRC channel
09:38:28 <oneswig> Quick one this
09:38:48 <oneswig> Mike Lowe and some others suggested an IRC channel for WG discussion
09:39:12 <oneswig> given we seem to work best as an information sharing forum
09:39:43 <oneswig> I've gone through the process of creating #scientific-wg, it's pending review
09:40:15 <oneswig> We should have a channel within a few days.
09:40:43 <oneswig> Nothing more to add on that...
09:41:40 <b1airo> Cool thanks for doing that oneswig
09:41:43 <dariov> nice one, oneswig
09:41:51 <priteau> Great news!
09:42:16 <oneswig> sorry, lost control of my keyboard for a sec, took a while to find my way back :-)
09:42:33 <oneswig> #topic AOB
09:42:47 <oneswig> Any more news?
09:43:32 <oneswig> The Cambridge team have been looking at Heat+Ansible for Lustre client integration via SR-IOV interfaces.  b1airo - what do you do for this?
09:44:40 <b1airo> The config management specifically?
09:44:56 <oneswig> b1airo: yes - how do you do it?
09:45:33 <oneswig> got anything you can share with your pommy mates? :-)
09:46:16 <b1airo> Our HPC team uses Ansible. Currently our Lustre provider networks are not actually managed by Neutron, so we use a little hack and set the L3 interface config in the guest based on the config provided by Neutron on a different interface
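b1airo doesn't spell the hack out, but a hypothetical sketch of the general idea might derive the guest's address on the unmanaged Lustre network from the address Neutron assigned on another interface; the interface names and subnets below are invented purely for illustration.

    import ipaddress
    import subprocess

    NEUTRON_IFACE = "eth0"    # interface Neutron does configure (assumption)
    LUSTRE_IFACE = "eth1"     # unmanaged Lustre provider/SR-IOV interface (assumption)
    LUSTRE_SUBNET = ipaddress.ip_network("10.20.0.0/24")   # invented Lustre network

    def neutron_ipv4(iface):
        """Return the IPv4 address currently configured on iface (via iproute2)."""
        out = subprocess.check_output(
            ["ip", "-4", "-o", "addr", "show", "dev", iface], text=True)
        # Output resembles: "2: eth0    inet 192.168.1.37/24 brd ... scope global eth0"
        return ipaddress.ip_interface(out.split()[3]).ip

    # Reuse the host octet of the Neutron-assigned address on the Lustre subnet,
    # then configure the unmanaged interface directly in the guest.
    host_octet = int(neutron_ipv4(NEUTRON_IFACE)) & 0xFF
    lustre_ip = LUSTRE_SUBNET.network_address + host_octet
    subprocess.check_call(["ip", "addr", "add", f"{lustre_ip}/24", "dev", LUSTRE_IFACE])
    subprocess.check_call(["ip", "link", "set", LUSTRE_IFACE, "up"])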
09:46:54 <oneswig> interesting...
09:47:16 <b1airo> Today I found an issue with Ethernet​ NIC tuning that I'm about to open a case for
09:47:27 <oneswig> Mellanox NIC?
09:47:48 <oneswig> We've been moving to OFED 4 on our servers
09:48:32 <b1airo> When we bump up the rx/tx ring buffers from their 1024 default to 8192, as Mellanox recommended for RoCE, each interface changed accounts for an almost 3GB reduction in MemFree on the host
09:49:12 <oneswig> 8192 * 9216 < 3GB...
09:49:36 <oneswig> Is it a Rx ring per VF?
09:49:38 <b1airo> So on a dual port card we lose almost 6GB, which is enough to mean we can't launch a 120GB guest on a 128GB host - OOM!!
09:49:50 <b1airo> ring per PF
09:50:15 <b1airo> I don't know where that memory is going though
09:50:37 <oneswig> Tried /proc/slabinfo before and after?
09:51:06 <b1airo> Nothing else in meminfo accounts for it, and whilst slab does show an increase it's only on the order of <100M
09:52:17 <oneswig> Is that a regression after driver upgrade?
09:52:37 <b1airo> Even only setting it on one interface doesn't really work, the host is under too much memory pressure and kswapd starts getting busy
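A minimal diagnostic sketch for reproducing the observation, assuming root access on the hypervisor: read MemFree from /proc/meminfo, resize the rings with ethtool -G, and read it again. The interface name is a placeholder; the 8192 ring size is the value from the discussion.

    import subprocess

    IFACE = "enp94s0f0"    # placeholder name for the Mellanox PF
    RING = "8192"          # the RoCE tuning value mentioned above (default is 1024)

    def memfree_kb():
        """MemFree from /proc/meminfo, in kB."""
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("MemFree:"):
                    return int(line.split()[1])
        raise RuntimeError("MemFree not found")

    before = memfree_kb()
    subprocess.check_call(["ethtool", "-G", IFACE, "rx", RING, "tx", RING])
    after = memfree_kb()
    print(f"MemFree dropped by {(before - after) / 1024:.0f} MiB after resizing the rings")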
09:53:30 <b1airo> We tried a MOFED upgrade recently too, but reverted back to 3.4 when we found they'd stupidly disabled VF enablement on bond slave PFs
09:54:07 <oneswig> You run VFs on a bond?  I had no idea that was even possible.
09:55:02 <b1airo> In our setup we have an active-backup bond with linux-bridge above it for some of our provider networks, preferred active on p2, then we put the high performance VFs dedicated on p1
09:55:28 <oneswig> Ah OK, was assuming LACP
09:55:37 <oneswig> makes more sense...
09:55:43 <b1airo> Yeah, apparently Mellanox were too
09:56:08 <b1airo> They are fixing it but I guess we will have to wait for next point release
09:56:29 <oneswig> There's no underestimating antipodean craftiness
09:56:43 <b1airo> Also, (Intel) Lustre does not yet support MOFED4
09:57:25 <b1airo> (not that Intel Lustre is a separate thing anymore, but I imagine it will take a while for that to change)
09:57:37 <oneswig> b1airo: interesting.  Any idea when?
09:57:49 <oneswig> on the ofed4 support
09:58:32 <b1airo> Soon I think, we are talking to Intel about it and I think they have given Gin a build to try
09:59:07 <oneswig> OK thanks b1airo
09:59:11 <oneswig> Out of time - any more?
09:59:27 <b1airo> Thanks all!
09:59:36 <oneswig> OK thanks everyone
09:59:37 <b1airo> Looking forward to Boston!
09:59:46 <zioproto> is this VF (Virtual Function) thing Mellanox-only?
09:59:54 <oneswig> Good luck at GTC b1airo
10:00:03 <zioproto> enjoy Boston ! :)
10:00:10 <oneswig> zioproto: It's part of SR-IOV
10:00:26 <oneswig> Not specific although it seems the bond issue is
10:00:34 <oneswig> #endmeeting