09:00:40 <oneswig> #startmeeting scientific_wg
09:00:41 <openstack> Meeting started Wed Sep 14 09:00:40 2016 UTC and is due to finish in 60 minutes.  The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:42 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:44 <openstack> The meeting name has been set to 'scientific_wg'
09:00:52 <oneswig> Hello hello hello
09:01:12 <oneswig> #link agenda for today is here https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_September_14th_2016
09:01:48 <priteau> Good morning oneswig
09:01:59 <oneswig> Hi priteau
09:02:00 <simon-AS559> Hello everybody
09:02:04 <oneswig> hello
09:02:32 <oneswig> Just checking for b1airo
09:02:35 <priteau> Hi simon-AS559
09:02:55 <b1airo> allo allo
09:03:01 <oneswig> Evening b1airo
09:03:09 <b1airo> how goes it?
09:03:13 <priteau> Hey b1airo
09:03:15 <oneswig> Just getting started
09:03:39 <b1airo> hey priteau !
09:03:47 <oneswig> #topic Barcelonaaaaaaa
09:03:53 <b1airo> :-)
09:04:05 <oneswig> So, we have plans taking shape
09:04:29 <oneswig> BoF double-session requested Wednesday morning (time, venue, doubleness all TBC)
09:04:46 <oneswig> WG meeting also requested Wednesday morning (ideally not concurrently)
09:05:25 <oneswig> I am hoping those will be confirmed in the schedule soon - the closing date for submissions is tomorrow
09:05:30 <oneswig> so it could be as early as that
09:06:04 <oneswig> Yet more progress on this front however:
09:06:05 <b1airo> (fyi, apologies in advance if i disappear - monitoring our major ceph cluster whilst a network change is underway)
09:06:18 <simon-AS559> Good luck b1airo
09:06:24 <oneswig> Holy cow - live-blogging the outage!
09:06:35 <b1airo> don't say the O word!!!
09:06:59 <oneswig> keep us informed :-)
09:07:15 <b1airo> thanks simon-AS559, are you lurking for mentions of ceph?!
09:07:49 <simon-AS559> (Not really in this case, more lurking for mentions of federation)
09:08:06 <oneswig> #link we also have this arranged: https://www.eventbrite.co.uk/e/openstack-scientific-working-group-barcelona-social-tickets-27567156106
09:08:59 <oneswig> Currently room for 30, might expand if we get further sponsorship
09:09:26 <b1airo> ah great, are you one of the folks on the end of Khalil's broadcasts?
09:09:30 <oneswig> I haven't circulated the event details yet
09:10:27 <b1airo> oneswig, are you wanting us all to register that way or are we going to hold a few spots for known attendees ?
09:10:54 <oneswig> I can hold a few back but perhaps best to register, I'll only forget otherwise
09:11:48 <priteau> I have just registered
09:11:48 <b1airo> on it!
09:12:01 <oneswig> Grand, thanks guys, now we are having a party
09:12:23 <oneswig> Final item for Barcelona from me was this request from a couple of weeks back
09:12:38 <oneswig> #link WG scientific openstack summit picks https://etherpad.openstack.org/p/Scientific-WG-summit-picks
09:12:49 <b1airo> just occurred to me to search eavesdrop irc logs for "eventbrite" in the weeks leading up to summits, probably a good way to uncover all the private (but free and open) events!
09:13:02 <oneswig> b1airo: genius
09:13:59 <oneswig> I think we've got a good mix there now on the etherpad, it sounds like it's going to get converted into something on SuperUser - last chance to add or amend...
09:14:47 <oneswig> OK, anything else for Barcelona today?
09:15:27 <oneswig> #topic Lustre, SRIOV, high-performance data
09:15:51 <b1airo> yeah i figured it would end up on superuser or some such, hence the commentary
09:16:01 <oneswig> This came up because we've been having some issues with Lustre clients on our Cambridge system
09:16:21 <oneswig> connection drops and such like
09:16:45 <oneswig> Still getting to the bottom of it, we had our vendors in yesterday and I think the issue's getting some focus
09:17:04 <oneswig> I was wondering, what's the best we can expect to see?
09:17:48 <b1airo> from the vendors? ... ;-)
09:18:10 <oneswig> b1airo: lol, not this time
09:18:27 <oneswig> What performance do you get on Monarch/M3?
09:18:37 <oneswig> as a proportion of bare metal ?
09:19:46 <b1airo> you know i actually don't have those numbers readily available :-(
09:19:54 <b1airo> Gin has some from M3
09:20:17 <oneswig> Thanks b1airo I'll ask in that thread with Gin
09:20:30 <oneswig> I assume it's working and everything's good though?
09:20:31 <b1airo> and I never personally performed comparative tests on MonARCH, though the main admins did
09:21:13 <b1airo> yes, when the SRIOV VFs stay put and the drivers behave, it works well
09:21:32 <oneswig> Is that unusual?
09:22:32 <oneswig> In a separate but possibly related problem I'm getting TCP retransmits on SR-IOV VFs in my VMs.  Possibly due to packet reordering...
09:22:40 <b1airo> i say that because we've found that e.g. doing a NIC firmware upgrade can cause Nova to delete the PCI device from its tables and thus the compute driver removes it
09:23:28 <oneswig> b1airo: I'm surprised you could do a NIC firmware upgrade without rebooting the node.  Most of the ones I've done with these NICs seem to end up requiring that level of reset
09:24:09 <b1airo> oh yes they do, but then we shut down the guest and sometimes find it no longer has a PCI device on reboot
09:24:46 <oneswig> b1airo: uh-oh.  I'll bear that in mind.  Did you figure out why?
09:26:57 <oneswig> priteau: do people ever virtualise Chameleon hardware with SR-IOV ethernet?
09:27:13 <b1airo> no i guess we probably need to ensure nova is stopped before doing the upgrade work so that it won't report the PCI dev gone
09:27:16 <priteau> unfortunately our Ethernet cards don't support SR-IOV
09:27:25 <priteau> only our Infiniband cards do
09:28:05 <b1airo> and regarding driver issues, we have had a few guests where the MOFED driver stack is not fully loaded after boot
09:28:52 <b1airo> just a handful of the kernel modules but not the core ones needed to see the IB ports, so e.g. ibv_devinfo and friends don't return anything
09:29:15 <oneswig> OFED in the guest?  I'll look for that.  We don't have many users booting those images currently
09:29:55 <b1airo> seems to be the mlx4_core module, as modprobing that (or openibd restart) causes a ~1min hang and then everything starts working
09:30:30 <oneswig> wrt NIC firmware, I've been wondering about a ramdisk element for Ironic which might be a good place to perform Mellanox FW updates
09:31:19 <oneswig> b1airo: brrr... and not something you can reproduce on tap
09:32:24 <oneswig> OK, I'll write to Gin to see what counts as good performance, thanks for those details
09:32:30 <b1airo> no not reliably reproducible :-(
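A minimal sketch of how the partially loaded driver stack described above could be detected inside a guest, assuming a Linux guest where /proc/modules lists the loaded kernel modules. Only mlx4_core is named in the discussion; the other module names in the example (mlx4_ib, ib_core) are assumptions about what a ConnectX-3 guest would need, and the function name is hypothetical.

```python
def missing_modules(proc_modules_text, required):
    """Return the names in `required` that do not appear in /proc/modules.

    Each line of /proc/modules starts with the module name, followed by
    its size, use count, dependants, state and load address.
    """
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return [name for name in required if name not in loaded]


# Illustrative sample only: two IB modules are loaded, the mlx4 ones are not.
SAMPLE_PROC_MODULES = """\
ib_uverbs 73728 0 - Live 0xffffffffa0300000
ib_core 143360 1 ib_uverbs, Live 0xffffffffa0200000
"""

# Assumed list of required modules; adjust to the driver stack in use.
REQUIRED = ["mlx4_core", "mlx4_ib", "ib_core"]
```

On a real guest the text would come from open("/proc/modules").read(); a non-empty result shortly after boot would flag the condition before users notice that the IB ports are missing.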
09:33:32 <oneswig> Move on?
09:33:55 <b1airo> just pinged Gin on Slack, she was doing IOR tests just the other day
09:34:29 <b1airo> oneswig, re. the reordering issue...
09:34:33 <oneswig> OK, thanks b1airo, as it happens so was I (for cinder volumes)
09:34:51 <b1airo> have you narrowed it down any more?
09:35:24 <oneswig> Not yet.  I'm busy preparing for a presentation tomorrow - OpenStack Day UK - hoping to pick it up on Friday
09:35:33 <b1airo> can't recall if it is possible to tcpdump/tap the VF from the host... seems unlikely
09:35:56 <oneswig> I don't think so, but I can mirror the switch port
09:36:06 <b1airo> yeah that's the next option
09:36:27 <oneswig> I think that'll be my strategy, be great to have proof
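A rough sketch of one way to quantify the TCP retransmits mentioned above from inside the VM, assuming a Linux guest that exposes the standard counters in /proc/net/snmp. It only reports the ratio of retransmitted to sent segments and does not show whether reordering is the cause; the function name and the sample values are illustrative.

```python
def tcp_retransmit_ratio(snmp_text):
    """Return RetransSegs / OutSegs parsed from the text of /proc/net/snmp.

    The file holds pairs of lines per protocol: a header line of counter
    names and a line of values, both prefixed with e.g. "Tcp:".
    """
    tcp_lines = [
        line.split()
        for line in snmp_text.splitlines()
        if line.startswith("Tcp:")
    ]
    if len(tcp_lines) < 2:
        raise ValueError("no Tcp header/value pair found")
    header, values = tcp_lines[0], tcp_lines[1]
    counters = dict(zip(header[1:], (int(v) for v in values[1:])))
    sent = counters["OutSegs"]
    return counters["RetransSegs"] / sent if sent else 0.0


# Illustrative sample only, not measured data.
SAMPLE_SNMP = """\
Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts InCsumErrors
Tcp: 1 200 120000 -1 10 5 0 0 2 1000 2000 50 0 3 0
"""
```

On a real guest the text would come from open("/proc/net/snmp").read(). Sampling the ratio before and after a transfer over the VF and over a non-SR-IOV interface would give a comparison to set alongside the mirrored switch-port capture.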
09:37:29 <oneswig> #topic inter-cloud federation
09:37:39 <oneswig> OK, let's move on
09:38:20 <oneswig> So there is a discussion forming about the best ways to tackle scientific compute on shared federated clouds
09:38:38 <oneswig> I think there's already a good deal achieved here in different areas
09:38:55 <oneswig> So a discussion on common cause is perhaps overdue
09:39:38 <b1airo> yes, did you join last night's (melbourne time) chat?
09:40:37 <oneswig> Yes, it was a useful discussion and the consensus was that the most productive path for this discussion would be to focus on resolving the policy issues between federated sites
09:40:56 <oneswig> Accounting and (possibly) chargeback are also gaping holes from what I can see
09:41:37 <oneswig> There was some discussion on European projects - EGI, HNSciCloud, Indigo-DC
09:41:44 <b1airo> indeed, but is there a minimal commitment option that we could start with?
09:41:45 <oneswig> I took an action item from the discussion to seek European TZ WG members who would be interested in taking part
09:41:49 <simon-AS559> I'm very interested in these aspects, though from a different perspective.
09:42:36 <oneswig> simon-AS559: great, can I put you in touch with Khalil who is organising the group?
09:42:43 <simon-AS559> Sure
09:43:31 <oneswig> You're at SWITCH, right?
09:43:35 <simon-AS559> Right.
09:43:51 <oneswig> OK, thanks I'll follow up.  What's different about your perspective?
09:43:58 <simon-AS559> I'll try to explain:
09:44:13 <simon-AS559> I'm interested in giving academic *institutions* (which are complex in themselves)
09:44:20 <simon-AS559> access to our community cloud
09:44:30 <simon-AS559> using Federated Identity Management systems that are already in place
09:44:33 <simon-AS559> (SAML-based)
09:44:40 <simon-AS559> So it's less "federation between clouds"
09:44:51 <simon-AS559> and more "using identity federations"
09:45:07 <simon-AS559> (in a "traditional" B2B context, not "sharing" like in the Grid/EGI community etc.)
09:45:45 <simon-AS559> Actually it's in the academic context with a shared service provider (such as an NREN or a national compute center)
09:45:51 <simon-AS559> so not quite "traditional B2B".
09:46:16 <b1airo> simon-AS559, i can point you to the code we use for this in the Nectar cloud
09:46:19 <oneswig> Right, that makes sense - so people from University A know that they are accessing your system.  It isn't that they log on to some local portal and their workload happens to launch somewhere else.
09:46:24 <simon-AS559> I guess this is partly similar to what projects like Indigo Datacloud work on, but different.
09:46:38 <simon-AS559> oneswig: right.
09:46:43 <b1airo> we bootstrap users onto the cloud through AAF (Australian Access Federation), which is a Shibboleth federation
09:47:00 <simon-AS559> Same here, but our bootstrapping method is super simple and woefully inadequate.
09:47:11 <simon-AS559> And we still need to build showback/reporting for institutions.
09:47:19 <simon-AS559> We already have billing though!
09:47:31 <simon-AS559> (That's why the institutions are so interested in reporting, surprise surprise!)
09:47:59 <simon-AS559> Also, delegated administration (letting institutional IT managers on- and offboard users etc.)
09:48:11 <simon-AS559> (…set up projects and quotas...)
09:48:32 <b1airo> very similar to CERN's requirements
09:48:51 <simon-AS559> Again, similar but different (CERN doesn't send bills)
09:49:05 <simon-AS559> (and everybody has a CERN account :-)
09:49:11 <b1airo> what are the gaps in your bootstrapping?
09:49:26 <simon-AS559> Putting users into the right project(s)
09:49:39 <simon-AS559> Authorizing requests for access via the "responsible person" at the site
09:49:54 <simon-AS559> Getting rid of users/their resources
09:50:13 <simon-AS559> —Some of our customer institutions have told us they want auto-expiring accounts
09:50:38 <simon-AS559> Also, a "bulk mode" for onboarding many users, e.g. as part of a course
09:50:46 <simon-AS559> (and offboarding them when the semester is over...)
09:51:20 <simon-AS559> Anyway, this is more "academic WG" than "scientific WG", but I'd be happy to talk about these issues in Barcelona if there's space.
09:51:31 <b1airo> so the authz part sounds like what you really need is an allocation process for a project associated with reasonable quota?
09:51:37 <simon-AS559> There should be a couple of people from other European providers like us there.
09:52:05 <simon-AS559> b1airo: Maybe, where "reasonable" should be defined by someone from the "home" institution.
09:52:19 <oneswig> simon-AS559: sounds great.  I'll forward you the details of this group and hopefully you can join in the discussion right away, and keep it going at Barcelona
09:52:26 <simon-AS559> Thanks.
09:52:31 <b1airo> does your current bootstrap process sit in front of your horizon?
09:52:37 <simon-AS559> Interesting work in this area includes the Hexaa/RegSite project.
09:52:58 <simon-AS559> b1airo: Our current bootstrap process is a separate Rails app that can administer "vouchers" etc.
09:53:42 <simon-AS559> Anyway, thanks for listening! :-)
09:54:21 <oneswig> Thanks simon-AS559!  Really useful to know.  Is this all openly available from SWITCH?
09:55:10 <simon-AS559> Unfortunately no, but anyway it's very limited and tailored to other things here.
09:55:21 <simon-AS559> The Hexaa/RegSite stuff is on GitHub and more powerful.
09:55:30 <simon-AS559> Probably a better start
09:55:36 <oneswig> OK thanks.
09:55:39 <simon-AS559> (It wasn't there when we needed something.)
09:56:08 <oneswig> #topic Any other business
09:56:26 <oneswig> Anything to share?
09:57:24 <oneswig> #link I'm speaking tomorrow at https://openstackday.uk/
09:57:31 <b1airo> simon-AS559, i think the nectar bootstrap approach could work for you if it's a shib federation you want to leverage
09:57:53 <b1airo> that would solve your user account creation issues at least
09:57:55 <simon-AS559> Thanks, that would fit. I'll have a look!
09:58:13 <simon-AS559> Is that on Github or somewhere?
09:59:18 <b1airo> i think it's in our internal git though :-/, but let me invite you to our Slack and i'll introduce you to Sam (you probably know him already?)
09:59:33 <priteau> oneswig: I will be at the OpenStack Day tomorrow, looking forward to meeting you!
09:59:51 <simon-AS559> b1airo: Yes, we met.  Thanks!
09:59:55 <oneswig> priteau: Great!  Looking forward to seeing you
10:00:28 <oneswig> OK we should close the meeting - final comments?
10:00:58 <oneswig> Thanks simon-AS559 b1airo priteau
10:01:06 <oneswig> #endmeeting