21:01:54 <b1airo> #startmeeting scientific-wg
21:01:55 <openstack> Meeting started Tue Jun 28 21:01:54 2016 UTC and is due to finish in 60 minutes.  The chair is b1airo. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:56 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:58 <openstack> The meeting name has been set to 'scientific_wg'
21:02:11 <b1airo> #chair oneswig
21:02:11 <openstack> Current chairs: b1airo oneswig
21:02:24 <edleafe> \o
21:02:26 <oneswig> #topic roll-call
21:02:29 <oneswig> hi everyone!
21:02:56 <trandles> hello
21:02:59 <julian1> Hi oneswig!
21:03:11 <julian1> \o
21:03:15 <b1airo> #topic Agenda
21:03:23 <oneswig> #link https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_June_28th_2016
21:03:35 <b1airo> -----HPC / Research speaker track at Barcelona Summit
21:03:35 <b1airo> Spread the word!
21:03:35 <b1airo> Review of Activity Areas and opportunities for progress:
21:03:35 <b1airo> Bare metal
21:03:35 <b1airo> Parallel filesystems
21:03:36 <b1airo> Accounting and scheduling
21:03:37 <b1airo> User Stories
21:03:41 <b1airo> Other business
21:03:43 <b1airo> -----
21:04:22 <oneswig> Nice, thanks b1airo
21:04:50 <oneswig> Let's get started
21:05:10 <oneswig> #topic HPC/Research speaker track at Barcelona
21:05:26 <b1airo> tell your friends!
21:05:38 <oneswig> After the track at Austin we got a thumbs-up to run again
21:06:27 <oneswig> I am interested to know: what did people think was missing from the content in Austin?
21:07:11 <trandles> yay!  I enjoyed the HPC/Research track as it brought together a lot of us with common interest.  I think that ability to focus the community was missing in past summits.
21:07:29 <oneswig> I wish we'd had a talk covering Lustre/GPFS for one
21:07:34 <oneswig> Thanks trandles, agreed
21:07:44 <oneswig> Got another talk in you Tim?
21:08:01 <trandles> I think so.  Working on titles and abstracts now.
21:08:20 <oneswig> Great!  Deadline was 13 July IIRC
21:08:23 <trandles> getting approval for foreign travel is the difficult part
21:09:47 <b1airo> did we ask whether you're attending SC trandles ?
21:09:52 <oneswig> in last week's discussion (EMEA time zone) an email was being drafted that people might be able to circulate in other forums
21:10:06 <trandles> I'll +1 the lack of a lustre/GPFS talk.  There have to be user stories around provisioning infrastructure in a scientific context that we're missing.
21:10:09 <blakec> Tutorial and instructional content seemed to be missing from HPC track in Austin
21:10:12 <oneswig> Hopefully, we can help people with a template for spreading the word
21:10:30 <trandles> b1airo: I don't have plans for SC this year
21:10:47 <blakec> i.e. optimizing nova for HPC workloads
21:10:59 <oneswig> blakec: step-by-step this is how I did it kind of stuff? +1
21:11:55 <blakec> Correct, even very entry level content... As Summit grows I suspect those talks have a wider audience
21:12:10 <trandles> pulling blakec's tutorial thread, a lessons-learned talk from somewhere like Pittsburgh, where they're deploying HPC using OpenStack, would be very nice
21:12:52 <trandles> during the Austin WG planning session, the ironic breakout basically turned into just that, Robert fielding questions about lessons learned with Bridges
21:13:26 <oneswig> #link https://etherpad.openstack.org/p/hpc-research-circulation-email Let's put together some points that can be shared to raise the HPC/Research speaker track profile
21:13:40 <b1airo> agreed re. lessons learned
21:13:48 <oneswig> If you're on other lists or groups, please consider mailing around to spread the word
21:14:40 <b1airo> i'm interested to know if people are actually optimising openstack (e.g. nova) or just the hypervisor and then making use of lesser known hypervisor features that can be exposed through nova
21:15:17 <oneswig> Our first-order efforts at optimisation are all around SR-IOV and RDMA for Cinder
21:15:38 <oneswig> But we're just getting going really
21:16:53 <b1airo> what sort of optimisation are you looking at with sriov oneswig (other than using it) ?
21:17:54 <oneswig> Just using it basically... We keep a VF in the hypervisor for Cinder and pass through the rest
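(For context on the pattern oneswig describes, a minimal sketch of SR-IOV VF passthrough configured through Nova/Neutron, with one VF left on the hypervisor for storage traffic. The device name, physnet label, flavor/image names and the config section are illustrative only and vary by release.)

    # Compute node nova.conf: whitelist VFs for passthrough
    # (option name/section differs between releases; shown here under [pci])
    [pci]
    passthrough_whitelist = {"devname": "enp6s0f0", "physical_network": "physnet2"}

    # Create a direct (SR-IOV) port and boot an instance against it
    openstack port create --network physnet2-net --vnic-type direct sriov-port0
    openstack server create --image centos7 --flavor hpc.small \
        --nic port-id=<port-uuid> hpc-vm0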
21:19:51 <oneswig> I would be really interested in a discussion at the summit that brings together some of the recent conversations on the EMEA side wrt combining Nova resource reservations (Blazar) with preemptible instances.  Seems far out but really interesting as a long-term goal
21:20:14 <dfflanders> +1
21:20:21 <b1airo> right, i suspect other common hypervisor tunings such as cpu/mem pinning and numa topology probably make a reasonable bit of difference there too, but we haven't done any work to quantify that yet (been focused on cpu and memory performance mainly)
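(For reference, the pinning and NUMA tunings b1airo mentions are normally exposed to guests as Nova flavor extra specs; a sketch follows, with a made-up flavor name and sizes that obviously depend on host topology.)

    openstack flavor create hpc.pinned --vcpus 16 --ram 65536 --disk 40
    openstack flavor set hpc.pinned \
        --property hw:cpu_policy=dedicated \
        --property hw:numa_nodes=1 \
        --property hw:mem_page_size=large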
21:20:56 <b1airo> +1 to e.g. blazar and opie
21:21:38 <oneswig> How could we help that happen?
21:22:26 <b1airo> at this stage i imagine it'd be more a matter of gathering supporters
21:22:33 <oneswig> Agreed, I suspect any talk that isn't purely hot air in this area may be two summits away...
21:22:51 <b1airo> not likely to be anyone using it in the wild except a few folks working on the dev
21:22:55 <oneswig> (Plus there is the question of feasibility)
21:23:36 <kyaz001> hi guys, Khalil just joining
21:24:02 <oneswig> Hi Khalil, we were discussing the HPC speaker track for Barcelona
21:24:02 <b1airo> but having a lightning talk from some devs about how it works and fits in would be cool
21:24:48 <oneswig> b1airo: good idea, a session on future tech and wish-lists perhaps
21:24:52 <kyaz001> apologies for being tardy... do we have a list of topics?
21:25:08 <b1airo> kyaz001, https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_June_28th_2016
21:26:09 <oneswig> Shall we move on to activity areas?
21:26:27 <b1airo> sure
21:26:39 <oneswig> #topic Bare metal
21:27:25 <oneswig> I'm interested in following the developments re: serial consoles in Ironic but have not got involved yet.
21:27:32 <oneswig> It's on my radar.  Anyone using it?
21:27:55 <trandles> I'm about to make an attempt at using it
21:28:07 <b1airo> i'm just hoping to get some resourcing for us to start playing with ironic later in the year
21:28:13 <oneswig> Through Ironic or side-band?
21:28:43 <trandles> I have serial consoles working side-band but want to swap it for ironic eventually
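(A rough sketch of what swapping the side-band setup for Ironic might look like with the IPMI serial-over-LAN console support that was landing around this time; the node UUID and terminal port are placeholders and the exact console interface depends on driver and release.)

    openstack baremetal node set <node-uuid> --driver-info ipmi_terminal_port=10023
    openstack baremetal node console enable <node-uuid>
    openstack baremetal node console show <node-uuid>    # returns the console URL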
21:29:16 <oneswig> Same here, would be a great help to have it all under one umbrella
21:30:31 <oneswig> What other areas of activity are WG members interested in wrt bare metal?
21:31:33 <dfflanders> would be good to have Chameleon's opinion on this re bare metal as a service to researchers.
21:31:49 <oneswig> dfflanders: good point
21:31:50 <trandles> I'm interested in bare metal scalability issues but don't yet have a testbed large enough to push the boundaries
21:32:44 <oneswig> We have a problem I'd like to fix at the Ironic level: our NIC firmware seems to get upgraded awfully often.  I'm wondering how we might use the Ironic IPA ramdisk to do this for us to keep the deployment nice and clean
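(One plausible shape for this, sketched only: an out-of-tree ironic-python-agent hardware manager exposing a cleaning step that flashes NIC firmware. The class, step priority, and the mstflint invocation below are hypothetical, not a tested implementation.)

    # Hypothetical IPA hardware manager with a NIC firmware cleaning step.
    from ironic_python_agent import hardware
    from oslo_concurrency import processutils


    class NICFirmwareHardwareManager(hardware.HardwareManager):
        HARDWARE_MANAGER_NAME = 'NICFirmwareHardwareManager'
        HARDWARE_MANAGER_VERSION = '1'

        def evaluate_hardware_support(self):
            # A real manager would first detect the target NIC model.
            return hardware.HardwareSupport.SERVICE_PROVIDER

        def get_clean_steps(self, node, ports):
            # Advertise one step to run during automated cleaning.
            return [{'step': 'update_nic_firmware',
                     'priority': 90,
                     'interface': 'deploy',
                     'reboot_requested': True,
                     'abortable': False}]

        def update_nic_firmware(self, node, ports):
            # Illustrative flash command; the tool, device address and
            # image path would come from your own deployment.
            processutils.execute('mstflint', '-d', '0000:06:00.0',
                                 '-i', '/firmware/nic_fw.bin', 'burn')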
21:32:47 <dfflanders> https://www.youtube.com/watch?v=ycJeWH8FjL0
21:32:55 <trandles> we will have some large clusters retiring in the next ~12-18 months though and I hope to get some time with the hardware before it goes out the door
21:33:01 <trandles> ~2000 nodes
21:33:36 <oneswig> That's an awful lot for Ironic to pick up in one go!
21:33:45 <trandles> indeed
21:34:27 <trandles> but it's a chance to identify problem areas
21:35:05 <oneswig> I recall Robert saying there were deployment races between Nova and Ironic that limited him to deploying small numbers of nodes at a time - and he's got ~800 nodes IIRC
21:35:52 <oneswig> trandles: how long will you have?
21:36:03 <trandles> it varies a lot
21:36:27 <trandles> if there's no immediate demand for the floor space (and power, and cooling...) I could have several months
21:37:19 <b1airo> oneswig, have you reviewed https://blueprints.launchpad.net/nova/+spec/host-state-level-locking ?
21:37:29 <oneswig> trandles: I assume this is on-site and somewhat restricted but I'm sure I'd be interested to hear
21:37:55 <trandles> I'll keep it on the radar when decommissioning talk gets started
21:39:34 <oneswig> b1airo: not seen that but it sounds quite fundamental.  My understanding of python's global interpreter lock and concurrency model falls short of this, but I'm surprised that threads in python can preempt one another at all
21:42:32 <oneswig> Thinking of actions, we've shared some interests here and found some in common.
21:43:18 <oneswig> #action trandles b1airo oneswig we should keep in touch if we get underway with ironic serial consoles
21:43:21 <kyaz001> can you summarize the area of interest?
21:43:55 <oneswig> kyaz001: Right now we are looking at the bare metal activity area and reviewing incoming developments
21:44:50 <oneswig> There are many new capabilities in this area, and a lot of them are of interest to people in the WG
21:45:48 <oneswig> We ought to crack on, time's passing
21:46:04 <oneswig> #topic parallel filesystems
21:46:21 <oneswig> Alas I've not seen much go by in this area but I had one thought
21:46:48 <oneswig> Is anyone in the WG interested in putting up a talk on Lustre or GPFS for the speaker track?
21:47:42 <b1airo> we could probably do one
21:48:10 <oneswig> I think one of the principal guys at Cambridge might be able to share a combined Lustre / GPFS talk
21:48:18 <b1airo> though i'd make that dependent on getting one of my colleagues to work on the lustre bits
21:48:38 <b1airo> that's an idea
21:49:10 <oneswig> The timezones would be a killer for planning but I think Cambridge could cover the Lustre side - and we'd get a benchmark bake-off :-)
21:49:16 <b1airo> two quick tours of an HPFS integration and then maybe an open Q&A
21:49:43 <oneswig> I'll check and report back.
21:50:17 <blakec> We (ORNL) could contribute to the Lustre side as well.
21:50:22 <b1airo> has some promise and judging from the wg sessions in Austin would be very relevant
21:51:04 <oneswig> Great, let's note that
21:51:35 <trandles> likewise someone at LANL is looking at deploying GPFS using openstack
21:51:49 <oneswig> #action b1airo blakec oneswig trandles to consider options for Lustre/GPFS talk proposal
21:53:03 <b1airo> blakec, your experience is with integrating Lustre and Ironic based compute, or have you done hypervisor based compute too?
21:54:41 <oneswig> Time for one more topic?
21:55:16 <oneswig> #topic accounting and scheduling
21:55:22 <blakec> With hypervisor - we have multiple Lustre networks (TCP for VMs, and IB for bare metal). No sriov
21:56:01 <b1airo> blakec, but all talking to the same filesystem/s i take it?
21:56:21 <blakec> yes, that's correct
21:56:49 <b1airo> sounds interesting
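(For context on blakec's setup, a minimal sketch of serving one Lustre filesystem over both a TCP LNet for VM traffic and an o2ib LNet for bare metal over InfiniBand; interface names, NIDs and mount points are illustrative.)

    # /etc/modprobe.d/lustre.conf on servers (and any LNet routers)
    options lnet networks="tcp0(eth0),o2ib0(ib0)"

    # VM clients mount over the TCP network...
    mount -t lustre 10.0.0.1@tcp0:/scratch /lustre/scratch
    # ...bare metal clients over InfiniBand
    mount -t lustre 192.168.1.1@o2ib0:/scratch /lustre/scratch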
21:57:22 <oneswig> My colleague in Cambridge has responded with interest re: HPC filesystem talk proposal, lets follow up on that
21:58:03 <oneswig> We've already covered much of the recent discussion on scheduling, was there anything from WG members in this area?
21:58:11 <b1airo> absolutely - good idea oneswig
21:58:28 <oneswig> b1airo: thanks
21:58:40 <oneswig> Time's closing in
21:58:44 <oneswig> #topic AOB
21:58:57 <oneswig> any last-minute items?
21:59:12 <b1airo> coffee...?
21:59:18 <trandles> yes please
21:59:31 <oneswig> Sounds ideal!
21:59:49 <julian1> \o
22:00:11 <oneswig> Hi julian1
22:00:14 <b1airo> stayed up way late gathering Mellanox NEO debug info :-/
22:00:26 <oneswig> you too? :-)
22:00:28 <julian1> Hey oneswig!
22:01:27 <oneswig> Time to wrap up / brew up - thanks all
22:01:38 <oneswig> #endmeeting