21:02:11 <b1airo> #startmeeting Scientific-SIG
21:02:12 <openstack> Meeting started Tue Feb 19 21:02:11 2019 UTC and is due to finish in 60 minutes.  The chair is b1airo. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:02:12 <janders> good morning, good evening All
21:02:13 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:02:16 <openstack> The meeting name has been set to 'scientific_sig'
21:02:27 <b1airo> #chair martial
21:02:28 <openstack> Current chairs: b1airo martial
21:02:36 <b1airo> hi janders
21:03:10 <janders> I just had a really interesting chat about baremetal cloud & networking
21:03:23 <janders> would you be interested in spending some time on this?
21:03:26 <b1airo> oh yeah! who with?
21:03:39 <janders> RHAT, Mellanox and Bright Computing
21:03:53 <b1airo> do tell
21:04:11 <janders> have you guys played much with pxe booting baremetals from neutron tenant networks (as opposed to the nominated provisioning network)?
21:05:07 <b1airo> no, not at all, but it's clearly one of the key issues in multi-tenant bare-metal...
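For context: booting Ironic nodes on Neutron tenant networks, rather than a single flat provisioning network, relies on Ironic's multi-tenant networking support, i.e. the "neutron" network interface. A minimal sketch of the relevant configuration, with the network names and node UUID as placeholders:

    # ironic.conf (sketch; network names are placeholders)
    [DEFAULT]
    enabled_network_interfaces = flat,neutron
    default_network_interface = neutron

    [neutron]
    cleaning_network = provisioning-net
    provisioning_network = provisioning-net

    # per node, switch to the neutron network interface
    openstack baremetal node set <node-uuid> --network-interface neutron

Even with this, deployment still PXE-boots on the Ironic provisioning network and the node is only flipped onto the tenant network afterwards; PXE-booting off a tenant-run PXE server, as discussed below, is the harder part.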
21:05:17 <janders> with Bright, we're trying to make Bright Cluster Manager a Cloud-native app
21:05:42 <janders> it might seem out of the blue but I do see a ton of benefits of this approach and few disadvantages (other than the initial work to make it work)
21:05:44 <b1airo> right, that's a very sensible approach for them
21:06:11 <janders> now - they're all about pxe booting (as you probably know - you're a Bright user right?)
21:06:23 <b1airo> presumably they'd make a new agent or something that sat inside each tenant
21:06:55 <janders> so you need to spin up frontend and backend networks. You put BCM-cloud on both. Then you pxeboot the compute nodes off BCM's backend
21:07:04 <b1airo> NeSI has Bright (BCM and BOS) so I'm picking some stuff up, but I'm not hands on in the ops stuff here
21:07:40 <janders> same as me - big bright user but we have bright gurus in the team so I know what it does but log in maybe once a month
21:08:12 <janders> on Ethernet - it's all easy and just works. Networking-ansible is plumbing VLANs, there's no NIC side config
21:08:27 <janders> on IB though, the provisioning_network pkey needs to be pre-set in the HCA FW
21:09:12 <janders> after that's done it's rock-solid but one negative side effect is you can't pxeboot off tenant networks. We tried some clever hacks with custom-compiled iPXE but nah. The silicon will drop any packets that do not match the pre-set pkey
21:09:22 <janders> looks like dead end
21:09:28 <janders> so - back to the drawing board
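For readers unfamiliar with pkeys: an InfiniBand partition key segments the fabric much like a VLAN tag does on Ethernet, and once the provisioning partition is burned into the HCA firmware the adapter silicon filters out traffic on any other partition before iPXE or the OS ever sees it. Purely for illustration, static partition definitions in an OpenSM partitions.conf look roughly like this (the names and port GUIDs are made up; in the Mellanox SDN setup discussed here the tenant partitions are managed dynamically rather than by hand):

    # /etc/opensm/partitions.conf (illustrative sketch only)
    Default=0x7fff, ipoib : ALL=full ;
    provisioning=0x8001, ipoib : 0x0002c9030012aabb=full, 0x0002c9030012aacc=full ;
    tenant_a=0x8002, ipoib : 0x0002c9030012aadd, 0x0002c9030012aaee ;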
21:09:45 <janders> the kit I've got does have ethernet ports, they are just shut off cause the switching has no SDN
21:10:01 <janders> it does have ssh though hence... networking-ansible!
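networking-ansible is a Neutron ML2 mechanism driver that pushes switch configuration over SSH using Ansible network modules, which is why plain SSH access is enough for gear without an SDN controller. A minimal sketch of its ml2_conf.ini stanza, following the layout in the networking-ansible docs (the switch name, ansible_network_os and credentials are placeholders, and support for a given switch OS still needs to be verified against the driver):

    # ml2_conf.ini (sketch; values are placeholders)
    [ml2]
    mechanism_drivers = openvswitch,ansible

    [ansible:leaf-switch-1]
    ansible_network_os = dellos9
    ansible_host = 192.0.2.10
    ansible_user = admin
    ansible_ssh_pass = secret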
21:10:09 <b1airo> hmm, until MLNX creates a new firmware with some pre-shared trusted key or some such
21:10:34 <janders> I was told it can be made to work in parallel to Mellanox SDN, both can be configured at the same time just for different physnets
21:10:51 <janders> I think this can be made to work (pending networking-mellanox testing on these switches)
21:11:04 <b1airo> which kit do you have?
21:11:12 <janders> M1000es from Dell
21:11:29 <janders> (I have many different platforms but that's the one I got in large numbers for OpenStack to start with)
21:11:55 <janders> here I wanted to ask you what you think about all this and what is your experience with networking-ansible
21:12:14 <janders> I'll VPN into work to get the exact switch details (RHAT are asking too) - bear with me if I drop out
21:13:33 <janders> ok.. still here?
21:14:20 <b1airo> haven't used networking-ansible myself yet, though it was on the radar when looking at trying to insert an Ironic Undercloud into things at Monash, where our networks are all 100G Mellanox Spectrum running Cumulus
21:16:23 <janders> Force10 MXL Blade
21:16:27 <janders> that's my eth platform
21:17:55 <janders> on that - what are your thoughts on NEO+Mellanox ethernet vs networking-ansible+Mellanox ethernet?
21:18:32 <janders> I bet there won't be feature parity yet but conceptually it's quite interesting
21:19:01 <janders> very different approach
21:19:10 <janders> I suppose the same question applies to Juniper SDNs etc
21:19:46 <b1airo> we first looked at NEO over 2 years ago, it wasn't very polished or nice from a UI perspective yet, so it was difficult to actually realise the potential value-add. i had been promising the MLNX folks we'd try again for at least 6 months before I left Monash - no doubt way back in the pile for them now
21:21:07 <b1airo> we spent quite a lot of effort moving to Cumulus + Ansible for basic network management and I was/am very keen to see that progress to full automation of the network as a service
21:21:49 <gt1437> We will get there
21:22:53 <janders> yeah this tenant-pxeserver on baremetal/IB challenge might actually lead to interesting infrastructure changes
21:23:10 <janders> on my side
21:23:27 <janders> running SDN and networking-ansible side-by-side adds more complexity but I think also more capability going forward
21:24:07 <b1airo> aiming for bare-metal multi-tenant networking with Mellanox RoCE-capable gear - it seemed like we would need to try Cumulus + Neutron native (networking-ansible would be a possible alternative there) and also NEO + Neutron. interesting thing there is that the value of Cumulus suddenly becomes questionable if you're managing the BAU network operations via Neutron, given e.g. NEO can do all the day-0 switch provisioning stuff
21:24:28 <janders> OpenStack internal traffic (other than storage) couldn't care less about IB - maybe it's better if these are kept separate so there is even more IB bandwidth for the workloads (and less jitter)
21:25:04 <janders> with Cumulus, do they have the same SDN-controller-centric architecture as NEO?
21:25:36 <janders> Do you configure Cumulus URL in Neutron SDN section, and the flow is Neutron>Cumulus>Switching gear?
21:26:24 <b1airo> yeah our basic cloud hpc networking architecture used dualport eth with active-backup bonding for host stuff (openstack, management, etc) and sriov into tenant machines on top of the default backup interface
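For concreteness, the host-side part of that layout in ifupdown terms might look like the sketch below (interface names are placeholders; the SR-IOV VFs handed to tenant machines are carved from the physical ports themselves rather than from the bond):

    # /etc/network/interfaces fragment (sketch; names are placeholders)
    auto bond0
    iface bond0 inet manual
        bond-slaves enp3s0f0 enp3s0f1
        bond-mode active-backup
        bond-miimon 100
        bond-primary enp3s0f0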
21:27:15 <janders> https://docs.cumulusnetworks.com/display/CL35/OpenStack+Neutron+ML2+and+Cumulus+Linux
21:27:23 <janders> I see Cumulus has HTTP service listening
21:27:34 <janders> However, in Neutron, individual switches are configured
21:27:36 <b1airo> no central controller that i'm aware of for Cumulus, the driver has to talk to all the devices
21:27:58 <janders> ok.. so what's the HTTP with REST API service for? Do switches talk to it?
21:28:12 <b1airo> but yes, all restful apis etc, no ssh over parallel python ;-)
21:30:22 <janders> or is the REST API running on each switch?
21:30:25 <b1airo> the rest api is on the switches themselves
21:30:27 <b1airo> ya
21:30:30 <janders> ok!
21:30:37 <janders> right...
21:30:53 <janders> nice - no two sources of truth (Neutron DB and then SDN DB)
21:31:06 <janders> however troubleshooting will be fun (log aggregation will help to a degree)
21:31:08 <b1airo> that's true
21:31:29 <janders> with centralised SDN my experience is - they work 99% of the time - but when they stop, life sucks
21:31:35 <b1airo> exactly, one of the things we were trying to wrangle with - monitoring, visibility, etc etc
21:32:01 <janders> manual syncing of the DBs would be a nightmare and a high risk operation
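On the Cumulus side, the switch-local REST endpoint is provided per switch rather than by a controller, so Neutron's database remains the single source of truth. A sketch of enabling it, assuming Cumulus Linux 3.x where the HTTP API is the restserver service (whether a given ML2 driver release uses exactly this service should be checked against the docs linked above):

    # run on each Cumulus switch (sketch)
    sudo systemctl enable restserver
    sudo systemctl start restserver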
21:32:40 <b1airo> ok, i'm at a conference and the talk i'm in (Barbara Chapman talking about OpenMP for Exascale) just finished
21:33:08 <b1airo> so i'm going to have to scoot pretty soon
21:33:16 <martial> what are the other topics to cover?
21:33:21 <b1airo> better touch on the ISC19 BOF in case anyone is lurking
21:33:22 <janders> what conference? :)
21:33:23 <b1airo> ...
21:33:39 <b1airo> eResearchNZ :-)
21:34:40 <janders> do you guys know if we're still on track for speaker notifications for Denver?
21:34:53 <janders> it would be great to book travel by the end of this week if possible
21:35:00 <b1airo> had a plenary before that on the Integrated Data Infrastructure at Stats NZ - an enviable resource coming from the Aus context, clearly digital health in australia didn't talk to them before launching My Health Record...
21:35:18 <janders> haha :)
21:35:32 <martial> janders: I heard they are running a day late
21:35:56 <b1airo> ah yes, the track chairing process got extended till Sunday just gone as there was a slight booboo that locked everyone out of the tool
21:36:14 <janders> right.. so no notifications this week?
21:36:42 <martial> I heard the 20th instead of the 19th
21:36:51 <b1airo> but we've done our job now, so i guess foundation staff will be doing the final checks and balances before locking in
21:37:00 <janders> ah ok
21:37:06 <janders> I will keep this in mind while making plans
21:37:10 <janders> thanks guys - this is very helpful
21:37:13 <b1airo> sounds possible martial , let me check my email
21:37:40 <b1airo> janders: just quietly, you can confidently make plans
21:39:22 <janders> :) :) :) thank you
21:41:12 <janders> b1airo: not sure if this is relevant to you but Qantas double status credits are on
21:41:52 <janders> or are you with AirNZ these days?
21:42:45 <b1airo> yeah kiwi-air all the way ;-)
21:43:04 <janders> Nice. Good airline and nice aircraft.
21:43:16 <b1airo> ok i'd better run. martial , shall we finish up now or do you want to kick on and close things out?
21:43:34 <janders> I don't have anything else.
21:43:46 <martial> I can close
21:43:49 <b1airo> agreed janders , they can be a bit too progressive/cringe-worthy with their safety videos at times though
21:43:52 <martial> although not sure if we have much to add
21:44:00 <martial> #topic ISC BoF
21:44:10 <b1airo> ok, i'm off to the coffee cart. o/
21:44:19 <janders> safe travels mate!
21:44:48 <martial> like @b1airo mentioned, we are proposing an OpenStack HPC panel for ISC 19 (Frankfurt, June)
21:44:57 <janders> I haven't managed to get anything out of my colleagues who might be going to ISC so unfortunately can't contribute
21:45:28 <gt1437> I put in a talk in vHPC, so I might be there for ISC
21:46:07 <martial> gt1437: cool :)
21:46:13 <janders> nice!
21:47:03 <gt1437> and just realised the one after Denver is Shanghai, that's cool
21:47:36 <janders> oh wow
21:48:48 <janders> thanks, I didn't know (Shanghai was in my top 3 guesses though)
21:49:15 <gt1437> yeah thought it was going to be beijing, anyway, still good
21:49:43 <janders> any goss about the exact dates?
21:50:10 <janders> (the website states just "nov 2019")
21:50:21 <gt1437> no idea
21:51:50 <martial> no dates yet
21:54:55 <gt1437> I'm off to meetings... ciao
21:55:02 <janders> have a good day
21:55:29 <martial> have a good day all,
21:55:33 <martial> anything else?
21:55:49 <martial> otherwise just a reminder that we will have the Lightning talk at the summit :)
21:56:45 <janders> I think we're good
21:56:55 <janders> thank you Martial
21:57:10 <martial> #endmeeting