11:00:19 <oneswig> #startmeeting scientific-sig
11:00:20 <openstack> Meeting started Wed Jan 29 11:00:19 2020 UTC and is due to finish in 60 minutes.  The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:21 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:23 <openstack> The meeting name has been set to 'scientific_sig'
11:00:36 <oneswig> Hello
11:00:58 <oneswig> ... echo ...
11:05:03 <ttx> ... o ...
11:05:49 <oneswig> hi ttx
11:06:05 <oneswig> quiet, real quiet today (should have posted an agenda... :-(
11:06:12 <ttx> I replied so that you did not feel too lonely :)
11:06:57 <oneswig> We usually pull in 4-5 but there were no agenda items from the sig slack channel
11:08:26 <janders> g'day all
11:08:30 <janders> sorry mac issues
11:08:47 <janders> time to get a real laptop not a toy
11:08:55 <oneswig> Hi janders, just us and ttx right now...
11:09:23 <janders> I did see that mention of the Mellanox HDR OpenStack press release from last week's log
11:09:24 <oneswig> Macbook?  Mine runs IRC just fine... got about 6GB in swap doing something though.
11:09:48 <janders> my 5 cents - it was indeed running fine for quite a while, I think they just streamlined it a little more
11:09:56 <oneswig> janders: is that your doing I wonder?
11:10:16 <oneswig> what's different with hdr that required change?
11:10:20 <janders> I might have contributed a little :)  but mostly good work by MLNX
11:11:00 <janders> I think most work was around tripleo integration bits
11:11:20 <oneswig> Ah, Ok.
11:11:21 <janders> HDR... good question - I haven't played with HDR200 vfs
11:11:36 <janders> I've got HDR200 storage and HDR100 computes
11:11:43 <oneswig> I've been having lots of fun this last week with VF-LAG
11:11:47 <janders> *maybe* there was a bit of fiddling there
11:12:29 <janders> specifically - HDR200 on PCIe3
11:12:39 <janders> other than that - I think it was mostly integration work
11:12:49 <janders> VF-LAG - interesting!
11:12:57 <janders> what is it like?
11:13:24 <janders> I've never touched it - and it could be interesting for my cyber system going forward in certain circumstances
11:13:39 <oneswig> Jury's out at present.  The major issue I had was the systemd/udev plumbing to put VFs into switchdev mode instead of "legacy" mode.
11:14:10 <oneswig> Curiously, it worked in legacy mode but I haven't got around to trying it in switchdev mode yet.  That's today's fun.
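For context, putting the ConnectX eswitch into switchdev mode (so that VF representors appear for offload and VF-LAG) is typically a devlink operation. A rough sketch, with the PCI address, interface name and VF count as placeholders rather than values from this meeting:

    # create the VFs on the PF first (interface name is a placeholder)
    echo 4 > /sys/class/net/ens1f0/device/sriov_numvfs
    # flip the eswitch from the default "legacy" mode to "switchdev"
    devlink dev eswitch set pci/0000:3b:00.0 mode switchdev
    # confirm the mode change took effect
    devlink dev eswitch show pci/0000:3b:00.0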
11:14:31 <janders> what does it look like from VM owner perspective? ethX and ethY, each coming off a different port?
11:14:37 <oneswig> When I was misconfiguring it, I got Tx bandwidth of 1 port and it appeared I could receive on either port.
11:14:56 <oneswig> janders: no, 1 VF, somehow coupled to 2 PFs.
11:14:59 <janders> or is it magicnic0 which stays up despite ports backing it going up and down?
11:15:09 <janders> ok so option 2) nice!
11:15:12 <oneswig> I think that's the idea.  Haven't tested the failover bit yet.
11:15:37 <janders> is the promise that you can aggregate bandwidth and failover at the same time?
11:16:32 <oneswig> The aggregated bandwidth needs verifying but the failover capability is intended for sure.
11:16:38 <oneswig> I'll let you know!
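As a rough sketch of the VF-LAG arrangement being described (not the exact configuration from the meeting): both PFs are put into switchdev mode as above and then enslaved to an ordinary kernel bond, after which VFs created on the first port can use either uplink. Interface names and bond mode here are placeholders:

    # both ports of the adapter must already be in switchdev mode (see above)
    # enslave the two PF uplinks to a bond (LACP; active-backup also works for failover-only)
    ip link add bond0 type bond mode 802.3ad
    ip link set ens1f0 down
    ip link set ens1f0 master bond0
    ip link set ens1f1 down
    ip link set ens1f1 master bond0
    ip link set bond0 up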
11:17:38 <oneswig> Any news on Supercloud?
11:18:11 <oneswig> btw I'm hoping to write up the VF-LAG experience for next week, it's almost completely undocumented fwiw
11:18:45 <janders> actually I do have some news, what might be disappointing is these are not very performance related
11:18:58 <janders> SuperCloud got bogged down in the cyber security realm for some time it seems
11:19:15 <janders> but I have hit some interesting challenges and issues lately
11:19:51 <janders> in terms of cyber specifics I proposed the volume-transfer based mechanism of copying unsafe data in (and at a later stage - possibly copying data out)
11:20:05 <janders> will see what my security team have to say about this one
11:20:19 <janders> but so far so good, people seem to be liking the idea
11:20:23 <oneswig> this is malware payloads for analysis?
11:20:30 <janders> yes, in encrypted form
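For reference, the volume-transfer mechanism mentioned above maps onto the standard Cinder ownership-transfer workflow; a minimal sketch, with the volume name as a placeholder (the hardened process around it is obviously site-specific):

    # in the source project: offer the volume for transfer
    openstack volume transfer request create unsafe-data-vol
    # pass the returned transfer ID and auth key to the receiving side out-of-band,
    # then in the destination (quarantine) project:
    openstack volume transfer request accept --auth-key <auth_key> <transfer_request_id>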
11:20:51 <oneswig> Loving your work, janders :-)
11:21:01 <janders> I might try to submit something for Vancouver on this, it is getting quite interesting really
11:21:16 <janders> very different than what I typically do
11:21:19 <oneswig> Is this into bare metal instances?
11:21:25 <janders> VMs at this stage
11:21:36 <janders> but you raised an interesting point
11:21:56 <oneswig> probably wise unless you really trust your layers of defence :-)
11:22:09 <janders> this system will be used by guys who uncovered spectre/meltdown so looking at hardware layer security might be in scope
11:22:25 <janders> it wasn't brought up yet, so mostly looking at VMs
11:22:59 <janders> I should probably say - guys who participated in uncovering spectre/meltdown - it was a big, orchestrated effort
11:23:11 <janders> so that's one interesting bit
11:23:16 <janders> the other bit is more operational
11:23:33 <janders> hit some interesting failure modes in OSP13 which likely impact later releases, too
11:23:35 <janders> BZ coming soon
11:23:47 <janders> https://bugzilla.redhat.com/show_bug.cgi?id=1795402
11:23:47 <oneswig> You might need to give those researchers access to bios settings, microcode etc.
11:23:48 <openstack> bugzilla.redhat.com bug 1795402 in openstack-nova "Nova list --all-tenants | ERROR (ClientException): Unexpected API Error. | <type 'exceptions.TypeError'> (HTTP 500)" [Medium,New] - Assigned to nova-maint
11:24:09 <janders> essentially we had AMQP dropout brick an entire project
11:24:14 <oneswig> typeerror - suggests a code path nobody else is tickling
11:24:35 <janders> (and "openstack server list --all-projects")
11:24:48 <janders> I think it is rare, but when you hit it it does get very ugly very quickly
11:24:59 <oneswig> ouch
11:25:03 <oneswig> is that Queens?
11:25:24 <janders> essentially if an AMQP dropout happens at a very bad time during instance creation, the instance ends up with missing fields in the DB and nova can't handle that
11:25:26 <janders> yes, queens
11:25:44 <janders> OSP16 should be out soon, but it will probably be a while before we upgrade
11:26:10 <janders> I fixed this within an hour from when it happened by setting deleted=1 for instance and re-creating
11:26:11 <oneswig> Are you getting rabbit issues often enough to hit this?
11:26:19 <janders> it only happened once so far
11:26:33 <oneswig> Nice work for getting to the bottom of it!
11:26:33 <janders> but as per Murphy's law it happened at the least appropriate time
11:26:52 <janders> when I had a contractor set up something time-sensitive
11:27:14 <janders> so I did a quick and dirty fix and then had the RHAT guys have a look
11:27:30 <janders> they have some really smart fellas up in Brisbane
11:27:42 <janders> the "proper" fix in the BZ was their work
11:28:04 <janders> it's interesting how nova ended up in that state - I thought I'd share
11:28:33 <oneswig> thanks, really interesting to know!
11:28:41 <janders> there are known issues with OSP13 AMQP connection handling in nova-compute which are fixed in latest minor updates
11:28:48 <janders> those might be affecting later versions, not sure
11:28:59 <janders> so the root cause of this is likely fixed
11:29:08 <janders> but in case of future errors... nova should be smarter
11:29:17 <janders> I hope to see some resiliency enhancements
11:29:30 <janders> it should purge network cache itself instead of needing this
11:29:47 <janders> supposedly there is a periodic task that does that, but from what I'm hearing it has yet to be seen to fix anything
11:30:15 <janders> so - if you ever hit this (hope not) here's the SQL :)
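A hypothetical sketch of the soft-delete workaround described above (not the actual statement used; the instance UUID is a placeholder, and note that Nova's own soft-delete convention sets deleted to the row id, while deleted=1 as described also hides the row):

    -- mark the broken instance row as deleted so the API stops tripping over it
    UPDATE instances
       SET deleted = 1, deleted_at = NOW()
     WHERE uuid = '<broken-instance-uuid>' AND deleted = 0;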
11:31:20 <janders> other than that I will be putting in some policy customisations into heat - and splitting GPFS backend for "connected" and "disconnected" instances soon
11:31:37 <janders> (connected=with_floating_ips disconnected=only_accessible_via_vnc in our terminology)
11:32:35 <janders> so - long story short - not a lot of action, but some interesting stuff happening
11:32:50 <oneswig> sounds like you've had your hands busy!
11:33:20 <janders> a little, yes! :)
11:33:43 <oneswig> Thanks for the update, always good to hear your news
11:34:49 <janders> https://photos.app.goo.gl/dcLYzuRMNeHJKHv4A this nasty thing has been keeping us busy lately
11:35:22 <janders> Canberra has been affected by bushfire smoke lately but was lucky enough to avoid big fires nearby... till this week
11:35:32 <janders> so far it's not too bad but we gotta keep an eye on it
11:35:32 <oneswig> ouch.  Take care janders!
11:35:56 <janders> thanks oneswig :)
11:36:15 <oneswig> I don't think there's anything else to cover here.
11:36:19 <janders> do you know when does CFP for Vancouver open?
11:36:35 <janders> I keep checking the website but haven't seen much so far
11:36:55 <priteau> janders: Given the new format of this event, I am not sure if there will be a CFP as usual?
11:37:09 <janders> interesting!
11:37:12 <priteau> Maybe ttx knows more about this
11:37:19 <janders> would you be happy to explain the new format in a nutshell?
11:37:51 <priteau> I only know what's publicly available on the eventbrite page
11:38:00 <priteau> Each day will be broken into three parts:
11:38:10 <priteau> Short kickoff with all attendees to set the goals for the day or discuss the outcomes of the previous day
11:38:22 <priteau> OpenDev: Morning discussions covering projects like Ansible, Ceph, Kubernetes, OpenStack, and more
11:38:38 <priteau> PTG: Afternoon working sessions for project teams and SIGs to continue the morning’s discussions.
11:38:51 <priteau> See https://www.eventbrite.com/e/opendev-ptg-vancouver-2020-tickets-88923270897
11:38:56 <janders> sounds like less presentations, more discussions?
11:39:47 <priteau> Yes, it explicitly says: OpenDev [...] will include discussion oriented sessions around a particular topic to explore a problem within a topic area, share common architectures, and collaborate around potential solutions.
11:40:10 <janders> looks like the balance between the "main" summit and PTG is about to turn upside-down...
11:40:34 <janders> which might not be a bad thing, but it is indeed very different than what it used to be
11:40:45 <janders> good? bad? what do you guys think?
11:41:08 <priteau> AFAIK there will still be a usual summit in Q4 2020
11:41:27 <priteau> Dunno if this new event is an improvement, we shall see
11:41:30 <oneswig> I like the focus of the new session, I hope it turns out how it is being described
11:41:51 <janders> one tricky bit is
11:42:08 <janders> with many organisations, getting travel approvals is heaps easier when you have a presentation slot
11:42:14 <janders> (that does include CSIRO)
11:42:41 <oneswig> lightning talk slot in a sig session perhaps :-)
11:42:56 <janders> haha :D  I like your thinking oneswig
11:43:45 <janders> if SARS-ng epidemic isn't too bad I'd very much like to re-visit Vancouver (and probably LA/California on the way back)
11:44:34 <janders> looks like Qantas is now flying direct SYD-YVR - AND getting out of US is so much easier than getting in, immigration-queues wise
11:44:54 <janders> the above would be my way to go
11:45:42 <janders> June is the right time to do this, too
11:46:45 <janders> alright! it's getting close to the hour - so if you guys would like to wrap up, please go ahead
11:46:55 <janders> good chatting - and we'll chat next week
11:47:07 <ttx> re opendev
11:47:12 <oneswig> same to you janders
11:47:22 <ttx> there will not really be a CFP per se
11:48:09 <janders> ttx so more like tracks moderated by project leads, etc?
11:48:09 <ttx> each theme should have a programming committee responsible for picking content. Those may use a CFP, but probably will just reach out to find suitable speakers.... Presentations are just one side of the event
11:48:14 <ttx> most of it is open discussion
11:48:47 <ttx> More details will be out soon as we push out the call for programming committee members
11:49:04 <oneswig> great, thanks for clarifying ttx
11:49:52 <janders> +1
11:50:20 <oneswig> OK, any final items?
11:51:08 <janders> ttx we're happy to help as the SIG :)
11:52:21 <janders> oneswig I think we're good. See you next week! :)
11:52:30 <oneswig> likewise janders
11:52:33 <oneswig> #endmeeting