11:00:31 <oneswig> #startmeeting scientific-sig
11:00:32 <openstack> Meeting started Wed Jan 17 11:00:31 2018 UTC and is due to finish in 60 minutes.  The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:33 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:35 <openstack> The meeting name has been set to 'scientific_sig'
11:00:40 <oneswig> ahoy there
11:00:42 <daveholland> morning
11:00:52 <oneswig> #link Agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_January_17th_2018
11:00:56 <oneswig> Hi daveholland, morning
11:01:04 <yankcrime> o/
11:01:10 <oneswig> Hi Nick!
11:01:15 <johnthetubaguy> o/
11:01:17 * ildikov is lurking :)
11:01:20 <oneswig> And John!
11:01:22 <strigazi> Hello
11:01:22 <vabada> hi
11:01:39 <oneswig> and indeed, hi everyone
11:01:54 * ildikov is looking at johnthetubaguy with cat eyes from Shrek :)
11:02:22 <priteau> Good morning!
11:02:37 <oneswig> While people gather, we have Spyros with us today, Magnum PTL.  Thanks for coming Spyros
11:02:41 <oneswig> Hi priteau
11:02:51 <belmoreira> hi
11:03:14 <oneswig> Ready to get stared?
11:03:18 <oneswig> started...
11:03:22 <oneswig> (hi belmoreira)
11:03:41 <oneswig> #topic Magnum for research computing use cases
11:04:17 <oneswig> Spyros, thanks for coming.  We have a ton of questions and I'm sure others do too.
11:04:50 <oneswig> Can you set the ball rolling by describing what's happening with Magnum and bare metal?  It appears to have really improved in the last year
11:05:08 <strigazi> Sure, thanks
11:05:41 <strigazi> Since the Newton cycle in 2016 magnum has the concept of cluster drivers
11:06:28 <strigazi> The goal of this change is to focus each cluster deployment on one combination of server type (vm|bm), operating system and COE
11:07:11 <oneswig> A single driver for each 3-tuple of these?
11:07:16 <strigazi> Following this pattern we ended up having drivers for VM-based deployments and for Ironic
11:07:19 <strigazi> yes
11:07:36 <strigazi> e.g. fedora-atomic, virtual machine, kubernetes
11:08:36 <strigazi> Ironic wasn't playing nice with neutron at that time
11:09:19 <strigazi> And the implementation was assuming that all nodes are in a pre-existing network
11:09:46 <strigazi> plus we didn't have a way to make the usage of cinder optional.
11:10:07 <oneswig> All good points...
11:10:34 <oneswig> So what happened?
11:10:34 <strigazi> We used to have a cinder volume mounted in each node for container storage, without the option to opt out
11:11:30 <strigazi> So, we just made all these configurable and optional
11:11:56 * johnthetubaguy noodles about manila mounts for storage needs
11:12:02 <strigazi> We can now use the same setup for both VMs and physical servers
11:12:12 <oneswig> So Ironic is supported through a set of options.  But COE is still a different driver, right?
11:13:07 <priteau> strigazi: You mentioned issues between Ironic and Neutron. Do you require multi-tenant networking to be configured?
11:13:20 <strigazi> yes, we have a patch in flight to consolidate, and each COE is tied to the operating system
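To make the driver tuple concrete, a minimal sketch is given below, assuming python-magnumclient's v1 client and a Keystone password session; the endpoint, credentials, image and flavor names are placeholders, and each cluster template pins one (server type, OS image, COE) combination:

    # Minimal sketch, assuming python-magnumclient's v1 client; endpoint,
    # credentials, image and flavor names are placeholders.
    from keystoneauth1 import loading, session
    from magnumclient.v1 import client as magnum_client

    loader = loading.get_plugin_loader('password')
    auth = loader.load_from_options(
        auth_url='https://keystone.example.com:5000/v3',
        username='demo', password='secret', project_name='demo',
        user_domain_id='default', project_domain_id='default')
    sess = session.Session(auth=auth)
    magnum = magnum_client.Client(session=sess)

    # One template per (server type, OS image, COE) combination,
    # e.g. bare metal + Fedora Atomic + Kubernetes.
    template = magnum.cluster_templates.create(
        name='k8s-baremetal-atomic',
        coe='kubernetes',              # the COE part of the tuple
        server_type='bm',              # 'vm' or 'bm' (Ironic)
        image_id='fedora-atomic-27',   # the OS part of the tuple
        keypair_id='mykey',
        external_network_id='public',
        flavor_id='baremetal-flavor',
        # leaving out docker_volume_size should skip the dedicated Cinder
        # volume -- the opt-out discussed above
    )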
11:14:22 <strigazi> priteau: the usual use case is to have a private network where all nodes will be running. This network can be shared among tenants
11:15:11 <oneswig> strigazi: any plan to offer an unusual case?  ingest bandwidth is an issue for us
11:15:38 <strigazi> oneswig: what do you mean? What network configuration do you need?
11:16:01 <oneswig> anything without a neutron router process between the data source and the container cluster, ideally
11:17:06 <oneswig> For example, a simplified setup where the nodes are deployed bound to a pre-existing (provider) network.  Would work for us but we may be niche
11:17:37 <daveholland> that layout would be of interest to us too
11:17:38 <johnthetubaguy> I was thinking of using additional ips rather than floating ips, for example
11:18:02 <strigazi> Will you be able to create ports to that network?
11:18:18 <oneswig> strigazi: yes
11:19:00 * johnthetubaguy thinks you can pass ports into nova for ironic instances now, not 100% sure
11:19:24 <strigazi> I think that can be done today already without patching magnum
11:19:46 <priteau> oneswig: Do you have a write-up of how you use these existing provider networks to bypass a centralized Neutron router? We would be interested in Chameleon to perform high-bandwidth experiments
11:20:38 <oneswig> strigazi: so we can configure Magnum not to create the network and router?
11:21:22 <oneswig> priteau: I'm not sure there's any rocket science to it - but our system is not as multi-tenant as yours. Let's talk after.
11:21:22 <strigazi> oneswig: yes, that is already in pike for vms, ironic machines are expected to work in the same way in queens
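For the provider-network case just discussed, a hedged sketch (reusing the client from the earlier example, with 'datacentre' as a placeholder network) of pointing nodes at an existing network and skipping floating IPs:

    # Hedged sketch, reusing the magnum client from the earlier example.
    # 'datacentre' is a placeholder pre-existing (provider) network; with
    # fixed_network set, Magnum should not create its own network and router.
    template = magnum.cluster_templates.create(
        name='k8s-provider-net',
        coe='kubernetes',
        server_type='bm',
        image_id='fedora-atomic-27',
        keypair_id='mykey',
        flavor_id='baremetal-flavor',
        fixed_network='datacentre',
        fixed_subnet='datacentre-subnet',
        floating_ip_enabled=False,     # reach nodes on their fixed IPs
    )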
11:21:39 <priteau> oneswig: Thanks, I don't want to go off-topic
11:22:05 <oneswig> strigazi: sounds good to me.
11:22:19 <oneswig> strigazi: we are a little off mainline but we may have problems with deploying using Atomic Host cloud images - they lack drivers for some of the krazy hardware in our nodes (IB, etc).  Is it possible to build our own images and bake this stuff in?
11:22:25 <strigazi> Getting the network architecture set correctly is the first thing to do
11:24:05 <strigazi> We used to have a diskimage-builder script for custom atomic hosts but we stopped using it and went for the upstream fedora releases. We can update it
11:24:39 <oneswig> strigazi: If it's on git, we'd be happy to refresh it.
11:24:44 <strigazi> The best option would be if those drivers could be installed on a running instance
11:25:03 <strigazi> Would that be possible?
11:25:25 <oneswig> Can be problematic for RAID devices, otherwise it should be
11:26:36 <strigazi> another option would be to use the standard images built with DIB and add the ostree storage for system containers
11:26:58 <johnthetubaguy> image builder is really good for ironic though, given ironic doesn't do snapshots
11:27:42 <oneswig> It's already part of our workflow in many other areas, we are immune to the pain
11:27:53 <oneswig> or perhaps numbed
11:27:58 <johnthetubaguy> heh
11:28:04 <johnthetubaguy> both probably
11:29:06 <strigazi> so we can have a work item after this meeting: run system containers on non-atomic hosts OR build custom atomic images
11:29:30 <strigazi> always with diskimage-builder
11:29:43 <strigazi> you use diskimage-builder right?
11:29:55 <oneswig> strigazi: I haven't checked if DIB supports atomic as a target, but if it does, that sounds like a good plan to me.
11:31:26 <johnthetubaguy> those network requirements, is it really just the need to define the ports before building the instance (to get the IP address)?
11:31:41 <strigazi> johnthetubaguy: yes
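As an illustration of the pattern johnthetubaguy describes, a small openstacksdk sketch follows; the cloud, network, image and flavor names are placeholders. The Neutron port is created first, so its IP is known before the (Ironic) instance boots onto it:

    # Illustrative sketch with openstacksdk; 'mycloud' refers to a clouds.yaml
    # entry and the network/image/flavor names are placeholders.
    import openstack

    conn = openstack.connect(cloud='mycloud')

    net = conn.network.find_network('datacentre')
    port = conn.network.create_port(network_id=net.id, name='k8s-node-0-port')
    print('node will get IP', port.fixed_ips[0]['ip_address'])

    server = conn.compute.create_server(
        name='k8s-node-0',
        image_id=conn.image.find_image('fedora-atomic-27').id,
        flavor_id=conn.compute.find_flavor('baremetal-flavor').id,
        networks=[{'port': port.id}],   # bind the instance to the pre-created port
    )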
11:32:07 <priteau> oneswig: I don't see atomic support in the DIB repo, but there is http://teknoarticles.blogspot.co.uk/2016/04/generate-fedora-atomic-images-using.html
11:32:32 <oneswig> Thanks priteau, I'll bookmark that
11:33:16 <oneswig> OK, did we have other questions on Magnum for now?
11:33:27 <strigazi> it is also here https://github.com/openstack/magnum/tree/master/magnum/drivers/common/image/fedora-atomic
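Once a custom image has been built from elements like those linked above, it still needs to reach Glance with the os_distro property Magnum matches drivers on; a hedged openstacksdk sketch, with cloud and file names as placeholders:

    # Hedged sketch: upload a locally built image with openstacksdk so cluster
    # templates can reference it. 'mycloud' and the filename are placeholders.
    import openstack

    conn = openstack.connect(cloud='mycloud')
    image = conn.create_image(
        name='fedora-atomic-custom',
        filename='fedora-atomic-custom.qcow2',
        disk_format='qcow2',
        container_format='bare',
        meta={'os_distro': 'fedora-atomic'},   # Magnum matches drivers on os_distro
    )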
11:33:36 <oneswig> It kind of ties in to the next topic which relates to the PTG activities
11:33:42 <strigazi> oneswig: I have a question towards users :)
11:34:05 <oneswig> strigazi: good link - I know that guy :-)
11:34:47 <strigazi> Is your main use case short or long living clusters?
11:35:25 <oneswig> For us on the SKA project, it's long-lived, by which we mean weeks
11:35:34 <oneswig> That seems to be the pattern so far.
11:35:56 <johnthetubaguy> not sure about the virtual lab case, depends if they run on the cluster, or they are clusters
11:36:02 <daveholland> at Sanger: possibly both/either (long-lived for replacing the current compute clusters; short-lived for burst-y adding capacity to those)
11:36:02 <oneswig> How short is short?
11:36:30 <strigazi> Let's say less than two months
11:36:56 <oneswig> One thing we'd love to see is dynamic ramp up/down in a demand driven cycle.  We have a use case for that
11:38:00 <oneswig> #link would be excellent to see a Magnum equivalent of this http://www.informaticslab.co.uk/dask/2017/07/21/adaptive-dask-clusters-on-kubernetes-and-aws.html
11:38:00 <johnthetubaguy> true, add/remove node is more likely than add/remove cluster
11:38:35 <strigazi> We can work on a kubernetes autoscaler for magnum
11:38:49 <oneswig> strigazi: do you know of anything underway?
11:39:16 <martial__> I agree, I think most of us have this need to autoscale
11:39:29 <oneswig> Hi martial__, morning
11:39:32 <oneswig> #chair martial__
11:39:33 <openstack> Current chairs: martial__ oneswig
11:39:36 <martial__> Hi Stig
11:39:38 <strigazi> Nothing at the moment, but since kube 1.8 or 1.9 it is possible to write autoscaler with user defined metrics
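Nothing like this exists in Magnum today, but as a toy illustration of the demand-driven ramp up/down requested above, one could poll a metric and patch the cluster's node_count through the Magnum API; get_pending_pods() below is hypothetical, and the jsonpatch format follows the cluster-update API:

    # Toy sketch only -- not an existing Magnum feature. get_pending_pods() is a
    # hypothetical metric source (e.g. the Kubernetes API or a custom-metrics
    # endpoint); 'magnum' is the client from the earlier example.
    def get_pending_pods():
        """Hypothetical: number of pods that cannot be scheduled."""
        raise NotImplementedError

    def scale_once(magnum, cluster_id, min_nodes=1, max_nodes=20):
        cluster = magnum.clusters.get(cluster_id)
        pending = get_pending_pods()
        target = cluster.node_count
        if pending > 0:
            target = min(max_nodes, cluster.node_count + 1)   # ramp up on demand
        elif cluster.node_count > min_nodes:
            target = cluster.node_count - 1                    # ramp down when idle
        if target != cluster.node_count:
            magnum.clusters.update(
                cluster_id,
                [{'op': 'replace', 'path': '/node_count', 'value': target}])

    # call scale_once() from a periodic job (cron, a systemd timer, etc.)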
11:40:09 <oneswig> I believe there are public cloud plugins for this but nothing for OpenStack as of ~6 months ago
11:40:40 <strigazi> Well in openstack the resources usually are not infinite
11:41:15 <strigazi> The auto-scaling feature is not the same in a private cloud with quotas
11:41:31 <johnthetubaguy> although credit cards have limits too
11:41:42 <johnthetubaguy> but granted, its not identical
11:42:14 <oneswig> strigazi: http://www.stackhpc.com/baremetal-cloud-capacity.html - we do want to use the capacity we have to the fullest extent
11:43:02 <oneswig> We should move on
11:43:10 <strigazi> When we have preemptibles :)
11:43:11 <oneswig> Final items for Magnum?
11:43:21 <strigazi> One last thing from me.
11:43:23 <oneswig> strigazi: excellent.  To be continued...
11:44:20 <strigazi> Do you support server rebuilds on ironic nodes? We can continue another time if you want
11:44:34 <oneswig> strigazi: sometimes it helps to rebuild a node
11:44:53 <oneswig> I've used it.  Not in a Magnum context.  But people like the IPs they know...
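For reference, rebuild redeploys a node in place while keeping its Neutron ports and IPs, which is the property valued here; a hedged python-novaclient sketch, reusing the Keystone session from the earlier example and with the image ID as a placeholder:

    # Hedged sketch with python-novaclient, reusing 'sess' from the earlier
    # example. Rebuild redeploys the node onto a new image while keeping its
    # Neutron ports and IPs.
    from novaclient import client as nova_client

    nova = nova_client.Client('2', session=sess)
    server = nova.servers.find(name='k8s-node-0')
    nova.servers.rebuild(server, image='NEW_IMAGE_UUID')   # placeholder image ID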
11:45:38 <oneswig> We can continue in ~5 weeks at the PTG perhaps?
11:45:40 <strigazi> oneswig: sounds good to me, we want to base the upgrade capability on it (rebuild)
11:45:46 <strigazi> sure
11:46:13 <oneswig> strigazi: I've seen Nova people not liking rebuilds - johnthetubaguy can you comment?
11:47:24 <oneswig> Perhaps we should take that offline too
11:47:32 <oneswig> Time is pressing
11:47:34 <oneswig> #topic SIG representation at PTG
11:47:56 <oneswig> Aha, so we have some time during the cross-project phase to advocate use cases.
11:48:06 <martial__> should we have a second session on this topic?
11:48:29 <oneswig> Currently this has some CERN/SKA discussions and the Ironic configurable deploy steps
11:48:43 <oneswig> martial__: perhaps request input and follow up?
11:48:58 <johnthetubaguy> oneswig: thinking still, should work in theory.
11:49:35 <oneswig> #link deployment steps spec https://review.openstack.org/#/c/412523/
11:50:21 <oneswig> Use cases I've seen have requested kexec and boot-to-ramdisk, for example
11:50:29 <oneswig> All a bit unusual in a cloud mindset
11:50:37 <oneswig> but very useful for SIG members
11:50:48 <oneswig> priteau: are you still following this spec?
11:50:55 <johnthetubaguy> getting the use cases and context accurate really helps get the right design
11:52:00 <ildikov> johnthetubaguy: +1
11:52:20 <ildikov> oneswig: I have one item regarding PTG too
11:52:32 <oneswig> ildikov: go for it
11:52:33 <priteau> oneswig: for Chameleon I think many users may want to use the new "Boot from Volume" functionality when we move to Pike or later, but I am still interested in a more configurable Ironic -- I haven't fully reviewed the spec yet though.
11:52:54 <ildikov> oneswig: I'm dedicated to making sure we will have multi-attach in Queens
11:53:10 <oneswig> You certainly are dedicated :-)
11:53:44 <ildikov> oneswig: well I hope johnthetubaguy has a half day to review the Nova patches, like today or tomorrow at the latest :)
11:54:00 <ildikov> oneswig: back to the topic, it's a first version and we're planning a cross-project session with Cinder and Nova on improvements for the PTG
11:54:18 <johnthetubaguy> its honestly looking like Friday :(
11:54:33 <oneswig> I expect this to be extremely useful if we can do the all-read-only, cached mode that alas seems to be beyond this version
11:54:40 <ildikov> oneswig: and would love to have use cases and some input on how people are planning to use it
11:55:05 <ildikov> johnthetubaguy: I will ask melwitt if she might have a little time for it
11:55:35 <johnthetubaguy> it's permissive, so you can do it with read/write volumes I think
11:55:38 <ildikov> johnthetubaguy: please do it on Friday then, as the gate is blowing up all the time; if anything needs to be fixed, next week will be a bloodbath to get it done :(
11:56:27 <ildikov> johnthetubaguy: we're turning cache off which I think is the issue with the case oneswig mentioned
11:57:09 <oneswig> correct - would need a model where clients can cache or the fan-in of load will be bad
11:57:51 <johnthetubaguy> I didn't think it was that good of a cache we turned off, but that is a needed optimization
11:58:02 <johnthetubaguy> I know v1 is libvirt only too, I guess we might need ironic support
11:58:15 <priteau> oneswig: Do you have more details about when the SIG session would happen during the PTG? The Blazar meetings are getting moved to Monday and Tuesday to remove conflict with Nova sessions.
11:58:18 <ildikov> oneswig: it's definitely an interesting case as those settings happen at attach time and I'm not aware of them being easy to change later
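To illustrate that the multi-attach decision is made when the volume (type) is created rather than at attach time, a hedged sketch of the volume-type route follows; the extra-spec key reflects the Queens design and should be treated as an assumption, and 'sess' and 'nova' are reused from the earlier sketches:

    # Hedged sketch with python-cinderclient and python-novaclient; the
    # multiattach extra-spec key follows the Queens design and should be
    # treated as an assumption. 'sess' and 'nova' come from earlier sketches.
    from cinderclient import client as cinder_client

    cinder = cinder_client.Client('3', session=sess)
    vtype = cinder.volume_types.create('multiattach')
    vtype.set_keys({'multiattach': '<is> True'})

    volume = cinder.volumes.create(size=10, name='shared-data',
                                   volume_type='multiattach')

    # Attach the same volume to two servers (read/write by default; the
    # read-only, client-cached mode discussed above is not covered here).
    nova.volumes.create_server_volume('SERVER_ONE_UUID', volume.id)
    nova.volumes.create_server_volume('SERVER_TWO_UUID', volume.id)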
11:58:18 <oneswig> We should continue this - and will - in following meetings.  Thanks ildikov for raising it
11:58:36 <ildikov> oneswig: thanks for the opportunity :)
11:58:40 <oneswig> priteau: half day on either Monday or Tuesday AFAIK.
11:58:44 <ildikov> johnthetubaguy: +1 for Ironic
11:59:01 <martial__> sounds like a plan, lots of things to continue in a follow up meeting
11:59:07 <johnthetubaguy> would be great to get phase 2 planned for multi-attach
11:59:49 <oneswig> johnthetubaguy: +1 on that
12:00:02 <ildikov> johnthetubaguy: I just wish to get phase one in finally first :)
12:00:03 <oneswig> It has huge potential, I think
12:00:11 <johnthetubaguy> ildikov: +100
12:00:11 <oneswig> We are out of time, alas
12:00:31 <oneswig> And johnthetubaguy has just used up every plus-sign in the country
12:00:32 <martial__> lots of good things to follow up on
12:00:42 <ildikov> :)
12:00:43 * johnthetubaguy takes a bow
12:00:44 <oneswig> Thanks everyone
12:00:47 <oneswig> #endmeeting