15:02:23 <zhipeng> #startmeeting openstack-cyborg
15:02:24 <openstack> Meeting started Wed May  3 15:02:23 2017 UTC and is due to finish in 60 minutes.  The chair is zhipeng. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:02:26 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:02:29 <openstack> The meeting name has been set to 'openstack_cyborg'
15:03:07 <zhipeng> #topic BP discussion
15:03:20 <zhipeng> let's get a quick review of the spec work
15:03:27 <zhipeng> any outstanding issues ?
15:04:14 <jkilpatr> waiting for Roman to get back to me on my last patchset, he commented he had one comment on patchset 4 left but I think I addressed all of them so   ¯\_(ツ)_/¯
15:04:48 <jkilpatr> ooh there's a new driver patch, let me look at it.
15:05:10 <zhipeng> haha i also noticed that one
15:05:21 <zhipeng> ping _gryf
15:07:51 <zhipeng> crushil anything from your side ?
15:08:33 <crushil> Nope. Just updated the patch like an hour ago. Nothing else. Will look at the summit presentation that you sent this morning
15:10:12 <jkilpatr> crushil, look at my slide on drivers, if our visions diverge we need to hash that out. Looking at your latest patch I'm not quite sure.
15:12:35 <crushil> jkilpatr, Will do
15:15:15 <crushil> jkilpatr, In any case, can you still review the latest patch?
15:15:20 <jkilpatr> already did
15:15:38 <jkilpatr> well I left one comment.
15:16:18 <crushil> I will try to come back sooner with the next patchset. I was busy with downstream things for the past month. Should have more time this month, hopefully
15:22:52 <jkilpatr> what's outstanding as far as reviews go?
15:23:14 <jkilpatr> I think everyone's happy with the agent, speak now or forever hold your -2
15:23:56 <jkilpatr> I'm happy with the api patch so +1
15:25:09 <jkilpatr> cyborg/nova needs bullet points, too wordy right now.
15:39:36 <zhipeng> agree jkilpatr on the agent
15:39:53 <zhipeng> and i think driver's spec after some fixing from crushil
15:39:57 <zhipeng> should be ok to merge as well
15:49:29 <zhipeng> the api spec will need further polishing I guess
15:49:40 <zhipeng> since ClintD also reviewed that ....
15:50:00 <zhipeng> okay let's move to the next topic
15:50:07 <zhipeng> #topic BoS slide prep
15:50:27 <zhipeng> ping jkilpatr goldenfri
15:50:43 <zhipeng> let's discuss the flow
15:51:43 <jkilpatr> zhipeng, so the flow I'm getting right now is  we introduce ourselves, talk a little about cyborg as a cool project for Telecom/NFV, then we get into talking about why everyone needs and can use accelerators, and how cyborg will make that usecase easy for everyone, original stakeholders included
15:52:36 <jkilpatr> are slides 7 and 8 redundant? and maybe we want to move 6 somewhere after 7 once we go from "hey nomad was a project for nfv hardware" to "cyborg is for everyone"
15:53:34 <zhipeng> 6 is like a starter for the conversation
15:53:51 <zhipeng> that acceleration is a requirement, no longer just an icing on the top
15:54:03 <zhipeng> and 7 and 8 provide the history
15:54:12 <zhipeng> but I guess I should merge these two into one
15:54:17 <zhipeng> 7 and 8
15:54:59 <jkilpatr> do we want to move goldenfri's stuff forward? not sure it makes sense at the back?
15:55:00 <goldenfri> so I've pinged Blair for some information on GPU requirements because I haven't heard anything recently
15:55:15 <goldenfri> so I added a slide on GPU
15:55:26 <goldenfri> but yea I don't think it should go there
15:55:47 <jkilpatr> ok so, intro, history, why nfv and gpgpu suck on openstack right now, why everyone needs accelerators, cyborg and its design
15:56:14 <goldenfri> makes sense
15:56:15 <jkilpatr> that would put goldenfri's stuff somewhere around the background slides
15:56:34 <jkilpatr> goldenfri, go ahead and drop it where you think it makes the most sense right now.
15:57:07 <zhipeng> maybe 18 should be after 10 ?
15:58:25 <zhipeng> yes, so what I imagined is that 10 - 12 provide the motivation within OpenStack
15:59:11 <zhipeng> the Background part covers high-level descriptions and identifies the need in a broader sense
15:59:25 <jkilpatr> then we drill down into the details, sounds great.
15:59:31 <zhipeng> yes
15:59:32 <goldenfri> ah ok, I've moved it in there before the "why cyborg"
15:59:46 <zhipeng> yes that looks good
16:00:22 <zhipeng> goldenfri could you add another slide of intro on the SWG ?
16:00:34 <goldenfri> yes, I'm going to work on it more today
16:00:41 <zhipeng> that way you could kickoff the deep dive discussion with SWG requirements
16:00:41 <goldenfri> I left the placeholder there
16:00:45 <zhipeng> :) nice
16:01:03 <zhipeng> and then Justin picks up and introduces the technical stuff
16:01:11 <jkilpatr> goldenfri, looking at your speaker notes. From what I understand you would spawn an instance then ping cyborg and say "attach a gpu to this instance" and cyborg takes care of the rest
16:01:21 <jkilpatr> right zhipeng? or are we using tags on instance creation?
16:01:35 <goldenfri> yea I wasn't sure about that
16:01:57 <zhipeng> I think so
16:01:58 <jkilpatr> the api design makes it look like the latter, but the nova interaction is fuzzy, if we attach the accelerator after the instance is spawned how do we make sure it's in the right spot? migration
16:02:22 <zhipeng> I think for the attach action
16:02:30 <zhipeng> user should not directly ping cyborg
16:02:35 <zhipeng> it should just use nova
16:02:40 <zhipeng> like how nova attaches a volume
16:02:59 <zhipeng> that is why Roman mentioned that there are modifications needed in nova
16:03:02 <jkilpatr> ok so some special flavor or tag? that's fine Cyborg helps with scheduling and setup in the first place.
16:03:11 <zhipeng> yep
16:03:57 <zhipeng> so it would be like nova attach GPU instance-id, and it will call cyborg api
16:04:12 <goldenfri> so it would still need a tag to work with cyborg?
16:04:29 <jkilpatr> goldenfri, yes but cyborg would help handle other bits, like making sure the gpu is working, that it's not overloaded etc
16:04:33 <zhipeng> not a tag, but just a resource class I think
16:04:41 <zhipeng> like a tag
16:05:05 <jkilpatr> not to mention more thoughtful scheduling, don't fill up instances with accelerators until other computes are full
16:05:15 <goldenfri> right, that would be huge
16:05:22 <zhipeng> right
16:05:26 <goldenfri> like a priority
16:05:52 <jkilpatr> goldenfri, right now if you have more than one gpu in a machine how do you make sure they all get used?
16:05:55 <jkilpatr> you just don't?
16:06:18 <goldenfri> basically, you have to micromanage it
16:06:25 <jkilpatr> :(
16:06:39 <goldenfri> well it won't let you spawn if there are no GPUs available
16:07:19 <jkilpatr> I think cyborg will really shine when gpu virtualization matures
16:07:47 <jkilpatr> then load monitoring becomes more important because it's a timesliced not a monolithic asset, but we're getting ahead of ourselves.
16:07:56 <zhipeng> jkilpatr +1
16:08:05 <goldenfri> I agree, there is also the issue of KVM tuning, cpu pinning etc
16:08:12 <goldenfri> I assume cyborg won't do any of that
16:08:23 <jkilpatr> goldenfri, why not, just make a driver for it
16:08:53 <goldenfri> That would be great
16:08:54 <jkilpatr> drivers just have attach/detach setup/uninstall commands, setup would just be a do nothing function, attach/detach would just pin to a cpu on the same NUMA as the gpu
16:08:56 <jkilpatr> that's what you want right?
16:09:13 <jkilpatr> then you would take the instance and tag it for gpu and gpu_pinning drivers and boom it's done
16:09:13 <goldenfri> yea because if you don't do any tuning performance is pretty bad
16:09:56 <jkilpatr> goldenfri, or you could just include pinning in your gpu 'driver'. This is why I think the drivers are the most important parts of Nova, it's just the playbooks/tools we already make to handle this, just standardized to integrate with a management framework (cyborg api)
16:10:16 <goldenfri> sounds good
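The driver shape jkilpatr describes (setup/uninstall plus attach/detach, with setup as a do-nothing function and attach pinning to a CPU on the same NUMA node as the GPU) could be sketched roughly as below. This is only an illustration under assumptions: the names AcceleratorDriver and GpuPinningDriver, the constructor arguments, and the in-memory bookkeeping are all hypothetical and not taken from the Cyborg specs, and the sketch records CPU choices without actually pinning anything.

```python
from abc import ABC, abstractmethod


class AcceleratorDriver(ABC):
    """Hypothetical driver interface: the four commands named in the meeting."""

    @abstractmethod
    def setup(self):
        """Prepare the host for this accelerator."""

    @abstractmethod
    def uninstall(self):
        """Undo whatever setup() did."""

    @abstractmethod
    def attach(self, instance_id):
        """Make the accelerator available to an instance."""

    @abstractmethod
    def detach(self, instance_id):
        """Release the accelerator from an instance."""


class GpuPinningDriver(AcceleratorDriver):
    """Hypothetical tuning driver: picks a CPU on the GPU's NUMA node.

    It only tracks which CPU each instance was given; a real driver would
    have to apply the pinning through the hypervisor.
    """

    def __init__(self, gpu_numa_node, cpus_by_numa_node):
        # cpus_by_numa_node is an assumed mapping: NUMA node id -> CPU ids.
        self.gpu_numa_node = gpu_numa_node
        self.free_cpus = list(cpus_by_numa_node[gpu_numa_node])
        self.pinned = {}

    def setup(self):
        # "setup would just be a do nothing function"
        pass

    def uninstall(self):
        pass

    def attach(self, instance_id):
        if instance_id in self.pinned:
            return self.pinned[instance_id]
        if not self.free_cpus:
            raise RuntimeError("no free CPU on the GPU's NUMA node")
        cpu = self.free_cpus.pop(0)
        self.pinned[instance_id] = cpu
        return cpu

    def detach(self, instance_id):
        # Return the CPU to the pool so a later attach can reuse it.
        self.free_cpus.append(self.pinned.pop(instance_id))
```

Folding the pinning into the GPU driver itself, as suggested at 16:09:56, would amount to doing the same bookkeeping inside that driver's attach/detach instead of shipping a separate driver.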
16:11:33 <zhipeng> shall we combine the bio slides into one ?
16:11:51 <goldenfri> I think so
16:12:36 <jkilpatr> I don't care too much, but I like my Saturn V picture
16:12:51 <goldenfri> yea that is a pretty sweet picture
16:12:54 <zhipeng> haha i like that too
16:13:07 <zhipeng> Jim you should pick a pic as well
16:13:18 <goldenfri> yea I will
16:13:23 <zhipeng> we could shrink the pics to put them at the bottom of the page
16:13:28 <zhipeng> and squeeze the text
16:14:01 <goldenfri> zhipeng where did you want the SWG intro slide?
16:14:49 <zhipeng> now it is page 10
16:14:54 <zhipeng> i put the holder there
16:15:12 <goldenfri> oh wait I see it
16:16:33 <goldenfri> I'll add something about using the gpu drivers for KVM tuning later today, I think that is pretty compelling.
16:17:43 <jkilpatr> goldenfri, the point we're trying to get across is that cyborg drivers can be for anything you want to do to accelerate instances, if it means finding a non numa compute for your program that's great, write one.
16:18:13 <goldenfri> yea I think that is very important
16:18:26 <jkilpatr> by providing a framework that's good enough to work with arbitrary accelerators it has to be good enough to do basic tunings, so we may as well make them drivers too so they can take advantage of the management framework
16:18:40 <jkilpatr> at a high level cyborg is a scripting standard and scripting management engine.
16:20:52 <zhipeng> and in that thought, Cinder should be this way as well :P
16:21:03 <zhipeng> all the device mgmt modules should be designed this way
16:21:13 <goldenfri> :)
16:24:47 <jkilpatr> zhipeng, see how I slipped in the driver on slide 17
16:25:11 <jkilpatr> the diagram has problems because if we draw enough lines to cover everything we have a blob not a slide.
16:25:28 <zhipeng> haha yes
16:55:25 <zhipeng> #endmeeting