15:02:23 #startmeeting openstack-cyborg
15:02:24 Meeting started Wed May 3 15:02:23 2017 UTC and is due to finish in 60 minutes. The chair is zhipeng. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:02:26 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:02:29 The meeting name has been set to 'openstack_cyborg'
15:03:07 #topic BP discussion
15:03:20 let's get a quick review of the spec work
15:03:27 any outstanding issues?
15:04:14 waiting for Roman to get back to me on my last patchset, he commented he had one comment on patchset 4 left but I think I addressed all of them so ¯\_(ツ)_/¯
15:04:48 ooh there's a new driver patch, let me look at it.
15:05:10 haha I also noticed that one
15:05:21 ping _gryf
15:07:51 crushil anything from your side?
15:08:33 Nope. Just updated the patch like an hour ago. Nothing else. Will look at the summit presentation that you sent this morning
15:10:12 crushil, look at my slide on drivers, if our visions diverge we need to hash that out. Looking at your latest patch I'm not quite sure.
15:12:35 jkilpatr, Will do
15:15:15 jkilpatr, In any case, can you still review the latest patch?
15:15:20 already did
15:15:38 well I left one comment.
15:16:18 I will try to come back sooner with the next patchset. I was busy with downstream things for the past month. Should have more time this month, hopefully
15:22:52 what's outstanding as far as reviews go?
15:23:14 I think everyone's happy with the agent, speak now or forever hold your -2
15:23:56 I'm happy with the api patch so +1
15:25:09 cyborg/nova needs bullet points, too wordy right now.
15:39:36 agree with jkilpatr on the agent
15:39:53 and I think the driver spec, after some fixing from crushil,
15:39:57 should be ok to merge as well
15:49:29 the api spec will need further polishing I guess
15:49:40 since ClintD also reviewed that ....
15:50:00 okay let's move to the next topic
15:50:07 #topic BoS slide prep
15:50:27 ping jkilpatr goldenfri
15:50:43 let's discuss the flow
15:51:43 zhipeng, so the flow I'm getting right now is we introduce ourselves, talk a little about cyborg as a cool project for Telecom/NFV, then we get into talking about why everyone needs and can use accelerators, and how cyborg will make that usecase easy for everyone, original stakeholders included
15:52:36 are slides 7 and 8 redundant? and maybe we want to move 6 somewhere after 7 once we go from "hey nomad was a project for nfv hardware" to "cyborg is for everyone"
15:53:34 6 is like a starter for the conversation
15:53:51 that acceleration is a requirement, no longer just icing on the top
15:54:03 and 7 and 8 provide the history
15:54:12 but I guess I should merge these two into one
15:54:17 7 and 8
15:54:59 do we want to move goldenfri's stuff forward? not sure it makes sense at the back?
15:55:00 so I've pinged Blair for some information on GPU requirements because I haven't heard anything recently
15:55:15 so I added a slide on GPU
15:55:26 but yea I don't think it should go there
15:55:47 ok so, intro, history, why nfv and gpgpu sucks on openstack right now, why everyone needs accelerators, cyborg and its design
15:56:14 makes sense
15:56:15 that would put goldenfri's stuff somewhere around the background slides
15:56:34 goldenfri, go ahead and drop it where you think it makes the most sense right now.
15:57:07 maybe 18 should be after 10?
15:58:25 yes, so what I imagined is that 10 - 12 provide the motivation within OpenStack
15:59:11 the Background part covers high level descriptions, identifies the need in a broader sense
15:59:25 then we drill down into the details, sounds great.
15:59:31 yes
15:59:32 ah ok, I've moved it in there before the "why cyborg"
15:59:46 yes that looks good
16:00:22 goldenfri could you add another slide of intro on the SWG?
16:00:34 yes, I'm going to work on it more today
16:00:41 that way you could kick off the deep dive discussion with SWG requirements
16:00:41 I left the placeholder there
16:00:45 :) nice
16:01:03 and then Justin picks up and introduces the technical stuff
16:01:11 goldenfri, looking at your speaker notes. From what I understand you would spawn an instance then ping cyborg and say "attach a gpu to this instance" and cyborg takes care of the rest
16:01:21 right zhipeng? or are we using tags on instance creation?
16:01:35 yea I wasn't sure about that
16:01:57 I think so
16:01:58 the api design makes it look like the latter, but the nova interaction is fuzzy, if we attach the accelerator after the instance is spawned how do we make sure it's in the right spot? migration
16:02:22 I think for the attach action
16:02:30 the user should not directly ping cyborg
16:02:35 it should just use nova
16:02:40 like how nova attaches a volume
16:02:59 that is why Roman mentioned that there are modifications needed in nova
16:03:02 ok so some special flavor or tag? that's fine. Cyborg helps with scheduling and setup in the first place.
16:03:11 yep
16:03:57 so it would be like nova attach GPU instance-id, and it will call the cyborg api
16:04:12 so it would still need a tag to work with cyborg?
16:04:29 goldenfri, yes but cyborg would help handle other bits, like making sure the gpu is working, that it's not overloaded etc
16:04:33 not a tag, but just a resource class I think
16:04:41 like a tag
16:05:05 not to mention more thoughtful scheduling, don't fill up instances with accelerators until other computes are full
16:05:15 right, that would be huge
16:05:22 right
16:05:26 like a priority
16:05:52 goldenfri, right now if you have more than one gpu in a machine how do you make sure they all get used?
16:05:55 you just don't?
16:06:18 basically, you have to micromanage it
16:06:25 :(
16:06:39 well it won't let you spawn if there are no GPUs available
16:07:19 I think cyborg will really shine when gpu virtualization matures
16:07:47 then load monitoring becomes more important because it's a timesliced not a monolithic asset, but we're getting ahead of ourselves.
16:07:56 jkilpatr +1
16:08:05 I agree, there is also the issue of KVM tuning, cpu pinning etc
16:08:12 I assume cyborg won't do any of that
16:08:23 goldenfri, why not, just make a driver for it
16:08:53 That would be great
16:08:54 drivers just have attach/detach setup/uninstall commands, setup would just be a do nothing function, attach/detach would just pin to a cpu on the same NUMA as the gpu
16:08:56 that's what you want right?
16:09:13 then you would take the instance and tag it for gpu and gpu_pinning drivers and boom it's done
16:09:13 yea because if you don't do any tuning performance is pretty bad
16:09:56 goldenfri, or you could just include pinning in your gpu 'driver'. This is why I think the drivers are the most important parts of Nova, it's just the playbooks/tools we already make to handle this, just standardized to integrate with a management framework (cyborg api)
16:10:16 sounds good
16:11:33 shall we combine the bio slides into one?
16:11:51 I think so
16:12:36 I don't care too much, but I like my Saturn V picture
16:12:51 yea that is a pretty sweet picture
16:12:54 haha I like that too
16:13:07 Jim you should pick a pic as well
16:13:18 yea I will
16:13:23 we could shrink the pics to put them at the bottom of the page
16:13:28 and squeeze the text
16:14:01 zhipeng where did you want the SWG intro slide?
16:14:49 now it is page 10
16:14:54 I put the holder there
16:15:12 oh wait I see it
16:16:33 I'll add something about using the gpu drivers for KVM tuning later today, I think that is pretty compelling.
16:17:43 goldenfri, the point we're trying to get across is that cyborg drivers can be for anything you want to do to accelerate instances, if it means finding a non-NUMA compute for your program that's great, write one.
16:18:13 yea I think that is very important
16:18:26 by providing a framework that's good enough to work with arbitrary accelerators it has to be good enough to do basic tunings, so we may as well make them drivers too so they can take advantage of the management framework
16:18:40 at a high level cyborg is a scripting standard and scripting management engine.
16:20:52 and in that thought, Cinder should be this way as well :P
16:21:03 all the device mgmt modules should be designed this way
16:21:13 :)
16:24:47 zhipeng, see how I slipped in the driver on slide 17
16:25:11 the diagram has problems because if we draw enough lines to cover everything we have a blob not a slide.
16:25:28 haha yes
16:55:25 #endmeeting
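
The driver shape described at 16:08:54 (setup/uninstall plus attach/detach, with setup as a do-nothing function and attach pinning to a CPU on the same NUMA node as the GPU) could be sketched roughly as below. This is an illustration only, not the Cyborg driver API: the class names `AcceleratorDriver` and `CpuPinningDriver`, the `numa_cpus` mapping, and the `device` dictionary layout are all hypothetical and chosen just to make the idea concrete.

```python
from abc import ABC, abstractmethod


class AcceleratorDriver(ABC):
    """Hypothetical interface with the four commands named in the meeting."""

    @abstractmethod
    def setup(self, host):
        """Prepare a host for this accelerator."""

    @abstractmethod
    def uninstall(self, host):
        """Undo whatever setup() did."""

    @abstractmethod
    def attach(self, instance, device):
        """Make the device usable by the instance."""

    @abstractmethod
    def detach(self, instance, device):
        """Release the device from the instance."""


class CpuPinningDriver(AcceleratorDriver):
    """Assumed tuning 'driver': pins an instance to a CPU near its GPU."""

    def __init__(self, numa_cpus):
        # numa_cpus maps a NUMA node id to the CPU ids on that node
        # (assumed layout, for illustration only).
        self.numa_cpus = numa_cpus
        self.pinned = {}  # instance name -> pinned CPU id

    def setup(self, host):
        # Nothing to install for pinning, so this is a do-nothing function.
        pass

    def uninstall(self, host):
        pass

    def attach(self, instance, device):
        # Pick the first free CPU on the same NUMA node as the device.
        in_use = set(self.pinned.values())
        for cpu in self.numa_cpus[device["numa_node"]]:
            if cpu not in in_use:
                self.pinned[instance] = cpu
                return cpu
        raise RuntimeError("no free CPU on NUMA node %d" % device["numa_node"])

    def detach(self, instance, device):
        # Free the CPU so a later attach can reuse it.
        self.pinned.pop(instance, None)
```

In this sketch a GPU driver and a pinning driver would share the same four calls, which is the point made at 16:18:26: basic tunings can ride on the same management framework as real accelerators.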