14:05:26 #startmeeting openstack-cyborg
14:05:27 Meeting started Wed Mar 28 14:05:26 2018 UTC and is due to finish in 60 minutes. The chair is zhipeng. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:05:28 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:05:30 The meeting name has been set to 'openstack_cyborg'
14:05:45 #topic cyborg gpu discussion
14:06:00 today we invited jianghua, who is the author of the nova vGPU spec here
14:06:18 to discuss how cyborg could better support vGPU/GPU in general
14:06:20 welcome jianghuaw_
14:06:35 this is also a topic that came up at the Scientific SIG meeting earlier today
14:07:02 shaohe_feng, thanks.
14:07:31 jianghuaw_ we touched a bit on the subject in Dublin, I remember one of the tentative conclusions is that we want to see if it is possible for cyborg to handle a more nuanced vGPU topology representation
14:07:54 whereas the implementation at the Nova side remains abstract
14:07:58 yeap
14:08:11 jianghuaw_ do you still think this is the way to go? what are your thoughts?
14:09:31 yes, I think it's worthwhile to try to cover vGPU in Cyborg.
14:10:00 But tbh I don't have a clear idea how Cyborg is planning to handle it.
14:10:48 is it possible that we start with implementing the tree-structured vGPU topology model in cyborg?
14:10:50 When we implemented it in Nova, one problem we had was how to expose different vGPU types on a single pGPU.
14:11:44 ATM the workaround is to use a whitelist to restrict each pGPU to expose only one vGPU type.
14:12:48 zhipeng, I think yes we can implement the n-RP vGPU topology model.
14:13:11 nova has a plan to move vGPU from the compute node to nested resource providers.
14:13:39 I think for cyborg, we can directly start from n-RP.
14:13:55 what is n-RP
14:14:01 okey, and that would help address the "expose the different vGPU types on single pGPUs" issue?
14:14:11 n-RP = nested resource provider
14:14:23 got it.
14:14:29 BTW, the problem of different vGPU types is analogous to multi-function accelerators in FPGAs, as Alex Xu pointed out. Just as you could have different counts of vGPU types (vGPU-type-A: N, vGPU-type-B: M), for a multi-function accelerator, we have to expose (region-type-A: 1, function-type-B: N).
14:15:02 The difference of course is that you could program an FPGA with a different bitstream which has only a single function
14:15:24 yes, but we still have not identified a proper way to handle it.
14:15:31 even with n-rp.
14:15:47 Yes, not solved on the FPGA side either :)
14:15:51 isn't that a process on the gpu or fpga?
14:16:44 @NokMikeR: can you elaborate?
14:16:44 jianghuaw_: a pGPU can support multiple vGPU types, but if we create one type, then the following vGPU requests should be the same type?
14:16:59 shaohe_feng, yes.
14:17:28 gpus have a host process list, if you map n virtual gpus to a physical gpu each one would be a process? at least according to how I understand the nvidia driver.
14:18:02 Sundar: multi-function means multi-BDF. one function means one BDF?
14:18:13 so vgpu-type-a: 2; vgpu-type-b: 4; after consuming a vGPU-type-a, the remaining inventory is: vgpu-type-a: 1; vgpu-type-b: 0
14:18:39 @shaohe: yes, we will tie each accelerator function to a separate BDF
14:19:54 jianghuaw_: these types are non-orthogonal, which means they mutually affect each other
14:19:59 @jianghuaw: If the use of vgpu-type-a falls to 0, will you restore the original inventory of vgpu-type-a: 2; vgpu-type-b: 4?
14:20:38 I mean, would we want to restore?
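The inventory semantics discussed above can be made concrete with a small sketch. This is illustrative pseudocode only, not Nova or Cyborg code; the type names and capacities are the hypothetical ones from the discussion, and the "restore on release" behaviour is the one being debated, not what nova implements today.

```python
# Illustrative sketch of the "non-orthogonal inventory" problem: a pGPU
# advertises several vGPU types, but allocating one type zeroes the others.

class PhysicalGPU:
    def __init__(self, capacity_per_type):
        # e.g. {'vgpu-type-a': 2, 'vgpu-type-b': 4} -- hypothetical names
        self.capacity = capacity_per_type
        self.active_type = None   # type locked in by the first allocation
        self.used = 0

    def available(self):
        """Remaining inventory, per the semantics described in the meeting."""
        if self.active_type is None:
            return dict(self.capacity)       # all types still offered
        return {t: (self.capacity[t] - self.used if t == self.active_type else 0)
                for t in self.capacity}

    def allocate(self, vgpu_type):
        if self.active_type not in (None, vgpu_type):
            raise ValueError('pGPU already committed to %s' % self.active_type)
        if self.available()[vgpu_type] < 1:
            raise ValueError('no %s left' % vgpu_type)
        self.active_type = vgpu_type
        self.used += 1

    def release(self):
        self.used -= 1
        if self.used == 0:
            self.active_type = None          # restore the full inventory

pgpu = PhysicalGPU({'vgpu-type-a': 2, 'vgpu-type-b': 4})
pgpu.allocate('vgpu-type-a')
print(pgpu.available())   # {'vgpu-type-a': 1, 'vgpu-type-b': 0}
```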
14:20:50 Ideally it should, but it's not supported by nova ATM.
14:21:07 yes, they should restore.
14:21:20 What's implemented in nova is to use an option to specify which vgpu type is enabled.
14:21:35 so each pGPU only exposes one vGPU type.
14:22:11 yes, there is no way to consume a resource and update an inventory at the same time. If we did that, there would be a race window between consuming the resource and updating the inventory.
14:22:16 It seems acceptable to customers.
14:22:42 alex_xu_, indeed.
14:22:57 Is there a scenario where the inventory can change from (vgpu-type-a: 2; vgpu-type-b: 4) to, say, (vgpu-type-a: 1; vgpu-type-b: 1)?
14:23:28 Sundar: you can do that with the inventory update api
14:23:51 in the case there isn't any allocation over the 1
14:24:25 Sundar: placement cannot support this scenario. but on bare metal, we could support it manually.
14:24:49 @alex_xu_: If we do that from the Cyborg agent, and there happens to be a request in flight in the controller to use one of those types, we may have a race.
14:25:54 Sundar: there is a 'generation' field that is used to check whether the info you already have is old or not
14:26:07 if cyborg uses the 'generation' field correctly, that won't be a problem
14:27:20 yes. every update needs to check the generation to keep consistency
14:28:35 one type on a host is a simple solution, but it can work.
14:29:01 We can go into the details offline. Basically, if there is a way to safely update inventory when resources are not in use, we can probably leverage a common approach for vGPU and multi-function accelerators.
14:29:37 sounds good :)
14:29:44 jianghuaw_ would it be possible for you to dig out the n-rp spec and submit it to cyborg?
14:30:16 One type on a host is probably too restrictive? I believe Nova IRC discussions indicated one type per GPU device?
14:30:35 zhipeng, I'm not sure if I have time recently. I will start a long vacation next week.
14:30:56 jianghuaw_ I meant just reuse the old version of the vGPU spec
14:31:07 But if anyone has interest, I'd be happy to give some help.
14:31:20 yes, one type per device.
14:31:41 This, right? https://specs.openstack.org/openstack/nova-specs/specs/ocata/approved/nested-resource-providers.html
14:32:09 https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/add-support-for-vgpu.html
14:32:21 ^ this is the nova spec for vgpu.
14:32:37 Sundar: a region is one BDF; after multi-function (for 4 functions) it will become 4 BDFs?
14:32:59 there will be some minor updates on that. But most of the things have been covered in that spec.
14:33:47 kosamara would you be interested in maybe taking a look at or revamping the vgpu spec?
14:33:58 we could hammer out a general gpu spec
14:34:10 that is another option
14:35:29 one question: does Cyborg have mechanisms to support different hypervisors?
14:35:54 something like the virt driver in nova?
14:36:17 shaohe_feng Sundar, I don't think so, right?
14:36:46 Not yet but we probably should have :)
14:36:47 I think different hypervisors have different approaches to exposing resources, e.g. vGPU.
14:37:01 yes that is another point raised in Dublin
14:37:01 zhipeng: yes, at present, I only consider kvm. but we can support it later.
14:37:02 ok. got it.
14:37:07 when a vGPU instance is created, is a new driver associated to it by the hypervisor? do we need full display capabilities at all? hence back to my original point about physical vs virtual processes on the gpu. except I don't know how that works in practice.
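For reference, the 'generation' field mentioned above is placement's optimistic-concurrency mechanism: every inventory write must echo the generation it read, and placement rejects the write with 409 Conflict if the provider changed in between. Below is a minimal sketch of that read-check-retry pattern; the endpoint and token are placeholders, and the loop is an illustration, not actual Cyborg agent code.

```python
# Sketch of a generation-checked inventory update against the placement API.
import requests

PLACEMENT = 'http://placement.example.com'       # placeholder endpoint
HEADERS = {'x-auth-token': 'ADMIN_TOKEN',        # placeholder token
           'openstack-api-version': 'placement 1.19'}

def update_inventory(rp_uuid, new_inventories):
    url = '%s/resource_providers/%s/inventories' % (PLACEMENT, rp_uuid)
    while True:
        # 1. Read the current inventory; the response carries the
        #    resource provider's current generation.
        current = requests.get(url, headers=HEADERS).json()
        gen = current['resource_provider_generation']
        # 2. Write back, echoing that generation. Placement returns
        #    409 Conflict if anyone updated or consumed from this
        #    provider in the meantime.
        resp = requests.put(url, headers=HEADERS, json={
            'resource_provider_generation': gen,
            'inventories': new_inventories,
        })
        if resp.status_code != 409:
            resp.raise_for_status()
            return resp.json()
        # Conflict: re-read and retry with the fresh generation.
```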
14:37:58 * alex_xu_ lurks in the background again
14:39:43 NokMikeR i think that is the case
14:40:26 jianghuaw_ what is the next step you would suggest?
14:42:38 okey uncomfortable silence, let's move on then, we could discuss more offline :)
14:42:41 Sorry zhipeng, I was away. I can take a look and try to find points for discussion around it.
14:42:51 kosamara thx !
14:43:10 Yumeng_ Yumeng__ are you around?
14:43:15 yes
14:43:31 #topic Clock Driver introduction
14:43:39 the floor is yours :)
14:43:42 ok
14:43:57 Yumeng_: you have a majia (an alternate nick). :)
14:44:02 ZTE wants to discuss a use case where we want to use cyborg to manage a nonstandard hardware device.
14:44:08 The device is a high-precision time synchronization card. (Let's use "HPTS card" as an abbreviation.) It's hardware in a separate unit from the CPU, and it supports time synchronization for multiple VMs.
14:44:18 We hope Cyborg can do the following as the management software:
14:44:18 Discover & report devices: when the Cyborg agent starts or does resource checking periodically, the Cyborg HPTS driver should enumerate the list of HPTS devices and report the details of all available HPTS cards on the host, such as resolution, card type (master/slave), vendor ID, PF/VF numbers, supported protocols (PTP/NTP), etc.
14:44:38 The motivation for designing this HPTS card is to address the high requirements of time synchronization in 5G Ultra Reliable & Low Latency Communication (uRLLC) scenarios. Furthermore, time synchronization is essential to telecommunications network elements. The information related to billing, maintenance, security, and authentication involved in various subnetworks requires a common standard "time". And the required resolution for time synchr
14:46:09 Yumeng_ many chars unrecognized
14:46:26 understandable here.
14:46:56 Would multiple VMs be connected to the same card? If so, via PCI function passthrough? What is the need for multiple HPTS cards in the same host?
14:48:03 yes it's possible for multiple VMs to be connected to the same card
14:48:22 NokMikeR do you think this is a valid use case?
14:49:06 What is the need for multiple HPTS cards in the same host?
14:49:30 one card in the same host
14:49:52 for multiple VMs to sync time
14:50:04 It is, but I'm having a problem understanding where or why cyborg is used here. The coherency of a system would come from the host, or are we talking multiple hosts, multiple HPTS cards, and you need to ensure they are all in sync?
14:50:07 one card for one host is enough
14:50:26 How many VFs does one HPTS card support?
14:51:33 @NokMikeR: Cyborg handles lifecycle management of all devices: discovery, publishing in Nova placement, attaching devices to VMs, termination, etc.
14:52:34 Yumeng_ do you want to develop a ZTE HPTS card specific driver? or a general HPTS driver?
14:52:38 @Yumeng_: You said, "all available HPTS cards on the host". But the use case is just one card per host. Got it.
14:52:49 that maybe could talk to different vendors' HPTS cards?
14:53:32 dolpher: that's not a fixed number. it depends on the resource capability
14:54:40 @Yumeng__: since you mentioned PF/VF, SR-IOV is implied. Are the VFs attached to VMs with PCI passthrough? (Just to be clear)
14:54:44 zhipeng: for this HPTS card, it would be a specific driver. but it's possible we can develop a general one later
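As a rough illustration of the "discover & report" step Yumeng_ outlined, a driver sketch could look like the following. The class name, method names, and reported fields are hypothetical assumptions; they do not reflect the actual Cyborg driver interface or real HPTS card data.

```python
# Hypothetical sketch of what an HPTS discovery driver could report
# to the Cyborg agent on startup or periodic resource checking.

class HPTSDriver(object):
    """Enumerate HPTS cards and report their details (sketch only)."""

    def discover(self):
        devices = []
        for pci_addr in self._scan_pci_bus():   # hypothetical helper
            devices.append({
                'pci_address': pci_addr,
                'vendor_id': '0000',             # placeholder vendor ID
                'type': 'HPTS',
                'card_role': 'master',           # master/slave
                'resolution_ns': 100,            # placeholder resolution
                'pf_count': 1,
                'vf_count': 8,                   # "depends on capability"
                'protocols': ['PTP', 'NTP'],
            })
        return devices

    def _scan_pci_bus(self):
        # Real code would walk /sys/bus/pci/devices and match the
        # HPTS card's vendor/device IDs; elided in this sketch.
        return []
```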
14:55:28 okey
14:56:32 4 VMs read the HPTS card on the physical host's bus; wouldn't each vm get a slightly different time based on how they are scheduled, or is the HPTS only returning a time based on the average non-instantaneous value?
14:57:28 Sundar: yes, PCI passthrough could be a way
14:58:47 If it's sr-iov, I guess it should be pci passthrough.
14:59:14 Yumeng_, do you have some link describing this HPTS?
15:01:22 jianghuaw_: yep
15:01:32 NokMikeR: emm.. Good question. tbh, the specific schema is still under discussion.
15:02:30 okey I think it would be better for yumeng to send an email to the mailing list with a detailed description of the use case
15:02:37 with all the necessary information
15:02:39 jianghuaw_: I will give out a link later.
15:02:50 I'd be more concerned whether the underlying virtualisation can even support the resolution that e.g. a GPSDO can offer. But it's a very good use case and definitely needs supporting.
15:03:09 Yumeng_, awesome. thanks.
15:03:20 move to ml discussion, agree?
15:04:16 yep pls go ahead
15:04:39 #action Yumeng_ to provide detailed description on mailing list
15:04:56 #topic current development discussion
15:05:38 shaohe_feng has helped put together an etherpad with scenario descriptions for when a user requests fpga resources
15:05:47 #link https://etherpad.openstack.org/p/cyborg-fpga-request-scenarios
15:07:12 I responded there. There is also a description of use cases in the Cyborg/Nova scheduling spec
15:07:18 shaohe_feng could you help explain a little bit?
15:07:52 thx Sundar
15:08:10 I'm just wondering what the current conclusion is
15:09:09 I think Shaohe wants more details than are in the spec
15:09:24 okey then :)
15:09:29 I should add more, but I also don't want the spec to become longwinded :)
15:09:36 Suggestions appreciated!
15:09:38 Sundar: can it support some cases that nova/placement cannot support well?
15:10:30 Device as a Service can be supported well with placement. Accelerated Function as a Service needs Cyborg input
15:10:55 That is because the flow from the PTG discloses only device types as resources in placement, but not functions
15:11:31 Correlating functions to device types can be done in Cyborg, if we exclude the multi-function accelerator case
15:11:49 functions can be in traits
15:12:00 Sundar: from the PTG I understood that reprogramming a device is not expensive. Now I'm getting the feeling that it can be.
15:12:31 edleafe any evidence for the change of feeling?
15:12:40 In some cases, reprogramming through OpenStack is not an option. I mentioned some of those as 'preprogrammed scenarios' in the spec
15:12:43 edleafe: if it is not expensive, maybe we do not need a weigher for it.
15:13:24 For example, an operator may want to assure tenants that all FPGA updates happen offline through an audited process
15:13:50 zhipeng: it just feels like a lot of effort is going into guaranteeing that the selected device has the desired function available.
15:14:55 edleafe: Could I ask if you have read the pre-programmed use case in the spec?
15:15:41 Sundar: no, I've been focused on Placement specs lately. Do you have a link?
15:16:19 https://review.openstack.org/#/c/554717/. I will be happy to elaborate and answer any questions
15:17:00 Sundar: thanks. I will review that later.
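To illustrate the "functions can be in traits" idea above: the device type could be modeled as a custom resource class and the currently programmed function as a custom trait on the resource provider, which a scheduling query can then require. This is a sketch against the placement API; the resource class and trait names are made up, and the endpoint and token are placeholders.

```python
# Sketch: querying placement for hosts with one FPGA whose current
# bitstream exposes a (hypothetical) gzip function modeled as a trait.
import requests

PLACEMENT = 'http://placement.example.com'       # placeholder endpoint
HEADERS = {'x-auth-token': 'ADMIN_TOKEN',        # placeholder token
           'openstack-api-version': 'placement 1.17'}  # 'required=' needs >= 1.17

resp = requests.get(
    '%s/allocation_candidates' % PLACEMENT,
    headers=HEADERS,
    params={'resources': 'CUSTOM_FPGA:1',            # made-up resource class
            'required': 'CUSTOM_FPGA_FUNCTION_GZIP'})  # made-up trait
candidates = resp.json()['allocation_requests']
```

The multi-function accelerator case is harder precisely because one device (region) would need to carry traits for several functions whose capacities are coupled, which is the same non-orthogonal inventory problem as the vGPU types discussed earlier.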
15:17:16 To give you a short answer, we may want the preprogrammed scenarios in some situations: for example, an operator may want to assure tenants that all FPGA updates happen offline through an audited process, so that the bitstreams are trusted
15:18:00 and is that something that couldn't happen through Cyborg?
15:18:52 We can handle the preprogrammed scenario in Cyborg. That means the accelerator functions are exposed, and cannot be reprogrammed during regular orchestration
15:18:58 Sundar: what do you mean by offline?
15:19:28 Looks like this chat is crashing,
15:19:53 are we still on?
15:20:03 Vipparthy2 yes :)
15:20:06 Vipparthy2: yes.
15:20:20 @shaohe Not through regular orchestration. It could be done through other tools for host provisioning, maybe ssh to the host and run a tool
15:21:44 Sundar: does that mean the operator should call the cyborg API manually to change the FPGA info stored in the cyborg DB?
15:21:51 We may not support all use cases in Rocky, but we want to ensure that there is a path to get to each use case, so that operators are not forced to do a disruptive upgrade after Rocky
15:23:14 i think we could move the discussion to the spec :)
15:23:26 more comments should be reflected there :)
15:23:41 #link https://review.openstack.org/#/q/status:open+project:openstack/cyborg
15:23:45 Sundar: for every scenario, can you give concrete provider resource class names and trait names?
15:24:07 speaking of which, currently we have three patches in the open
15:24:13 are low-level operations such as fuse control and encryption covered or is that out of scope?
15:24:28 shaohe_feng could you help on jinghan's devstack patch?
15:24:39 #link https://review.openstack.org/551875
15:25:07 zhipeng: yes.
15:25:33 great, thx :)
15:25:49 queens is still broken, so plz help fix the devstack issue
15:26:01 OK
15:26:03 Is there currently a workaround to run on devstack?
15:26:13 @NokMikeR: I think you are asking me whether FPGA fuse control etc. is covered in the spec? No, they are not. They are vendor-specific. The spec focuses on Cyborg/Nova scheduling. Other specs are expected for other aspects.
15:26:16 (queens)
15:26:41 kosamara I think the fix for the pbr version might be one, but I got other error reports
15:26:46 thanks
15:26:59 so i guess the safest is waiting for the fix patch to land
15:27:40 I think my error message was related to versions, so probably it's that. OK, I can also experiment locally with that then.
15:27:51 kosamara thx :)
15:28:14 okey that's all I have for today
15:28:21 any other topics?
15:28:46 Is there a bug opened for the devstack issue?
15:29:11 dolpher i'm not sure
15:29:13 Not for pbr as far as I can see
15:30:21 we need one, otherwise it's easy to get lost :)
15:31:11 +1
15:31:12 okey good suggestion :)
15:31:40 if there are no other topics, let's close the meeting today
15:31:52 thanks for everyone's participation :)
15:31:57 #endmeeting