Wednesday, 2018-03-28

*** openstack has joined #openstack-cyborg00:14
*** ChanServ sets mode: +o openstack00:14
*** Sundar has joined #openstack-cyborg01:42
*** Sundar has quit IRC01:42
zhulihi sundar06:42
kosamara1Hi! Today's meeting is irc-only?07:02
zhipengh[m]Or do you have something better explained in video conf ?07:05
kosamara1No, just making sure.07:06
*** zhipeng has joined #openstack-cyborg07:06
kosamara1And there's a previous meeting at 11:00 UTC at #openstack-meetings?07:06
zhipengkosamara1 will you have more detailed information shared on today's meeting ?07:06
zhipengah that is the scientific wg meeting07:06
zhipengi will do a intro for cyborg07:07
kosamara1No, I don't have anything new. You mentioned you would talk with the nvidia team, how did that go?07:07
zhipengi sent the email but haven't got any reply yet07:09
zhipengwill ping them again today07:09
kosamara1thanks! Who exactly are they?07:10
zhipengthe lead is Rohit Taneja07:14
zhipengI communicate with him :)07:14
zhipengHe said their team is evaluating cyborg07:14
*** zhipeng has quit IRC08:16
*** zhipeng has joined #openstack-cyborg08:16
*** zhipeng has quit IRC09:22
*** Yumeng__ has joined #openstack-cyborg09:38
*** verdurin has joined #openstack-cyborg10:11
*** zhipeng has joined #openstack-cyborg10:55
*** kosamara1 has left #openstack-cyborg10:56
*** kosamara has joined #openstack-cyborg11:08
*** amima has joined #openstack-cyborg11:22
*** amima has quit IRC11:23
zhipenghi amima11:24
*** yinx has joined #openstack-cyborg11:28
*** yinx has quit IRC11:28
*** Yumeng__ has quit IRC11:48
*** zhipeng has quit IRC12:23
*** sum12 has joined #openstack-cyborg12:33
*** sum12_ has joined #openstack-cyborg12:34
*** sum12_ has quit IRC12:42
*** NokMikeR has joined #openstack-cyborg13:12
*** jianghuaw_ has joined #openstack-cyborg13:28
*** zhipeng has joined #openstack-cyborg13:29
*** zhipeng has quit IRC13:33
*** Yumeng__ has joined #openstack-cyborg13:36
*** kongwei has joined #openstack-cyborg13:43
*** circ-user-bTPti has joined #openstack-cyborg13:47
*** kongwei has quit IRC13:47
*** shaohe_feng has joined #openstack-cyborg13:56
*** Sundar has joined #openstack-cyborg13:59
shaohe_fengmorning sundar14:00
SundarHi Shaohe14:00
*** kongwei has joined #openstack-cyborg14:00
Yumeng__Hi Sundar Shaohe14:01
sum12Hello all14:01
shaohe_fengevening, Yumeng__14:01
jianghuaw_hello, everyone:-)14:02
shaohe_fengjianghuaw_: hello14:02
SundarHi Yumeng__ and all14:03
*** yinx has joined #openstack-cyborg14:03
*** circ-user-bTPti has quit IRC14:04
*** circ-user-WYZX2 has joined #openstack-cyborg14:04
*** dolpher has joined #openstack-cyborg14:04
*** zhipeng has joined #openstack-cyborg14:05
zhipeng#startmeeting openstack-cyborg14:05
openstackMeeting started Wed Mar 28 14:05:26 2018 UTC and is due to finish in 60 minutes.  The chair is zhipeng. Information about MeetBot at
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.14:05
*** openstack changes topic to " (Meeting topic: openstack-cyborg)"14:05
openstackThe meeting name has been set to 'openstack_cyborg'14:05
zhipeng#topic cyborg gpu discussion14:05
*** openstack changes topic to "cyborg gpu discussion (Meeting topic: openstack-cyborg)"14:05
zhipengtoday we invited jianghua, who is the author of the nova vGPU spec here14:06
zhipengto discuss how cyborg could better support vGPU/GPU in general14:06
shaohe_fengwelcome jianghuaw_14:06
zhipengthis is also a topic came out at the Scientific SIG meeting earlier today14:06
jianghuaw_shaohe_feng, thanks.14:07
zhipengjianghuaw_ we touched a bit on the subject in Dublin, I remember one of the tentative conclusion is that we want to see if it is possible for cyborg to handle more nuanced vGPU topology representation14:07
zhipengwhereas the implementation at the Nova side remains abstract14:07
zhipengjianghuaw_ do you still think this is the way to go ? what are your thoughts ?14:08
jianghuaw_yes, I think it's worthy to try to cover vGPU in Cyborg.14:09
jianghuaw_But tbh I don't have clear idea how Cyborg is planning to handle it.14:10
zhipengis it possible that we start with implementing the tree structured vGPU topology model in cyborg ?14:10
jianghuaw_When we implement it in Nova, we have one problem is how to expose the different vGPU types on single pGPUs.14:10
jianghuaw_ATM the workaround is to use a whitelist to restrict each pGPU to expose only one vGPU type.14:11
*** spring_forever has joined #openstack-cyborg14:12
jianghuaw_zhipeng, I think yes we can implement n-rp vGPU topology model.14:12
jianghuaw_nova has a plan to move the vGPU from compute node to nested resource provider.14:13
jianghuaw_I think for cyborg, we can directly start from n-RP.14:13
shaohe_fengwhat is n-RP14:13
zhipengokey , and that would help address the "expose the different vGPU types on single pGPUs" issue ?14:14
jianghuaw_n-RP = nested resource provider14:14
shaohe_fenggot it.14:14
SundarBTW, the problem of different vGPU types is analogous to multi-function accelerators in FPGAs, as Alex Xu pointed out. Just as you could have different counts of vGPU types (vGPU-type-A: N, vGPU-type-B: M), for a multi-function accelerator, we have to expose (region-type-A: 1, function-type-B: N).14:14
SundarThe difference of course is that, you could program a FPGA with a different bitstream which has only a single function14:15
jianghuaw_yes, but we still have not identified a proper way to handle it.14:15
jianghuaw_even with n-rp.14:15
SundarYes, not solved on FPGA side too :)14:15
NokMikeRisnt that a process on the gpu or fpga?14:15
Sundar@NokMikeR: can you elaborate?14:16
shaohe_fengjianghuaw_: a pGPU can support multi-type vGPU, but if we create one type, then the following vGPU request should be the same type?14:16
jianghuaw_shaohe_feng, yes.14:16
NokMikeRgpus have a host process list, if you map n virtual gpus to a physical gpu each one would be a process? at least according to how I understand the nvidia driver.14:17
shaohe_fengSundar:  multi-function means multi-BDF. one function means one BDF?14:18
jianghuaw_so vgpu-type-a: 2; vgpu-type-b: 4; after consuming a vGPU-type-a; the remaining inventory is: vgpu-type-a: 1; vgpu-type-b: 014:18
Sundar@shaohe: yes, we will tie each accelerator function to a separate BDF14:18
*** kongwei has quit IRC14:19
shaohe_fengjianghuaw_: these types are non-orthogonal, which means they affect each other14:19
Sundar@jianghuaw: If the use of vgpu-type-a falls to 0, will you restore the original inventory of vgpu-type-a: 2; vgpu-type-b: 4 ?14:19
SundarI mean, would we want to restore?14:20
jianghuaw_Ideally it should but it's not supported by nova ATM.14:20
shaohe_fengyes, they should restore.14:21
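Note: the inventory arithmetic discussed above (vgpu-type-a: 2, vgpu-type-b: 4 dropping to a: 1, b: 0 after one type-a allocation) can be sketched as follows. This is a minimal illustration only, assuming a hypothetical `PGpu` class that is not part of Nova or Cyborg; the restore-on-release step models the ideal behaviour mentioned in the discussion, which Nova was said not to support at the time.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PGpu:
    """Hypothetical model of one physical GPU (not a Nova/Cyborg class)."""

    # Maximum number of vGPUs the card can host for each vGPU type.
    capacity: Dict[str, int]
    # Type currently instantiated on the card, or None when the card is idle.
    active_type: Optional[str] = None
    used: int = 0

    def inventory(self) -> Dict[str, int]:
        """Return the remaining inventory per vGPU type."""
        if self.active_type is None:
            return dict(self.capacity)
        # Once one type is in use, every other type drops to zero.
        return {
            vgpu_type: (total - self.used if vgpu_type == self.active_type else 0)
            for vgpu_type, total in self.capacity.items()
        }

    def consume(self, vgpu_type: str) -> None:
        """Allocate one vGPU of the given type, pinning the card to that type."""
        if self.inventory().get(vgpu_type, 0) < 1:
            raise ValueError(f"no {vgpu_type} available on this pGPU")
        self.active_type = vgpu_type
        self.used += 1

    def release(self) -> None:
        """Free one vGPU; restore the full inventory when the card is idle."""
        if self.used == 0:
            raise ValueError("nothing to release")
        self.used -= 1
        if self.used == 0:
            self.active_type = None


pgpu = PGpu(capacity={"vgpu-type-a": 2, "vgpu-type-b": 4})
pgpu.consume("vgpu-type-a")
after_consume = pgpu.inventory()
pgpu.release()
after_release = pgpu.inventory()
```

The sketch keeps the two types in one object precisely because they are non-orthogonal: consuming one type changes what is left of the other.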
jianghuaw_What's implemented in nova is to use an option to specify which vgpu is enabled.14:21
jianghuaw_so each pGPU only exposes one vGPU type.14:21
jianghuaw_s/which vgpu/which vgpu type/14:21
alex_xu_yes, there is no way to consume a resource and update an inventory at same time. If we do that, there must be race window between consume res and update inv.14:22
jianghuaw_It seems acceptable by customers.14:22
jianghuaw_alex_xu_, indeed.14:22
SundarIs there a scenario where the inventory can change from (vgpu-type-a: 2; vgpu-type-b: 4) to, say, (vgpu-type-a: 1; vgpu-type-b: 1)?14:22
alex_xu_Sundar: you can do that with inventory update api14:23
alex_xu_in the case there isn't any allocation over the 114:23
shaohe_fengSundar: placement can not support this scenario. but on bare metal, we could support it manually.14:24
Sundar@alex_xu_: If we do that from Cyborg agent, and there happens to be a request in flight in the controller to use one of those types, we may have a race.14:24
*** Yumeng_ has joined #openstack-cyborg14:25
alex_xu_Sundar: there is 'generation' field that is used check whether the info you already has is old or not14:25
alex_xu_if the cyborg use the 'generation' field correctly, that won't be a problem14:26
*** Yumeng__ has left #openstack-cyborg14:26
shaohe_fengyes. every update need check generation to keep consistency14:27
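Note: the 'generation' check described above is a form of optimistic concurrency control. The following is a minimal in-memory sketch of the idea, not the Placement REST API; `ResourceProvider`, `GenerationConflict`, and `update_inventory` are hypothetical names chosen for illustration.

```python
class GenerationConflict(Exception):
    """Raised when the caller's view of the provider is stale."""


class ResourceProvider:
    """Hypothetical in-memory stand-in for a provider with a generation."""

    def __init__(self, inventory):
        self.inventory = dict(inventory)
        self.generation = 0

    def read(self):
        """Return the current generation together with a copy of the inventory."""
        return self.generation, dict(self.inventory)

    def update_inventory(self, seen_generation, new_inventory):
        """Apply the update only if the caller saw the latest generation."""
        if seen_generation != self.generation:
            raise GenerationConflict(
                f"expected generation {self.generation}, got {seen_generation}"
            )
        self.inventory = dict(new_inventory)
        self.generation += 1
        return self.generation


rp = ResourceProvider({"vgpu-type-a": 2, "vgpu-type-b": 4})

# The agent and the controller both read the provider at generation 0.
agent_gen, _ = rp.read()
controller_gen, _ = rp.read()

# The controller writes first, which bumps the generation.
rp.update_inventory(controller_gen, {"vgpu-type-a": 1, "vgpu-type-b": 0})

# The agent's write now carries a stale generation and is rejected.
try:
    rp.update_inventory(agent_gen, {"vgpu-type-a": 1, "vgpu-type-b": 1})
    agent_conflict = False
except GenerationConflict:
    agent_conflict = True

# The agent recovers by re-reading and retrying with the fresh generation.
agent_gen, current = rp.read()
final_generation = rp.update_inventory(agent_gen, current)
```

The race Sundar raises does not disappear under this scheme; it is turned into a detectable conflict that the losing writer handles by re-reading and retrying.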
*** kongwei has joined #openstack-cyborg14:27
*** zhipeng has quit IRC14:27
*** Yumeng__ has joined #openstack-cyborg14:27
shaohe_fengone type on a host is a simple solution, but can work.14:28
*** zhipeng has joined #openstack-cyborg14:28
SundarWe can go into the details offline. Basically, if there is a way to safely update inventory when resources are not in use, we can probably leverage a common approach for vGPU and multi-function accelerators.14:29
zhipengsounds good :)14:29
zhipengjianghuaw_ would it be possible for you to dig out the n-rp spec and submit it to cyborg ?14:29
SundarOne type on a host is probably too restrictive? I believe Nova IRC discussions indicated one type per GPU device?14:30
jianghuaw_zhipeng, I'm not sure if I have time recently. I will start a long vacation since next week.14:30
zhipengjianghuaw_ I meant just reuse the old version of the vGPU spec14:30
jianghuaw_But if anyone have interest, I'd be happy to give some help.14:31
shaohe_fengyes, one type per device.14:31
kosamaraThis, right?
jianghuaw_^ this is the nova spec for vgpu.14:32
shaohe_fengSundar: a region is one BDF,  after multi-function(for 4 functions) will become 4 BDF?14:32
jianghuaw_there will be some minor update on that. But most of the things have been covered in that spec.14:32
*** Vipparthy has joined #openstack-cyborg14:33
zhipengkosamara would you be interested in maybe take a look at or revamp the vgpu spec ?14:33
zhipengwe could hammer out a general gpu spec14:33
zhipengthat is another option14:34
jianghuaw_one question: does Cyborg have mechanisms to support different hypervisors?14:35
*** Li_Liu has joined #openstack-cyborg14:35
jianghuaw_something like virt driver in nova?14:35
zhipengshaohe_feng Sundar, I don't think so, right ?14:36
SundarNot yet but we probably should have :)14:36
jianghuaw_I think different hypervisors have different approach to expose resources e.g. vGPU.14:36
zhipengyes that is another point raised in Dublin14:37
shaohe_fengzhipeng: yes, at present, I only consider kvm. but we can support it later.14:37
jianghuaw_ok. got it.14:37
NokMikeRwhen a vGPU instance is created is a new driver associated to that by the hypervisor? do we need full display capabilities at all? hence back to my original point about physical vs virtual processes on the gpu. except I dont know how that works in practice.14:37
* alex_xu_ lurks in the background again14:37
*** kongwei has quit IRC14:39
zhipengNokMikeR i think that is the case14:39
zhipengjianghuaw_ what is the next step you would suggest ?14:40
*** Vipparthy has quit IRC14:40
zhipengokey uncomfortable silence, let's move on then, we could discuss more offline :)14:42
kosamaraSorry zhipeng, I was away. I can take a look and try to find points for discussion around it.14:42
zhipengkosamara thx !14:42
zhipengYumeng_ Yumeng__ are you around ?14:43
zhipeng#topic Clock Driver introduction14:43
*** openstack changes topic to "Clock Driver introduction (Meeting topic: openstack-cyborg)"14:43
zhipengthe floor is yours :)14:43
shaohe_fengYumeng_: you have a majia. :)14:43
Yumeng_ZTE wants to discuss a use case where we want to use cyborg to manage a nonstandard hardware device.14:44
Yumeng_The device is a high precision time synchronization card. (Let's use HPTS card for abbreviation.) It's a hardware in a separate unit from the CPU and support  multiple VMs synchronization.14:44
Yumeng_We hope Cyborg can do the followings as the management software:14:44
Yumeng_Discover & report devices: When Cyborg agent starts or does resource checking periodically, the Cyborg HPTS driver should enumerate the list of HPTS devices and report the details of all available HPTS cards on the host such as resolution, card type(Master/slave),Vendor ID, PF/VF numbers, supported protocols (PTP/NTP) etc.14:44
Yumeng_The motivation of designing this HPTS card is to address the high requirements of time synchronization in 5G Ultra Reliable & Low Latency Communication (uRLLC)  scenarios.  Furthermore, Time synchronization is essential to telecommunications network elements. The Information related to billing, maintenance, security, and authentication involved in various subnetworks requires a common standard "time". And the required resolution for time synchr14:44
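Note: the "discover & report" step described above could look roughly like the sketch below. Everything here is hypothetical: `HptsCard`, `FakeHptsDriver`, and `discover` are illustrative names, not an existing Cyborg driver interface, the nanosecond unit for resolution is an assumption, and the field values are made-up placeholders rather than data about any real card.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class HptsCard:
    """Attributes the discussion says a driver should report per card."""

    vendor_id: str        # PCI vendor ID (placeholder value below)
    card_type: str        # "master" or "slave"
    resolution_ns: int    # timing resolution; the unit is an assumption
    pf_count: int         # number of physical functions
    vf_count: int         # number of virtual functions
    protocols: List[str]  # e.g. ["PTP", "NTP"]


class FakeHptsDriver:
    """Hypothetical driver that enumerates cards from a fixed list."""

    def __init__(self, cards: List[HptsCard]):
        self._cards = list(cards)

    def discover(self) -> List[Dict]:
        """Return one plain dict per card, ready to be reported upstream."""
        return [asdict(card) for card in self._cards]


driver = FakeHptsDriver(
    [
        HptsCard(
            vendor_id="0xffff",  # placeholder, not a real vendor ID
            card_type="master",
            resolution_ns=100,   # placeholder value
            pf_count=1,
            vf_count=4,          # placeholder value
            protocols=["PTP", "NTP"],
        )
    ]
)
report = driver.discover()
```

A real driver would enumerate the hardware instead of taking a list; the fixed list only keeps the sketch self-contained.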
zhipengYumeng_ many chars unrecognized14:46
NokMikeRundestandable here.14:46
SundarWould multiple VMs be connected to the same card? If so, via PCI function passthrough? What is the need for multiple HPTS cards in same host?14:46
Yumeng_yes it's possible for multiple VMs to be connected to the same card14:48
zhipengNokMikeR do you think this is a valid use case ?14:48
SundarWhat is the need for multiple HPTS cards in same host?14:49
Yumeng_one card in the same host14:49
Yumeng_for multiple VMs to sync time14:49
NokMikeRIt is but Im having a problem understanding where or why cyborg is used here, the coherency of a system would come from the host, or are we talking multiple hosts multiple HPTS cards and you need to ensure they are all in sync?14:50
Yumeng_one card for one host is enough14:50
dolpherHow many VFs does one HPTS card support?14:50
*** cuixw has joined #openstack-cyborg14:50
Sundar@NokMikeR: Cyborg handles lifecycle management of all devices: discovery, publishing in Nova placement, attaching devices to VMs, termination, etc.14:51
*** kongwei has joined #openstack-cyborg14:51
zhipengYumeng_ do you want to develop a ZTE HPTS card specific driver ? or a general HPTS driver ?14:52
Sundar@Yumeng_: You said, "all available HPTS cards on the host". But the use case is just one card per host. Got it.14:52
zhipengthat maybe could talk to different vendor HPTS cards ?14:52
Yumeng_dolpher: that's not a fixed number. it depends on the resource capability14:53
Sundar@Yumeng__: since you mentioned PF/VF, SR-IOV is implied. Are the VFs attached to VMs with PCI passthrough? (Just to be clear)14:54
Yumeng_zhipeng: for this HPTS card, it would be a specific driver. but it's possible we can develop a general one later14:54
NokMikeR4 VMs read the HPTS card on the physical host's bus, wouldn't each vm get a slightly different time based on how they are scheduled or is the HPTS only returning a time based on the average non-instantaneous value?14:56
Yumeng_Sundar: yes, PCI passthrough could be a way14:57
jianghuaw_If it's sr-iov, I guess it should be pci passthrough.14:58
jianghuaw_Yumeng_, do you have some link describing this HPTS?14:59
Yumeng_NokMikeR: emm.. Good question. tbh, the specific schema are still under discussion.15:01
zhipengokey I think it would be better for yumeng to send an email to the mailing list for a detailed description of the use case15:02
zhipengwith all the necessary information15:02
Yumeng_jianghuaw_: I would give out one link later.15:02
NokMikeRI'd be more concerned if the underlying virtualisation can even support the resolution the e.g. GPSDO can offer. But it's a very good use case and definitely needs supporting.15:02
jianghuaw_Yumeng_, awesome. thanks.15:03
zhipengmove to ml discussion, agree ?15:03
Yumeng_yep pls go ahead15:04
zhipeng#action Yumeng_ to provide detailed description on mailing list15:04
zhipeng#topic current development discussion15:04
*** openstack changes topic to "current development discussion (Meeting topic: openstack-cyborg)"15:04
zhipengshaohe_feng has helped put together an etherpad for scenario description when a user request fpga resources15:05
SundarI responded there. There is also a description of use cases in the Cyborg/Nova scheduling spec15:07
zhipengshaohe_feng could you help explain a little bit ?15:07
zhipengthx Sundar15:07
zhipengI'm just wondering what is the current conclusion15:08
SundarI think Shaohe wants more details than in the spec15:09
zhipengokey then :)15:09
SundarI should add more, but I also don't want the spec to become longwinded :)15:09
SundarSuggestions appreciated!15:09
shaohe_fengSundar: can support some case that nova/placement can not support well?15:09
*** cuixw has left #openstack-cyborg15:10
SundarThe Device as a Service can be supported well with placement. Accelerated Function as a Service needs Cyborg input15:10
SundarThat is because the flow from PTG discloses only device types as resources in placement, but not functions15:10
*** kongwei has quit IRC15:11
SundarCorrelating functions to device types can be done in Cyborg, if we exclude the multi-function accelerator case15:11
shaohe_fengfunctions can be in traits15:11
edleafeSundar: from the PTG I understood that reprogramming a device is not expensive. Now I'm getting the feeling that it can be.15:12
zhipengedleafe any evidence for the change of feeling ?15:12
SundarIn some cases, reprogramming through Openstack is not an option. I mentioned some of those as 'preprogrammed scenarios' in the spec15:12
shaohe_fengedleafe: is not expensive, maybe we do not need weigher for it.15:12
shaohe_fengedleafe: s/is/if15:12
SundarFor example, an operator may want to assure tenants that all FPGA updates are offline through an audited process15:13
edleafezhipeng: it just feels like a lot of effort is going into guaranteeing that the selected device has the desired function available.15:13
*** spring_forever has quit IRC15:14
Sundaredleafe: Could I ask if you have read the pre-programmed use case in the spec?15:14
edleafeSundar: no, I've been focused on Placement specs lately. DO you have a link?15:15
*** yinx has quit IRC15:15
Sundar I will be happy to elaborate and answer any questions15:16
edleafeSundar: thanks. I will review that later.15:17
SundarTo give you a short answer, we may want the preprogrammed scenarios in some situations: For example, an operator may want to assure tenants that all FPGA updates are offline through an audited process, so that the bitstreams are trusted15:17
edleafeand is that something that couldn't happen through Cyborg?15:18
SundarWe can handle preprogrammed scenario in Cyborg. That means the accelerator functions are exposed, and cannot be reprogrammed during regular orchestration15:18
*** Vipparthy2 has joined #openstack-cyborg15:18
shaohe_fengSundar: what do you means offline?15:18
Vipparthy2Looks Like this Chat is crashing,15:19
Vipparthy2are we still on ?15:19
zhipengVipparthy2 yes :)15:20
shaohe_fengVipparthy2: yes.15:20
Sundar@shaohe Not through regular orchestration. It could be done through other tools for host provisioning, may be ssh to host and run a tool15:20
shaohe_fengSundar: does that means the operator should call cyborg API manually to change the FPGA info stored in cyborg DB?15:21
SundarWe may not support all use cases in Rocky, but we want to ensure that there is path to get to each use case, so that operators are not forced to do a disruptive upgrade after Rocky15:21
zhipengi think we could move the discussion to the spec :)15:23
zhipengmore comment should be reflected there :)15:23
shaohe_fengSundar: for every scenarios, can you give concrete provider resource class name and traits name?15:23
zhipengspeaking of which, currently we have three patches in the open15:24
NokMikeRare low level operations such as fuse control and encryption covered or is that out of scope?15:24
zhipengshaohe_feng could you help on jinghan's devstack patch ?15:24
shaohe_fengzhipeng: yes.15:25
zhipenggr8t thx :)15:25
zhipengqueens is still broken, so plz help fix the devstack issue15:25
kosamaraIs there currently a workaround to run on devstack?15:26
Sundar@NokMikeR: I think you are asking me whether FPGA fuse control etc. is covered in the spec? No, they are not. They are vendor-specific. The spec focuses on Cyborg/Nova scheduling. Other specs are expected for other aspects.15:26
zhipengkosamara I think the fix on pbr version might be one, but I got other error reports15:26
zhipengso i guess the safest is waiting for the fix patch to land15:26
kosamaraI think my error message was related to versions, so probably it's that. OK, I can also experiment locally with that then.15:27
zhipengkosamara thx :)15:27
zhipengokey that's for what I have today15:28
zhipengany other topics ?15:28
dolpherIs there a bug opened for the devstack issue?15:28
zhipengdolpher i'm not sure15:29
kosamaraNot for pbr as far as I can see15:29
dolpherwe need one, otherwise it's easy to get lost :)15:30
zhipengokey good suggestion :)15:31
zhipengif there is no other topics, let's close the meeting today15:31
zhipengthanks for everyone's participation :)15:31
*** openstack changes topic to "OpenStack Cyborg Project Discussion"15:31
openstackMeeting ended Wed Mar 28 15:31:57 2018 UTC.  Information about MeetBot at . (v 0.1.4)15:31
openstackMinutes (text):
*** dolpher has quit IRC15:32
*** zhipeng has quit IRC15:32
*** circ-user-WYZX2 has quit IRC15:33
*** NokMikeR has quit IRC15:33
*** Vipparthy2 has quit IRC15:42
*** Yumeng_ has quit IRC15:55
*** Yumeng__ has quit IRC16:35
*** Sundar has quit IRC17:16
-openstackstatus- NOTICE: the zuul web dashboard will experience a short downtime as we roll out some changes - no job execution should be affected21:51
-openstackstatus- NOTICE: Zuul has been restarted to update to the latest code; existing changes have been re-enqueued, you may need to recheck changes uploaded in the past 10 minutes23:57
