Thursday, 2020-02-27

*** brinzhang has joined #openstack-cyborg00:03
*** brinzhang_ has joined #openstack-cyborg00:33
*** brinzhang has quit IRC00:36
*** brinzhang has joined #openstack-cyborg00:44
*** brinzhang_ has quit IRC00:47
*** brinzhang_ has joined #openstack-cyborg01:54
*** brinzhang has quit IRC01:57
*** brinzhang has joined #openstack-cyborg01:59
*** brinzhang has quit IRC02:01
*** brinzhang has joined #openstack-cyborg02:02
*** brinzhang_ has quit IRC02:02
*** brinzhang_ has joined #openstack-cyborg02:16
*** brinzhang has quit IRC02:19
*** chenke has joined #openstack-cyborg02:45
*** Sundar has joined #openstack-cyborg02:49
*** brinzhang has joined #openstack-cyborg02:54
*** s_shogo has joined #openstack-cyborg02:56
*** brinzhang_ has quit IRC02:57
*** Yumeng has joined #openstack-cyborg02:58
SundarHello all03:00
*** xinranwang has joined #openstack-cyborg03:00
Sundar#startmeeting openstack-cyborg03:00
chenkeHello Sundar.03:00
openstackMeeting started Thu Feb 27 03:00:17 2020 UTC and is due to finish in 60 minutes.  The chair is Sundar. Information about MeetBot at
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.03:00
*** openstack changes topic to " (Meeting topic: openstack-cyborg)"03:00
openstackThe meeting name has been set to 'openstack_cyborg'03:00
chenke#info chenke03:00
Sundar#info Sundar03:00
Yumeng#info Yumeng03:00
brinzhang#info brinzhang03:00
xinranwang#info xinranwang03:00
*** shaohe_feng has joined #openstack-cyborg03:00
SundarDo we have any major topics for today? I'd like to provide a status update on Nova integ after other discussion.03:01
xinranwangI have one topic about microversion03:02
SundarOk, xinranwang. Anything else to discuss, before we get into that?03:02
xinranwangNow I implemented a decorator api_version to check the microversion.03:02
YumengI only have the gpu_driver_improve patches that need your review; nothing else.03:03
xinranwangplease see
brinzhangI want to talk about the functional tests. I found there is a lot of work to do. Does anyone have suggestions, or know of a simple project we can reference?03:04
Sundarxinranwang: do you have a question or something to discuss about that patch?03:05
xinranwang Now I implemented a decorator api_version to check the microversion.03:06
brinzhangSundar, xinranwang, I think mainly is
xinranwangDo you think we should have a schema check in this patch? IMO, the schema check is based on microversion support, so we can do it later.03:08
xinranwangthanks brinzhang for pasting the link03:08
shaohe_fengit can be another patch.03:09
Sundar"This decorator MUST appear first (the outermost03:09
shaohe_fengschema check should not be only for microversion03:10
shaohe_fengit should be for all APIs.03:10
brinzhangI don't think we should rewrite the same API interface every time we need to add a new microversion, which will cause a lot of code redundancy.03:11
xinranwangbrinzhang: Yes, I understand your concern. For now, we have only one microversion (the v2.1 is my PoC code, which will be merged into v2.0), so there is only one API function.03:14
brinzhangxinranwang: Yeah, understand03:14
s_shogo#info s_shogo03:15
xinranwangWhat I suggest is to support the microversion first, and then we can add the schema check in another patch.03:15
shaohe_fengI'd like to raise one thing: one patch for one issue, except for nit fixes.03:17
shaohe_feng^ this is an example03:18
Li_Liu#info Li_Liu03:18
*** brinzhang_ has joined #openstack-cyborg03:18
Yumengagree with xinranwang. we can have a bp for the API schema check later or in the next release.03:19
shaohe_fengwe already knew there were some issues in the devstack config03:19
*** brinzhang_ has quit IRC03:19
*** brinzhang_ has joined #openstack-cyborg03:20
shaohe_fengabout half a year ago.03:20
Sundarxinranwang: I think we all agree with Brin's concern, and that we all agree there should be a schema check. You just want it in another patch, right?03:20
shaohe_fengbut why can't we continue on it?03:20
shaohe_fengwe always want one patch to fix many issues.03:21
shaohe_fengfix one by one03:21
shaohe_fengthis will be a fast way.03:21
shaohe_fengand you can see,
*** brinzhang_ has quit IRC03:21
xinranwangSundar:  yes, exactly. As Yumeng  said, we should have another bp and patch to implement it.03:21
*** brinzhang has quit IRC03:22
*** brinzhang has joined #openstack-cyborg03:22
shaohe_fengIt is also a devstack fix. We had configured it correctly before.03:22
SundarOk, I agree, xinranwang, Yumeng03:22
shaohe_fengMixing many issues together.03:22
shaohe_fengreally a bad idea.03:22
xinranwangIt seems brinzhang was dropped off, I will talk to him later. Thanks guys.03:23
Sundars_shogo: Not sure what you are trying to say. The link you gave is for an old patch set. Are you saying we should enforce one patch for one issue?03:23
*** brinzhang_ has joined #openstack-cyborg03:24
s_shogoSundar: IMHO, that mention was intended for shaohe_feng?03:24
shaohe_fengyes, one by one.03:25
SundarYes, sorry03:25
*** brinzhang has quit IRC03:25
*** brinzhang_ has quit IRC03:26
shaohe_fengmicro steps, and it may be faster.03:26
*** brinzhang has joined #openstack-cyborg03:26
Sundarshaohe_feng: Just change the title for your patch, and maybe make 'multinode' as the topic. We could have a patch series for multinode, including Sean's patches03:26
SundarYou already have the right topic03:27
shaohe_fengthe microversion patch can also fix one issue.03:27
shaohe_fengand more patches can fix the other issues.03:27
shaohe_fengThis is the right way.03:27
shaohe_fengor it will be the same as when we fixed the "multinode" issues in devstack: the last half year, and no progress.03:28
*** brinzhang_ has joined #openstack-cyborg03:30
*** brinzhang_ has quit IRC03:31
*** brinzhang_ has joined #openstack-cyborg03:32
brinzhang_xinranwang, Sundar, Agree to do the microversion schema check in the next release, though it would be better done in this release; later we (xinranwang) can talk about how to do it easily.03:34
chenkeAgree, a patch should not change too much. we can make a series of patches to speed up review.03:34
*** brinzhang has quit IRC03:34
Sundarchenke, shaoe_feng: Sure, organize in a series of patches. A reviewer may want to see a placeholder patch for the next step before agreeing to previous step?03:35
Sundarshaohe_feng: ^03:35
chenkeYe. Agree03:35
chenkewe should carefully consider reviewers' suggestions and give a reasonable response.03:36
SundarI think we are all on the same page. shaohe_feng may feel that his patch has not merged in a long time. Let's review it quickly and help it get merged.03:37
SundarOk, anything else?03:37
shaohe_fengAdd NOTE in patch or summarize the tasks lists in etherpad to follow, if we want to address more issues.03:38
SundarYumeng: you seem to be adding mdev support to Cyborg. That is interesting03:39
SundarThat will require changes to Nova patches too03:39
SundarI am only allowing 'PCI' in the current patches.03:40
Yumengyes, I found that controlpatch_id_type of nvidia GPU should be "MDEV"03:40
YumengSundar: I was thinking just now: attach_handle type should also be "MDEV". do you agree?03:41
shaohe_fengnow the intel and NVIDIA vGPU are MDEV devices.03:41
SundarYumeng: What do you mean that you found that? If you are doing PCI passthrough of the physical function, it would be 'PCI'. The mediated device is a different way of attaching a device, different from PCI passthrough.03:41
shaohe_fengOnly AMD's are SRIOV.03:41
Sundarshaohe_feng and all: We had already discussed that Cyborg intends to support only physical PF passthrough for GPUs for now.03:42
YumengThanks shaohe_feng for pointing this out.03:42
shaohe_fengIMHO, MDEV can coexist with nova's for a long time, like PCI devices03:42
SundarWhat prevents a Nvidia GPU's PF from being passed through to a VM, instead of mdev?03:42
shaohe_fengwe should not prevent it.03:43
SundarYumeng: Re. attach_handle type should also be "MDEV"  -- yes, in fact control path ID cannot be 'mdev', only the attach handle can be. Because mdev refers to how the device is attached to a VM.03:46
SundarThe cpid is PCI even for a device that supports mdev.03:46
SundarBecause that is the management interface, which needs to be a PCI interface03:46
YumengSundar: ops. emmmm. I was testing vGPU, and found that controlpatch_id_type-- "MDEV"  cannot be reported to the DB schema.03:46
Yumengso I was thinking that was a bug in the DB.03:47
SundarYumeng and all: do we intend to support vGPUs? In previous discussions, we said that Cyborg is for offload i.e. GPGPU type use cases, not media or graphics. So, the most common usage is to assign an entire GPU (maybe multiple GPUs) to the same VM.03:48
Yumengseems that was intended, right?03:48
brinzhang_Sundar, I would like Cyborg to support vGPU; that is the intent03:50
shaohe_fengBut I know some public clouds already support vGPU03:50
shaohe_fengwhy don't we support it?03:51
Sundarbrinzhang_: How do we distinguish Cyborg from Nova, if both can do all use cases for GPUs?03:51
SundarAnyway, I am fine with vGPU support if you all want it.03:52
shaohe_fengand the first proposal from Red Hat for a generic smart-NIC solution in the kernel also puts the device on the mdev bus.03:53
brinzhang_Sundar, I think Cyborg can provide the vGPU architecture to Nova, so Nova can choose the vGPU to use03:53
chenkeI think supporting vgpu is not a bad thing for cyborg. So, I agree.03:55
Sundarbrinzhang_: I don't know what that means. Nova already has vGPU support without Cyborg. Cyborg can report VFs as attach handles (with SR-IOV) or mdevs as attach handles (with Nvidia GPUs). Both would work AFAICS. It is just that Nova and Cyborg overlap in functionality.03:55
SundarAs I said, I am fine with vGPU support in Cyborg. However, I'd rather close the current Nova patch series as is. I have added only 'PCI' as supported type in
SundarOne of you could add 'mdev' to that.03:56
shaohe_fengWe know k8s,  CNI for network, CSI for storage, DPI for device/accelerators, let the right components do the right functions03:57
shaohe_fengIt said they want to add mdev in cyborg03:57
*** brinzhang has joined #openstack-cyborg03:57
shaohe_fengbut cyborg's progress is slow, so that they can wait it and add it to nova.03:58
YumengSundar: I think the upside is for the operators (users). They can manage FPGAs and vGPUs using just one component, cyborg: configuring everything in cyborg.conf instead of vGPU in nova.conf and FPGA in cyborg.conf.03:59
chenkeI think the current nova's patch is high priority.03:59
*** brinzhang has quit IRC03:59
*** brinzhang has joined #openstack-cyborg03:59
SundarYumeng and all: Sure. No objections to vGPUs or mediated devices in Cyborg. But it is going to take some effort probably to get that change into Nova04:00
shaohe_fengYes, the cyborg goal is similar to DPI in k8s, manage different accelerators.04:00
*** brinzhang_ has quit IRC04:00
shaohe_fengThe Nova change for cyborg mdev can come later.  Let's focus on PCI04:01
SundarYumeng: My main comment would be to move 'mdev' type to attach handles from CPIDs. Also, it needs some testing. Do you plan to support 3rd party CI for GPUs?04:01
xinranwangCan vGPU be attached like normal PCI device passthrough, if so, I think there is no conflict.04:01
Yumengxinranwang: it cannot.04:02
Yumengxinranwang: nova already supports this. we can reuse that code.04:03
xinranwangSo vGPU is attached in a different way, compared with a GPU PF?04:03
Yumengyes, different way.04:04
shaohe_fengbut cyborg can support it. The cloud provider can change it in nova by themselves downstream, if they really want to use cyborg.04:04
shaohe_fengyes, MDEV is uuid in mdev bus.04:04
shaohe_fengPCI is bdf in PCI bus.04:04
Yumengyes, exactly. MDEV is uuid in mdev bus.04:04
shaohe_fenglet cyborg support mdev first.04:05
shaohe_fengleave the change in nova to cloud provider.04:05
shaohe_fengthey have the ability to change it.04:05
shaohe_fengat present.04:06
shaohe_fengthey will contribute to nova, if they think it is necessary04:07
SundarYumeng: do you plan to support 3rd party CI for GPUs?04:08
Yumengok. thanks for the suggestion. shaohe_feng.  I think we can talk later about the vGPU.  Sundar: pls go back to nova-interaction topic.04:08
YumengSundar: not yet for now.04:08
SundarNova interaction: currently held up on the requirement to support rescheduling. I have a single-node devstack setup, so cannot test rescheduling.04:09
SundarThis does not mean multi-node for Cyborg -- the functional tests that I wrote mock the Cyborg APIs, so only need it for the Nova side04:10
SundarIf anybody has any ideas to handle this, please LMK04:11
SundarThat's all I have. Anything else to discuss?04:13
Yumengok. not from me.04:14
brinzhangFor the functional tests, I think I should do some research first and then discuss; nothing else from me.04:15
Sundarbrinzhang: yes, we need functional tests. You have already started that. What do we need to discuss?04:16
brinzhangSundar, I think there is a lot of work I need to consider; once I have enough, I will bring it up again in the IRC meeting04:17
brinzhangSome cleanup patch, can you +A?04:18
shaohe_fengnothing from me.  If you need some help, please ping me on Skype.04:18
*** shaohe_feng has quit IRC04:18
Sundarbrinzhang: Got it. Sure04:19
SundarGood. Thanks a lot, everybody. Have a good day!04:19
brinzhangThese are not necessary in the cyborg or cyborg-specs projects, so I want to remove them and clean things up04:19
*** openstack changes topic to "Pending patches (Meeting topic: openstack-cyborg)"04:19
openstackMeeting ended Thu Feb 27 04:19:34 2020 UTC.  Information about MeetBot at . (v 0.1.4)04:19
openstackMinutes (text):
s_shogoJust a report: The openstacksdk patch was merged! Thank you, all.
Sundars_shogo: Congrats! That's good news. It was all your hard work.04:21
openstackgerritBrin Zhang proposed openstack/cyborg-specs master: Switch to sphinx-build and cleanup
*** brinzhang_ has joined #openstack-cyborg04:22
brinzhang_Sundar: done, thanks04:23
*** brinzhang_ has quit IRC04:23
*** brinzhang_ has joined #openstack-cyborg04:24
*** s_shogo has quit IRC04:25
*** brinzhang_ has quit IRC04:26
*** brinzhang has quit IRC04:26
*** brinzhang has joined #openstack-cyborg04:27
*** brinzhang has quit IRC04:27
openstackgerritBrin Zhang proposed openstack/cyborg master: Delete sandbox directory
*** Sundar has quit IRC04:50
*** links has joined #openstack-cyborg05:53
openstackgerritMerged openstack/cyborg-specs master: Switch to sphinx-build and cleanup
*** Yumeng has quit IRC06:17
openstackgerritBrin Zhang proposed openstack/cyborg-specs master: Remove releasenotes directory
openstackgerritMerged openstack/cyborg-specs master: Remove releasenotes directory
*** xinranwang has quit IRC07:19
*** brinzhang has joined #openstack-cyborg08:00
*** brinzhang has quit IRC08:02
*** brinzhang has joined #openstack-cyborg08:02
openstackgerritMerged openstack/cyborg master: Delete sandbox directory
*** brinzhang_ has joined #openstack-cyborg09:25
*** brinzhang_ has quit IRC09:27
*** brinzhang_ has joined #openstack-cyborg09:27
*** brinzhang has quit IRC09:28
*** chenke has quit IRC10:53
*** brinzhang has joined #openstack-cyborg11:10
*** brinzhang_ has quit IRC11:14
*** brinzhang_ has joined #openstack-cyborg11:59
*** brinzhang_ has quit IRC12:01
*** brinzhang_ has joined #openstack-cyborg12:01
*** brinzhang has quit IRC12:02
*** brinzhang_ has quit IRC12:03
*** brinzhang_ has joined #openstack-cyborg12:03
*** brinzhang_ has quit IRC12:05
*** brinzhang_ has joined #openstack-cyborg12:05
*** brinzhang_ has quit IRC12:06
*** brinzhang_ has joined #openstack-cyborg12:07
*** brinzhang_ has quit IRC12:59
*** brinzhang_ has joined #openstack-cyborg12:59
*** brinzhang_ has quit IRC13:00
*** brinzhang_ has joined #openstack-cyborg13:01
*** brinzhang_ has quit IRC13:02
*** brinzhang_ has joined #openstack-cyborg13:03
*** brinzhang has joined #openstack-cyborg13:53
*** brinzhang has quit IRC13:54
*** brinzhang has joined #openstack-cyborg13:54
*** brinzhang has quit IRC13:56
*** brinzhang has joined #openstack-cyborg13:57
*** brinzhang has quit IRC13:58
*** brinzhang has joined #openstack-cyborg13:59
*** brinzhang has quit IRC14:00
openstackgerritDan Smith proposed openstack/cyborg master: Always stop cyborg-api process uwsgi container
*** links has quit IRC16:14
*** igordc has joined #openstack-cyborg17:39
*** igordc has quit IRC18:01
*** igordc has joined #openstack-cyborg18:04
-openstackstatus- NOTICE: Memory pressure on is causing connection timeouts resulting in POST_FAILURE and RETRY_LIMIT results for some jobs since around 06:00 UTC today; we will be restarting the scheduler shortly to relieve the problem, and will follow up with another notice once running changes are reenqueued.19:10
-openstackstatus- NOTICE: The scheduler for has been restarted; any changes which were in queues at the time of the restart have been reenqueued automatically, but any changes whose jobs failed with a RETRY_LIMIT, POST_FAILURE or NODE_FAILURE build result in the past 14 hours should be manually rechecked for fresh results19:44
