14:00:42 #startmeeting openstack-cyborg
14:00:44 Meeting started Wed Jun 6 14:00:42 2018 UTC and is due to finish in 60 minutes. The chair is zhipeng. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:45 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:47 The meeting name has been set to 'openstack_cyborg'
14:00:55 #topic Roll Call
14:00:59 #info Howard
14:03:11 #info Sundar
14:05:54 #info sum12
14:08:23 #info Mike
14:11:31 okey let's start
14:11:36 we have only one topic today :)
14:11:45 #topic rocky spec finalization
14:12:59 let's start from quota spec
14:13:07 which xinran__ has been working on
14:13:07 For https://review.openstack.org/#/c/554717/, we probably need at least one +1 from Nova
14:13:40 Sundar got it :)
14:13:43 we get to that
14:13:53 #info xinran__
14:14:07 Sure, zhipeng. NP
14:14:22 #link https://review.openstack.org/560285
14:14:55 Hi for quota spec, do you think implement this on api layer?
14:15:06 For quotas, we have had a discussion with Nova folks: http://lists.openstack.org/pipermail/openstack-dev/2018-May/130563.html
14:15:22 It doesn't look like there is any foolproof way to enforce quotas today
14:15:35 So, you guys can make a call
14:15:41 yes this is what I understand as well
14:15:47 I mean when there is a new api request, we should check/update quota
14:16:43 It also depends on how cyborg interacts with nova.
14:17:10 xinran__ what is your current proposal ?
14:17:13 If nova call agent directly, we should change it
14:17:24 On api layer
14:17:37 i'm inclined to that option
14:17:53 i don't see a good scenario for nova to call cyborg-agent directly
14:18:23 Yes
14:18:39 why nova need to call agent?
14:19:07 the current nova/cyborg interaction is calling agent I am not sure about this
14:19:25 What do you guys think?
14:19:31 no atm we have conductor interact with placement
14:19:33 that is all
14:19:44 there will be/should be the api layer interaction
14:19:48 I mean the current spec :)
14:19:49 but we are not there yet
14:19:58 oh, if nova call cyborg-agent, there is a problem.
14:20:22 I would suggest to have it on the api layer
14:20:28 Xinran, can you clarify which spec says Nova compute does or should call into Cyborg agent?
14:21:00 Let me find it
14:21:26 okey for the spec itself in the current shape
14:21:30 everyone happy with it ?
14:22:53 zhipeng, the present scheme is not ideal, but I have no objections, since I know it is a priority for others. :)
14:23:08 We can improve upon it over time
14:23:17 Sundar agree :)
14:23:22 IMO, cyborg can do lazy quota.
14:23:32 so then let's mark the spec ok to go
14:23:41 Agreed
14:23:42 https://review.openstack.org/#/c/566798/5/doc/specs/rocky/compute-node.rst
14:23:51 #action quota spec https://review.openstack.org/#/c/560285/ ready
14:24:22 What do you mean lazy quota shaohe_feng
14:25:04 which means, even scheduler pass, cyborg will still refuse accelerator request for quota failed.
14:25:06 Xinran, compute-node (os-acc) spec does not say that Nova compute will call directly into Cyborg agent. It says that Nova compute should call into os-acc, which should call into Cyborg agent.
14:25:37 os-acc can call cyborg agent?
14:26:13 guys this is another problem
14:26:18 let's move on :)
14:26:25 we have limited time for a lot of specs
14:26:36 Does that mean nova will call cyborg agent without api?
14:26:55 moving on
14:27:10 Li Liu's two patches on metadata and programming
14:27:18 xinran__, lazy quota, just performance issue.
14:27:38 shaohe_feng: ok got it
14:27:51 #link https://review.openstack.org/558265
14:28:03 any further comment on the metadata spec ?
14:28:24 Sundar: the “directly” I mean is not pass api layer
14:28:46 I thought Li Liu added a function name. Looking for it
14:29:05 also Melissa
14:29:21 @Guest24200
14:29:31 has Xilinx team also gone over the spec ?
14:32:21 Sundar has Li Liu addressed your comment ?
14:32:28 what bootstraps the driver instance in the first place? e.g. if a driver is requested that may require an additional tool to already be present like a driver daemon from the fpga vendor to be already in place.
14:32:42 During last spec day, Li Liu and I agreed to add a function name to the bitstream spec as an optional property. #link http://eavesdrop.openstack.org/irclogs/%23openstack-cyborg/%23openstack-cyborg.2018-05-09.log.html#t2018-05-09T19:00:22
14:33:37 This is also documented in the scheldung spec #link https://review.openstack.org/#/c/554717/
14:33:43 *scheduling
14:34:06 Can we get that addressed?
14:34:26 sure
14:34:53 #action metadata spec https://review.openstack.org/558265 to add a function name
14:35:16 other than that there should be no problems right ?
14:36:20 Zhipeng, yes. Thanks
14:36:36 sounds great :)
14:37:06 #link https://review.openstack.org/#/c/559395/
14:37:10 programming
14:37:36 i think other than Zuul, we should be happy about this now
14:37:40 :P
14:38:33 I don't see any issue with the basics. We don't have an end-to-end flow that uses this. I guess this is meant to be a standalone API?
14:39:43 yep
14:40:32 OK. We can tweak this later as needed, when we need to define end-to-end flows. With that understanding, we can approve it as it stands. Sounds good?
14:41:06 shaohe_feng and others ?
14:41:21 at least from the review I see Li Liu had addressed all the comments
14:42:12 zhipeng, OK, it looks good for me
14:43:33 cool
14:43:46 #action programming spec https://review.openstack.org/#/c/559395/ ready to go
14:44:40 okey let's go to Sundar's four spec
14:44:53 Sundar which one do you think is ready to go ?
14:45:03 All of them :)
14:45:17 haha great
14:45:20 I will still request Nova folks to do a +1 on scheduling spec.
14:45:37 yes that one has gone through a lot
14:45:48 For os-acc also, we should probably get Nova ok, right?
14:46:42 os-acc will call cyborg-agent?
14:46:55 and nova will call os-acc?
14:46:55 https://review.openstack.org/#/c/566798/
14:46:59 this one right ?
14:47:23 zhipeng, yes
14:47:25 shaohe_feng I think nova-compute calls os-acc to do the attach/detach
14:47:58 but I think the original goal of os-acc is to serve as a library
14:48:06 shaohe: Yes, as zhipeng says.
14:48:31 Sundar does nova-compute also call os-brick or os-vif ?
14:48:32 zhipeng, yes. It is still a library that Cyborg provides, like os-vif for Neutron
14:49:20 zhipeng, yes, nova compute calls into os-vif -- plug(), unplug() API
14:49:25 I'll try to get a link
14:49:37 zhipeng, so that means nova will call agent by attach/detach directly?
14:50:05 is there no race for cyborg API and os-acc?
14:50:29 i think for the nova scenario, which means accelerator attachment for the VM
14:50:52 the attach should be issued by nova-compute
14:50:58 however for the baremetal usecase
14:51:07 it should go through cyborg-api
14:51:08 Sundar, os-vif plug API call neutron agent?
14:51:50 i think we should target os-acc for VM usecase for Rocky
14:52:12 which means os-acc calls the cyborg-agent directly to call upon the driver
14:52:26 That is my understanding. We can double check. But, please note that the spec doesn't make implementation commitments yet. We can implement it the same way as os-vif
14:52:28 (and the driver invoke the bus protocols)
14:52:54 zhipeng: Agreed :)
14:52:56 edleafe, os-vif can call neutron agent directly by RPC by-pass API?
14:53:00 shaohe: I think you may be referring to scenarios like the one where Cyborg API is called to program a region, which is already in use?
14:53:42 i think the attach/detach does not make assumption on the operation
14:53:47 Hi, I'm confused. Could anyone explain. When nova-compute call cyborg by os-acc, It will call by http(API) or just by rpc(message queue)?
14:53:50 no matter a region needs to be programmed or not
14:54:02 if attach, it presumes the device is ready
14:54:33 Or both of them?
14:54:35 wangzhh the current thinking is via rpc to call cyborg-agent in order to get to the driver
14:54:47 no api involved in this scenario
14:55:21 OK. Thx.
14:55:29 Yes, agreed with zhipeng
14:55:48 Here are the interfaces exposed by os-vif: https://github.com/openstack/os-vif/blob/master/os_vif/__init__.py
14:56:04 it means that this is a VM related operation and we assume the Nova got the necessary privilege
14:56:04 I will try to locate how nova compute calls it
14:56:30 we could focus on the details later, whether the lib itself could be called
14:56:45 or something implement the lib interfaces should be called
14:57:35 okey so everyone good on the os-acc spec ?
14:57:35 Nova compute calls the initialize API of os-vif directly here: https://github.com/openstack/nova/blob/master/nova/cmd/compute.py#L49
14:59:05 what does os_vif.initialize do?
14:59:27 will it set up RPC client?
14:59:39 It sets up os_vif for further calls, like plug and unplug to attach/detach network ports to instances
15:00:06 we want to know the plug detail
15:00:17 how does it call neutron.
15:00:37 by API or RPC?
15:00:51 shaohe: It seems to be a direct call AFAICS. Here's the implementation: https://github.com/openstack/os-vif/blob/master/os_vif/__init__.py#L24
15:01:51 shaohe: Could I ask whether it is important to settle this now before approving the spec?
15:02:18 we could discuss this more in detail for implementation
15:02:27 The spec focuses on 2 things: behavior of accelerators during start/stop/etc. and the os-acc interfaces
15:02:48 Sundar, it should block the spec.
15:02:55 The implementation is up to us. There will be more comments on the code patch that implements this :)
15:03:10 for if it can agent, we should be careful.
15:04:11 s/for if it can agent/for if it call agent
15:04:13 Sundar regarding nrp
15:04:21 We should be careful in the implementation. But the spec does not say RPC, API or whatever
15:04:26 I think from the mailing list discussion
15:04:31 we should still go for it
15:04:38 per your sched spec
15:05:02 Shaohe, if we don't close on os-acc spec now, what is your proposed plan for Rocky?
15:06:04 Sundar shaohe_feng let's move on from the os-acc specifics
15:06:19 OK
15:06:20 actually let me put down a comment
15:06:51 #action os-acc spec https://review.openstack.org/566798 after Li Liu remove -1 is ready to go
15:07:06 so Li Liu still has to check :)
15:07:12 Thanks, zhipeng :)
15:07:21 now back on the sched spec
15:07:31 nrp
15:07:38 I think we should still go for it
15:07:39 zhipeng, re. nRP, it may still take more time
15:07:50 We may not deliver anything if we keep waiting for it
15:08:13 Can we start by applying the traots on compute node RP and moving later whenever nRP is ready?
15:08:18 *traits
15:08:23 given the impression from the Nova team
15:08:52 nrp should be a worthy goal for Rocky, if we wait that's gonna be another cycle
15:09:12 plus we have Alex here :P
15:09:45 What will be delivered in Rocky then?
15:10:17 all the nrp based traits and rcs we discussed
15:10:28 and placement will be ready i suspect
15:11:46 Sundar, we had a discussion on nRP in Monday's meeting. can you summarize it?
15:14:20 Sundar are you still around ?
15:14:22 shaohe: after Monday's meeting, I started a thread with Nova. Please see some of the responses, like: http://lists.openstack.org/pipermail/openstack-dev/2018-June/131157.html
15:14:35 the agent-driver api spec
15:14:52 is this urgent for rocky as well ?
15:14:58 The virt-drivers need to be updated for nRP, and there are still some concerns around in-place upgrades with nRP
15:15:40 zhipenf, Just trying to understand :) -- if nRP is not ready in Rocky, what will Cyborg deliver in Rocky?
15:15:45 *zhipeng
15:16:37 Sundar we could make a hack work
15:16:49 zhipeng, we should make a decision on how we report the resource to placement.
15:17:14 shaohe_feng specifically ?
15:18:36 Sundar I see the feedback in the email thread, and the general feedback is that most of the stuff could be done
15:19:46 zhipeng: Say the nRP functionality is ready by mid-July. Would we have enough time to get it done after that? We have lots of people waiting to use Cyborg. Cyborg has got popular. ;)
15:20:02 zhipeng, Must it be ready for resource report in R release?
15:20:08 Sundar we have more than capable devs :)
15:20:42 shaohe_feng basic functionality should be ready I presume
15:20:46 nothing too fancy
15:21:04 Sundar, yes, cyborg is becoming popular.
15:21:21 zhipeng: Definitely. :) But it may help to have a backup plan, right? Without that, we cannot get basic VM placement to work, AFAICS.
15:22:22 yes we will have time for a backup plan, this could be planned together with Li Liu and Zhuli
15:23:10 we shoot for NRP as priority, if Nova could not deliver it, then we could go backup
15:23:27 but I don't want to drop NRP to a secondary concern at first
15:25:02 So, for development till then, we could invoke placement in some ad hoc way to populate inventory and traits ?
15:25:49 i think so
15:27:07 OK, zhipeng. Your call. :) I had updated the spec to reflect compute node RP as a backup. I can further clarify that nRP is the preferred way. Would that be enough?
15:27:37 that'd be great :)
15:28:04 btw all the specs we deemed ready today will be merged no later than the end of the week
15:28:31 Sure, thanks. :) Could we say that the spec is ready modulo that clarification?
15:29:06 yes
15:29:16 Sundar, another question about nRP
15:29:17 that's actually more related to the implementation
15:29:30 how we call placement?
15:29:49 folks I got to drop, plz continue discussion, I will come back and terminate the meeting :)
15:29:55 The placement client is ready?
15:30:13 Sundar, I did not find it.
15:30:22 shaohe, I also need to drop for another call. Can we pursue in this IRC channel later?
15:30:31 OK.
15:30:43 then we can terminate the meeting
15:30:50 ^ zhipeng
15:34:35 okey then :)
15:34:42 thx everyone for the discussion
15:34:45 #endmeeting
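
For readers following up on the "lazy quota" idea raised at 14:23:22 and 14:25:04, here is a rough sketch of what checking and updating quota at the API layer might look like: a request can pass scheduling and still be refused when the quota check fails. Everything in it (LazyQuota, QuotaExceeded, handle_attach_request, the in-memory usage table) is a hypothetical name made up for illustration, not Cyborg's actual code or API.

```python
class QuotaExceeded(Exception):
    """Raised when a project would exceed its accelerator quota."""


class LazyQuota:
    """Hypothetical in-memory quota tracker (illustration only, not Cyborg code)."""

    def __init__(self, limits):
        # limits: mapping of project id -> maximum number of accelerators
        self._limits = dict(limits)
        self._usage = {}

    def reserve(self, project_id, count=1):
        """Check and update usage at the moment an API request arrives."""
        limit = self._limits.get(project_id, 0)
        used = self._usage.get(project_id, 0)
        if used + count > limit:
            raise QuotaExceeded(
                "project %s: %d in use, limit %d" % (project_id, used, limit))
        self._usage[project_id] = used + count
        return self._usage[project_id]

    def release(self, project_id, count=1):
        """Give back quota when an accelerator is detached."""
        used = self._usage.get(project_id, 0)
        self._usage[project_id] = max(0, used - count)
        return self._usage[project_id]


def handle_attach_request(quota, project_id, scheduler_passed):
    """Assumed API-layer handler: quota is only checked after scheduling."""
    if not scheduler_passed:
        return "rejected: no host"
    try:
        quota.reserve(project_id)
    except QuotaExceeded:
        # The scheduler said yes, but the quota check still refuses the request.
        return "rejected: quota"
    return "accepted"
```

With a limit of one accelerator for a project, the first attach request would be accepted and a second one refused until release() is called. Note that this only works if every request goes through the API layer, which is the concern raised in the meeting about nova-compute reaching the agent via os-acc over RPC.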