03:05:12 #startmeeting openstack-cyborg
03:05:13 Meeting started Thu Aug 29 03:05:12 2019 UTC and is due to finish in 60 minutes. The chair is Sundar. Information about MeetBot at http://wiki.debian.org/MeetBot.
03:05:14 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
03:05:16 The meeting name has been set to 'openstack_cyborg'
03:05:18 #info Coco_gao_
03:05:20 Hi all
03:05:24 morning Coco_gao_
03:05:33 Hi all
03:05:35 #topic Attendance
03:05:35 morning shaohe
03:05:40 #info Sundar
03:05:47 #info s_shogo
03:05:51 Hi~
03:05:58 Hi all
03:05:59 Hi chenke
03:05:59 #info chenke
03:06:01 #info shaohe_feng
03:06:07 #info Yumeng
03:06:08 #info yikun
03:06:19 Agenda: https://wiki.openstack.org/wiki/Meetings/CyborgTeamMeeting#Agenda
03:06:19 Hi Coco_gao_
03:07:53 Python 3: Since the OpenStack Train release has some Python 3 goals, due by Milestone 3, and it seems that we are close to fixing Py3 issues for Cyborg,
03:08:17 I have requested s_shogo to make the Python 3 tests a voting job in Zuul.
03:08:27 Any objections or comments?
03:10:01 I'll take the silence as agreement. ;) There were requests for fixing Python 3 in the cyborg client too. Luckily, it has taken only 1 patch so far, so we don't need to spend much time on it.
03:10:21 +1
03:11:05 I'll do the py3 work in cyborg client, too.
03:11:19 good job
03:12:16 I have modified the tox.ini default env to support py36 and py37
03:12:22 thank you
03:12:26 s_shogo: The catch is, the current client is for v1 API code and not based on the openstacksdk method. Bringing it to v2 is more important, right?
03:12:27 s_shaogo
03:12:35 s_shogo
03:12:37 Cool.
03:12:49 https://review.opendev.org/#/c/673228/
03:13:06 this is a python3 issue fix for client
03:13:25 But somebody else proposed a patch and it got merged.
03:13:50 Sundar: I think so. My openstackSDK patch is made for the v2 Deployable API now.
03:14:12 And the P5-P9 patches don't include the migration code for the "Deployable" API from v1 to v2.
03:15:03 s_shogo: Great. Please add device profiles, as that is more important IMHO. Operators need to create device profiles to use Cyborg, but doing that with curl is not easy
03:15:26 agree, Sundar
03:15:49 As 2nd priority, I'd say devices -- that will give an inventory of accelerator devices in the cluster
03:16:57 IMHO, when devices are asked for, we can return the components like deployables and attributes, so the client gets a full picture
03:16:59 yes, client is more friendly than curl
03:17:25 Related to the client: the deadline for openstackSDK commits seems to be near, so I would like to begin committing to it before the APIv2 patches merge.
03:17:34 Yes, makes sense
03:17:48 Thanks, s_shogo!
03:18:28 The main thing that is holding me back is that I am testing P5-P9 with the notification and Placement report patches. Plus, Nova code changes create a merge conflict for me.
03:18:46 Once those are resolved, hope we can merge the P5-P9 patches
03:19:01 Any other comments on the client, anybody?
03:19:13 yes, async job depends on P5-P9
03:19:25 In my assumption, python-cyborgclient and openstacksdk could be completed before the Train release,
03:19:38 great
03:19:38 but I'm anxious about whether my test code is sufficient, so please review it in the following patches, and help if necessary.
03:19:53 shaohe_feng: Agreed. I'll expedite as much as I can.
03:20:06 s_shogo: Agreed, we'll help for sure
03:20:23 Thanks, Sundar
03:20:29 maybe the test code can be added later.
03:20:35 wangzhh: Thanks for proposing the RBAC patch. I had some concerns/questions in the patch. Please take a look.
03:20:50 first let the client work.
03:21:11 s_shogo, thank you. We will review the code.
03:21:28 Yep. I have updated my code. May commit after meeting.
03:21:39 Thanks, wangzhh
03:22:03 shaohe_feng: OK, I'll do that as a priority.
03:22:26 shaohe_feng: Part of the issue is that some Nova developers want to test Cyborg code with Nova code in their env.
Also, we need to show tempest working end-to-end.
03:23:41 Anybody else trying out the Placement report? With GPUs, AI chip, etc.?
03:23:42 What's the remaining work for tempest?
03:23:46 yes, tempest can eliminate their concerns
03:24:07 Coco_gao_: It is mostly to get the patches to work together, I think
03:24:34 Xinran's patches look good IMO. Trying to make sure they work with P5-P9
03:24:51 I have tried the Placement report with GPUs
03:25:20 Yumeng: Good to know
03:25:42 #topic Nova functional tests
03:26:40 There was talk at the PTG that we should propose functional tests for Nova, which mock the Cyborg API in a test fixture, and use that to test Nova patches
03:27:08 They seem to cover a few more scenarios than unit tests and tempest
03:27:52 mock cyborg API's return?
03:27:53 I agree we need to add functional tests for nova.
03:28:09 Coco_gao_: Yes
03:28:44 We have an entry in the Storyboard too. I have not seen any comments of late, but there is concern that it may come up at the last moment
03:29:11 Since there is lots of stuff in Nova runway, it can be tough to get a 2nd look if this issue comes up
03:29:38 Do we have any volunteers for writing Nova functional tests? I'll help as much as I can
03:31:35 Please think it over and LMK if you can.
03:32:33 shaohe_feng: Do you want to bring up the discussion about ARQ states and transitions, as followup? Or is it settled?
03:33:12 yes
03:33:45 one thing is, who deletes the ARQ
03:34:09 when the delete API tags the state as delete_pending?
03:34:22 There is Nova code to delete the ARQ in some error cases and when VM is terminated
03:35:03 maybe it is still in the bind process
03:35:47 should the bind process delete it when it finds the state is delete_pending?
03:36:06 Yes.
In that case, IMHO, it is best to let the bind complete and the traits get updated in Placement, and then unbind/delete the ARQ
03:36:26 If we try to interrupt FPGA programming, bad things can happen
03:36:46 we will not add any rollback this release for bind. just go through the whole process even when deleting.
03:36:54 Agreed
03:37:06 OK.
03:37:09 OK
03:37:48 any state transition should be a transaction.
03:38:39 Yes, db transaction
03:39:21 seems there is a state machine in oslo lib
03:39:26 Any other issue, shaohe_feng?
03:39:43 we will not introduce it this release
03:40:12 Ok by me. What are the benefits of using that?
03:40:13 because I need time to read up on it.
03:40:31 do not look into it at present.
03:40:35 ok
03:40:55 maybe after the whole flow code is finished
03:41:07 we can have a look at the pros and cons
03:41:33 Sure. We'll trust your judgement on this :)
03:41:48 another thing, should the async job time out?
03:42:01 On a different note, I am seeing this issue for allocating attach handles: https://opendev.org/openstack/cyborg/src/branch/master/cyborg/db/sqlalchemy/api.py#L269 The in_use field does not get written to db
03:42:37 but there's still a problem.
03:42:40 The timeout should correspond to default Nova timeout
03:43:10 maybe it is in programming or other critical job
03:43:34 The programming typically takes a few seconds, so default of 300 seconds (I think) is good enough
03:43:40 timeout can be a disaster
03:44:21 another thing
03:44:43 currently the bind process is specific to FPGA
03:45:40 Umm, bind if for all accelerators. Only programming is for FPGA. The bind means the ARQ is associated with a host and deployable in Cyborg's db, and the device is ready to use
03:45:44 *is for
03:45:55 there should be good extension for other kinds
03:46:05 I mean:
03:46:22 1. get the resource type.
03:46:50 every resource type should have its own extended bind action
03:46:56 for FPGA it is program.
03:47:08 for others maybe env setup, not sure.
03:47:43 2.
every resource should have its own placement report.
03:48:20 the report info may be different
03:48:35 so the code should be:
03:49:09 type, num = arq.group_get_resource()
03:49:17 for n in num:
03:50:05 action = get_accelerator_action(type) # fpga is program
03:50:08 action()
03:50:15 something like this
03:50:38 and this code should be split from the arq object file
03:51:36 In general, the process should be generic for all accelerators. The current code looks at the device profile request group to see if it has function_id or bitstream_id entries, which are specific to FPGA, to decide if programming is needed
03:52:32 we may add other specs in
03:52:36 AFAIK, for non-FPGA devices in this release, there is nothing required to prepare the device, right?
03:53:10 device profiles for different accelerators
03:53:14 such as HDDL
03:53:19 we can add
03:54:18 "accel:affinity": true
03:54:44 Ok
03:54:46 which means we need 4 accelerators in one card
03:54:49 We had an idea of a generic prepare_device API in the driver, which gets a dictionary as a parameter, where the dictionary values depend on the device type.
03:55:26 yes, different devices may take different actions during bind.
03:56:14 Quick process check: Since we have only a few minutes left, should we continue this via email, copying all of us and openstack-ML? What do you all think?
03:56:34 also another thing, where do we init the ThreadPoolExecutor?
03:56:43 in the arq object file?
03:56:48 seems not good.
03:57:11 OK.
03:57:52 All, please look at this issue for allocating attach handles: https://opendev.org/openstack/cyborg/src/branch/master/cyborg/db/sqlalchemy/api.py#L269 The in_use field does not get written to db
03:58:34 All, we are seeing good review activity of late. Thank you all, and please keep it up. We are literally 2 weeks from the milestone. :)
03:58:41 #topic AoB
03:59:20 shaohe_feng: if you prefer, I can initiate an email thread for the good points that you brought up. Good?
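Note on the 03:48-03:50 pseudocode: the per-type bind action proposed there can be sketched as a small dispatch table. This is only an illustration under stated assumptions, not Cyborg code. `BIND_ACTIONS`, `program_fpga`, `no_op` and `bind_group` are hypothetical names; `get_accelerator_action` is borrowed from the chat, and the dictionary argument loosely follows the `prepare_device` idea raised at 03:54:49.

```python
from typing import Callable, Dict, List

# Hypothetical signature: each bind action receives a dict whose keys
# depend on the device type, and returns a short status string.
BindAction = Callable[[dict], str]


def program_fpga(params: dict) -> str:
    """Stand-in for FPGA programming; a real driver would flash a bitstream."""
    return "programmed bitstream {}".format(params.get("bitstream_id"))


def no_op(params: dict) -> str:
    """Used for device types that need no preparation during bind."""
    return "nothing to prepare"


# Hypothetical registry mapping a resource type to its extra bind action.
BIND_ACTIONS: Dict[str, BindAction] = {"FPGA": program_fpga}


def get_accelerator_action(resource_type: str) -> BindAction:
    """Look up the type-specific action, falling back to a no-op."""
    return BIND_ACTIONS.get(resource_type, no_op)


def bind_group(resource_type: str, num: int, params: dict) -> List[str]:
    """Run the type-specific action once per requested accelerator."""
    action = get_accelerator_action(resource_type)
    return [action(params) for _ in range(num)]


if __name__ == "__main__":
    print(bind_group("FPGA", 2, {"bitstream_id": "abc"}))
    print(bind_group("GPU", 1, {}))
```

Keeping the registry in its own module would also address the request at 03:50:38 to split this logic out of the ARQ object file.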
03:59:39 OK
03:59:48 Anything else, folks?
04:00:50 did you check that in_use is in the arguments of the update function?
04:01:00 Yes
04:01:14 and does your DB really have the in_use field?
04:01:32 directly use mysql command.
04:02:04 Oh yes. The ref.update has it, but it doesn't get written to db. Use mysql cmd from Python code?
04:02:20 no
04:02:22 such as:
04:02:33 mysql -uroot -ppass cyborg
04:02:58 Yes, update command works from CLI
04:03:25 We'll follow up on this too by email.
04:03:33 desc haddler;
04:03:43 OK.
04:03:52 Thanks, everybody. Happy coding and reviewing :). Have a good day. Bye.
04:03:58 #endmeeting
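Note on the ARQ deletion discussion (03:33-03:38): the agreed behavior was that a delete arriving during bind should not interrupt it; the ARQ is tagged delete_pending, the bind runs to completion, and only then is the ARQ removed. A minimal sketch of that rule follows, assuming a toy in-memory `Arq` class. Every state name except `delete_pending` is hypothetical, and the lock merely stands in for the db transaction mentioned at 03:38:39.

```python
import threading


class Arq:
    """Toy ARQ that models only the state handling discussed in the meeting."""

    def __init__(self) -> None:
        self.state = "initial"
        # Stands in for a db transaction around each state transition.
        self._lock = threading.Lock()

    def start_bind(self) -> None:
        with self._lock:
            if self.state != "initial":
                raise RuntimeError("cannot bind from state " + self.state)
            self.state = "binding"

    def request_delete(self) -> str:
        """Delete immediately unless a bind is in flight; then defer."""
        with self._lock:
            if self.state in ("binding", "delete_pending"):
                self.state = "delete_pending"
                return "deferred"
            self.state = "deleted"
            return "deleted"

    def finish_bind(self) -> str:
        """Complete the bind, then honor any delete that arrived meanwhile."""
        with self._lock:
            if self.state == "binding":
                self.state = "bound"
            elif self.state == "delete_pending":
                # Bind ran to completion; now unbind and delete.
                self.state = "deleted"
            else:
                raise RuntimeError("no bind in progress")
            return self.state


if __name__ == "__main__":
    arq = Arq()
    arq.start_bind()
    print(arq.request_delete())
    print(arq.finish_bind())
```

No rollback path is modelled, matching the decision at 03:36:46 not to add rollback for bind in this release.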