03:01:40 <Sundar> #startmeeting openstack-cyborg
03:01:41 <xinranwang> Hi Sundar
03:01:42 <openstack> Meeting started Thu Sep  5 03:01:40 2019 UTC and is due to finish in 60 minutes.  The chair is Sundar. Information about MeetBot at http://wiki.debian.org/MeetBot.
03:01:43 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
03:01:44 <xinranwang> Hi chenke
03:01:45 <openstack> The meeting name has been set to 'openstack_cyborg'
03:01:56 <Sundar> #topic Who's here
03:02:05 <Sundar> o/
03:02:06 <xinranwang> #info xinranwang
03:02:11 <s_shogo> #info s_shogo
03:02:12 <yikun> o/
03:02:15 <chenke> #info chenke
03:02:18 <shaohe_feng> o/
03:02:20 <chenke> o/
03:02:25 <s_shogo> o/
03:02:25 <Sundar> Hi all
03:02:31 <zhurong> \0/
03:02:46 <Sundar> zhurong: Welcome
03:03:10 <Sundar> The main item on the agenda is, of course, the reviews
03:04:06 <Sundar> Please focus on the P5-P9 patches. They and Nova notifn patch need to merge soon, before we get reviews of Nova patches. Also, they are needed for tempest to work, and that is a prereq too.
03:05:14 <Sundar> yikun and all: Would you be able to review it? (Sorry for calling you out, yikun :))
03:05:34 <yikun> Sure, I will review it today
03:05:40 <Sundar> Thanks :)
03:06:32 <yikun> and one detail question about db migrations on P5.5
03:06:34 <yikun> https://review.opendev.org/#/c/677115/12/cyborg/db/sqlalchemy/alembic/versions/c1b5abada09c_update_for_nova_integ.py
03:06:51 <Sundar> We had a bit of a wrinkle on the Nova side too. The Cyborg's notification event would come too soon after we start the bind and would get lost. I think we have a way to solve it which (a) should hopefully be acceptable to Nova and (b) does not involve change to Nova objects or db
03:06:53 <yikun> Can we append the change on existing migration script?
03:07:16 <yikun> and most of other patches are okay to me
03:08:01 <Sundar> yikun: There is one migration script for v2 API, so that we can show people exactly what we did. Once we releaseTrain, of course, previous migration scripts cannot be edited.
03:08:35 <Sundar> As long as we don't change the migration script for Stein, I think we should be ok. Any concerns?
03:10:41 <yikun> it will break upgrade from existing to now, but as you said if we still think the train relaase it's a start version,  if other people have no question,  I have no objection
03:12:01 <Sundar> I understand the existing practice but (a) too many migration scripts slows down upgrade for operators and (b) it is good to show onw script for all v2 API IMHO. Thanks for your understanding.
03:12:13 <wangzhh> Agree with yikun. It's better to append a migration script. And other patch is fine with me.
03:12:37 <wangzhh> *patches are
03:13:16 <Sundar> wangzhh: Given the reasons above, would you still ask for a separate migration script?
03:14:30 <Coco_gao_> #info Coco_gao
03:15:07 <Sundar> Hi Coco_gao
03:15:16 <Sundar> We are discussing the patch review
03:15:50 <wangzhh> So, we 'll consider T as a start version?
03:17:00 <Sundar> Stein is the starting version -- v1 API. We should not modify or change the Stein migration script. We introduced a new migration script in Train, for v2 API. Anything unrelated to it can be in a separate script. But, so far, everything we did has been for v2 and Nova integ, I think.
03:18:16 <Sundar> From operators' point of view, the migration scripts that are part of a release should not be changed. Within a release, we usually introduce separate scripts for developers' convenience, I think.
03:19:10 <wangzhh> Uh, Okay. I have no objection.
03:19:33 <Sundar> Anyways, that is just my view. Thank you, wangzhh, yikun and all.
03:19:43 <Sundar> Any other concerns or thoughts on P5-P9?
03:21:19 <chenke> why are you put -2 on P5?
03:23:06 <Sundar> chenke: It is only a procedure, so that the patch series does not merge piece by piece. Once we get +2 on all patches, I will remove -2, so that it will all merge together. Nova guys are following this practice for my patch series too:
03:23:09 <Sundar> https://review.opendev.org/#/c/631242/
03:23:30 <Sundar> I should have done that right at the beginning.
03:24:05 <Sundar> So, even P5 can be reviewed fully.
03:24:05 <chenke> Okay. I understand it.
03:25:07 <Sundar> Can we hope it to merge them by tomorrow? I don't want to rush you :)  but till P5-P9 and notificaiton patch merge, we cannot even ask Nova folks for review
03:25:31 <Sundar> And Sep 9 deadline is just few days away
03:26:52 <wangzhh> One question: https://review.opendev.org/#/c/678177/1..2/cyborg/common/policy.py  Related comment in this patch.  I did't find program in update ARQ.
03:28:02 <Sundar> update refers to bind/unbind here, and that would involve programming.
03:28:17 <Sundar> Are you saying the term 'update' should be changed?
03:29:13 <wangzhh> No. Just check if it is admin or not before program. As I replied.
03:30:18 <Sundar> Hmm, it could be owner or admin, right? Whoever created the ARQ should be allowed to bind it, otherwise they should not be able to create it.
03:32:23 <wangzhh> Yep.  Owner or admin should be allowed to Bind and Unbind. But when  involve programming, we should check if it is admin or not.
03:32:50 <wangzhh> Only admin can program? Right?
03:33:54 <Sundar> Binding involves programming for FPGAs. A tenant who has the privilege to create ARQs for an FPGA should also be able to program it, right? That's why I raised the question about 'create' privileges.
03:34:04 <shaohe_feng> program had better have sudo privilege no node
03:34:28 <Sundar> Yes, we do sudo fpgaconf today, and have a patch to move to oslo privsep.
03:34:28 <shaohe_feng> s/no/on
03:34:55 <shaohe_feng> yes. let's from simple
03:35:05 <shaohe_feng> Firstly it can work.
03:35:17 <shaohe_feng> and let nova can call cyborg
03:35:21 <shaohe_feng> then improve cyborg
03:35:28 <Sundar> wangzhh: I understand your concern. However, if only admins can create/bind ARQs for FPGAs, that will be very restrictive, right?
03:35:35 <chenke> Yes. Firstly make it work.
03:36:21 <wangzhh> Yep. Sundar.
03:36:44 <wangzhh> So, Just make it work first.
03:37:17 <Sundar> Ok, I agree with wangzhh that security is important and with others that we should first get it to work by Train-3. I'll bring RBAC on the agenda for the next IRC meeting. Hopefully, other things will be working by then. :)
03:37:35 <wangzhh> Cool.
03:37:36 <chenke> Agreee.
03:37:54 <Sundar> #topic Python 3
03:38:29 <shaohe_feng> the admin or  tenant own the bit steam image, can download the image and to do program.
03:38:42 <Sundar> There is a Train goal to enable Py3 by Train-3 milestone, which is next week. Thanks to s_shogo (and Ikuo) for the patch. Hope we can merge that too by next week.
03:39:00 <shaohe_feng> we should challenge the tenant
03:39:46 <Sundar> shaohe_feng: Sure, let's take it up next week.
03:39:56 <s_shogo> I'll check  again that soon,  after P5-P9 merged. (latest patch seems to be no problem)
03:40:20 <Sundar> Thanks, s_shogo.
03:40:49 <Sundar> #topic AoB
03:41:08 <s_shogo> About cyborg client.
03:41:38 <xinranwang> Hi all, I have to drop now. Thanks for your reviews for placement patch, it is merged finally :) And please review https://review.opendev.org/#/c/667231/ tempest code and notify patch https://review.opendev.org/#/c/674520/ too.
03:41:41 <Sundar> Outside of patches and Py3, I don't have much for today. As i said, I think I have a way to avoid the race in Nova, and will work on getting it out. Anything else for today?
03:41:55 <Sundar> s_shogo: Yes
03:42:05 <Sundar> I saw your patch -- thanks.
03:42:09 <s_shogo> I have posted a patch for OpenStack SDK, and request review to SDK team. https://review.opendev.org/#/c/679914/
03:42:37 <s_shogo> Subsequently ,I'll post cyborg-python client patch ,please review that > All
03:42:56 <chenke> Cool.
03:42:59 <Sundar> Great, sure, will do.
03:43:24 <shaohe_feng> one thing.
03:43:36 <shaohe_feng> we may start several asyn jobs
03:44:03 <shaohe_feng> only all jobs are finished, the bind is successful
03:44:18 <shaohe_feng> one of the jobs failed, then the bind should be failed
03:44:24 <shaohe_feng> right?
03:45:10 <Sundar> Yes.
03:45:44 <shaohe_feng> so how to sync all the state of jobs?
03:46:20 <Sundar> Wait for all of them to finish. I think eventlet has a wait() method, IIUC
03:46:35 <shaohe_feng> yes.
03:46:53 <shaohe_feng> but wait is block
03:47:04 <shaohe_feng> so it can not in main thread.
03:47:18 <shaohe_feng> also another,  if one jobs failed, we should cancel others, right?
03:47:28 <shaohe_feng> where should we put the wait?
03:48:18 <Sundar> It can't be in main thread, I agree. We don't need to cancel the rest if one fails, at least in Train. As we discussed, if the job has already started flashing the card, we shouldn't cancel it then
03:49:03 <Sundar> Just a thought: May be we have one eventlet whichis a master and it launches other eventlets, and the master does the wait.
03:50:00 <shaohe_feng> yes, we need a such mechanism
03:50:14 <shaohe_feng> I'm thinking it.
03:50:44 <shaohe_feng> the master thread should not be from a  thread pool, right?
03:50:58 <shaohe_feng> for example, there a 2 pools.
03:51:01 <Sundar> Sure. My suggestion is to do something super-simple for Train-3 milestone -- may be just one eventlet for all ARQs, because we won't have too many ARQs anyway
03:51:45 <shaohe_feng> we create 2 jobs, so no pool for the master thread to monitor these 2 jobs.
03:51:59 <Sundar> Simplcity and reliability are more important, IMHO. It will also reduce your work :)
03:52:20 <shaohe_feng> yes, agree.
03:52:34 <shaohe_feng> async is more complex than sync.
03:53:26 <Sundar> Yes. IMHO, we can even leave unbind as sync in the worst case, for now, because we don't do anything there today
03:53:46 <shaohe_feng> yes. agree
03:53:58 <Sundar> We just want some patch in the works by Sep 3, so that we can kickstart the review
03:54:18 <Sundar> I mean, Sep 9
03:55:43 <Sundar> Anything else, anybody?
03:55:59 <chenke> No
03:56:37 <Sundar> Cool. Thank you, all. Please review the patches. Have a good day! Bye.
03:56:41 <Sundar> #endmeeting