#openstack-cyborg log

15:04:20 <shaohe_feng_> #startmeeting openstack-cyborg-driver
15:04:21 <openstack> Meeting started Mon Mar 18 15:04:20 2019 UTC and is due to finish in 60 minutes.  The chair is shaohe_feng_. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:04:23 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:04:25 <openstack> The meeting name has been set to 'openstack_cyborg_driver'
15:04:39 <shaohe_feng_> let's wait for a minute.
15:05:39 <shaohe_feng_> #info shaohe_feng_
15:05:40 <wangzhh> Fine.
15:06:03 <xinranwang> Hi all
15:06:07 <wangzhh> Hi xinran.
15:06:11 <xinranwang> Sorry for being late
15:06:16 <xinranwang> #info xinranwang
15:06:19 <shaohe_feng_> evening xinranwang
15:06:24 <wangzhh> #info wangzhh
15:06:35 <xinranwang> hi shaohe_feng_  wangzhh
15:06:53 <shaohe_feng_> we have not held this meeting for a long time.
15:07:24 <shaohe_feng_> #link https://wiki.openstack.org/wiki/Meetings/CyborgDriverTeamMeeting#Agenda_for_next_meeting_:_Mar_18th.2C_2019
15:07:28 <shaohe_feng_> here is the agent.
15:07:50 <shaohe_feng_> s/agent/agenda
15:07:57 <Li_Liu> Hi guys
15:08:04 <wangzhh> Hi, uncle Li.
15:08:16 <shaohe_feng_> Li_Liu: morning uncle Li.
15:08:19 <Li_Liu> Hi, xiaohei~~
15:08:28 <Li_Liu> hi shaohe
15:08:38 <Li_Liu> you guys wanna do a zoom meeting instead?
15:08:38 <shaohe_feng_> I want to introduce some hardware accelerators.
15:09:10 <shaohe_feng_> 1. the currently known types of accelerator card
15:09:45 <shaohe_feng_> as we all know, cyborg will support mdev and pci cards.
15:10:11 <shaohe_feng_> but now I find there are 2 other kinds of hardware cards we can support.
15:10:35 <shaohe_feng_> one is IP over PCIe, the other is USB.
15:10:47 <Li_Liu> i see
15:11:06 <shaohe_feng_> wangzhh: do you know these two kinds of cards?
15:11:08 <Li_Liu> can they fit into our current design?
15:11:29 <shaohe_feng_> not sure, so we need to discuss more with them.
15:11:56 <wangzhh> I don't know much about ip over pcie, what does that mean?
15:12:29 <Li_Liu> I think it's a remote case
15:12:55 <Li_Liu> PCI over ethernet?
15:13:14 <shaohe_feng_> Li_Liu: yes.
15:13:43 <shaohe_feng_> #link https://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/vca-2-visual-compute-accelerator-product-brief.pdf
15:13:53 <shaohe_feng_> Li_Liu: No, IP over pci.
15:14:05 <Li_Liu> from the operating system's point of view, it's still a pci device, right?
15:14:45 <shaohe_feng_> Li_Liu: it is a pci device, but you communicate with it by IP.
15:14:52 <shaohe_feng_> Li_Liu: it is a local pci card.
15:15:16 <shaohe_feng_> such as the vca2 card, see link above,
15:15:34 <Li_Liu> you mean other hosts can communicate with it over ethernet?
15:16:29 <wangzhh> So, actually, it is a pci device?
15:16:31 <shaohe_feng_> oh, the local host communicates with the local card over PCIe.
15:16:49 <wangzhh> from os view.
15:17:50 <shaohe_feng_> wangzhh: from the os view, you may see it as a new kind of device with a new driver.
15:18:18 <shaohe_feng_> there's another card from a meeting I attended last week; this seems to be a common approach for some cards.
15:18:58 <shaohe_feng_> we can dig more into this kind of card.
15:19:52 <shaohe_feng_> for usb cards, the Movidius AI card is of this kind.
15:20:10 <Li_Liu> I think we just need to make sure 2 things: 1. can os-acc attach it like all the other devices, 2. can the resource fit into our current data model
15:20:53 <shaohe_feng_> yes.
15:20:55 <Li_Liu> as long as these two requirements can be met, we should be good
15:21:26 <wangzhh> It makes sense.
15:21:48 <shaohe_feng_> I think the usb devices can satisfy these two requirements.
15:22:01 <Li_Liu> shaohe_feng_ have they finalized the resource structure yet?
15:23:10 <shaohe_feng_> Li_Liu: usb, yes.
15:23:31 <shaohe_feng_> just a reminder about these 2 kinds of devices.
15:23:31 <wangzhh> How about another one?
15:24:20 <shaohe_feng_> wangzhh: I have not looked into it well.
15:24:35 <shaohe_feng_> OK, let's go ahead.
15:24:48 <wangzhh> OK.
15:24:52 <shaohe_feng_> Re-enumeration of hardware card
15:25:17 <shaohe_feng_> most of us know the issue of Re-enumeration.
15:25:52 <Li_Liu> the issue we discussed last week?
15:26:16 <shaohe_feng_> no, but this is a common issue.
15:26:43 <shaohe_feng_> the bus of a hardware card may change after we resize the hardware and reboot.
15:27:12 <shaohe_feng_> this seems to be a big problem for accelerator management in cyborg.
15:28:01 <shaohe_feng_> I have discussed it with Yongli, the main PCI device contributor in nova.
15:28:29 <shaohe_feng_> he says nova does not allow resizing hardware
15:29:35 <Li_Liu> you mean add/remove device after reboot?
15:29:54 <shaohe_feng_> yes
15:30:04 <shaohe_feng_> unless evict all VMs from this node.
15:30:30 <shaohe_feng_> Li_Liu: wangzhh: xinranwang: what do you think about it?
15:31:00 <shaohe_feng_> Or do you have any good ideas for hardware resize?
15:31:33 <wangzhh> Wuu, IMO, it's better to change status to error or offline in cyborg.
15:31:58 <wangzhh> And let the operator sync it manually.
15:32:17 <Li_Liu> Let's say before restart we have 3 cards, after restart we now have 4 cards
15:32:44 <Li_Liu> I think driver can find out which one is the new one right?
15:32:47 <wangzhh> We can supply a tool or api for the operator to update it.
15:32:50 <xinranwang> Say we plug a new card into the server and reboot, but the hw resource assigned to an instance does not change.
15:33:17 <xinranwang> will the bdf change? if so, that should be an issue.
15:33:20 <wangzhh> Li_liu, as xinran said.
15:33:57 <Li_Liu> the bdf might change, but we don't need to guarantee giving the user the card with the same bdf
15:34:12 <shaohe_feng_> the bdf may change, and the bus-port of a usb device may also change.
15:34:18 <Li_Liu> just give the user a card of the same type
15:35:27 <wangzhh> The most tricky thing is how to handle the resource which has already been assigned.
15:36:18 <xinranwang> if the user has done some work on the old hw, that will be a loss.
15:37:12 <Li_Liu> in that case, the operator has to notify the user to back up first
15:37:35 <Li_Liu> size operator should know when the resizing is happening
15:37:44 <Li_Liu> since*
15:38:50 <Li_Liu> In 99% of the scenarios tho, I don't think it matters anyway
15:38:57 <wangzhh> What about power failure?
15:39:28 <xinranwang> how will nova record the hw resource from cyborg? there should be a field on the nova instance to record this. is this attach_handle_uuid?
15:39:28 <Li_Liu> if a power failure happens, the device should not be resized, right?
15:40:56 <shaohe_feng_> if you hotplug a piece of hardware before the failure happens, things are also bad.
15:42:33 <wangzhh> Li, If we just reboot the server, the bus wont't change?
15:42:43 <wangzhh> *won't
15:43:03 <Li_Liu> lol... as I said.. if operator wants to do this... he/she needs to notify users...
15:43:14 <shaohe_feng_> if you do not resize the hardware, the bus won't change.
15:43:16 <shaohe_feng_> Li_Liu: yes.
15:43:31 <xinranwang> wangzhh:  no, it will not change
15:43:36 <wangzhh> shaohe_feng_, Got it.
15:43:36 <Li_Liu> wangzhh, I think simple reboot should not change the bdf
15:43:48 <Li_Liu> bios just scans the pci tree
15:44:04 <Li_Liu> if nothing new is inserted, it should not change
15:44:07 <shaohe_feng_> live migrate the VM to another host.
15:44:50 <xinranwang> that's more complex...
15:45:21 <wangzhh> scheduler filter should deal with this part. shaohe_feng_
15:46:19 <shaohe_feng_> the data center can scale its hardware. For example, they may want to support more AI cards in their existing hosts.
15:47:27 <shaohe_feng_> OK, let's keep this issue in mind; maybe we can find a good way to solve it
15:47:31 <shaohe_feng_> go ahead.
15:47:52 <shaohe_feng_> multi-level resources support
15:48:18 <shaohe_feng_> now I want to support a new multi-level card.
15:48:29 <shaohe_feng_> similar to an fpga card.
15:49:15 <shaohe_feng_> for example, there is one region in a card but 4 functions in a region.
15:49:16 <Li_Liu> sure, to support new cards. as long as it can meet the requirements I mentioned earlier
15:49:38 <Li_Liu> 4 different functions?
15:49:42 <shaohe_feng_> there are 3 requirements:
15:50:29 <shaohe_feng_> Li_Liu: in my new card, they are the same function, but for fpga, they may be different functions. fpga is more complex.
15:51:03 <shaohe_feng_> 1. we should know the topology of these devices.
15:52:05 <shaohe_feng_> 2. the user can apply for any level of the resources; for example, he may want to apply for a region or just one function.
15:52:37 <shaohe_feng_> 3. avoid fragmentation
15:53:35 <shaohe_feng_> Li_Liu: cyborg now satisfies the first 2 requirements, right?
15:54:11 <Li_Liu> shaohe_feng_ it should
15:54:18 <shaohe_feng_> OK, great.
15:54:26 <shaohe_feng_> what about 3?
15:54:27 <Li_Liu> cyborg was designed with these in mind
15:54:36 <shaohe_feng_> good.
15:54:49 <Li_Liu> the 3rd one is related to scheduling algorithm
15:55:10 <Li_Liu> we might need to work with nova weigher for that
15:55:22 <shaohe_feng_> Li_Liu: that needs cyborg's help.
15:55:31 <Li_Liu> that's for sure
15:55:47 <shaohe_feng_> let me elaborate it
15:56:01 <shaohe_feng_> 3 regions
15:56:03 <Li_Liu> cyborg can provide a weigher like mechanism and work with nova
15:56:20 <shaohe_feng_> each region with 4 functions.
15:56:43 <shaohe_feng_> user 1 applies for one function from region 1
15:58:14 <shaohe_feng_> user 2 wants 2 more functions. I expect cyborg to allocate them from region 1 instead of region 2/3.
15:58:48 <shaohe_feng_> user 3 wants one more function; it is also from region 1.
15:59:49 <shaohe_feng_> the allocations should not scatter among regions 1, 2 and 3
16:00:28 <shaohe_feng_> they should be concentrated in 1 region.
16:00:29 <Li_Liu> that should be easy to do. a weigher would do the job
16:01:24 <shaohe_feng_> so user 4 can apply for the remaining 2 whole regions.
16:02:08 <shaohe_feng_> Li_Liu: OK, is there a weigher mechanism for it now?
16:02:17 <Li_Liu> not yet
16:02:27 <Li_Liu> we can plan this
16:02:42 <shaohe_feng_> OK, good.
16:02:44 <Li_Liu> coz I think numa scheduling also needs this feature
16:02:56 <shaohe_feng_> this is useful.
16:03:15 <Li_Liu> for sure
16:03:40 <Li_Liu> I will add this to T release planning
16:03:59 <shaohe_feng_> there's a common scenario for this feature.
16:04:10 <shaohe_feng_> Li_Liu: good, thanks.
16:04:26 <Li_Liu> npnp
16:04:57 <shaohe_feng_> AoB?
16:05:02 <Li_Liu> I need to pick up my lunch now, you guys can go ahead. don't stay too late.. :P
16:05:08 <shaohe_feng_> Li_Liu: wangzhh: xinranwang ?
16:05:16 <Li_Liu> I am all good
16:05:26 <shaohe_feng_> good.
16:05:32 <shaohe_feng_> glad to talk with you.
16:05:42 <wangzhh> Me, too.
16:06:10 <shaohe_feng_> let's end the meeting.
16:06:11 <xinranwang> i am fine with that. NUMA should also need a similar mechanism
16:06:33 <shaohe_feng_> thanks all.
16:06:55 <shaohe_feng_> #endmeeting