15:00:18 #startmeeting openstack-cyborg
15:00:19 Meeting started Wed Mar 29 15:00:18 2017 UTC and is due to finish in 60 minutes. The chair is zhipeng. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:20 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:22 The meeting name has been set to 'openstack_cyborg'
15:00:38 #topic Roll Call
15:00:49 \o
15:01:09 \o
15:01:22 o/
15:02:20 waiting for more folks if we have any
15:03:35 o/
15:03:47 \o
15:06:51 okey let's proceed
15:07:02 #topic BP review
15:07:12 #link https://review.openstack.org/#/q/project:openstack/cyborg
15:07:44 i must apologize, i was buried with kubecon work this week so haven't got the time for the reviews
15:08:02 i see _gryf posted a lot of good comments
15:08:10 as well as rushil and others
15:08:26 if we have any outstanding issues, please feel free to shout out
15:11:16 do we want to consolidate all DB interaction to the api end point (presumably on the controllers), or have the agents on the computes update the database themselves?
15:11:49 essentially that's a question of where to handle communication and parallelism.
15:13:08 if we choose the latter one, does it mean the controller side will sometimes be out of sync with the agents ?
15:14:11 it would be a concern, but it would be possible to prevent if we tried.
15:14:32 on the other hand, if the agents don't store anything themselves then we can lose accelerator state if anything disrupts the agent.
15:17:21 how could we prevent the out-of-sync problem ? using heartbeat ?
15:17:44 I would prefer the latter option. And I believe heartbeat can be an option
15:18:02 just make sure both sides refresh info from the DB when it might have changed. So proper cache invalidation.
15:18:07 and probably more DB load
15:18:33 But the agent should definitely keep its database updated
15:19:06 but in real deployments that's always hard to manage, I always hear complaints from our product team about the heartbeat
15:19:21 cache invalidation is prone to go wrong
15:19:58 is there a way for us to simplify so that we could avoid the common shortcomings ?
15:20:25 so agent <-> rabbit <-> api end point <-> db
15:20:54 that way we don't need invalidation because the end point is the only one that interacts with the DB
15:22:18 Isn't that overkill though? Going through hoops to update the db?
15:22:35 Can
15:23:05 Can't we have the agent maintain a local copy of the db and keep updating it?
15:23:52 but say we have 3 compute nodes that all have FPGA-based iNICs on them
15:24:01 why does the agent need info about all accelerators? it only needs to manage one machine (there are many agents on many compute nodes, but each one only needs to care about its own scope). I guess they could have a mini db each, but what good is that.
15:24:07 if the DB is updated only via local copy
15:24:52 i'm terminating my thoughts here~~
15:25:04 Well, I feel that the agent needs to have a more efficient way of updating the db
15:26:24 maybe we just have the agent update the db for things that are longer term? Like a new accelerator?
15:26:52 how about the config changes
15:27:03 that would be part of the life cycle mgmt usually
15:28:09 or we have two separate DBs (sorta), the main one resides with the control node
15:28:13 I guess agents need to read from the db for config updates; we don't want to obligate the operator to ssh into every node to change a config value.
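A minimal sketch, in Python, of the "agent <-> rabbit <-> api end point <-> db" option raised above, assuming oslo.messaging over RabbitMQ; the topic, version, and method names are illustrative assumptions, not an actual Cyborg RPC API:

```python
# Sketch only: the api end point / conductor is the sole process that touches
# the database; the agent just casts events over RPC (RabbitMQ transport).
import oslo_messaging
from oslo_config import cfg

CONF = cfg.CONF


def make_conductor_client():
    # Transport is typically backed by RabbitMQ in an OpenStack deployment.
    transport = oslo_messaging.get_rpc_transport(CONF)
    # 'cyborg-conductor' and the version string are assumed names.
    target = oslo_messaging.Target(topic='cyborg-conductor', version='1.0')
    return oslo_messaging.RPCClient(transport, target)


def report_accelerator_event(client, context, host, accelerator, event):
    # Fire-and-forget cast: the agent never opens a DB connection itself,
    # so there is no per-node copy of the data to invalidate.
    client.cast(context, 'handle_accelerator_event',
                host=host, accelerator=accelerator, event=event)
```

The trade-off this illustrates is the one discussed above: with a single DB writer there is no cache invalidation to get wrong, at the cost of routing every update through the message bus and the end point.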
15:28:28 like jkilpatr said, it only gets updated on major events
15:28:38 and the agent locally manages a cache
15:28:49 for the more constant changes ?
15:29:40 That seems fine, although how would you define major events?
15:29:49 the info in the local cache should be of no concern to either other compute nodes or the control node
15:30:23 like when an accelerator is attached, created, deleted
15:30:47 and for small changes
15:30:57 the agent could advertise to the api-endpoint
15:31:16 a bulk of its local cache info, if it is deemed important by the control node
15:31:44 so what I'm thinking is that users could explicitly define events that they care about
15:31:52 in cyborg.conf for example
15:32:13 and those events will be aggregated and reported by the agents to the api-end-point
15:32:18 * _gryf waves late
15:32:21 So, you're suggesting maintaining a central db and a local cached db. The central db would be updated on all events and the cached db will be updated only on select events?
15:32:26 alongside the usual big events
15:32:47 Jkilpatr here, had to go mobile.
15:33:22 the central/main DB will get updated on all the major events (attach/detach/delete/create) and smaller events that the user specified
15:33:46 the local copy is just a cache for all the local updates, refreshed constantly
15:33:55 that is my thinking
15:34:08 <_gryf> zhipeng, +1 for the cache instead of a full blown db
15:34:37 Sounds like a workable compromise.
15:34:56 that sounds ok to me
15:35:36 great :) then the DB design would remain the same, and there will be an additional design on the local cache for the agent
15:35:57 is this ttk2[m]'s or crushil's ?
15:36:44 I'll do the agent cache.
15:37:14 fantastic :)
15:37:15 cool
15:37:30 #action Justin to update with the agent cache design
15:37:46 Should I put that in the same blueprint?
15:37:59 i think you could do that
15:38:12 it should not be a big deal
15:40:33 okey any other topics ?
15:40:50 any questions on Roman's BP re the interaction between Nova and Cyborg
15:43:19 nothing from me
15:44:02 well let's keep reviewing the spec anyways :P
15:44:10 #topic AoB
15:44:17 any other business ?
15:44:36 Do we want to set a blueprint finish goal?
15:44:52 ttk2[m] what do you mean ? a deadline date ?
15:44:58 <_gryf> ttk2[m], like a deadline or smth?
15:45:14 Yes. Just something to keep things moving.
15:45:42 <_gryf> in other projects that depends on how big the core reviewers' pipe is
15:46:10 i would love to set a deadline lol
15:46:28 <_gryf> in nova there are deadlines, so they can be sure that core reviewers spend quality time on desired and accepted features
15:46:35 what we call it in China is that the lazy cancer kicks in all the time :P
15:47:03 <_gryf> otoh, I've seen projects (smaller ones) that only follow the openstack release schedule
15:47:45 I think setting a deadline/milestone would be a good idea
15:47:54 <_gryf> that's fine.
15:47:56 so that everyone is on the same page and pace
15:48:05 others ?
15:48:20 As long as it's not too tight.
15:48:22 So, for current specs, should we say EOR should be the deadline?
15:48:28 (if we have a milestone then maybe we need a sprint someday ...)
15:48:41 crushil for specs that would be too relaxed
15:48:49 we need to get code in for Pike
15:48:59 I meant implementation of the specs
15:49:07 yes, I'd agree
15:49:18 I would agree
15:49:23 Let's just put a deadline on specs for now.
15:49:29 Apr 15th ?
15:49:46 That's what, 3 meetings?
15:49:57 lol
15:50:02 sounds about right
15:50:10 too relaxed or too tight ?
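A minimal sketch of the split agreed above: the agent keeps a per-node cache refreshed constantly, and only major events (create/attach/detach/delete) plus operator-selected events from cyborg.conf are pushed to the central DB via the api end point. The [agent]/reported_events option and the handle_accelerator_event method are hypothetical names for illustration, not real Cyborg settings:

```python
# Sketch only: agent-side cache plus an operator-configurable event filter.
from oslo_config import cfg

agent_opts = [
    cfg.ListOpt('reported_events',
                default=['create', 'attach', 'detach', 'delete'],
                help='Accelerator events the agent forwards to the api end '
                     'point; everything else stays in the local cache only.'),
]
cfg.CONF.register_opts(agent_opts, group='agent')


class AcceleratorCache(object):
    """Per-compute-node cache; no other node ever reads it."""

    def __init__(self, conductor_client, context, host):
        self._state = {}          # accelerator uuid -> latest known attrs
        self._client = conductor_client
        self._context = context
        self._host = host

    def update(self, accel_uuid, attrs, event):
        # Always refresh the local copy, cheaply and constantly.
        self._state[accel_uuid] = attrs
        # Only bother the central DB for events the operator cares about.
        if event in cfg.CONF.agent.reported_events:
            self._client.cast(self._context, 'handle_accelerator_event',
                              host=self._host,
                              accelerator=attrs,
                              event=event)
```

This keeps the main DB schema unchanged, as noted above; the local cache is purely an agent implementation detail invisible to the control node and other computes.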
lol
15:50:40 I think for most of the specs, the owners know what needs to be added
15:50:57 So, it is slightly on the more relaxed side
15:51:01 it is the reviews that take time
15:51:02 * _gryf has seen blueprints which last several months before being accepted ;]
15:51:25 Fair
15:51:25 _gryf we are not that famous a project so we could move faster XD
15:51:31 <_gryf> which obviously means slipping to the next release :>
15:51:40 Let's do April 15th then
15:51:52 +1
15:52:10 #vote Apr 15th as the deadline for specs
15:52:21 okey wrong cmd ...
15:52:32 but anyway we should all agree on this
15:52:46 #info Apr 15th for the first milestone on spec freeze
15:53:01 #agreed Apr 15th for the first milestone on spec freeze
15:53:31 great discussions folks
15:53:36 any other business ?
15:54:36 My Chromebook just died..
15:54:58 zhipengh[m], Get a real computer. :P
15:55:11 lol give me money
15:58:18 Okey if no other biz, let me try to end the meeting using this handle... Not sure it will work
15:58:27 #endmeeting
15:59:14 Bummer...
15:59:48 <_gryf> #endmeeting
15:59:56 <_gryf> lol
17:51:01 the longest meeting ever...
17:51:05 #endmeeting