15:00:18 <zhipeng> #startmeeting openstack-cyborg
15:00:19 <openstack> Meeting started Wed Mar 29 15:00:18 2017 UTC and is due to finish in 60 minutes.  The chair is zhipeng. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:20 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:22 <openstack> The meeting name has been set to 'openstack_cyborg'
15:00:38 <zhipeng> #topic Roll Call
15:00:49 <crushil> \o
15:01:09 <zhipeng> \o
15:01:22 <jkilpatr> o/
15:02:20 <zhipeng> waiting for more folks if we have
15:03:35 <skelso> o/
15:03:47 <mpaolino> \o
15:06:51 <zhipeng> okey let's proceed
15:07:02 <zhipeng> #topic BP review
15:07:12 <zhipeng> #link https://review.openstack.org/#/q/project:openstack/cyborg
15:07:44 <zhipeng> i must apologize, i was buried with kubecon work this week so haven't got time for the reviews
15:08:02 <zhipeng> i see _gryf posted a lot of good comments
15:08:10 <zhipeng> as well as rushil and others
15:08:26 <zhipeng> if we have any outstanding issues, please feel free to shout out
15:11:16 <jkilpatr> do we want to consolidate all DB interaction to the api end point (presumably on the controllers), or have the agents on the computes update the database themselves?
15:11:49 <jkilpatr> essentially that's a question of where to handle communication and parallelism.
15:13:08 <zhipeng> if we choose the latter one, does it mean the controller side will be sometimes out of sync with the agents ?
15:14:11 <jkilpatr> it would make it a concern, it would be possible to prevent if we tried.
15:14:32 <jkilpatr> on the other hand if the agents don't store anything themselves then we can lose accelerator state if anything disrupts the agent.
15:17:21 <zhipeng> how could we prevent the out-of-sync problem ? using heartbeat ?
15:17:44 <crushil> I would prefer the latter option. And I believe heartbeat can be an option
15:18:02 <jkilpatr> just make sure both sides refresh info from the DB when it might have changed. So proper cache invalidation.
15:18:07 <jkilpatr> and probably more DB load
15:18:33 <crushil> But the agent should definitely keep its database updated
15:19:06 <zhipeng> but in real deployments that's always hard to manage; I always hear complaints from our product team about the heartbeat
15:19:21 <zhipeng> cache invalidation is prone to go wrong
15:19:58 <zhipeng> is there a way for us to simplify so that we could avoid the common shortcomings?
15:20:25 <jkilpatr> so agent <-> rabbit <-> api end point <-> db
15:20:54 <jkilpatr> that way we don't need invalidation because the end point is the only one that interacts with the DB
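[editor's aside: the agent <-> rabbit <-> api endpoint <-> db flow jkilpatr describes can be sketched as below. This is a hypothetical illustration, not actual cyborg code — `Bus`, `ApiEndpoint`, and `Agent` are invented names, and a plain in-process queue stands in for rabbit. The point is that the api endpoint is the single DB writer, so agents never need cache invalidation.]

```python
import queue

class Bus:
    """Stand-in for rabbit: a simple in-process message queue."""
    def __init__(self):
        self._q = queue.Queue()

    def cast(self, msg):
        # Fire-and-forget send, as agents would cast over RPC.
        self._q.put(msg)

    def drain(self, handler):
        # Deliver all pending messages to the consumer.
        while not self._q.empty():
            handler(self._q.get())

class ApiEndpoint:
    """Single owner of the DB; applies updates received from agents."""
    def __init__(self):
        self.db = {}  # accelerator uuid -> state (stands in for the real DB)

    def handle(self, msg):
        self.db[msg['uuid']] = msg['state']

class Agent:
    """Compute-side agent: reports state, never touches the DB itself."""
    def __init__(self, bus):
        self.bus = bus

    def report(self, uuid, state):
        self.bus.cast({'uuid': uuid, 'state': state})
```

With this shape, an agent reporting `report('fpga-0', 'attached')` results in the endpoint, once it drains the bus, holding `{'fpga-0': 'attached'}` — and no other component ever writes the DB.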
15:22:18 <crushil> Isn't that overkill though? Going through hoops to update db?
15:23:05 <crushil> Can't we have the agent maintain a local copy of the db and keep updating it?
15:23:52 <zhipeng> but if say we have 3 compute node that all have FPGA based iNIC on it
15:24:01 <jkilpatr> why does the agent need info about all accelerators? it only needs to manage one machine (there are many agents on many compute nodes, but each one only needs to care about its own scope). I guess they could have a mini db each, but what good is that?
15:24:07 <zhipeng> if the DB is updated only via local copy
15:24:52 <zhipeng> i'm terminating my thoughts here~~
15:25:04 <crushil> Well, I feel that the agent needs to have a more efficient way of updating the db
15:26:24 <jkilpatr> maybe we just have the agent update the db for things that are longer term? Like a new accelerator?
15:26:52 <zhipeng> how about the config changes
15:27:03 <zhipeng> that would be part of the life cycle mgmt usually
15:28:09 <zhipeng> or we have two separate DBs (sorta), the main one resides with the control node
15:28:13 <jkilpatr> I guess agents need to read from the db for config updates, we don't want to obligate the operator to ssh into every node to change a config value.
15:28:28 <zhipeng> like jkilpatr it only get updated on major events
15:28:38 <zhipeng> and agent locally manages a cache
15:28:49 <zhipeng> for the more constant changes ?
15:29:40 <crushil> That seems fine, although how would you define major events?
15:29:49 <zhipeng> the info in the local cache should be of no concern to either other compute nodes or the control node
15:30:23 <zhipeng> like when accelerator attached, created, deleted
15:30:47 <zhipeng> and for small changes
15:30:57 <zhipeng> agent could advertise to the api-endpoint
15:31:16 <zhipeng> as a bulk of its local cache info, if it's deemed important by the control node
15:31:44 <zhipeng> so what I'm thinking is that users could explicitly define events that they care about
15:31:52 <zhipeng> in the cyborg.conf for example
15:32:13 <zhipeng> and those events will be aggregated and reported by the agents to the api-end-point
15:32:18 * _gryf waves late
15:32:21 <crushil> So, you're suggesting maintaining a central db and a local cached db. The central db would be updated on all events and cached db will be updated only on select events?
15:32:26 <zhipeng> alongside the usual big event
15:32:47 <ttk2[m]> Jkilpatr here had to go mobile.
15:33:22 <zhipeng> central/main DB will get updated on all the major events (attach/detach/delete/create) and smaller events that user specified
15:33:46 <zhipeng> the local copy is just cache for all the local updates, refreshed constantly
15:33:55 <zhipeng> that is my thinking
15:34:08 <_gryf> zhipeng, +1 for the cache instead of full blown db
15:34:37 <ttk2[m]> Sounds like a workable compromise.
15:34:56 <crushil> that sounds ok to me
15:35:36 <zhipeng> great :) then the DB design would remain the same, and there will be an additional design on the local cache for the agent
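[editor's aside: the compromise agreed above can be sketched as follows. All names here are illustrative, not from the cyborg code: every local change lands in the agent's in-memory cache, while only the major events (attach/detach/create/delete) plus any events the operator opts into (standing in for settings in cyborg.conf) are forwarded toward the central DB.]

```python
# Events that always reach the central DB, per the discussion above.
MAJOR_EVENTS = {'attach', 'detach', 'create', 'delete'}

class AgentCache:
    """Hypothetical agent-side cache with selective central reporting."""
    def __init__(self, reporter, extra_events=()):
        # reporter: callable(uuid, state) that reaches the api endpoint;
        # extra_events: user-specified events (cyborg.conf stand-in).
        self.cache = {}                 # local copy, refreshed constantly
        self.reporter = reporter
        self.reported = MAJOR_EVENTS | set(extra_events)

    def update(self, uuid, event, state):
        self.cache[uuid] = state        # every change hits the local cache
        if event in self.reported:      # only selected events go central
            self.reporter(uuid, state)
```

Under this sketch, a frequent low-level change (say a load fluctuation) stays in the local cache, while a `create` or an operator-selected event is also pushed centrally — which is exactly the split that keeps the main DB design unchanged.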
15:35:57 <zhipeng> is this ttk2[m]'s or crushil's ?
15:36:44 <ttk2[m]> I'll do the agent cache.
15:37:14 <zhipeng> fantastic :)
15:37:15 <crushil> cool
15:37:30 <zhipeng> #action Justin to update with the agent cache design
15:37:46 <ttk2[m]> Should I put that in the same blueprint?
15:37:59 <zhipeng> i think you could do that
15:38:12 <zhipeng> it should not be a big deal
15:40:33 <zhipeng> okey any other topics ?
15:40:50 <zhipeng> any questions on Roman's BP re the interaction between Nova and Cyborg
15:43:19 <crushil> nothing from me
15:44:02 <zhipeng> well let's keep reviewing the spec anyway :P
15:44:10 <zhipeng> #topic AoB
15:44:17 <zhipeng> any other business ?
15:44:36 <ttk2[m]> Do we want to set a blueprint finish goal?
15:44:52 <zhipeng> ttk2[m] what do you mean ? for a deadline date ?
15:44:58 <_gryf> ttk2[m], like a deadline or smth?
15:45:14 <ttk2[m]> Yes. Just something to keep things moving.
15:45:42 <_gryf> in other projects that depends on how big the core reviewers' pipe is
15:46:10 <zhipeng> i would love to set a deadline lol
15:46:28 <_gryf> in nova there are deadlines, so they can be sure, that core reviewers spend quality time on desired and accepted features
15:46:35 <zhipeng> as we call it in China, the lazy cancer kicks in all the time :P
15:47:03 <_gryf> otoh, I've seen projects (smaller ones) that obey only the openstack release schedule
15:47:45 <zhipeng> I think setting a deadline/milestone would be a good idea
15:47:54 <_gryf> that's fine.
15:47:56 <zhipeng> so that everyone is on the same page and pace
15:48:05 <zhipeng> others ?
15:48:20 <ttk2[m]> As long as it's not too tight.
15:48:22 <crushil> So, for current specs, should we say EOR should be the deadline?
15:48:28 <zhipeng> (if we have a milestone then maybe we need a sprint someday ...)
15:48:41 <zhipeng> crushil for specs that would be too relaxed
15:48:49 <zhipeng> we need to get code in for Pike
15:48:59 <crushil> I meant implementation of the specs
15:49:07 <zhipeng> yes, I would agree
15:49:23 <ttk2[m]> Let's just put a deadline on specs for now.
15:49:29 <zhipeng> Apr 15th ?
15:49:46 <ttk2[m]> That's what 3 meetings?
15:49:57 <crushil> lol
15:50:02 <zhipeng> sounds about right
15:50:10 <zhipeng> too relaxed or too tight ? lol
15:50:40 <crushil> I think for most of the specs, the owners know what needs to be added
15:50:57 <crushil> So, it is slightly on the more relaxed side
15:51:01 <zhipeng> it is the reviews that take time
* _gryf has seen blueprints which last several months before being accepted ;]
15:51:25 <crushil> Fair
15:51:25 <zhipeng> _gryf we are not that famous project so we could move faster XD
15:51:31 <_gryf> which obviously means slipping to the next release :>
15:51:40 <crushil> Let's do April 15th then
15:51:52 <ttk2[m]> +1
15:52:10 <zhipeng> #vote Apr 15th as the deadline for specs
15:52:21 <zhipeng> okey wrong cmd ...
15:52:32 <zhipeng> but anyway we should all agree on this
15:52:46 <zhipeng> #info Apr 15th for the first milestone on spec freeze
15:53:01 <zhipeng> #agreed Apr 15th for the first milestone on spec freeze
15:53:31 <zhipeng> great discussions folks
15:53:36 <zhipeng> any other business ?
15:54:36 <zhipengh[m]> My Chromebook just died..
15:54:58 <crushil> zhipengh[m], Get a real computer. :P
15:55:11 <zhipengh[m]> lol give me money
15:58:18 <zhipengh[m]> Okey if no other biz, let me try to end meeting using this handle... Not sure it will work
15:58:27 <zhipengh[m]> #endmeeting
15:59:14 <zhipengh[m]> Bummer...
15:59:48 <_gryf> #endmeeting
15:59:56 <_gryf> lol
17:51:01 <zhipeng> the longest meeting ever...
17:51:05 <zhipeng> #endmeeting