#openstack-meeting log

15:00:12 <n0ano> #startmeeting gantt
15:00:13 <openstack> Meeting started Tue Jun  3 15:00:12 2014 UTC and is due to finish in 60 minutes.  The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:14 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:17 <openstack> The meeting name has been set to 'gantt'
15:00:20 <bauzas> \o
15:00:25 <mspreitz> o/
15:00:27 <n0ano> anyone here want to talk about the scheduler?
15:00:43 <toan-tran> \o/
15:01:05 * n0ano wonders how many ways you can combine / and \
15:01:26 <bauzas> I'm left-handed :)
15:01:41 <n0ano> bauzas, you & my wife :-)
15:01:41 <bauzas> so \o is better than o/
15:01:44 * mspreitz /*\
15:01:54 * mspreitz /o\
15:02:48 <n0ano> well, why don't we get started (all the important people are here)
15:02:51 * johnthetubaguy is lurking, but on a call
15:02:58 <n0ano> #topic forklift
15:03:12 <n0ano> mainly status I think, anything to report bauzas ?
15:03:53 <bauzas> sorry, was mailing
15:04:02 <bauzas> so, yes, big status
15:04:20 <bauzas> progress so far on implementing the sched-lib
15:04:28 <bauzas> https://review.openstack.org/82778
15:04:43 <bauzas> (that's eating most of my nights now, as juno-1 is next week)
15:05:22 <bauzas> I'm about to deliver a new patchset (hoping to land by tomorrow) taking into account all comments
15:05:47 <bauzas> I spent most of my time this week on 2 big concerns
15:06:13 <bauzas> #1 : we're not using objects in RT, so I had to trick some things for using objects with sched-lib
15:06:35 <bauzas> that requires some refactoring effort on that patch
15:06:54 <mspreitz> "RT" ?
15:07:01 <bauzas> #2 : I raised the concern that IMHO, logic should stay in the Sched-manager
15:07:10 <bauzas> mspreitz: RT : ResourceTracker, my bad
15:07:36 <bauzas> about #2, a dependent patch was submitted for review yesterday
15:07:48 <bauzas> https://review.openstack.org/97232
15:08:05 <bauzas> your comments are welcome on that patch
15:08:33 <n0ano> #action everyone to review  https://review.openstack.org/97232
15:08:37 <bauzas> it will be updated tomorrow with the updates from https://review.openstack.org/82778 (they are dependent)
15:08:52 <bauzas> well the most important thing is architectural
15:09:02 <bauzas> I mean, I ported the logic to the sched manager
15:09:20 <toan-tran> bauzas: on https://review.openstack.org/82778, client.py line 55
15:09:27 <bauzas> but with that 97232 patch, that means that compute nodes are now sending updates to scheduler
15:09:28 <toan-tran> I put a comment there
15:09:38 <toan-tran> could you take a quick look please?
15:09:53 <toan-tran> https://review.openstack.org/#/c/82778/13/nova/scheduler/client.py, line 55
15:10:10 <n0ano> bauzas, in re updates to sched - this is in addition to the compute nodes updating the DB?
15:10:11 <bauzas> toan-tran: yay, saw your comment
15:10:34 <bauzas> toan-tran: I'm sorry, but no this is service_name
15:10:54 <bauzas> toan-tran: you're getting a service with possibly multiple nodes
15:11:18 <bauzas> toan-tran: but wait for my new patchset, the logic will be rewritten so that it will be clearer to read
15:11:35 <bauzas> n0ano: not exactly
15:11:37 <toan-tran> bauzas: thanks, the variable names are rather confusing
15:12:06 <toan-tran> and please if you can add some description on compute_nodes' structure, that would be great
15:12:09 <bauzas> n0ano: the problem is that computes are using conductor to update DB for compute_nodes
15:12:47 <bauzas> n0ano: even if we externalize the call to the conductor into a separate library, that still means that computes literally update compute_nodes
15:13:14 <bauzas> n0ano: it should only place a call to an API to the sched
15:13:26 <bauzas> n0ano: so the sched would update its own DB
15:13:45 <bauzas> n0ano: but that means now that all RT updates will go thru sched
15:13:50 <mspreitz> I thought no-db-scheduler was in the future
15:13:56 <bauzas> n0ano: that's a possible bottleneck
15:14:01 <n0ano> which is the way compute nodes used to work (the more things change the more they stay the same)
15:14:10 <bauzas> mspreitz: that's not related to no-db work
15:14:26 <mspreitz> but it sounds like it..?
15:14:43 <bauzas> no-db work is about having a no-db backend for scheduler
15:14:54 <bauzas> but the blueprint is confusing
15:15:11 <bauzas> on my side, I'm not changing how we store things
15:15:18 <n0ano> mspreitz, I think the point is compute sends update to the sched, where sched stores that info is up to the sched, db for now, memory when no-db is in
15:15:31 <bauzas> I'm just making sure that only sched holds the compute_nodes table
15:15:40 <bauzas> n0ano: +1
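The split described above (compute reports to the scheduler, and the scheduler alone decides where the state lives) can be sketched roughly as follows. This is only an illustration: StateStore, InMemoryStore, Scheduler, ComputeNode and update_resource_stats are hypothetical names, not the API of Nova or of the patches under review.

```python
# Illustrative sketch only: all class and method names are hypothetical,
# not taken from Nova or from the patches under review.
from abc import ABC, abstractmethod


class StateStore(ABC):
    """Where the scheduler keeps compute node state (DB, memory, ...)."""

    @abstractmethod
    def save(self, host, stats):
        """Persist the latest stats for a host."""

    @abstractmethod
    def load(self, host):
        """Return the latest stats for a host."""


class InMemoryStore(StateStore):
    """Memory-backed store, standing in for a possible no-db backend."""

    def __init__(self):
        self._stats = {}

    def save(self, host, stats):
        self._stats[host] = dict(stats)

    def load(self, host):
        return dict(self._stats[host])


class Scheduler:
    """Only the scheduler touches the store; compute nodes never do."""

    def __init__(self, store):
        self._store = store

    def update_resource_stats(self, host, stats):
        self._store.save(host, stats)

    def host_stats(self, host):
        return self._store.load(host)


class ComputeNode:
    """Reports its state through the scheduler API instead of writing to a DB."""

    def __init__(self, host, scheduler):
        self._host = host
        self._scheduler = scheduler

    def report(self, stats):
        self._scheduler.update_resource_stats(self._host, stats)


scheduler = Scheduler(InMemoryStore())
ComputeNode("node1", scheduler).report({"free_ram_mb": 2048})
print(scheduler.host_stats("node1"))
```

Swapping InMemoryStore for a DB-backed StateStore would not change ComputeNode at all, which is the point being made about storage being the scheduler's own business.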
15:16:15 <canaima172423> I don't speak English
15:16:17 <bauzas> anyway, if we consider Gantt, this is a long-term feature
15:16:30 <canaima172423> guah
15:16:34 <bauzas> as RT will need to call Gantt for updating its state
15:17:03 <bauzas> so anyway, RT will place an external call
15:17:20 <bauzas> the problem is that it requires Gantt (or the sched now) to be robust enough
15:17:22 <n0ano> I agree, I think compute status updates should go to the sched and then let sched decide the best way to store the info so this is good.
15:18:14 <toan-tran> n0ano: this is rather heavy for Gantt
15:18:20 <bauzas> so, to sum up the most important work is on https://review.openstack.org/82778
15:18:38 <toan-tran> should we have some synchronizer to handle DB ? like no-db
15:18:54 <bauzas> and reviews are welcome on https://review.openstack.org/97232 and https://review.openstack.org/89893
15:18:54 <n0ano> toan-tran, maybe but I've just created a BP ( https://blueprints.launchpad.net/nova/+spec/on-demand-compute-update ) to change the way we send updates...
15:19:12 <bauzas> n0ano: that's a good thing
15:19:19 <n0ano> change from periodic to on demand, I thought someone was already working on this but I guess not so I'll start it
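A minimal sketch of the periodic-to-on-demand idea, assuming "on demand" means a compute node sends an update only when its state has actually changed. OnDemandReporter and its send callback are hypothetical names used for illustration; the blueprint itself may define the mechanism differently.

```python
# Hypothetical sketch: OnDemandReporter and its send callback are illustrative
# names, not part of Nova or of the blueprint.
class OnDemandReporter:
    def __init__(self, send):
        self._send = send        # callable that delivers stats to the scheduler
        self._last_sent = None   # copy of the stats most recently sent

    def maybe_report(self, stats):
        """Send stats only if they differ from the last ones sent."""
        if stats == self._last_sent:
            return False
        self._send(dict(stats))
        self._last_sent = dict(stats)
        return True


sent = []
reporter = OnDemandReporter(sent.append)
reporter.maybe_report({"free_ram_mb": 2048})  # first report is always sent
reporter.maybe_report({"free_ram_mb": 2048})  # unchanged, so suppressed
reporter.maybe_report({"free_ram_mb": 1024})  # changed, so sent
print(len(sent))  # 2
```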
15:19:32 <bauzas> mmm, that was about no-db discussion
15:19:36 <bauzas> IIRC
15:19:56 <toan-tran> n0ano: +1
15:20:00 <bauzas> n0ano: ping us the nova-spec draft once you're done with it
15:20:17 <bauzas> n0ano: so I'll be able to review it
15:20:23 <n0ano> status updates are orthogonal to no-db, I think the no-db spec got a little overly complex
15:20:28 <toan-tran> n0ano: could you do some analysis on performance ? comparison with current method
15:20:42 <n0ano> bauzas, sure, the BP is there, I have to do the details for the git repo
15:20:47 <toan-tran> some graph would be nice :)
15:21:05 <canaima172423> hello how are;-)
15:21:09 <bauzas> n0ano: I subscribed to the BP, so I'll get the patch link
15:21:10 <n0ano> toan-tran, hard for me, I have like a max of a 3 node system :-(  I'm not a bluehost
15:21:49 <toan-tran> n0ano: well, we don't need a real system for that
15:22:06 <toan-tran> ok maybe I will make some Matlab graph to see
15:22:17 <bauzas> toan-tran: your ideas are welcome
15:22:17 <canaima172423> estup
15:22:50 <bauzas> canaima172423: we're in the middle of a meeting, please join #openstack-101 if you want to talk about Openstack
15:23:00 <n0ano> toan-tran, any suggestions on how to get some scaling data from a small system would be welcome
15:23:18 <n0ano> bauzas, I tried to talk to him on a private dialog but he seems to be ignoring me
15:23:46 <bauzas> I just remind you all that juno-1 is next week
15:24:09 <mspreitz> And we're having a Nova bug day today?
15:24:15 <bauzas> so, if you want to vote on having sched-lib to be merged by juno-1, please put some reviews :)
15:24:19 <n0ano> bauzas,  anyway, sounds like you have the forklift well in had (barring some reviews) any other help you need?
15:24:26 <n0ano> s/had/hand
15:24:58 <bauzas> n0ano: as said last week, I'll probably require some help for implementing https://review.openstack.org/89893
15:25:09 <bauzas> it's targeted for juno-3
15:25:47 <bauzas> btw, I'll travelling next week
15:25:52 <bauzas> s/be
15:26:07 <n0ano> I have some colleges (sp?) in China, let me see if I can get someone to work on that
15:26:08 <bauzas> so I won't be able to attend the meeting (:
15:26:18 <bauzas> :(
15:26:38 <n0ano> bauzas, NP but if you can send me a quick email update before hand that would be good
15:26:39 <bauzas> and Monday is bank holiday in France
15:26:58 <bauzas> n0ano: will do - don't hesitate to ping me by email ;)
15:26:58 <n0ano> so, we don't work for a bank :-)
15:27:24 * n0ano favorite holiday is Tomb Cleaning Day in China :-)
15:27:32 <bauzas> well, I don't know the word, I would say 'legal' holiday then :)
15:27:49 <bauzas> anyway, I'm done
15:27:55 <n0ano> bauzas, no, you were correct, I was just making a pun
15:28:01 <bauzas> any other questions about the forklift?
15:28:06 <toan-tran> bauzas: well, depending on company, mine still works :)
15:28:06 <n0ano> bauzas, tnx, good work
15:28:07 <bauzas> n0ano: :D
15:28:28 <n0ano> #action n0ano to get someone to work on https://review.openstack.org/89893
15:28:36 <n0ano> moving on
15:28:47 <n0ano> #topic no-db scheduler
15:28:50 <bauzas> toan-tran: don't make me explain Pentecost Day in France and its paperwork-related stuff :)
15:28:53 <n0ano> YorikSar, you there
15:29:04 <YorikSar> Yea, hi
15:29:17 <YorikSar> I've seen a lot of comments to my spec
15:29:22 <bauzas> hi YorikSar :)
15:29:28 <bauzas> YorikSar: indeed :)
15:29:31 <n0ano> indeed, we finally got moving on that
15:29:41 <YorikSar> Although I never found time to answer or address them.
15:30:02 <YorikSar> I guess I'll be working on that this week.
15:30:15 <bauzas> YorikSar: cool let us know
15:30:29 <YorikSar> You all will know in Gerrit's emails ;)
15:30:48 <bauzas> ;)
15:30:48 <mspreitz> BTW, for the rest of us who do not know Kafka, is there a short sharp summary of what it is and why the advocate thinks it is relevant?
15:30:53 <YorikSar> (in? from? through?)
15:31:32 <toan-tran> YorikSar: in john garbutt's comment
15:31:40 <bauzas> mspreitz: I'm sorry, maybe johnthetubaguy can comment it ?
15:31:45 <YorikSar> mspreitz: I honestly didn't understand how it could fit in our scheme.
15:31:54 <toan-tran> http://kafka.apache.org/
15:32:20 <bauzas> YorikSar: I haven't said the word tooz :)
15:32:21 <johnthetubaguy> just seemed a lot like the mem cache queue of updates, but already implemented
15:32:53 <johnthetubaguy> the feedback at the summit was it sounds like we are re-inventing a DB
15:33:06 <bauzas> YorikSar: there is also https://github.com/stackforge/tooz
15:33:13 <n0ano> johnthetubaguy, +1 (that's what I heard at the summit also)
15:33:19 <bauzas> +2
15:33:30 <YorikSar> johnthetubaguy: That's a very unfortunate outcome. I wish I could be there to avoid such confusion.
15:34:33 <mspreitz> OK, I'll agree on the question of Kafka.  The proposed design is about getting updates to schedulers, it seems to be working around some presumed problem with fanout
15:34:34 <n0ano> YorikSar, maybe a focused email to the dev list to address this issue from you would be good
15:34:46 <YorikSar> bauzas: tooz seems to be not about delivering data from tons of servers to some number of recipients.
15:35:06 <bauzas> YorikSar: indeed, it's only about election, you're right
15:35:25 <bauzas> YorikSar: I was thinking about it for the scheduler
15:35:35 <mspreitz> I mean, "why not Kafka" is a good question
15:35:39 <bauzas> YorikSar: but my mind slipped a little bit
15:35:50 <mspreitz> I wouldn't mind background on why oslo's fanout is not good enough
15:35:51 <YorikSar> I'll take a closer look at Kafka, yes. But I feel like it won't be good for our case.
15:36:01 <bauzas> well, the problem is about the spec with regards to the timeline
15:36:49 <YorikSar> Synchronizer provides not only better delivery pace but also some semi-persistence for "subscribers" that just came online or were sleeping too long.
15:36:58 <bauzas> I mean, that's a big change, and we're only having 2 months for juno
15:37:24 <YorikSar> bauzas: That's not a big change...
15:37:40 <bauzas> YorikSar: well, you introduce many concepts here :)
15:37:52 <n0ano> bauzas, if the backend is selectable between the current DB and the new scheme then the change isn't that disruptive
15:37:59 <bauzas> YorikSar: and some of them are disruptive, see my comments in the spec :)
15:38:12 * YorikSar wishes to hide this work behind some other name so that everybody would forget what's been said about it during the whole year of dreaming the design...
15:38:58 <n0ano> YorikSar, name change probably not an option but I understand you :-)
15:39:06 <bauzas> YorikSar: well, the problem is that the spec is not that clear, I'm sorry :(
15:39:35 <bauzas> YorikSar: I mean, it seems some points are overlapping other developments
15:40:03 <YorikSar> I think I'll try to convince people in spec first. And then I'll probably start some ML topic so that community could follow current state of things with this bp
15:40:09 <mspreitz> What is wrong with oslo's fanout messaging, and why would the proposed backend do the job better?
15:40:12 <bauzas> YorikSar: and you're proposing to rewrite the whole SQLA backend
15:40:41 <bauzas> mspreitz: IIRC, fanout has been banned a long time ago
15:40:49 <mspreitz> bauzas: why?
15:41:06 <bauzas> mspreitz: lemme find the thread :)
15:41:19 <mspreitz> (not an idle question, we need to know we are not re-producing the same problems)
15:41:20 <YorikSar> bauzas: Well... It's a backend, right? This work just replaces a piece of wiring from compute nodes to the scheduler itself.
15:41:47 <n0ano> we discussed fan out a long time ago but I don't think there was a definitive result, there are still proponents & opponents of it
15:42:29 <YorikSar> mspreitz: Imagine this. Currently we have 1 message for every node every 1 min. With fanout that number will get multiplied by the number of schedulers.
15:42:51 <YorikSar> mspreitz: AFAIC that had been placing too much load on the MQ.
15:43:11 <mspreitz> YorikSar: the proposed design does as much messaging in total
15:43:43 <mspreitz> and with several schedulers, the backend is sending most of it
15:43:44 <n0ano> YorikSar, note my new BP ( https://blueprints.launchpad.net/nova/+spec/on-demand-compute-update ), change the 1 min update to on demand and a lot of that load goes away
15:43:44 <YorikSar> mspreitz: This design keeps the number of messages the same (unless you plug compute nodes directly to synchronizer).
15:44:12 <mspreitz> what is same as what
15:44:14 <mspreitz> ?
15:44:19 <bauzas> mspreitz: there we go : https://blueprints.launchpad.net/nova/+spec/no-compute-fanout-to-scheduler
15:44:32 <YorikSar> 1 message per node per minute
15:45:34 <mspreitz> Both new design and oslo fanout send O((num schedulers) * (compute node update rate)) messages from backend / through message broker
15:45:51 <YorikSar> n0ano: I thought the source of node state is not that static. E.g. you can add some RAM to compute node and it'll show up on periodic update.
15:45:55 <mspreitz> s/messages/message content/
15:46:36 <n0ano> YorikSar, you're talking about hot add of mem - that's just another (unlikely) event that causes an update
15:46:52 <bauzas> anyway, I don't think the main discussion about no-db is here :)
15:46:53 <YorikSar> mspreitz: No... Schedulers retrieve new records from backend in packs while compute nodes push them there with the same pace.
15:47:09 <mspreitz> that's why I s/messages/message content/
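The two sides above are counting different things: number of messages versus amount of update content delivered. A rough sketch of the arithmetic, assuming (for illustration only) that each scheduler pulls one batch per minute from the backend; the function names and the 1000-node, 3-scheduler figures are made up, not measurements.

```python
# Back-of-the-envelope model; function names and the one-poll-per-minute
# assumption are illustrative, not taken from the spec.

def fanout_messages(nodes, schedulers, updates_per_node=1):
    """Messages per minute when every update is fanned out to every scheduler."""
    return nodes * updates_per_node * schedulers


def synchronizer_messages(nodes, schedulers, updates_per_node=1,
                          polls_per_scheduler=1):
    """Messages per minute when nodes push once and schedulers pull batches."""
    pushes = nodes * updates_per_node
    pulls = schedulers * polls_per_scheduler
    return pushes + pulls


def delivered_updates(nodes, schedulers, updates_per_node=1):
    """Node updates per minute that end up at schedulers, in either design."""
    return nodes * updates_per_node * schedulers


print(fanout_messages(1000, 3))        # 3000
print(synchronizer_messages(1000, 3))  # 1003
print(delivered_updates(1000, 3))      # 3000
```

Under these assumptions the message count drops with batching, while the content reaching the schedulers is the same in both designs, which matches the s/messages/message content/ correction.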
15:48:00 <mspreitz> How big is a compute node update?  n0ano's question is relevant here
15:48:19 <YorikSar> n0ano: Ok, I remember I had an example of change that was triggered independently from nova-compute but I don't remember what it was.
15:48:36 <bauzas> I'm just having pdb running
15:48:48 <bauzas> don't ask me to calculate the len
15:48:49 <n0ano> mspreitz, last I saw the log message it was about 20 lines of 80 characters
15:48:49 <bauzas> :)
15:49:30 <bauzas> 1226 chars
15:49:34 <bauzas> :)
15:49:46 <bauzas> well, that depends of course
15:50:07 <n0ano> bauzas, pretty close to my 1600 estimate and yes, it varies a little, but not that much
15:50:17 <bauzas> cpu_info is the most greedy
15:50:45 <bauzas> and the bad thing is that it's very static
15:50:53 <bauzas> you don't change CPUs every day
15:51:01 * toan-tran wonders how close is 1600 to 1226
15:51:08 <n0ano> bauzas, and the most static, we could change the update into two type (static/dynamic) if the size is a big problem.
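The static/dynamic split suggested above could look something like this. It is a sketch under assumptions: cpu_info comes from the discussion, but STATIC_KEYS, split_update and every other field name are invented for illustration and are not the real compute node update schema.

```python
# Hypothetical sketch: STATIC_KEYS and every field other than cpu_info are
# made-up examples, not the real compute node update schema.
STATIC_KEYS = frozenset({"cpu_info", "hypervisor_type"})


def split_update(stats, static_keys=STATIC_KEYS):
    """Split one update into rarely changing and frequently changing parts."""
    static = {k: v for k, v in stats.items() if k in static_keys}
    dynamic = {k: v for k, v in stats.items() if k not in static_keys}
    return static, dynamic


stats = {
    "cpu_info": "long, rarely changing CPU description",
    "hypervisor_type": "QEMU",
    "free_ram_mb": 2048,
    "running_vms": 3,
}
static, dynamic = split_update(stats)
print(sorted(dynamic))  # ['free_ram_mb', 'running_vms']
```

The static part would only need to be sent once (or when hardware changes), leaving the smaller dynamic part for regular updates.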
15:51:08 <YorikSar> bauzas: Depends on your hobby :)
15:51:39 <bauzas> YorikSar: :)
15:51:39 <n0ano> toan-tran, within 1 order of magnitude, WFM :-)
15:52:08 <toan-tran> n0ano: now I understand when you said "we don't work for the bank" :)
15:52:23 <n0ano> toan-tran, touche :-)
15:52:25 <bauzas> guys, I know that hyper-v people cancelled the next meeting, but is it reasonable to chat about it while we're only having 8 mins left ? :D
15:53:03 <YorikSar> I guess we can finish no-db topic here. We'll continue the discussion in the spec draft.
15:53:09 <n0ano> bauzas, I get fried after 60 min. anyway, I'd prefer to have YorikSar update his spec and send out the emails and then discuss later
15:53:24 <bauzas> n0ano: strong approval here
15:54:01 <n0ano> #action YorikSar to update the spec and start email thread on the dev list
15:54:05 <bauzas> but that RPC payload discussion is really passionating
15:54:50 <n0ano> bauzas, I don't mind, strong opinions are good as long as no one gets intimidated
15:55:09 <n0ano> let's move on
15:55:12 <n0ano> #topic opens
15:55:19 <n0ano> anyone have anything new to raise today?
15:55:22 <bauzas> yey, I mean I would love to discuss about it still
15:55:39 <bauzas> 5 mins left :)
15:55:57 <toan-tran> well, I intended to talk about my new patch: https://review.openstack.org/#/c/61386/
15:56:12 <bauzas> just a reminder, won't be available from mon to thurs next week
15:56:20 <toan-tran> but I don't think we have time left, so maybe next time :)
15:56:37 <bauzas> toan-tran: I briefly read your spec
15:56:50 <toan-tran> it's on my demo at Atlanta
15:56:51 <n0ano> toan-tran, sure, I'll queue it up for next week (doesn't look like it's getting much love so far)
15:56:59 <bauzas> toan-tran: very interesting, but I think we need to define a clear path for this
15:57:11 <toan-tran> bauzas: +1
15:57:24 <bauzas> toan-tran: and I would love to help you contribute to this
15:57:30 <toan-tran> in fact I submitted it some months ago
15:57:35 <n0ano> #action n0ano to add https://review.openstack.org/#/c/61386/ to agenda for next week
15:57:47 <toan-tran> and after Atlanta I had a really good talk with Jay Lau
15:58:04 <toan-tran> his Tetris is what I need to complete my schema
15:58:04 <bauzas> toan-tran: yey, I think that Jay and I are sharing same views
15:58:06 <toan-tran> :)
15:58:23 <bauzas> toan-tran: but that's a big baby
15:58:38 <toan-tran> bauzas: here is my presentation: https://docs.google.com/file/d/0B598PxJUvPrwcWZlaUlaOW11enM/edit?
15:58:47 <toan-tran> page 20 is my vision on the whole scheduling
15:58:53 <bauzas> toan-tran: even bigger than Gantt IMHO :)
15:58:59 <toan-tran> and Tetris fits right in Service Manager
15:59:45 <bauzas> toan-tran: based on last Summit, I fear that it will be too big for Nova
15:59:54 <bauzas> toan-tran: but that's a good fit for Gantt
16:00:11 <toan-tran> bauzas: yeah, we expect Gantt will be part of it :D
16:00:20 <toan-tran> so that will be Gantt + Tetris + Congress
16:00:49 <bauzas> I was thinking that GTC was related to fast cars :)
16:00:56 <toan-tran> but the first step is small & simple, to make a policy-based engine that can fit in nova-scheduler or gantt
16:01:07 <toan-tran> bauzas: +)
16:01:13 <n0ano> top of the hour guys, tnx, good discussion, we'll talk on email and be here next week.
16:01:17 <bauzas> :)
16:01:19 <n0ano> #endmeeting