14:03:40 #startmeeting telcowg
14:03:40 :-)
14:03:41 Meeting started Wed Jan 14 14:03:40 2015 UTC and is due to finish in 60 minutes. The chair is sgordon. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:03:42 :)
14:03:43 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:03:44 hi
14:03:45 let's try that again :)
14:03:46 The meeting name has been set to 'telcowg'
14:03:49 hello
14:03:49 #topic roll call
14:03:54 hi
14:03:55 #link https://etherpad.openstack.org/p/nfv-meeting-agenda
14:04:25 #topic action items from last week
14:04:39 #info amitry to cross reference ops mid-cycle signups with telco wg participants to determine crossover
14:04:51 i dont believe registration for the ops summit has opened yet
14:04:54 signup opening this week
14:04:55 correct
14:04:56 so not much to be done on this yet
14:05:02 will carry it over
14:05:06 #action amitry to cross reference ops mid-cycle signups with telco wg participants to determine crossover
14:05:54 for those who were still on vacation etc. last week there is a general openstack operators midcycle meetup being held on march 9 and 10, hosted by amitry and comcast in philly
14:06:16 we are considering whether enough people are interested in or planning to attend that we could have a face to face session on telco use cases there
14:06:21 https://etherpad.openstack.org/p/PHL-ops-meetup
14:06:29 #link https://etherpad.openstack.org/p/PHL-ops-meetup
14:06:37 any questions on that?
14:07:01 the etherpad contains all available detail at this time - expect to see registration details on the openstack-operators list real soon now
14:07:23 #info steveg was to send out a brief survey monkey to the list and we select one to review at next week's meeting in detail
14:07:49 so this happened, i only got a handful of responses (6) so we can probably re-evaluate ordering again in the future
14:07:58 but the results were:
14:07:59 1st: Virtual IMS
14:07:59 2nd (tie): VPN Instantiation / Access to physical network resources
14:07:59 4th: Security Segregation
14:07:59 5th: Session border controller
14:08:18 i believe the vIMS use case was submitted by cloudon
14:08:28 yup, that's right
14:08:31 cloudon, are you available if we want to try a deep dive on this in the meeting today?
14:08:39 sure
14:08:50 #topic virtual IMS use case discussion
14:08:54 hi
14:08:55 #link https://wiki.openstack.org/wiki/TelcoWorkingGroup/UseCases#Virtual_IMS_Core
14:08:55 hi, what about service chaining?
14:09:17 vks: we have a draft here https://etherpad.openstack.org/p/kKIqu2ipN6
14:09:19 vks, we're going off the use cases that have been submitted to the wiki
14:09:26 there is a broader effort around service chaining
14:09:34 with discussion happening on the mailing list
14:09:53 ybabenko++
14:10:18 sgordon: let us discuss our draft here today (comments, critique, etc) and later on i will put it into the wiki
14:10:53 saw it, will it fall in line with gbp?
14:11:01 let's focus on the vIMS case first
14:11:07 ok
14:11:11 sgordon: +1
14:11:13 if we get time in other discussion we can loop back on service chaining
14:11:22 go ahead
14:11:32 ok
14:11:39 sgordon: it looks to me like for such a VNF as IMS we need a serious HA setup for openstack
14:11:46 so cloudon you had already broken out some requirements in this one
14:11:52 with the main constraints being in HA
14:11:56 does something like this exist already today in the form of a verified blueprint?
14:12:06 ybabenko, not quite
14:12:12 exactly
14:12:22 in particular that second requirement about affinity/anti-affinity groups being nested
14:13:11 you can possibly force this by combining groups with host aggregate/az assignments
14:13:19 to mimic the same type of setup
14:13:31 the broader issue here I was trying to get at was how to represent the affinity requirements for services deployed as an N+k pool, with N large
14:13:32 ybabenko: We should be clear to differentiate between HA deployment/config of the vIMS app and OpenStack HA from the controller perspective
14:13:36 #info implemented as a series of N+k compute pools; meeting a given SLA requires being able to limit the impact of a single host failure
14:13:43 #info potentially a scheduler gap here: affinity/anti-affinity can be expressed pair-wise between VMs, which is sufficient for a 1:1 active/passive architecture, but an N+k pool needs a concept equivalent to "group anti-affinity" i.e. allowing the NFV orchestrator to assign each VM in a pool to one of X buckets, and requesting OpenStack to ensure no single host failure can affect more than one bucket
14:13:58 adrian-hoban, here we're talking about HA for the app itself
14:14:36 crudely: don't want too many of the service's VMs on the same host, but for perf reasons want them "nearby" for some definition of "nearby"
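For context, a minimal sketch (not part of the meeting log) of what Nova server groups offer today for the pool described above, using python-novaclient; credentials, names and image/flavor IDs are placeholders. The "anti-affinity" policy is strictly pairwise - every member must land on a distinct host - so there is currently no way to express "at most k members of this N+k pool per host".

    # Sketch only: today's pairwise anti-affinity via python-novaclient.
    # Credentials and IDs below are placeholders.
    from novaclient import client as nova_client

    nova = nova_client.Client('2', 'admin', 'password', 'demo',
                              'http://controller:5000/v2.0')

    # One API call creates the group with its placement policy...
    pool = nova.server_groups.create(name='vims-sprout-pool',
                                     policies=['anti-affinity'])

    # ...then membership is expressed as a scheduler hint at boot time.
    for i in range(6):
        nova.servers.create(name='sprout-%d' % i,
                            image='<image-uuid>',   # placeholder
                            flavor='<flavor-id>',   # placeholder
                            scheduler_hints={'group': pool.id})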
14:14:41 adrian-hoban: i am meaning OpenStack HA setup (core services like keystone) as IMS is normally deployed as a multi-site VNF
14:14:52 sgordon: do we have a doc on affinity/anti-affinity stuff in place
14:14:54 ?
14:15:17 are you sure IMS shall be assumed as multi site?
14:15:30 a deployed vIMS will be hosted in several DCs
14:15:31 vks, there is some coverage in the nova scheduler documentation
14:15:37 ok
14:15:52 it's fairly minimal but so is the functionality today
14:15:53 imendels: yes
14:16:23 ybabenko: sgordon: Seems like you guys are talking about different aspects of HA then...
14:16:24 I have some references claiming otherwise. Though I agree multi site also makes sense
14:16:59 yes, in this use case I was concentrating solely on HA at the app level, not the platform level
14:17:18 imendels: In the NFV architecture, I would see the multi-site aspects as being in the scope of NFVO
14:17:43 adrian-hoban, i was really just regurgitating what we have in the wiki so people understand the context
14:17:54 (and we have a record in meetbot of what we were referring to)
14:17:59 adrian: right but can we assume that all apps (IMS in this case) will run multi site because of it?
14:18:05 adrian-hoban: +1; and further NFV explicitly allows deployment of a VNF across multiple underlying cloud platforms (NFVIs)
14:18:12 adrian-hoban, : +1
14:18:25 my point is: for such a critical NFV as vIMS we need a reference HA design for OpenStack services
14:18:33 agree
14:18:49 imendels: I think we can assume that it is a possibility but not required for all apps
14:18:57 agree
14:19:07 ybabenko: +1
14:19:19 Need of HA mainly depends on application elasticity
14:19:38 #info Require a reference HA design for OpenStack services for critical NFV such as vIMS
14:19:42 gmatefi1, not only elasticity
14:19:46 going to agree with adrian-hoban here: provide the functionality for things like multi-site, but don't make it a requirement. Remember that not everyone will deploy an app the same way
14:19:53 gmatefil: not if you have an SLA to meet...
14:19:55 to me there is a broader use case discussion here though
14:19:58 for apps that are dynamically scaling, controller HA is a must-have as part of real-time operation
14:20:03 sgordon: who is able to address this in the OS community? what is the right entry point for that?
14:20:12 in that there is a separate need to drill down on what multi-site means for telco
14:20:29 in other words we need to assume that not all apps are ready for it, or not. But it's a fundamental assumption we must have
14:20:35 ybabenko, there is currently an effort underway to reinvigorate the HA documentation - maybe in the scope of that effort
14:20:46 they need helpers though
14:20:54 sgordon: +1 can you link it?
14:21:24 sgordon: +1 .. so are we talking about a multi-region setup? or different OS clusters...
14:21:26 here is the *current* scope of the guide:
14:21:29 #link http://docs.openstack.org/high-availability-guide/content/index.html
14:21:33 mkoderer, well this is the thing
14:21:36 sgordon: this is what I was getting at. multi-site in openstack effectively means that a VNF will exist in more than one "Region" but perhaps a telco may deploy one large region in multiple DCs. It's possible to schedule into different regions using the neutron network as well
14:21:38 sgordon: ;)
14:21:41 sgordon, NFV requirements put a lot of emphasis on HA
14:21:42 their focus currently is single site HA
14:21:48 if people want to expand that scope
14:21:49 better HA documentation would help but there are still some fundamental issues such as seamless upgrade from one OS release to another
14:21:52 they need to get involved
14:21:53 ;)
14:22:08 trying to find the mailing list post(s)
14:22:36 for now single site HA would be fine
14:22:44 sgordon: so what are we going to do now.. listing gaps in OS?
14:22:48 that doc is more about OS control than anything else, no?
14:22:55 sgordon: thanks. We are familiar with that and have a strong feeling that a lot still needs to happen in order to be able to deploy something like vIMS in HA OS
14:23:36 #link http://lists.openstack.org/pipermail/openstack-operators/2014-August/004987.html
14:23:41 mkoderer: should we do a gap analysis and address/list the missing points?
14:24:03 ybabenko, yes - but again if nobody is speaking to the team working on it about that
14:24:06 they arent going to cover it
14:24:07 :)
14:24:09 I'd like to suggest what we need to agree on first is what OpenStack should provide from an API perspective to support application HA configuration
14:24:38 sgordon: who will address this?
14:25:08 adrian-hoban: +1
14:25:09 And by that I mean HA deployment configuration (not config of the app itself)
14:25:13 sgordon: can we add all the gaps that we find during discussion to the use case?
14:25:35 it's a wiki, people can add anything they want :)
14:25:42 and then start to find related blueprints
14:25:51 indeed
14:26:00 and open specs if needed
14:26:13 ok
14:28:25 so, a key question to adrian-hoban's point - what do we see as the 'API' here
14:28:38 given that e.g. server groups are implemented via scheduler hints
14:28:54 (albeit with some API calls for initial group creation)
14:29:41 sgordon: I think the Heat APIs are probably the closest in scope to parts of what is required of NFVO functionality. Perhaps we start there?
14:30:02 adrian-hoban: +1
14:30:17 are we still on vIMS?
14:30:20 I am confused
14:30:24 yes
14:30:29 doesn't that assume an NFVO would use Heat? not sure that's the case
14:30:47 adrian-hoban, do u really think heat-apis fit the NFV case?
14:30:50 cloudon: it might be a requirement if you're going to need coordination features
14:31:27 vIMS -> need for OpenStack HA. Heat? We can use heat already today. But heat does not support multi-site configuration. How to address this?
14:31:37 vks: not yet.. but we can try to change that
14:31:52 ybabenko, actually it does depending on what you mean by multi-site
14:31:52 aveiga: sorry - not sure I follow - a core part of an NFVO is co-ordination?
14:31:58 e.g. multi-region support was recently added
14:32:23 #link https://blueprints.launchpad.net/heat/+spec/multi-region-support
14:32:27 are we sticking to one site or looking at multi site?
14:32:34 but i think that is still getting ahead of ourselves
14:32:40 cloudon: if you want to ensure that your app VMs are landing where you want them and automatically rebuilt/scaled to meet your HA and load capabilities, then yes
14:32:41 i think stick to the requirements within a single site
14:32:56 as i said earlier multi-site for telco should be analysed as a separate use case imo
14:32:58 sgordon, +1
14:32:59 vks: I'm not stating that. Just that Heat is close in functionality to some of the things NFVO is required to do. There is of course a likely path that NFVO implementations would drive the other APIs (Nova, Neutron) directly. I suggested we consider Heat APIs as a means of fleshing out what may be needed from other core APIs
14:33:00 aveiga: ok, but don't need Heat to do that
14:33:06 because it's more general, it's not specific to e.g. vIMS
14:33:42 sgordon: I can right a use case for multi-site
14:33:46 ^write
14:33:57 if needed
14:33:57 #action mkoderer to take a stab at documenting a use case for multi-site
14:34:03 mkoderer, thanks - that would be much appreciated
14:34:25 so we have the OS::Nova::ServerGroup resource in heat
14:34:36 sgordon: +1. Agree we need to look at single site and multi-site deployments separately.
14:34:52 adrian-hoban, i just wanted to say heat-apis in my point of view don't fit. yes if we want to start with that, not a bad idea. But i think we should come up with new APIs in some time
14:34:52 which relates to the nova server group add call
14:34:56 under the hood
14:35:38 and then actual group membership is via the hints provided with the OS::Nova::Server resource
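As a reference point (not part of the meeting log), a minimal sketch of how the OS::Nova::ServerGroup resource and the OS::Nova::Server scheduler hints mentioned above fit together in a Heat (HOT) template, driven here through python-heatclient; the endpoint, token, image and flavor values are placeholders.

    # Sketch only: ServerGroup + Server scheduler hints in a HOT template,
    # submitted via python-heatclient. All bracketed values are placeholders.
    from heatclient import client as heat_client

    POOL_TEMPLATE = """
    heat_template_version: 2013-05-23
    resources:
      pool_group:
        type: OS::Nova::ServerGroup
        properties:
          name: vims-sprout-pool
          policies: [anti-affinity]
      sprout_0:
        type: OS::Nova::Server
        properties:
          image: <image-uuid>     # placeholder
          flavor: <flavor-id>     # placeholder
          scheduler_hints: {group: {get_resource: pool_group}}
    """

    heat = heat_client.Client('1', '<heat-endpoint>', token='<keystone-token>')
    heat.stacks.create(stack_name='vims-pool', template=POOL_TEMPLATE)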
14:36:07 the key requirement here appears to be how do i express not only a relationship between servers in the group
14:36:14 but a relationship between those groups
14:36:23 so within a single site I want to deploy an N+k pool (which may just be a fraction of the overall service) - I still want to ensure no single host failure can knock out many VMs (and certainly no more than k...) - can server groups permit me to configure that?
14:36:36 "sort of"
14:36:54 so with the anti-affinity policy you obviously achieve that
14:37:01 at the expense that you dont get 'closeness'
14:37:01 sgordon: :)
14:37:10 that is none of your servers/instances will reside on the same host
14:37:37 there have been proposals to implement "soft" anti-affinity that might be closer to what you want
14:37:39 ...which is too much spreading
14:37:47 sgordon, u mean to say service vms?
14:37:58 but again still would only place on same host after all options exhausted
14:38:00 vks, no
14:38:06 mkoderer: I suggest you distinguish between OS "control" and "servers" HA in the use case. Happy to assist if you want
14:38:08 vks, in the nova api instances are referred to as servers
14:38:11 hence "server groups"
14:39:41 imendels: thx.. yep sure
14:39:44 imendels: all the time we are speaking about OS HA
14:40:38 (hacky but might work) so could I define a host aggregate of a largish number of "close" hosts, then define my VMs to form a service group, then tell nova to instantiate them on the given aggregate with anti-affinity?
14:40:49 ybabenko: not sure.. look at the server groups above... vs. is your NOVA endpoint HA and can it be seamlessly upgraded
14:40:57 sgordon, here we are talking about special servers?
14:41:09 not the normal instances
14:41:14 rt??
14:41:19 vks, no - we're talking about any servers/instances you want to deploy in the manner cloudon refers to
14:41:40 cloudon, yes that was something i mentioned very early in the conversation
14:41:41 s/service group/server group/
14:41:43 as a way to achieve this today
14:42:50 sgordon, but then u end up dealing with entire instances on the cloud instead of the hosts which have special servers running on them
14:43:05 vks, i dont follow
14:43:10 ok, though a bit sub-optimal as it requires using a host aggregate to segment your hosts for app affinity purposes rather than physical capabilities
14:43:19 vks, you end up dealing with as many or as few instances as you add to the gorup
14:43:21 *group
14:43:39 not all instances in the cloud need to be in a group, but those you want to place this way do
14:44:20 You could also leverage host aggregates to help identify if the servers had a special config
14:44:39 yes
14:44:52 sgordon, ok that makes sense. but wherever those instances will be running will be in HA
14:44:58 the semantic you really want as an app is "instantiate VMs in this server group such that no more than X are on the same host" without reference to host aggregates unless the service needs some special physical capability
14:45:52 mmm
14:46:12 cloudon, what would the expected behavior be if i have exhausted all hosts
14:46:16 can we just go line for line through the vIMS use case and agree on it?
14:46:16 that is i say X is 5
14:46:22 and all hosts have 5 instances
14:46:27 fail request?
14:46:33 i.e. Mainly a compute application: modest demands on storage and networking. - what does "modest" mean?
14:46:37 if no option then overload - so more of a hint than a hard rule
14:46:48 which features do we need from networking in order to support vIMS?
14:46:51 Ipv6?
14:46:57 Distributed routing?
14:47:03 cloudon, right but at that point it's really no different than soft-affinity imo
14:47:03 VRRP?
14:47:09 LB?
14:47:10 IP-SEC
14:47:11 etc
14:47:12 etc
14:47:13 etc
14:47:16 unless you are suggesting it should stack the first host until it gets 5, and so on
14:47:32 I would really like to see a transparent review of the use cases
14:47:34 no, definitely not stacking - that's an anti-pattern
14:47:52 mkoderer, can you expand on that
14:48:39 should we move them to a git repo and do a gerrit review?... I would really like that
14:49:05 cloudon: Do you see host separation as the only concern? What about rack-level separation or network-level separation?
14:49:19 my concern with that approach is that we lose many of the people who dont know how to interact with it
14:49:41 (similar to how we lose some who cant/wont do irc meetings by having these sessions in irc)
14:50:11 adrian-hoban: indeed, yes, but was wary of introducing new semantics (especially physically motivated) for groupings of hosts that don't already exist in OS
14:50:25 sgordon: but having it in the IRC meeting doesn't feel that productive
14:50:28 I see that more as a use for avail zones
14:50:54 mkoderer, i agree but is that because of the medium or because we spent 20 mins discussing broader HA issues
14:50:54 cloudon: Agree with starting with incremental changes :-)
14:51:44 cloudon: I guess we need to have additional features for AZ/host aggregates in general for NFV
14:52:14 basically from my pov i dont want to raise the bar on use case submission, i already have a couple that were emailed to me because people were unsure about adding to the wiki
14:52:17 and the nova scheduling must be more flexible
14:52:23 i dont want to become the conduit for adding them to a git repo as well
14:52:34 sgordon: I think we should go through the use cases. We may find that there are more commonalities
14:52:48 sgordon: I mean I can upload them to Gerrit...
14:52:52 just nore down in the wiki that HA happens to be one that would be common
14:53:02 note, even
14:53:09 mkoderer: agree; there are many multi-site issues but even if solved that leaves scheduling gaps for what you ideally want within each site
14:53:12 #info possible commonalities around HA and multi-site requirements to identify as we progress through use cases
14:54:00 #info need more flexibility from Availability Zone and Host Aggregate placement, along with more flexible placement rules for the scheduler
14:54:20 mkoderer, with the scheduling are we referring specifically to the server group filters in this case
14:54:27 mkoderer, or are there other desirable tweaks
14:54:32 I'd like it if we could complete the discussion on single site before tackling the multi-site items
14:55:05 adrian-hoban, +1
14:55:11 +1
14:55:12 +1
14:55:28 #info general agreement to focus on use cases in the context of single site deployment first
14:55:51 adrian-hoban: yep we move this discussion to the multi-site use case
14:56:01 #info Is gerrit a better mechanism for use case review?
14:56:07 so are we agreed for the single site case (a) there is an affinity issue for N+k groups (b) could hack it with server groups + host aggregates (c) but that's not ideal?
14:56:45 that seems right from my pov, the question is really how would an implementation that solves (a) in particular work
14:57:08 can take that offline though
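For completeness (not part of the meeting log), a minimal sketch of one way the work-around in (b) could look with today's APIs, assuming "closeness" is modelled as a host aggregate exposed as an availability zone; hostnames, credentials and IDs are placeholders, and this remains a scheduling hint rather than the "no more than k members per host" semantic the use case actually asks for.

    # Sketch only: aggregate-as-AZ plus an anti-affinity server group.
    # Hostnames, credentials and IDs are placeholders.
    from novaclient import client as nova_client

    nova = nova_client.Client('2', 'admin', 'password', 'demo',
                              'http://controller:5000/v2.0')

    # Admin step: collect the "close" hosts into an aggregate exposed as an AZ.
    agg = nova.aggregates.create('vims-close-hosts', 'vims-az')
    for host in ('compute-01', 'compute-02', 'compute-03'):   # placeholders
        nova.aggregates.add_host(agg, host)

    # Tenant step: anti-affinity group, members pinned to the aggregate's AZ.
    pool = nova.server_groups.create(name='vims-sprout-pool',
                                     policies=['anti-affinity'])
    for i in range(3):
        nova.servers.create(name='sprout-%d' % i,
                            image='<image-uuid>',              # placeholder
                            flavor='<flavor-id>',              # placeholder
                            availability_zone='vims-az',
                            scheduler_hints={'group': pool.id})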
14:57:16 we only have ~ 3 min left
14:57:23 cloudon: i am not into the details on clearwater but maybe it would be a good idea to provide all these details in the wiki
14:57:32 but let's quickly touch on how to move somewhere on service chaining
14:57:51 mestery had mentioned on the m/l thread that this is a topic with much broader interest in neutron than just telco
14:58:08 ybabenko: the link in the use case gives full details - didn't want to over-burden the wiki
14:58:16 so it's a question of how to ensure the telco use case is documented and presentable when that comes around again at the vancouver summit
14:58:27 sgordon: could you give us a link
14:58:33 sgordon, can we have everything in a single place
14:58:37 ?
14:58:46 vks, what is 'everything'?
14:59:08 sgordon: in the [NFV] tag there is no email from mestery as far as i can see
14:59:17 that's really my point
14:59:18 use cases, and the plan of action
14:59:22 because he's not talking about NFV
14:59:28 here is our draft https://etherpad.openstack.org/p/kKIqu2ipN6
14:59:45 I would appreciate all the comments before putting it into the wiki
15:00:25 #link https://etherpad.openstack.org/p/kKIqu2ipN6
15:00:39 we're at time
15:00:53 let's jump over to #openstack-nfv while i find the link
15:01:08 but basically i cant force people who are having a generic discussion about service chaining in neutron
15:01:13 to tag it nfv / telco
15:02:10 #link http://lists.openstack.org/pipermail/openstack-dev/2015-January/053915.html
15:02:14 thanks all
15:02:18 #endmeeting