12:58:54 #startmeeting Review of Dublin edge notes
12:58:55 Meeting started Fri Apr 20 12:58:54 2018 UTC and is due to finish in 60 minutes. The chair is csatari. Information about MeetBot at http://wiki.debian.org/MeetBot.
12:58:56 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
12:58:58 The meeting name has been set to 'review_of_dublin_edge_notes'
12:59:26 #link https://wiki.openstack.org/wiki/OpenStack_Edge_Discussions_Dublin_PTG
13:00:47 Good afternoon csatari.
13:00:52 Hi
13:02:12 I plan to go through the Wiki page topic by topic and record the comments in actions.
13:02:26 Hi
13:02:28 Sounds good. (Will there be a roll call?)
13:02:44 Oh, yes :)
13:02:56 #topic Roll Call
13:03:15 #info csatari - gergely.csatari@nokia.com
13:03:25 #info alebre - adrien.lebre@inria.fr
13:03:30 #info Paul Carver
13:03:36 #info jdandrea - jdandrea@research.att.com
13:03:51 #info Eric Sarault - eric.sarault@kontron.com
13:04:01 #info s/alebre/ad_ri3n_
13:04:18 #info dpertin dimitri.pertin@inria.fr
13:04:28 #info Chris Price - christopher.price@est.tech
13:04:29 sorry … stupid habit
13:04:37 No worries :)
13:04:42 #info rcherrueau - ronan-alexandre.cherrueau@inria.fr
13:05:22 Okay, let's jump into the document.
13:05:34 so the idea was to go through the Dublin PTG wikipage.
13:05:37 :D
13:05:40 #link https://wiki.openstack.org/wiki/OpenStack_Edge_Discussions_Dublin_PTG
13:05:52 Yes, chapter by chapter.
13:05:52 * jdandrea (change topic?)
13:06:03 #topic Review Intro
13:06:11 Anything to the intro part?
13:06:20 not from my side
13:06:40 Okay, moving forward.
13:06:51 #topic Definitions
13:07:06 Original Site : The site to where the operator is connected
13:07:10 should be probably reworded as:
13:07:20 The site where the operation is performed/executed initially
13:07:34 Okay for me.
13:07:55 (BTW where are we taking notes? are we using an etherpad or something else?
13:07:56 )
13:08:01 #action reword "The site to where the operator is connected" to "The site where the operation is performed/executed initially"
13:08:06 :-)
13:08:09 efficient ;)
13:08:15 #info jrbalderrama - javier.rojas-balderrama@inria.fr
13:08:16 I hope that the meeting minutes will be enough.
13:08:20 Let's see :)
13:08:28 So based on that I think we should also reword the Remote site by
13:08:57 Okay, any proposals?
13:08:59 Site(s) that are involved in an operation launched from the Original one.
13:09:05 but not sure it is clear enough
13:09:13 …
13:09:31 maybe affected instead of involved?
13:09:43 better yes
13:10:20 #action reword Remote Site to "Site(s) that are affected in an operation launched from the Original one."
13:10:27 The items Application Sustainability/Site Sustainability should be close to each other
13:10:45 #info Francis Dagenais - francis.dagenais@kontron.com
13:11:14 The definitions are in alphabetical order now.
13:11:26 ok
13:11:32 Should I break the alphabetical order?
13:11:37 ok
13:11:41 don't know which one is the best one.
13:11:59 But since we do not have too many definitions, sorting by semantics is probably more relevant?
13:12:22 that's all from my side for 'Definitions'
13:12:24 I would prefer to keep the alphabetical order to avoid long discussions about semantics
13:12:28 For those that aren't well versed it might be easier to be alphabetical
13:12:33 ok
13:12:36 and I hope we will have more definitions.
13:12:37 Okay
13:12:46 Anyone else on the definitions?
13:13:25 #topic Review of 3 Edge use cases
13:13:36 hmm maybe
13:13:42 ^^
13:13:54 ChrisPriceAB> to definitions?
13:14:29 Is "original site" the way we want to represent the "operative site"? I assume an "original site" could be anywhere and refers to the site where an operation is being performed that may impact multiple sites.
13:14:35 I always wanted to try to switch back to a previous topic :)
13:14:41 lol
13:14:52 * ChrisPriceAB loves being fashionably late to a topic
13:15:12 I'm just not 100% sure we are using the right term. but I lack a better idea.
13:15:35 In my mind original site could be anywhere
13:15:42 if you have three sites A, B, C
13:15:43 original is rather concrete for what I assume is a transactional definition.
13:16:02 user a connects to A to launch a request (whatever the request),
13:16:05 Original - remote sounds better for me than Operative - Remote, but I let this be decided by native English speakers.
13:16:08 this is the original site for user a
13:16:34 user b connects to C, then C is the original site for the b request
13:16:42 then the request can be mono site or multi sites
13:16:53 Yes, but the definition is per operation "The site where the operation is performed/executed initially"
13:16:56 anyway, it's not urgent. The context is an operation, and the way "original" presents does not help the reader understand the meaning.
13:17:01 (in the latter case, then the request, which is multi sites, includes remote sites)
13:17:11 per operation = per API request
13:17:19 example: openstack server create ….
13:17:36 it will probably create a situation where the author understands, but maybe not the reader.
13:18:01 if the operation is start a VM on A with an image from B (sorry for this red thread), then the original site is A while site B is a remote one.
13:18:12 Maybe we can add a small schema to clarify that point?
13:18:15 * ChrisPriceAB is one of those people who doesn't read the definitions section before reading the document. :)
13:18:32 For now should I add a note to the page about this dilemma?
13:18:45 Original makes me think of the "Primary" site, if the relation is relative to the sender's point of view, it needs to be clarified
13:18:46 not sure it is a dilemma
13:18:56 if we draw a figure?
13:19:09 A diagram would resolve that uncertainty for sure
13:19:16 Got it.
13:19:18 Okay
13:19:19 #topic Definitions
13:19:44 @dpertin do we have any figure related to that point?
13:19:50 from our side?
13:20:04 #action Create a drawing to explain Original and Remote sites
13:20:20 let me check
13:21:28 dpertin: if you have anything please send it to me ;)
13:21:40 Okay, anything else to the Definitions?
13:21:42 * ChrisPriceAB will try to stop making things difficult. (but may fail to do so)
13:21:49 * ildikov is sneaking in and hiding in the corner at the back of the room :)
13:22:13 Moving forward
13:22:13 #topic Review of 3 Edge use cases
13:22:17 csatari: sure
13:22:20 If you need some help on "beautifying" it let me know. Could probably get someone doing something in Adobe quickly
13:22:35 :-)
13:22:42 ok nothing csatari from my side
13:22:59 the use-cases are I think correctly explained in the White Paper (at least from now ;))
13:23:02 dpertin, esarault Thanks. I definitely need someone with beautifying capabilities ;)
13:23:05 s/from/for now
13:23:19 ok
13:23:40 #topic Review of 4 Deployment Scenarios
13:23:56 Only chapter 4 for now.
13:24:03 (the sentence)
13:24:12 ?
13:24:39 If you have comments specific to 4.1 that should go to that topic.
13:25:03 Small Edge: I'm wondering whether we can have information related to the storage capacity?
13:25:19 #topic Review of 4.1 Small edge
13:25:45 i.e. a single hyperconverged server (i.e. ceph for instance will be also deployed to store VM/containers/baremetal images…) ?
13:25:47 This is all the info that I could collect from all the etherpads.
13:26:04 Maybe it can make sense to open questions?
13:26:07 Maybe one thing about small edge, most SSDs are 240GB, not 225GB.
13:26:20 at least that a note on that?
13:26:22 And for the Maximum, if the target is a Xeon-D type or ARM processor, probably 16 cores could be the max target
13:26:53 #action Change 225 GB to 240 GB
13:27:06 Just a quick comment about Edge use cases (section 3). I'd like to stress that while yes, high amount of data vs low latency is a challenge, predictive, guaranteed maximal response times should be there somewhere
13:27:50 This heavily implies RT schedulers support
13:27:56 @all Is it OK if I add the 16 cores to the max specs of the Small edge?
13:28:28 And the processor types.
13:28:59 #action add to the maximum hardware specs that the target is a Xeon-D type or ARM processor, probably 16 cores
13:29:45 fdag predictive, guaranteed maximal response time is a requirement, not a use case. Here we should describe why we need predictive, guaranteed maximal response time.
13:30:49 I'm happy to add more use cases if you have anything in mind.
13:31:18 Anything else to Small Edge?
13:31:29 I'd also like to point out that it's unlikely people will refresh every OpenStack release their Small edge appliance. What we're seeing mostly from Tier 1/2 is them starting to be comfortable with a 2 years update cycle, opened to discuss 1 year
13:32:07 updating every release is aggressive and time consuming for them and they typically have the mindset of "If it ain't broken, don't touch it"
13:32:44 esarault even if the upgrade will be possible remotely without any effect on the running workload?
13:33:03 Yeah the challenge is, you bring risk into the mix, a chance to mess with your SLA
13:33:11 csatari: did you consider my remarks regarding the containers/vm/baremetals image repository (and more generally what are the requirements in terms of cloud functionality)
13:33:30 esarault: Understood.
13:33:41 Are we expecting a FW release cycle of approx. 1 month? This generally means downtime which may impact your on premise equipment.
13:34:04 Should I change the update frequency to 1-2 years, then?
13:34:07 An upgrade a year is manageable I think. Sounds simple if you have a box, but if you have 10 000 of them scattered across the US, updating every 6 months becomes a burden
13:34:13 ^^ maybe we have general notes? It seems such a challenge will be valid for every edge deployment?
13:34:28 On Small Edge, the section on Remote access/connectivity reliability: It sounds like we assume always on.... Why 100% uptime? I thought we assumed WAN link could fail often!
13:34:52 perhaps we should have a note that small edge availability is low.
13:35:01 Upgrades + Failures
13:35:12 parus: it depends from which side the endusers/devops will be.
13:35:20 ?
13:35:31 csatari: understood for the requirements vs use case, but I see it more as a 'needs' in that case (as with 'needs high amount of data' and 'needs low latency handling')
13:35:33 maybe the connectivity between the enduser and the small edge site can be quite stable
13:35:54 while the connectivity between the edge site and the rest of the edge infrastructure can be intermittent
13:36:39 csatari: seems there are a couple of comments/questions on that part. should we open an etherpad and copy/paste the text in order to refine it?
13:36:50 Ok. but Failure or Management Upgrades may cause downtime without a local backup.
13:36:57 (i.e. on the deployment scenarios section).
13:37:47 fdag: Okay, let's open an etherpad for the Edge Use Cases text.
13:37:54 Given the case is a small CPE box, I don't think the expectation is to maintain uptime during upgrade of software nor firmware. Target would be to minimize the downtime but I don't think 100% is feasible given it's only one unit in the Min/Max specs
13:38:02 #topic Review of 3 Edge use cases
13:38:30 #info the text needs refinement.
13:38:34 It would definitely be expected in the Medium edge however
13:38:39 Spelling: Autonomous
13:39:36 #link https://etherpad.openstack.org/p/Dublin-edge-notes-wiki
13:40:23 #action Anyone who would like to make modifications on 3 Edge use cases go to https://etherpad.openstack.org/p/Dublin-edge-notes-wiki and add your proposals
13:40:46 Okay, back to the reliability of Small Edge-s
13:40:55 can we do the same for the deployment scenarios section?
13:41:04 (I copied/pasted the current text).
13:41:06 yes, someone already dud :)
13:41:09 did
13:41:47 #topic Review of 4 Deployment Scenarios
13:42:15 First timer with EtherPad here, you propose to just update the pad with what we propose directly? No need for comments or anything else?
13:42:32 #action Proposals about 4 Deployment Scenarios should be added to https://etherpad.openstack.org/p/Dublin-edge-notes-wiki
13:42:38 Don't want to break your usual process ^^
13:43:35 esarault: No, just add your text. If you would like to add a note prefix it with "Note:" and I will not copy that to the wiki.
13:44:33 Okay, back to the reliability of the Small Edge-s
13:45:01 Should I add anything more or different than the current text "No 100% uptime expected."?
13:45:59 Okay, I see a "No 100% uptime expected and variable connectivity (e.g. connected car)" in https://etherpad.openstack.org/p/Dublin-edge-notes-wiki
13:47:07 #topic Review of 4.2 Medium edge
13:47:09 About the hardware specs concerning the image repo, expect the small edge, being a single unit to likely only have 1x M.2 SSD or 1.8" SSD drive
13:47:38 * esarault added quantity to the Etherpad
13:48:15 csatari: I just read your comment on the pad. how do you want to proceed, actually I asked the question twice on the IRC, not sure you saw it (if yes sorry, I apologize but I just want to keep traces on that remark ;))
13:48:53 I think we should envision the compute node scenario.
13:49:00 we also discuss the two connectivity aspects. i.e. connectivity between end-users and the edge site and connectivity between the different edge sites?
13:49:26 Regarding Medium Edge, I think 2U should be the minimum specs. There are units out there with Xeon Scalable with 4 independent servers within 2U and also some hyperconverged appliance with up to 18 servers in a 2U unit.
13:49:26 10 min left should we try to put all the questions we may have somewhere ?
13:50:00 ad_ri3n_ You mean the "i.e. a single hyperconverged server (i.e. ceph for instance will be also deployed to store VM/containers/baremetal images…) ?" issue?
13:50:16 yes
13:50:43 ad_ri3n_: who is the end-user here? the user of the VM? or an openstack tenant
13:51:22 parus: the devops who requests the provisioning of a new VM
13:51:31 so both
13:51:44 ad_ri3n_ We can start recording the open issue in info statements.
13:51:57 s/issue/issues/
13:51:58 can be the end-user with his/her smartphone that requires to launch a new VM in the edge site
13:52:40 or can be a VM that is already running on the edge site and that wants to scale the service by provisioning a new VM (just basic examples that come to my mind).
13:53:03 esarault: I think 2U-s are too big for a small edge :)
13:53:03 The end-user may be on his smartphone watching YouTube.... but it might be Google who would launch the VM on the web-site.
13:53:25 csatari: not sure I'm following you but ok, you are chairing the discussion. So I do what you propose ;)
13:53:40 ad_ri3n_, parus: Maybe we should add a definition for the end user
13:53:45 csatari: The comment from esarault applies to Medium Edge
13:53:57 ;)
13:53:58 fdag: okay.
13:54:12 whatever fadg said :)
13:54:15 *what
13:54:17 :)
13:54:43 #topic Definitions
13:54:54 How much longer can you guys go on this meeting?
13:55:15 +30m here
13:55:21 +30m
13:55:24 #action add a definition for the ones who are interacting with the edge cloud infrastructure and for the ones who are using the services running on top of the edge cloud infrastructure.
13:55:27 good with me.
13:55:39 +30 OK for me too.
13:56:05 #topic Review of 4.2 Medium edge
13:56:30 @All can we change the minimum specs of Medium edge from 4U to 2U?
13:57:15 I will leave in 3 min
13:57:17 I'll say yes but then that's voting for myself :p
13:57:19 sorry I have another meeting from my side
13:57:30 what will be the difference
13:57:33 between 2U and 4U
13:57:37 ad_ri3n_ okay
13:57:46 2U targets multi-node systems
13:58:00 also ensures you're able to fit more compute when replacing an old 6U unit that dates from the WW1
13:58:21 csatari: would it be possible to give a follow up to this meeting (actually I have several questions regarding the different acronyms such as MVS…. discussed later in the wiki page).
13:58:23 also allows to think that within "one box" there can be more than 1 server
13:59:44 ad_ri3n_ Yes, I plan to have more sessions until we reach the end of the document :) Also if you have questions or comments feel free to send them to the edge DL or to this IRC channel.
14:00:05 ok thanks for all the great work you are doing ;)
14:00:13 it is really great to see more and more people contributing to this subject
14:00:14 thanks
14:00:15 CU
14:00:16 ++
14:00:20 Okay, I do not have any preference on this so I'm happy to change it to 2RU
14:00:30 ad_ri3n_: Thanks, my pleasure.
14:00:57 #action Change the Minimum hardware specs of Medium Edge to 2RU.
14:01:33 There is a comment in the etherpad: Expected frequency of updates to hardware: 5-7 years
14:01:44 Anyone against this?
14:01:56 Yes, that's me. That's the typical lifecycle of a server from a financial standpoint
14:02:13 amortization is done in 3-5 years, system keeps running for a few years afterward and slowly gets replaced
14:02:19 esarault okay, thanks for the info.
14:02:22 also ties to Intel's warranty on embedded processors
14:02:28 About this one: "Expected frequency of updates to firmware: Never unless required to fix blocker/critical bug(s)"
14:02:55 I think we have more and more blocker/critical bugs in firmwares.
14:03:09 never sounds a bit too optimistic.
14:03:11 The thing is it'll create downtime
14:03:31 Yes, that is clear.
14:03:33 and from what we see, customers rarely update if ever there BIOS/BMC once in production
14:03:47 *their
14:04:06 csatari : You are right, but then qualification cycles of a specific FW is so long, most providers decide to live with the quirks instead
14:04:30 Okay
14:04:52 Anyone against this: Expected frequency of updates to control systems (e.g. OpenStack or Kubernetes controllers): 12 to 24 months
14:04:53 ?
14:05:03 Unless it's a big *cough* security flaw *cough*
14:05:13 and this feedback comes from seeing Tier 1 customer and major public cloud providers
14:05:20 Yeah, Spectre let's say
14:05:53 can you solve Spectre with a firmware update?
14:06:52 @fadg by enforcing default disabling of options allowing Spectre/Meltdown exploits, correct?
14:06:59 Isn't it a screwdriver kind of update if not solved in every software piece (however some parts of the CPU can be also updated as a software)
14:08:00 Okay, I will copy the text from the etherpad to the wiki based on this discussion.
14:08:04 Moving on.
14:08:21 Disabling some BIOS/ME options will 'workaround' the issue, yes
14:08:37 #topic Review of 5 Features and requirements
14:09:40 If nothing, then
14:09:55 #topic Review of 5.1 Architectural paradigms
14:10:35 what do you mean by cloud metadata?
14:10:36 nothing here either I guess
14:11:14 #topic Review of 5.2 Features
14:11:37 #action There are no levels mentioned anymore. Remove it from the sentence.
14:12:04 #topic Review of 5.2.1 Base assumptions for the features
14:13:20 #topic Review of 5.2.2.1 Elementary operations on one site
14:13:36 Typo: Network unreability -> Network unreliability
14:13:49 #action Network unreability -> Network unreliability
14:13:56 fdag Thanks
14:14:05 csatari: what do type: MVS and Non-MVS mean in the document?
14:14:26 dpertin: MVS is minimum viable solution
14:14:37 ok thanks
14:15:19 Anything else to 5.2.2.1?
14:15:35 #topic Review of 5.2.2.2 Use of a remote site
14:16:14 #action remove "Level 1"
14:16:33 #topic Review of 5.2.2.3 Network unreability
14:16:52 This is Nono-MVS?
14:16:57 *Non
14:17:08 yes, according to my first guess.
14:17:19 I'm open to discussions.
14:17:51 Without this we can still have an edge infrastructure if we have good networks :)
14:17:54 What's the impact if this is not there?
14:17:59 True
14:18:17 Just curious what the impact of a connection drop of 15 seconds will be
14:18:26 Network issues will result in failed operations and maybe inconsistent config data in the edge cloud infra.
14:18:43 So the user "resends" the command
14:18:57 No notion of command buffering to retry x amount of time until accepted or declined?
14:19:26 So basically "manual automation" until implemented :p
14:19:27 This is what I mean by "Have a policy for operation retries"
14:20:12 csatari : great!
14:20:22 Yeah fair enough :0
14:20:24 :)
14:20:32 In the final solution I would keep trying forever, but with some smartness to avoid killing the network in the minute it returns from an outage.
14:21:16 Yeah otherwise it might flag it as DDOS
14:21:33 Yes
14:21:41 furthermore a site is supposed to provide L1 operations, even if it is not able to contact other sites
14:22:12 Yes, and we miss this from this section
14:22:26 #action a site is supposed to provide L1 operations, even if it is not able to contact other sites
14:22:35 On that point, it might be good to clearly list the levels if they are to be referred to. Otherwise we'll lose people on terminologies here
14:23:05 Hehe, I just wanted to ask what are the L1 operations :)
14:23:47 At the moment I do not have a clear list of operations in mind.
14:24:18 No worries, just wanted to point out some of us are closer to the hardware than the network ;)
14:24:32 :)
14:24:37 Okay
14:25:17 #topic Review of 5.2.2.5 Containers
14:26:05 From my viewpoint, L1 operations are the ones already provided in OpenStack
14:26:32 Regarding containers, are we expecting to leverage Kuryr for this?
14:26:45 I would limit those operations.
14:27:28 E.g.: I would not allow locally overwriting the metadata that is received from a "parent" edge cloud instance.
14:28:01 esarault It is not clear at the moment. There are several architecture options for this.
14:28:04 I'm sure there's folks on the Medium edge that'll expect a symbiosis with K8S
14:28:21 csatari: what do you mean by metadata? images, flavors, etc?
14:28:55 dpertin yes, things like that. See https://wiki.openstack.org/wiki/OpenStack_Edge_Discussions_Dublin_PTG#Metadata_distribution
14:29:17 csatari: thanks
14:29:23 #topic What's Next
14:29:52 #info We reached 5.2.2.5 Containers. More similar meetings to come.
14:30:06 @all I'll need to leave now, thank you very much for this session, great work!
14:30:07 #action Csatari to organize the next meeting.
14:30:17 Thank you all
14:30:17 fdag: Thank you.
14:30:31 Thanks all, this was great. Looking forward to the next meeting!
14:30:38 Anything to the AoB section?
14:30:50 Thanks
14:30:53 Okay
14:30:56 A0B?
14:31:01 Thanks all.
14:31:07 esarault: Any other business.
14:31:15 #endmeeting
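
Editor's note on the retry discussion under 5.2.2.3 (14:18 to 14:20): the idea of "a policy for operation retries" that keeps trying without flooding the network right after an outage is commonly approached with capped exponential backoff plus random jitter. The following is only a minimal sketch under that assumption, not something agreed in the meeting. The names `backoff_delays` and `retry_operation`, the parameters `base`, `cap` and `max_attempts`, and the choice of `ConnectionError` as the failure signal are all hypothetical.

```python
import random
import time
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(base: float = 1.0, cap: float = 300.0,
                   rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield an endless series of capped, jittered delays in seconds."""
    rng = rng or random.Random()
    attempt = 0
    while True:
        # The ceiling doubles per attempt; the exponent is bounded so the
        # power never overflows, and the result is clamped to `cap`.
        ceiling = min(cap, base * (2 ** min(attempt, 30)))
        # "Full jitter": pick uniformly in [0, ceiling] so that many sites
        # do not all retry at the same instant when the link comes back.
        yield rng.uniform(0.0, ceiling)
        attempt += 1


def retry_operation(operation: Callable[[], T],
                    max_attempts: Optional[int] = None,
                    sleep: Callable[[float], None] = time.sleep,
                    **backoff_kwargs) -> T:
    """Call `operation` until it succeeds.

    max_attempts=None means "keep trying forever"; otherwise the last
    ConnectionError is re-raised once the limit is reached.
    """
    delays = backoff_delays(**backoff_kwargs)
    attempt = 0
    while True:
        attempt += 1
        try:
            return operation()
        except ConnectionError:
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(next(delays))
```

A caller would wrap the remote request in a zero-argument function, for example `retry_operation(lambda: send_request(), cap=60.0)`, where `send_request` stands in for whatever cross-site call is being retried. The `sleep` and `rng` parameters exist only so the policy can be exercised without real waiting.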