09:00:08 #startmeeting ha
09:00:09 Meeting started Mon Mar 21 09:00:08 2016 UTC and is due to finish in 60 minutes. The chair is aspiers. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:10 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:12 The meeting name has been set to 'ha'
09:00:22 hi
09:00:30 who do we have today?
09:00:36 o/
09:01:10 if it's just us two, we can discuss our talk ;-)
09:01:16 yup
09:01:37 but maybe someone else will show, we can wait 2 minutes more ;)
09:01:44 sure
09:04:19 ok, it looks like it's really us two, aspiers
09:04:22 yep
09:04:24 no problem
09:04:47 well let's use normal structure anyway, it will work fine
09:04:51 #topic Current status (progress, issues, roadblocks, further plans)
09:04:58 not much from my end
09:05:06 I did a few tweaks to openstack-resource-agents
09:05:13 and reported some HA bugs
09:05:23 e.g. http://bugs.clusterlabs.org/show_bug.cgi?id=5271
09:05:24 bugs.clusterlabs.org bug 5271 in Documentation "usage of status action in OCF RAs needs clarifying or eliminating" [Normal,Unconfirmed] - Assigned to oalbrigt
09:05:43 so only short status from my side - I have prepared images for demo "Mistral HA". I didn't complete it yet since I worked on getting US visa
09:05:47 when I noticed that most of the RAs have the status action, and that's pointless
09:05:49 so I will drop it
09:05:57 ok
09:06:22 what type of images?
09:06:34 oh, images == pictures
09:06:39 oh, heh
09:06:49 like diagrams in visio ;)
09:06:51 too early in the morning for me ;-)
09:06:57 cool!
09:07:16 I guess you also took a look at reveal.js, right?
09:07:26 or at least at Florian's talk
09:07:48 yes, I have watched Florian's talk, didn't have time to play with technology itself but surely do this week
09:07:52 ok
09:08:03 I was also working on our neutron-ha-tool OCF RA which we use in SUSE OpenStack Cloud 5 and 6 for neutron HA
09:08:26 I guess for the next release we will probably switch to the standard upstream approach
09:08:39 that's good
09:09:01 yeah
09:09:08 or at least in the DVR case
09:09:35 oh
09:09:43 I just remembered
09:09:48 hm?
09:10:06 last Tuesday was the cross-project meeting
09:10:20 that was last week, right?
09:10:28 I think so
09:10:37 I guess most people read the minutes / logs the next day
09:10:41 hi
09:10:43 but I should mention it for completeness
09:10:46 oh hey bogdando :)
09:10:51 hello :)
09:11:11 so we talked about the auto-evacuation spec
09:11:30 #link https://review.openstack.org/#/c/257809/2
09:11:44 there seemed to be a LOT of interest in the topic
09:12:06 and I explained why I thought it was cross-project
09:12:43 however it was decided that whilst this is important work, a high-level cross-project spec doesn't really make sense
09:13:01 I think specs are supposed to have concrete low-level action items
09:13:12 rather than be technical strategy documents
09:13:26 hi
09:13:30 so I think the spec will be abandoned
09:13:35 hi masahito :)
09:13:52 I was just summarising the cross-project meeting
09:14:28 got it. I'll check eavesdrop.
09:14:33 http://eavesdrop.openstack.org/meetings/crossproject/2016/crossproject.2016-03-15-21.01.log.html
09:14:47 http://eavesdrop.openstack.org/meetings/crossproject/2016/crossproject.2016-03-15-21.01.html
09:15:01 any action items to the spec decomposition then?
09:15:04 #info the conclusion was: keep working on it and collaborating with the different projects
09:15:09 we shall not just abandon and forget :)
09:15:34 #info and submit specs to individual projects when there are APIs missing
09:16:30 bogdando: well I guess until now, each group was working on their individual solutions
09:16:44 and looking to the comments, I believe it was a good idea to make it CP
09:16:54 but maybe we need to start collaborating on the convergence
09:17:06 otherwise we'd never have collected so many comments from folks from different areas / projects
09:17:11 aspiers: I think still some groups working on their solutions
09:17:22 ddeja: yes, I think we all are
09:17:36 ddeja: maybe it is still too early
09:17:46 but we could start thinking about convergence action items
09:18:19 so, CP is eactly perfect place for the *high-level* concepts
09:18:26 #topic how to evaluate possible convergence strategies for auto evac
09:18:32 exactly
09:18:53 bogdando: high-level architectural discussion is cross-project for sure
09:19:04 but apparently specs are not the way to do it
09:19:38 maybe because that reaches too wide an audience
09:19:43 So I'd prefer we finish that instead of allowing the initiative to be split into detached local activities
09:19:59 I think a wiki or etherpad is fine
09:20:06 and mailing list / IRC
09:20:21 but only spec provides a path for future code reviews...
09:20:24 or we could move the spec somewhere else, e.g. mistral
09:20:39 several people said they thought it looked like a mistral spec
09:20:54 maybe yes
09:21:10 aspiers: from my experience with Mistral I don't think such spec suits in it
09:21:27 ddeja: yeah I'm not convinced by that either
09:21:39 since (despite some bugs) Mistral is only a Workflow executor
09:21:59 but there was some plans of having 'very good, reliable workflows' in Mistral repo
09:22:00 Sean Dague said "A cross project spec should either be a thing which affects nearly all openstack projects, or a thing that all the projects involved have agreed to already"
09:22:01 yes, we cannot design things like fencing there
09:22:13 or nova API changes, if any
09:22:35 bogdando: I think the point with nova API changes was that currently it is believed to do everything we need
09:22:52 so in that sense, nova is not involved until we find a bug or a gap in functionality
09:22:54 bogdando: for now, fencing is done outside OpenStack, so I don't know if we should discuss fencing at all
09:23:04 so we may end up having 3 specs - Mistral (if anything to be changed in the Mistral), Nova (ditto), OpenStack resource agents space ?
09:23:23 IMO we should just state that fencing must be configured
09:23:32 ddeja: that's a good point - fencing will probably never be inside OpenStack (unless using Ironic / Triple-O?)
09:23:43 yes, we can just note and leave out of scope
09:23:57 but fencing is a crucial part of the architecture of course
09:23:57 yeah, also we can provide OCF agent for Mistral
09:24:20 ddeja: that's a great example of something which deserves a spec in openstack-resource-agents :)
09:24:21 * ddeja has OCF agent that calls mistral API prepared
09:24:25 indeed, new OCF RA for Mistral fits well into the latter of 3 specs
09:24:27 which reminds me I need to set up a specs repo
09:24:37 #action aspiers to set up a specs repo for openstack-resource-agents
09:25:05 and the rest things might be just put to the HA guide
09:25:07 (we also need that for planning Fuel reconvergence)
09:25:23 bogdando: hmm, I think it's maybe too WIP for the HA guide now?
09:25:24 like - make sure you configured the pcmk that way, and enabled fencing
09:25:36 oh, you mean just fencing?
09:25:43 yes, if we have a clear vision, nothing blocks us
09:25:59 everything you mentioned would not go as a part of OpenStack setup/op
09:26:16 since everyone is using pacemaker for fencing, it can be mentioned in HA guide
09:26:22 TBH I think we need an architecture diagram
09:26:30 we have a section for controllers HA setup
09:26:36 which maps all the required components of auto-evac
09:26:44 +1
09:26:45 we could add there all details near to the existing pacemaker/corosync sections
09:26:57 maybe we could use google drive to collaborate on drawing on?
09:27:00 *one
09:27:05 and add details how one shall configure pcmk remote for computes HA, for example
09:27:36 or some other tool which is like "ether-visio"
09:27:36 and we can add diagrams
09:27:40 we have many now :)
09:28:14 ddeja: I think we need this map for our talk anyway ;-)
09:29:03 we have bright examples Pacemaker Cluster Manager http://docs.openstack.org/ha-guide/intro-ha-arch-pacemaker.html and Keepalived http://docs.openstack.org/ha-guide/intro-ha-arch-keepalived.html architecture details and limitations
09:29:08 aspiers: you mean something like that https://github.com/gryf/mistral-evacuate/blob/master/Automatic%20evacuate%20design.jpg
09:29:15 bogdando: are there diagrams in the ha-guide?
09:29:20 bogdando: oh... thanks :)
09:29:22 :)
09:29:47 so it looks like we have enough diagrams ;)
09:30:02 bogdando: is there a standard way to produce diagrams for upstream docs?
09:30:10 but we need ad more specific to the Instance ha + pacemaker remote now
09:30:15 I have no idea ;(
09:30:24 let's ask openstack docs folks
09:30:31 good idea
09:30:38 the neutron docs have tons of great docs
09:30:44 I wonder how they collaborate on them
09:30:45 or Andrew Beekhof, who probably created those above
09:30:47 :)
09:31:51 what i do?
09:31:54 Google Drawings is probably the easiest way
09:31:58 beekhof: it's all your fault!
09:32:08 so do we agree we can start working on the docs update w/o waiting for accepted implementations?
09:32:19 as it seems 100% to be pcmk_remote with OCF RA
09:32:21 beekhof: out of nowhere!
09:32:24 :)
09:32:39 * beekhof was cooking dinner - it's been one of those days
09:32:42 beekhof, those? Pacemaker Cluster Manager http://docs.openstack.org/ha-guide/intro-ha-arch-pacemaker.html and Keepalived http://docs.openstack.org/ha-guide/intro-ha-arch-keepalived.html architecture details and limitations
09:33:00 yep, i made those
09:33:00 bogdando: what do you mean by docs update? a new section in ha-guide?
09:33:06 yes
09:33:23 to cover everything's missing to the setup required for the Instances HA
09:33:36 and in that we all 100% sure
09:33:58 fencing is such thing
09:34:00 like, pacemaker_remote, fencing, Mistral OCF RA
09:34:21 I find good section.
09:34:26 http://docs.openstack.org/ha-guide/compute-node-ha-api.html
09:34:30 exactly
09:35:01 bogdando: well, are we 100% sure on Mistral? for me it is a very good option, but I think we still have to do a lot of work and testing to be 100%
09:35:15 we can skip Mistral then
09:35:22 that's still implementation details
09:35:33 I think for now, only architecture should be covered in ha-guide
09:35:38 those diagrams were made in keynote (apple application like powerpoint)
09:35:56 hm, I'm only trying to find a place for things will not go to any specs but still must be known (how-to)
09:35:56 i can probably export it into powerpoint which google docs might import
09:36:12 bogdando: yes, that is the challenge we need to figure out :)
09:36:21 aspiers: as far as I know triple-O guys are willing to use Mistral
09:36:31 bogdando: I think probably a wiki
09:36:47 bogdando: unless we want to use gerrit for review
09:37:22 do we agree on adding pacemaker_remote topics to the HA guide compute nodes HA?
09:37:35 it seems like 100% will go into the final solution
09:37:53 bogdando: Just pacemaker_remote?
09:37:53 and fencing details!
09:37:55 bogdando: +1
09:38:13 for now I see no alternative - maybe Ironic would be able to do fencing someday...
09:38:36 agreed
09:38:58 bogdando: I think it would also be good to document that compute HA is still WIP
09:39:06 bogdando: the ha-guide could point to our community here
09:39:13 btw, aspiers I saw you asked about remote stonith agents, so maybe you could add some things you know now :)
09:39:20 bogdando: link to etherpad, weekly meetings etc.
09:39:32 aspiers: Big +1 on that
09:39:37 great then
09:39:42 it sounds nice.
09:40:00 bogdando: can you take care of that? and add us as reviewers?
09:40:22 If I had a time to play with setup verification
09:40:36 maybe we could attract more people to our community that way
09:40:56 would be nice though someone who already did just shared results and notes
09:41:07 so I could expose them as the guide (and test them as well)
09:41:28 bogdando: I think the wiki is the right place to link to them
09:41:32 aspiers: agree. I noticed I can't find our etherpad on google.
09:41:35 since they are WIP and change quickly
09:41:45 bogdando: you mean steps how to configure pacemaker_remote?
09:41:53 we can also use wiki for evolving arch/design docs
09:41:53 and fencing of compute nodes?
09:42:13 yes, and fencing agents to use with computes probably (w/o devices specific things)
09:42:31 bogdando: there's not much to say about stonith of remote nodes, it's all documented already
09:42:41 bogdando: I just failed to find the docs the first time
09:42:55 well I'd like to put only the very specific things, no cross posting
09:43:18 or a link if that just works the way we want
09:43:44 the idea is to document something verified, even if as PoC
09:44:17 I believe that the way RH explained how to setup stonith is OK
09:44:22 https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Configuring_the_Red_Hat_High_Availability_Add-On_with_Pacemaker/ch-fencing-HAAR.html
09:44:52 folks, it should be not just a common things, but specific to OpenStack computes and pacemaker_*remote*
09:45:15 with existing limitations like stonith resources may not be running on the compute nodes
09:45:25 or which type of remote fencing agents to use
09:45:32 that we *do* recommend
09:45:47 ok, I see
09:46:04 I don't think there is a need to favour any type of fencing agent
09:46:11 like maximum of nodes supported in the cluster (16 afaik)
09:46:15 any is good
09:46:21 so, more practical things
09:46:26 useful for ops
09:47:14 so that would not be just a reference to the existing guides AFAICT
09:47:16 bogdando: in the period before we have a working solution to document, what do you think is the goal of this section in the ha-guide?
09:47:36 I'm not sure it makes sense to document how to do 20% of a full solution
09:47:41 the goal is to make community prepared for Instances HA solutions
09:48:01 and shed some light to required tooling, setups
09:48:15 there are not much things about pacemaker remote IIUC
09:48:28 only few tech talks, and many things WIP, am I right?
09:48:31 bogdando: I have a setup with fencing configured, but it's based on this https://access.redhat.com/articles/1544823 The only things I've added are custom fencing agent (and Mistral of course)
09:48:35 IMHO it's fine to say "you will need pacemaker_remote"
09:48:43 and maybe link to docs on pacemaker_remote
09:48:57 but I'm not sure it makes sense to document details of how to set it up
09:48:59 okay, let's just try to draft something...
09:49:14 bogdando: yeah, please just submit a review and add us to cc
09:49:30 ok I'll try :)
09:49:30 bogdando: I will definitely review anything you cc me on :)
09:49:39 oops, that was a dangerous promise ;)
09:49:56 aspiers: prepare to review some fuel patches ;)
09:50:03 lol
09:50:19 haha
09:50:23 bogdando: I think the most important thing to add is info on our WIP
09:50:49 bogdando: i.e. http://eavesdrop.openstack.org/#High_Availability_Meeting
09:50:56 https://etherpad.openstack.org/p/automatic-evacuation
09:51:12 and maybe we need a wiki page which is a more friendly landing page for this topic
09:51:23 bogdando: also you could link to the user story
09:51:38 although that is the first link in the etherpad :)
09:52:46 #action bogdando to submit an ha-guide review adding info about community WIP on auto-evac
09:53:24 shall we also have a play with google drawing?
09:53:56 https://docs.google.com/drawings/d/1q50txuu3vVx2WadhWGAeSO25PaEy_DbmwhVryT4FXCY/edit?usp=sharing
09:54:14 #action anyone who wants to, to experiment with google drawing
09:54:29 I think an architecture map would really help us
09:55:09 I can paste there drawing I have prepared for my demo to have something to start with
09:55:16 ok
09:55:19 ddeja, great!
09:55:29 #topic AOB
09:55:33 It's Mistral-oriented, but it's still something to start
09:55:33 any other business before we finish?
09:55:59 wait one minute guys and tell if my drawing makes any sense ;)
09:56:33 Done, pasted
09:57:45 I'll paste Masakari architecture.
09:57:57 in the page.
09:58:07 but I don't know if it is anywhere close to what you guys have in mind :)
09:58:16 well, hi \o/ being a _bit_ late, sorry :p
09:58:26 haha hi haukebruno, we are just finishing :)
09:58:30 I think it's easy to compare both and others.
09:58:42 so feel free to delete it, I have it in another docs
09:58:46 masahito: +1
09:58:53 good point
09:58:54 aspiers, yeah. I apologize... pretty bad timings today
09:59:10 I just wonder if we can add second page in this drawing?
09:59:10 ddeja: I was thinking we need a process diagram
09:59:39 ddeja: I'll try to sketch something so it makes more sense
10:00:03 something a bit like https://github.com/ntt-sic/masakari/blob/master/contents/architecture.png
10:00:17 but more generic
10:00:24 not specific to any implementation
10:00:28 anyway, we are out of time
10:00:29 I see
10:00:36 let's continue on #openstack-ha
10:00:46 thanks all!
10:00:55 #endmeeting
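Background on the status-action point raised during the status round (bug 5271). Pacemaker determines the state of an OCF resource from the exit code of the agent's monitor action: 0 (OCF_SUCCESS) means running and 7 (OCF_NOT_RUNNING) means cleanly stopped. An agent should return 3 (OCF_ERR_UNIMPLEMENTED) for any action it does not support. What follows is a minimal Python sketch under those assumptions, not code from openstack-resource-agents. It shows action dispatch that implements only monitor and leaves status unmapped. The pid-file path and the is_running() liveness check are hypothetical placeholders. Real agents are usually shell scripts that also implement start, stop and meta-data, and read their parameters from OCF_RESKEY_* environment variables.

```python
import os

# Exit codes defined by the OCF resource agent API (subset used here).
OCF_SUCCESS = 0
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7

# Hypothetical pid file of the managed service (placeholder path).
PID_FILE = "/var/run/example-service.pid"


def is_running(pid_file: str) -> bool:
    """Hypothetical liveness check: does the pid file name a live process?"""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        # Missing or unreadable pid file: treat the service as stopped.
        return False
    if pid <= 0:
        return False  # 0 or negative values would address process groups
    try:
        os.kill(pid, 0)  # signal 0 only tests whether the process exists
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but is owned by another user
    return True


def monitor(pid_file: str) -> int:
    # 0 tells Pacemaker the resource is running, 7 that it is cleanly stopped.
    return OCF_SUCCESS if is_running(pid_file) else OCF_NOT_RUNNING


def dispatch(action: str, pid_file: str = PID_FILE) -> int:
    # Only monitor is wired up in this sketch. "status" is deliberately absent,
    # so, like any other unknown action, it yields OCF_ERR_UNIMPLEMENTED.
    handlers = {"monitor": monitor}
    handler = handlers.get(action)
    if handler is None:
        return OCF_ERR_UNIMPLEMENTED
    return handler(pid_file)
```

Pacemaker passes the action name as the agent's first argument and reads the result from its exit status. A standalone agent built this way would therefore finish with something like sys.exit(dispatch(sys.argv[1])).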