09:01:23 #startmeeting ha
09:01:24 Meeting started Mon Dec 7 09:01:23 2015 UTC and is due to finish in 60 minutes. The chair is aspiers. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:01:25 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:01:27 The meeting name has been set to 'ha'
09:01:40 OK hopefully this is the right channel now ;-)
09:01:53 yes, hello :-)
09:01:59 o/
09:02:02 Welcome everyone - again :-)
09:02:12 o/
09:02:22 Maybe I should have mentioned that this meeting time is Monday morning for me, so sometimes my brain will not work ;-)
09:02:40 howdy
09:02:48 <_gryf> hey
09:03:19 #topic Current status (progress, issues, roadblocks, further plans)
09:03:55 _gryf/ddeja: you want to give any updates?
09:04:18 I have a working fix for the bug in mistral, but I'm struggling with the tests
09:04:26 ok
09:04:46 but another guy from Mistral has proposed another (probably better) fix
09:04:57 <_gryf> from my side - nothing in the ha area
09:05:04 so I hope to have it resolved this week
09:05:09 ok cool
09:05:25 I'll be reviewing it and will start hardening the PoC solution
09:05:44 ddeja: sounds good!
09:05:49 masahito: anything from your side?
09:06:09 I checked 2 things last week. i) does Masakari work with pacemaker-remote without any changes? ii) is it easy to replace MySQLdb with sqlalchemy?
09:06:25 i) We need some work to use pacemaker-remote: changing the source of host status for the hostmonitor process. Masakari parses the output of the crm_mon command to check which hosts are online and offline. aspiers suggested that an up-to-date crmsh is better.
09:06:40 but I hit a problem in crmsh when I checked it. A remote node appears in 'crm node list', but when pacemaker-remote goes down the command doesn't say the remote node is offline. I used pacemaker 1.1.10 on ubuntu14.04. If the version is wrong for testing, please let me know.
09:06:52 ii) replacing MySQLdb with sqlalchemy isn't difficult but needs some work.
09:07:01 1.1.100 is old
09:07:05 1.1.10 I mean
09:07:24 above is my update :)
09:07:26 crmsh wont work for RH
09:07:54 beekhof: which tools should I use?
09:08:07 well RH would love you to use pcs
09:08:07 but
09:08:14 maybe crm_mon --xml ?
09:08:27 ubuntu doesn't have it.
09:08:36 beekhof: why not pcs? I'm using it and it's OK?
09:08:53 ddeja: pcs only exists on RH IIUC
09:08:58 ddeja: because that will make suse just as unhappy
09:09:05 Oh, OK
09:09:13 (and Ubuntu)
09:09:33 masahito: i'd worry about using pacemaker remote on a version old enough not to have --xml
09:09:46 sorry, --as-xml
09:09:57 masahito: you should probably use 1.1.13 at least
09:10:09 at least
09:10:17 beekhof: ok, I'll try it with latest one.
09:10:21 lots of work went into it this year
09:10:50 lots of scary, "how did this ever work" kind of fixes
09:11:15 crm_node should work cross-distro, but IIRC it doesn't work on remotes unless they have a node attribute set
09:11:16 <_gryf> masahito, what operating system do you use?
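The `crm_mon --as-xml` suggestion above could replace parsing of the human-readable `crm_mon` output. A minimal Python sketch follows, assuming the XML lists each cluster node as a `<node>` element under `<nodes>` with `name`, `online` and `type` attributes. The sample document is hand-written for illustration, not captured from a real cluster, and the function name `node_status` is hypothetical; real output carries more attributes and may differ between Pacemaker versions.

```python
import xml.etree.ElementTree as ET

# Hand-written sample (an assumption, not real cluster output).
SAMPLE_XML = """<crm_mon version="1.1.13">
  <nodes>
    <node name="controller1" id="1" online="true" type="member"/>
    <node name="compute1" id="compute1" online="false" type="remote"/>
  </nodes>
</crm_mon>"""


def node_status(xml_text):
    """Map each node name to True (online) or False (offline)."""
    root = ET.fromstring(xml_text)
    # Only look at <nodes>/<node>; <node> elements can also appear
    # elsewhere in the document (e.g. under resources).
    return {
        node.get("name"): node.get("online") == "true"
        for node in root.findall("./nodes/node")
    }


if __name__ == "__main__":
    for name, online in sorted(node_status(SAMPLE_XML).items()):
        print(name, "online" if online else "offline")
```

In a host monitor, the sample string would be replaced by the captured output of the command discussed above. Whether remote nodes are reported reliably depends on the Pacemaker version, as noted in the discussion.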
09:11:44 _gryf: ubuntu
09:11:57 <_gryf> masahito, oh, i see, ubuntu lts
09:12:34 _gryf: currently we use ubuntu14.04 because of lts
09:12:34 I should try to summarise some of this for the minutes
09:13:01 #action ddeja to review alternate solution to mistral bug
09:13:13 #action ddeja to continue hardening mistral PoC
09:13:43 #info masahito is now working on integrating pacemaker_remote into masakari
09:13:59 #info masahito is now working on replacing MySQLdb with sqlalchemy
09:14:00 aspiers: it should work
09:14:20 #action masahito will see if a newer pacemaker solves some of his issues
09:14:47 beekhof: it works, but remotes are missing from one part of the CIB until a node attribute is set on them
09:15:01 beekhof: therefore IIRC, crm_node -l only includes remotes with node attributes
09:15:10 but I could be remembering the details totally wrong
09:15:17 since it is still early on Monday
09:15:39 i think i encouraged ken to change that
09:15:49 that would be nice - it did seem a bit inconsistent
09:15:52 should be reliable in 1.1.14
09:16:16 beekhof: any updates from your side?
09:16:29 ha. yeah
09:16:56 so remember all my problems last week? yeah, someone pointed fencing at the undercloud instead.
09:17:07 LOL
09:17:31 there are still some fixes to be had, and i'd like to co-ordinate with you on the host/nova name mappings
09:17:39 yes please
09:17:45 that was the bit I had the most problems with
09:17:46 see if we can get something that works for both of us
09:18:07 yep, i still have your diff open
09:18:23 I have a whole page of notes from when I was reverse-engineering that
09:18:36 I should share it
09:18:50 i should have documented it :)
09:18:58 that would be nice :)
09:19:01 but it wasn't just you
09:19:21 a lot of the mapping stuff is quite sparse on docs
09:19:46 e.g. pcmk_host_map, how port gets set for fencing agents etc.
09:19:55 yeah
09:19:59 #action beekhof and aspiers to discuss host/nova name mappings
09:20:01 too little time
09:20:04 right
09:20:07 i updated RH on the status here, and someone raised a concern that Congress might be flatlining
09:20:11 anyone want to comment on that?
09:20:31 it doesn't look like that based on the meeting minutes I read last week
09:20:50 but I wanted to discuss Congress as a separate topic anyway
09:21:09 <_gryf> beekhof, you mean, Congress would be discontinued?
09:21:24 so let me first just say I don't have any significant updates from my side since last week was our team meeting
09:21:35 #topic Congress and potential use in HA
09:21:36 someone basically sent this link ( http://stackalytics.com/?module=congress-group&metric=commits&release=liberty ) and said see, vmware isn't interested anymore
09:21:48 presumably vmware was the original project champion?
09:22:05 hi, I'm also contributing to Congress.
09:22:10 beekhof: are you mentioning Congress due to what I posted on #openstack-ha last week?
09:22:17 _gryf: that was the suggestion
09:22:25 aspiers: i dont recall who suggested it
09:22:29 it was me
09:22:29 but yeah
09:22:34 ok
09:22:46 they already have a use case for triggering nova evacuation workflows from Congress
09:23:03 #info Congress project has documented a use case for triggering nova evacuation workflows from Congress
09:23:07 use case == reason or use case == code?
09:23:14 reason IIUC
09:23:30 link?
09:23:54 I don't know what happened in ha, but maybe I can explain from the Congress side.
09:24:09 #link https://docs.google.com/document/d/1ExDmT06vDZjzOPePYBqojMRfXodvsk0R8nRkX-zrkSw/edit#
09:24:21 to be up front, i don't much care who does the triggering. but the division of labor seemed sane
09:24:28 the use case is missing from the ToC
09:24:43 but it's just after the "evacuation of tenants for planned outage" section
09:25:17 I also talked to one of our guys who is following Congress
09:25:26 and found out that they are talking about integrating it with Mistral
09:26:05 so it could watch out for the attribute set by fence_compute, and then initiate evacuation via Mistral based on cloud-specific policies
09:26:27 which sounds really nice because each cloud will have different ideas of how to set SLAs for pets
09:26:40 would we bother with fence_compute? other than to tell nova that the node is down?
09:27:10 once we've told nova, then congress should automagically know what to do right?
09:27:38 beekhof: I guess it depends on the details
09:27:50 doesn't everything :)
09:27:56 From the document that aspiers linked, it seems that fence_compute should also tell congress
09:28:01 if congress is too slow to notice for some reason, the compute host could bounce back up without any action taken
09:28:07 also possible
09:28:10 but maybe masahito knows the details?
09:28:36 yap
09:28:46 but I really like the idea of allowing flexible policies via Congress, e.g. some might want to do it with availability zones, others per-tenant, etc...
09:28:51 aspiers: so was the question just "what do you think of including it?" ?
09:29:11 beekhof: yeah pretty much
09:29:25 I think there's still quite a bit of work to do, e.g. the mistral driver for congress
09:29:28 masahito: how speedy would congress be for this? seconds? minutes?
09:29:40 beekhof: depends on config.
09:30:08 beekhof: currently, congress polls the Nova API every 10s by default. but
09:30:17 Congress team were originally planning to add Mistral integration for liberty
09:30:23 but AFAIK it hasn't started yet
09:30:44 #link https://wiki.openstack.org/wiki/PolicyGuidedFulfillmentLibertyPlanning_Remediation is an example quite similar to the use case we are discussing
09:30:48 aspiers: i think phase 1 == confirm if mistral is the right path + implement, phase 2 == decide if triggering via congress is the right path
09:31:01 In the Mitaka release, I'll implement a new feature so that Congress can receive notifications from other services.
09:31:03 beekhof: violently agree :)
09:31:16 masahito: ah, excellent
09:31:25 masahito: very nice :-D
09:31:52 #info masahito is planning to enable Congress to receive external notifications in mitaka
09:32:13 there is already a BP.
09:32:33 #link https://blueprints.launchpad.net/congress/+spec/push-type-datasource-driver
09:32:35 ah cool
09:34:35 On the other hand, the use case aspiers mentioned above was suggested by others, so it's not under discussion now.
09:35:13 I will try to attend Congress / Mistral IRC meetings when possible
09:35:30 to stay up to date and represent our interests
09:35:48 aspiers: did you want to move on to our fun email thread?
09:35:52 but I'm pretty busy so hopefully others can too
09:35:59 I'm trying to cover Mistral meetings
09:36:01 beekhof: yes I was about to suggest that :)
09:36:06 ddeja: great!
09:36:14 but i got in first so ,,!,, :)
09:36:17 #topic future direction of OCF RAs
09:36:21 haha :)
09:36:30 so, I started a discussion with beekhof
09:36:35 on IRC
09:36:47 and then it turned into a private mail thread, which we should probably avoid in future :)
09:36:49 but then i needed to cook dinner
09:37:00 and you logged off
09:37:04 right :)
09:37:05 beekhof: LOL
09:37:20 I suggested the idea of converting the RAs to wrap around service(8)
09:37:25 and beekhof doesn't like that at all :)
09:37:35 we're currently working towards a consensus
09:37:35 and everyone agreed it was a terrible idea. the end. good night
09:37:39 haha ;-)
09:37:47 <_gryf> aspiers, serviced, or just any other init system?
09:37:57 _gryf: that's one of the key points
09:38:10 my main point was that each distro potentially has a different way of managing services
09:38:16 most are on systemd by now
09:38:17 <_gryf> s/serviced/systed/
09:38:19 but I guess not all
09:38:24 e.g. Ubuntu LTSS
09:38:27 i'm no systemd fan, but it won, we should move on
09:38:29 <_gryf> darn, systemd :)
09:38:50 well, at least until people start using pacemaker_remoted as pid one
09:39:05 <_gryf> ubuntu 14.04 has only logind, but still uses upstart
09:39:16 <_gryf> ubuntu 16.04 will have systemd
09:39:25 sorry, s/LTSS/LTS/
09:39:38 _gryf: so still a few months out
09:39:55 14.04 LTS is supported until late 2019
09:40:11 <_gryf> beekhof, right, and we have to count existing setups in
09:40:17 LALALALALAL there is only RHEL
09:40:18 beekhof: so you are suggesting we should drop support for that?
09:40:51 IMHO even after U16.04 we should still assume that some other OS may come without systemd
09:40:54 beekhof: in theory you and I should care the same amount about Ubuntu LTS ;-)
09:40:55 i probably wouldnt base a forward looking solution around not having systemd if thats what you mean
09:41:10 ddeja: that's right
09:41:45 in any case, we should just use OCF scripts. problem solved
09:41:47 <_gryf> ddeja, besides slackware and gentoo are there any other significant distributions that do not use systemd?
09:42:12 now might be a good time to mention the https://wiki.openstack.org/wiki/Rpm-packaging project
09:42:30 now comes with useful status monitoring! free with the first 100 orders
09:42:35 (sorry, its late, i get silly)
09:42:55 <_gryf> beekhof, lol
09:43:03 the likelihood of converging on the same systemd service descriptions for OpenStack services for rpm-based distros depends on the success of that project
09:43:17 but it's actually expanding beyond rpm to deb
09:43:36 one of our guys (Dirk Mueller) is the PTL and he was talking to Mirantis etc. in Tokyo
09:43:51 although it's early stages
09:44:14 anyway, my point is, even if everybody uses systemd, can we rely on the services having the same names etc.?
09:44:31 or doing the same things
09:44:36 nope. do we need to?
09:44:41 that's the next question
09:44:53 e.g. if Pacemaker is controlling keystone, should systemctl status openstack-keystone still work reliably?
09:45:00 I would say yes
09:45:15 of course systemctl start/stop are a very bad idea in those scenarios
09:45:17 not necessarily
09:45:28 its arguably better if it doesnt
09:45:45 I'll agree with beekhof
09:45:46 because then people will try start/stop via systemctl too
09:45:59 with pacemaker we have one point to control all services
09:46:00 we know this because it already happens :-(
09:46:01 beekhof: you mean, so deliberately treat HA services as distinct from non-HA ones?
09:46:07 yes
09:46:09 absolutely
09:46:13 or
09:46:33 figure out a way to redirect systemctl X to the cluster
09:46:37 we're close
09:46:47 but of course lacking the time to implement it
09:46:52 ok, but then the problem is: how do we ensure that the RAs work cross-distro?
09:46:56 there are a couple of ways to go
09:47:03 which again comes back to packaging
09:47:38 are the binaries deployed to such different locations?
09:47:45 running as completely different users?
09:48:11 beekhof: I would be surprised if everything was identical across all distros
09:48:33 IIRC, one difference was whether to prefix binaries/users/groups etc. with "openstack-"
09:48:37 we handle apache as an RA and the binary names aren't even consistent
09:48:57 across distros i mean
09:48:58 that's true
09:49:07 httpd vs. apache
09:49:39 HTTPDLIST="/sbin/httpd2 /usr/sbin/httpd2 /usr/sbin/apache2 /sbin/httpd /usr/sbin/httpd /usr/sbin/apache $IBMHTTPD"
09:49:44 wow, that's pretty ugly :-/
09:50:02 DEFAULT_IBMCONFIG=/opt/IBMHTTPServer/conf/httpd.conf
09:50:04 DEFAULT_SUSECONFIG="/etc/apache2/httpd.conf"
09:50:06 DEFAULT_RHELCONFIG="/etc/httpd/conf/httpd.conf"
09:50:15 this is exactly what I would like to avoid
09:50:16 i'm not saying the community shouldnt standardize on something
09:50:38 the differences should be handled by the vendor packages, not by hardcoding them in the upstream RAs
09:50:48 but a) i dont think we need to drive it and b) i dont think its a requirement
09:50:54 e.g. the above breaks if a distro changes locations between major versions
09:51:00 or a reason not to do an OCF RA
09:51:20 do we really want to go down the road of having 'case "$SUSE_VERSION" in' stuff inside each RA?
09:51:38 no
09:52:05 I don't see another way, unless the RA delegates distro-specific decisions to something external
09:52:07 but i dont think the DEFAULT_ stuff is necessary
09:52:07 or
09:52:11 could be done better imho
09:52:27 OK, would you like to write up a proposal?
09:52:34 no huge rush, obviously
09:52:42 but probably doesn't make sense to delve into details here
09:53:07 i think the sticking point, is that everyone wants to have their own defaults
09:53:19 yes, and that's exactly why I proposed delegation
09:53:24 for all i care, they can be the suse values and our installer can override them
09:53:43 that's assuming that everything is parametrized
09:53:43 or vice versa
09:53:58 they usually need to be anyway
09:54:01 so we could make binary/config/pid file locations parametrized
09:54:10 better idea
09:54:17 lets decide what they should be
09:54:25 OCF_RESKEY_binary
09:54:32 It sounds good.
09:54:33 and anyone that gets it wrong has to rely on their installer
09:54:46 until they fix their packaging :)
09:55:21 It's still duplicating a bunch of stuff which exists in systemd service descriptions
09:56:00 that doesn't help convergence though
09:56:09 write something that pulls those values out?
09:56:15 3 lines of shell?
09:56:30 not sure I follow
09:57:16 grep $field $unit | awk -F= '{print $2}'
09:57:42 that seems a lot more cumbersome than simply wrapping systemctl
09:57:46 that would be the default "default default"
09:57:59 except it has one key advantage
09:58:07 i might actually work
09:58:09 it might actually work
09:58:17 systemd == nightmare
09:58:26 OK, I'm beginning to learn you don't like systemd :)
09:58:31 anyway we're out of time for now
09:58:44 let's continue this on IRC / openstack-dev@
09:58:48 k
09:58:58 any other topics in the remaining seconds?
09:59:11 <_gryf> i guess no
09:59:20 I think we sent everyone else to sleep with our RA talk ;-))
09:59:24 next week, I want us to start thinking about milestones for HA.
09:59:38 masahito: good idea, I'll put it on the agenda!
09:59:47 aspiers: thanks!
09:59:49 ok thanks all, and see you next week, or sooner on IRC!
10:00:05 #action put roadmap on agenda for next meeting
10:00:21 thanks, bye
10:00:24 #endmeeting
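The parametrization idea from the OCF RA topic (an explicit `OCF_RESKEY_binary` parameter, with a "default default" pulled from the systemd unit by something like the `grep $field $unit | awk -F= '{print $2}'` one-liner) might look roughly like the Python sketch below. The unit file content, the paths in it, and the helper names `unit_field` and `resolve` are all hypothetical; only the `OCF_RESKEY_` prefix, which is how OCF passes resource parameters to an agent through the environment, comes from the discussion.

```python
import os

# Hypothetical unit file content; real units vary by distro and package.
SAMPLE_UNIT = """[Unit]
Description=OpenStack Identity service

[Service]
User=keystone
ExecStart=/usr/bin/keystone-all
PIDFile=/run/keystone/keystone.pid
"""


def unit_field(unit_text, field):
    """Return the value of the first `field=value` line, or None."""
    for line in unit_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == field:
            return value.strip()
    return None


def resolve(param, field, unit_text, env=os.environ):
    """Prefer an explicit OCF_RESKEY_<param>; else fall back to the unit."""
    return env.get("OCF_RESKEY_" + param) or unit_field(unit_text, field)


if __name__ == "__main__":
    print(resolve("binary", "ExecStart", SAMPLE_UNIT, env={}))
    print(resolve("binary", "ExecStart", SAMPLE_UNIT,
                  env={"OCF_RESKEY_binary": "/opt/bin/keystone"}))
```

Unlike the `grep` one-liner, this matches the key exactly rather than as a substring, and it keeps everything after the first `=`. A real `ExecStart` value may also carry arguments or prefix characters, so a resource agent would probably need further parsing before treating it as a binary path.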