04:02:54 #startmeeting Masakari
04:02:55 Meeting started Tue May 19 04:02:54 2020 UTC and is due to finish in 60 minutes. The chair is tpatil. Information about MeetBot at http://wiki.debian.org/MeetBot.
04:02:56 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
04:02:58 Hi All
04:02:59 The meeting name has been set to 'masakari'
04:03:16 Sampath san should be joining any time
04:03:24 Roll call?
04:03:52 hi
04:03:58 Hi all, sorry I'm late
04:04:04 suzhengwei: Hi
04:04:09 samP: Hi
04:04:16 o/
04:05:41 meeting started, right?
04:05:44 samP: Yes
04:05:55 tpatil: Thanks
04:06:14 samP: Please go ahead and start the topic discussions
04:06:19 sure
04:06:52 #topic Victoria work items
04:07:08 First let's take a look at what's left for Victoria
04:07:19 #link : https://etherpad.opendev.org/p/masakari-victoria-workitems
04:07:32 I have moved the work items from Ussuri to the above etherpad
04:07:39 tpatil: thanks
04:08:31 nice
04:09:55 I have added "promotion for large scale host failure".
04:10:10 spec: Evacuate non-recovery ('HA_enabled = False') instances in shutoff status at host failure except specified tenant
04:10:32 This https://review.opendev.org/#/c/714615/1 looks like a pretty useful one
04:11:03 spec lgtm. I think we can review and merge the code soon.
04:11:33 noonedeadpunk: thanks, that is the code for the above spec, right?
04:11:40 yep
04:12:17 samP: I will review the spec and code this week
04:12:48 suzhengwei: Thanks for adding promotion for large scale host failure. Is this a feature or a problem statement? Could you please add more details on this?
04:14:24 I will post a spec about it later.
04:14:43 suzhengwei_: Thanks..
04:14:57 (2) Modify masakari-hostmonitor in order to run it inside container
04:15:39 There is one finding about systemctl command that's used in corosync command as well
04:16:42 sorry for small offtopic, I think I just missed a feature, but how does a failed host go to reserved state after recovery?
by setting post in host_rh_failure_recovery_tasks?
04:18:45 noonedeadpunk: Are you asking how recovery_method='reserved_host' works?
04:18:50 kst
04:19:20 not really. just in spec there is "When a failed host goes back to system as a reserved host"
04:19:22 noonedeadpunk: your question is, how does a failed host become a reserved host?
04:19:52 yep. as by default it does not become a reserved one after recovery
04:21:36 It should have finished the recovery workflow and rejoined the compute resource pool, then it can be updated to reserved via the API.
04:22:10 Ok, but it's not smth masakari does at the moment?
04:22:38 masakari doesn't have that feature now.
04:22:56 reserved hosts are set by the operator.
04:23:15 ah, ok I see. Just from the spec it seemed that it has one
04:23:24 yes
04:23:36 Btw it might be pretty useful and not really hard to implement I think..
04:24:10 You are allowed to set on_maintenance and reserved of the failed host once the VMs are evacuated from it. You can update the host to change the on_maintenance and reserved parameters.
04:24:46 This doesn't happen automatically, the operator will need to do it manually.
04:25:00 Are you saying this procedure should be automated?
04:25:26 oh, I see your point now. Like as you need to update on_maintenance manually anyway it's useless to set it to reserved
04:25:33 It is a nice feature to have. However, before you bring the failed host back to the cluster, you have to make sure it won't break again.
04:26:04 yeah, ok, it's fair
04:26:22 In current operations, we leave that part to the operator.
04:27:06 Moving back to running hostmonitor inside container
04:27:34 noonedeadpunk: if you have comments or questions, please feel free to add them to the spec
04:27:38 corosync itself uses systemctl command. I'm not able to find the link to the source code.
04:27:38 tpatil: sure
04:27:49 I will add it to the etherpad after this meeting
04:28:01 tpatil: sure, thanks
04:28:34 we do not have any bug or gerrit code review related to this, right?
04:29:31 No
04:31:19 tpatil: OK thanks. Let's see how we can proceed with this in the next meetings.
04:31:30 About, Command parameter support to segment list command to filter out segments based on host input parameter
04:31:50 samP: Sure. I will post all details in the etherpad this week.
04:31:51 we discussed this in past meetings and agreed on how to proceed.
04:33:21 Enable/Disable evacuation segment wise
04:34:05 suzhengwei_: Thanks, and sorry for the review delay.
04:34:07 so we should add a new REST API GET /hosts?host=xyz which shouldn't include segment_id, is that correct?
04:34:16 I would like to move this spec to victoria.
04:34:17 tpatil: correct.
04:34:28 samP: Ok
04:35:26 Add the victoria cycle spec first. https://review.opendev.org/#/c/723297/
04:35:36 suzhengwei_: sure, let's do the review and merge this in early Victoria. Then hopefully we can finish this feature in V
04:36:28 suzhengwei_: Thanks, I will merge this first..
04:36:43 OK
04:36:58 I have one question, what value should be set to "state"?
04:37:15 I want to split the implementation commit for easy review.
04:37:36 suzhengwei_: That would be really helpful.
04:37:39 IMO, it should be either ENABLED/DISABLED or ACTIVATED/DEACTIVATED, instead of False/True?
04:37:51 we use 'enable' already. If we use 'state', we would have an update problem.
04:38:19 imo boolean is better as a value
04:38:38 since it would have only 2 states
04:38:58 yes, a boolean value.
04:39:46 sorry, it's enable not state.
04:40:30 should it be called state/status, and the value can be as stated above
04:42:00 Like I really see no reason here to invent some naming for really boolean value. Also bool takes less space in db storage compared to varchar :p
04:42:19 this is not an important conflict.
But to me, if changed to 'state', my cloud would have to change its api and db table.
04:43:16 I would have to do some extra meaningless work.
04:43:50 enable should be changed to enabled at least
04:44:05 +1
04:44:31 Anyway, I will post this comment on the spec and later you guys can comment on it.
04:45:51 ok
04:47:24 both options are good for me. Let's discuss further on the spec.
04:48:26 suzhengwei_: please add more details for "promotion for large scale host failure", so we can discuss this in upcoming meetings.
04:48:43 ok.
04:49:26 #topic List of patches waiting for Victoria
04:50:10 Please add any patches you need to get merged in V.
04:50:18 I have reviewed a couple of patches and posted my comments
04:50:34 So I originally joined the meeting to talk about https://review.opendev.org/#/c/728629/
04:51:07 Today I have posted my comments. did you check these comments?
04:51:09 I read your comment tpatil just during the meeting
04:51:13 ok
04:51:17 noonedeadpunk: thanks for the patch
04:51:35 I will review it later.
04:51:59 So the thing is that hypervisors are used in the code only once, for checking hosts while adding them
04:52:20 And all further operations are done with the compute api
04:53:41 So like even if ppl configure pacemaker to use hypervisor names and add them to masakari - masakari would just fail while completing the action
04:54:11 As it won't be able to find the corresponding service to disable and evacuate
04:54:16 noonedeadpunk: I'm ok with your changes. just that, we need a migration command in masakari-manage to update the host from hypervisor_name to host
04:54:37 this is required for operators who were already using masakari in production env.
04:54:56 Yeah, I see, was just answering first comment:)
04:55:07 s/comment/question
04:55:36 time is running out, let's be quick.
04:55:52 noonedeadpunk: So operators basically use nova.services.host when they add a node in pacemaker cluster, is that correct?
04:56:01 Just might need some help with writing the migration thing
04:56:14 tpatil: they are supposed to at least
04:56:34 otherwise things do not work, from my experience
04:56:46 noonedeadpunk: sounds reasonable to me. Let's discuss further on gerrit.
04:56:53 ok
04:56:56 noonedeadpunk: Ok, I will help you to write a migration command in masakari-manage tool.
04:57:24 tpatil: like let me try doing it:) If I get stuck I'll ping you?:)
04:57:38 noonedeadpunk: Sure
04:58:16 sorry I was disconnected..
04:58:24 me, too.
04:59:08 So I originally tried to not touch much code to make the patch backportable
04:59:12 but yeah
04:59:22 samP: not wait when instance evacuate error, I have read your comment.
05:00:45 noonedeadpunk: I will review and add my comments on the patch. You will get the help you need for migration scripts
05:01:01 suzhengwei_: I think that is tpatil's comment, right?
05:01:07 Anyway no more time left.
05:01:15 suzhengwei_: yes
05:01:23 sorry :)
05:01:35 Let's discuss further on ML and gerrit. Thank you all for joining the meeting today
05:01:47 tpatil: could you please end the meeting.
05:02:03 suzhengwei_: NP, if you have any questions, I'm available on openstack-masakari IRC
05:02:09 samP: Sure
05:02:18 Thank you all for joining this meeting
05:02:26 Take care, Bye
05:02:31 #endmeeting