04:00:25 #startmeeting masakari
04:00:26 Meeting started Tue Jul 11 04:00:25 2017 UTC and is due to finish in 60 minutes. The chair is samP. Information about MeetBot at http://wiki.debian.org/MeetBot.
04:00:27 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
04:00:29 The meeting name has been set to 'masakari'
04:00:36 hi all o/
04:00:42 hi all
04:00:44 o/
04:00:48 hi
04:00:57 let's start..
04:01:13 #topic Critical Bugs
04:01:20 any bugs to discuss?
04:01:51 BTW, review request is done..
04:02:23 Thanks!
04:02:50 If no bugs to discuss, then let's move to discussion points.
04:03:03 samP: ok
04:03:16 ok.
04:03:18 #topic Discussion points
04:03:41 1. Make nova on_shared_storage configurable
04:04:34 We discussed this topic last week.
04:05:02 ah..
04:05:07 In last meeting it was decided to make "on_shared_storage" option configurable. Will submit patch soon.
04:05:22 Agenda is not up to date then..:)
04:05:34 Dinesh_Bhor: rkmrHonjo thanks..
04:06:09 sorry, I should hv updated it..
04:06:36 Dinesh_Bhor: Do you want to discuss this now?
04:06:55 Just have one question related to this: will there be a situation like half of the instances are on shared-storage and half not in real deployment?
04:08:00 example: host aggregate, some host aggregate on shared storage and others on non shared storage
04:09:14 First, in order to use Masakari, instance must be on shared storage.
04:09:56 if instance is not in shared storage evacuate = rebuild and can not rescue it.
04:11:42 Then there is no point of making this option configurable
04:11:46 Can you pass on_shared_storage option through nova API?
04:11:57 Yes
04:12:06 samP: How do you think about user who uses boot-from-volume(and non shared storage)?
04:12:30 You can evacuate instance if it's not on shared storage
04:13:09 rkmrHonjo: in that case, on_shared_storage does not make any diff.
04:13:46 rkmrHonjo: because, it will be volume detach and re-attach after evacuate. right?
04:14:36 Dinesh : please update our findings about on shared storage parameter true and false
04:14:53 samP: I got it.
04:15:02 tpatil: yes
04:15:35 if on_shared_storage option is True and the instance files are not on shared storage actually then evacuate call fails
04:15:43 I remember I fixed something related to this in past..
04:15:48 #link https://review.openstack.org/#/c/320231/
04:16:32 whereas if the on_shared_storage option is False then there is no issue
04:18:59 Dinesh_Bhor: true
04:19:27 #link https://developer.openstack.org/api-ref/compute/?expanded=evacuate-server-evacuate-action-detail
04:19:34 Please see ^^
04:19:43 Starting since version 2.14, Nova automatically detects whether the server is on shared storage or not.
04:19:52 Therefore this parameter was removed.
04:20:31 Am I going in wrong direction?
04:20:55 samP: Should we bump nova API version over 2.14 in masakari? Current version is 2.9.
04:21:11 #link https://github.com/openstack/masakari/blob/master/masakari/compute/nova.py#L42
04:21:12 in which micro version this parameter was removed
04:22:44 #link https://github.com/openstack/nova/blob/master/releasenotes/notes/remove-on-shared-storage-flag-from-evacuate-api-76a3d58616479fe9.yaml
04:23:03 it is removed in microversion 2.14
04:23:39 What if the instances are booted from images?
04:24:48 tpatil: glance images?
04:25:29 yes
04:26:06 sorry, Still can't see the problem..
04:27:26 for image booted instance, if ephemeral disk is in shared_storage then, it works in normal way, right?
04:28:01 samP: you said the on_shared_storage parameter is removed, what if the instances are booted from images. The data storage on the instances will be lost after evacuation, right?
04:28:56 samP: if instances path is not using shared_storage
04:29:00 nova automatically detects whether it is in shared storage or not..
04:29:09 tpatil_: in that case, yes
04:29:31 samP: If it's not on shared storage, does it fail to evacuate if instance is booted from image
04:30:20 tpatil_: I have to check, but I think it will not fail. instance will still be evacuated (= rebuild)
04:30:44 samP: ok, do we want to allow masakari to evacuate instance in such cases
04:31:14 samP: I have checked this. It evacuates = rebuilds
04:31:21 samP: i.e. if instance_path is not on shared storage
04:33:26 tpatil_: if operator defines non shared storage cluster in masakari, then masakari will evacuate those instances..
04:33:41 tpatil_: which operator should not do..
04:34:44 if we can prevent that by setting on_shared_storage option, then that would be good.
04:35:15 samP: in future, we might need to use nova version 2.14 above so making this option configurable is not a good solution
04:35:15 However, my point is we can not control it after API v2.14
04:35:57 samP: so let's not make on_shared_storage option configurable
04:36:12 tpatil_: agree.
04:36:31 tpatil_: +1
04:36:57 do we have bug report for this issue?
04:37:07 samP: but the question remains if instances are booted from image, then there will be data loss. I don't think it's acceptable to the users
04:37:46 samP: I think we can document saying use masakari only if the instance path are on shared_storage
04:38:18 tpatil_: understand.. but instances booted from image on non shared storage... how do we rescue it when compute node no longer there..
04:39:41 samP: I understand that we don't have any control, but I just wanted to point from users perspective that there will be data loss
04:39:56 tpatil_: agree. we should put this in README.rst
04:40:07 samP: sure
04:41:45 tpatil_: that is a good point.. all the presentations we did in the past, this was an unspoken agreement..
04:42:24 tpatil: Will you modify masakari codes? I think that we should bump nova API version to 2.14 and remove on_shared_storage parameter from nova.py.
04:43:08 rkmrHonjo: Sure, we will submit a patch soon
04:43:19 2. Instance gets auto-confirmed(uses new flavor) if masakari evacuates an instance which was partially resized(resize-confirm is not performed)
04:43:22 tpatil: thanks.
04:43:23 tpatil_: thanks..
04:43:35 samP: regarding second point, I have sent mail to operators mailing list but haven't got any constructive feedback yet, #link http://lists.openstack.org/pipermail/openstack-operators/2017-July/013905.html
04:43:43 abhishekk: thanks..
04:44:22 I asked some ops to take a look at this..
04:44:28 samP: I am also discussing same on operators IRC channel, but tow people said they haven't got this situation where they need to evacuate resized instance
04:44:34 currently only Saverio has replied ..
04:44:44 s/tow/two
04:45:20 It is a very rare situation
04:45:57 But still critical to us..
04:46:52 samP: #link http://eavesdrop.openstack.org/irclogs/%23openstack-operators/%23openstack-operators.2017-07-10.log.html#t2017-07-10T08:37:13
04:47:01 On the other hand, most people do not use evacuate ..
04:47:23 In the meantime until nova makes changes to evacuate api to address this issue in masakari
04:47:24 From masakari side I have updated the patch which evacuates and stops the resized instance after evacuation on the basis of power_state of instance: https://review.openstack.org/#/c/469029/
04:48:18 samP: IMO, we should fix this issue in masakari first and keep following up with nova community to address this issue in nova
04:48:26 Dinesh_Bhor: Thanks
04:48:44 tpatil_: agree.. fix nova will take time
04:49:24 samP: Ok, I will review Dinesh's patch taking this point into consideration
04:49:39 Let's review and merge this..
04:49:43 tpatil_: thanks..
04:49:48 samP: Sure
04:50:16 3. Remove ERROR instances from recovery targets when host failure happen
04:50:28 I want to add a configurable option. Error instances will be removed from recovery targets if the option is set.
04:50:33 Because some users don't want to launch error instances after recovering.
04:50:40 Of course there is a possibility that following patch resolves this issue, but that will take time.
04:50:47 #link https://review.openstack.org/#/c/469029/
04:52:03 rkmrHonjo: do you want to remove only "ERROR" instances?
04:52:55 samP: Yes. But I think that there is another solution. Writing the rescuable statuses in masakari.conf.
04:53:17 With the above patch error instance will be stopped after evacuation
04:54:05 In the master code error instances will be evacuated and the final state will be active
04:54:27 Dinesh_Bhor: rkmrHonjo's proposal is not to evacuate error instance, which is slightly different from stop after evacuate, right?
04:54:44 yes
04:55:03 samP: yes.
04:55:35 rkmrHonjo: "rescuable statuses" in config is much like Recovery method customization
04:55:40 so even instance which is in error state and marked as HA_Enabled True will be ignored in this case right?
04:55:51 abhishekk: right
04:56:24 samP: thanls
04:56:28 abhishekk: yes, that is my wish.
04:56:33 s/thanls/thanks
04:57:14 3 minutes left
04:57:16 I do not think it is a good idea to list down all rescuable statuses in config, where we have spec for "Recovery method customization"
04:57:23 abhishekk: yep
04:57:36 Let's continue this discussion in ML
04:57:38 I will fix comment given by rkmrHonjo on API specification patch
04:57:56 samP: OK. I'll send a mail.
04:58:03 I will send a mail with my thoughts..
04:58:07 abhishekk: thanks a lot! I'll check it.
04:58:14 ah...ok
04:58:20 rkmrHonjo: please..
04:58:24 I have submitted updated specs for recovery method customization #link https://review.openstack.org/458023
04:58:31 abhishekk: thanks..
04:58:43 in this specs I have mentioned which actions we need to add in mistral
04:58:49 please have a look at it
04:58:58 abhishekk: great.. I will review this..
04:59:04 1m left
04:59:06 samP: thank you
04:59:34 please offload to #openstack-masakari or ML with [masakari] for further discussions..
04:59:46 thank you all
04:59:51 thank you.
04:59:55 #endmeeting
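
Note on discussion point 1: the conclusion in the log (bump the nova API version to 2.14 and drop the on_shared_storage parameter) can be illustrated with a small sketch of how the evacuate request body differs across microversions. This is not masakari's code. The helper names `parse_microversion` and `build_evacuate_body` are hypothetical and exist only for this illustration; the `evacuate`, `host` and `onSharedStorage` keys follow the compute API reference linked in the meeting.

```python
from typing import Optional, Tuple


def parse_microversion(version: str) -> Tuple[int, int]:
    """Turn a microversion string such as "2.9" into (2, 9).

    Comparing tuples avoids the string-comparison trap where
    "2.9" would sort after "2.14".
    """
    major, minor = version.split(".")
    return int(major), int(minor)


def build_evacuate_body(microversion: str,
                        host: Optional[str] = None,
                        on_shared_storage: bool = False) -> dict:
    """Build the JSON body for the server "evacuate" action.

    Hypothetical helper: before microversion 2.14 the caller has to
    state onSharedStorage explicitly; from 2.14 on, nova detects it
    itself and the key is no longer part of the request.
    """
    action = {}
    if host is not None:
        action["host"] = host
    if parse_microversion(microversion) < (2, 14):
        action["onSharedStorage"] = on_shared_storage
    return {"evacuate": action}


if __name__ == "__main__":
    # Version used by masakari at the time of the meeting.
    print(build_evacuate_body("2.9", host="compute-2"))
    # Version proposed in the meeting.
    print(build_evacuate_body("2.14", host="compute-2"))
```

With "2.9" the body still carries onSharedStorage, so the caller must guess correctly; with "2.14" the key disappears, which is why a configurable on_shared_storage option would stop having any effect after the bump.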