06:06:14 #startmeeting masakari
06:06:15 Meeting started Tue Jun 1 06:06:14 2021 UTC and is due to finish in 60 minutes. The chair is yoctozepto. Information about MeetBot at http://wiki.debian.org/MeetBot.
06:06:16 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
06:06:18 The meeting name has been set to 'masakari'
06:07:04 #topic Roll-call
06:07:29 bot broke
06:07:41 \o/
06:07:41 \o/ \O/
06:07:46 hi jopdorp
06:07:50 hi suzhengwei
06:07:51 0/
06:08:01 glad to see you in the new network
06:08:44 oftc!
06:08:49 #topic Agenda
06:08:55 * Roll-call
06:08:55 * Agenda
06:08:55 * Announcements
06:08:55 * Review action items from the last meeting
06:08:55 * CI status
06:08:57 * Backports pending reviews
06:08:57 * Xena planning -> https://etherpad.opendev.org/p/masakari-xena-ptg
06:08:59 * Open discussion
06:09:24 #topic Announcements
06:09:57 it's kind of obvious but we moved IRC channels to the new network - OFTC - that we are currently on
06:10:19 this information probably makes more sense in the saved logs read externally because you clearly know it being here ;-)
06:10:35 #topic Review action items from the last meeting
06:10:38 there were none
06:10:44 #topic CI status
06:10:58 I saw it green recently
06:11:03 #topic Backports pending reviews
06:11:08 none
06:11:15 #topic Xena planning -> https://etherpad.opendev.org/p/masakari-xena-ptg
06:11:22 ok, this is the interesting part
06:11:29 have we got more progress?
06:13:16 not from my side
06:13:47 ok
06:15:51 update https://review.opendev.org/c/openstack/masakari/+/788382
06:15:59 I saw suzhengwei commented on my comment
06:16:02 oh, right, that one
06:16:08 I have not read it yet
06:17:10 no hurry
06:17:50 read it
06:18:05 but I'm not sure if you agree with me or disagree
06:18:32 was there any issue with the flow I presented in my comment?
06:19:27 My opinion: no need to wait for the result when we disable or force-down the nova-compute service.
06:20:21 There is no asynchronous call inside nova.
06:21:06 well, if we check if it's up and it is, then we should try checking for it going down; if nova still sees the host, then we could say either abort or force it down and continue (this could be up to the user's choice)
06:21:57 The return of calls to nova-api can show whether the status/state has changed successfully.
06:23:15 we check if it's up just because we are not sure it has been fenced.
06:25:50 whether nova-compute is disabled or not will not stop evacuation, but whether it is down or not matters.
06:26:14 suzhengwei: yes, we don't know if it's down precisely ;-)
06:26:17 also
06:26:32 if nova thinks it's up and it's actually down, then it will fail disabling the service as well
06:27:00 I think disabling the service is an extra such that the host does not come into play if it gets back on and needs operator's intervention
06:27:23 the order of actions is unfortunate though, I pointed you at the relevant bug reports
06:31:06 yep, we can disable the compute first, but I don't think we need to wait. the return of the enable_disable_service call tells us something.
06:31:45 If the call failed, it will raise an exception.
06:32:25 suzhengwei: but it waits
06:32:35 last time I checked, there was some serious delay
06:32:53 because the disable wanted to speak via mq to the nova-compute service
06:37:57 but these are synchronous calls inside nova. think about it: if nova-compute is already down, which one handles the disable or force-down of the compute service?
06:38:16 not nova-compute, all in the nova-api.
06:38:50 if it dellay, the call would respone dellay too.
06:39:37 delay
06:40:14 suzhengwei: yeah, it's synchronously waiting for an answer on mq
06:40:34 if nova-api thinks nova-compute is up
06:40:38 but it's actually not
06:40:46 then disabling it is going to time out
06:41:08 try it out locally
06:41:39 just without masakari, ensure nova-compute has just been confirmed to be up, then firewall that host away and try disabling the service in nova
06:41:50 I am very sure.
06:42:52 Even with the compute node down, I can still disable or force-down it freely.
06:43:11 if nova knows it's down, then yes
06:43:26 if nova thinks it's up but it's not, timeout
06:43:36 unless they changed that in recent series
06:43:56 because that's what it was in ussuri for sure
06:46:49 I suggest you check and I will check as well
06:49:55 the code in nova project. service_update_by_host_and_binary function in nova/compute/api.py
06:50:49 It just changes db in the nova-api process.
06:53:35 suzhengwei: but it finally calls https://github.com/openstack/nova/blob/5cf06bf33d8f187d444f812177946e134e4c9932/nova/compute/api.py#L5863
06:53:49 which has the unfortunate self.rpcapi.set_host_enabled
06:54:09 and it usually just times out
06:57:36 oh, I see. It needs to synchronize with pacement. But it saves the status first, then calls rpcapi.set_host_enabled.
06:57:56 placement
06:59:04 suzhengwei: yeah, I would say it's an edge situation on nova's side
06:59:26 ok, the point is we are disabling the service as a service for the operator and not because we require it
06:59:41 and we have to end the meeting
07:00:00 thanks for the discussion, that was a fruitful meeting
07:00:07 see you next time
07:00:10 #endmeeting
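
Note on the flow discussed under Xena planning: the 06:21:06 proposal (check whether the host is up, wait for it to go down, then either abort or force it down depending on the user's choice) together with the 06:59:26 conclusion (disabling the service is a courtesy to the operator, not a requirement) could be sketched roughly as below. This is only an illustration under assumptions: `FakeCompute`, `prepare_host_for_evacuation`, `on_still_up`, `retries` and `interval` are hypothetical names made up for this sketch and are neither Masakari code nor the real Nova client API.

```python
import time
from typing import List


class FakeCompute:
    """Hypothetical stand-in for a Nova client; NOT the real novaclient API."""

    def __init__(self, reports_up_for: int):
        # Number of polls for which the service is still reported as up.
        self.reports_up_for = reports_up_for
        self.calls: List[str] = []

    def is_up(self, host: str) -> bool:
        self.calls.append("is_up")
        if self.reports_up_for > 0:
            self.reports_up_for -= 1
            return True
        return False

    def force_down(self, host: str) -> None:
        self.calls.append("force_down")
        self.reports_up_for = 0

    def disable(self, host: str) -> None:
        self.calls.append("disable")


def prepare_host_for_evacuation(client, host, *, retries=3, interval=0.0,
                                on_still_up="abort", sleep=time.sleep):
    """Return True if evacuation may proceed, False if we abort."""
    for _ in range(retries):
        if not client.is_up(host):
            break  # the host is seen as down, nothing more to wait for
        sleep(interval)
    else:
        # Still reported as up after all polls: follow the user's choice.
        if on_still_up == "abort":
            return False
        client.force_down(host)

    # Disabling is best effort only: a failure (for example a timeout)
    # must not block the evacuation.
    try:
        client.disable(host)
    except Exception:
        pass
    return True
```

With `on_still_up="abort"` a host that never goes down stops the flow before any state is changed; with `on_still_up="force_down"` the flow continues after forcing the service down. The broad `except` is deliberate in this sketch because the disable step is treated as optional.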