07:01:00 #startmeeting masakari
07:01:00 Meeting started Tue Oct 13 07:01:00 2020 UTC and is due to finish in 60 minutes. The chair is yoctozepto. Information about MeetBot at http://wiki.debian.org/MeetBot.
07:01:01 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
07:01:03 The meeting name has been set to 'masakari'
07:01:08 #topic Roll-call
07:01:11 o/
07:01:43 0/
07:01:49 o/
07:02:53 (do note whatever you write in roll-call is fine; it's just to encourage ppl to announce their presence)
07:04:50 \m/ this then
07:05:30 xD
07:05:36 suzhengwei: around?
07:07:34 #topic Agenda
07:07:47 * Roll-call
07:07:51 * Agenda
07:07:55 * Announcements
07:07:59 ** Victoria release this week! https://releases.openstack.org/victoria/schedule.html
07:08:03 ** Virtual OpenStack Infrastructure Summit is right the next week: https://www.openstack.org/summit/2020/
07:08:07 ** Masakari Wallaby vPTG planned (in just two weeks! next masakari meeting cancelled): https://etherpad.opendev.org/p/masakari-wallaby-vptg
07:08:10 * Review action items from the last meeting
07:08:13 * CI status
07:08:17 * Backports pending reviews
07:08:19 I am
07:08:21 * Critical Bugs and Patches
07:08:25 * Release planning
07:08:29 * Open discussion
07:08:38 hi suzhengwei
07:08:43 hi
07:08:44 #topic Announcements
07:08:55 #info Victoria release this week! https://releases.openstack.org/victoria/schedule.html
07:09:35 so we made it with RC1 so far; the project is pretty dormant so that was not exactly unexpected
07:09:50 any thoughts on the Victoria release?
07:10:40 what needs to be done for it still?
07:13:53 jopdorp: release approved
07:14:14 but if you are asking about masakari, then we have loads of ideas :-)
07:14:22 just not enough hands/time to deliver
07:14:36 #info Virtual OpenStack Infrastructure Summit is right the next week: https://www.openstack.org/summit/2020/
07:14:49 you can still register as far as I know
07:15:08 it's a week-long online event
07:15:15 you might find something interesting to you
07:15:21 yeah
07:16:57 #info Masakari Wallaby vPTG planned (in just two weeks! next masakari meeting cancelled): https://etherpad.opendev.org/p/masakari-wallaby-vptg
07:17:11 this is *the* most important announcement today
07:17:28 I'll be there
07:17:52 me too
07:18:06 \o/
07:18:19 \m/
07:19:48 please add/review items on that page even before the sessions start
07:19:54 it is more productive this way
07:20:03 I did mine, I saw suzhengwei did as well
07:21:31 ok, I'll think about it
07:22:18 thanks
07:22:27 #topic Review action items from the last meeting
07:22:33 let me find it
07:23:11 oh, there were none
07:23:14 so 100% success
07:23:21 (beat it)
07:23:21 \m/
07:23:29 #CI status
07:23:33 #topic CI status
07:24:23 not much happening; the last runs were green happy
07:24:28 so GREEN
07:24:54 #topic Backports pending reviews
07:25:32 not only backports, but all stable pending changes:
07:25:38 #link https://review.opendev.org/#/q/(project:openstack/masakari+OR+project:openstack/masakari-monitors+OR+project:openstack/python-masakariclient+OR+project:openstack/masakari-dashboard)+status:open+-branch:master
07:25:59 ^ please review the above, they are one vote away from merging (I did my part!)
07:26:19 will do
07:28:58 #topic Critical Bugs and Patches
07:29:55 and there are none - I will remove this point from the agenda next time if I don't hear any objections - I believe these can still be discussed during the open discussion (if they ever happen)
07:30:44 ok
07:31:26 ok
07:35:28 #topic Release planning
07:35:52 well, it has been planned; for wallaby please remember to participate in the ptg :-)
07:40:13 yoctozepto: Do we need reno for bugfix in V?
07:40:44 There is no releasenote for bugfix patch.
07:43:19 suzhengwei: if it went like this (e.g. this is a backport) already, then we don't need it
07:43:39 for all new ones we should strive to get renos because they increase the visibility of all (and others) efforts
07:43:50 and show that the project is actively maintained
07:44:05 #topic Open discussion
07:44:29 (I forgot to switch the topic)
07:44:41 please go on with any things you want
07:45:30 i was wondering what a ptg looks like
07:45:50 is it like working together in a virtual office on the project for a couple of days
07:45:51 ?
07:45:57 like a hackathon or something?
07:47:36 More like a project video meeting.
07:47:40 another thing is, we are looking into how to decrease failover times, to minimize downtime for our edge locations when they lose power
07:47:52 suzhengwei: okay
07:48:29 I'd be up for that on a more regular basis too
07:49:01 That is a good point, which I have been studying recently too.
07:50:16 it's a more direct way of communicating which is good for efficiency and motivation, and getting to know the team better
07:51:03 suzhengwei, jopdorp: we can do video meetings regularly, yes; we do it with kolla as well; need to schedule the right time though
07:51:24 both of you use kolla too?
07:52:12 maybe we could change the default config in kolla to make failover times faster
07:52:41 jopdorp: I develop kolla but kolla should be discussed in #openstack-kolla ;-)
07:52:54 I was looking at this one https://review.opendev.org/#/c/732477/3/specs/victoria/approved/promotion-for-large-scale-hosts-failure.rst
07:53:20 I think it does make sense to prioritize instances
07:53:43 yeah, suzhengwei has some nice ideas in specs already
07:53:45 for example if a client has a kubernetes cluster, prioritize failover for master over workers
07:54:53 what about adding hosts automatically back to pacemaker when they come back up?
07:55:51 I've been working with a test environment which is very unstable, to test out the failover approach
07:56:20 and had to remove and re-add the host from pacemaker every time we test a failover
07:57:05 also had to re-enable it in nova and set on_maintenance back to false
07:58:05 also I'm thinking about automatically moving instances back to the failed hosts they came from after a set period of time
07:58:14 jopdorp: well, it's a protection measure against flops
07:58:18 flops are worse
07:58:38 what is a flop?
07:59:13 jopdorp: flaps* (/me broken)
07:59:24 haha
07:59:26 what is a flap?
07:59:38 it is a situation where the instability causes the node to be repeatedly up/down
07:59:42 yeah
07:59:47 causing the environment to recalculate each time
08:00:02 maybe the set period of time can fesolve that
08:00:02 and wasting more time on constant moves rather than keeping it simple
08:00:04 resolve
08:00:08 wait 48 hours for example
08:00:30 the operator should really monitor the infrastructure and act accordingly
08:00:35 the main goal for me would be to have instances move back to the failed nodes after some time
08:00:38 well, at least that's the common consensus
08:00:46 (and it's more about pacemaker itself than masakari)
08:00:55 (we can also consider consul for masakari)
08:00:57 because in our infrastructure failure is expected on leaf (edge) nodes
08:01:04 but we are out of time today
08:01:15 which run at locations outside of regular datacenters
08:01:19 jopdorp: add these points to the ptg discussion please
08:01:29 thank you for coming, jopdorp, suzhengwei
08:01:31 will do!
08:01:33 thanks
08:01:34 #endmeeting