15:01:21 #startmeeting neutron_ci
15:01:22 Meeting started Tue Apr 27 15:01:21 2021 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:23 hi
15:01:24 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:26 The meeting name has been set to 'neutron_ci'
15:01:28 hi
15:01:32 Hi
15:01:38 o/
15:01:51 Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:02:39 and now we can start
15:02:56 #topic Actions from previous meetings
15:03:06 first one
15:03:08 slaweq to update wallaby's scenario jobs in neutron-tempest-plugin
15:03:21 I did, all patches are merged but I don't have the links now
15:03:32 next one
15:03:33 bcafarel to report stable/rocky ci failures on LP
15:05:13 https://bugs.launchpad.net/neutron/+bug/1924315 and our fearless PTL is close to fixing it (when CI is happy)
15:05:13 Launchpad bug 1924315 in neutron "[stable/rocky] neutron-tempest-plugin-scenario-openvswitch-iptables_hybrid-rocky job fails" [Critical,In progress] - Assigned to Slawek Kaplonski (slaweq)
15:05:40 "fearless PTL" :D
15:05:46 although I was looking in https://bugs.launchpad.net/neutron/+bug/1925451 - grenade seems to fail about 50% of the time on that DistutilsError
15:05:46 Launchpad bug 1925451 in neutron "[stable/rocky] grenade job is broken" [Critical,New]
15:05:47 you made my day now
15:06:07 :)
15:06:23 patch https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/786657 should fix that original issue
15:07:05 regarding the grenade one, did You check if that is e.g. failing only on some of the cloud providers?
15:08:13 no, good point, I will check that
15:08:19 thx
15:08:33 so let's continue this discussion later/tomorrow
15:09:03 we need to fix it finally and unblock rocky's gate
15:09:12 +1
15:09:25 ralonsoh: please check https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/786657 :)
15:09:31 this is also needed for the rocky gate
15:09:32 done
15:09:40 thx
15:10:02 ok, next one
15:10:04 ralonsoh to mark test_keepalived_spawns_conflicting_pid_vrrp_subprocess functional test as unstable
15:10:27 no progress last week, but related to the kill signal
15:10:34 no progress, sorry
15:10:48 I will set it for You for this week, ok?
15:10:51 sure
15:10:55 #action ralonsoh to mark test_keepalived_spawns_conflicting_pid_vrrp_subprocess functional test as unstable
15:10:56 thx
15:11:04 next one
15:11:05 slaweq to report LP with metadata issue in scenario jobs
15:11:11 Bug reported: https://bugs.launchpad.net/neutron/+bug/1923633
15:11:11 Launchpad bug 1923633 in neutron "Neutron-tempest-plugin scenario jobs failing due to metadata issues" [Critical,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
15:11:25 this is currently IMO the bug hurting us most in CI
15:11:42 we investigated that with ralonsoh last week
15:11:46 https://review.opendev.org/c/openstack/neutron/+/787777
15:11:51 and we think we know what is going on there
15:11:55 doesn't help too much
15:12:08 still the same issues?
15:12:19 yes, but less frequent
15:12:28 :/
15:12:37 so maybe we don't know exactly what the problem is there
15:12:44 the socket receiving the messages is not responsive
15:13:03 for a very long time, or forever?
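
(Aside on the "mark as unstable" action item above: in the Neutron tree this is typically done with a decorator along the lines of unstable_test(reason), which turns a failure of a known-flaky test into a skip that cites the bug, while passing runs still count. Below is a minimal, self-contained sketch of that pattern; the decorator body and test class are illustrative, not the exact Neutron code, and the bug number is a placeholder.)

    import functools
    import unittest


    def unstable_test(reason):
        """Turn failures of a known-flaky test into skips citing the bug."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(self, *args, **kwargs):
                try:
                    return func(self, *args, **kwargs)
                except unittest.SkipTest:
                    # Explicit skips pass through unchanged.
                    raise
                except Exception as exc:
                    self.skipTest("%s is marked unstable (%s); failure was: %s"
                                  % (self.id(), reason, exc))
            return wrapper
        return decorator


    class KeepalivedFunctionalTestCase(unittest.TestCase):

        @unstable_test("bug NNNNNNN - placeholder, use the real LP number")
        def test_keepalived_spawns_conflicting_pid_vrrp_subprocess(self):
            # Stand-in body; the real functional test exercises keepalived
            # process management in the L3 agent.
            self.assertTrue(True)
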
15:13:09 long time
15:13:18 another option could be not to mock the socket module
15:13:23 in the L3 agent
15:13:24 I wonder what has been changed recently there
15:13:30 I'll try it in this patch
15:13:41 as it wasn't that bad a few weeks back
15:13:45 s/mock/monkey_patch
15:14:26 ok, let's try that
15:14:31 ok
15:14:42 I also made patch https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/787324 which should mitigate the issue a bit at least
15:15:04 it's merged now, so hopefully jobs will be a bit more stable now
15:15:34 and also, if You see in the job error that "router wasn't become active on any L3 agent" then it means for sure that You hit the same bug
15:15:49 so it will be easier to identify that specific issue now
15:15:53 right
15:16:44 ok, next topic
15:16:46 #topic Stadium projects
15:16:51 any updates?
15:17:04 no specific thing
15:17:26 CI is working (at least where I see new patches :-)
15:17:39 that section will probably heat up with the OVN switch
15:17:40 ok, that's good news :)
15:17:42 thx
15:17:50 bcafarel: true
15:18:02 maybe we should start changing job definitions where it's needed?
15:18:23 any volunteer to do that?
15:18:27 yeah perhaps, to make it happen in parallel
15:18:44 I can check
15:18:49 thx lajoskatona
15:19:14 #action lajoskatona to check stadium jobs and what needs to be switched to ovs explicitly
15:19:26 ok, next topic
15:19:28 #topic Stable branches
15:19:34 anything to discuss?
15:19:38 except rocky
15:20:34 there is still https://bugs.launchpad.net/neutron/+bug/1923412 for stein, I hope to finally take a look this week
15:20:34 Launchpad bug 1923412 in neutron "[stable/stein] Tempest fails with unrecognized arguments: --exclude-regex" [Critical,Triaged]
15:21:00 bcafarel: ouch, I missed that one
15:21:08 it's the same issue as for rocky
15:21:14 or very similar
15:21:17 bcafarel: oh, there is a devstack change which may solve that (but you can still fix it by refactoring the jobs)
15:21:34 namely https://review.opendev.org/c/openstack/tempest/+/787455
15:21:40 tempest, not devstack
15:22:01 tosky: oh nice! I will test it as depends-on on one of our stein pending backports
15:22:20 nice, thx tosky
15:23:01 or you can do what I did for cinder-tempest-plugin
15:23:14 https://review.opendev.org/c/openstack/cinder-tempest-plugin/+/786755
15:23:41 but that requires branch-specific job variants and maybe a bit of refactoring (or it may be easy, depending on your job structure)
15:23:56 Have you read the TC pad (https://etherpad.opendev.org/p/tc-xena-ptg ~l360) about EOLing old branches (ocata....)?
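
(Aside on the idea earlier in the log of not monkey_patching the socket module in the L3 agent: with eventlet, passing socket=False to monkey_patch while leaving the other modules at their defaults should patch everything except socket, so the agent-side listener keeps the stdlib implementation. A minimal sketch of that follows; it is purely illustrative, and where and how neutron actually calls monkey_patch is different - the patch ralonsoh mentioned trying it in is the place to check the real change.)

    # Purely illustrative: selective eventlet monkey patching that leaves the
    # stdlib socket module untouched while still patching os/select/thread/time.
    import eventlet

    # With no arguments everything is patched; naming only socket=False keeps
    # the other default modules patched but skips socket.
    eventlet.monkey_patch(socket=False)

    import socket  # noqa: E402 - imported after patching on purpose
    import time    # noqa: E402

    print(socket.socket)  # stdlib socket class, left unpatched
    print(time.sleep)     # eventlet's green sleep, patched
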
15:24:02 yes, I did something similar for our rocky jobs already
15:24:27 ocata hopefully will finally be EOLed
15:24:40 and pike, I guess it depends on more projects abandoning it (we did it in cinder)
15:24:56 (so if you think about abandoning pike, please do it :)
15:25:02 ok, so perhaps the avalanche will start :-)
15:25:07 :) I don't recall recent backport requests on pike
15:25:38 me neither
15:25:42 only queens and newer
15:25:54 but still, even queens and rocky are starting to be a pain
15:26:12 no open pike backport, last merge in July 2020
15:26:25 yeah, in a not-so-far future (in cinder, again) we are thinking about abandoning those too
15:26:34 ++
15:26:45 we can think about it also
15:26:50 or just do it
15:26:55 I will take a look
15:26:59 it seems one of those things where, if no one starts, it's never going to happen
15:27:59 yeah, but we silently skip those branches anyway
15:28:37 so better give that message to the community in an official way: this is gone
15:28:42 true, it's just not officially marked as EOL
15:28:54 I will check how to do it next week
15:29:11 thx for bringing that topic up
15:30:16 +1
15:30:25 ok, let's move on
15:30:27 #topic Grafana
15:31:58 looking at the dashboard, the only big problem which I see is the one with the neutron-tempest-plugin jobs
15:32:25 and that is mostly caused by the bug with L3 HA which we already discussed earlier
15:34:03 do You see anything else You want to discuss?
15:34:08 or can we move on?
15:35:34 let's go to the next topic, yes
15:35:40 ok, let's go
15:35:45 #topic fullstack/functional
15:35:55 Here there is just one quick thing
15:36:05 please review the new test https://review.opendev.org/c/openstack/neutron/+/783748 :)
15:36:10 sure
15:36:20 thx
15:36:29 I don't have any new issues from those jobs for today
15:36:36 #topic Tempest/Scenario
15:36:44 here there are a couple of new issues
15:36:55 first one, there is a bug reported by Liu:
15:37:00 https://bugs.launchpad.net/neutron/+bug/1926109
15:37:00 Launchpad bug 1926109 in neutron "SSH timeout (wait timeout) due to potential paramiko issue" [Critical,New]
15:37:32 but tbh, I'm not sure if that isn't the same issue with L3 HA as we discussed already
15:37:41 is this one related to the ha router?
15:37:42 the problem is that in that case there is no console log logged
15:37:45 yes, same concern
15:38:06 I think we should first add logging of the VM's console log
15:38:15 and then we will see if that's not a duplicate
15:38:23 any volunteer to do that?
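
(Aside on the console-log idea for bug 1926109 - the volunteering continues right below: a rough sketch of what dumping the guest console on SSH timeout could look like in a tempest-style scenario test. The client attribute chain self.os_admin.servers_client, the mixin, and the helper name here are assumptions based on common tempest patterns, not the actual neutron-tempest-plugin code.)

    # Rough sketch: log the instance console output when SSH times out, so
    # metadata/routing problems show up in the job logs instead of a bare
    # "SSH timeout" failure.
    from oslo_log import log as logging
    from tempest.lib import exceptions as lib_exc

    LOG = logging.getLogger(__name__)


    class ConsoleLogOnSshTimeoutMixin(object):
        """Illustrative mixin for scenario test classes."""

        def check_ssh_with_console_log(self, ssh_client, server):
            try:
                # Any simple command is enough to exercise the SSH path.
                return ssh_client.exec_command('hostname')
            except lib_exc.SSHTimeout:
                output = self.os_admin.servers_client.get_console_output(
                    server['id'])['output']
                LOG.error('SSH to server %s timed out; console log:\n%s',
                          server['id'], output)
                raise
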
15:38:26 exactly, to check the metadata update
15:38:34 I can (at the end of the week)
15:38:35 or, even better
15:38:51 I can add it, will see if I can do it before ralonsoh
15:38:56 we should be able to know it even without the console log now that https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/787324 is merged
15:39:10 but the console log could always be useful
15:39:17 so thx lajoskatona and ralonsoh for taking care of it
15:39:25 yeah, console output will help
15:39:37 #action ralonsoh or lajoskatona will add logging of the console log, related to https://bugs.launchpad.net/neutron/+bug/1926109
15:39:37 Launchpad bug 1926109 in neutron "SSH timeout (wait timeout) due to potential paramiko issue" [Critical,New]
15:39:38 :)
15:39:45 I assigned it to both of You :P
15:40:37 I also found one issue with the multicast test in the ovn job:
15:40:38 https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_b66/712474/7/check/neutron-tempest-plugin-scenario-ovn/b661cd4/testr_results.html
15:40:58 but I need to check if that is something that happens more often and report an LP bug for it
15:41:26 #action slaweq to check frequency of the multicast issue in the ovn job and report a LP bug for that
15:42:28 and the last topic for today
15:42:34 #topic Periodic
15:42:47 I just noticed that the nftables jobs are failing every day
15:42:52 like e.g. https://619cfb3845a212f70f8d-f88cc2e228aea8b2c74f92ce7ecb609d.ssl.cf2.rackcdn.com/periodic/opendev.org/openstack/neutron/master/neutron-tempest-plugin-scenario-linuxbridge-nftables/1d9785e/job-output.txt
15:43:02 and it's like that for both of them
15:43:16 they are failing on "[nftables : Restore saved IPv4 iptables rules, stored by iptables-persistent]"
15:43:29 yeah... ok, I'll check it
15:43:32 thx
15:43:39 ralonsoh: to check periodic nftables jobs
15:43:44 #action ralonsoh: to check periodic nftables jobs
15:44:10 and that's basically all I have for today
15:44:24 do You have anything else to discuss now?
15:44:43 or if not, I'm closing the meeting and calling it a day finally :)
15:45:53 in that case, nothing to add from me :)
15:45:53 ok, so thx for attending the meeting
15:45:56 bye!
15:46:02 o/
15:46:02 o/
15:46:03 have a nice day, and see You online
15:46:06 #endmeeting