Friday, 2019-05-10

*** mriedem has quit IRC [00:06]
*** tssurya has quit IRC [01:03]
<openstackgerrit> licanwei proposed openstack/watcher master: Fix API version header  https://review.opendev.org/658237 [01:52]
<openstackgerrit> Merged openstack/watcher master: Fix bandit runs with 1.6.0  https://review.opendev.org/658089 [02:08]
<openstackgerrit> licanwei proposed openstack/watcher master: Add force field to api-ref  https://review.opendev.org/658244 [02:53]
<openstackgerrit> licanwei proposed openstack/python-watcherclient master: Add force option  https://review.opendev.org/657375 [03:01]
<openstackgerrit> Merged openstack/watcher master: update wsme types  https://review.opendev.org/657218 [03:49]
<openstackgerrit> Merged openstack/watcher master: Fix reraising of exceptions  https://review.opendev.org/657643 [03:49]
*** adisky__ has joined #openstack-watcher [05:22]
<openstackgerrit> Merged openstack/watcher master: Allow for global datasources preference from config  https://review.opendev.org/645294 [06:04]
<openstackgerrit> Dantali0n proposed openstack/watcher master: Improve exceptions and logging in ds manager  https://review.opendev.org/658127 [06:32]
<openstackgerrit> chenker proposed openstack/watcher master: Bandit's version should be equal the min value in test-requriment  https://review.opendev.org/658273 [06:32]
<openstackgerrit> Dantali0n proposed openstack/watcher master: Improve exceptions and logging in ds manager  https://review.opendev.org/658127 [06:51]
<openstackgerrit> Merged openstack/watcher master: Use the common logging setup function in devstack runs  https://review.opendev.org/657651 [07:04]
<openstackgerrit> Merged openstack/watcher master: Add tempest voting  https://review.opendev.org/656457 [07:15]
<openstackgerrit> Dantali0n proposed openstack/watcher master: [WIP] Grafana proxy datasource to retrieve metrics  https://review.opendev.org/649341 [07:17]
<openstackgerrit> Dantali0n proposed openstack/watcher master: [wip] formal datasource interface implementation  https://review.opendev.org/656622 [07:23]
*** adiantum has joined #openstack-watcher [08:58]
*** adiantum has quit IRC [09:22]
*** adiantum has joined #openstack-watcher [09:27]
<openstackgerrit> sumitjami proposed openstack/watcher master: Allow using file to override metric map  https://review.opendev.org/657374 [09:36]
*** adisky__ has quit IRC [10:48]
*** adisky__ has joined #openstack-watcher [10:49]
*** zhurong has quit IRC [10:50]
<openstackgerrit> Dantali0n proposed openstack/python-watcherclient master: Fix sphinx builds for python 2.7  https://review.opendev.org/658345 [12:32]
<openstackgerrit> Dantali0n proposed openstack/python-watcherclient master: Fix sphinx builds for python 2.7  https://review.opendev.org/658345 [12:33]
<openstackgerrit> sumitjami proposed openstack/watcher master: pass default_config_dirs variable for config initialization.  https://review.opendev.org/658348 [12:39]
*** adisky__ has quit IRC [12:47]
<openstackgerrit> Dantali0n proposed openstack/python-watcherclient master: Fix sphinx builds for python 2.7  https://review.opendev.org/658345 [13:02]
*** mriedem has joined #openstack-watcher [13:05]
<openstackgerrit> Dantali0n proposed openstack/python-watcherclient master: Fix sphinx builds for python 2.7  https://review.opendev.org/658345 [13:40]
*** ianychoi_ is now known as ianychoi [14:00]
<mriedem> i'm seeing a failure in the watcher-tempest-workload_balancing job and it looks like a race in the tempest test, live migration fails because the neutron port on the server is deleted while the server is being live migrated [14:26]
<mriedem> it actually looks like the instance itself is being destroyed before the live migration is complete, which is what deletes the port [14:27]
<mriedem> http://logs.openstack.org/74/657374/6/check/watcher-tempest-workload_balancing/3e34c3f/compute1/logs/screen-n-cpu.txt.gz#_May_10_10_18_19_106717 [14:27]
<mriedem> May 10 10:18:19.106717 ubuntu-bionic-vexxhost-sjc1-0006079177 nova-compute[19758]: INFO nova.compute.manager [None req-0e809061-4c10-4e82-9c17-20ebf2ca3832 tempest-TestExecuteWorkloadBalancingStrategy-1723816204 tempest-TestExecuteWorkloadBalancingStrategy-1723816204] [instance: 27813a13-39dc-490d-86f3-c1877c07a010] Took 0.28 seconds to destroy the instance on the hypervisor. [14:27]
<mriedem> http://logs.openstack.org/74/657374/6/check/watcher-tempest-workload_balancing/3e34c3f/compute1/logs/screen-n-cpu.txt.gz#_May_10_10_18_21_174978 [14:27]
<mriedem> May 10 10:18:21.174978 ubuntu-bionic-vexxhost-sjc1-0006079177 nova-compute[19758]: WARNING nova.virt.libvirt.driver [None req-4e88ad59-55d7-4cc2-9deb-532900c94ab6 None None] [instance: 27813a13-39dc-490d-86f3-c1877c07a010] Error monitoring migration: Failed to activate binding for port 67f1aaa7-8966-448b-9655-56e19a01fb62 and host ubuntu-bionic-vexxhost-sjc1-0006079176.: PortBindingActivationFailed: Failed to activate bind [14:27]
<mriedem> for port 67f1aaa7-8966-448b-9655-56e19a01fb62 and host ubuntu-bionic-vexxhost-sjc1-0006079176. [14:27]
<mriedem> you can see the nova-api request to delete the server here: http://logs.openstack.org/74/657374/6/check/watcher-tempest-workload_balancing/3e34c3f/controller/logs/screen-n-api.txt.gz#_May_10_10_18_18_867714 [14:28]
<mriedem> May 10 10:18:18.867714 ubuntu-bionic-vexxhost-sjc1-0006079176 devstack@n-api.service[6484]: INFO nova.api.openstack.requestlog [None req-0e809061-4c10-4e82-9c17-20ebf2ca3832 tempest-TestExecuteWorkloadBalancingStrategy-1723816204 tempest-TestExecuteWorkloadBalancingStrategy-1723816204] 38.108.68.233 "DELETE /compute/v2.1/servers/27813a13-39dc-490d-86f3-c1877c07a010" status: 204 len: 0 microversion: 2.1 time: 0.264338 [14:28]
<mriedem> the live migration of that server starts here http://logs.openstack.org/74/657374/6/check/watcher-tempest-workload_balancing/3e34c3f/controller/logs/screen-n-api.txt.gz#_May_10_10_17_58_008645 [14:31]
<mriedem> May 10 10:17:58.008645 ubuntu-bionic-vexxhost-sjc1-0006079176 devstack@n-api.service[6484]: DEBUG nova.compute.api [None req-908b9e12-08da-4f48-bbb2-f3b01c320d92 admin admin] [instance: 27813a13-39dc-490d-86f3-c1877c07a010] Going to try to live migrate instance to ubuntu-bionic-vexxhost-sjc1-0006079176 {{(pid=6485) live_migrate /opt/stack/nova/nova/compute/api.py:4540}} [14:31]
<mriedem> so live migration starts at 10:17:58.008645, the server delete request is at 10:18:18.867714, the guest is deleted from the hypervisor by 10:18:19.106717, and live migration fails at 10:18:21.174978 [14:32]
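The four timestamps in that summary can be lined up as offsets from the start of the live migration. A minimal Python sketch, using only the times quoted from the nova logs; the event labels and the helper name offsets_from_start are made up for illustration:

```python
from datetime import datetime

# Timestamps copied from the nova log lines quoted in the channel
# (all on 2019-05-10); the labels are illustrative.
EVENTS = [
    ("live migration requested", "10:17:58.008645"),
    ("DELETE server request", "10:18:18.867714"),
    ("guest destroyed on hypervisor", "10:18:19.106717"),
    ("live migration monitoring fails", "10:18:21.174978"),
]


def offsets_from_start(events):
    """Return (label, seconds since the first event) for each event."""
    parsed = [(label, datetime.strptime(ts, "%H:%M:%S.%f"))
              for label, ts in events]
    start = parsed[0][1]
    return [(label, (when - start).total_seconds())
            for label, when in parsed]


if __name__ == "__main__":
    for label, seconds in offsets_from_start(EVENTS):
        print(f"+{seconds:10.6f}s  {label}")
```

On these numbers the DELETE request arrives about 20.9 seconds after the live migration was requested, and the migration monitor reports the failure about 2.3 seconds after that.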
<mriedem> seems the watcher tempest plugin / test isn't waiting for the live migration that watcher kicked off to actually complete [14:32]
<mriedem> so, the test creates an audit template from the workload_stabilization strategy, creates an audit from the template, and then polls until the audit is finished - and it's the audit that would kick off the live migration right? so i guess the race is in the workload_stabilization strategy not waiting for the live migration to complete? [14:37]
<mriedem> looks like at this point the audit status is SUCCEEDED: http://logs.openstack.org/74/657374/6/check/watcher-tempest-workload_balancing/3e34c3f/job-output.txt.gz#_2019-05-10_10_18_22_112059 [14:43]
<mriedem> 2019-05-10 10:18:09,494 12011 DEBUG    [tempest.lib.common.utils.test_utils] Call partial returns true in 4.405191 seconds [14:43]
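One way to close a race like the one suspected here is to poll the migration until it reaches a terminal state before treating the action as finished. The sketch below is a generic polling loop, not watcher's or tempest's actual code: get_status is a hypothetical callable standing in for whatever returns the migration status, and the status names in TERMINAL_STATES are illustrative assumptions.

```python
import time

# Illustrative assumption: status strings that mean "no longer in progress".
TERMINAL_STATES = {"completed", "error", "failed", "cancelled"}


def wait_for_terminal_state(get_status, timeout=120.0, interval=1.0,
                            sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns a terminal state or time runs out.

    get_status is a hypothetical zero-argument callable returning a status
    string; sleep and clock are injectable so the loop can be tested
    without real delays.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout}s")
        sleep(interval)
```

A caller would only report success, or clean up the server, after this returns "completed"; any other terminal state or a timeout would be treated as a failed action.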
<mriedem> i'm confused because i don't see this in the watcher logs https://opendev.org/openstack/watcher/src/branch/master/watcher/common/nova_helper.py#L282 [15:05]
<mriedem> or when it's waiting https://opendev.org/openstack/watcher/src/branch/master/watcher/common/nova_helper.py#L308 [15:05]
<mriedem> ah i see why i can't see those logs, [15:21]
<mriedem> May 10 10:06:34.673237 ubuntu-bionic-vexxhost-sjc1-0006079176 watcher-applier[3306]: DEBUG watcher.common.service [-] default_log_levels             = ['amqp=WARN', 'amqplib=WARN', 'qpid.messaging=INFO', 'oslo.messaging=INFO', 'sqlalchemy=WARN', 'keystoneclient=INFO', 'stevedore=INFO', 'eventlet.wsgi.server=WARN', 'iso8601=WARN', 'requests=WARN', 'neutronclient=WARN', 'glanceclient=WARN', 'watcher.openstack.common=WARN', ' [15:21]
<mriedem> heduler=WARN'] {{(pid=3306) log_opt_values /usr/local/lib/python2.7/dist-packages/oslo_config/cfg.py:2577}} [15:21]
<mriedem> 'watcher.openstack.common=WARN', [15:21]
<mriedem> that makes debugging this pretty hard... [15:21]
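For background on how a 'name=LEVEL' entry can hide DEBUG output: in Python's standard logging module, a level set on a logger is inherited by every logger beneath it in the dotted-name hierarchy that has no level of its own. The sketch below demonstrates that with plain stdlib logging. The logger names (demo.pkg and so on) and the helper debug_is_emitted are hypothetical, and this is not oslo.log's own code; whether the nova_helper logger actually falls under one of the listed names is not established by the sketch.

```python
import logging


def debug_is_emitted(logger_name, overrides):
    """Apply 'name=LEVEL' overrides, then report whether logger_name
    would emit DEBUG records.

    overrides mimics the shape of the default_log_levels list shown in
    the log output; the parsing here is a simplified illustration.
    """
    # Root at DEBUG, as in a run with debug logging enabled.
    logging.getLogger().setLevel(logging.DEBUG)
    for item in overrides:
        name, _, level = item.partition("=")
        # setLevel accepts registered level names such as "WARN" or "INFO".
        logging.getLogger(name).setLevel(level)
    return logging.getLogger(logger_name).isEnabledFor(logging.DEBUG)


if __name__ == "__main__":
    overrides = ["demo.pkg=WARN"]
    # A child of demo.pkg inherits WARN, so its DEBUG records are dropped.
    print(debug_is_emitted("demo.pkg.helper", overrides))
    # A logger outside that subtree still inherits DEBUG from the root.
    print(debug_is_emitted("demo.other.helper", overrides))
```

Only loggers whose dotted name sits under an overridden name are affected, so checking the logger name a module actually uses is part of confirming a diagnosis like this one.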
<mriedem> anyway, reported a bug https://bugs.launchpad.net/watcher/+bug/1828598 since we'll probably continue to see that in the gate [15:22]
<openstack> Launchpad bug 1828598 in watcher "test_execute_workload_stabilization intermittently fails because server is deleted before live migration is complete" [Undecided,New] [15:22]
*** josecastroleon has quit IRC [15:32]
<openstackgerrit> Matt Riedemann proposed openstack/watcher master: Remove watcher.openstack.common=WARN from _DEFAULT_LOG_LEVELS  https://review.opendev.org/658399 [15:34]
<mriedem> ^ should help with debugging gate failures ^ [15:34]
<openstackgerrit> Matt Riedemann proposed openstack/watcher master: docs: fix link to install guide from user guide  https://review.opendev.org/658401 [15:38]
<Dantalion> mriedem: In general there is quite a bit of flaky behavior in some of the tempest jobs; I haven't had time to properly look at it yet. It is the main reason I said we should improve the documentation on tempest when making jobs voting. But now that the jobs are voting we will have to fix them anyway, as they will start blocking patches, which is fine since they should be reliable and working anyway. [15:41]
<mriedem> yeah, being able to tell what the nova helper is doing wrt monitoring the live migration is the first step there [15:41]
<mriedem> i can definitely see that the server is deleted before the live migration is complete, which means the audit must think it's done too early for whatever reason [15:42]
<Dantalion> I am off on holiday for the weekend so probably won't have time to look at it before monday [15:42]
<mriedem> sure, https://review.opendev.org/#/c/658399/ fixes the logging issue if you want to hit that quick [15:43]
<Dantalion> I'll look at it before leaving the office [15:43]
<Dantalion> Have a great weekend~ [15:43]
<mriedem> you too [15:46]
<mriedem> fyi there is a nova change proposed to change the default notification format to unversioned https://review.opendev.org/#/c/603079/ which might break watcher (which i think relies on versioned notifications) [16:13]
<openstackgerrit> Dantali0n proposed openstack/python-watcherclient master: Fix sphinx builds for python 2.7  https://review.opendev.org/658345 [16:19]
<openstackgerrit> Dantali0n proposed openstack/python-watcherclient master: Remove python 2.7 build job  https://review.opendev.org/658345 [17:47]
*** tssurya_ has joined #openstack-watcher [19:23]
<openstackgerrit> Dantali0n proposed openstack/python-watcherclient master: Limit sphinx version for python 2.7  https://review.opendev.org/658345 [19:40]
*** mriedem has quit IRC [20:44]
*** tssurya_ is now known as tssurya [22:03]
