Thursday, 2021-04-29

00:16 <openstackgerrit> Merged openstack/project-config master: Adds docs_branch_path value needed for promoting release branches.  https://review.opendev.org/c/openstack/project-config/+/788593
00:25 <kevinz> ianw: Morning! I wonder if there is something wrong with zk02? I can't ping it at all
00:25 <ianw> kevinz: ahh, yes, clarkb just removed it :)  they've moved to zk06/07/08 now
00:25 <fungi> 4-6
00:25 <fungi> but yes
00:26 <ianw> sorry, 04/05/06
00:26 <ianw> yeah, keyboard typing error :)
00:26 <fungi> keyboards are where i make most of my typing errors as well
00:27 <ianw> fungi: only until elon gets some electrodes in your skull :)
00:27 <fungi> i'm saving up for wristjacks, not sure a skulljack is entirely hygienic
00:28 <ianw> kevinz: we're still seeing dropouts from nb03 to the new server(s), however.
00:28 <ianw> probably the more annoying thing is the aborted uploads to OSU (see thread)
00:30 <kevinz> ianw: OK, let me check
00:32 <kevinz> ianw: ping zk06.openstack.org
00:32 <kevinz> ping: zk06.openstack.org: Name or service not known
00:32 <fungi> opendev.org
00:32 <fungi> we're in the process of renaming our servers into the new domain
00:33 <fungi> basically new servers get names in opendev.org as we phase out use of openstack.org for anything which isn't openstack-specific
00:33 *** brinzhang0 is now known as brinzhang
00:48 <kevinz> fungi: OK, please let me know when the rename is finished, so that I can continue testing
00:48 <kevinz> fungi: Oh, I checked that opendev works. Thanks
00:49 <fungi> kevinz: sorry, i probably phrased that confusingly. zk01.openstack.org, zk02.openstack.org and zk03.openstack.org were replaced by zk04.opendev.org, zk05.opendev.org and zk06.opendev.org
00:49 <fungi> hopefully that makes sense
00:56 <gmann> clarkb: fungi: the nova grenade job is failing frequently (~90%); are you aware of this error - https://zuul.opendev.org/t/openstack/build/599cfa422a0648168c8b00a27fbd3114/log/logs/grenade.sh.txt#46891-46912
00:57 <gmann> Failed to start rtslib-fb-targetctl.service: Unit rtslib-fb-targetctl.service is not loaded properly: Exec format error.
01:01 <fungi> gmann: heh, i guess you're not alone, if it helps... https://askubuntu.com/questions/1334619/failed-to-start-rtslib-fb-targetctl-service
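
A quick way to inspect a unit failing like that (a sketch; the targetctl path is an assumption about what the unit's ExecStart points at):

    # show the unit file systemd is complaining about
    systemctl cat rtslib-fb-targetctl.service
    # "Exec format error" generally means the ExecStart target is not a
    # valid executable, e.g. a script whose shebang line is missing or mangled
    head -n1 /usr/bin/targetctl
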
01:04 <gmann> yeah, this legacy grenade job is also running on Ubuntu 18.04, which should have been 20.04 since the wallaby gate
01:05 *** brinzhang_ has joined #opendev
01:05 <gmann> i remember now, legacy jobs were not upgraded to ubuntu 20.04
01:08 *** brinzhang has quit IRC
01:09 <ianw> clarkb: i've put holds on the nodepool jobs and rechecked 788553; see ~ianw/nodepool-holds.sh on the zuul server
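
Holds like that are placed with the zuul admin client; a minimal sketch of what nodepool-holds.sh presumably wraps (the project and job names are taken from the failing build referenced later, the reason text is illustrative):

    # keep the node from the next failing run of this job for debugging
    zuul autohold --tenant openstack --project openstack/diskimage-builder \
        --job dib-nodepool-functional-openstack-opensuse-15-src \
        --change 788553 --reason "ianw: debugging devstack failure" --count 1
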
01:24 *** d34dh0r53 has quit IRC
01:34 *** hamalq has quit IRC
01:52 <kevinz> fungi: OK, thanks for clarifying. so zk05 is the right one :-)
02:07 *** brinzhang_ is now known as brinzhang
02:25 *** xinliang has joined #opendev
02:45 *** hemanth_n has joined #opendev
04:03 *** xinliang has quit IRC
04:24 *** vishalmanchanda has joined #opendev
04:33 <kevinz> ianw: fungi: I cannot observe packet loss from another tenant or from an infra host, but I can observe a lot of packet loss within the os-control project (I just created one instance under os-control for a network test)
04:34 <ianw> kevinz: i guess we're just lucky :)
04:34 <kevinz> Also, sometimes ssh gets a broken pipe on the os-control instance
04:34 <ianw> interesting, i don't think i've had ssh drop out, but i'm sort of glad it's not just me noticing the issue! :)
04:35 <kevinz> ianw: well, I will check the os-jobs tenant first, to rule out the IPv6/IPv4 configuration (the other tenants just have IPv4 enabled)
04:37 <ianw> ahh. yeah, i think the test-nodes would be less susceptible, since they run much less that wants to hang around forever.  we'd barely notice a few retries etc.
04:38 <kevinz> ianw: OK, it sounds like os-control is the lucky one :-(
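
The sort of quick loss check being described (a sketch; the host and the numbers shown are illustrative):

    # 100 probes, 5 per second; the summary line shows the loss percentage
    ping -c 100 -i 0.2 zk05.opendev.org | tail -n 2
    # e.g. "100 packets transmitted, 87 received, 13% packet loss, ..."
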
04:40 <ianw> as i mentioned in the email thread, we do have a certain uncanny ability to break things :)
05:00 <ianw> i think we lost openstack gerrit
05:01 *** openstackgerrit has quit IRC
05:31 *** sboyron has joined #opendev
05:36 *** ysandeep|away is now known as ysandeep
05:43 *** snapdeal has joined #opendev
05:55 *** slaweq has joined #opendev
05:56 *** raukadah is now known as chandankumar
06:00 *** marios has joined #opendev
06:10 <ianw> clarkb: i think we got one
06:10 <ianw> https://53cc3facebff961adc76-37cbc92cf6f6e06a61846b0d3fa08d8d.ssl.cf2.rackcdn.com/788553/1/check/dib-nodepool-functional-openstack-opensuse-15-src/6ef9cf0/
06:11 <ianw> 158.69.69.156
06:13 *** avass has quit IRC
06:13 *** eolivare has joined #opendev
06:15 <ianw> #status log updated the hosts entry for freenode on eavesdrop, restarted gerritbot
06:15 <openstackstatus> ianw: finished logging
06:15 <ianw> i think it's back now, our chosen host was mia
06:25 *** ralonsoh has joined #opendev
06:44 <ianw> clarkb: ok so ...
06:46 <ianw> http://paste.openstack.org/show/OOOuDOIRBf1jTQEXvgaM/
06:46 <ianw> basically, if we drop the "--network public" from the end, server creation works
06:47 <ianw> RESP BODY: {"NeutronError": {"type": "NetworkNotFound", "message": "Network public could not be found.", "detail": ""}}
06:47 <ianw> now, why that leads the overall command to return a 500 error is an open question, but i think that's something like the root cause
06:49 <ianw> of course, "openstack --os-cloud=devstack network show public" works :/
06:51 <ianw> oh boo, this might be a red herring
06:51 <ianw> "GET call to network for https://158.69.69.156:9696/v2.0/networks?name=public used request id req-b4b8d066-b0e4-434f-afd0-91237890fea5" is just below that
06:51 <ianw> and works
06:58 *** avass has joined #opendev
07:00 <ianw> http://paste.openstack.org/show/804851/
07:00 <ianw> ^ the bad request and the good request (i.e. with --network public, and without)
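
For context, the failing command pattern, as a sketch of the reproduction (the cloud name is devstack's default; image, flavor and server name are the ones used later in the day):

    openstack --os-cloud=devstack network show public     # works fine
    # fails immediately with a 500:
    openstack --os-cloud=devstack server create \
        --image cirros-0.5.2-x86_64-disk --flavor cirros256 \
        --network public clarkb-test
    # without --network the create is accepted, but the instance later
    # goes to ERROR with the same fault (see below)
    openstack --os-cloud=devstack server create \
        --image cirros-0.5.2-x86_64-disk --flavor cirros256 clarkb-test
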
07:02 <frickler> gmann: fungi: the issue with grenade/devstack on bionic was fixed by https://review.opendev.org/c/openstack/devstack/+/788429 . note however that we want to drop support for bionic in devstack master, so folks should really migrate their jobs to focal
07:04 *** hashar has joined #opendev
07:06 <frickler> ianw: hmm, I never used the "--network xx" option before, is that new? the usual way for me is to do "--nic net-id=yy", but that needs the uuid of the network.
07:06 <ianw> frickler: definitely not new, but ... this error is :)
07:07 *** amoralej|off is now known as amoralej
07:18 *** fressi has joined #opendev
07:19 *** andrewbonney has joined #opendev
07:21 *** jpena|off is now known as jpena
07:23 *** openstackgerrit has joined #opendev
07:23 <openstackgerrit> Merged opendev/glean master: Move to Zuul standard hacking rules  https://review.opendev.org/c/opendev/glean/+/788127
07:27 *** rpittau|afk is now known as rpittau
07:33 <ianw> well i'm out of time
07:33 <ianw> pretty easy to replicate
07:40 *** dtantsur|afk is now known as dtantsur
07:44 *** tosky has joined #opendev
07:45 *** dirk has quit IRC
07:49 <frickler> hmm, the public network isn't shared, so iiuc it shouldn't be usable for instances anyway; under which condition would this work?
07:49 * frickler can look closer after some upcoming meetings
07:51 *** dirk has joined #opendev
08:02 *** jaicaa has quit IRC
08:04 *** jaicaa has joined #opendev
08:09 *** ysandeep is now known as ysandeep|lunch
08:11 *** jaicaa has quit IRC
08:14 *** jaicaa has joined #opendev
09:01 <kevinz> ianw: It looks like the packet loss has disappeared
09:03 <kevinz> ianw: fungi: What I found is that the virtual router sync mechanism was wrong, leading to 3 virtual router backends being active for os-control-router. So I restarted the l3 agent service and the virtual router sync mechanism works again. Now pinging zk05 works without packet loss...
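
A sketch of how one can spot that condition (router name from kevinz's description; commands assume admin credentials, and for an HA router only one l3 agent backend should be active):

    # list the l3 agents, then the agents hosting this router with HA state
    openstack network agent list --agent-type l3
    openstack network agent list --router os-control-router --long
    # restarting the l3 agent forces a resync of the router state
    sudo systemctl restart neutron-l3-agent
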
09:09 <openstackgerrit> Merged opendev/irc-meetings master: Update TC office hours time for Xena cycle  https://review.opendev.org/c/opendev/irc-meetings/+/788552
09:36 *** amoralej has quit IRC
09:36 *** fbo has quit IRC
09:40 *** ysandeep|lunch is now known as ysandeep
09:45 *** jpena is now known as jpena|off
09:48 *** hashar has quit IRC
09:54 *** jpena|off has quit IRC
10:01 <ianw> kevinz: thanks for investigating!  i will check it thoroughly in the morning :)
10:10 *** fbo has joined #opendev
10:46 *** jpena has joined #opendev
10:50 *** akahat is now known as akahat|ruck
10:52 *** whoami-rajat has joined #opendev
10:53 *** hemanth_n has quit IRC
10:54 *** iurygregory has quit IRC
10:55 *** chrome0 has quit IRC
10:55 *** chrome0 has joined #opendev
10:58 *** iurygregory has joined #opendev
11:31 *** jpena is now known as jpena|lunch
12:09 *** hashar has joined #opendev
12:20 *** snapdeal has quit IRC
12:25 *** hrw has joined #opendev
12:25 <hrw> morning
12:26 <fungi> hrw: not sure if you saw, but ianw found the centos arm64 bug. apparently it's since been fixed in rhel, but there's no clear picture of how long the new packages will take to make it into centos
12:27 <hrw> fantastic!
12:27 *** jpena|lunch is now known as jpena
12:27 <fungi> (summary: binutils was updated to set more aggressive compiler optimizations, which have since been walked back, but lots of stuff built with one of the "bad" binutils versions needs recompiling)
12:29 <hrw> let me dig into logs
12:31 <fungi> https://bugzilla.redhat.com/show_bug.cgi?id=1946518
12:31 <openstack> bugzilla.redhat.com bug 1946518 in binutils "binutils-2.30-98 are causing go binaries to crash due to segmentation fault on aarch64" [Unspecified,Modified] - Assigned to nickc
12:37 <hrw> yeah
12:37 <hrw> just finished reading http://eavesdrop.openstack.org/irclogs/%23opendev/%23opendev.2021-04-27.log.html
12:38 <hrw> and I lack access to bug 1875912, just like ianw
12:38 <openstack> bug 1875912 in pulseaudio (Ubuntu) "Selected audio output always USB device as default, but no control" [Undecided,Expired] https://launchpad.net/bugs/1875912
12:39 <hrw> no, openstack, the https://bugzilla.redhat.com/show_bug.cgi?id=1875912 one
12:39 <openstack> hrw: Error: Error getting bugzilla.redhat.com bug #1875912: NotPermitted
12:39 <hrw> ;d
12:41 <fungi> neat
13:27 *** fbo has quit IRC
13:29 *** fbo has joined #opendev
13:30 *** vishalmanchanda has quit IRC
13:33 <gmann> frickler: thanks. once this merges I think we can remove bionic support; i think that is the last dependency from legacy jobs https://review.opendev.org/c/openstack/nova/+/778885/10
13:43 *** fressi has left #opendev
14:03 *** fbo has quit IRC
14:08 *** fbo has joined #opendev
14:09 *** d34dh0r53 has joined #opendev
14:14 *** marios is now known as marios|call
14:26 *** hashar has quit IRC
14:49 <openstackgerrit> Ade Lee proposed zuul/zuul-jobs master: Add role to enable FIPS on a node  https://review.opendev.org/c/zuul/zuul-jobs/+/788778
14:53 <frickler> gmann: sadly it is not the last one, there seem to be some in cinder, neutron and octavia, too http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22Failed%20to%20start%20rtslib-fb-targetctl.service%5C%22
14:54 <tosky> frickler: legacy jobs? Bionic jobs?
14:55 <gmann> frickler: did not know they also have jobs on bionic. anyways, i pushed the patch with -W and will announce this on the ML to give 1-2 weeks or so https://review.opendev.org/c/openstack/devstack/+/788754
14:56 <gmann> tosky: yeah, on bionic; example: cinder-plugin-ceph-tempest-mn-aa - https://opendev.org/openstack/cinder/src/branch/master/.zuul.yaml#L166
14:58 <tosky> gmann: uh, that may be an oversight, I will ask about it, thanks
14:58 <gmann> octavia has nested-virt-ubuntu-bionic
14:59 <openstackgerrit> Ade Lee proposed zuul/zuul-jobs master: Add role to enable FIPS on a node  https://review.opendev.org/c/zuul/zuul-jobs/+/788778
15:00 *** marios|call is now known as marios
15:01 <tosky> gmann: for jobs which inherit from devstack, whenever you need a multinode job you have to explicitly set nodeset: openstack-two-node-<version>; shouldn't we have a generic nodeset which matches the default base node? Like openstack-two-node-devstackdefaultplatform?
15:02 <gmann> tosky: we have a base multinode job in devstack running on the latest distro
15:02 <tosky> gmann: but if you have a custom multinode job which inherits from another job, you want to just set the nodeset
15:02 <tosky> that's the case for that bionic job
15:03 <tosky> what we lack now is a "two-node nodeset which uses the default platform that devstack jobs use"
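
Something like this hypothetical definition would fill that gap (an illustrative sketch in zuul's config syntax; the nodeset name and labels are made up, not an existing definition):

    - nodeset:
        name: openstack-two-node-devstack-default
        nodes:
          - name: controller
            label: ubuntu-focal
          - name: compute1
            label: ubuntu-focal
        groups:
          - name: subnode
            nodes:
              - compute1
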
15:06 <clarkb> frickler: were you able to look closer at the server create thing yet? I'll try looking at it in a bit if not
15:08 <gmann> tosky: will check, in tc meeting currently. we can discuss this on the qa channel maybe
15:12 <tosky> sure
15:12 <tosky> I may not be around in a bit, but... async IRC, I will answer at some point!
15:46 *** fressi has joined #opendev
15:47 *** fressi has quit IRC
15:49 *** ysandeep is now known as ysandeep|away
16:00 <clarkb> ianw: frickler: first thing I've checked is that `source /opt/devstack/openrc admin admin` produces env vars for the user and project domain (it does; both are set to default)
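
A sketch of that check (paths per a default devstack install):

    source /opt/devstack/openrc admin admin
    # expect both the user and project domain variables set to "default"
    env | grep -E 'OS_(USER|PROJECT)_DOMAIN'
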
16:02 *** mlavalle has joined #opendev
16:06 <clarkb> ianw: frickler: next thing I notice is that when you drop the --network specification you still get the same error. It just happens later. The instance is successfully created but then enters an error state later
16:06 <clarkb> | fault                       | {'code': 500, 'created': '2021-04-29T16:04:25Z', 'message': 'Build of instance 0dc6e600-908c-4e06-aff4-955dc8a22ee9 aborted: Expecting to find domain in project. The server could not comply with the request since it is either malformed or otherwise incorrect. The client is assumed to be in error. (HTTP 400) (Reques'} |
16:06 <clarkb> I think specifying --network Public just trips over the problem quicker
16:09 <clarkb> if I create the server as admin admin rather than demo demo it fails too. However, when you are admin you get the full traceback back in the fault entry
16:10 <clarkb> er, --network public not --network Public
16:13 <clarkb> explicitly setting --os-user-domain-name default --os-project-domain-name default --os-domain-name default doesn't seem to help
16:15 <clarkb> looking at the nova config I see [neutron] user_domain_name = Default but no project_domain_name, as there is under [keystone_authtoken]
16:15 <clarkb> I also wonder if the domain name is case sensitive, as we have Default in nova.conf and default in openrc
16:16 <clarkb> gmann: ^ do you know if that could be related?
16:18 *** jpena|off has joined #opendev
16:22 *** hamalq has joined #opendev
16:23 *** hamalq has quit IRC
16:24 *** hamalq has joined #opendev
16:25 *** jpena has quit IRC
16:26 <clarkb> ok, the domain id is 'default' and the domain name is 'Default' (I'm sure this is incredibly difficult to change now, but making those different and not making a uuid for the id is incredibly confusing)
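
That mismatch is visible directly from the CLI (a sketch; assumes admin credentials):

    # prints the id and name on separate lines: "default" then "Default"
    openstack domain show default -f value -c id -c name
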
16:26 <clarkb> I added project_domain_name = Default to the [neutron] section and did systemctl stop devstack@n-api.service && systemctl start devstack@n-api.service, and no change
16:28 <clarkb> as a further sanity check, project show demo and project show admin both report domain_id   | default
16:28 *** marios is now known as marios|out
16:30 <mordred> clarkb: devstack sets a domain name of Default and a domain id of default
16:31 <clarkb> mordred: ya, it's incredibly confusing
16:31 <mordred> ah - I see you found that
16:31 <clarkb> I've also confirmed that this domain info is passed in the token request "domain": {"id": "default", "name": "Default"}
16:31 <mordred> yes. it is INSANELY confusing
16:31 <mordred> actually, it's not just a devstack thing
16:31 <mordred> it's a keystone thing
16:32 <mordred> and it would be super hard to change at this point iirc
16:32 <clarkb> I suspect this is either a setup issue with keystone (that is why i did the project shows above) or a bug in nova/neutron/keystone
16:32 *** jpena|off has quit IRC
16:32 <clarkb> everything I can see on the client side seems to be accurate for v3 domain usage
16:32 <clarkb> and maybe tempest doesn't ever hit this because they create a tempest-specific project, domain, user etc and that setup is correct? I dunno, just throwing out ideas right now
16:33 <clarkb> user show admin and user show demo also show domain_id           | default
16:33 <clarkb> domain show default shows enabled     | True
16:35 <clarkb> the next thing to look at is probably the request from nova to keystone for a token to do the neutron work
16:35 <clarkb> the one that fails
16:36 <gmann> clarkb: in tempest, we also use 'default' as the default domain id
16:38 <clarkb> gmann: do you create a new user and project?
16:38 <gmann> unless creds are asked for a particular domain, where tempest creates a new domain
16:38 <gmann> clarkb: for dynamic creds (the default), yes
16:38 <clarkb> gmann: I wonder if that is why we aren't seeing this with tempest then
16:38 <clarkb> since we're just trying to use the default demo project and demo user
16:39 <clarkb> does anyone know if you can convince keystone to do unsafe logging?
16:39 <clarkb> I can see the request but none of the request details, which makes debugging not easy
16:39 <clarkb> (I can probably tcpdump between the tlsproxy and the keystone process too)
16:39 <clarkb> side note: we're running this job on bionic and devstack master wants focal I think
16:40 <clarkb> (though I really doubt that would cause this problem)
16:41 <gmann> yeah, devstack master is moving to focal completely, if we can do that soon https://review.opendev.org/c/openstack/devstack/+/788754
16:45 <clarkb> hrm, we seem to use a unix socket to proxy to keystone. Can I dump the traffic on that somehow? maybe just with cat?
16:45 <clarkb> (I don't think so)
16:45 <clarkb> or at least not with cat
16:45 <clarkb> heh, the internet says use socat in the middle
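
The socat-in-the-middle trick, as a sketch (the socket path is an assumption about where devstack's keystone uwsgi socket lives):

    # move the real socket aside, then relay and dump everything crossing it
    sudo mv /var/run/uwsgi/keystone-wsgi-public.socket{,.orig}
    sudo socat -v \
        UNIX-LISTEN:/var/run/uwsgi/keystone-wsgi-public.socket,fork \
        UNIX-CONNECT:/var/run/uwsgi/keystone-wsgi-public.socket.orig
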
16:49 <clarkb> oh neat, adding the project_domain_name to the [neutron] section of nova.conf allows --network public to work, but then it fails later with a similar error
16:49 <clarkb> so that did make a difference
16:49 *** dtantsur is now known as dtantsur|afk
16:49 <clarkb> the fault is now fault                       | {'code': 500, 'created': '2021-04-29T16:43:00Z', 'message': 'Build of instance 8714c2b0-fefc-4c8d-abe5-ed19c099e397 aborted: It is not allowed to create an interface on external network 5aafef62-c241-4da4-b4c7-5d5dffa916e8'}
16:54 <clarkb> hrm, but if I use --network private or leave off --network it is back to aborted: Expecting to find domain in project. The server could not comply with the request since it is either malformed or otherwise incorrect. The client is assumed to be in error. (HTTP 400) (Reques'}
16:58 *** iurygregory has quit IRC
16:59 *** rpittau is now known as rpittau|afk
16:59 <clarkb> BOOM | 1b0c01c7-81ab-4bcc-9d40-7214db8e58ea | clarkb-test  | ACTIVE | private=10.1.0.27, fd1f:8d2f:a260:0:f816:3eff:fe8f:4609 | cirros-0.5.2-x86_64-disk | cirros256 |
17:00 <clarkb> turns out that n-cpu has a separate config file from the rest of nova, and you have to add project_domain_name there too
17:00 <clarkb> gmann: the problem is that nova.conf and nova-cpu.conf do not specify project_domain_name under [neutron]
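
The manual fix clarkb applied, as a sketch (iniset is devstack's ini helper; paths assume a default devstack install):

    source /opt/devstack/inc/ini-config
    iniset /etc/nova/nova.conf neutron project_domain_name Default
    iniset /etc/nova/nova-cpu.conf neutron project_domain_name Default
    # restart both the api and compute services so each rereads its config
    sudo systemctl restart devstack@n-api devstack@n-cpu
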
17:01 <clarkb> and getting a 500 error back from the create call means you hit the failing path in the nova api, which uses nova.conf; getting an error instance means n-cpu failed, using nova-cpu.conf
17:02 *** marios|out has quit IRC
17:03 <gmann> but i think devstack sets it in both
17:03 <clarkb> gmann: not on this default devstack install. I had to manually add them
17:04 <clarkb> gmann: I think that it must be setting them most of the time though, because most of the time this stuff works
17:04 <clarkb> maybe there is a race in copying configs around?
17:04 <clarkb> let me find the devstack log for this job
17:04 <gmann> yeah, that is what i am suspecting
17:04 <clarkb> gmann: https://53cc3facebff961adc76-37cbc92cf6f6e06a61846b0d3fa08d8d.ssl.cf2.rackcdn.com/788553/1/check/dib-nodepool-functional-openstack-opensuse-15-src/6ef9cf0/job-output.txt the devstack log is in the job output
17:05 <clarkb> (note the job runs on ubuntu bionic and builds an opensuse 15 image for testing in the nested openstack, but none of that even starts to run because devstack isn't working)
17:12 <gmann> clarkb: here, but it's the neutron/neutron-legacy script that sets it.  https://opendev.org/openstack/devstack/src/branch/master/lib/neutron-legacy#L387
17:13 <gmann> and nova.conf is what gets copied to nova-cpu.conf initially, before the compute-specific configs
17:15 <gmann> can you check the order of the copy 'cp /etc/nova/nova.conf /etc/nova/nova-cpu.conf' relative to neutron/neutron-legacy setting it in nova.conf?
17:15 <gmann> the copy should happen later, once neutron-legacy has set it in nova.conf
17:15 <clarkb> gmann: ya, looking at the log above, the iniset on nova.conf for neutron project_domain_name happens at 2021-04-29 06:03:51.802688, then the copy to nova-cpu.conf is at 2021-04-29 06:06:17.979775
17:16 <clarkb> gmann: I wonder if this is related to parallelized setups?
17:16 <clarkb> gmann: is it possible that the neutron config for nova happens before nova does its setup and things get overwritten?
17:16 <gmann> humm, good point
17:18 <gmann> clarkb: if i remember correctly, you said it's happening always?
17:18 <clarkb> gmann: no, this issue is maybe 10% of the time
17:18 <clarkb> but when it breaks, that break is 100% fatal
17:18 <gmann> ohk, maybe we can check with parallelization disabled
17:18 <gmann> to confirm
17:19 <clarkb> gmann: well, we have the log above; we should be able to work from that to understand how it happens?
17:19 *** andrewbonney has quit IRC
17:20 <clarkb> gmann: also another interesting thing is that [neutron] user_domain_name is set at nearly the same point as project_domain_name, but user_domain_name remains in the config
17:21 <gmann> humm, is it just project_domain_name missing or any other config too https://opendev.org/openstack/devstack/src/branch/master/lib/neutron#L355
17:25 <clarkb> gmann: I'll check. Also I do see a bug with order of operations. 2021-04-29 06:03:51.205615 | ubuntu-bionic | + ./stack.sh:main:1243                     :   iniset /etc/nova/nova-cpu.conf key_manager fixed_key happens before 2021-04-29 06:06:17.979775 | ubuntu-bionic | + lib/nova:start_nova_compute:903          :   cp /etc/nova/nova.conf /etc/nova/nova-cpu.conf
17:26 <clarkb> gmann: just project_domain_name
17:26 *** ralonsoh has quit IRC
17:27 <clarkb> also, all of the keystone_authtoken config seems to be good, and that is configured before the [neutron] section
17:31 *** iurygregory has joined #opendev
17:31 *** iurygregory has quit IRC
17:33 <clarkb> gmann: looks like nova-cpu.conf has its config merged from localrc (though our localrc doesn't seem to contain the post-config stuff it operates on) and it does inidelete on some sections like the database
17:33 <clarkb> I need to step out for a bit; unfortunately I haven't found any clear indications for why that config is missing yet
17:34 <gmann> yeah, same; i was searching in case it gets deleted during the merge 2021-04-29 06:06:18.000195 | ubuntu-bionic | + lib/nova:start_nova_compute:905          :   merge_config_file /opt/devstack/local.conf post-config '$NOVA_CPU_CONF'
17:38 *** iurygregory has joined #opendev
17:51 *** eolivare has quit IRC
18:08 <gmann> nova-cpu.conf is touched in rpc_backend too, but i could not see anything removing the project_domain_name.  2021-04-29 06:06:18.262601 | ubuntu-bionic | + lib/rpc_backend:iniset_rpc_backend:158   :   local file=/etc/nova/nova-cpu.conf
18:16 <clarkb> gmann: it is also nova.conf that had the problem. Almost like the original iniset against nova.conf failed
18:16 <clarkb> and then we just copied that from nova.conf to nova-cpu.conf
18:29 <frickler> oh, wow, looks like that's exactly the kind of weird races I feared would happen with the devstack async code
18:30 <clarkb> frickler: the way the logging records things doesn't seem to be out of order though
18:30 <clarkb> but maybe the recording isn't quite as it seems?
18:32 <clarkb> https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/ensure-devstack/templates/local.conf.j2 that is the local.conf we run with. One thing I notice in there is we disable a bunch of services, and that could change the async ordering and explain why others haven't seen anything similar
18:37 <clarkb> /opt/stack/async/configure_neutron_nova.log does exist, which implies that was done asynchronously with $otherthings
18:38 <clarkb> ok, I have a theory
18:39 <clarkb> https://opendev.org/openstack/devstack/src/branch/master/stack.sh#L1202-L1250
18:40 <clarkb> we start the async neutron config of nova, then run merge_config_group
18:40 <clarkb> we also iniset in nova.conf and nova-cpu.conf
18:42 <clarkb> I think it is this https://opendev.org/openstack/devstack/src/branch/master/stack.sh#L1237-L1244 that runs around the same time as the async configure_neutron_nova
18:42 <clarkb> and they will both be reading and writing the same files, so there is a race
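
A toy illustration of that failure mode (not devstack's actual code): two unsynchronized read-modify-write passes over one ini file, where the slower writer clobbers the faster one's update:

    conf=$(mktemp)
    printf '[neutron]\n' > "$conf"
    # both writers read the file before either writes it back
    ( body=$(cat "$conf"); sleep 1; printf '%s\nuser_domain_name = Default\n' "$body" > "$conf" ) &
    ( body=$(cat "$conf"); sleep 1; printf '%s\nproject_domain_name = Default\n' "$body" > "$conf" ) &
    wait
    cat "$conf"   # only one of the two keys survives
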
18:43 <clarkb> I'll push a patch
18:44 *** sboyron has quit IRC
18:44 *** sboyron has joined #opendev
18:46 <gmann> in the recording it seems there is a .1 sec difference between the two.
18:48 <gmann> anyways, this would be good to move to the lib/nova side https://opendev.org/openstack/devstack/src/branch/master/stack.sh#L1237-L1244
18:48 <gmann> setting it in nova.conf only, and then start_nova_compute will copy it to nova-cpu.conf
18:49 <clarkb> remote:   https://review.opendev.org/c/openstack/devstack/+/788820 Fix async race updating nova configs
18:49 <clarkb> I think that may fix it
18:53 <frickler> clarkb: small nit, but I was looking at that range, too. seems plausible that this is uncovered only with swift disabled; otherwise start_swift would probably take long enough to take the race away
18:53 <clarkb> frickler: ++
18:53 <clarkb> fixing up the change now
18:53 <clarkb> I'll also note that it may be related to disabling swift
18:54 <frickler> o.k., I'll check results tomorrow, going offline now
18:55 <clarkb> frickler: thanks
19:13 *** hashar has joined #opendev
19:18 *** sboyron has quit IRC
19:56 <openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Reset connection before testing build ssh-keys  https://review.opendev.org/c/zuul/zuul-jobs/+/788826
20:32 *** cenne|out is now known as cenne
20:55 <clarkb> fungi: elod's project-config changes to reparent openstack stuff: the first 10 or so lgtm. If you are around, maybe you want to take a look too and approve some of them (though we probably don't want to approve all at once?)
21:07 <ianw> clarkb: great find!!!  i suspected async must have been in there somewhere
21:09 <clarkb> ianw: ya, there was a lot of confusion over why it would happen in different ways, but once I realized there were two configs involved and both lacked the expected project_domain_name, everything started to come together
21:11 <ianw> the fact that server creation worked without "--network" would have also helped hide this further
21:11 <clarkb> ianw: ya, but it didn't actually succeed
21:11 <clarkb> nova would just hit the same error later
21:12 <clarkb> I think specifying the network upfront caused n-api to validate it against neutron and fail early. If you didn't specify it, then n-cpu would tell neutron to do the right thing and fail at that point
21:17 <ianw> it did create the server, but maybe the network wasn't functional
21:17 <ianw> anyway, very glad to have found that one ... $RANDOM errors are the worst
21:18 <ianw> kevinz: YAY!! i think you fixed it :)  overnight we've uploaded several images to OSU and not one dropped out
21:18 <clarkb> ianw: ya, it created the server, then it entered an ERROR state. I suspect that if nodepool had tried to run against it we would've failed there
21:18 <clarkb> since the nodepool tests ssh in and confirm some glean stuff as well as the growroot, iirc
21:19 <clarkb> gmann: can you review https://review.opendev.org/c/openstack/devstack/+/788820 ? if we can get that landed I think it will help nodepool and dib testing (and then we can land some dib changes for bullseye)
21:21 *** slaweq has quit IRC
21:27 *** SWAT has quit IRC
21:33 <ianw> i got a question about intel_zuul not being able to comment; did anyone already look into that?
21:33 <openstackgerrit> Merged openstack/project-config master: Move projects under meta-config acl (2)  https://review.opendev.org/c/openstack/project-config/+/786739
21:33 <openstackgerrit> Merged openstack/project-config master: Move projects under meta-config acl (3)  https://review.opendev.org/c/openstack/project-config/+/786740
21:33 <clarkb> ianw: no, but chances are they are trying to vote and the project doesn't allow it
21:34 <openstackgerrit> Merged openstack/project-config master: Move projects under meta-config acl (4)  https://review.opendev.org/c/openstack/project-config/+/788555
21:34 <openstackgerrit> Merged openstack/project-config master: Move projects under meta-config acl (5)  https://review.opendev.org/c/openstack/project-config/+/788556
21:34 <clarkb> a number of third-party ci systems have run into that recently (I'm guessing that means $projects went and updated group membership on which CI systems can vote)
21:34 <openstackgerrit> Merged openstack/project-config master: Move projects under meta-config acl (6)  https://review.opendev.org/c/openstack/project-config/+/788557
21:34 <fungi> i think a bunch of third-party ci operators have been rebuilding their ci systems, and their new configurations aren't an exact match for their old behaviors
21:35 <ianw> yeah, the exact query is "we are still not able to get Intel_Zuul to comment on nova / neutron etc -  comments on the sandbox just fine"
21:36 <clarkb> ianw: I think only these third-party ci systems can vote on nova, for example: https://review.opendev.org/admin/groups/841ff9d50c89ab50925f127c8b388792639af64f,members
21:36 <clarkb> and Intel_Zuul is not one of them
21:37 <ianw> that makes sense.  i thought this was a longer-standing CI, but as you say it was either dropped, or something else changed
21:38 <clarkb> they should be able to leave comments without votes though (any account can do that)
21:39 <ianw> yeah, not clear exactly what's going on, but it's a good place to start :)
21:39 <fungi> to restate: wallaby dropping support for legacy jobs and devstack-gate is (thankfully) pushing a lot of zuul v2 + jenkins ci systems to be rebuilt with newer zuul so they can continue to use upstream job definitions
21:40 <fungi> i'd be willing to bet many of them are following an example somewhere which includes configuration to +1/-1 changes
21:42 <clarkb> wouldn't surprise me. I think zuul's default examples include that for a check queue too
21:48 *** hrw has quit IRC
21:48 *** hrw has joined #opendev
21:50 <gmann> clarkb: +A, was waiting for ate result.
21:50 <gmann> gate
21:50 <clarkb> gmann: thanks!
21:50 <clarkb> (depends-on won't work for us, I don't think, so getting that landed is great)
21:53 <fungi> yeah, the pending bullseye changes need another dib release and a nodepool image bump after they merge anyway
21:54 <ianw> i'm still seeing glibc 155 packages in https://mirror.iad.rax.opendev.org/centos/8-stream/BaseOS/aarch64/os/Packages/
21:55 <ianw> so i guess "soon" hasn't occurred yet
21:55 <ianw> i'll keep an eye on things to try and get the gate flowing
21:57 <ianw> "The spice must flow!"
21:57 <clarkb> ianw: do you know why these fixes didn't end up in stream first? I thought that was supposed to be the direction?
21:58 <clarkb> seems like stream will be difficult to use if it gets the new stuff first but the fixes last
21:58 <clarkb> basically you get all the risk and none of the mitigation
22:00 <ianw> i don't fully understand the flow, and yeah, the latency on pulling it in, and the unclear way that happens, i don't find ideal
22:06 *** hashar has quit IRC
22:07 <fungi> i wouldn't be surprised if they're still trying to figure out the sequence themselves
22:22 <ianw> yeah, i'm remaining calm :)
22:26 <clarkb> ianw: the other thing we should do is update these nodepool jobs to run on focal, because devstack is dropping bionic support
22:27 <ianw> clarkb: sure.  you would have seen that i proposed we remove the non-containers test with https://review.opendev.org/c/zuul/nodepool/+/788406
22:28 <clarkb> ianw: yup, that was what started my debugging of the registry server, iirc
22:28 <clarkb> maybe not; there have been a lot of changes all trying to get through and failing on a variety of unrelated problems lately
22:29 <ianw> anyway, i have that wip to use clouds.yaml for the containers test; i can also tweak one on top to use focal and bring the whole thing up to 2021 :)
22:35 <clarkb> oh I see, I saw the dib-side change; this is the nodepool side
22:37 <clarkb> I +2'd the dib change but didn't approve it, as I think we want to approve things once this devstack fix lands
22:37 <clarkb> (otherwise we're playing the odds at the casino)
22:39 <clarkb> fungi: looks like manage-projects reports success from that set you approved
22:39 <clarkb> fungi: https://zuul.opendev.org/t/openstack/builds?job_name=infra-prod-manage-projects+ fwiw
22:43 <fungi> clarkb: yeah, spot checks show they updated. i guess i'll approve another batch
22:44 <clarkb> fungi: I left some notes on a few of them where we may want to be careful, and I -1'd the one that updates openstack/project-config
22:44 <clarkb> but otherwise, ya, I think we can do a few batches
22:52 <openstackgerrit> Merged openstack/project-config master: Move projects under meta-config acl (7)  https://review.opendev.org/c/openstack/project-config/+/788558
22:57 <openstackgerrit> Merged openstack/project-config master: Move projects under meta-config acl (8)  https://review.opendev.org/c/openstack/project-config/+/788561
22:57 <openstackgerrit> Merged openstack/project-config master: Move projects under meta-config acl (9)  https://review.opendev.org/c/openstack/project-config/+/788567
22:57 <openstackgerrit> Merged openstack/project-config master: Move projects under meta-config acl (10)  https://review.opendev.org/c/openstack/project-config/+/788569
22:57 <openstackgerrit> Merged openstack/project-config master: Move projects under meta-config acl (11)  https://review.opendev.org/c/openstack/project-config/+/788571
23:02 <clarkb> arg, nova-ceph-multistore failed on the devstack fix
23:02 <clarkb> ianw: ^ fyi
23:02 <clarkb> it failed 1 tempest test
23:04 <ianw> test_image_glance_direct_import[id-32ca0c20-e16f-44ac-8590-07869c9b4cc2]
23:04 <ianw> fail
23:04 <ianw> testtools.matchers._impl.MismatchError: 'success' != 'processing'
23:04 <ianw> i wonder if this is a case of not giving it enough time
23:04 <clarkb> I wonder if ceph is the glance image store too (so this could be specific to the ceph job)
23:13 *** whoami-rajat has quit IRC
23:27 <ianw> i have no idea how to correlate things across that test
23:29 <clarkb> I rechecked the change, figuring it is unlikely related to my change
23:40 *** tosky has quit IRC
23:52 <ianw> https://review.opendev.org/c/openstack/openstacksdk/+/786814 is released now, so we should be able to fix our stats reporting from nodepool
