Monday, 2021-04-26

00:42 <ianw> again with that 500 error. we don't collect the devstack logs so it's very hard to tell what's going on
01:08 <openstackgerrit> Merged openstack/diskimage-builder master: Ensure redhat efi packages are reinstalled during finalise  https://review.opendev.org/c/openstack/diskimage-builder/+/786804
02:52 *** hemanth_n has joined #opendev
03:55 *** ysandeep|away is now known as ysandeep
04:52 *** vishalmanchanda has joined #opendev
05:29 *** ykarel has joined #opendev
05:35 *** slaweq has joined #opendev
05:52 *** ykarel_ has joined #opendev
05:54 *** ykarel has quit IRC
05:59 *** sboyron has joined #opendev
06:01 *** ralonsoh has joined #opendev
06:16 *** marios has joined #opendev
06:29 *** eolivare has joined #opendev
06:36 *** lbragstad_ has joined #opendev
06:36 *** amoralej|off is now known as amoralej
06:40 *** lbragstad has quit IRC
06:46 *** dtantsur|afk is now known as dtantsur
06:50 *** ykarel_ is now known as ykarel
06:58 *** iurygregory has joined #opendev
07:00 <openstackgerrit> Merged openstack/diskimage-builder master: dib-run-parts: stop leaving PROFILE_DIR behind  https://review.opendev.org/c/openstack/diskimage-builder/+/787303
07:06 *** pongboom has joined #opendev
07:07 *** andrewbonney has joined #opendev
07:12 *** rpittau|afk is now known as rpittau
07:14 <openstackgerrit> Merged opendev/system-config master: Remove IRC bots from #ara  https://review.opendev.org/c/opendev/system-config/+/787891
07:19 *** ysandeep is now known as ysandeep|mtg
07:19 *** fressi has joined #opendev
07:42 *** tosky has joined #opendev
07:51 *** openstack has joined #opendev
07:51 *** ChanServ sets mode: +o openstack
07:52 *** brinzhang0 has joined #opendev
07:53 <openstackgerrit> Merged openstack/project-config master: Remove IRC bots from #ara  https://review.opendev.org/c/openstack/project-config/+/787892
07:53 <openstackgerrit> Merged openstack/project-config master: Make gerritbot linter know all the supported events  https://review.opendev.org/c/openstack/project-config/+/787894
07:54 *** jpena|off is now known as jpena
07:55 *** brinzhang_ has quit IRC
08:01 *** DSpider has joined #opendev
08:08 *** parallax has joined #opendev
08:11 *** openstackgerrit has quit IRC
08:11 *** ysandeep|mtg is now known as ysandeep
08:15 *** ysandeep is now known as ysandeep|lunch
08:16 *** calcmandan has quit IRC
08:17 *** calcmandan has joined #opendev
08:24 *** ildikov has quit IRC
08:29 *** ildikov has joined #opendev
08:33 *** hamalq has joined #opendev
08:36 *** hrw has joined #opendev
08:37 <hrw> morning
08:37 <hrw> fungi: have you had time to find out why python3 failed on centos-8-stream-aarch64 nodes?
08:38 *** hamalq has quit IRC
09:16 *** ykarel is now known as ykarel|lunch
09:19 *** ysandeep|lunch is now known as ysandeep
10:08 *** dtantsur is now known as dtantsur|bbl
10:29 *** ykarel|lunch is now known as ykarel
10:34 *** hamalq has joined #opendev
10:39 *** hamalq has quit IRC
11:26 *** jpena is now known as jpena|lunch
12:09 *** amoralej is now known as amoralej|lunch
12:10 *** dtantsur|bbl is now known as dtantsur
12:10 <kopecmartin> hi, have there been any changes lately which could affect email notifications from review.opendev.org? e.g. notifications from reviews (owned, or cc'd as reviewer ...) on new commits, zuul results, comments etc?
12:28 *** jpena|lunch is now known as jpena
12:28 *** openstackgerrit has joined #opendev
12:28 <openstackgerrit> Radosław Piliszek proposed openstack/project-config master: Make gerritbot notify about V-2 on kolla  https://review.opendev.org/c/openstack/project-config/+/787887
12:30 <yoctozepto> kopecmartin: they work for me today; any specific details you want to share?
12:32 <kopecmartin> yoctozepto: i've just noticed that i haven't received any email from gerrit since 24 Apr .. and I created 1 review, rechecked several ones and got cc'd on at least one ... and i haven't made any changes in my profile
12:32 <kopecmartin> very weird
12:32 <kopecmartin> i don't know what to check / verify
12:32 <yoctozepto> yeah, you spammed me as well
12:32 <kopecmartin> :D
12:33 <yoctozepto> I know it's the dreadful question but... you sure the e-mail address is correct? :D
12:35 *** hamalq has joined #opendev
12:38 <kopecmartin> yoctozepto: yes :) .. i'm going through all my mail now to verify my filters didn't betray me
12:38 *** lpetrut has joined #opendev
12:39 *** hamalq has quit IRC
12:41 <yoctozepto> oh those treacherous filters!
12:47 <fungi> hrw: no, i haven't found time to work out how ansible is invoking python to be able to more directly recreate the fault and load the coredump into gdb
12:49 <fungi> kopecmartin: i suppose one possibility is your mailserver is rejecting messages from gerrit. if you can't find them being deleted or misfiled after delivery, i could check our mta logs for outbound delivery errors related to your address
12:51 *** lbragstad_ is now known as lbragstad
13:02 *** amoralej|lunch is now known as amoralej
13:04 <fungi> #status log Requested Spamhaus PBL delisting for the IPv4 address of review.opendev.org, which should take effect within the hour
13:04 <openstackstatus> fungi: finished logging
13:04 <fungi> kopecmartin: ^ see if you start receiving notifications by ~14:00 utc
13:04 <kopecmartin> fungi: thank you
13:04 <fungi> thanks for mentioning you were having a problem!
13:05 <fungi> now if we can just figure out how it keeps getting readded to the pbl (or rather how its exclusion keeps getting removed)
13:10 *** sshnaidm|afk is now known as sshnaidm
13:17 <fungi> infra-root: just a heads up, i intend to try to stay away from the computer for a lot of this week so i can catch up on a backlog of tasks around the house, but i'll probably quietly knock out a few lingering tasks here as well and keep an ear to the ground in case of emergencies
13:41 *** vishalmanchanda has quit IRC
13:48 *** d34dh0r53 has quit IRC
13:48 <frickler> fungi: spamhaus says "NOTE: Exclusions are only valid for 1 year", not sure how to best handle that but I'll make a note in my calendar for next year. we also should check whether we can get some non-PBLed address if we ever move the host again, like to vexxhost
13:49 <frickler> indeed, the last time this happened was 2020/04/17 according to my logs
13:50 <frickler> so one year plus 7 days grace period sounds very plausible
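For readers unfamiliar with how these listings are checked: DNSBLs like the Spamhaus PBL are queried by reversing an IPv4 address's octets and looking the result up under the list's DNS zone; an A-record answer (in 127.0.0.0/8) means the address is listed. A minimal sketch of building the query name (it stops short of the actual DNS lookup, and the zone name is illustrative):

```python
def pbl_query_name(ipv4, zone="pbl.spamhaus.org"):
    """Build the DNSBL query name for an IPv4 address: reverse the
    octets and append the list's zone."""
    octets = ipv4.split(".")
    assert len(octets) == 4, "expected a dotted-quad IPv4 address"
    return ".".join(reversed(octets)) + "." + zone

# Resolving the resulting name (e.g. with socket.gethostbyname) and
# receiving an answer in 127.0.0.0/8 would mean the address is listed.
print(pbl_query_name("203.0.113.7"))  # 7.113.0.203.pbl.spamhaus.org
```

A periodic job iterating over the project's mail-sending hosts with a lookup like this would catch re-listings such as the one frickler predicts for next year.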
13:57 *** d34dh0r53 has joined #opendev
14:15 *** ysandeep is now known as ysandeep|afk
14:16 *** slaweq has quit IRC
14:16 *** slaweq_ has joined #opendev
14:16 *** slaweq_ is now known as slaweq
14:18 <openstackgerrit> Merged openstack/project-config master: Make gerritbot notify about V-2 on kolla  https://review.opendev.org/c/openstack/project-config/+/787887
14:21 *** vishalmanchanda has joined #opendev
14:23 *** openstackgerrit has quit IRC
14:23 <fungi> frickler: oh, thanks, i hadn't noticed the exclusions expire. yeah we probably need to go back through and check a bunch of addresses periodically in that case. and yes this is really just rackspace i think, they blanket PBL list their entire address space so they don't have to deal with spam complaints about compromised customers or folks abusing their services
14:24 *** fressi has quit IRC
14:25 *** whoami-rajat has joined #opendev
14:30 *** lpetrut has quit IRC
14:36 *** hamalq has joined #opendev
14:36 <hrw> fungi: thanks for the info
14:40 *** hamalq has quit IRC
14:50 *** lassimus has quit IRC
14:50 <clarkb> ianw: we probably need to (temporarily) collect some openstack logs in the nodepool/dib jobs to try and track that down
14:51 <clarkb> I'm catching up on email and scrollback this morning, but I'm not seeing anything that would indicate today is a bad day to start the zk swaps. I do have one errand I think I need to run this morning but after that I can dive into zk
15:14 *** ysandeep|afk is now known as ysandeep
15:21 *** ykarel has quit IRC
15:23 <frickler> #status log Requested Spamhaus PBL delisting for the IPv4 address of mirror01.dfw.rax.opendev.org
15:23 <openstackstatus> frickler: finished logging
15:35 <frickler> #status log Requested Spamhaus PBL delisting for the IPv4 address of nb01.opendev.org
15:35 <openstackstatus> frickler: finished logging
15:36 <frickler> fungi: ^^ fyi these are the ones I found in my log from february last year that spamhaus reported in PBL again
15:37 <fungi> frickler: thanks! those are probably less critical since the only e-mail they send is to our sysadmins, but still good to have clean. my biggest worries are mailing list servers, gerrit, mediawiki, storyboard... anything which sends notifications or other messages to application users
15:45 *** amoralej is now known as amoralej|off
15:49 *** mlavalle has joined #opendev
15:53 *** mnaser has joined #opendev
15:54 <mnaser> infra-root: https://opendev.org/openstack/cinder/commit/f4359c523f4cf47eabad7fdcfa3f35c22ebc619e cinder has decided to bump ussuri oslo.serialization==3.1.2, but our wheels have not updated -- http://mirror.ca-ymq-1.vexxhost.opendev.org/wheel/ubuntu-18.04-x86_64/oslo-serialization/
15:54 <mnaser> how can i trigger a build?
15:55 *** rpittau is now known as rpittau|afk
15:55 <clarkb> mnaser: they should run daily, so if that release is more than a day old we may need to check why the existing builds haven't succeeded first
15:56 <clarkb> oh, unless we only build what is constrained and we just skipped over it maybe
15:56 <mnaser> clarkb: this seems to have merged 2 months ago :-P
15:56 <mnaser> https://review.opendev.org/c/openstack/cinder/+/774680
15:57 <clarkb> mnaser: right, I'm saying that it seems likely the reason it isn't there is not that it needs to be manually triggered but some other fundamental problem
15:57 <clarkb> and we should address that
15:58 *** ykarel has joined #opendev
15:58 *** hamalq has joined #opendev
15:58 <mnaser> where are the jobs listed?
15:58 *** hemanth_n has quit IRC
15:58 <clarkb> they are in openstack/openstack-zuul-jobs as build-wheel-cache-*
15:59 <clarkb> I think anyway, still digging that up
15:59 <clarkb> mnaser: https://pypi.org/project/oslo.serialization/3.1.2/#files that has a wheel already; I think we don't cache that separately
16:01 <mnaser> ah, yes, i think my issues are stemming from the fact i have a cached upper-constraints
16:02 <clarkb> mnaser: roughly the way things should work is if there is already a wheel then don't bother building and caching one locally. I don't understand why 4.1.0 has a wheel there, but it shouldn't if I remember how this works properly
16:03 *** hamalq has quit IRC
16:03 <clarkb> publish-wheel-cache-ubuntu-bionic is the actual job. the build jobs are there to test the tooling to do the publication
16:03 *** ykarel has quit IRC
16:03 <clarkb> https://zuul.opendev.org/t/openstack/build/c6f4c525447b4513994e6dddc184c89d/logs is the most recent build if you want to look it over, but ya I don't think we expect to cache that because it has a wheel on pypi
16:04 *** hamalq has joined #opendev
16:04 *** ysandeep is now known as ysandeep|away
16:09 <clarkb> basically the wheel cache is there primarily to avoid building wheels at runtime, which can be expensive for things like lxml, cryptography, and friends. If there is already a wheel on pypi we don't bother to rebuild it
16:09 <clarkb> in the case of cryptography we've shifted to helping them build the wheels that go on pypi so everyone benefits too
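The decision clarkb describes — only build and cache a wheel locally when PyPI doesn't already ship one — can be sketched roughly as follows. This is a simplified illustration, not the actual job logic; the real jobs work from upper-constraints and the PyPI API, and `release_files` here is a hypothetical input listing a release's distribution filenames:

```python
def needs_local_wheel(release_files):
    """Return True if a release should be built for the local wheel
    cache, i.e. when no prebuilt wheel exists among its files."""
    return not any(name.endswith(".whl") for name in release_files)

# An sdist-only release would be built and cached locally; a release
# that already ships a wheel on PyPI (like oslo.serialization 3.1.2)
# is skipped, which is why it never appears in the mirror's cache.
print(needs_local_wheel(["demo-1.0.tar.gz"]))  # True
print(needs_local_wheel(["demo-1.0.tar.gz", "demo-1.0-py3-none-any.whl"]))  # False
```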
16:23 *** roman_g has joined #opendev
16:25 <dtantsur> ianw or anyone: could you please release glean with the latest fixes?
16:26 <roman_g> Good morning, team. I'm experiencing problems with CityCloud KNA1. Could you have a look if there are problems on the OpenDev Zuul side, please? Thank you! https://grafana.opendev.org/d/QQzTp6EGz/nodepool-airship-citycloud?orgId=1&from=now-7d&to=now
16:27 <roman_g> Seems that some instances are stuck, and no new instances are able to launch there.
16:27 <clarkb> roman_g: without looking I'm fairly confident in saying it is most likely the lack of hypervisors capable of scheduling the requested nodes. But I'll check
16:27 <clarkb> dtantsur: I think there were a couple more changes ianw wanted to get in?
16:28 <roman_g> clarkb: Error Node Launch Attempts is zero, which confuses me. And the graphs show that instances are stuck there.
16:28 <roman_g> Since Thursday evening. There was PTG, so probably no one noticed the issue earlier.
16:29 <roman_g> dtantsur o/
16:29 <fungi> clarkb: i'm not here at the moment, but one possibility for the airship-kna1 lack of nodes could be "stuck" node requests; i've noticed over the past month the launchers (especially in the wake of high launch error counts) have a tendency to not complete turning over node requests they lock after they've satisfied or decided to decline them
16:30 <clarkb> I see the issue
16:30 <clarkb> it's a config problem
16:30 <dtantsur> clarkb: sure, but the work seems stuck there
16:30 <clarkb> dtantsur: we've all been incredibly busy :/
16:30 <dtantsur> I imagine
16:30 <dtantsur> that's why I wonder if we could have a release now. It fixes a serious issue for one of the features in ironic.
16:31 <fungi> dtantsur: care to skim the commits since the last tag and suggest what the next appropriate semver version should be for it?
16:31 <roman_g> clarkb: would it be solved by you, or do I need to contact CityCloud and ask them to check and do something?
16:31 <dtantsur> sure
16:32 *** openstackgerrit has joined #opendev
16:32 <openstackgerrit> Clark Boylan proposed openstack/project-config master: Fix nl02's provider config list  https://review.opendev.org/c/openstack/project-config/+/788044
16:32 <clarkb> roman_g: ^ that's the fix
16:32 <roman_g> Thank you, clarkb!
16:32 <dtantsur> fungi: things there are essentially bug fixes, but they're quite substantial, so it may be worth bumping the minor version to highlight it.
16:33 <clarkb> ya we can check with ianw if he is comfortable with a release as-is before landing the additional refactors. I suspect so since everything should be fairly forward/backward compatible
16:34 <clarkb> I wonder if that yaml would've errored if we weren't doing the reserialization stuff
16:35 <fungi> duplicate keys aren't verboten by pyyaml's parser
16:35 <fungi> you have to jump through some hoops to detect them at all
16:39 <clarkb> ya, you have to subclass some portion of the parser to check for key existence before adding, and raise an exception if you hit that?
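The subclassing clarkb alludes to looks roughly like this — a minimal sketch (the loader class name is made up; by default pyyaml silently keeps the last value for a duplicated key, which is how a duplicated provider entry in a config file can slip through unnoticed):

```python
import yaml

class UniqueKeyLoader(yaml.SafeLoader):
    """SafeLoader variant that rejects duplicate mapping keys."""
    def construct_mapping(self, node, deep=False):
        seen = set()
        for key_node, _value_node in node.value:
            key = self.construct_object(key_node, deep=deep)
            if key in seen:
                raise yaml.YAMLError("duplicate key: %r" % key)
            seen.add(key)
        return super().construct_mapping(node, deep=deep)

doc = "providers:\n  a: 1\n  a: 2\n"
# Stock behavior: the last value silently wins.
print(yaml.safe_load(doc))  # {'providers': {'a': 2}}
try:
    yaml.load(doc, Loader=UniqueKeyLoader)
except yaml.YAMLError as exc:
    print(exc)  # duplicate key: 'a'
```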
16:43 <openstackgerrit> Clark Boylan proposed openstack/diskimage-builder master: debian-minimal: Set bullseye version  https://review.opendev.org/c/openstack/diskimage-builder/+/787665
16:44 <clarkb> this change depends on a modification to the nodepool jobs now to try and debug those nova 500 errors
16:45 <clarkb> dtantsur: did you want to review ianw's glean stack to ensure it doesn't conflict with ironic needs? https://review.opendev.org/c/opendev/glean/+/782010/4 is the bottom of that stack
16:45 <dtantsur> clarkb: I've seen them, and I *think* they're fine, but to say for sure I need to conduct pretty cumbersome testing
16:46 <clarkb> it is a refactor to try and make it more clear as to what runs and when
16:46 <dtantsur> which I don't have much time for :(
16:46 <clarkb> I think it could make your CI testing take longer because we'll be back to running multiple invocations of things for configuring a single interface, but it should force ordering and make it clear that is what is happening
16:48 <openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Roles to create, cleanup and promote snapshots in ec2  https://review.opendev.org/c/zuul/zuul-jobs/+/787677
16:48 *** marios is now known as marios|out
16:50 <openstackgerrit> Merged openstack/project-config master: Fix nl02's provider config list  https://review.opendev.org/c/openstack/project-config/+/788044
16:50 *** whoami-rajat has quit IRC
16:51 <clarkb> roman_g: ^ that should apply in the next 10 minutes or so
16:51 <roman_g> clarkb: thank you!
16:52 *** marios|out has quit IRC
17:06 <roman_g> clarkb: deploy failed https://zuul.opendev.org/t/openstack/build/7f51faaa68be4680a8417c3d93af2e57/console
17:07 <clarkb> I bet it is nb03 still causing problems. Let me check the specific fix
17:07 <clarkb> roman_g: I think the fix applied successfully to nl02, which is all we should need
17:08 <roman_g> clarkb: I see builds starting. Thank you.
17:10 *** jpena is now known as jpena|off
17:12 *** sshnaidm is now known as sshnaidm|afk
17:20 *** ralonsoh has quit IRC
17:30 *** roman_g has quit IRC
17:37 *** andrewbonney has quit IRC
17:41 *** dtantsur is now known as dtantsur|afk
17:50 *** sboyron has quit IRC
18:21 *** vishalmanchanda has quit IRC
18:33 *** eolivare has quit IRC
18:42 *** stand has joined #opendev
19:01 *** gouthamr has joined #opendev
19:38 *** DSpider has joined #opendev
20:31 *** amoralej|off has quit IRC
20:32 *** jpena|off has quit IRC
20:33 *** fbo has quit IRC
20:37 *** fbo has joined #opendev
21:06 *** slaweq has quit IRC
21:21 *** stevebaker has joined #opendev
21:27 <clarkb> ok that errand took much much longer than I anticipated and I haven't even had lunch yet :/
21:27 <clarkb> I'm not feeling in a great spot to do the zk thing now, first thing tomorrow though
21:38 <fungi> i expect to be around more tomorrow. doing more stuff inside the house and fewer errands outside
21:38 <fungi> can arrange to be around for zk swaps
21:40 <clarkb> cool, I'll try to dive right into that when my day starts
22:02 <ianw> clarkb: i guess we should probably do a release of glean with dtantsur's fix and then put the other bits in and do another release. they're more code motion, and that will give people a point to pin to in an emergency
22:02 <ianw> (not that i expect one, but, you never know)
22:06 <clarkb> ianw: cool, that works for me if it also works for you
22:07 <ianw> can do soon
22:10 <clarkb> ianw: I got a change up to collect logs in the nodepool changes that I depends-on'd to the dib change.
22:10 <clarkb> Haven't caught a nova failure yet though
22:10 <ianw> oh great, thanks. that's one less thing on the todo list :)
22:10 <clarkb> feel free to remove the depends-on if you want to try your luck instead (I don't expect we'll be merging the nodepool side change)
22:10 <fungi> clarkb: ianw: i'm struggling a bit to work out what ansible might be calling during fact gathering which would crash the python3 interpreter on centos 8 for arm64, wanting to come up with a clean reproducer to generate a core dump on the node i've held... you don't happen to know how to emulate fact gathering from the local command line?
22:11 <clarkb> fungi: ansible -i localhost, -m setup
22:11 <clarkb> er, ansible localhost -i localhost, -m setup
22:11 <clarkb> you need to specify the target in addition to the inventory
22:11 <fungi> the comma there is intentional?
22:11 <clarkb> yes, it tells ansible to treat the arg as the actual inventory and not a file to find the inventory in
22:12 <ianw> fungi: ansible -i <host>, <host> -m setup
22:12 <clarkb> and then you can see what that setup module is doing, but I'm not sure of the exact behavior
22:12 <ianw> heh, or what clarkb said
22:14 <fungi> i guess i need to actually install ansible on the held node in that case
22:14 <clarkb> or set up the inventory on some other host and target that node
22:15 <clarkb> but it may be easier to track what is happening if everything is on one machine
22:15 <fungi> was more wondering if i could run something involving one of these python zipballs ansible stuck in /tmp
22:15 <clarkb> in theory I think you can, but you'd have to figure out which one corresponded to the setup module and possibly hack it up to only run that
22:15 <clarkb> also it may expect some sort of command and control input
22:16 <ianw> i'd probably be tempted in this case to replace python with a shell script that either runs it under strace to start, or gdb if that doesn't help
22:16 <fungi> yeah, they don't make this simple to debug in isolation
22:17 <fungi> alternatively, any idea where it would have stuck the core it claims to have dumped?
22:17 <ianw> i feel like with default ulimits that would not be done?
22:18 <ianw> my method of debugging the debian distro checking was to place a SyntaxError with debugging info in, which i could partially see in the output of "-vvv"
22:18 <ianw> so yeah, i'm also not sure how you're supposed to get info out of the remote side :)
22:19 * fungi shakes fist at ulimit rules preventing saving critical debugging data
22:20 <fungi> i remember the days when /var/crash was sacrosanct
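ianw's hunch is right: with the common default of a zero core-file size limit, the kernel writes no core at all. One way to check and lift the limit from Python before re-running the crashing interpreter (a sketch; `resource` is POSIX-only, and where a core actually lands is governed by the kernel's `core_pattern` rather than anything shown here):

```python
import resource

# RLIMIT_CORE caps the size of core dumps; a soft limit of 0 (the
# usual default, i.e. `ulimit -c 0`) suppresses core files entirely.
soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
print("core limit (soft, hard):", soft, hard)

# Raise the soft limit as far as the hard limit allows — equivalent
# to `ulimit -c unlimited` when the hard limit is RLIM_INFINITY.
resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
print("after raising:", resource.getrlimit(resource.RLIMIT_CORE))
```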
22:23 <fungi> openbsd still lets things write to /var/crash by default
22:23 * fungi is not bitter
22:24 <fungi> openbsd 6.9 should officially happen this weekend though
22:24 <fungi> 50th release!
22:28 <ianw> clarkb: logs LGTM, what do you think about putting it under /var/log/nodepool/devstack/ though to keep them separate?
22:32 <clarkb> ianw: that's fine, I just did the most likely to succeed thing since I suspect the zuulians may not want to merge that change at all
22:49 *** zimmerry has quit IRC
22:53 *** zimmerry has joined #opendev
23:01 *** tosky has quit IRC
23:08 <ianw> sigh, nb03 has filled up its disk
23:09 <clarkb> ianw: I was worried that may happen with all of the new image builds there
23:09 <clarkb> and adding raw
23:09 <clarkb> if we can fit just raw onto that host then switching linaro-us over to raw may be a good option?
23:10 <ianw> kevinz: ^ can/should we use raw images with linaro-us?
23:12 <ianw> hrm, we have at least an old 100gb volume to remove
23:15 <ianw> it seems we have quota for another 400, which doubles the current size. i'll attach that
23:16 <clarkb> sounds good
23:20 <ianw> #status log nb03 : doubled /opt volume to 800g to allow for more images after enabling raw with https://review.opendev.org/c/opendev/system-config/+/787293
23:20 <openstackstatus> ianw: finished logging
23:27 <ianw> ok rebooted for good measure and it's trying again
23:32 <clarkb> fungi: one other thought on the segfaulting. You can limit which facts are gathered. You might be able to narrow down what is causing it using that?
23:32 <clarkb> fungi: https://docs.ansible.com/ansible/latest/collections/ansible/builtin/setup_module.html#parameter-gather_subset
23:33 <clarkb> basically say gather_subset: !all,!min,foo then !all,!min,bar and so on
23:33 <clarkb> and if you trip with one and not the others, that would be the thing to look into
23:40 <fungi> aha, yeah thanks, that could help
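clarkb's suggestion amounts to bisecting the fact subsets: invoke the setup module once per subset and see which invocation crashes. A small sketch that just generates the candidate command lines (the subset names are drawn from Ansible's documented `gather_subset` values; nothing here actually executes ansible):

```python
# A few fact subsets listed in the setup module's documentation.
subsets = ["hardware", "network", "virtual", "facter", "ohai"]

commands = [
    "ansible localhost -i localhost, -m setup "
    "-a 'gather_subset=!all,!min,%s'" % subset
    for subset in subsets
]

for cmd in commands:
    print(cmd)
# Run each on the held node; the invocation that segfaults points at
# the offending fact collector.
```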
23:43 <ianw> other than nb03, we don't have any other ZK hosts outside RAX DFW, do we?
23:43 <clarkb> ianw: I don't think so
23:43 <clarkb> the launchers, builders, and scheduler are currently it I think. But the executors will be added soon

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!