Wednesday, 2020-05-13

*** Dmitrii-Sh has quit IRC00:04
*** DSpider has quit IRC00:07
*** moppy has quit IRC00:10
*** Dmitrii-Sh has joined #opendev00:10
*** moppy has joined #opendev00:10
*** olaph has joined #opendev00:42
*** Dmitrii-Sh has quit IRC00:46
*** Dmitrii-Sh has joined #opendev00:55
openstackgerritMerged opendev/system-config master: Replace OVH CI mirrors  https://review.opendev.org/72738801:12
openstackgerritMerged openstack/diskimage-builder master: block device: update variable name  https://review.opendev.org/72743101:28
*** redrobot has quit IRC01:34
openstackgerritIan Wienand proposed zuul/zuul-jobs master: bindep: use virtualenv_command from ensure-pip  https://review.opendev.org/72756101:37
fungithe deploy pipeline has almost reached infra-prod-service-mirror02:21
openstackgerritIan Wienand proposed zuul/zuul-jobs master: Unset bindep_command to exercise install paths  https://review.opendev.org/72759302:31
*** kevinz has joined #opendev02:35
openstackgerritIan Wienand proposed zuul/zuul-jobs master: Unset bindep_command to exercise install paths  https://review.opendev.org/72759302:39
fungimust be almost finished, the cachedirs exist on the new mirrors as of a few seconds ago02:39
fungithen i can move them into the logical volumes and we're ready to test02:39
openstackgerritIan Wienand proposed zuul/zuul-jobs master: Unset bindep_command to exercise install paths  https://review.opendev.org/72759302:44
ianwfungi: ++ thanks!02:46
fungii'm nearly done transplanting, then will reboot them both02:52
openstackgerritIan Wienand proposed zuul/zuul-jobs master: Unset bindep_command to exercise install paths  https://review.opendev.org/72759302:54
fungiand rebooting02:55
fungiassuming it checks out, i'll un-wip 72739302:57
ianwfungi: did you do the swizzle in the mirror setup?03:04
fungiyeah03:05
fungiassuming you mean 72739203:05
ianwhttps://review.opendev.org/#/c/727392/1 yep03:05
openstackgerritIan Wienand proposed zuul/zuul-jobs master: Unset bindep_command to exercise install paths  https://review.opendev.org/72759303:08
fungii couldn't access afs even though the openafs lkm was loaded per lsmod and ps showed the openafs-client daemon running03:10
fungibut there was also a running rmmod process since around boot time03:10
fungiso i'm rebooting them a second time to see if that persists03:10
fungithese take a long while to boot03:10
fungistill the same after another reboot03:11
fungiroot       795  0.0  0.0   4648   872 ?        Ds   03:09   0:00 /sbin/rmmod openafs03:13
fungithat is really strange03:13
fungii wonder if it's dkms's doing somehow03:13
fungior something to do with the openafs-client systemd unit?03:13
fungiit's parented to init03:14
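[editor's note] The "Ds" in the STAT column of the ps line above marks a process in uninterruptible sleep, which is why the rmmod lingers and cannot be killed. Below is a minimal sketch of how such processes could be picked out of `ps aux` output. The function name `uninterruptible` is hypothetical, and the standard 11-column `ps aux` layout is assumed.

```python
from typing import Iterable, List, Tuple


def uninterruptible(ps_lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (pid, command) for every process whose STAT starts with 'D'.

    Assumes `ps aux` column order:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    found = []
    for line in ps_lines:
        # Split into at most 11 fields so the command keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or something that is not a process row
        if fields[7].startswith("D"):
            found.append((int(fields[1]), fields[10]))
    return found


# The line pasted in the channel, used here as sample input.
sample = "root       795  0.0  0.0   4648   872 ?        Ds   03:09   0:00 /sbin/rmmod openafs"
print(uninterruptible([sample]))
```

On a live host the input could come from `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout.splitlines()`.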
fungii'm also well past my bedtime. need to be up extra early to catch the release excitement03:15
fungifound this in syslog:03:18
fungiMay 13 03:12:12 mirror01 kernel: [   12.988169] openafs: module verification failed: signature and/or required key missing - tainting kernel03:19
fungithen later:03:19
fungiMay 13 03:14:28 mirror01 afsd[778]: afsd: Error enabling dynroot support.03:19
fungiMay 13 03:14:28 mirror01 afsd[778]: afsd: Error enabling fakestat support.03:19
fungithen a whole bunch of "Adding cell 'rl.ac.uk': error -1" but for various cells03:20
fungieventually "openafs-client.service: State 'stop-post' timed out. Terminating."03:21
fungiso i guess that explains the rmmod03:21
fungihowever the attempted rmmod then raises a kernel error on its own03:21
fungipossibly because it's still busy03:21
fungiusing the same kernel and openafs package versions as the most recently-built mirror.us-east.openedge.opendev.org which is working fine03:23
ianwhrm03:23
fungiianw: if you get bored, that could use looking into. i'm going to have to call it a night but can resume after tomorrow's release activities cease03:24
ianwthat signature thing is always there but yeah ... it should work03:24
ianwall the hosts?03:24
fungithe two new mirrors i just built03:24
fungimirror01.bhs1.ovh.opendev.org and mirror02.gra1.ovh.opendev.org03:25
fungidoesn't seem to be a problem on our other mirror servers03:25
clarkbmaybe try rebuild it with dkms?03:26
clarkb(package reinstall I guess)03:26
ianwwe have had issues with ordering before, i think there's breadcrumbs in the comments of the roles03:26
ianwthere is an openafs mod installed03:27
fungiyeah, and a defunct rmmod process for it too03:27
fungii think because systemd is trying to clean up after failing to start the openafs-client service03:27
fungianyway, if it turns out to be something simple and you can verify the afs mirror urls are working (the apache proxy cache sites and https certs already seem to check out fine), feel free to delete my wip vote on 727393 and approve it03:29
fungiokay, really heading to sleep now03:29
ianw# dkms status03:30
ianwopenafs, 1.8.3, 4.15.0-99-generic, x86_64: installed03:30
ianwfungi: thanks ... i'll see if i can find anything03:30
ianwi rebuilt it with dkms and am seeing what happens on reboot03:44
ianw[0m] A start job is running for OpenAFS client (31s / 1min 33s)03:48
ianwit's clearly still unhappy03:48
openstackgerritIan Wienand proposed opendev/system-config master: [dnm] trigger openafs client test  https://review.opendev.org/72762703:56
ianw^ that appears to pass on the same kernel04:13
*** ykarel|away is now known as ykarel04:26
*** diablo_rojo has quit IRC04:44
*** mtreinish has quit IRC05:01
*** dpawlik has quit IRC05:04
ianwi have no idea ...05:06
ianwgoing to try tracing afsd as it starts05:06
ianwalso, i've uploaded 1.8.5, same as focal, for bionic to the ppa ... see if it builds05:07
ianwok, for the record, i disabled openafs-client and rebooted.  i ran /usr/share/openafs/openafs-client-precheck which seems to load the module05:17
ianwthen i ran strace -f -o /tmp/foo.txt /sbin/afsd -afsdb -dynroot -fakestat -verbose -debug which is in /home/ianw/strace-afsdb.txt05:18
ianwthat command appears to die in an unkillable state05:20
ianwthe last thing it floods out is "SScall(183, 28, -1301744064)=0"05:20
ianwit also does a lot of stuff in the cache directory, which is populated05:30
ianw1.8.5 has built for bionic, so i'm trying that now.  it seems if we have to debug this further, we might as well be using that05:31
ianw(from 1.8.3 ... 1.8.4 has various fixes, 1.8.5 itself is security)05:32
*** mtreinish has joined #opendev05:34
ianwwell well well05:42
ianwroot@mirror01:/var/cache/openafs# ls05:42
ianw05:42
ianwthat appears pretty dead05:42
ianw... ok ... weirdness.  i recreated the file-system on /dev/mapper/main-openafs which went just fine and now ... it works05:47
*** ysandeep|away is now known as ysandeep05:50
*** DSpider has joined #opendev05:55
ianwand the same on mirror02 ... formatting the cache partition has made it work05:58
AJaeger\o/05:59
ianwhttps:// sites on both work for me06:01
ianwi'm not going to take the risk of switching this in now as i'm about done for today06:01
AJaegerianw: thanks, have a good night!06:09
*** dpawlik has joined #opendev06:25
*** lpetrut has joined #opendev06:42
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Policy rule for ownership between remote and executor  https://review.opendev.org/72485507:20
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Add linting rule to enforce no-same-owner policy  https://review.opendev.org/72764207:20
*** rpittau|afk is now known as rpittau07:29
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Add linting rule to enforce no-same-owner policy  https://review.opendev.org/72764207:29
*** tosky has joined #opendev07:35
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Policy rule for ownership between remote and executor  https://review.opendev.org/72485507:41
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Add linting rule to enforce no-same-owner policy  https://review.opendev.org/72764207:41
*** moppy has quit IRC08:01
*** moppy has joined #opendev08:01
openstackgerritSorin Sbarnea (zbr) proposed zuul/zuul-jobs master: bindep: Add missing virtualenv and fixed repo install  https://review.opendev.org/69363708:08
*** ykarel is now known as ykarel|lunch08:30
openstackgerritAndreas Jaeger proposed zuul/zuul-jobs master: Fix nodejs-npm-run-test  https://review.opendev.org/72767009:03
*** dtantsur|afk is now known as dtantsur09:13
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Don't require tox_envlist  https://review.opendev.org/72682909:14
openstackgerritMerged zuul/zuul-jobs master: Fix nodejs-npm-run-test  https://review.opendev.org/72767009:20
*** ykarel|lunch is now known as ykarel09:24
*** ysandeep is now known as ysandeep|lunch10:09
fungiianw: thanks for finding that! all i did was format the volume, set permissions to match the /var/cache/openafs and then move the files into it10:14
fungii wonder if openafs is picky about those files getting moved10:14
fungiianw: when you say "formatting" you mean rerunning mkfs.ext4 on them?10:20
fungiif so, maybe it's one of the tuning options i applied following our document on adding logical volumes10:20
*** dpawlik has quit IRC10:21
*** hrw has quit IRC10:28
*** ysandeep|lunch is now known as ysandeep10:55
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: intercept-job -- self-service SSH access  https://review.opendev.org/67930610:56
*** rpittau is now known as rpittau|bbl11:04
*** dpawlik has joined #opendev11:32
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Add linting rule to enforce no-same-owner policy  https://review.opendev.org/72764211:45
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: fetch-coverage-output: do not synchronize owner  https://review.opendev.org/72771711:45
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: fetch-javascript-content-tarball: do not synchronize owner  https://review.opendev.org/72771811:45
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: fetch-javascript-output: do not synchronize owner  https://review.opendev.org/72771911:45
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: fetch-javascript-tarball: do not synchronize owner  https://review.opendev.org/72772011:45
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: fetch-markdownlint: do not synchronize owner  https://review.opendev.org/72772111:45
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: fetch-phoronix-results: do not synchronize owner  https://review.opendev.org/72772211:45
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: fetch-puppet-module-output: do not synchronize owner  https://review.opendev.org/72772311:45
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: fetch-python-sdist-output: do not synchronize owner  https://review.opendev.org/72772411:45
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: fetch-sphinx-output: do not synchronize owner  https://review.opendev.org/72772511:45
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: fetch-sphinx-tarball: do not synchronize owner  https://review.opendev.org/72772611:45
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: fetch-tox-output: do not synchronize owner  https://review.opendev.org/72772711:45
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: fetch-translation-output: do not synchronize owner  https://review.opendev.org/72772811:45
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: fetch-subunit-output: do not synchronize owner  https://review.opendev.org/72772911:45
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: prepare-workspace: do not synchronize owner  https://review.opendev.org/72773011:45
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: publish-artifacts-to-fileserver: do not synchronize owner  https://review.opendev.org/72773111:45
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: upload-logs: do not synchronize owner  https://review.opendev.org/72773211:45
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Add linting rule to enforce no-same-owner policy  https://review.opendev.org/72764211:49
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: tarball-post.yaml: do not synchronize owner  https://review.opendev.org/72773511:49
*** sshnaidm|afk is now known as sshnaidm12:16
*** ysandeep is now known as ysandeep|brb12:17
*** rpittau|bbl is now known as rpittau12:19
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Drop support for ansible 2.7  https://review.opendev.org/72741012:20
*** tkajinam has quit IRC12:24
*** lpetrut has quit IRC12:46
AJaegerconfig-core, two small project-config changes, please review: https://review.opendev.org/726461 and https://review.opendev.org/72717813:07
*** ykarel is now known as ykarel|afk13:09
smcginnisAnyone know if the jitsi instance is falling down?13:09
*** spotz has joined #opendev13:13
AJaegerinfra-root, do we want to try swift upload on OVH again? https://review.opendev.org/#/c/726028/13:13
fungismcginnis: no idea, i can ssh into the server and look. i'm not able to really do much with it myself since i don't have a mic and cam yet13:14
smcginnisCan probably still see the problem if you try to join a meeting on there.13:15
smcginnisJust immediately goes to a message stating you have been disconnected and periodically tries to rejoin.13:15
fungiyeah, that happens for any room name i try to give it13:17
fungiit looks like the container is running and was last restarted on thursday13:18
fungino signs of any memory pressure or cpu utilization13:18
spotzChris thought maybe java needed a kick13:19
fungii'll see if i can remember how to look at the container's log13:19
* fungi is not all that good at these container things just yet13:19
fungispotz: smcginnis: was it working earlier?13:21
spotzThis was our first time testing it13:22
smcginnisfungi: I think I had connected to one a week or so ago.13:23
fungilooks like the jvb container is regularly reporting INFO: Performed a successful health check in PT0.005S. Sticky failure: false13:23
fungibut `docker-compose logs` isn't reporting much else that i can see13:24
fungiahh, that container just drowns out the logs from the other containers. if i add --tail=50 i can see output from the others13:26
fungithe jicofo container last logged activity on friday13:27
spotzhehe, troubleshooting how you really learn something:)13:28
openstackgerritTristan Cacqueray proposed zuul/zuul-jobs master: fetch-sphinx-tarball: introduce zuul_use_fetch_output  https://review.opendev.org/68187013:28
fungisame for the prosody container13:29
funginothing they logged looks like errors to my untrained eye, but i also need to read up on what these various components do and how they fit together. i'm hesitant to restart anything without knowing what evidence of the problem i might be destroying13:30
fungispotz: earlier in a privmsg you said it was prompting users to connect microphones, so this current behavior seems different13:30
smcginnisI believe if there is a mic and camera present, the browser will ask for permission to access them. But I think that's a normal and separate thing.13:32
fungithe corresponding etherpad for the room i'm testing seems to be loading fine, so i doubt anything about the etherpad server is triggering this behavior13:32
fungimust be one of the jitsi-meet services13:32
*** redrobot has joined #opendev13:32
spotzYeah but after that it's a rejoin loop13:33
fungigot it, so same condition, and not sure how long ago it got into this state or whether it just started, i guess13:34
spotzUnfortunately never tried logging in before so incomplete data13:36
fungii know we were testing it out as recently as friday13:36
fungiclarkb: said it was working as recently as monday: http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-05-12-19.01.log.html#l-11713:41
spotzI wonder if existing rooms are fine13:45
corvusi see a 405 when posting to /http-bind; that's supposed to connect to prosody, but i don't see any messages in the prosody log13:45
spotzOn the plus side the Community meeting isn't using it13:46
corvusjicofo is normally chatty, but i don't see anything there13:46
corvusmy inclination would be to restart jicofo, then if that doesn't work, restart prosody13:46
corvuswe might at least narrow down the component that's misbehaving13:47
corvusfungi: sound good?13:48
fungisounds reasonable, i'm still reading up on how these bits fit together13:48
fungii just wanted to make sure i didn't lose any logs which could tell us what went wrong13:49
corvusi'll save the docker log output real quick13:49
corvusin ~root/logs13:50
fungithanks13:50
corvusjicofo restart seems insufficient; restarting prosody13:51
corvusrestarting jvb13:52
corvusokay, next possibility: we track new upstream images for all of the components except web, and they were updated 5 days ago13:53
corvusperhaps our old web image is incompatible, or perhaps the other components upgraded in a way that needs intervention13:54
corvushow about we either try rolling the non-web components back, or the web component forward?13:54
fungido we have an updated web image, or do we need to trigger a build?13:55
corvuswe would either need to trigger a build, or, just for testing purposes, use the upstream image (without the etherpad auto-open)13:56
corvusi'll try rolling the other components back first13:57
fungiokay, i suppose that entails a restart of everything besides the jitsi-meet-web container... which of those didn't yet get restarted?13:58
corvusfungi: i restarted all 313:58
fungiprosody, jicofo and jvb. yep okay13:59
fungii see web is the only other one listed in the docker-compose13:59
*** ykarel|afk is now known as ykarel13:59
corvusokay, all restarted on stable-454814:00
spotzNo joy:(14:01
fungiyep, still seeing the same "you have been disconnected" error14:01
corvusi'll try one more old version14:01
corvusalso no joy14:03
corvushow about we try latest everything upstream14:03
fungiwfm14:03
fungiand then if that solves it we can update web with the patch14:03
fungiand if not, the problem must be elsewhere14:03
fungilooks like firewall rules haven't changed for that server since march14:06
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Combine javascript deployment and deployment-tarball jobs  https://review.opendev.org/72737014:08
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Set node_version in js-build base job  https://review.opendev.org/72777414:08
corvushrm, i'm stumped.14:09
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Combine javascript deployment and deployment-tarball jobs  https://review.opendev.org/72737014:09
corvusi'll change prosody to debug log14:10
funginothing relevant in /etc has changed in the past few days14:14
corvusit looks like one of the services has a docker volume; i've deleted it and restarted everything14:15
corvusi'm going to back out the hyphen change14:19
fungidpkg.log says lots of package updates at 06:45z today14:20
corvusthat was the problem14:20
corvushttps://meetpad.opendev.org/http-bind14:20
corvusthat was getting redirected14:20
corvuslet me restart all the containers at the version from this morning14:21
fungidbus initramfs-tools libc-bin libnss-systemd libpam-systemd libsystemd0 libudev1 man-db open-iscsi systemd-sysv systemd udev14:21
corvusfungi: ^ problem identified14:21
fungicorvus: so "http-bind" is a thing we're not supposed to redirect?14:22
corvusfungi: correct, it's referenced elsewhere in that config file14:23
fungiahh, i wonder if we need specific exclusions, or can get away with changing the matching order14:23
fungitaking a look14:23
corvussmcginnis, spotz: ^ found the problem14:23
spotzSweet14:24
smcginnisThanks!14:24
spotzYep I'm in! Thanks corvus!!!14:24
fungistrangely, that patch was working earlier, i wonder if it relied on folks already having cached connections or something14:25
corvusspotz, smcginnis: i'm going to restart once more to reset the log levels14:25
corvusmay be a brief hiccup14:25
spotzNo worries, our testing window for today is over but we'll give it a shot maybe tomorrow14:25
corvusfungi: or we didn't restart the container after applying the change?14:26
corvusfungi: i'll try your reordering idea14:27
corvusfungi: nope14:28
*** lpetrut has joined #opendev14:29
smcginnisBack to getting the disconnected/rejoin loop.14:29
corvusoh sorry i thought you were done testing14:30
corvusi put it back14:30
corvussmcginnis: should be gtg14:30
smcginnisOh, no. No worries. I was just testing. If you need to do more, please go ahead.14:30
openstackgerritJeremy Stanley proposed opendev/system-config master: Move jitsi-meet pattern redirect to end  https://review.opendev.org/72778014:30
corvussmcginnis: nah, i think we need to read the nginx manual now anyway :)14:31
fungicorvus: so something like that ^ won't work?14:31
smcginnisOh no, not RTFM!14:31
corvusfungi: i put the http-bind entry at the top and that didn't work14:31
fungigot it. yeah, now reading how location matches are parsed14:32
corvusi don't understand why the config.js and similar things work though.14:32
fungithey have a "." in them14:32
corvusooh14:33
fungiwhich i don't think we want to rely on as a protection (likely folks could want "." in room/pad names too)14:33
fungior in the case of /etherpad/ they have a / in them14:33
fungiwhich is probably safer to rely on14:33
corvusmaybe we put "^~" next to the jitsi-meet block14:36
corvus^~: If a caret and tilde modifier is present, and if this block is selected as the best non-regular expression match, regular expression matching will not take place.14:37
corvusthat sounds like what we want14:37
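[editor's note] The location-matching order being discussed can be sketched as a small simulation. This is a simplified model of how nginx picks a location block, not the meetpad configuration itself: the room-name regex and the location lists are hypothetical stand-ins, since the real config file is not shown in this log.

```python
import re
from typing import List, Optional, Tuple

# (modifier, pattern); modifier is one of "", "=", "^~", "~", "~*"
Location = Tuple[str, str]


def select_location(locations: List[Location], path: str) -> Optional[Location]:
    """Simplified model of nginx location selection for a request path."""
    # 1. An exact "=" match wins immediately.
    for mod, pat in locations:
        if mod == "=" and pat == path:
            return (mod, pat)
    # 2. Remember the longest matching prefix location ("" or "^~").
    best: Optional[Location] = None
    for mod, pat in locations:
        if mod in ("", "^~") and path.startswith(pat):
            if best is None or len(pat) > len(best[1]):
                best = (mod, pat)
    # 3. If that longest prefix carries "^~", regexes are never consulted.
    if best is not None and best[0] == "^~":
        return best
    # 4. Otherwise the first matching regex, in config order, wins.
    for mod, pat in locations:
        if mod == "~" and re.search(pat, path):
            return (mod, pat)
        if mod == "~*" and re.search(pat, path, re.IGNORECASE):
            return (mod, pat)
    # 5. No regex matched: fall back to the remembered prefix.
    return best


# Hypothetical room-name redirect pattern that also allows hyphens.
ROOM_PATTERN = r"^/[A-Za-z0-9_-]+$"
BEFORE = [("", "/"), ("", "/http-bind"), ("~", ROOM_PATTERN)]
AFTER = [("", "/"), ("^~", "/http-bind"), ("~", ROOM_PATTERN)]

for name, config in (("before", BEFORE), ("after", AFTER)):
    print(name, select_location(config, "/http-bind"))
```

Under these assumptions the plain prefix block for /http-bind loses to the room regex, while the `^~` variant keeps the request away from the redirect. A path such as /config.js never matches the regex because of the ".", which fits fungi's observation above.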
*** ysandeep|brb is now known as ysandeep14:38
*** mlavalle has joined #opendev14:42
openstackgerritJames E. Blair proposed opendev/system-config master: Exclude some regex matches in jitsi-meet web  https://review.opendev.org/72778614:42
corvusfungi: should i try that on the server ^ ?14:43
fungicorvus: looks like a worthwhile test14:44
fungisorry, distracted by lingering openstack release activities and now a dockerhub proxy outage reported in #openstack-infra14:45
fungiand there's an openstack "community meeting" conference call happening at the same time. when it rains it pours14:45
spotzyeah I'm doing 4 things at once14:46
clarkbI'm starting to be here now and catching up14:47
clarkbfungi: I can look at the dockerhub proxy thing14:47
fungiclarkb: awesome, thanks, i was just trying to see if i can get it to reproduce on other mirror servers14:48
fungialso i need to prepare to turn the ovh regions back on14:48
corvussmcginnis: i'm going to restart the web server one more time; may be a blip14:48
corvusall right, https://meetpad.opendev.org/http-bind still works14:49
fungiexcellent! i wonder if we should add "." to the allowed pad character redirect pattern?14:49
corvusand the actual service looks like it still works14:49
smcginnis👍14:49
clarkbfungi: ianw looks like it was the afs cache local fs that made openafs services unhappy?14:50
clarkbsmcginnis: my font doesn't have a glyph for whatever that was. I'll assume it was a thumbs up and not a pile of poo :)14:50
corvusclarkb, mordred: https://review.opendev.org/727786 is tested in prod if you want to +3 i can remove meetpad from emergency14:50
fungiclarkb: yes, not sure if it was our recommended ext4 tuning (not clear in what way he reformatted the volume) or if it was me moving the cache files from the rootfs into it which caused it to break consistently on both servers14:51
clarkbcorvus: looking14:51
smcginnisclarkb: Haha, yes, thumbs up.14:51
fungiclarkb: my terminal showed a thumbs-up glyph, i guess i should be either proud or embarrassed14:51
fungiat my font coverage14:52
corvusmine showed a weird character and deleted the line above it14:52
corvusthat feels like the proper unix behavior14:52
fungiit does match my expectations, yes14:52
fungicool-retro-term is probably just too fancy since it actually knows how to display that14:53
clarkbfungi: it is cool afterall14:59
fungibut not all that "retro" behavior-wise14:59
clarkbfwiw I think what my terminal is supposed to do is find a glyph in other fonts for that character if it doesn't have it in my current font. But it doesn't seem to do that properly14:59
fungiyeah, that's what crt is doing (via qterm internals, since it's built on qterm's backend)15:00
*** priteau has joined #opendev15:01
*** lpetrut has quit IRC15:08
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Install correct node version in ensure-yarn  https://review.opendev.org/72778915:09
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Pass node_version explicitly  https://review.opendev.org/72777415:11
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Combine javascript deployment and deployment-tarball jobs  https://review.opendev.org/72737015:11
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Install correct node version in ensure-yarn  https://review.opendev.org/72778915:13
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Install correct node version in ensure-yarn  https://review.opendev.org/72778915:16
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Install correct node version in ensure-yarn  https://review.opendev.org/72778915:18
clarkb#status log Restarted apache2 on mirror.org.rax.opendev.org. It had an apache worker from April 14 that was presumed to be the problem with talking to dockerhub after independently verifying all round robin backends using s_client.15:23
openstackstatusclarkb: finished logging15:23
*** sshnaidm is now known as sshnaidm|afk15:28
*** ykarel is now known as ykarel|away15:35
*** dtantsur is now known as dtantsur|afk15:36
AJaegerconfig-core, two small project-config changes, please review: https://review.opendev.org/726461 and https://review.opendev.org/72717815:53
fungiconfig-core: also https://review.opendev.org/727393 which turns ovh back on again. i'm around to watch for possible fallout from the mirror replacements15:59
clarkbfungi: AJaeger done. I can also help with ovh as it comes back online16:00
AJaegerregarding OVH: do we want to try swift upload on OVH again? https://review.opendev.org/#/c/726028/16:00
clarkbI tried to keep a reasonably clear todo list for today in case the release needed help16:00
AJaegerclarkb: thanks16:00
fungiAJaeger: yeah, we never heard back from rledisez on the keystone situation there16:01
clarkbthey did mention the issue was identified at least16:01
fungiyep, so odds are it was fixed within hours16:01
fungii'm okay giving it a shot16:01
clarkbmaybe let's recheck some changes that reparent to base-test?16:01
clarkbthen if those are happy approve it?16:01
* clarkb looks for changes like that16:02
fungioh, right, i forgot we split them to base-test to investigate16:02
clarkbhttps://review.opendev.org/#/c/680178/16:03
clarkbI restored that change. unittests are expected to fail but not have post failures16:03
clarkbbasically if all of the logs load and a non zero number of them uploaded to ovh I think we've confirmed it doesn't just fail16:03
clarkb(all of them should upload to ovh though due to that earlier change)16:04
AJaegergood idea16:06
*** Burke9077 has joined #opendev16:10
openstackgerritMerged openstack/project-config master: Retire x/pbrx - part 1  https://review.opendev.org/72646116:11
openstackgerritMerged openstack/project-config master: Remove noop-jobs from oslo.tools  https://review.opendev.org/72717816:11
openstackgerritMerged openstack/project-config master: Revert "Temporarily disable OVH"  https://review.opendev.org/72739316:11
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Set node_version in js-build base job  https://review.opendev.org/72777416:13
openstackgerritMonty Taylor proposed zuul/zuul-jobs master: Combine javascript deployment and deployment-tarball jobs  https://review.opendev.org/72737016:13
AJaegerinfra-root, looking at http://grafana.openstack.org/d/T6vSHcSik/zuul-status?panelId=21&fullscreen&orgId=1 , we seem to miss 100 nodes. Looking at other graphs, we have 10 in vexxhost - http://grafana.openstack.org/d/nuvIH5Imk/nodepool-vexxhost?orgId=1 ; 50 in rax-ord http://grafana.openstack.org/d/8wFIHcSiz/nodepool-rackspace?orgId=1; and 10 in openedge16:15
AJaegerhttp://grafana.openstack.org/d/dYykaX_Wz/nodepool-openedge?orgId=116:15
AJaegerAnything to be concerned about?16:15
clarkb680178 has uploads to bhs and gra that have run successfully16:15
clarkbAJaeger: we have a number of held nodes (but on the order of 10 not 100)16:16
clarkbI think that nodepools understanding of quotas can play a role in that too16:17
AJaegerclarkb: 20 nodes is still more than normal16:17
clarkbit will depress under the artificial max if some quota metric won't allow us to grow further16:17
AJaegerclarkb: ok16:17
clarkbfungi: maybe you want to approve https://review.opendev.org/#/c/726028/1 if 680178 looks good to you?16:18
AJaegermordred: want to approve pbrx retirement change https://review.opendev.org/726462 and abandon all open reviews?16:19
*** Dmitrii-Sh has quit IRC16:21
*** Dmitrii-Sh has joined #opendev16:21
*** rpittau is now known as rpittau|afk16:21
openstackgerritMerged opendev/system-config master: Exclude some regex matches in jitsi-meet web  https://review.opendev.org/72778616:27
clarkbinfra-root maybe this afternoon would be a good time to restart zuul to switch back to python3.7 and pull in the don't reconfigure on tags change?16:29
clarkbthere were also some changes to the merger code so we will want to restart those when we get a chance (this afternoon is looking good since openstack release went smoothly)16:29
clarkb(by this afternoon I mean after 2000UTC time)16:30
openstackgerritMerged opendev/base-jobs master: Revert "Temporarily disable OVH Swift uploads"  https://review.opendev.org/72602816:33
clarkbthat change should be immediate for any jobs starting after it merged16:33
clarkbor maybe not starting but created16:33
*** Burke9077 has left #opendev16:34
*** Burke9077 has joined #opendev16:34
AJaegermmh, http://grafana.openstack.org/d/BhcSH5Iiz/nodepool-ovh?orgId=1 shows that ovh is deleting 1 node/7 nodes for quite some time.16:39
AJaegerIs nodepool confused? Do we need to do anything?16:39
fungiclarkb: back to 3.7 to be consistent with what ansible can work with on the executors?16:39
fungi(even though the executors weren't upgraded)16:40
clarkbfungi: ya, upstream zuul switched the images back to python3.7 for this reason16:40
openstackgerritMerged zuul/zuul-jobs master: Set node_version in js-build base job  https://review.opendev.org/72777416:40
clarkbfungi: so zuul-scheduler et al will be on 3.7 now even though only the executors technically need it16:40
fungimakes sense to me, just being sure we didn't have a problem with our current deployment16:40
fungiAJaeger: could be that there are some nodes stuck in deleting state. happens from time to time16:40
clarkbya no known issues with our deployment. More just we expect the images to take us there anyway so may as well switch sooner than later16:41
openstackgerritAndreas Jaeger proposed zuul/zuul-jobs master: Combine javascript deployment and deployment-tarball jobs  https://review.opendev.org/72737016:48
fungitoday sounds good for that, sure16:49
*** ysandeep is now known as ysandeep|away17:12
clarkbinfra-root I've also noticed that we've got a few servers in the emergency file that maybe we can start to remove?17:22
corvusi can remove meetpad now17:23
corvusdone17:23
clarkbnb01.openstack.org and nb02.openstack.org, and fungi's ovh mirror lists too17:23
clarkbetherpad01.openstack.org also doesn't exist anymore I think17:23
clarkbmordred: ^17:23
clarkbetherpad-dev01.openstack.org static.openstack.org files02.openstack.org status.openstack.org too (I'll do status once I confirm that all the work I did to migrate it to new server is done)17:24
AJaegerclarkb: want to abandon https://review.opendev.org/680178 again?17:25
clarkbAJaeger: yup17:25
clarkblooks like old status is still in our inventory so I'll push up a change to remove it as well17:25
openstackgerritClark Boylan proposed opendev/system-config master: Remove old status server from inventory  https://review.opendev.org/72784317:28
clarkbinfra-root if we land ^ I can remove it from emergency.yaml too17:28
fungiclarkb: two of the mirrors still need their servers deleted, but yeah they're out of the inventory now i guess17:32
clarkbfungi: ya if they are out of the inventory I think its fine to remove from emergency17:32
clarkbnb01 and nb02 are still in nova listings but not in inventory so I think they can be safely removed too17:33
fungii can clean them up now, or in a bit when i'm on bridge to actually delete the server instances17:33
clarkbI think its fine to do when you remove the server instances. Just don't want to forget to do it17:34
clarkbetherpad01.openstack.org is not in inventory or nova listing so should be safe17:34
clarkbfungi: if you aren't going to edit it now I'll update the file to remove etherpad0117:34
clarkb(don't want conflicting writes)17:34
fungifeel free17:34
clarkbI removed etherpad01.openstack.org and etherpad-dev01.openstack.org as neither are in inventory or nova listing17:36
*** dpawlik has quit IRC17:36
*** priteau has quit IRC17:38
clarkbI'll do status when the change above lands and ask ianw about those servers he was working on later today17:38
clarkbthat should get us back to a really minimal list17:38
*** jkt has left #opendev17:53
fungilooks like we've been more than two weeks without an oom on lists.o.o now17:53
fungiwhere previously they were almost daily17:53
fungigrafana afs dashboard says our ubuntu mirror is almost a week stale, but others (including debian which is also reprepro from the same server) are fine, odd17:54
fungii'll look into it now17:54
fungiVLDB: vldb entry is already locked17:55
fungii don't see a stuck vos release for it17:56
clarkbfungi: I'm guessing early distro release for focal churn caused us to timeout on a vos release?17:58
clarkbfungi: may just need to grab the cron lock, rerun reprepro then vos release by hand?17:59
fungioh! right, we still authenticate vos release without localauth there17:59
fungiyeah, doing that now17:59
fungireran the mirror script just now, and am holding the cron lock for it in a root screen session on mirror-update.openstack.org while running vos release with localauth in a root screen session on afs01-dfw.openstack.org18:13
clarkb++18:15
*** hashar has joined #opendev18:21
clarkbI'm popping out for a bike ride before rain arrives18:27
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: Add zuul_work_dir to run-test-command  https://review.opendev.org/72785518:31
openstackgerritMerged zuul/zuul-jobs master: ansible-lint-rules: Fix bad path and filename  https://review.opendev.org/72644918:33
fungilooks like we're coming up on ~2 hours of jobs running in ovh now, seems like no complaints so far about the new mirrors18:50
AJaeger\o/18:50
fungii'm about to spend an hour-ish in the kitchen, but can delete the old mirror servers once i'm done assuming we're still in the clear there18:51
openstackgerritMerged zuul/zuul-jobs master: fetch-sphinx-output: introduce zuul_use_fetch_output  https://review.opendev.org/68190518:59
openstackgerritMerged zuul/zuul-jobs master: fetch-sphinx-tarball: add missing zuul_success default  https://review.opendev.org/72727219:04
openstackgerritMerged zuul/zuul-jobs master: Add zuul_work_dir to run-test-command  https://review.opendev.org/72785519:04
*** Burke9077 has quit IRC19:18
*** hashar has quit IRC19:48
openstackgerritClark Boylan proposed opendev/system-config master: Add infra-root-keys-2020-05-13 to rotate older ssh keys  https://review.opendev.org/72786520:17
openstackgerritClark Boylan proposed openstack/project-config master: Use infra-root-keys-2020-05-13 in nodepool  https://review.opendev.org/72786720:20
clarkbinfra-root ^ fyi I told shrews and dmsimard I would do that rotation20:20
clarkbfungi: thank you for taking care of the ovh thing20:25
clarkbcorvus: mordred if you have a quick moment for https://review.opendev.org/#/c/727843/ that is an easy cleanup change in system-config20:25
openstackgerritOleksandr Kozachenko proposed zuul/zuul-jobs master: Patch CoreDNS corefile The minikube v1.10.x is appending the systemd-resolved conf always. So to workaround this problem, do a patch after deployment.  https://review.opendev.org/72786820:30
openstackgerritOleksandr Kozachenko proposed zuul/zuul-jobs master: Patch CoreDNS corefile The minikube v1.10.x is appending the systemd-resolved conf always. So to workaround this problem, do a patch after deployment.  https://review.opendev.org/72786820:39
openstackgerritOleksandr Kozachenko proposed zuul/zuul-jobs master: Patch CoreDNS corefile The minikube v1.10.x is appending the systemd-resolved conf always. So to workaround this problem, do a patch after deployment.  https://review.opendev.org/72786820:42
corvusclarkb: done20:46
openstackgerritClark Boylan proposed opendev/system-config master: Set connection limits on mirror apache workers  https://review.opendev.org/72787320:49
clarkband ^ is an attempt at forcing mirror nodes to recycle apache worker processes20:49
fungiokay, so still no ovh complaints, i'll delete the old mirror servers and let amorin know we're all buttoned up20:52
corvusclarkb: lgtm and i just learned an apache.20:55
corvusapparently maxrequestsperchild is now called maxconnectionsperchild20:55
clarkbcorvus: ya I discovered that when looking this up too20:58
fungi#status log deleted obsolete servers mirror01.bhs1.ovh.openstack.org and mirror01.gra1.ovh.openstack.org20:58
openstackstatusfungi: finished logging20:58
openstackgerritOleksandr Kozachenko proposed zuul/zuul-jobs master: Patch CoreDNS corefile  https://review.opendev.org/72786820:58
fungii've now replied to amorin and cc'd the infra-root mailbox21:03
fungimirror.ubuntu vos release is still in progress21:04
clarkbfungi: don't forget to update emergency.yaml if you think those are all done with now21:04
openstackgerritMerged opendev/system-config master: Remove old status server from inventory  https://review.opendev.org/72784321:04
openstackgerritOleksandr Kozachenko proposed zuul/zuul-jobs master: Patch CoreDNS corefile  https://review.opendev.org/72786821:05
fungiclarkb: yep! done just now21:05
fungithanks!21:05
clarkbthanks!21:05
clarkbfor the ssh key rotation I've checked logs for the cloud launcher runs and it looks happy currently; not sure what it was doing when I saw it fail in the corner of my eye21:10
clarkbbut all is well now so good to go I think21:10
openstackgerritJames E. Blair proposed opendev/system-config master: Run Zuul, Nodepool, and Zookeeper as the "container" user  https://review.opendev.org/72695821:11
openstackgerritDavid Hill proposed openstack/diskimage-builder master: Disable all repositories after attaching a pool  https://review.opendev.org/72787921:19
ianwfungi: yeah, i'm not sure!  i couldn't "ls" it, but i could cat from the device .... so it wasn't like a backend storage issue.  very weird21:48
ianwand yeah, i just did a straight "mkfs.ext4" ... deliberately avoiding the options from the documentation to take it back to nothing.  but i've always used those flags (like mount count iirc) and never had issues21:50
fungistrange21:51
fungianyway, thanks for working it out!21:51
fungii was fairly braindead by then21:51
clarkbdo we need to update our script that bootstraps lvm and the fs for volumes?21:52
fungiit might be that i should have just started from a blank tree rather than moving the existing /var/cache/afs contents into the new filesystem tree21:53
fungier, /var/cache/openafs i mean21:54
*** DSpider has quit IRC21:56
ianwyeah ... i guess if that gets inconsistent it can kill afsd, and then it seems to maybe get worse as the module doesn't unload properly, then maybe you try to start it and now even more is wrong ...21:59
fungiin theory basically nothing should have exercised the cache before then, but... basically nothing is not absolutely nothing22:00
clarkbinfra-root thinking out loud I think I'll go ahead and restart zuul-web now as that is really low impact and will act as a canary for memory use under python3.722:04
clarkbthen if that looks good proceed with the scheduler and mergers tomorrow morning?22:04
fungiwfm, sure22:04
clarkbjust double checking the 3 hour old image is the python3.7 image and will restart22:07
clarkbyup looks like it. Also if you want to cross check against docker hub's digests you need to do `docker images --digests`22:10
clarkbotherwise you get image ids only which don't seem useful to cross check22:10
clarkbok restarting zuul web now22:10
clarkb#status log Restarted zuul-web on zuul.opendev.org in order to switch back to the python3.7 based images. This will act as a canary for memory use.22:11
openstackstatusclarkb: finished logging22:11
openstackgerritOleksandr Kozachenko proposed zuul/zuul-jobs master: Patch CoreDNS corefile  https://review.opendev.org/72786822:13
openstackgerritIan Wienand proposed zuul/zuul-jobs master: Unset bindep_command to exercise install paths  https://review.opendev.org/72759322:17
*** tkajinam has joined #opendev22:46
*** yuri has quit IRC22:52
openstackgerritIan Wienand proposed zuul/zuul-jobs master: Unset bindep_command to exercise install paths  https://review.opendev.org/72759322:57
*** tosky has quit IRC23:06
openstackgerritIan Wienand proposed zuul/zuul-jobs master: Unset bindep_command to exercise install paths  https://review.opendev.org/72759323:06
ianwthe ansible 2.9.8 update is green and the focal testing on top of that @ https://review.opendev.org/#/c/726981/ if we'd like to do that23:10
clarkbwe are on 2.9.1 so that should be a really safe update23:12
clarkbI have approved that change23:12
ianwi think we should knock over these final openstack.org mirrors and get everything to ansible23:20
ianwmuch as the kafs stuff is interesting i don't think we practically can use it just yet ... but we have a framework for testing23:21
clarkb++ that would be great to make consistent. From memory I think we are about half converted at this point23:22
ianwi'll start tracking it at https://etherpad.opendev.org/p/openstack.org-mirror-be-gone23:22
ianwok, it looks like nb01 is the only thing hitting mirror02.dfw.rax.openstack.org ... and i think i know why23:29
clarkbianw: semi related did you see in scrollback my questions about emergency.yaml? would be good to clean up your entries in there if we can (nb*, static, and files)23:30
clarkbjust trying to trim that down to reality so it is easier to think about what is getting ansibled23:30
clarkbI'm going to remove status now23:30
clarkbthe change for that merged23:30
clarkbso let me do that then you can edit the file too23:30
clarkball done, all yours23:31
ianwok will look in a sec23:31
openstackgerritIan Wienand proposed openstack/project-config master: Switch nodepool builders to opendev.org mirrors  https://review.opendev.org/69075723:34
ianwclarkb: ^ that responds to your ... mumble ... month old comment and drops the https from the switch, which you're right might cause deb issues23:35
ianwi think we're past the point of no return on nb01/02.openstack.org servers, so i'll delete them23:37
clarkb+223:37
clarkber to the change but also to removing the old builders23:37
ianw#status log removed nb01/02.openstack.org servers and volumes23:41
openstackstatusianw: finished logging23:41
ianwok, two down23:42
ianwstatic.openstack.org is gone per https://storyboard.openstack.org/#!/story/200659823:49
ianwfiles02 as well23:50
ianwthat leaves the two kafs testing mirrors, per prior comments i think we should just get opendev.org transition done for now, will work on that23:50
openstackgerritMerged opendev/system-config master: Update to Ansible 2.9.8  https://review.opendev.org/72698123:53
clarkb++23:54
openstackgerritMerged opendev/system-config master: Add focal testing for mirror nodes  https://review.opendev.org/72697023:55
openstackgerritIan Wienand proposed zuul/zuul-jobs master: [wip] Unset bindep_command to exercise install paths  https://review.opendev.org/72759323:56
