Tuesday, 2023-04-18

ianwclarkb: i think it's fine to use the shell container, but we just want to "run --rm" it ... because we don't want old shell containers around?00:04
clarkbianw: the problem is that we aren't doing that today because docker-compose up -d doesn't rm the shell container00:05
clarkbmy concern is that I don't want us relying on a random cronjob to correct the docker-compose commands run elsewhere.00:06
clarkbbasically the container exists today whether we want it to or not. I think removing it should be handled separately00:06
ianwi'm not 100% sure we're talking about the same thing ... in "docker images ls" we don't want to see old shell containers that were started to run this job, right?00:08
clarkbianw: it's `docker ps -a` and I agree that is probably a good thing. But if you run that command right now you'll see this container exists and isn't cleaned up00:10
clarkbdocker image ls is for the images, which this change doesn't affect (it's going to use whichever image is currently present, which is why I split it into a separate change)00:10
opendevreviewClark Boylan proposed opendev/system-config master: Prune invalid replication tasks on Gerrit startup  https://review.opendev.org/c/opendev/system-config/+/88067200:11
opendevreviewClark Boylan proposed opendev/system-config master: Run the replication task cleanup daily  https://review.opendev.org/c/opendev/system-config/+/88068800:11
clarkbit is a side effect of having shell defined in docker-compose.yaml and using `docker-compose up -d`. This runs the shell container, which by default executes `true` in the image then exits but does not remove the container00:12
ianwok, but the daily cron job is going to create another container every day?00:13
clarkbno, it will be the same container I think00:13
clarkb(due to docker-compose magic)00:13
ianwok, maybe that's where we're crossing wires.  in other commands like that we exec, to use the current container00:14
ianwi feel like the run is going to create a new container, and without --rm leave it there00:14
ianwthere is an easy way to test :)00:14
clarkbIt will create a new container but only one that is logically managed by docker-compose called shell00:14
clarkbif you used docker run instead of docker-compose run then you would create a new container every day00:15
ianw# /usr/local/bin/docker-compose -f /etc/gerrit-compose/docker-compose.yaml run -T shell ls00:15
ianw1475ef7225d8   opendevorg/gerrit:3.7   "/usr/bin/dumb-init …"   6 seconds ago   Exited (0) 5 seconds ago             gerrit-compose_shell_run_f9f293a1981600:15
ianw# /usr/local/bin/docker-compose -f /etc/gerrit-compose/docker-compose.yaml run --rm -T shell ls00:16
clarkband if you docker ps -a there should only be three containers not four00:16
ianwthere is 4, that was from ps -a00:16
fungihoe does docker-compose run differ from docker-compose exec?00:16
fungis/hoe/how/00:16
ianwhttps://paste.opendev.org/show/beCqux2wuQgO7RxVQsea/00:16
clarkbfungi: run means start a new container instance and exec means run something in an already running container00:16
fungiah, okay00:17
ianwwhereas with "--rm" "docker ps -a" doesn't show another container added 00:17
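[Editor's sketch] The difference ianw demonstrates above comes down to one flag on the command line. A minimal Python sketch of how a cron wrapper might assemble that invocation, assuming the binary path, compose file, and `shell` service shown in the log; the helper name `compose_run_argv` is hypothetical and not part of system-config:

```python
from typing import List, Sequence

# Paths copied from the commands pasted in the log above.
COMPOSE_BIN = "/usr/local/bin/docker-compose"
COMPOSE_FILE = "/etc/gerrit-compose/docker-compose.yaml"


def compose_run_argv(service: str, command: Sequence[str],
                     remove: bool = True) -> List[str]:
    """Build the argv for a one-off `docker-compose run` (hypothetical helper).

    With remove=True the one-off container is deleted when the command
    exits; without it an "Exited" container is left behind in
    `docker ps -a`, as seen in the log.
    """
    argv = [COMPOSE_BIN, "-f", COMPOSE_FILE, "run"]
    if remove:
        argv.append("--rm")
    argv.append("-T")  # no pseudo-TTY, suitable for cron
    argv.append(service)
    argv.extend(command)
    return argv
```

Passing the resulting list to `subprocess.run` avoids shell quoting issues; defaulting `remove` to true means a forgotten argument does not leave containers behind.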
clarkbianw: what ends up cleaning our gerrit init/reindex containers then?00:17
ianwi think we exec all them?00:17
clarkbmaybe that is where I'm getting confused. We only have the one shell container and we use it for multiple commands and don't end up with extras00:17
clarkbno we don't because there isn't a running gerrit to exec an offline reindex in00:18
clarkbit has to be run00:18
clarkbhttps://etherpad.opendev.org/p/gerrit-upgrade-3.7 line 136 for example00:18
clarkbmaybe docker-compose down clears those extras too00:19
clarkband so over time we trend towards having them cleaned up00:19
ianwperhaps down clears?00:19
ianwoh, yeah, what you said :)00:19
ianwup or down, or something like that00:19
clarkbI find that extremely confusing behavior if so00:19
ianwit was named   "gerrit-compose_shell_run_f9f293a19816" so maybe docker-compose does some matching 00:19
clarkbya it's something like gerrit-compose == the dirname of the docker-compose.yaml file. shell is the container name. Then $instance after that00:21
clarkbwe have _1s by default from up -d because we only want a single copy of each container00:22
clarkbI guess the run invocation gets something else that lets it track and clean it up00:22
opendevreviewClark Boylan proposed opendev/system-config master: Prune invalid replication tasks on Gerrit startup  https://review.opendev.org/c/opendev/system-config/+/88067200:22
opendevreviewClark Boylan proposed opendev/system-config master: Run the replication task cleanup daily  https://review.opendev.org/c/opendev/system-config/+/88068800:22
clarkbI expected everything to happen within the one container based on the fact that we only ever had the one container when I ran docker ps -a and we definitely run other commands on the host. But ya we must down often enough to clean them up00:22
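[Editor's sketch] The naming scheme discussed above can be illustrated with a small parser. This is only a sketch: the pattern is inferred from the two name shapes mentioned in this log (`_1` from `up -d`, `_run_<id>` from `run`), not taken from docker-compose documentation, and `parse_compose_name` is a hypothetical helper:

```python
import re
from typing import Dict, Optional

# Assumed name shapes, based only on what appears in the log:
#   <project>_<service>_<index>      e.g. from `docker-compose up -d`
#   <project>_<service>_run_<hexid>  e.g. from `docker-compose run`
_NAME_RE = re.compile(
    r"^(?P<project>.+?)_(?P<service>.+?)_"
    r"(?:(?P<index>\d+)|run_(?P<run_id>[0-9a-f]+))$"
)


def parse_compose_name(name: str) -> Optional[Dict[str, str]]:
    """Split a container name into project, service, and kind."""
    match = _NAME_RE.match(name)
    if match is None:
        return None
    kind = "one-off" if match.group("run_id") else "service"
    return {
        "project": match.group("project"),
        "service": match.group("service"),
        "kind": kind,
    }
```

Applied to the name from ianw's paste, this reports project `gerrit-compose`, service `shell`, and kind `one-off`, which is consistent with the theory that compose can tell the leftover `run` containers apart from the long-lived ones.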
ianwdo we use a "dummy" node in any tests?00:33
ianwthe LE test jobs roll out things to the adns primary server, but we don't have any secondary ns nodes in the test00:34
ianwwe don't actually need them, but i'd like them in the inventory00:34
opendevreviewIan Wienand proposed opendev/system-config master: [wip] dns: abstract names  https://review.opendev.org/c/opendev/system-config/+/88058000:53
Clark[m]I don't know if we have any nodes that are ignored by the test job if that is what you mean?00:55
ianwyeah, i guess we could use add_host: to add something fake, but if there's any "hosts: all" that doesn't work.  00:57
opendevreviewIan Wienand proposed opendev/system-config master: [wip] dns: abstract names  https://review.opendev.org/c/opendev/system-config/+/88058001:41
opendevreviewIan Wienand proposed opendev/system-config master: letsencrypt test : update to jammy  https://review.opendev.org/c/opendev/system-config/+/88069801:41
opendevreviewIan Wienand proposed opendev/system-config master: [wip] dns: abstract names  https://review.opendev.org/c/opendev/system-config/+/88058002:15
opendevreviewIan Wienand proposed opendev/system-config master: [wip] dns: abstract names  https://review.opendev.org/c/opendev/system-config/+/88058004:06
auniyaletherpad opendev is down04:20
auniyalit's up again, thanks04:21
opendevreviewIan Wienand proposed opendev/system-config master: letsencrypt test : update to jammy  https://review.opendev.org/c/opendev/system-config/+/88069804:28
opendevreviewIan Wienand proposed opendev/system-config master: dns: abstract names  https://review.opendev.org/c/opendev/system-config/+/88058004:28
opendevreviewIan Wienand proposed opendev/system-config master: inventory : add Ubuntu Jammy DNS refresh servers  https://review.opendev.org/c/opendev/system-config/+/88057904:28
opendevreviewIan Wienand proposed opendev/system-config master: dns : add Jammy refresh servers  https://review.opendev.org/c/opendev/system-config/+/88070606:28
*** amoralej|off is now known as amoralej06:56
opendevreviewIan Wienand proposed opendev/zone-opendev.org master: Add Jammy refresh NS records  https://review.opendev.org/c/opendev/zone-opendev.org/+/88057706:58
opendevreviewIan Wienand proposed opendev/zone-opendev.org master: Remove old NS nodes  https://review.opendev.org/c/opendev/zone-opendev.org/+/88070906:58
opendevreviewIan Wienand proposed opendev/system-config master: Remove old DNS servers  https://review.opendev.org/c/opendev/system-config/+/88071007:04
ianwclarkb / fungi: https://etherpad.opendev.org/p/2023-opendev-dns is what i think the final plan for dns server swizzle comes down to.  the mess of changes ^ should work step-by-step in there...07:12
fricklerianw: does https://review.opendev.org/c/opendev/system-config/+/880580 need new records to be created, too? or why is the job failing?07:27
*** jpena|off is now known as jpena07:32
*** cloudnull6 is now known as cloudnull13:37
fungilooks like the zuul restarts eventually finished, though a new commit landed after ze05 pulled, so ze01-ze05 are on a slightly older version (by one commit) than the rest of the servers14:20
opendevreviewClark Boylan proposed opendev/system-config master: Prune invalid replication tasks on Gerrit startup  https://review.opendev.org/c/opendev/system-config/+/88067215:22
opendevreviewClark Boylan proposed opendev/system-config master: Run the replication task cleanup daily  https://review.opendev.org/c/opendev/system-config/+/88068815:22
fricklerinfra-root: seems arm64 builds are failing, timing suggests it is related to the updated nodepool https://grafana.opendev.org/d/f3089338b3/nodepool-dib-status?orgId=115:27
clarkbexec_sudo: losetup: /opt/dib_tmp/dib_image.x1eCaKaY/image0.raw: failed to set up loop device: No such file or directory15:29
clarkbthe data volume isn't full but there are 8 losetup devices in use (which is high)15:30
clarkbmight be worth stopping the services and rebooting to clear those out and take it from there? I don't recall if there are limits to losetup device numbers smaller than say 256. losetup -f says loop9 is next at least15:31
frickleroh, so likely some other build failure first that wasn't cleaned up properly and now this. that matches that some failures seem to have started earlier in grafana than others15:32
clarkbyes or at least that is one possibility15:32
fricklerso restarting sounds reasonable, maybe also check for uncleaned build dirs if possible15:33
clarkbfrickler: should I do that or did you want to do it?15:34
fricklerclarkb: please go ahead, I'm in evening mode already15:35
clarkbok15:37
clarkbI've stopped services and have begun cleanup of /opt/dib_tmp. Will reboot when that is done15:39
frickleropeneuler has a different failure, seems there is still an issue in the mirror setup for it15:47
fricklererror: Status code: 404 for http://mirror.dfw.rax.opendev.org/openeuler/openEuler-20.03-LTS-SP2//openEuler-20.03-LTS-SP2/OS/x86_64/repodata/repomd.xml15:47
fricklernote the duplication in the middle of the path15:47
frickleroh, wait, that's 20.03. we only mirror 22.03 currently15:48
fricklerand that is from an old build log, too. maybe we should clean those up after like a couple of weeks to avoid such confusion?15:50
fricklerthe error for 22.03 is: /tmp/in_target.d/pre-install.d/00-setup-mirror: line 12: TARGET_ROOT: unbound variable15:50
fricklerwhich I think is related to the latest fix attempt in dib15:51
clarkb++ to removing old images15:52
fricklerclarkb: old images should get removed by nodepool when we remove them from the config in the right order, this is just about the build logs that linger15:54
fricklerhttps://review.opendev.org/c/openstack/diskimage-builder/+/878807/1/diskimage_builder/elements/openeuler-minimal/pre-install.d/00-setup-mirror is exactly the file above15:54
clarkbfrickler: oh I see15:54
clarkbre the TARGET_ROOT error previously that file wasn't running at all because it wasn't named in a way the runparts implementation would find and execute. I suspect this has exposed a latent bug now that it is actually running15:55
frickleryes, maybe the author will notice and debug further, seems they were have some interest in openeuler15:56
fricklerhaving15:56
*** amoralej is now known as amoralej|off16:02
fungihuawei has a vested interest in openstack upstream testing on that platform, but i think they're not always paying close enough attention to help ensure it happens. i've prodded account managers at the foundation to bring the recent openeuler testing concerns to the attention of contacts there16:05
fungihopefully they'll figure out that they need to keep an eye on this stuff16:06
*** jpena is now known as jpena|off16:17
clarkbianw: I've reviewed most of the dns update changes and I think overall the process looks good. Testing did catch a bug and I've left a couple of questions though16:32
opendevreviewMartin Kopec proposed opendev/system-config master: refstack: fix doc paths  https://review.opendev.org/c/opendev/system-config/+/88076716:33
clarkbianw: I also left a note in the etherpad about needing to update the other zones (zuul-ci.org, zuulci.org, and gating.dev) as well16:34
opendevreviewMerged opendev/system-config master: letsencrypt test : update to jammy  https://review.opendev.org/c/opendev/system-config/+/88069816:37
fungiwhen did zuul start reporting the offending change number in a parent merge failure? that's SO useful16:43
clarkbI want to say that happened end of last year or early this year16:44
opendevreviewJeremy Stanley proposed opendev/zone-opendev.org master: Dummy mailman hostname to house the list of lists  https://review.opendev.org/c/opendev/zone-opendev.org/+/86798116:46
clarkbnb04 finally finished cleaning up things in dib_tmp and I am rebooting it now16:59
clarkblosetup -f looks better17:02
clarkblet's see how the next build(s) do17:02
opendevreviewMartin Kopec proposed opendev/system-config master: refstack: fix doc paths  https://review.opendev.org/c/opendev/system-config/+/88076717:11
clarkbfungi: do ns servers need valid reverse DNS?17:39
clarkb(something we'll need to ask vexxhost to update if so)17:39
funginot strictly required afaik, just a really really good idea17:42
frickler+1, was about to write something similar17:42
clarkbok I'll add a note in the etherpad17:42
fungibut then, i tend to think accurate reverse dns records for everything is a good idea17:43
ianwone other thought is that i can't remember how the grafana page is generated, but there is a chance it's not 100% reflecting all the active volumes.  it might be worth checking to ensure something's not growing that we're not seeing there20:01
clarkbianw: the 92.2% is total though right? you're saying a volume may have growth we aren't graphing20:02
* clarkb finds lunch.20:03
ianwthe overall is right, but yeah, might miss a volume20:03
clarkbgot it20:03
clarkboh also arm64 image builds were all broken. I decided to start by cleaning up the building since it was complaining around losetup commands20:03
clarkbwill need to check on build logs to see if that persists after clearing dib_tmp and reboots20:04
clarkb*cleaning up the builder20:04
fricklerthere was one good build at least after the reboot https://nb04.opendev.org/ubuntu-bionic-arm64-0000147820.log20:48
fricklermaybe we also want to pause openeuler builds now that they are hard broken20:49
ianwi thought we merged a mirror fix for that?  i'll have to pull it up.  but thanks for getting it back on track22:44
ianwclarkb: good point on the '.' in https://review.opendev.org/c/opendev/system-config/+/880580 and will fix22:45
ianwAFAIK, the name listed in the SOA record doesn't really do anything22:45
clarkbianw: re the '.' it was the test results that made me realize those were missing since it was complaining about ns99.opendev.org.acme.opendev.org :)22:46
ianwyeah that for sure :)  22:46
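[Editor's sketch] The doubled name clarkb quotes follows from the standard zone-file rule that a name without a trailing dot is relative and has the zone origin appended. A minimal Python sketch of that rule; the `qualify` helper is hypothetical, and the origin `acme.opendev.org.` is an assumption inferred from the error message rather than confirmed by the log:

```python
def qualify(name: str, origin: str) -> str:
    """Return the absolute name a zone-file parser would derive.

    Names ending in "." are already absolute; anything else is
    relative and gets the zone origin appended.
    """
    if not origin.endswith("."):
        origin += "."
    if name == "@":
        # "@" is zone-file shorthand for the origin itself.
        return origin
    if name.endswith("."):
        return name
    return f"{name}.{origin}"
```

With the assumed origin, `qualify("ns99.opendev.org", "acme.opendev.org.")` yields the doubled name from the test failure, while adding the trailing dot to the input leaves it as `ns99.opendev.org.`.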
ianwi'm not sure what it means to have the hidden master listed in the SOA record.  it doesn't respond to anyone but the two nameservers22:47
clarkboh good point22:48
clarkbya so that should be fine to toggle back and forth22:48
clarkbunless it needs to match what the registrar says?22:48
ianwfrom what i read, and i'm willing to be corrected, it's only used by some fairly obscure dynamic-dns implementation22:49
clarkbianw: because the dynamic updates want to go to the primary authoritative server? Thinking out loud here we give the registrar our NS server values so that `dig NS opendev.org` can resolve due to the way dns db tree works. So ya the actual adns server which doesn't respond to requests shouldn't matter here22:51
clarkb(and really the registry is recording those values with the .org domain not doing anything with it itself)22:51
ianwi don't know the flow for the dynamic-dns use of the SOA record.  but yeah, for us it's ask .org. for NS records for opendev.org. (which has glue records for ns1/ns2) and then it checks the opendev.org. NS records to make sure the server is authoritative22:53
fungithe name in the soa record is used by some secondary servers to know which server to axfr the zone from, in situations where a resolver may be a secondary for multiple primaries22:55
clarkbfungi: so as long as we change that record after adns2 is up and running we're fine22:56
fungiit boils down to the behavior of the nsd and how explicit the configs are22:56
funginormally, you want the primary to notify secondaries when there are serial changes anyway22:56
fungiso it rarely comes into play22:57
ianwright -- that is our situation22:57
fungithe hostname in the soa can become useful if notifications don't happen (or are missed) and the zone ttl expires22:58
fungibut otherwise tends to be irrelevant22:58
ianwsince the secondaries have just one variable telling them their primary, we can deploy all the zones to two primaries (adns1 and adns02) with ansible, and none of the 4 secondaries (ns1,ns2,ns03,ns04) will get confused22:58
fungiagreed22:59
fungiit also used to be a little more relevant since you might have the same zones served from multiple primaries, you could use that field to track which primary the original record came from, but nobody really does that these days23:00
fungito a great extent it's residual baggage from an era of belt-and-suspenders design23:01
opendevreviewClark Boylan proposed opendev/system-config master: Run the replication task cleanup daily  https://review.opendev.org/c/opendev/system-config/+/88068823:06
clarkbianw: ^ good catch23:06
opendevreviewIan Wienand proposed opendev/system-config master: dns: abstract names  https://review.opendev.org/c/opendev/system-config/+/88058023:27
opendevreviewIan Wienand proposed opendev/system-config master: inventory : add Ubuntu Jammy DNS refresh servers  https://review.opendev.org/c/opendev/system-config/+/88057923:27
opendevreviewIan Wienand proposed opendev/system-config master: dns : add Jammy refresh servers  https://review.opendev.org/c/opendev/system-config/+/88070623:27
opendevreviewIan Wienand proposed opendev/system-config master: Remove old DNS servers  https://review.opendev.org/c/opendev/system-config/+/88071023:27
ianwi guess the gitea api gets a lot less exposure because the UI isn't one of these SPAs that drives everything via calls23:40
ianwwhen I first looked, i pulled up the network monitor and reloaded /explore/organizations and was like -- wait, where's the calls?23:41
ianwimagine my shock to "view source" and see ... actual source23:41
ianwnot just "<script jsblob.js>"23:41
fungioh those were the days23:43
clarkbya I want to say it was org renaming or something that we hacked around before by manipulating forms because there was no api for it23:46
clarkbbut then that got fixed23:46
clarkbianw: why does key[1] change to key[2] ?23:47
clarkbwe aren't changing any of the datastructures as far as I can tell?23:47
clarkbin https://review.opendev.org/c/opendev/system-config/+/880580 specifically23:47
ianwarrrgggghhhhhh!23:52
ianwit's freaking emacs dns mode.  it thinks it's the serial number i think23:52
opendevreviewIan Wienand proposed opendev/system-config master: dns: abstract names  https://review.opendev.org/c/opendev/system-config/+/88058023:54
opendevreviewIan Wienand proposed opendev/system-config master: inventory : add Ubuntu Jammy DNS refresh servers  https://review.opendev.org/c/opendev/system-config/+/88057923:54
opendevreviewIan Wienand proposed opendev/system-config master: dns : add Jammy refresh servers  https://review.opendev.org/c/opendev/system-config/+/88070623:54
opendevreviewIan Wienand proposed opendev/system-config master: Remove old DNS servers  https://review.opendev.org/c/opendev/system-config/+/88071023:54
ianwnow saved with vi :)23:55
