Thursday, 2022-02-24

DerecksonHello. Do we have a specific channel for git-review?00:18
fungiDereckson: yep, this one ;)00:19
Derecksonah thanks :)00:19
DerecksonI maintain the FreeBSD port for git-review, and I've a request on our bug tracker to include the commit fixing the rebase for Git 2.3400:19
DerecksonI've also noticed Fedora released a "2.2.0" rpm in advance of an actual release.00:20
fungiDereckson: this one? https://pypi.org/project/git-review/2.2.0/00:21
fungihttps://opendev.org/opendev/git-review/src/tag/2.2.000:22
fungiDereckson: https://lists.opendev.org/pipermail/service-announce/2021-November/000028.html00:23
fungithat was the release announcement last november00:23
fungibut if you specifically want to backport the fix you mentioned, it's commit 7182166ec00ad3645821435d72c5424b4629165f00:24
fungireviewed in https://review.opendev.org/c/opendev/git-review/+/81821900:24
DerecksonI'd like to avoid the backport and release a new version, but I was still tracking https://docs.opendev.org/opendev/git-review/latest/00:24
Derecksonas we've already 2.2.0, problem solved, thanks00:25
fungiahh, yeah, we need to figure out a way to force a docs refresh on tag push, it would say 2.2.0 there the next time we merge a commit, but nothing's merged since i pushed that tag00:25
fungisorry for the confusion, i'll spend some more time thinking about how we can make that smoother00:27
fungiin a high-activity project it tends to go unnoticed, but git-review doesn't update all that often00:28
fungii can't remember why we had decided previously that re-running our docs jobs on tag events was problematic00:29
fungiclarkb: ianw: do either of you recall?00:29
opendevreviewMerged opendev/glean master: distro: sync to 3.6  https://review.opendev.org/c/opendev/glean/+/83053600:30
ianwumm, i do not.  i feel like it would be ok, as it would just push new contents to AFS?00:30
fungiright, like simply adding the docs job we already run in post to the tag or release pipelines00:31
DerecksonStill supported under Python 3.6 by the way?00:31
Dereckson(I only noted 2.0 is Python 3+)00:32
fungiDereckson: python-requires = >= 3.500:32
fungiso yep00:32
Derecksonok00:32
ianwfungi: yeah, perhaps we were worried about overwriting or something?  if you pushed a tag not at HEAD?00:33
fungiDereckson: the patch mentioned above was automatically tested with 3.5, 3.6, 3.7, 3.8 and 3.900:33
fungi(or some point releases thereof for each of those, anyway)00:34
fungiianw: i suppose if you tagged an earlier commit than the branch tip that could roll back the content, right00:34
funginow that zuul has the ability to guess a most likely branch for any tag, we could maybe reset to the branch tip if the ref is a tag, but that will require some thought00:35
fungiin a single-branch project like git-review that's obviously safe, but for projects maintaining multiple branches it may need some additional care to make sure the correct branch is chosen00:36
clarkbI don't recall why that changed. It used to be we published the main docs and the version specific docs when we pushed tags. But now our tag documentation jobs generally only push the tag specific version iirc00:45
clarkbzuul suffers from this too00:45
clarkboh I think I remember. It is because the tag version docs can overwrite the latest master builds as they can race each other00:46
clarkbI agree that is something git-review doesn't really need to worry about00:46
fungioh, right, i forgot about the version-specific doc builds00:47
fungiso, yes, we'd need to run two docs jobs on any tag event: one to publish the version-specific docs, and one which resets to the guessed branch for that tag and refreshes the branch-specific docs00:47
DerecksonThanks for the support fungi, patch submitted to the FreeBSD ports systems so we can have 2.2.0 there. Tested locally, works like charm.01:08
fungiDereckson: great! thanks for confirming it's working there01:57
*** rlandy|ruck|bbl is now known as rlandy|out02:06
*** lajoskatona_ is now known as lajoskatona02:25
opendevreviewwangxiyuan proposed zuul/zuul-jobs master: Add openEuler to iptalbe firewall persist  https://review.opendev.org/c/zuul/zuul-jobs/+/83070602:32
*** pojadhav|out is now known as pojadhav|ruck02:52
*** frenzy_friday is now known as frenzyfriday|rover04:26
*** ysandeep|out is now known as ysandeep04:53
*** ysandeep is now known as ysandeep|away05:55
*** ysandeep|away is now known as ysandeep07:04
*** amoralej|off is now known as amoralej07:09
*** jpena|off is now known as jpena08:34
opendevreviewIan Wienand proposed opendev/system-config master: encrypt-logs: turn on for all prod playbooks  https://review.opendev.org/c/opendev/system-config/+/83078408:40
opendevreviewIan Wienand proposed opendev/system-config master: docs: reorganise around a open infrastructure overview  https://review.opendev.org/c/opendev/system-config/+/83078508:40
fricklerthis may affect our mailman setup going forward https://bugs.launchpad.net/ubuntu/+source/mailman3/+bug/196054709:06
*** ysandeep is now known as ysandeep|lunch09:07
opendevreviewMerged openstack/diskimage-builder master: rhel: work around RHEL-9 BLS issues  https://review.opendev.org/c/openstack/diskimage-builder/+/82962010:04
opendevreviewMerged openstack/diskimage-builder master: Detect boot and EFI partitions in extract-image  https://review.opendev.org/c/openstack/diskimage-builder/+/82861710:04
*** ysandeep|lunch is now known as ysandeep10:19
opendevreviewWill Szumski proposed openstack/diskimage-builder master: Always Use linuxefi with DIB_BLOCK_DEVICE=efi  https://review.opendev.org/c/openstack/diskimage-builder/+/83080110:20
opendevreviewMerged opendev/glean master: Add Rocky Linux support  https://review.opendev.org/c/opendev/glean/+/83053910:55
*** rlandy|out is now known as rlandy|ruck11:18
*** dviroel_ is now known as dviroel11:28
*** bhagyashris_ is now known as bhagyashris11:57
*** amoralej is now known as amoralej|lunch12:19
*** sshnaidm is now known as sshnaidm|off12:23
*** ysandeep is now known as ysandeep|afk12:33
*** ysandeep|afk is now known as ysandeep12:54
*** amoralej|lunch is now known as amoralej13:19
opendevreviewMerged opendev/glean master: Remove rebuild-test-output.sh  https://review.opendev.org/c/opendev/glean/+/83054013:39
opendevreviewRiccardo Pittau proposed openstack/diskimage-builder master: Revert "Detect boot and EFI partitions in extract-image"  https://review.opendev.org/c/openstack/diskimage-builder/+/83071713:42
*** dviroel is now known as dviroel|brb14:01
fungifrickler: thanks for the heads up! while i did the poc with the debian/ubuntu packages, the current plan is to use https://docs.mailman3.org/en/latest/install/docker.html14:08
fungiand yeah, i actually bookmarked the rfh bug back when it was opened in november, in case i find time to help maintain the debian packages (which i might if i decide to run them on some of my servers for my own projects)14:18
fungilooks like a couple of wmf folks volunteered to pitch in though14:20
fungiand they're actually dds, which i'm not, so probably have more chance of being useful as their uploads won't need sponsoring14:21
*** dviroel|brb is now known as dviroel14:23
*** pojadhav|ruck is now known as pojadhav|brb14:28
*** pojadhav|brb is now known as pojadhav|afk14:32
mgagne_fungi: my (new) employer would like to reach out to you. Would your first name at openinfra.dev be the email to use?15:46
fungimgagne_: sure, that works. thanks!15:51
mgagne_good15:51
*** dviroel is now known as dviroel|lunvh16:03
*** dviroel|lunvh is now known as dviroel|lunch16:03
*** ysandeep is now known as ysandeep|out16:17
*** pojadhav|afk is now known as pojadhav|out16:25
clarkbhttps://github.com/go-gitea/gitea/pull/18799 is the fix for our gitea 1.16 diff issue. Looks like they noticed before I filed the bug, but I couldn't find an issue for it16:29
*** ykarel is now known as ykarel|away16:31
fungihah, convenient!16:34
clarkbLooks like there may be a good number of other fixes in 1.16.2. I'm thinking we can wait for that version whether or not it includes the diff fix and maybe update to that instead16:35
fungisure16:37
fungiwfm16:37
opendevreviewMerged openstack/diskimage-builder master: Revert "Detect boot and EFI partitions in extract-image"  https://review.opendev.org/c/openstack/diskimage-builder/+/83071716:45
*** marios is now known as marios|out16:51
*** dviroel|lunch is now known as dviroel16:58
opendevreviewClark Boylan proposed opendev/system-config master: Update Etherpad to 1.8.17  https://review.opendev.org/c/opendev/system-config/+/83087417:11
opendevreviewClark Boylan proposed opendev/system-config master: DNM forcing failure to hold a node and check new etherpad  https://review.opendev.org/c/opendev/system-config/+/83087517:11
clarkbI'll put a hold on the etherpad job for that second change and we can check the etherpad there and land the parent if it looks good17:12
fungiooh, thanks!17:13
clarkbfungi: 0000000024 is an autohold for the gitea links checking. I had mentioned to frickler we can use that one to double check the mergeability checking update on a running gerrit. I'll see if I can get to that today, then we are good to delete the autohold?17:14
fungiclarkb: yeah, i have no more need of it17:29
*** amoralej is now known as amoralej|off17:42
*** jpena is now known as jpena|off17:53
clarkbinfra-root https://104.130.132.239/ but you need to hit it with that ip set as etherpad.opendev.org in /etc/hosts due to redirects17:57
clarkbI'm in the isitbroken pad17:57
*** rlandy|ruck is now known as rlandy|ruck|mtg17:59
clarkbit seems to be working happily. If you agree we should be able to proceed with https://review.opendev.org/c/opendev/system-config/+/83087417:59
corvuschecking18:00
corvusit does not seem broken18:01
corvusi have closed etherpad18:07
clarkbme too. Let me know if I should rejoin to perform additional testing but I think corvus and I are happy with it18:08
clarkbwe do note that unset names appear to be able to be set by anyone. Then once set only the controlling session can change the name18:09
corvusi think that's an exist behavior, just not one we exercise often18:11
corvus           +ing18:11
clarkboh one other thing. I checked to see if 1.8.17 has upstream docker images published yet and it does not. Seems like it was a good move to stop relying on them for these18:20
clarkbI was hopeful that was a one off issue with 1.8.16 but seems not18:21
fungiinteresting18:21
opendevreviewClark Boylan proposed opendev/system-config master: DNM test if we can build gitea release/v1.16  https://review.opendev.org/c/opendev/system-config/+/83088518:38
*** rlandy|ruck|mtg is now known as rlandy|ruck19:19
mnasiadkahello19:20
mnasiadkaregarding Rocky Linux - does it work now after merging the glean patches?19:20
clarkbmnasiadka: no we need a glean release and that needs to be coordinated with a dib release. One thing complicating this is it changes how glean is executed by udev a bit and ideally we'll be able to monitor that as it goes in19:26
clarkbthat said rocky is bootable in at least one of the clouds now via a fallback to dhcp so it kind of works19:26
mnasiadkaok, so I can run a job now using rocky nodeset?19:26
mnasiadkaor do I have to wait for the coordinated release?19:27
clarkbI think you can try running a job now and see what happens19:29
clarkbI'm not going to commit to it working reliably at this point, but it should hopefully be sufficient to start seeing if rocky works for your stuff and preparing for when it is reliable19:29
clarkbif ianw would like to do those releases early in his day today I can help monitor today and tomorrow.19:30
clarkbI think worst case we end up tagging an old commit of glean as a new version to revert19:30
clarkbbut there is a lot to observe as we need to ensure all the images continue to boot in all the clouds :/19:30
opendevreviewJeremy Stanley proposed opendev/system-config master: Clean up two retired mailing lists  https://review.opendev.org/c/opendev/system-config/+/83089319:34
clarkbfungi: isn't it great when ansible is helpful :)19:35
fungiso great19:35
clarkbLooking at glean I think we'd do a 1.20.0 for the executable restructuring, gentoo fixes, rocky addition and distro sync to 1.619:54
fungiyeah, it needs a minor rev, not patch19:54
fungiagreed19:54
fungithe rocky addition and vendored distro update both warrant it19:55
clarkbcorvus: is https://zuul.opendev.org/t/openstack/build/cde0c6eedcc54d67acac0e2ee31245ca a zuul-registry bug?20:14
clarkbfungi: also I think if you want you could update the gerrit testing jobs to not rebuild the images on your config update20:14
clarkbfungi: that might be more reliable now while we sort out that problem which seems fairly consistent20:15
clarkband now I need to eat lunch20:15
corvusclarkb: either that, or a client bug, or a client/registry interaction bug20:21
fungioh, huh20:21
corvusor maybe corrupted data?  i dunno.  it usually takes a while to triage those20:21
corvus`Manifest has invalid size for layer sha256:41de18d1833d2d5e6cb6111780577f3eab0afd9b1bf6c0c1756c5227abfa645c (size:44359680 actual:203253446)` is the meat of it20:22
fungiyeah, that definitely seems wrong20:23
fungilooks like it could be an interrupted upload20:24
fungithough size<actual which would imply the reverse?20:24
fungii'm misunderstanding why we'd see that on upload of a newly rebuilt image each time though20:27
fungiand yeah, i misread the error originally and thought it was dockerhub rejecting the upload, didn't spot that it was the buildset registry20:28
fungi829975 is really not at all urgent though, i'm perfectly happy to sort this problem out first20:29
fungithough before i do anything else, i should probably sort out dinner20:38
clarkbhrm we've had these size mismatches before I think ianw debugged them20:39
clarkbthe gist was we calculated things off by one in the registry iirc and some clients became more strict?20:40
fungisomething something skopeo something? ;)20:40
clarkbmaybe we've got another client being more strict and we're catching another existing bug somewhere20:40
clarkbhttps://zuul.opendev.org/t/openstack/build/ad14ff4e385640beba9d43b9de108973 so it did succeed in check20:45
clarkbbut then seems to consistently fail in gate?20:45
clarkboh gate and check run two slightly different versions of the image build jobs. One uploads only to the insecure ci registry and the other to both docker and the insecure ci registry20:46
clarkbso in this case cherrypy is reporting back that the manifest reported a size of 44359680 but it got 203253446 bytes. The tool doing the push is docker itself not skopeo20:47
clarkbthis upload is to the buildset registry not the insecure ci registry.20:50
fungithe server itself doesn't seem to be in any particular distress, at least20:51
fungiif anything it looks extremely underutilized20:52
clarkbwell this is the buildset registry which is per buildset20:52
clarkbhttps://zuul.opendev.org/t/openstack/build/d80aa9c8e84241cdbbfffc7d7c4f43fc/log/docker/buildset_registry.txt is the log for the one that failed20:52
clarkbhttps://zuul.opendev.org/t/openstack/build/d80aa9c8e84241cdbbfffc7d7c4f43fc/log/docker/buildset_registry.txt#767 here is the server side reporting the 400 error20:53
*** dviroel is now known as dviroel|afk20:54
opendevreviewSteve Baker proposed openstack/diskimage-builder master: Revert "Revert "Detect boot and EFI partitions in extract-image""  https://review.opendev.org/c/openstack/diskimage-builder/+/83090020:54
clarkbhttps://zuul.opendev.org/t/openstack/build/d80aa9c8e84241cdbbfffc7d7c4f43fc/log/docker/buildset_registry.txt#711 that shows the docker client HEADing the blob and getting back the actual size rather than the size in the manifest20:55
clarkbthe docker version is from december20:56
clarkboh now this is really curious20:56
clarkbhttps://zuul.opendev.org/t/openstack/build/d80aa9c8e84241cdbbfffc7d7c4f43fc/log/docker/buildset_registry.txt#667 vs https://zuul.opendev.org/t/openstack/build/d80aa9c8e84241cdbbfffc7d7c4f43fc/log/docker/buildset_registry.txt#71020:57
clarkbhttps://zuul.opendev.org/t/openstack/build/d80aa9c8e84241cdbbfffc7d7c4f43fc/log/docker/buildset_registry.txt#706 that is when it reports the upload is complete20:57
clarkbI think it must be checking the size before the upload is complete and using that smaller value. Then when it tries to use that smaller value the up to date and correct value is known and there is a conflict20:58
clarkbI don't know enough about the docker image upload protocols to know if we are supposed to kick back a 404 or what if the upload isn't completed20:58
clarkbcorvus: ^ do you know off the top of your head?20:59
fungioh, yeah, i misread you, i was looking at insecure-ci-registry, sorry20:59
ianwo/21:00
ianwistr that there was a podman 4 release very recently.  have we pulled in something different?21:01
clarkbianw: this should all be docker not podman21:03
clarkbhttps://zuul.opendev.org/t/openstack/build/d80aa9c8e84241cdbbfffc7d7c4f43fc/log/docker/buildset_registry.txt#552-555 at this point which is before 667 with the shorter size it seems to recognize 203253446 is the size of something21:03
ianwclarkb: i can do the glean release ... but given that it will have global effect i'm thinking that doing it on my monday might be a better idea.  rolling it back will be an exercise in rebuilding things and having a longer runway when things are quiet would be helpful21:04
clarkbianw: wfm21:04
clarkbianw: I think if things get really bad with a glean release the easiest thing may be to pull the release from pypi21:04
clarkbthen the next round of image builds will use the prior good release which we know to work21:05
clarkbbut ++ to doing it when everything can be monitored and action taken to address problems if necessary21:05
clarkbfor the registry thing I think line 667 of the registry log is the thing we need to understand better. It is returning the short value that things complain about and it is doing so after the registry seems to know about the larger value logged on 552-55521:06
ianwhrm maybe not podman, but IIRC the use of buildkit was involved21:07
clarkbzuul registry's filesystem implementation does return os.stat(path).st_size to get the size that head returns21:08
clarkbI think the issue here may be that we've told the registry it is this size, but before all that data is written out to disk we've queried the disk for how much it has and fed that back to docker21:08
clarkbthinking out loud here. Maybe we need to write to a tmp name then do an atomic mv?21:09
clarkbthen the os.path.exists will fail for the early HEAD and we report the proper result when we know it? Though that might also break docker because it may need that early HEAD to succeed21:10
ianwhttps://opendev.org/zuul/zuul-registry/commit/7100c360b31dfd1b9f4413eb2df3db3f229a5e26 seems similar21:12
clarkbI've just confirmed the actual_size value is also from disk. So by the time we go to validate the manifest the size is correct on disk but the PUT for the manifest must've supplied the short value from the earlier head21:16
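For readers following the size mismatch, here is a minimal sketch of the kind of check being described: compare each layer's declared `size` in an image manifest with the size of the blob on disk. This is an illustration only, not zuul-registry's actual code; `check_layer_sizes` and the `blob_path_for` callback are hypothetical names, and the manifest is assumed to use the Docker v2 schema 2 layout with a `layers` list of `digest`/`size` entries.

```python
import os


def check_layer_sizes(manifest, blob_path_for):
    """Return (digest, declared_size, actual_size) for every mismatched layer.

    manifest: dict with a "layers" list of {"digest": ..., "size": ...}.
    blob_path_for: hypothetical callback mapping a digest to a file path.
    """
    mismatches = []
    for layer in manifest.get("layers", []):
        # The on-disk size is what a later validation step would see.
        actual = os.stat(blob_path_for(layer["digest"])).st_size
        if layer["size"] != actual:
            mismatches.append((layer["digest"], layer["size"], actual))
    return mismatches
```

If the client built its manifest from a size reported while the blob was still being written, the declared value would be smaller than the final on-disk value, which matches the `size:44359680 actual:203253446` shape of the error quoted earlier.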
fungiso are we racing a fd flush?21:18
clarkb"When this response is received, the client can assume that the layer is already available in the registry under the given name and should take no further action to upload the layer." this is what it says about the HEAD on a layer (which is expected to return the size)21:19
clarkbI think there is a race here. We shouldn't be responding positively to the HEAD request when the value is short.21:19
clarkbfungi: no I think the upload hasn't completed yet based on the logs21:19
clarkbfungi: even if we flushed everything we'd still be short because the client is sending us more data21:20
clarkbhttps://zuul.opendev.org/t/openstack/build/d80aa9c8e84241cdbbfffc7d7c4f43fc/log/docker/buildset_registry.txt#706 is where the upload for the layer completes but https://zuul.opendev.org/t/openstack/build/d80aa9c8e84241cdbbfffc7d7c4f43fc/log/docker/buildset_registry.txt#667 is responding that the layer exists prior to that with a smaller size21:20
fungioh, the short read is happening server-side?21:21
clarkbyes, it seems like midway through the upload the client is asking the server in a separate thread to tell it how big the not yet completely uploaded layer is21:23
clarkbthe server is checking what it has on disk which is short because it hasn't finished receiving the data and responds with that value. Then the client constructs a manifest that it sends to the server with that short value21:23
clarkb(it's a bit insane to me that we need to track sizes like this when the client is the originator of the info and everything is hashed, just check if hashes are valid)21:24
clarkband zuul-registry doesn't seem to be stateful enough to know whether or not a file on disk represents a complete upload?21:25
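The "just check if hashes are valid" idea can be sketched as follows. This is a hedged illustration rather than anything from zuul-registry: `blob_matches_digest` is a hypothetical helper, and it assumes digests of the form `sha256:<hex>` as seen in the log line quoted above.

```python
import hashlib


def blob_matches_digest(path, digest, chunk_size=1024 * 1024):
    """Return True if the file at path hashes to digest ("sha256:<hex>")."""
    algorithm, _, expected_hex = digest.partition(":")
    if algorithm != "sha256":
        raise ValueError("only sha256 digests are handled in this sketch")
    hasher = hashlib.sha256()
    with open(path, "rb") as blob:
        # Read in chunks so large layers are not loaded into memory at once.
        for chunk in iter(lambda: blob.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == expected_hex
```

A partially written blob would fail this check regardless of what size was reported, which is why hashing alone could in principle detect an incomplete upload.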
fungi#zuul21:30
fungihah, you're not my buffer command21:30
clarkbI suspect that this may be a bug in the client HEADing a layer before it has finished uploading the layer which exposes a bug in zuul-registry which returns the layer info before it is complete21:32
clarkbhrm there may be some state here though as we seem to record chunks and those go to a subdir?21:34
clarkband then later they end up in the path that the HEAD looks at21:34
clarkboh actually https://zuul.opendev.org/t/openstack/build/d80aa9c8e84241cdbbfffc7d7c4f43fc/log/docker/buildset_registry.txt#554 that may be where we finish the upload which is before the short read so maybe the HEAD isn't too soon and we just aren't synced?21:36
clarkbI think I see it21:39
clarkbor at least something that can cause it. Fix on the way21:39
opendevreviewMerged opendev/system-config master: Clean up two retired mailing lists  https://review.opendev.org/c/opendev/system-config/+/83089321:44
clarkbremote:   https://review.opendev.org/c/zuul/zuul-registry/+/830905 Atomically concatenate blob objects21:52
clarkbIt is possible that the fsync flushing is also to blame I suppose21:53
clarkband we may not actually solve this with the move, but I think the move is more correct either way21:53
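The "write to a tmp name then do an atomic mv" approach discussed above can be sketched like this. It is a generic illustration of the pattern, not the content of change 830905; `write_blob_atomically` is a hypothetical name, and the sketch assumes the temporary file lives in the same directory as the final path so that the rename stays on one filesystem.

```python
import os
import tempfile


def write_blob_atomically(final_path, chunks):
    """Concatenate chunks into final_path so readers never see a partial file."""
    directory = os.path.dirname(final_path) or "."
    # Create the temp file next to the destination so os.replace is a rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                tmp.write(chunk)
            tmp.flush()
            os.fsync(tmp.fileno())  # push data to disk before publishing
        os.replace(tmp_path, final_path)  # atomic on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return os.stat(final_path).st_size
```

With this pattern an existence check or `os.stat` on the final path either fails or sees the complete file, so a HEAD issued mid-write cannot report a short size.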
clarkbalso I think this may be somewhat cloud specific22:01
clarkbdue to iops availability?22:01
clarkbrechecking until it goes through is probably viable. but also we should reconsider building gerrit images for apache config changes22:01
clarkbthe two shouldn't be related22:01
opendevreviewSteve Baker proposed openstack/diskimage-builder master: Revert "Revert "Detect boot and EFI partitions in extract-image""  https://review.opendev.org/c/openstack/diskimage-builder/+/83090022:17
ianwi wonder how hard it would be to setup a job to build and push into the registry via mitmproxy and keep a dump of the traffic.  that was how i debugged the original issues22:17
clarkbhttps://review.opendev.org/c/opendev/system-config/+/830874 could use another reviewer. I'm happy to approve that tomorrow morning and watch it afterwards22:28
clarkbthe child change has a held etherpad you can test with too. corvus and I did testing earlier with the isitbroken pad on the test node and it seemed fine (note you have to update your /etc/hosts to make the redirects work)22:28
opendevreviewClark Boylan proposed opendev/system-config master: Remove Gerrit's JVM GC logs  https://review.opendev.org/c/opendev/system-config/+/83091222:44
clarkbOk I tested the mergeability update and it seemed fine. I think we can proceed with that when we have a good time to restart gerrit22:44
clarkbsince I was on the test node gerrit I wanted to see if I could fix the jvm_gc log rotation errors we always get on gerrit startup and 830912 is the best I could come up with. Basically stop telling it to write those files entirely. Note that we have to stop gerrit and move the old files aside before starting it again or we get the same errors on startup22:45
clarkbThe extent of my mergeability testing was to check the error_log for any indication a full reindex occurred and I didn't see any22:45
clarkbI'll leave that server up until tomorrow in case frickler wants to take a look but then I think we can delete the old22:46
clarkb*delete the zuul autohold22:47
fungiokay, back from dinner/errands22:50
fungii guess a depends-on to 830905 from a failing system-config change isn't going to exercise the updated registry code, right?22:53
fungilooks like base deploys may be failing again, digging into it now22:56
fungithe failed build i'm looking at claimed codesearch was unreachable at 2022-02-24T21:49:42Z22:58
fungii can ssh into it and it's been "up 29 days" so maybe just a network blip22:59
fungioh! i should have read further22:59
fungiit was "unreachable" because "mkdir: cannot create directory ...: No space left on device"22:59
fungirootfs is indeed full22:59
fungii'll see what's gone nuts23:00
ianwfungi: no, it will really have to be rolled out to the actual registry i think 23:01
clarkbianw: fungi: the buildset registry uses the :latest tag so i think once my change lands and :latest is updated it will be used afterwards23:01
fungigot it23:01
clarkbfungi: ianw: re codesearch seems it is largely /var/lib/hound/data23:02
clarkbwhich is where the indexed data lives23:02
ianwi wonder if we're not removing things we should be; seems there's like daily files, but stuff from back in november23:03
clarkbI do half wonder if some of the data in there is stale though. There are 2831 entries and I don't think we're indexing that many repos23:03
ianwnot sure if it's additive23:03
clarkbianw: ya exactly23:03
fungiyeah, that's 87% of the fs23:04
fungithere's a ton of old files in there23:05
fungieven over a year old23:05
fungii thought it regenerated all its data  at restart23:05
ianwwe reload it if project-config changes23:06
clarkbya I wonder if every time we update it or restart it it leaks some data23:06
fungiyeah, there's content dated back to 2020-11-2023:06
clarkbwe might be able to stop it. Delete things, then start it again? This might also be slow?23:06
fungiit might be slow, yes23:07
clarkband maybe we should consider not bindmounting that data so that it auto flushes every time we restart23:07
clarkbassuming that is the issue23:07
ianw /var/log/resync-hound.log, which is supposed to log resyncs, is empty, i'm guessing a pipe error23:07
ianw /usr/local/bin/resync-hound >> /var/log/resync-hound.log 2>&123:07
fungithe data tree contains 817421 regular files and 257558 directories23:07
ianwseems right ... wonder why it's blank23:08
fungiianw: maybe the disk filled up and then the logfile was rotated?23:08
fungino space, no way to write to the new log23:08
clarkbalso is the redirect happening in the container context? I don't think so based on the quoting23:08
clarkbbut /var/log in the container is separate from /var/log on the host so maybe?23:09
ianwthe theory there at least is to take whatever comes out on stdout of the run and save it locally, i don't think that passes through23:09
fungishould i hold off stopping the container? (or trying to stop it at least? that may also fail if it needs to write to the rootfs)23:10
fungiwe'll also want to reboot, seeing as how it's the rootfs which filled up completely, there's no telling what else may have broken in the process23:11
clarkbI expect stopping it will fail due to the full disk. What I've done in the past is had journald prune some data then you get enough headroom to stop things and make changes23:12
fungibut yes, my vote is to stop the container if we can, blow away the data tree, reboot the server, then wait for everything to reindex and see what the utilization looks like after that23:12
clarkb`journalctl --vacuum-size=500M` or similar then docker-compose down, then debug further?23:12
clarkbfungi: ya that sounds good23:12
fungithanks for the journalctl tip, i'll do that first just for safety23:13
ianwi think yes, start again and we can monitor what's going on when we issue resyncs23:13
fungidowning the container now23:13
fungiit claims to have downed the container successfully23:13
fungiokay, so are we agreed on just recursively deleting everything under /var/lib/hound/data?23:14
clarkbmy understanding has been that all of that data is ephemeral and can be rebuilt from the source repos23:16
clarkbso ya I'm good with that23:16
ianw++23:16
fungithanks, removing now23:17
ianwso when we update, we run update-hound-config, which diffs project-config and, if it is different, runs supervisorctl restart houndd23:21
ianwi note that in the docs, they don't seem to say anything with the container about mapping in a volume for index data23:22
ianwi'm not sure if supervisorctl stop && rm -rf /index/data/* && supervisorctl start would work ... or if docker would think the container was dead?23:23
clarkbianw: I think we could just stop start the container instead?23:25
clarkber and use ephemeral /var/lib/hound/data23:25
ianwthe problem is that the update checking happens inside the container23:26
clarkbianw: can we have an external process trigger the check within the container and then take action externally if necessary?23:26
fungiokay, deletion complete, rootfs utilization is at a meagre 13%23:29
opendevreviewIan Wienand proposed opendev/system-config master: codesearch: remove index when resyncing  https://review.opendev.org/c/opendev/system-config/+/83091623:30
fungiam i clear to reboot the server now so we can make sure fundamental processes are sane, in case anything else broke when it could no longer write?23:30
ianwclarkb: ^ maybe with more fiddling; that may be another approach23:31
clarkbianw: ya that seems like it could work23:31
clarkbfungi: ya I think so23:31
ianwalthough, i guess we should not map in the index directory either23:32
fungiokay, rebooting the server now23:32
clarkbianw: oh ya because we also stop and start the service externally23:33
fungiserver has rebooted, should we start the container again or are we wanting to get some changes deployed to it first?23:37
ianwi think restart, i'm just thinking about it23:37
clarkbya I think we can start it and see what we learn23:37
*** dviroel|afk is now known as dviroel23:37
fungiokay, starting it up now23:37
*** rlandy|ruck is now known as rlandy|out23:38
fungilooks like it's running again23:38
fungireindexing will presumably take ~hours23:38
clarkbif you tail the log it gives you some indication of what it is doing iirc23:38
clarkb/var/log/containers/docker-hound.log23:39
fungiyep23:39
ianwthat's why i wonder if a complete reindex on project-config updates is the best way23:44
ianwi wonder if you can just delete idx-*23:47
clarkbsays its done23:50
clarkbso that was about 13 minutes?23:50
clarkbI get results searching in it too23:50
clarkband it's only using ~6GB now?23:50
clarkbso ya we definitely leak23:50
ianwit's probably worth putting a todo for next week and coming back to look at it23:51
ianwand checking the old directories on disk, and see if they've been re-cloned23:52
clarkb++23:52
ianwi've soured on the idea of just removing the whole index on restart; but maybe that's the only solution23:53
clarkbor maybe we can identify the old data somehow and delete it after codesearch is restarted23:54
ianwyeah, maybe an mtime type find & delete23:54
ianwif we see that the project has been re-cloned and the old one hasn't been accessed23:55
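The "mtime type find & delete" idea might look roughly like the sketch below. It only lists candidates and deletes nothing; `find_stale_entries` is a hypothetical helper, and the assumption that stale index data appears as top-level entries under a directory such as /var/lib/hound/data with old modification times is untested here.

```python
import os
import time


def find_stale_entries(data_dir, max_age_days, now=None):
    """List top-level entries in data_dir not modified within max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    with os.scandir(data_dir) as entries:
        for entry in entries:
            # Compare the entry's own mtime, without following symlinks.
            if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                stale.append(entry.path)
    return sorted(stale)
```

Reviewing the returned list by hand before removing anything would be a cautious first step, since it is not yet confirmed above which entries hound still uses after a re-clone.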
ianwi don't see any references to this in the hound github, open or closed issues23:59
