Tuesday, 2022-10-18

[03:00] ianw: stephenfin: trying to get to the bottom of a stestr/stevedore/importlib-metadata/python3.7 horror combo issue -> https://github.com/mtreinish/stestr/issues/336
[03:02] ianw: it looks like https://opendev.org/openstack/stevedore/commit/143a3e9f0716690be7343d4d083f65d7624b3d2e in stevedore 3.5.1 should be the fix; but stestr still isn't finding its commands
[03:14] ianw: real_groups with old importlib-metadata looks like -> https://paste.opendev.org/show/bZ7yPaO1pVF8aKT2uGof/
[03:14] ianw: real_groups with stevedore 3.5.1 looks like -> https://paste.opendev.org/show/bpqS0OPic22XF284x5Dg/
[03:15] ianw: ... i think they should look the same.  it suggests to me the expansion maybe isn't quite right?
[03:57] ianw: https://review.opendev.org/c/openstack/stevedore/+/861695
[03:58] ianw: that seems to fix stestr on buster/python 3.7 ... but does it introduce some other issue?  i don't know
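The comparison of `real_groups` under old and new importlib-metadata suggests the two releases hand back entry points in different shapes. A minimal sketch, assuming that difference is that older importlib-metadata returns a dict keyed by group name while newer releases return a flat collection of `EntryPoint` objects that each carry a `.group` attribute. `group_entry_points` and the namedtuple stand-in are hypothetical, not stevedore's actual fix; the helper just normalizes either shape so the two dumps can be compared like for like.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for importlib_metadata.EntryPoint, which exposes
# the same .name, .value and .group attributes.
EntryPoint = namedtuple("EntryPoint", "name value group")


def group_entry_points(eps):
    """Normalize entry points to {group: [EntryPoint, ...]}.

    Assumption: old importlib-metadata returns a dict keyed by group name,
    newer releases return a flat iterable of EntryPoint objects.
    Dict subclasses are treated as already grouped.
    """
    if isinstance(eps, dict):
        # Old shape: already grouped, just copy the values into plain lists.
        return {group: list(items) for group, items in eps.items()}
    grouped = defaultdict(list)
    for ep in eps:
        # New shape: each entry point records which group it belongs to.
        grouped[ep.group].append(ep)
    return dict(grouped)
```

With both shapes normalized, "they should look the same" becomes a plain equality check, e.g. `group_entry_points(old_result) == group_entry_points(new_result)`.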
[04:30] opendevreview: Ian Wienand proposed zuul/zuul-jobs master: Pin sphinx to 5.2.3  https://review.opendev.org/c/zuul/zuul-jobs/+/861587
[04:30] opendevreview: Ian Wienand proposed zuul/zuul-jobs master: linter: Use capitals for names  https://review.opendev.org/c/zuul/zuul-jobs/+/854933
[04:30] opendevreview: Ian Wienand proposed zuul/zuul-jobs master: Fix ansible-lint name[template]  https://review.opendev.org/c/zuul/zuul-jobs/+/861559
[04:30] opendevreview: Ian Wienand proposed zuul/zuul-jobs master: Add names to include tasks  https://review.opendev.org/c/zuul/zuul-jobs/+/861560
[04:30] opendevreview: Ian Wienand proposed zuul/zuul-jobs master: Standarise block/when ordering  https://review.opendev.org/c/zuul/zuul-jobs/+/861562
[04:30] opendevreview: Ian Wienand proposed zuul/zuul-jobs master: Update to ansible-lint 6.8.2  https://review.opendev.org/c/zuul/zuul-jobs/+/861563
[04:30] opendevreview: Ian Wienand proposed zuul/zuul-jobs master: [wip] sphinx circular dependencies error  https://review.opendev.org/c/zuul/zuul-jobs/+/861588
[04:30] opendevreview: Ian Wienand proposed zuul/zuul-jobs master: Workaround stevedore/python3.7 issues  https://review.opendev.org/c/zuul/zuul-jobs/+/861698
[05:47] *** ysandeep|out is now known as ysandeep
[06:20] ianw: chown: invalid group: ‘root:letsencyrpt’
[06:20] ianw: ... ?
[06:20] ianw: https://7e827a77180c1e6e432f-3c4e8d8f712aba3e652b0cfd0c30a298.ssl.cf5.rackcdn.com/861138/12/check/system-config-run-letsencrypt/35bbcf9/letsencrypt01.opendev.org/acme.sh/acme.sh.log
[06:38] *** ramishra_ is now known as ramishra
[07:17] *** jpena|off is now known as jpena
[08:16] *** ysandeep is now known as ysandeep|lunch
[09:00] *** marios is now known as marios|call
[09:33] *** jpodivin__ is now known as jpodivin
[10:37] *** ysandeep|lunch is now known as ysandeep
[11:33] *** marios|call is now known as marios
[11:41] *** dviroel|out is now known as dviroel
[12:01] stephenfin: ianw: One comment on https://review.opendev.org/c/openstack/stevedore/+/861695
[12:20] *** ysandeep is now known as ysandeep|away
[13:03] frickler: anyone else seeing lags when loading etherpads currently?
[13:35] fungi: it was maybe a little slower than usual for me. i'll take a look at the system resource utilization
[13:49] *** dasm|off is now known as dasm
[14:09] clarkb: memory and system load look fine. I do note that the root fs is a bit full
[14:10] clarkb: it looks like log rotate has rotated out (deleted) one of the old db backups in /var/backups/etherpad-mariadb
[14:14] clarkb: the bulk of the disk is consumed by the etherpad container's json log file
[14:16] clarkb: I think we should either directly truncate that under the container (ugh) or down then up the container later to restart the log collection with a new container
[14:17] clarkb: and then look at redirecting the logs to /var/log/containers in order to log rotate them there
[14:17] clarkb: also we should clear out the old backup at /var/backups/etherpad-mariadb/etherpad-mariadb.sql.gz.3
[14:17] fungi: yeah, we can probably do it safely around 18:00 utc if we don't want to wait until the weekend
[14:19] clarkb: ya I think start with clearing the extra backup file
[14:19] clarkb: then down then up and that should get the disk usage into a happy enough spot where we can add the syslog redirects without being rush
[14:19] clarkb: *without being in a rush
[14:22] fungi: people will get disconnected from pads they've left up in their browsers, but since there shouldn't be any sessions running at that time it hopefully won't be too disruptive
[14:22] fungi: we can #status log it for a bit of added visibility
[14:22] clarkb: ++
[14:23] fungi: i have no idea what would happen if we restarted etherpad while a meetpad call is running
[14:25] clarkb: it should just fail the document
[14:25] clarkb: the call will work (as was the case when we had the cross site domain stuff improperly set up)
[14:31] *** ysandeep|away is now known as ysandeep
[14:31] fungi: yeah, just didn't know if it would be able to reconnect clients to the pad after
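The diagnosis that the etherpad container's json log file is eating the root fs can be confirmed with a short scan. A minimal sketch, assuming Docker's default json-file logging driver and default data-root, under which each container's output goes to `/var/lib/docker/containers/<id>/<id>-json.log`; `largest_json_logs` is a hypothetical helper, and reading that directory on a real host normally requires root.

```python
from pathlib import Path

# Default json-file log location; adjust if docker's data-root was changed.
DOCKER_CONTAINERS = Path("/var/lib/docker/containers")


def largest_json_logs(root=DOCKER_CONTAINERS, limit=5):
    """Return [(size_in_bytes, path), ...] for the biggest *-json.log files."""
    root = Path(root)
    if not root.is_dir():
        # No docker data directory here, so nothing to report.
        return []
    found = []
    for path in root.glob("*/*-json.log"):
        try:
            found.append((path.stat().st_size, path))
        except FileNotFoundError:
            # The container may have been removed while we were scanning.
            continue
    # Largest first; ties fall back to comparing the paths themselves.
    found.sort(reverse=True)
    return found[:limit]
```

Run as root on the host, `for size, path in largest_json_logs(): print(size, path)` lists the worst offenders. Since docker-compose down removes the container, its log file goes away with it, which is why down then up frees the space.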
[14:52] *** dviroel is now known as dviroel|dr_appt
[15:31] clarkb: one thing I noticed about the jammy cloud image I used for gitea-lb02 is that it has a /boot/efi as a separate partition
[15:31] fungi: debian's official images are like that too
[15:32] clarkb: Does that imply it is also gpt instead of mbr?
[15:32] clarkb: vexxhost booted it just fine so I'm not really worried about it, but curious to see cloud images moving forward like that
[15:32] fungi: you should be able to inspect it easily, but i expect so yes
[15:33] fungi: `sudo fdisk -l /dev/vda`
[15:33] fungi: "Disklabel type: gpt"
[15:33] clarkb: is that from gitea-lb02?
[15:33] fungi: yes
[15:33] clarkb: neat
[15:34] fungi: partition types are Linux filesystem on /dev/vda1, BIOS boot on /dev/vda14, and EFI System on /dev/vda15
[15:35] clarkb: oh interesting I guess that implies it can boot legacy or efi
[15:35] fungi: vda14 isn't mounted but vda15 is, so that's what i take from it yes
[15:35] clarkb: that makes sense for an official cloud image that might be used in many places
[15:36] fungi: also fstab says it's configured to use swap on /swapfile
[15:36] fungi: so no separate swap partition
[15:39] clarkb: yup the swapfile is created by our launch scripts
[15:44] *** ysandeep is now known as ysandeep|out
[15:47] fungi: ah okay
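The "Disklabel type: gpt" that fdisk reports can also be read straight off the disk. A minimal sketch, assuming 512-byte logical sectors: a GPT disk carries the 8-byte signature `EFI PART` at the start of LBA 1 (behind a protective MBR), while a plain MBR/DOS label only has the `0x55AA` boot signature at bytes 510-511 of LBA 0. `disklabel_type` and `read_disklabel` are hypothetical helpers, not how fdisk itself works, and opening `/dev/vda` requires root.

```python
GPT_SIGNATURE = b"EFI PART"
MBR_BOOT_SIGNATURE = b"\x55\xaa"


def disklabel_type(header: bytes, sector_size: int = 512):
    """Classify a disk from its first two sectors: 'gpt', 'dos' or None."""
    # The primary GPT header sits at LBA 1 and starts with "EFI PART".
    if header[sector_size:sector_size + 8] == GPT_SIGNATURE:
        return "gpt"
    # Otherwise fall back to the classic MBR boot signature at the end of LBA 0.
    if header[510:512] == MBR_BOOT_SIGNATURE:
        return "dos"
    return None


def read_disklabel(device="/dev/vda", sector_size=512):
    """Read the first two sectors of a block device or image file and classify them."""
    with open(device, "rb") as dev:
        return disklabel_type(dev.read(2 * sector_size), sector_size)
```

Because a GPT disk also carries a protective MBR with the same `0x55AA` signature, the GPT check has to come first; on disks with 4096-byte logical sectors, pass `sector_size=4096`.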
[15:59] clarkb: infra-root the ptg sessions I got up early for today are winding down and I should be around if we want to land https://review.opendev.org/c/opendev/zone-opendev.org/+/861229
[16:00] clarkb: cacti is collecting info from the new server now too so we'll be able to watch and compare that data against the historical data for the old server
[16:02] *** marios is now known as marios|out
[16:03] clarkb: and now breakfast since I managed to skip that
[16:04] fungi: yeah, i'm ready for 861229 whenever
[16:04] fungi: about to go pick up some lunch takeout, but i'll only be gone for like 10-15 minutes
[16:31] *** jpena is now known as jpena|off
[16:46] clarkb: fungi: are you back? If so I'll go ahead and approve that change
[16:47] *** svinavel_ is now known as svinavel
[16:49] *** dviroel|dr_appt is now known as dviroel
[17:07] fungi: clarkb: yeah, sorry, had stepped away to eat
[17:07] fungi: but i'm around and ready to watch/fix stuff
[17:08] clarkb: ok approved
[17:08] clarkb: next up is looking at etherpad. Earlier I think you mentioned waiting until 1800 for that to be sure people had set the ptg down?
[17:09] fungi: yeah, sessions officially ended at 17:00 but i expect some may run long
[17:09] clarkb: wfm
[17:09] fungi: but an hour after sessions have officially ended seems like plenty of buffer
[17:10] fungi: and the outage should be extremely brief
[17:10] clarkb: yup
[17:10] opendevreview: Merged opendev/zone-opendev.org master: Swap gitea-lb01 to gitea-lb02 for opendev.org  https://review.opendev.org/c/opendev/zone-opendev.org/+/861229
[17:48] clarkb: gitea-lb02 is in dns for opendev.org now
[17:49] clarkb: It seems to work for me but please say something if you notice anything odd or unexpected. We can monitor resource utilization via cacti as well and probably clean up gitea-lb01 towards the end of the week if nothing comes up
[17:51] fungi: i'm cloning nova now just as a cursory check
[17:53] clarkb: ++
[17:53] fungi: that worked
[17:54] fungi: i also let #openinfra-events know about the impending etherpad restart
[17:54] clarkb: thanks. Before doing that etherpad down and up we should clean up the unneeded backup file if others agree that file is extra
[17:55] fungi: looking
[17:55] fungi: etherpad-mariadb.sql.gz.3 from 2022-10-11 or something else?
[17:56] clarkb: yes that one
[17:56] clarkb: it seems that the rotation failed. I suspect due to running out of disk space to move things around
[17:57] fungi: yes, it looks like there is currently insufficient space to create a new db backup even
[17:57] clarkb: I think we can manually delete it and then down then up -d the containers to clear out the json log file associated with the container. And that should free a bunch of space
[17:57] fungi: agreed
[17:58] fungi: i don't think .3 is necessarily "extra" but that it was probably created just before space got too tight to rotate any new backups
[17:58] clarkb: well I think we only keep like one extra day normally
[17:58] clarkb: which is why .1 and .2 don't exist now that .3 won't go away?
[17:59] fungi: and since then it's just been overwriting the primary backup file
[17:59] fungi: could be
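The `.1`/`.2`/`.3` suffixes being discussed come from numbered rotation. A minimal sketch of that scheme, assuming a logrotate-style "keep N old copies" policy; `rotate_backups` is hypothetical and not the actual opendev backup configuration. Each run drops the copy that would fall past `keep`, shifts the rest up by one, and moves the live file into slot `.1`.

```python
import os


def rotate_backups(path, keep):
    """Shift path -> path.1 -> path.2 ..., keeping at most `keep` old copies."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # Drop the oldest copy if it would fall outside the retention window.
    oldest = f"{path}.{keep}"
    if os.path.exists(oldest):
        os.remove(oldest)
    # Shift from the oldest slot downwards so no copy is overwritten.
    for n in range(keep - 1, 0, -1):
        src = f"{path}.{n}"
        if os.path.exists(src):
            os.replace(src, f"{path}.{n + 1}")
    # Move the live file into slot 1 so the next dump can be written fresh.
    if os.path.exists(path):
        os.replace(path, f"{path}.1")
```

In this sketch, a stray copy numbered beyond `keep` (such as a `.3` when `keep=1`) is never touched by the loop, so it lingers until someone deletes it by hand.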
[18:00] clarkb: fungi: were you planning to do the things or should I?
[18:00] clarkb: (I'm around to help either way)
[18:00] fungi: i can do the things, no prob
[18:00] fungi: working on that now
[18:00] clarkb: thanks!
[18:00] fungi: first deleting /var/backups/etherpad-mariadb/etherpad-mariadb.sql.gz.3
[18:01] fungi: free space on the rootfs went from 2.9G to 6.4G
[18:02] fungi: when we stop and start the container, is there a log deletion step i need to do in between?
[18:02] clarkb: fungi: you have to use docker-compose down then docker-compose up -d which will cause docker-compose and docker to delete the containers and create new ones. The container deletion step will wipe the logs
[18:03] clarkb: if you stop then start the container process will stop and you'd have to manually delete the files behind docker's back which seems hackier
[18:03] fungi: perfect, so no extra step required
[18:04] fungi: done. available space on the rootfs is back up to 20G now
[18:05] clarkb: heh we just got an email about backups failing. I checked the log and the reason is that we stopped the database server (the docker-compose down) while it was backing up remotely
[18:05] clarkb: I think that is fine, but if we are concerned we could manually retrigger the backup crontab entry in a root screen
[18:06] clarkb: I'm able to open a couple of etherpads so all looks well from that perspective
[18:10] clarkb: according to cacti the gitea-lb02 network traffic is picking up. System load and cpu usage both look to be happy so far
[18:13] fungi: yeah, etherpads seem to be working for me
[18:14] fungi: database backups kick off at 04:42z daily, so i expect if we interrupted that db backup then it was approximately hung
[18:15] clarkb: fungi: there are two different db backups
[18:15] clarkb: there are the borg driven backups which happen randomly for each server and stream to the borg backup servers (this is what we interrupted). And there is the local write a file on the host for ease of use and local db backups which is what we rm'd the stale file for
[18:15] fungi: oic
[18:16] fungi: it's the local mysqldump which happens at 04:42
[18:16] clarkb: yup
[18:17] clarkb: fungi: https://review.opendev.org/q/topic:docker-cleanups+OR+topic:use-new-python is a set of docker image fixes and python modernization changes
[18:17] clarkb: if you've got time to take a look at those a number of them are probably fairly safe to land
[18:18] fungi: and indeed, the borg backup to backup01.ord.rax.opendev.org started at 17:49z, about 15 minutes before i took the container down
[18:25] fungi: #status log Restarted the services on etherpad.opendev.org in order to free up some disk space
[18:25] opendevstatus: fungi: finished logging
[18:41] opendevreview: Merged opendev/system-config master: Fixup jinja-init image  https://review.opendev.org/c/opendev/system-config/+/861473
[18:50] clarkb: sounds like gitea 1.18 rc0 will be out in a few days
[18:50] clarkb: that will include the vendor file identification fix I wrote. I'll try to get a patch up testing it once the tag exists
[18:51] fungi: oh, nice!
[20:10] opendevreview: Merged openstack/project-config master: Move grafyaml check and gate jobs in repo  https://review.opendev.org/c/openstack/project-config/+/861482
[20:22] opendevreview: Merged opendev/grafyaml master: Run pep8 and unittest jobs out of in repo config  https://review.opendev.org/c/opendev/grafyaml/+/861483
[21:26] opendevreview: MICHAEL KELLY proposed zuul/zuul-jobs master: helm: Add job for linting helm charts  https://review.opendev.org/c/zuul/zuul-jobs/+/861799
[21:35] opendevreview: Clark Boylan proposed opendev/system-config master: Stop updating pip in our docker assemble script  https://review.opendev.org/c/opendev/system-config/+/861800
[22:15] *** dasm is now known as dasm|off
[22:17] clarkb: hrm I think there is a chicken and egg in that change for the uwsgi image
[22:17] clarkb: the good news with that is it means we do actually test it. The bad news is I have to figure out how to unravel things
[22:18] opendevreview: MICHAEL KELLY proposed zuul/zuul-jobs master: helm: Add job for linting helm charts  https://review.opendev.org/c/zuul/zuul-jobs/+/861799
[22:24] clarkb: hrm so it does seem that the uwsgi builds properly did not update pip but we've still got the same problem
[22:24] clarkb: in this case with the uwsgi package instead of netifaces
[22:27] clarkb: maybe my reproduction case with netifaces locally was too trivial and there is something bigger happening
[22:28] clarkb: hrm and this doesn't fix nodepool either
[22:32] opendevreview: MICHAEL KELLY proposed zuul/zuul-jobs master: helm: Add job for linting helm charts  https://review.opendev.org/c/zuul/zuul-jobs/+/861799
[22:42] opendevreview: MICHAEL KELLY proposed zuul/zuul-jobs master: helm: Add job for linting helm charts  https://review.opendev.org/c/zuul/zuul-jobs/+/861799
[23:21] opendevreview: Ian Wienand proposed zuul/zuul-jobs master: Pin sphinx to 5.2.3  https://review.opendev.org/c/zuul/zuul-jobs/+/861587
[23:25] opendevreview: Ian Wienand proposed zuul/zuul-jobs master: Workaround stevedore/python3.7 issues  https://review.opendev.org/c/zuul/zuul-jobs/+/861698
[23:34] opendevreview: Merged opendev/system-config master: infra-prod-bootstrap-bridge: run directly on bridge  https://review.opendev.org/c/opendev/system-config/+/861138
[23:41] opendevreview: Ian Wienand proposed opendev/system-config master: docs: Update force-merge docs for removing votes  https://review.opendev.org/c/opendev/system-config/+/861802
[23:48] opendevreview: Merged zuul/zuul-jobs master: Pin sphinx to 5.2.3  https://review.opendev.org/c/zuul/zuul-jobs/+/861587
[23:51] opendevreview: Ian Wienand proposed zuul/zuul-jobs master: Workaround stevedore/python3.7 issues  https://review.opendev.org/c/zuul/zuul-jobs/+/861698
[23:51] opendevreview: Ian Wienand proposed zuul/zuul-jobs master: linter: Use capitals for names  https://review.opendev.org/c/zuul/zuul-jobs/+/854933
[23:51] opendevreview: Ian Wienand proposed zuul/zuul-jobs master: Fix ansible-lint name[template]  https://review.opendev.org/c/zuul/zuul-jobs/+/861559
[23:51] opendevreview: Ian Wienand proposed zuul/zuul-jobs master: Add names to include tasks  https://review.opendev.org/c/zuul/zuul-jobs/+/861560
[23:51] opendevreview: Ian Wienand proposed zuul/zuul-jobs master: Standarise block/when ordering  https://review.opendev.org/c/zuul/zuul-jobs/+/861562
[23:51] opendevreview: Ian Wienand proposed zuul/zuul-jobs master: Update to ansible-lint 6.8.2  https://review.opendev.org/c/zuul/zuul-jobs/+/861563
[23:51] opendevreview: Ian Wienand proposed zuul/zuul-jobs master: [wip] sphinx circular dependencies error  https://review.opendev.org/c/zuul/zuul-jobs/+/861588
