Thursday, 2021-01-21

*** artom has quit IRC00:11
openstackgerritMerged opendev/system-config master: gitea backup: prune some large directories  https://review.opendev.org/c/opendev/system-config/+/77153400:22
openstackgerritMerged opendev/system-config master: borg-backup: fix logrotate name  https://review.opendev.org/c/opendev/system-config/+/77155700:22
*** akrpan-pure has quit IRC00:25
*** klonn has quit IRC00:26
clarkbhttps://nb02.opendev.org/centos-8-0000140776.log that looks promising (says the build succeeded)00:30
clarkbI've cleaned up those three dirs in dib_tmp as they continued to not go away and the builder had moved on to its third image build00:47
clarkbnb02 is looking good so far, it now needs time to take load off of nb0100:48
clarkbianw: fungi I noticed that the afs servers are still in the emergency file when I was modifying it earlier for the nb0X servers.00:48
clarkbNot sure if they still need to be there?00:48
ianwclarkb: was just making sure that the new ansible was ok ... which it wasn't :)00:48
ianwbut i think it's good now.  ord has its correct rules and i'm pretty confident it's all acting idempotently00:49
*** mlavalle has quit IRC00:49
auristorthere is something odd because "vos examine docs" doesn't show the entry locked for a release but there is clearly a volume transfer from afs01.dfw to afs01.ord in flight.00:52
clarkbnb02 has built two images now00:53
*** DSpider has quit IRC00:55
openstackgerritIan Wienand proposed opendev/system-config master: borg-bcakup: implement saving a stream, use for database backups  https://review.opendev.org/c/opendev/system-config/+/77173801:03
ianwauristor: yeah, i think i unlocked it, not realising the cron job to release it had just kicked off01:03
openstackgerritIan Wienand proposed opendev/system-config master: borg-backup: implement saving a stream, use for database backups  https://review.opendev.org/c/opendev/system-config/+/77173801:04
openstackgerritIan Wienand proposed opendev/system-config master: borg-backup: implement saving a stream, use for database backups  https://review.opendev.org/c/opendev/system-config/+/77173801:20
openstackgerritIan Wienand proposed opendev/system-config master: borg-backup: implement saving a stream, use for database backups  https://review.opendev.org/c/opendev/system-config/+/77173801:34
*** hamalq has quit IRC01:40
openstackgerritIan Wienand proposed opendev/system-config master: borg-backup: implement saving a stream, use for database backups  https://review.opendev.org/c/opendev/system-config/+/77173802:05
*** lbragstad_ is now known as lbragstad02:15
*** hemanth_n has joined #opendev02:18
ianwclarkb: https://4af1cb710e8a42609c94-0507f0d56ad72621551127593d6d8a94.ssl.cf5.rackcdn.com/771738/5/check/system-config-run-borg-backup/8d5f5e5/borg-backup-test01.opendev.org/borg-backup-borg-backup01.region.provider.opendev.org.log02:35
ianwclarkb: hrmmm, i think i need to think about pruning with multiple archives02:35
ianwKeeping archive: borg-backup-test01-random-2021-01-21T02:27:13 Thu, 2021-01-21 02:27:14 [5f875153437165003c135a6c0b45e96f48fe7a9a876f669bcec94bb4d653b90c]02:35
ianwPruning archive: borg-backup-test01-2021-01-21T02:27:04 Thu, 2021-01-21 02:27:05 [3b2d8dcaa0f43db571febfe846f60c698de4a17d9397a41c401e5b3b1f2daca4] (1/1)02:35
ianwwith --daily, it deletes one of the mysqldump stream or file backup02:35
ianwor i guess we need separate prunes with prefixes maybe ...02:37
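A minimal sketch of why the single daily prune above drops one of the two archives, and why pruning per prefix avoids it. This is a simplified model of a "keep the newest archive per day" rule, not borg's actual implementation; the archive names are copied from the log lines above, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Archive names as they appear in the log; the "-random-" one is the stream backup.
ARCHIVES = [
    "borg-backup-test01-2021-01-21T02:27:04",
    "borg-backup-test01-random-2021-01-21T02:27:13",
]

TS_LEN = len("2021-01-21T02:27:04")


def split_name(name):
    """Split '<prefix>-<timestamp>' into (prefix, datetime)."""
    prefix, stamp = name[:-TS_LEN - 1], name[-TS_LEN:]
    return prefix, datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S")


def keep_daily(names, days):
    """Keep the newest archive of each of the most recent `days` calendar days."""
    newest = {}
    for name in names:
        when = split_name(name)[1]
        day = when.date()
        if day not in newest or when > split_name(newest[day])[1]:
            newest[day] = name
    kept_days = sorted(newest, reverse=True)[:days]
    return {newest[day] for day in kept_days}


def keep_daily_per_prefix(names, days):
    """Apply the same daily rule separately to each archive prefix."""
    groups = defaultdict(list)
    for name in names:
        groups[split_name(name)[0]].append(name)
    kept = set()
    for members in groups.values():
        kept |= keep_daily(members, days)
    return kept


if __name__ == "__main__":
    print("one prune:", sorted(keep_daily(ARCHIVES, 7)))
    print("per prefix:", sorted(keep_daily_per_prefix(ARCHIVES, 7)))
```

With a real borg repository the filesystem archives would also need a prefix that does not match the stream archives (here "borg-backup-test01-" is a leading substring of both names), so the naming scheme would likely have to change too; that part is an assumption, not something the log confirms.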
openstackgerritIan Wienand proposed opendev/system-config master: borg-backup: separate archives  https://review.opendev.org/c/opendev/system-config/+/77174803:28
openstackgerritIan Wienand proposed opendev/system-config master: borg-backup: separate archives  https://review.opendev.org/c/opendev/system-config/+/77174804:04
openstackgerritIan Wienand proposed opendev/system-config master: borg-backup: separate archives  https://review.opendev.org/c/opendev/system-config/+/77174804:33
*** ykarel has joined #opendev04:52
openstackgerritIan Wienand proposed opendev/system-config master: borg-backup: separate archives  https://review.opendev.org/c/opendev/system-config/+/77174805:39
*** whoami-rajat__ has joined #opendev05:56
*** ysandeep|away is now known as ysandeep06:06
*** marios has joined #opendev06:15
*** ykarel_ has joined #opendev06:16
*** ykarel has quit IRC06:19
openstackgerritRico Lin proposed openstack/project-config master: Add ubuntu bionic and focal with xxxlarge flavor  https://review.opendev.org/c/openstack/project-config/+/77156507:02
*** lpetrut has joined #opendev07:10
*** tkajinam_ has joined #opendev07:19
*** tkajinam has quit IRC07:20
*** ykarel_ is now known as ykarel07:27
*** eolivare has joined #opendev07:39
*** ralonsoh has joined #opendev07:41
*** jpena|off is now known as jpena07:52
*** sboyron has joined #opendev07:57
*** slaweq has joined #opendev07:58
*** DSpider has joined #opendev08:03
*** fressi has joined #opendev08:06
*** rpittau|afk is now known as rpittau08:11
*** sboyron has quit IRC08:14
*** tosky has joined #opendev08:16
*** hashar has joined #opendev08:23
*** sboyron has joined #opendev08:24
*** andrewbonney has joined #opendev08:24
openstackgerritAlfredo Moralejo proposed zuul/zuul-jobs master: Rename config repos file config for CentOS Stream  https://review.opendev.org/c/zuul/zuul-jobs/+/77081508:33
*** sboyron has quit IRC08:37
*** brinzhang has quit IRC08:55
*** brinzhang has joined #opendev08:55
*** brinzhang has quit IRC08:57
*** brinzhang has joined #opendev08:57
openstackgerritAlfredo Moralejo proposed zuul/zuul-jobs master: Rename config repos file config for CentOS Stream  https://review.opendev.org/c/zuul/zuul-jobs/+/77081509:00
*** jaicaa has quit IRC09:16
*** jaicaa has joined #opendev09:18
*** lpetrut_ has joined #opendev09:34
*** ykarel_ has joined #opendev09:34
*** tosky has quit IRC09:36
*** tosky_ has joined #opendev09:36
*** lpetrut has quit IRC09:37
*** ykarel has quit IRC09:37
*** ykarel_ is now known as ykarel09:39
*** tosky_ is now known as tosky09:43
*** zoharm has joined #opendev09:58
openstackgerritDaniel Lublin proposed opendev/git-review master: Allow choosing field for author in named branch  https://review.opendev.org/c/opendev/git-review/+/44457410:11
openstackgerritGuillaume Chauvel proposed opendev/system-config master: Increase autogenerated comment width to avoid line wrap  https://review.opendev.org/c/opendev/system-config/+/77144510:14
*** ysandeep is now known as ysandeep|afk10:14
*** sboyron has joined #opendev10:14
*** hashar is now known as hasharAway10:53
*** dtantsur|afk is now known as dtantsur10:58
openstackgerritGuillaume Chauvel proposed opendev/system-config master: Increase autogenerated comment width to avoid line wrap  https://review.opendev.org/c/opendev/system-config/+/77144511:33
openstackgerritGuillaume Chauvel proposed opendev/system-config master: [DNM] test comment width: review without autogenerated tag  https://review.opendev.org/c/opendev/system-config/+/77179811:33
*** ysandeep|afk is now known as ysandeep11:36
*** jpena is now known as jpena|lunch12:30
openstackgerritRico Lin proposed openstack/project-config master: Add ubuntu bionic and focal with xxxlarge flavor  https://review.opendev.org/c/openstack/project-config/+/77156512:56
*** hemanth_n has quit IRC13:02
*** tosky has quit IRC13:06
*** tosky has joined #opendev13:06
openstackgerritDmitry Tantsur proposed openstack/diskimage-builder master: Remove the deprecated ironic-agent element  https://review.opendev.org/c/openstack/diskimage-builder/+/77180813:07
*** jpena|lunch is now known as jpena13:29
*** owalsh has quit IRC13:48
*** owalsh has joined #opendev14:08
*** ykarel is now known as ykarel|mtg14:10
*** hemanth_n has joined #opendev14:21
kopecmartinclarkb: hi, when you have a moment, could you have a look at this please? https://review.opendev.org/c/opendev/system-config/+/705258 .. i got patchset 22 passing, i have troubles with https support, any advice?14:27
*** artom has joined #opendev14:28
*** klonn has joined #opendev14:29
mordredinfra-root: I've been getting some email bounces from mttest and buildx-test - which are both test vms I had spawned up a while ago. I'm going to delete them14:37
fungimordred: thanks! i kept meaning to ask you about whether we could delete mttest14:38
fungii hadn't noticed buildx-test but ++cleanup14:38
mordredfungi: in fact, I'm going to delete both mttest's and mttest-docker too14:40
mordredat least I name things consistently :)14:41
fungiawesome14:41
openstackgerritMerged openstack/project-config master: Add ubuntu bionic and focal with xxxlarge flavor  https://review.opendev.org/c/openstack/project-config/+/77156514:42
*** zul has joined #opendev14:42
mordredfungi: done14:42
*** zimmerry has quit IRC14:45
*** sshnaidm|ruck is now known as sshnaidm|afk14:49
fungithanks again!14:50
*** brinzhang has quit IRC15:01
*** brinzhang has joined #opendev15:01
*** brinzhang has quit IRC15:04
*** brinzhang has joined #opendev15:04
*** zimmerry has joined #opendev15:04
*** fressi has left #opendev15:07
*** ykarel|mtg is now known as ykarel15:08
*** hasharAway has quit IRC15:18
*** Eighth_Doctor has quit IRC15:19
*** mordred has quit IRC15:20
*** hemanth_n has quit IRC15:25
*** mordred has joined #opendev15:29
*** klonn has quit IRC15:33
*** lpetrut_ has quit IRC15:42
*** Eighth_Doctor has joined #opendev15:49
clarkbI've reenabled nb01's builder. Will remove it from the emergency file now15:50
*** mlavalle has joined #opendev15:51
clarkbthe disk use is much more balanced between the two servers now15:52
clarkbkopecmartin: I need to find breakfast, but will take a look afterwards15:53
*** sshnaidm|afk is now known as sshnaidm|ruck15:56
*** hashar has joined #opendev15:59
clarkbkopecmartin: I think you need to add an apache2 reverse proxy to terminate 80 and 443. I'm not seeing that in the current role16:09
clarkbkopecmartin: our letsencrypt stuff has a test mode that will use a self signed cert which you can use16:09
clarkblooking for some examples from other services now16:09
clarkbkopecmartin: https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/codesearch/templates/codesearch.vhost.j2 that is an apache config for the codesearch reverse proxy and https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/codesearch/tasks/main.yaml#L11-L37 is the ansible to manage the service16:11
openstackgerritMerged zuul/zuul-jobs master: Temporarily stop running Gentoo base role tests  https://review.opendev.org/c/zuul/zuul-jobs/+/77110516:11
kopecmartinclarkb: thanks, i was trying to analogically compare the code with other servers' code  but it takes a lot of time as this system-config stuff is new to me ..16:16
kopecmartinthank you for the links, i'm gonna check them16:16
clarkbkopecmartin: I would start just by getting apache running with a config and then worry about getting the letsencrypt testing set up next16:17
clarkbsince I think you should be able to mostly confirm that apache is running first on port 80 even if 443 doesn't work yet16:17
kopecmartinclarkb: hm, in patchset 22 the test_refstack_listening passed , didn't that confirm that apache is running?16:19
clarkbkopecmartin: no, that checked port 8000 which is the refstack python daemon service16:21
clarkbkopecmartin: we want apache2 to proxy port 443 to port 8000 and do the ssl termination in apache (it should also redirect port 80 to port 443)16:21
kopecmartinah, ok, makes sense16:23
clarkbkopecmartin: the codesearch apache config should do almost exactly that but for a different service that uses port 6080 instead of 800016:23
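The distinction above (the passing test only proved that the daemon on port 8000 was up, not that apache was answering on 80 and 443) can be sketched as a plain TCP check. This is a rough stand-in using only the standard library, not the test framework system-config actually uses; the helper names are hypothetical.

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host, ports):
    """Map each port to whether something accepts connections on it."""
    return {port: is_listening(host, port) for port in ports}


if __name__ == "__main__":
    # 8000 is the backend daemon; 80 and 443 would be the apache front end.
    for port, up in check_ports("localhost", [8000, 80, 443]).items():
        print(port, "listening" if up else "not listening")
```

A check like this says nothing about whether the proxy forwards correctly or whether TLS terminates properly; it only confirms that something is bound to each port.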
clarkbkopecmartin: feel free to point out when new patchsets show up and I can rereview too16:29
*** ysandeep is now known as ysandeep|away16:31
openstackgerritMerged opendev/git-review master: Fix bug in git_credentials()  https://review.opendev.org/c/opendev/git-review/+/75394616:34
*** lpetrut has joined #opendev16:42
*** klonn has joined #opendev16:42
*** marios is now known as marios|out16:44
*** lpetrut has quit IRC16:50
*** jpena is now known as jpena|off16:59
*** marios|out has quit IRC17:03
clarkbfungi: the tc meeting got me thinking about general capacity issues and while I looked last week to see if any clouds were just hard failing I didn't realize that inap was also still off17:04
clarkbI suppose we could try turning it on to see if the ip arp issues persist?17:04
fungiyeah, we've tried several times but worth trying again, and giving mgagne (or someone) a new list of affected addresses17:06
clarkblet me push that change up (along with a revert)17:06
clarkbor maybe we already have a revert to the disable?17:06
funginot sure17:07
fungii can probably go digging after the next couple of meetings wrap up17:07
openstackgerritClark Boylan proposed openstack/project-config master: Revert "Revert "Revert "Temporarily stop booting nodes in inap-mtl01"""  https://review.opendev.org/c/openstack/project-config/+/77185717:09
clarkbeasy enough to just make a change ^17:09
*** ykarel has quit IRC17:09
*** ykarel has joined #opendev17:11
artomIs /etc/nodepool/id_rsa(.pub) something that Zuul or another "base" component generates on the VMs running CI jobs17:13
artom?17:13
artomI'm trying to track down where it's from...17:13
clarkbartom: I think our base jobs do that as a way to keep backward compatibility from forever ago when nodepool did it17:13
clarkblet me see if I can find it17:13
artomIt looks like TripleO-CI uses it (and creates if it's not already present)17:14
artomBut changing the way it's created in there didn't affect anything, so I'm assuming it comes from somewhere else17:14
artomMy "actual" issue is - it's generated in OpenSSH format, and read by some code that's using paramiko, which doesn't support the exact format17:15
clarkbartom: openstack/openstack-zuul-jobs/playbooks/legacy/pre.yaml17:15
artomclarkb++17:16
clarkbanother option would be to get off of the legacy base job stuff and just do it directly in the way you need it17:16
clarkbor modify the legacy stuff to do a ssh-keygen -m PEM conversion17:16
artomNot a choice I have, I depend on the tripleo job for what I'm trying to do17:16
clarkbalso you can do that conversion at any time before you use it I suppose17:17
clarkbnew openssh reads the PEM format just fine17:17
clarkbeventually it might stop, but for now that seems fine17:17
*** ykarel has quit IRC17:17
fungii expect it will be *many years* before openssh starts to refuse to read pem formatted private keys17:18
fungithey waited 5+ years from when they added support for the new key format before they changed the default in ssh-keygen, after all17:19
artomclarkb, I don't suppose I'd be allowed to propose a change to the legacy playbooks that allow you to specify the format with a variable?17:19
artomI tried a conversion on my laptop, while paramiko read the key fine, it wasn't able to authenticate with the public key...17:19
clarkbI wouldn't make it optional I would just always write a pem version there17:20
clarkbsince it is forward compatible and backward compatible unlike the new ssh format17:20
clarkb`ssh-keygen -p -m PEM -f ./$FILE -N '' -P ''` is the conversion process17:20
clarkband it shouldn't affect the public key at all17:21
clarkbthat command says change the keyfile's passphrase from empty '' to empty '' and in the process we can side effect a format change with -m PEM17:22
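A small sketch of the conversion described above, wrapped so that it only runs when the key is still in the new OpenSSH format. The `ssh-keygen` arguments are the ones quoted in the log; the helper names are hypothetical, and the header check only distinguishes the two RSA private key layouts discussed here.

```python
import subprocess


def private_key_format(text):
    """Classify a private key by its first line: 'openssh', 'pem' or 'unknown'."""
    lines = text.strip().splitlines()
    first = lines[0].strip() if lines else ""
    if first == "-----BEGIN OPENSSH PRIVATE KEY-----":
        return "openssh"
    if first == "-----BEGIN RSA PRIVATE KEY-----":
        return "pem"
    return "unknown"


def pem_conversion_argv(path):
    """Build the ssh-keygen call that rewrites the key at `path` as PEM.

    -p changes the passphrase (from empty -P '' to empty -N ''); the format
    change via -m PEM happens as a side effect of rewriting the file.
    """
    return ["ssh-keygen", "-p", "-m", "PEM", "-f", path, "-N", "", "-P", ""]


def ensure_pem(path):
    """Convert the key in place if it is still in OpenSSH format."""
    with open(path) as handle:
        if private_key_format(handle.read()) != "openssh":
            return False
    subprocess.run(pem_conversion_argv(path), check=True)
    return True
```

The file is rewritten in place, so anything that needs the original format should work from a copy.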
* artom tries again17:22
artom(Locally)17:22
*** hamalq has joined #opendev17:33
artomOK, I think the auth failures with the converted key are un-related... something about paramiko sending the wrong pubkey type...17:35
artomclarkb, so is https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/771858 what you had in mind, or am I meant to add an override for that in my own job only?17:35
clarkbare you on fedora 33?17:35
artomYeah17:35
clarkbfedora 33 has effectively broken all ssh-rsa due to silliness17:36
clarkb(that's my personal opinion)17:36
clarkbgive me a minute and I can try and summarize17:36
clarkbyes that change is roughly what I had in mind17:36
openstackgerritMatthew Thode proposed openstack/diskimage-builder master: remove eselect python from gentoo element  https://review.opendev.org/c/openstack/diskimage-builder/+/77186117:36
artomclarkb, cool, thanks - I was worried about changing stuff this "low down" in the CI stack17:37
clarkbartom: SSH uses rsa for a number of things like authentication and verification of host keys (to avoid mitm attacks)17:37
clarkbI think to make things more manageable sizes pubkey information is exchanged in a hashed form. The old school hash is sha117:38
clarkbso when ssh does things called 'ssh-rsa' that means rsa + sha1 when hashing stuff17:38
clarkbbecause sha1 is no longer considered strong enough for security work openssh has deprecated ssh-rsa (note deprecated, not disabled) for host key verification and supports rsa-sha2-256 and rsa-sha2-512 as alternatives to continue verifying rsa host keys but with much stronger sha2 hashes17:39
clarkbfedora 33 has disabled all use of ssh-rsa including for the use of host key verification and authentication (note upstream hasn't even deprecated the authentication side yet, only hostkey verification as far as I can tell)17:39
*** lpetrut has joined #opendev17:40
clarkbthe authentication side is where things break because in order to use rsa-sha2-256 and rsa-sha2-512 for authentication instead of the sha1 ssh-rsa both the client and the server must support kex extensions to specify server-sig-algs17:40
clarkbfedora 33 users have struggled with using rsa keys to talk to our gerrit server because the java sshd there does not support server-sig-algs extension and openssh falls back to ssh-rsa in that case which is disabled17:41
*** diablo_rojo has quit IRC17:41
clarkbthe rfc says that clients can one day default to a sha2 variant in this case, but we haven't reached that point yet I guess17:42
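The fallback behaviour described above can be modelled in a few lines. This is a simplified illustration of the negotiation as explained in the log, not OpenSSH's actual code; the function name and the preference order (sha2-512, then sha2-256, then ssh-rsa) are assumptions made for the sketch.

```python
SHA2_ALGS = ("rsa-sha2-512", "rsa-sha2-256")


def pick_rsa_signature_alg(server_sig_algs, client_enabled):
    """Pick the signature algorithm a client would use for an RSA key.

    server_sig_algs: algorithms advertised through the server-sig-algs
        extension, or None if the server did not send the extension.
    client_enabled: algorithms the client is willing to use.
    Returns the chosen algorithm name, or None if authentication cannot proceed.
    """
    if server_sig_algs is None:
        # Without the extension the client falls back to the sha1 variant.
        candidates = ["ssh-rsa"]
    else:
        candidates = [a for a in SHA2_ALGS + ("ssh-rsa",) if a in server_sig_algs]
    for alg in candidates:
        if alg in client_enabled:
            return alg
    return None


if __name__ == "__main__":
    stock_client = {"rsa-sha2-512", "rsa-sha2-256", "ssh-rsa"}
    hardened_client = {"rsa-sha2-512", "rsa-sha2-256"}  # ssh-rsa disabled
    print(pick_rsa_signature_alg(None, stock_client))
    print(pick_rsa_signature_alg(None, hardened_client))
    print(pick_rsa_signature_alg(["rsa-sha2-256", "rsa-sha2-512"], hardened_client))
```

In this model a client with ssh-rsa disabled has nothing left to offer a server that omits the extension, which matches the failure described for the gerrit sshd.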
artomAh, so that's why you grok this so well - Gerrit SSH auth :)17:42
*** rpittau is now known as rpittau|afk17:42
clarkbya :)17:42
artomThanks for the crash course, very concise and understandable17:42
openstackgerritMerged openstack/project-config master: Revert "Revert "Revert "Temporarily stop booting nodes in inap-mtl01"""  https://review.opendev.org/c/openstack/project-config/+/77185717:42
clarkbartom: if you do ssh -v you can see the server-sig-algs get negotiated by the server if supported17:42
clarkbwhich can help narrow down where the problem is.17:43
clarkbOr if you just want to make things work you can reenable ssh-rsa via ssh config changes on fedora33 or you can use ecdsa or ed2551917:43
artomI can auth to Gerrit just fine, btw :)17:43
clarkbI think we're suggesting that people should use ecdsa or ed25519 to talk to our gerrit from fedora 33 rather than reduce the distro's security stance17:44
artomThis is a CI job thing... Tempest plugin Python code using paramiko SSHing into the nodeset VMs17:44
clarkbthat said, the reason I call it silly is I think fedora 33 should've updated openssh to fallback to rsa-sha2-512 instead of ssh-rsa since they have disabled ssh-rsa17:44
clarkbtheir users won't be any worse off if the server doesn't negotiate rsa-sha2-512, it will fail if the server can't support it. But if the server can support it it should work17:45
clarkbartom: ya in that case my bet is paramiko either doesn't know how to do rsa-sha2-* or paramiko isn't handling server-sig-algs properly17:45
clarkbunless this is a cirros vm. maybe dropbear doesn't do kex extensions like gerrit17:46
clarkbI could see that being the case since dropbear sshd is super tiny17:46
artomI think the latter? Because with the converted key, SSHing with paramiko to localhost fails17:46
clarkbartom: if paramiko has the equivalent of ssh -vvv turning that on and reviewing the log would be helpful. I can probably skim it too since I spent a bunch of time doing that recently17:47
artomWith the server complaining about "userauth_pubkey: key type ssh-rsa not in PubkeyAcceptedKeyTypes [preauth]"17:47
artomAnd setting PubkeyAcceptedKeyTypes in sshd.conf doesn't seem to affect anything17:47
clarkbartom: oh ya localhost will fail if it is fedora 33 since it doesn't want to do ssh-rsa17:47
clarkbmodifying PubkeyAcceptedKeyTypes should work but its apparently more complicated than people thought17:48
artomSo how come ssh -i <that same key> works?17:48
clarkbbecause openssh `ssh` can tell the server it will do rsa-sha2-* with that same key17:48
clarkbthe on disk format doesn't affect the hash used, just the client and servers supported protocols17:48
clarkbthe hash is a hash of the stuff on disk and is calculated at runtime based on what they negotiate between themselves17:49
artomAh, so stacking issues here...17:49
clarkb(the pem format things makes this confusing because there is a separate file format issue but that is independent)17:49
artom1. paramiko doesn't understand the new openssh key format17:49
artomAnd 2. It can't "match protocols" (sorry for the vulgar oversimplification)17:50
clarkbthat is a good simplification17:50
clarkb(un)fortunately I have yet to debug this in the context of paramiko as the client so don't have any great pointers to a fix for paramiko17:51
clarkbbut hopefully the general problem description simplifies your debugging17:51
artomIt helps me understand the situation better17:51
artomFor debugging I'm just re-running my job with a Depends-on: https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/771858 :)17:52
*** ralonsoh has quit IRC17:52
*** lpetrut has quit IRC17:55
clarkbfungi: inap is in use in nodepool now, but no successful boots yet. Going to check logs momentarily18:06
fungifun18:08
clarkbif I had to guess server launches timed out due to hypervisor image caches being stale, but not finding any evidence of that yet (or of any failures, need to dig more)18:09
clarkbya "openstack.exceptions.ResourceTimeout: Timeout waiting for the server to come up." appears to be the error which has been in the past due to stale images18:11
clarkbI guess give it another hour and check back in18:11
clarkbya the in use number is non zero now18:16
clarkbsmall, but trending the right way18:16
openstackgerritMatthew Thode proposed openstack/diskimage-builder master: remove eselect python from gentoo element  https://review.opendev.org/c/openstack/diskimage-builder/+/77186118:16
clarkbI have rechecked 775121 (the xenial openafs ppa cleanup in openafs-client role) it failed in the gate because a couple of jobs failed to add the ppa due to a timeout talking to gpg keyservers18:25
clarkb*77152118:25
*** dtantsur is now known as dtantsur|afk18:32
openstackgerritMatthew Thode proposed openstack/diskimage-builder master: DNM: simplify updating python versions in gentoo  https://review.opendev.org/c/openstack/diskimage-builder/+/77186118:36
clarkbanyone understand why a bunch of zuul changes are in the openstack zuul check queue?18:54
clarkbthey aren't running any jobs and seem to just be hanging out18:54
clarkball the queue length values are 018:55
*** andrewbonney has quit IRC18:55
*** _mlavalle_1 has joined #opendev19:00
fungii don't see them19:01
fungiwas it maybe momentary?19:01
*** mlavalle has quit IRC19:03
openstackgerritMartin Kopec proposed opendev/system-config master: WIP Deploy refstack with ansible docker  https://review.opendev.org/c/opendev/system-config/+/70525819:08
*** ShadowJonathan has quit IRC19:27
*** bbezak has quit IRC19:28
*** bbezak has joined #opendev19:28
*** ShadowJonathan has joined #opendev19:28
*** klonn has quit IRC19:33
clarkbfungi: ya I did a manual refresh and they have gone away19:51
*** slaweq has quit IRC19:55
*** zoharm has quit IRC20:00
*** hashar has quit IRC20:00
clarkbinap seems to have stabilized from a node launching perspective20:02
clarkbnow we just need to keep an eye out for the ip issues20:02
*** sboyron has quit IRC20:09
fungilast few times we tried it started to show up fairly quickly20:11
fungii think maybe i used the e-r signature for changed host keys to spot it20:11
clarkbhttps://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_874/705258/25/check/system-config-run-refstack/8742657/job-output.txt is a post run failure in inap20:12
clarkblooking at the log I don't see a clear reason for why it failed though20:13
clarkbit says ok ok, skipping skipping, then not much info and it reports failed: 1 for each host20:13
clarkbthe job failed properly too though20:13
clarkbso maybe a side effect from the runtime failure?20:13
*** tosky has quit IRC20:14
*** tosky has joined #opendev20:15
openstackgerritMartin Kopec proposed opendev/system-config master: WIP Deploy refstack with ansible docker  https://review.opendev.org/c/opendev/system-config/+/70525820:16
*** eolivare has quit IRC20:17
openstackgerritGhanshyam proposed opendev/subunit2sql master: Fix compatibility with latest oslo.config  https://review.opendev.org/c/opendev/subunit2sql/+/76483220:30
*** klonn has joined #opendev20:34
openstackgerritMatthew Thode proposed openstack/diskimage-builder master: DNM: simplify updating python versions in gentoo  https://review.opendev.org/c/openstack/diskimage-builder/+/77186120:45
*** klonn has quit IRC20:50
openstackgerritMatthew Thode proposed openstack/diskimage-builder master: simplify updating python versions in gentoo  https://review.opendev.org/c/openstack/diskimage-builder/+/77186120:54
*** whoami-rajat__ has quit IRC21:04
ianwclarkb: if you have a little time, love to get your review on https://review.opendev.org/c/opendev/system-config/+/771738 and 77174821:07
ianwthe gist is individual hosts can add a file in a directory that streams output that the backup script then looks for and runs, and puts in a separate archive21:08
ianwfungi: i think we're back to 100% on afs ... even docs is running fine right?  no other known issues ATM?21:08
ianwclarkb: i can probably squash 771748 into it as well if we like; that just makes sure we prune the archives separately (otherwise you only keep one archive per day, which fails when you have a filesystem archive AND db archive)21:09
fungiianw: yep, i think we're all clear on afs today so can think about distro upgrades for the servers21:12
clarkbianw: at first glance squashing those makes sense. I'll have to do proper review of the setup in the parent though. One concern is that complicates getting complete backups setup for new hosts/services, but maybe we just do our best to document that21:20
ianwclarkb: yeah, i should also add a documentation.  i think it's likely new services liberally copy-paste things like the db backup bits so that might help21:21
clarkbianw: that might be a good one to get fungi and corvus to weigh in on too? just to make sure we're comfortable with split backups like that21:22
corvushioh21:22
ianwcorvus: https://review.opendev.org/c/opendev/system-config/+/771738 and 771748 are the question21:22
ianwnote they're not *totally* split; like it's still all in the one directory on the backup server, just different archive entries21:23
clarkbthey are split on the front end though so you have to configure them separately21:23
clarkbwhich is I think my biggest concern21:24
clarkb(it adds an extra step to getting proper backups)21:24
corvuswhy is the prune separate?  is it because you can't have the db stream and the filesystem in the same archive?21:24
clarkbcorvus: that is my understanding of it. They are separate archives so separate prunes21:25
clarkbif you try to prune together with the shared prefix it prunes things improperly21:25
mordredianw: you may want to add --skip-extended-insert21:25
ianwmordred: yeah, i was thinking you might have some ideas on the most effective way to dump :)21:26
mordredotherwise you get giant insert lines that might be bad or differential? or maybe it's ok if borg is smarter than diff21:26
*** owalsh has quit IRC21:26
ianwthere's also a --compact21:26
mordredyah - I think you want the opposite of that21:26
clarkbthinking out loud this will take us from 1 db dump a day to 321:27
mordredassuming a diff-like behavior, you'd want the largest number of lines21:27
clarkbany concern with that?21:27
ianwand i saw something that mentioned you should order by primary key to try and keep it stable21:27
fungipresumably if we wanted database dumps in the same archive the solution would be to back them up from the filesystem, the attempt here is to be more space efficient on the archive end?21:27
corvussake of argument: what if we didn't compress the mysqldumps (and had fewer of them; like... maybe one); would that make borg happy and we could have one archive?21:27
clarkbcorvus: yes, but some services db backups are too big for that to fit on their disk iirc21:27
ianwcorvus: on etherpad, even having one uncompressed dump is getting tight21:27
clarkbwe'd need to figure that out to make it work that way21:28
mordredianw: innodb tables are naturally sorted by primary key21:28
mordredso it should dump in that order regardless21:28
corvusclarkb: why 3/day?21:28
clarkbcorvus: 1 for the local dump, then once a day to each of our different borg targets21:28
ianwcorvus: we backup to 2 servers, and i've left the local compressed jobs too21:28
fungimysqldump to the fs, and separately to both backup sites21:28
fungi= 321:29
clarkbwe could make the stream script do a zcat of the local dump instead of dumping during the backup?21:29
clarkbthen we'd be back down to 1 dump a day21:29
ianwclarkb: yep, although i guess that would need some locking to be 100% sure we never sent something corrupt21:30
clarkbhrm ya21:31
mordredianw: yeah - I think --skip-extended-insert should be all you need for good differential backup storage. oh - also - add --skip-dump-date21:31
corvusit's worth considering the (lack of) atomicity between the db and fs.  that's true today anyway (we dump the db to the fs, then back that up, so the host's filesystem is always ahead of the db in a backup).  i don't think this substantially changes anything related to that (other than we have a few more timestamps involved)21:31
mordred(no need to put a "this was dumped on $date" line into the dump)21:31
corvusmordred: actually, re my last line, that might be handy? :)21:32
mordredoh - yeah21:32
corvusunless we think it's going to explode the backup size -- i'm not sure what the de-dup window is like with borg?21:32
mordredand it's not that much data that would be diffed each time or anything21:32
corvusright, assuming dedup is sane, i think it would be an asset to keep it.21:33
ianwcorvus: it's configurable, about 2mb by default i think21:33
corvusoh21:33
corvusmordred: does the datestamp come at the start?21:33
ianwso on a test yesterday of the etherpad db, basically back-to-back updates were a diff of ~250mb21:33
corvusoh, it looks like it's at the end of the file!  which is the most likely to be different anyway, so i think we're good to keep dump-date21:33
ianw(this is better than the gzip'd on-disk, which are 5gb with nothing to de-dup)21:33
ianwi can try with a few of these options and see if we get better21:34
fungifor the etherpad example, the skew between db and fs is likely unimportant (except maybe around upgrades with a schema transition or something) as all the data is in the db. for a service like mediawiki which has a db and associated user-submitted files (images, et cetera) outside the db it could be somewhat more relevant21:34
corvusfungi: and gerrit21:34
ianwbut even so, keeping 7*250mb + 12*250 isn't too much of an overhead21:34
clarkbcorvus: fungi ya but we've always had that gap I think21:35
clarkbI guess removing as much gap as possible is a nice improvement though21:35
fungiyeah, gerrit is a better example than mediawiki, but we have very little of importance in mysql there now21:35
clarkbin fact we could stop backing up the mysql db and we in theory only lose what files people have previously reviewed21:35
corvusprobably don't even need to back it up?21:35
clarkbmy last statement was for gerrit specifically21:35
corvusyep21:35
mordredrandom thought - for the services where we're running container + db - should we add a table to the db to record the container sha we're running with? that way a db dump would also carry which version of the container it was created from?21:36
clarkbor add a docker ps -a and docker image list to the stream?21:37
clarkb(I assume we can sneak that in as a comment)21:38
ianwmordred: so you think "--opt" ON but then "--skip-extended-insert"?21:38
clarkbthat's probably the sort of thing to figure out once this is running and we are happy with it though21:38
mordredclarkb: ++21:38
mordredianw: yah. although --opt defaults to on - so you don't really need it anymore21:38
ianwoh --opt "This option, enabled by default"21:38
ianwyeah, jinx21:39
mordredit doesn't hurt - you can leave it in21:39
ianwi'm running a baseline with --skip-extended-insert now, and then will run basically a zero delta and see how it goes.  208.56 MB is the number to beat :)21:40
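A rough sketch of the dump options being compared, plus a toy model of why one row per INSERT should produce smaller deltas between runs. `--skip-extended-insert` and `--skip-dump-date` are real mysqldump options named in the discussion; the database name, the table `t`, and the helper names are hypothetical, and the byte-count model is only a line-level analogy for deduplication, not how borg chunks data.

```python
def mysqldump_argv(database, skip_dump_date=False):
    """Build a mysqldump call that writes one INSERT statement per row."""
    argv = ["mysqldump", "--skip-extended-insert"]
    if skip_dump_date:
        argv.append("--skip-dump-date")
    argv.append(database)
    return argv


def dump_lines(rows, extended):
    """Toy dump: either one big multi-row INSERT or one INSERT per row."""
    values = ["(%d,'%s')" % row for row in rows]
    if extended:
        return ["INSERT INTO t VALUES " + ",".join(values) + ";"]
    return ["INSERT INTO t VALUES " + value + ";" for value in values]


def changed_bytes(old, new):
    """Total size of the lines in `new` that do not appear in `old`."""
    seen = set(old)
    return sum(len(line) for line in new if line not in seen)


if __name__ == "__main__":
    before = [(i, "a") for i in range(1000)]
    after = before[:500] + [(500, "b")] + before[501:]
    for extended in (True, False):
        delta = changed_bytes(dump_lines(before, extended), dump_lines(after, extended))
        print("extended" if extended else "per-row", delta)
    print(mysqldump_argv("exampledb"))
```

In the toy model a single changed row invalidates the whole multi-row statement but only one short line in the per-row layout, which is the diff-like behaviour the suggestion relies on.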
ianwoh and for reference the old dump was 15.92 GB21:41
fungii feel like i should be deciding whether to place my chips on red or black21:41
corvuscome on!  big data!  big data!21:43
*** owalsh has joined #opendev21:44
ianwThis archive:               17.33 GB              4.24 GB              4.24 GB22:00
ianwinteresting, slightly bigger but compresses smaller22:00
*** DSpider has quit IRC22:14
*** owalsh has quit IRC22:16
ianwThis archive:               17.33 GB              4.24 GB             22.33 MB22:17
ianwok, mordred wins, 22mb v 250mb for a zero-ish delta22:17
ianwand it compresses smaller too22:17
mordred\o/22:17
fungian order of magnitude? wow!22:18
*** owalsh has joined #opendev22:25
JayFI always learn the most interesting tricks watching you all work in here :)22:26
*** d34dh0r53 has quit IRC22:37
*** d34dh0r53 has joined #opendev22:48
*** brinzhang has quit IRC23:00
openstackgerritClark Boylan proposed opendev/system-config master: Set stop_grace_period on nodepool-builder containers  https://review.opendev.org/c/opendev/system-config/+/77189923:14
clarkbcorvus: ianw ^ maybe we just try that23:14
corvusclarkb: think that's long enough?23:16
clarkbcorvus: it probably depends on how full those dirs are23:17
clarkbcorvus: cleaning up multiples of them yesterday definitely took longer than 90s, but when I did a single one it was reasonably quick23:17
clarkbianw: ^ may have a better sense for timing on that though23:17
JayFeiddccidrhjiviiikiukfijbhebltdeunkckugcfveru23:19
JayFwhoops, sorry23:19
ianwclarkb: yeah, if it's not done by 90s it probably isn't going to get done :) ...23:20
*** brinzhang has joined #opendev23:34
mordredJayF: I couldn't agree more23:42
