Monday, 2020-07-13

00:04 *** ryohayakawa has joined #opendev
<openstackgerrit> Ian Wienand proposed opendev/system-config master: Add host keys to inventory; give host key in launch-node script
<openstackgerrit> Ian Wienand proposed opendev/system-config master: Add host keys on bridge
<openstackgerrit> Ian Wienand proposed opendev/system-config master: Setup gate inventory in /etc/ansible on bridge
<openstackgerrit> Ian Wienand proposed opendev/system-config master: Setup gate inventory in /etc/ansible on bridge
<openstackgerrit> Ian Wienand proposed opendev/system-config master: Add host keys to inventory; give host key in launch-node script
<openstackgerrit> Ian Wienand proposed opendev/system-config master: Add host keys on bridge
02:00 *** sgw1 has quit IRC
<openstackgerrit> Merged openstack/diskimage-builder master: Switch from unittest2 compat methods to Python 3.x methods
<openstackgerrit> Andrii Ostapenko proposed zuul/zuul-jobs master: Add ability to use upload-docker-image in periodic jobs
02:29 *** sgw1 has joined #opendev
02:48 *** sgw1 has quit IRC
03:04 *** weshay_ruck is now known as weshay_pto
03:19 *** sgw1 has joined #opendev
<openstackgerrit> wu.chunyang proposed openstack/diskimage-builder master: remove py35 in "V" cycle
<openstackgerrit> Ian Wienand proposed opendev/system-config master: Copy generated inventory to bridge logs
<openstackgerrit> Ian Wienand proposed opendev/system-config master: Add host keys to inventory; give host key in launch-node script
<openstackgerrit> Ian Wienand proposed opendev/system-config master: Add host keys on bridge
03:48 *** DSpider has joined #opendev
03:59 *** sgw1 has quit IRC
<openstackgerrit> Ian Wienand proposed opendev/system-config master: Add host keys on bridge
<openstackgerrit> Ian Wienand proposed opendev/system-config master: testinfra: silence yaml.load() warnings
04:20 *** raukadah is now known as chandankumar
04:23 *** sgw1 has joined #opendev
<openstackgerrit> Ian Wienand proposed opendev/system-config master: Fix junit error, add HTML report
04:32 *** sgw1 has quit IRC
04:33 *** sgw1 has joined #opendev
04:41 *** bhagyashris|away is now known as bhagyashris
<openstackgerrit> Ian Wienand proposed opendev/system-config master: Fix junit error, add HTML report
05:23 *** cloudnull has quit IRC
<openstackgerrit> Ian Wienand proposed opendev/system-config master: Fix junit error, add HTML report
05:35 *** marios has joined #opendev
05:36 *** fressi has joined #opendev
05:48 *** cloudnull has joined #opendev
05:50 *** ysandeep|away is now known as ysandeep
05:51 *** ysandeep is now known as ysandeep|afk
06:14 <ianw> infra-root: is a little stack to add host keys to our inventory, and automatically deploy them on bridge, and do a little cleanup
06:15 *** halali_ has quit IRC
<ianw> fungi: i haven't done a full debug, but 740609 failed in system-config-run-lists which looks unrelated --
06:31 <ianw> fungi: Error: Execution of '/usr/sbin/newlist mailman notarealpassword' returned 1: Create a new, unpopulated mailing list. ... i wonder if the job is not happy?
06:33 *** halali_ has joined #opendev
06:39 *** tosky has joined #opendev
07:53 *** ysandeep|afk is now known as ysandeep|rover
08:01 *** moppy has quit IRC
08:01 *** moppy has joined #opendev
08:16 *** ysandeep|rover is now known as ysandeep|lunch
08:34 *** dtantsur|afk is now known as dtantsur
08:55 *** ysandeep|lunch is now known as ysandeep|rover
09:09 *** sshnaidm|afk is now known as sshnaidm
<openstackgerrit> Iury Gregory Melo Ferreira proposed openstack/diskimage-builder master: Update ipa jobs
09:29 <frickler> infra-root: seems to still be running and trying to send mails, failing for lack of DNS records, any way to get that shut down? that region seems no longer to be in our clouds.yaml either
09:43 *** halali_ has quit IRC
10:34 *** zbr has joined #opendev
10:57 *** finucannot is now known as stephenfin
11:00 *** halali_ has joined #opendev
11:33 *** bhagyashris is now known as bhagyashris|afk
11:37 *** tkajinam has quit IRC
12:12 *** rh-jelabarre has joined #opendev
12:12 *** rh-jelabarre has quit IRC
12:12 *** rh-jelabarre has joined #opendev
12:29 *** ryohayakawa has quit IRC
12:33 *** osmanlicilegi has quit IRC
12:35 *** bhagyashris|afk is now known as bhagyashris
12:44 *** osmanlicilegi has joined #opendev
13:27 *** zbr|ruck has quit IRC
13:28 *** zbr|ruck has joined #opendev
13:31 *** noonedeadpunk has quit IRC
13:33 *** noonedeadpunk has joined #opendev
13:56 *** bhagyashris is now known as bhagyashris|afk
14:05 *** dviroel has joined #opendev
14:32 *** ysandeep|rover is now known as ysandeep|away
14:33 <fungi> ianw: looks like it complained about "illegal list name: <foo>@ lists" for every <foo> it tried
14:33 <fungi> i wonder if this is a behavior change with newer mailman
14:35 <fungi> ianw: though it's ubuntu xenial so that seems unlikely
14:36 <fungi> maybe we changed something related to name resolution on the nodes?
<fungi> frickler: git history for system-config says that server's last known ip address is
14:40 <fungi> and i can still ssh in, i could just locally initiate a poweroff for it
14:41 <fungi> but yeah, deleting will most likely require excavating the old api credentials from the private hostvars git history, assuming the api is still reachable
14:41 <clarkb> hrw may be able to rm that mirror node too if the api is not accessible
14:55 *** mlavalle has joined #opendev
<openstackgerrit> Thierry Carrez proposed openstack/project-config master: maintain-github-mirror: add requests dependency
15:13 <fungi> infra-root: rackspace says it had to reboot the hypervisor host for ze05 a few hours ago... i'll check it over
15:13 <fungi> oh, nope zm05
15:14 <fungi> #status log zm05 rebooted by provider at 12:02 utc due to hypervisor host problem, provider trouble ticket 200713-ord-0000367
15:14 <openstackstatus> fungi: finished logging
<openstackgerrit> Clark Boylan proposed opendev/system-config master: Update to gitea v1.12.2
15:40 <clarkb> infra-root catching up on some things that happened last week and I was most curious about the bup git indexes. Did we end up rm'ing them and everything was fine afterwards? if so should we do the same to review.o.o?
15:40 <fungi> the zuul-merger process on zm05 seems to be running and getting used, and no obvious errors in its logs, so i'll close out the rax trouble ticket
15:41 <fungi> grafana says we're down one merger though, so seeing if i can work out which one is out to lunch
15:42 <fungi> aha, nevermind. it's ze01, so known
15:42 <clarkb> though we landed the change to vendor geard so we can probably turn it back on again?
15:42 <clarkb> it being ze01's executor
15:42 <fungi> yeah, i think we just didn't want to do it while nobody was around
15:43 <clarkb> I'm around if we want to do that nowish
15:43 <clarkb> mostly just digging through scrollback and emails to ensure I'm not missing anything important
15:43 <corvus> clarkb: yes i rm'd it; let me see if it looks like everything is fine
15:45 <corvus> hrm, last entry on the remote side is wed jul 8, so i think everything is not fine
15:46 <clarkb> I wonder if we need to create a new remote backup target if we reset the local indexes
15:47 <corvus> there are 2 bup processes currently running on zuul01
15:47 <corvus> jul 9 and 10
15:47 <corvus> i wonder if one of them is stuck due to the disk being full before
15:48 <corvus> how about i kill them (in reverse order) and see if the next one runs okay?
15:48 <clarkb> sounds good
15:50 <fungi> though status log says you removed /root/.bup from zuul01 2020-07-08 16:14:11 utc
15:50 <fungi> so if the oldest running backup started on 2020-07-09 that would be after the cleanup
15:51 <corvus> well, hrm.
15:52 <corvus> i dunno then.
15:52 <corvus> the remote side is getting pretty full.
15:52 <corvus> maybe we should just go ahead and do a rotation there anyway.
15:54 *** marios is now known as marios|out
15:57 *** mlavalle has quit IRC
16:05 *** mlavalle has joined #opendev
16:06 <fungi> yeah, looks like we didn't zero the root reserved allocation for the current volume
16:06 <fungi> oh, nevermind i'm looking at the older volume
16:07 <fungi> so yes, we're below half a percent free there
16:08 <fungi> and we last rotated it a little over a year ago, judging from the volume name
16:09 <fungi> would we also keep the volume currently mounted at /opt/backups-201711 or just blow it away and swap them?
16:12 <clarkb> rotating like that would simplify things, rather than making a new volume
16:13 *** marios|out has quit IRC
16:18 <corvus> i'd advocate blowing away 2017 and using its pvs to make 2020
16:19 *** fressi has left #opendev
16:21 *** diablo_rojo__ has joined #opendev
16:25 <fungi> i'm in favor of that plan. i have to assume nothing writes to /opt/backups-201711 currently anyway
16:25 <clarkb> process for that is something like remount current backups to some new path, clear oldest backups, remount oldest backups fs to current backups?
16:25 <clarkb> fungi: ya we've basically done a rotation to keep the oldest set around
16:26 <fungi> i'm happy to work on that unless someone else already is
16:26 <clarkb> we may want to double check with ianw as ianw has done some backup stuff in the past but I think in this case repurposing space for oldest backups is safe
16:26 <clarkb> and no I'm not working on it
16:26 *** diablo_rojo__ is now known as diablo_rojo
16:27 <fungi> but yeah, the only thing i need to figure out is what's currently telling it to write into /opt/backups-201903 (a symlink?) and whether we need to bup init all the trees and chown stuff on the new fs
16:28 <fungi> aha, yep. symlink
16:28 <fungi> /opt/backups -> backups-201903
16:30 <fungi> i guess we should make fresh homedirs for each of the users (copying from /etc/skel?), carry over their .ssh/authorized_keys files and then bup init as each of them to create an empty ~/.bup
16:30 <fungi> aha, we have docs, right! ;)
16:31 <fungi> (why do i always assume stuff like this isn't documented?)
16:31 <corvus> that looks fairly complete :)
16:31 <fungi> indeed it does, thanks
16:31 <corvus> it has a step to run a test backup manually
16:31 <corvus> we may want to do 2 of those
16:31 <corvus> a "normal" server and zuul01, in case zuul01 is still somehow broken
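(A hedged sketch of the rotation clarkb outlines above. The VG/LV names, device paths, and mountpoints here are illustrative guesses, not the real backup server's; the runner argument makes a dry run possible. The `-m 0` mirrors the earlier note about zeroing the root reserved allocation.)

```shell
#!/bin/sh
# Sketch only: recycle the oldest backup volume as the new active one.
# All names (the "main" VG, backups-* LVs, mountpoints) are assumptions.
rotate_backups() {
    run="$1"   # pass "echo" for a dry run; empty to really execute
    $run umount /opt/backups-201711                # detach the oldest volume
    $run mkfs.ext4 -m 0 /dev/main/backups-202007   # reformat it (destroys the 2017 data!)
    $run mkdir -p /opt/backups-202007
    $run mount /dev/main/backups-202007 /opt/backups-202007
    $run ln -sfn backups-202007 /opt/backups       # repoint the path clients write through
}
```

Running `rotate_backups echo` prints each step without touching anything, which is a reasonable sanity check before doing it for real.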
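(The homedir priming fungi describes can be sketched roughly as follows. Paths are parameterized so the sketch is runnable against a scratch prefix; the account name and layout are assumptions, and the real chown/bup-init steps need to run as root on the backup server.)

```shell
#!/bin/sh
# Sketch: prime one bup-* account homedir on the new backup volume.
set -e
prime_homedir() {
    newroot="$1"; oldroot="$2"; acct="$3"
    mkdir -p "$newroot/$acct/.ssh"
    # carry over the client's key so its backup pushes keep authenticating
    cp "$oldroot/$acct/.ssh/authorized_keys" "$newroot/$acct/.ssh/"
    chmod 700 "$newroot/$acct/.ssh"
    chmod 600 "$newroot/$acct/.ssh/authorized_keys"
    # on the real server, also chown everything to the account and create
    # the empty repo as that user, e.g.:  sudo -u "$acct" bup init
}
```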
16:32 <fungi> so the remaining question, i know ianw wanted to build a new bup server, anyone happen to know the state of that?
16:32 <corvus> i don't, but i feel like we can/should consider that a second server
16:32 <fungi> i recall he was talking about doing the next rotation to a replacement server, but yeah, having redundant backups again would be good
16:32 <corvus> it's time to rotate the volumes on the primary server anyway, so i vote we just do that for now
16:33 <clarkb> system-config-run-backup exists and you can probably go from there to figure out general state but I agree with corvus that can be a second server rather than a replacement
16:40 <fungi> okay, cool, i'll get started on that here momentarily
16:42 <fungi> infra-root: any objections to ditching the main/backups volume (currently mounted at /opt/backups-201711) on and repurposing it for future backups so the currently near-full backup volume can be rotated out?
16:44 <corvus> fungi: no objections (you knew that already, but ftr)
16:47 <clarkb> ya I think that is fine
16:57 <clarkb> corvus: can you think of any reason to not start ze01's executor again? the vendored gear code in the ansible role should address the last known problem with it right?
16:57 <clarkb> I'll go ahead and do that if that is the understanding
17:03 <corvus> clarkb: ++; i need to take care of some errands; should be back in a bit.  feel free to start if you have time, or i can later
17:03 <clarkb> ya I can do it
17:04 <clarkb> I've got the errands this afternoon (school district is doing q&a on restarting schools in the fall at 2pm) so trying to be useful now
17:04 <clarkb> and done. I'll keep an eye on it
17:05 <clarkb> #status log Restarted zuul-executor container on ze01 now that we vendor gear in the logstash job submission role.
17:05 <openstackstatus> clarkb: finished logging
17:06 <clarkb> infra-root I'll plan to land once I'm satisfied ze01 can be left alone. And would be good to land too. Both are gitea improvements/upgrades
17:15 *** dtantsur is now known as dtantsur|afk
<openstackgerrit> Andrii Ostapenko proposed zuul/zuul-jobs master: Add ability to use upload-docker-image in periodic jobs
17:34 <clarkb> ze01 seems happy. I've now identified a tox job I'm following specifically
17:34 <clarkb> will use that as a canary
17:38 <clarkb> 2020-07-13 17:37:40,225 DEBUG zuul.AnsibleJob.output: [e: abb23df574fd4ababf35797c0dcbcae3] [build: ff644ee4f74b4e8596416af21bd31757] Ansible output: b"An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ModuleNotFoundError: No module named 'gear'"
17:38 <clarkb> seems the vendoring hasn't fully worked (and we wouldn't have noticed until ze01 was turned on?)
17:38 <corvus> i just saw that :/
17:39 <corvus> i'm guessing it has to do with the cwd of the python process when it loads the module
17:39 <clarkb> do we need a __init__.py in the library/ dir to make it a valid python module?
17:39 <clarkb> oh that could be it too
17:40 <clarkb> corvus: we vendor toml things somewhere or am I imagining that? I wonder if we can replicate how that is done
17:40 <clarkb> I seem to recall something related to serialization like that anyway
17:40 <corvus> ya, i'll look into it
17:40 <fungi> so stopping ze01 now?
17:41 <corvus> clarkb: meanwhile, we could graceful ze01, or leave it running if we don't mind missing a few logstashes
17:41 <clarkb> I don't mind missing that data personally
17:41 <clarkb> e-r says we're way behind right now anyway
17:41 <corvus> okay, let's give me a few mins to see if there's a quick fix
17:41 <clarkb> and leaving it up will make it easier to test the fix
17:41 <fungi> yeah, i expect the worst side effect is users getting confused by the failed tasks in their successful jobs
17:41 <corvus> fungi: they'll never see it
17:42 <corvus> this is strictly post-log-upload
17:42 <fungi> oh, right, it's after log collection
17:42 <corvus> (you'd have to watch the streamer in real-time)
17:42 <fungi> (NECESSARILY after log collection, since we're processing collected logs)
17:42 <corvus> clarkb: the special ansible.module_utils thing is what you're thinking of that we use with toml
17:44 <corvus> i'm going to take a few mins to set up a repro env locally so we don't burn all day on this :)
17:44 <clarkb> I'm guessing if I look at logstash worker logs we'll find some new giant log files that are causing problems with indexing (and that is why we are behind)
17:44 <clarkb> corvus: ++
<fungi> this is the tool which grew out of pip's vendoring approach:
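(For reference, Ansible's `ansible.module_utils` namespace lets a role ship vendored Python alongside its modules, which is the mechanism corvus points at here. A hedged sketch of the layout; the role and file names are illustrative, not the actual change:)

```shell
# Illustrative layout for vendoring a library next to an Ansible module:
#
#   roles/submit-log-processor-jobs/
#     library/
#       submit_log_processor_jobs.py   # the Ansible module itself
#     module_utils/
#       gear.py                        # vendored gear client, shipped to the
#                                      # remote node along with the module
#
# Inside the module, the vendored copy is imported through the special
# namespace rather than as a plain top-level package:
#
#   from ansible.module_utils import gear
```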
18:08 *** qchris has quit IRC
18:16 <fungi> #status log old volume and volume group for main/backups unmounted, deactivated and deleted on
18:16 <openstackstatus> fungi: finished logging
18:17 <fungi> i suppose we want to continue to keep the pvs tied to separate vgs instead of extending a vg across all of them?
18:18 <fungi> we can tie specific pvs to specific lvs either way, but i guess we can revisit how these are organized when we build the new server
18:18 <clarkb> separating them seems good if we want to have more than one failure domain?
18:19 <fungi> it's irrelevant as far as that's concerned. vgs don't have any notion of consistency anyway unless you use mirroring
18:20 <clarkb> but if we mixed vgs across pvs losing a pv would lose both vgs?
18:21 <fungi> i was suggesting we think about putting a single volume group across the physical volumes, you can still tell it which physical volumes should contain the blocks for which logical volumes
18:21 <fungi> it's really mostly namespacing
18:22 <fungi> anyway, irrelevant for the moment, i've already created the new vg across the old repurposed pvs
18:22 *** qchris has joined #opendev
18:27 <fungi> okay, we've got 3.0T free on /opt/backups-202007
18:28 <fungi> i need to take a break to do some dinner prep, and will then tackle the rest of the cutover
<openstackgerrit> James E. Blair proposed opendev/base-jobs master: Really vendor gear for log processing
18:41 <corvus> clarkb: ^ i think that should do it (it at least gets past import errors in my local testing)
18:42 <clarkb> gotcha so there is an ansible method for doing that lgtm
19:43 <fungi> all, module_utils is a special namespace i guess?
19:43 <fungi> ansible magic
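(What fungi describes — one VG spanning the PVs while still pinning an LV's extents to particular devices — can be sketched like this; the device and VG names are made up, and the runner argument keeps it a dry run.)

```shell
#!/bin/sh
# Sketch: single VG across both PVs, LV pinned to one PV. Names invented.
mk_backup_vg() {
    run="$1"   # pass "echo" for a dry run
    $run vgcreate main /dev/xvdb1 /dev/xvdc1
    # trailing PV arguments tell LVM which devices this LV's extents may use
    $run lvcreate -n backups-202007 -l 100%PVS main /dev/xvdb1
}
```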
19:51 <fungi> mordred: not sure if you're around today, but i'm catching up on old stuff in my "i'll look at this later" pile, and a few weeks ago rax alerted us that "MySQL Replica on testmt-01-replica-2017-06-19-07-19-15-replica-2020-06-27-15-27-00 is CRITICAL"
19:52 <fungi> i assume this is some old trove replication test we no longer care about but have forgotten was set up, and we should delete it?
19:59 <clarkb> I've approved to paginate more gitea requests for project management
19:59 <clarkb> it should be pretty well tested at this point but definitely say something if you notice it operating oddly
20:00 <fungi> corvus: you killed the bup processes on zuul01, right? just making sure i'm not overlooking them
<openstackgerrit> Merged opendev/base-jobs master: Really vendor gear for log processing
20:02 <clarkb> fungi: corvus we should consider rm'ing review.o.o's bup indexes and restarting them on the new volume if zuul01 shows it is happy that way
20:02 <clarkb> it is quite large there as well
20:03 <fungi> i expect so, yes
20:04 <fungi> when we were looking into it, sounded like it would just maybe reduce performance of the next bup run since there would be no cache
<openstackgerrit> Merged zuul/zuul-jobs master: Strip path from default ensure_pip_from_upstream_interpreters
20:04 <fungi> but if the next run is also a full backup, then it's probably irrelevant anyway
20:10 <clarkb> we didn't rename any projects last week right?
20:11 * clarkb is putting together our agenda for tomorrow
20:16 <fungi> we did not, no
20:16 <fungi> at least we didn't take any downtime for it
20:16 <clarkb> thanks for confirming
20:21 <corvus> fungi: yes i killed the bups
20:22 <fungi> thanks for confirming
20:23 <fungi> i'll proceed with stopping sshd, switching mounts around, setting the old volume read-only and priming the new homedir copies
20:43 *** shtepanie has joined #opendev
<openstackgerrit> Merged opendev/system-config master: Paginate all the gitea get requests
21:16 *** DSpider has quit IRC
21:17 <fungi> while prepping homedirs for the new backups volume, i took the opportunity to omit a few which had no content on the prior volume (likely were already not being backed up by the time of the last rotation) as well as a couple where the servers had been replaced since the last rotation and so are no longer getting new data
21:17 <corvus> fungi: ++
21:17 *** boyvinall has joined #opendev
21:18 *** diablo_rojo has quit IRC
21:18 <fungi> oh, also one service which has been decommissioned since the last rotation (groups.o.o)
21:19 <fungi> that leaves us with the following nine bup-* accounts: ask01 ethercalc02 etherpad lists review storyboard translate wiki zuulv3
21:20 <fungi> hopefully there's nothing anyone thinks we're backing up which doesn't appear in that list
21:21 <fungi> and all the ~/.bup dirs in them have been initialized
21:23 *** JayF has quit IRC
21:23 <fungi> and the symlink pointed at the new file tree and sshd reenabled
21:24 *** JayF has joined #opendev
21:25 <fungi> based on previous backup sizes, i should probably start by testing ethercalc if i have any desire for it to wrap up in a reasonable amount of time
21:26 <fungi> i have a root screen session going on the ethercalc server where i'm testing the bup command from its crontab
21:27 <fungi> it spewed a warning about missing indices
21:27 <fungi> that's presumably to be expected
21:28 <fungi> warning: index pack-ac94d2c7004625e772e9c1cc623163ab30d9b37a.idx missing used by midx-36b8c644cf750bbfe70298e7b8453dd3da9f3b28.midx
21:28 <fungi> et cetera
21:30 <fungi> the size of ~bup-ethercalc02 (now on the new volume) is growing
21:30 <fungi> and it seems to have finished, exited 0
21:30 *** boyvinall has quit IRC
21:30 <fungi> total size on the backup server 764M
21:31 <fungi> now to test zuul
21:32 <fungi> i have a root screen session going on zuul01 where i'm testing the bup command from its crontab now
21:32 <fungi> same missing index warnings
21:33 <fungi> ~/bup-zuulv3 growing on the backup server
21:33 *** boyvinall has joined #opendev
21:40 <corvus> i deem that to be promising :)
21:40 <fungi> 7.5gb accumulated for it on the backup server already
21:42 *** boyvinall has quit IRC
21:43 *** boyvinall has joined #opendev
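(The crontab entry itself is never quoted in the log. A typical client-side bup push of a host's filesystem to a remote backup account looks roughly like the following; the account name, server, and exclude list are placeholders, not the real opendev configuration.)

```shell
# Illustrative only -- not the actual opendev crontab entry.
# First refresh the local index (-u update, -x stay on one filesystem),
# then save the tree to the remote repo over ssh.
bup index -ux --exclude /proc --exclude /sys /
bup save -r bup-ethercalc02@backup-server: -n root /
```

The "missing index" warnings fungi sees are consistent with the local `~/.bup` cache referring to packs that no longer exist on the freshly rotated remote; bup rebuilds its midx/idx state on the next run.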
22:06 <fungi> completed, exited 0, 22gb as stored on the backup server now
22:06 <ianw> fungi: system-config-run-lists re-ran and went ok, so i don't know, i guess it was just a transient error
22:06 <fungi> ianw: yeah, that was really, really strange. it looked like it could have been a name resolution problem
22:07 <fungi> corvus: should i do a second backup on zuul.o.o now to confirm it goes more quickly once primed?
22:07 <ianw> fungi: i think that these days we wouldn't need to pre-seed the storage volume on backups; the ansible roles should create things as required
22:07 <clarkb> fungi: how big is the bup stuff on zuul01 now?
22:07 <fungi> corvus: also the ~root/.bup dir on zuul.o.o is now 3.5gb
22:07 <clarkb> is it the same order of magnitude as the previous contents?
22:07 <fungi> heh, you read my mind
22:07 <clarkb> ah cool so we saved like 20GB or something
22:08 <clarkb> in that case I think we should do similar with review
22:08 <fungi> yeah, seems safe
22:09 <fungi> i'll fire a second backup on zuul now to see how much faster it completes than the initial transfer
22:09 <fungi> no missing index warnings this time
22:09 <corvus> fungi: huzzah, thanks!
22:10 <corvus> so if we wanted to clear out the .bup dir on gerrit, now would probably be the time
22:10 <fungi> yes, i think so
22:10 <corvus> i'm in favor
22:10 <fungi> looks like it's 15gb
22:10 <clarkb> I'm still deep into school district q&a so can't help right now but I am also in favor
22:10 <fungi> i'll remove it and then start a backup there under screen
22:11 <clarkb> and maybe we update docs to say that we can clear that server dir if we rotate the remote backups too
22:11 <clarkb> I can write that change since it's less time sensitive
22:11 <fungi> have at it
22:13 <fungi> ianw: i think the benefit of the rsync step is that you can do it while the backup server is offline. disabling sshd means ansible can't prepopulate homedirs, so you risk having a backup attempted when the homedir doesn't exist... though maybe that's fine after all, the end result is probably the same as if a backup is attempted with sshd stopped?
22:20 <fungi> oh, also the second zuul backup completed in a few minutes, and both local and remote .bups are still basically the same size
22:21 <fungi> as one expects
22:22 <fungi> okay, i've removed ~root/.bup on review01 and run `bup init` as root
22:22 <fungi> now running the backup command from its crontab in a root screen session there
22:24 <fungi> #status log rotated backup volume to main-202007/backups-202007 logical volume on
22:24 <openstackstatus> fungi: finished logging
22:24 <fungi> before i forget
22:24 <ianw> fungi: also, yeah this is the old puppet hosts; the ansible hosts are backing up to vexxhost
22:24 <fungi> oh, do we have a second backup server already?
<fungi> indeed, i did not notice that review.o.o is backing up to
22:26 <fungi> ianw: zuul01 isn't really an "old puppet host" though
22:26 <fungi> is it just awaiting switching to the new server?
22:26 <ianw> yeah, i think where things got stalled was converting everything to ansible-based backups, and then starting a new rax backup, and doing dual backups
22:27 <ianw> the ansible roles are all written so that we just drop another server in the backup-server group and it should "just work" ... install a separate cron job
22:29 <ianw> fungi: zuul may be a hybrid; i don't think we've started completely fresh servers, let me see...
22:30 <fungi> also, i guess shortly we'll have the answer to what happens if you blow away the local .bup but not the remote one
22:30 <ianw> it's not in the "ansible" backup list ...
22:32 <fungi> got it, so its backups are still being configured by puppet, even though everything else on the server is ansible
22:32 <ianw> in fact, there's probably a good chance backups are not being configured
22:32 <ianw> they're just left over
22:32 <ianw> since the switch to containers
22:32 <fungi> which i guess is fine if we don't anticipate rebuilding those servers as-is
22:33 <clarkb> for zuul01 we want to backup the keys iirc
22:33 <clarkb> which should be backed up properly since they are bind mounted
22:35 <fungi> so on the new backup server so far the only things being backed up are review, review-dev and etherpad
<openstackgerrit> Ian Wienand proposed opendev/system-config master: Add Zuul to backups group
22:38 <ianw> fungi: ^ so we should probably go with that
22:39 <ianw> the idea was that as we dropped bup::site that would replace it
22:39 <ianw> until there were no more puppet hosts; then as i say, we drop in another backup server to have dual offsites
22:39 <clarkb> I think it would be good to go back to two remotes if possible
22:41 <ianw> that's certainly possible; all that needs to happen is to bring up another backup host and put it in the backup-server group
22:41 <ianw> s/host/server/ just to keep the terms consistent
22:42 *** tosky has quit IRC
22:54 *** tkajinam has joined #opendev
22:57 <fungi> okay, so this is strange
22:59 <fungi> on review01, trying to perform a backup is (eventually) failing with "IOError: [Errno 28] No space left on device"
22:59 <clarkb> is it filling /
22:59 <fungi> doesn't seem like it
22:59 <clarkb> or maybe /var/backups or similar type of spool?
22:59 <fungi> i don't see a full fs on either end
23:00 <fungi> rerunning again because i wasn't doing it under screen the first time
23:00 <fungi> but i wonder if this has been failing for a while
23:01 <clarkb> shows a spike but not being full (could be it hit the limit then immediately went back under though?)
23:02 <fungi> oh, yep, i bet so
23:02 *** boyvinall has quit IRC
23:02 <fungi> the initial drop is from where i cleared .bup
23:03 <clarkb> maybe we need to clean up /home/gerrit2 before bup will be happy
23:03 <clarkb> I keep avoiding that because I'm scared
23:03 <fungi> i guess we don't have enough free space for the spool
23:04 <clarkb> we do it as a stream on the command line but bup itself must spool in order to chunk and checksum?
23:04 <fungi> yeah, especially if we're not actually successfully backing it up
23:07 <fungi> seems that way
23:08 *** mlavalle has quit IRC
23:08 <fungi> we could likely clear out a ton of ~gerrit2/index.backup.* files which may reduce the volume of data we're backing up (won't free up space at rest on the rootfs though as that's on a separate fs)
23:08 <clarkb> ya but the spooling is likely related to the input?
23:08 <fungi> i doubt those are of much use except to roll back if a reindex fails
23:09 <clarkb> fungi: any idea where the growth is?
23:10 <fungi> also some bundles like gerrit_backup_2016-04-11_maint.sql.gz and gerrit-to-restore-2017-09-21.sql.gz
23:10 <fungi> can you clarify what growth you mean?
23:10 <clarkb> "IOError: [Errno 28] No space left on device" <- basically what causes that
23:11 <fungi> oh, as in what file is it spooling to on the rootfs. i'll see if i can find out
23:11 <fungi> lsof will likely say what's open
23:11 <clarkb> cacti seems happy now at least
23:11 <clarkb> have you hit the issue more than once?
23:12 <fungi> we have a /tmp/repos dir we could clean up to free 1.2gb of the rootfs
23:14 <clarkb> I had a set of notes around this
23:14 <clarkb> but then every time I sit down to deal with it I get paranoid about deleting things I shouldn't
23:14 * clarkb trying to find it now
23:16 <fungi> and yeah, i'm not seeing any unconstrained growth on the rootfs during this backup attempt
23:17 <clarkb> and then I made an etherpad from that
23:17 <clarkb> the rest of my day today has been completely shot by school stuff
23:17 <clarkb> I'll try to dig up the rest of my notes on ^ tomorrow as we should do that cleanup anyway
23:19 <fungi> sounds good, thanks
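(The unlinked-but-open-file theory fungi lands on later can be checked with `lsof +L1 /` on the server, which lists open files whose link count has dropped to zero. A small Linux demonstration of the effect, runnable anywhere:)

```shell
#!/bin/sh
# Demo: a deleted file keeps consuming disk space while a process holds it
# open; /proc (or lsof +L1) is how you spot it.
tmp=$(mktemp)
exec 3>"$tmp"                       # hold the file open on fd 3
echo "spool data" >&3
rm "$tmp"                           # unlink it; the blocks stay allocated
fd_listing=$(ls -l /proc/$$/fd/3)   # the symlink target now ends in "(deleted)"
exec 3>&-                           # closing the fd finally frees the space
echo "$fd_listing"
```

This matches the behavior seen here: when bup crashed, its fds closed and the "missing" space reappeared instantly, leaving nothing behind to find on disk.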
23:22 <ianw> fungi/clarkb: ^ so i think that lays out a plan ... the fatal flaw was probably that the puppet side was supposed to disappear more quickly than it has
23:27 <ianw> Data could not be sent to remote host "". Make sure this host can be reached over ssh: Load key "/root/.ssh/id_rsa": invalid format
23:27 <ianw> that's a new one
23:46 <fungi> new key format with old openssh?
23:48 <fungi> also the rootfs disk utilization on review01 is starting to grow again, but not as fast as during the previous backup attempt
23:56 <fungi> i can't find where the additional files are. entirely possible they're unlinked but open fds somewhere
23:57 <fungi> which would explain how they were immediately cleaned up when bup crashed rather than left behind
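(If fungi's guess is right and the problem is the newer OpenSSH private key format confusing an older consumer, the key can be rewritten as PEM in place with `ssh-keygen -p -m PEM`. A sketch using a throwaway demo key; it assumes an unencrypted key, and the real fix would target `/root/.ssh/id_rsa`:)

```shell
#!/bin/sh
# Generate a demo RSA key, then rewrite its private half in PEM format.
set -e
k=$(mktemp -u)
ssh-keygen -q -t rsa -b 2048 -N '' -f "$k"
# -p re-encodes the private key in place; -m PEM selects the legacy format
ssh-keygen -p -q -f "$k" -P '' -N '' -m PEM
head -1 "$k"
```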

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!