Monday, 2021-05-17

openstackgerritIan Wienand proposed openstack/diskimage-builder master: Fix DISTRO_NAME in Fedora elements  https://review.opendev.org/c/openstack/diskimage-builder/+/79162700:02
openstackgerritIan Wienand proposed opendev/system-config master: zuul job : collect some more logs  https://review.opendev.org/c/opendev/system-config/+/79105500:08
*** janders has quit IRC00:17
*** Dmitrii-Sh has quit IRC00:17
*** yoctozepto has quit IRC00:17
*** zbr has quit IRC00:17
*** Dmitrii-Sh has joined #opendev00:17
*** zbr has joined #opendev00:17
*** janders has joined #opendev00:17
*** yoctozepto has joined #opendev00:17
openstackgerritMerged opendev/system-config master: Double the default number of ansible forks  https://review.opendev.org/c/opendev/system-config/+/79152800:26
openstackgerritMerged openstack/diskimage-builder master: Add fedora-containerfile element  https://review.opendev.org/c/openstack/diskimage-builder/+/79036500:52
*** ricolin has joined #opendev03:12
*** ricolin has quit IRC03:12
*** ricolin has joined #opendev03:14
*** ykarel_ has joined #opendev03:48
*** ykarel_ has quit IRC03:51
*** ykarel_ has joined #opendev03:51
openstackgerritIan Wienand proposed opendev/system-config master: [DNM] Use new docker ipv6tables option to map haproxy ports  https://review.opendev.org/c/opendev/system-config/+/79163303:57
*** ykarel_ is now known as ykarel03:59
*** akahat is now known as akahat|ruck04:26
openstackgerritIan Wienand proposed opendev/system-config master: Run haproxy as root user  https://review.opendev.org/c/opendev/system-config/+/79163404:30
openstackgerritIan Wienand proposed opendev/system-config master: [DNM] Use new docker ipv6tables option to map haproxy ports  https://review.opendev.org/c/opendev/system-config/+/79163304:32
*** ralonsoh has joined #opendev04:38
*** vishalmanchanda has joined #opendev04:43
*** marios has joined #opendev04:53
openstackgerritIan Wienand proposed opendev/system-config master: [DNM] Use new docker ipv6tables option to map haproxy ports  https://review.opendev.org/c/opendev/system-config/+/79163304:59
*** ysandeep|away is now known as ysandeep05:10
*** sboyron has joined #opendev05:51
*** darshna has joined #opendev05:51
*** sboyron has quit IRC05:52
*** sboyron has joined #opendev05:55
openstackgerritIan Wienand proposed opendev/system-config master: [DNM] Use new docker ipv6tables option to map haproxy ports  https://review.opendev.org/c/opendev/system-config/+/79163306:04
*** logan- has quit IRC06:12
*** logan- has joined #opendev06:15
*** slaweq has joined #opendev06:35
*** mkowalski has quit IRC06:43
*** mkowalski has joined #opendev06:43
*** brinzhang has joined #opendev06:43
*** amoralej|off is now known as amoralej06:44
*** gibi has quit IRC06:54
*** fressi has joined #opendev06:59
*** iurygregory has quit IRC07:12
*** hashar has joined #opendev07:17
*** iurygregory has joined #opendev07:21
*** andrewbonney has joined #opendev07:21
openstackgerritIan Wienand proposed opendev/system-config master: [DNM] Use new docker ipv6tables option to map haproxy ports  https://review.opendev.org/c/opendev/system-config/+/79163307:30
*** tosky has joined #opendev07:32
*** ysandeep is now known as ysandeep|lunch07:49
*** lucasagomes has joined #opendev07:58
*** jpena|off is now known as jpena07:58
*** whoami-rajat has joined #opendev08:09
*** dtantsur|afk is now known as dtantsur08:09
*** ykarel is now known as ykarel|lunch08:13
*** gibi has joined #opendev08:21
fricklermnaser: any update about the IPv6 situation yet? this is still affecting my daily work by forcing me to explicitly require accessing opendev.org via v4 only08:22
*** brinzhang_ has joined #opendev08:55
*** brinzhang has quit IRC08:58
openstackgerritIan Wienand proposed opendev/system-config master: [DNM] Use new docker ipv6tables option to map haproxy ports  https://review.opendev.org/c/opendev/system-config/+/79163309:03
openstackgerritIan Wienand proposed opendev/system-config master: [DNM] Use new docker ipv6tables option to map haproxy ports  https://review.opendev.org/c/opendev/system-config/+/79163309:05
*** ysandeep|lunch is now known as ysandeep09:11
*** ykarel|lunch is now known as ykarel09:27
*** hrw has joined #opendev09:32
hrwmorning09:32
hrwcan https://storage.gra.cloud.ovh.net be configured to show logs? or zuul configured to not store logs there?09:32
hrw"Network Error (Unable to fetch URL, check your network connectivity, browser plugins, ad-blockers, or try to refresh this page) https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_16b/777062/2/check-arm64/openstack-tox-py39-arm64/16b46d9/"09:33
*** ralonsoh has quit IRC09:54
*** ralonsoh has joined #opendev09:57
*** hashar is now known as hasharAway09:58
fricklerhrw: the logs expire after some time, I think 4 weeks, that job seems to be a bit older, so you'd need to rerun it in order to get new logs10:24
hrwah. thanks10:25
hrwtoo many tabs in monday morning patch check and looked in wrong place for job age10:26
*** gibi has quit IRC10:42
*** gibi has joined #opendev10:43
*** jpena is now known as jpena|off10:58
*** jpena|off is now known as jpena11:00
*** fressi has quit IRC11:14
*** fressi has joined #opendev11:15
openstackgerritIan Wienand proposed opendev/system-config master: [DNM] Use new docker ipv6tables option to map haproxy ports  https://review.opendev.org/c/opendev/system-config/+/79163311:21
ianwclarkb / fungi: ^ in short, haproxy switched to running as a user.  the simple thing to do is to just run as root11:22
ianwhowever, i've been exploring the new options that uses ip6tables to make ipv6 much more workable for us.  they're still experimental in docker, but i think it's worth fleshing it out just so we understand the option11:23
ianwit is incremental steps.  it would mean we could expose 80/443 to containers on ipv4 and ipv6 without having to give them capabilities to bind to low ports, or fiddle other settings11:25
*** jpena is now known as jpena|lunch11:30
*** fressi has quit IRC11:36
*** fressi has joined #opendev11:47
fungiianw: some of the permissions errors were about opening files for write though too, i guess ones we bindmount into the container?11:55
*** yoctozepto has quit IRC11:55
*** ykarel has quit IRC11:55
*** ykarel has joined #opendev12:03
*** hasharAway has quit IRC12:09
openstackgerritHitesh Kumar proposed openstack/diskimage-builder master: Migrate from testr to stestr  https://review.opendev.org/c/openstack/diskimage-builder/+/78924612:10
*** hashar has joined #opendev12:10
openstackgerritMerged openstack/diskimage-builder master: Fix DISTRO_NAME in Fedora elements  https://review.opendev.org/c/openstack/diskimage-builder/+/79162712:15
*** jpena|lunch is now known as jpena12:25
*** amoralej is now known as amoralej|lunch12:25
*** yoctozepto has joined #opendev12:31
openstackgerritchandan kumar proposed openstack/project-config master: Added publish-openstack-python-tarball job  https://review.opendev.org/c/openstack/project-config/+/79174512:38
*** marios is now known as marios|call13:02
openstackgerritchandan kumar proposed openstack/project-config master: Added publish-openstack-python-tarball job  https://review.opendev.org/c/openstack/project-config/+/79174513:11
*** amoralej|lunch is now known as amoralej13:13
*** marios|call is now known as marios13:46
kopecmartinfungi: hi, can you have a look when you have a moment please https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/78948113:54
*** artom has joined #opendev14:00
*** ysandeep is now known as ysandeep|afk14:00
*** chandankumar is now known as raukadah14:16
fungiinfra-root: per discussion from friday, i've forced a logrotate on zuul02 just now, and am in the process of copying /var/log/zuul to /var/log/zuul.tmp (which is on the rootfs, there's plenty of space there for now)14:23
clarkbfungi: thanks! I'm just starting to sit down to start my day, though I've got to visit the optometrist in a bit14:25
fungii'll write up the rest of the plan for it in https://etherpad.opendev.org/p/zuul-swapfs-2021-05-1714:25
fungiclarkb: no sweat. i'm happy to wait on the scheduler restart until you're back and have increased your terminal font size a bunch ;)14:25
*** hashar is now known as hasharAway14:25
corvusfungi: etherpad link is empty for me14:26
fungiyeah, for me to; i haven't actually pulled it up and started writing just yet ;)14:27
corvusoh ho14:27
fungier, me too14:27
corvusfungi: why do we need a scheduler restart?14:27
clarkbcorvus: to fix the swap partition on the zuul02 (we need to repartition xvde to do that and /var/log/zuul shares the same device)14:28
corvusgotcha, that's what swapfs means :)14:29
clarkbcorvus: https://review.opendev.org/c/opendev/system-config/+/791554 is the underlying issue that was fixed. It affects zuul02, zk04-06, two mirrors and review0214:29
clarkbI was going to do a more extensive double check today to make sure I didn't miss any on friday14:29
clarkblooks like ianw has fixed review02 already14:29
fungiyeah, something (i forget what?) changed such that our launch-node script began passing megabytes assuming they were gigabytes14:30
fungiso we have servers with 8mb swap devices14:30
clarkbfungi: ianw wrote a change when spinning up review02 to limit swap space to 8GB because review02 has ~120GB of memory and that is far too large of a swapfile/partition. Unfortunately this change got the scale wrong and caused things to be limited to 8MB not 8GB14:31
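The unit mix-up described above can be sketched in a few lines. This is an illustration only, not the real launch-node or make_swap.sh logic: the function names, and the convention that sizes are tracked in megabytes with swap matching RAM up to a cap, are assumptions made for the example.

```python
# Hypothetical illustration; not the actual launch-node / make_swap.sh code.
# Assumption: sizes are in megabytes and swap should equal RAM up to a cap.

CAP_GB = 8


def swap_mb_buggy(ram_mb: int) -> int:
    # Bug: a gigabyte figure is compared directly against a megabyte value.
    return min(ram_mb, CAP_GB)


def swap_mb_fixed(ram_mb: int) -> int:
    # Convert the cap to megabytes before comparing.
    return min(ram_mb, CAP_GB * 1024)


if __name__ == "__main__":
    ram_mb = 120 * 1024  # roughly the ~120GB host mentioned above
    print(swap_mb_buggy(ram_mb))  # 8 (MB)
    print(swap_mb_fixed(ram_mb))  # 8192 (MB)
```

With the conversion in place, a small host keeps swap equal to its RAM (for example 4096 MB), while a large host is capped at 8192 MB.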
*** fressi has quit IRC14:31
fungiaha, thanks, makes sense14:32
fungioh, right, fixed in 791554 as you said14:33
clarkbthe two mirror hosts use swapfiles and should be easy fixes. I'll try to pick those up after the optometrist14:34
clarkbzk04-06 use partitions but nothing seems to be using /opt there (what the swap partition shares a device with) so those are also likely easy. Means zuul02 is the only complicated fixup14:35
clarkbside note: looks like logrotate was working properly since the fix for zuul02 logrotation landed14:36
fungiyep, i can confirm that seemed to work fine14:42
clarkbfwiw my list of servers that need swap fixes is based on changes to our inventory file since the bug landed. I think that is a reasonably complete list but will try and do an ansible check against all the things today14:43
*** ysandeep|afk is now known as ysandeep14:44
*** vishalmanchanda has quit IRC14:54
clarkbI have fixed up mirror.iad3.inmotion.opendev.org's swap situation. Doing the osuosl mirror next15:03
clarkbosuosl mirror is done now too15:10
clarkbI'll hold off on zookeeper servers until after my errand today to avoid needing to leave it in a half state15:13
clarkbfungi: did you see https://review.opendev.org/c/opendev/system-config/+/791634/1/playbooks/roles/haproxy/files/docker/docker-compose.yaml should address the haproxy problems you worked around over the weekend?15:14
clarkbpasses testing which did hit the problem previously right?15:14
clarkbfrickler: hrw: correct we set the swift metadata on our log uploads to clean them up after ~1 month15:17
clarkbthe volume and size of the logs is quite large which pushes us to do that15:18
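As a rough sketch of how an expiry like this can be expressed: Swift supports an `X-Delete-After` object header whose value is a lifetime in seconds. The helper name below is hypothetical, and the four-week retention is only the estimate given earlier in the channel, not a value confirmed from the log upload configuration.

```python
from datetime import timedelta


def expiry_headers(retention: timedelta) -> dict:
    # X-Delete-After is Swift's relative object-expiry header, in seconds.
    return {"X-Delete-After": str(int(retention.total_seconds()))}


if __name__ == "__main__":
    # Assumption: four weeks, per the "I think 4 weeks" estimate above.
    print(expiry_headers(timedelta(weeks=4)))
```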
*** mlavalle has joined #opendev15:34
fungiclarkb: hadn't reviewed it yet, thanks for the reminder15:35
fungiinfra-root: i think the plan at https://etherpad.opendev.org/p/zuul-swapfs-2021-05-17 is reasonably complete, but update/correct it if you spot obvious problems in there15:35
*** ykarel has quit IRC15:39
fungithe longest parts of the outage will likely be moving the saved logs back into the recreated filesystem, and waiting for the scheduler to finish initializing... any feeling for whether it'll be long enough that we should do a status notice in there?15:44
clarkbfungi: we probably should as even the fastest restarts often get noticed15:46
fungii'll add it as a step15:46
fungiit's in there before stopping the container now15:51
*** amoralej is now known as amoralej|off15:52
fungiclarkb: i need to switch to kneading pizza dough for a bit, but let me know when you're back from your appointment and i'll get started working through the steps outlined there15:53
clarkbfungi: will do!15:55
*** ysandeep is now known as ysandeep|away15:59
*** marios is now known as marios|out16:06
*** artom has quit IRC16:07
*** artom has joined #opendev16:07
*** lucasagomes has quit IRC16:12
*** DSpider has joined #opendev16:20
*** DSpider has quit IRC16:26
*** marios|out has quit IRC16:29
*** dtantsur is now known as dtantsur|afk16:36
openstackgerritMerged opendev/system-config master: Run haproxy as root user  https://review.opendev.org/c/opendev/system-config/+/79163416:56
*** jpena is now known as jpena|off17:01
*** artom has quit IRC17:04
*** ralonsoh has quit IRC17:22
*** hamalq has joined #opendev17:25
clarkbfungi: I'm back at this point17:30
fungican you still see?17:32
clarkbI can, they only numbed my eyeballs and did not dilate them17:32
clarkbI'm pulling up the etherpad now to review the steps17:32
clarkbfungi: the etherpad lgtm. Do we also want to stop ansible from running on zuul02 while we do that work? (it won't restart services but it might modify /var/log/zuul contents?)17:36
fungioh, because ansible may set ownership on that path or something?17:37
clarkbalso we need to do similar with the zk servers. Except instead of /var/log/zuul it is /opt.17:37
clarkbfungi: yup17:37
fungii can stick zuul02 in the emergency disable list now, just a sec17:37
fungiokay, it's in there17:38
fungii guess we should give it a few minutes in case it's running a playbook which hasn't noticed that17:38
clarkb++17:39
clarkbinfra-root once swap fixups are done the next thing I want to do is delete zuul01, please check if you want to preserve anything on that host17:40
clarkbfungi: question about this fs stuff: do we actually want to preserve lost+found across fses? I don't think so?17:44
clarkbmaybe we do? your copy on zuul02 did copy it fwiw17:44
clarkbI guess it is empty17:44
fungiyeah, i tend to just ignore it17:44
fungii can delete it after the rsync17:45
clarkbfungi: oh I would keep it around I just thought its content was fs specific17:45
clarkband so copying it from one fs to another may not make sense17:45
fungii mean i can delete the copy in the temporary location17:45
clarkbgot it17:45
*** andrewbonney has quit IRC17:46
fungiso that we don't overwrite the one for the new fs17:46
clarkb++17:46
clarkbfwiw the process you have on the etherpad looks very similar to what the zk's need so I may just stick a copy and edit it at the bottom of that etherpad too?17:46
clarkbwill help me not miss anything when doing the zk's17:46
fungiokay, added lost+found cleanup and emergency disable list stuff to the plan17:49
fungiso i don't forget17:49
fungiclarkb: yeah, feel free to plagiarize it for the zk servers where it makes sense, continuing to use that pad seems fine too17:49
clarkbfungi: I think I'm ready to do a zk server if you think we should wait a bit on zuul otherwise I'll wait for zuul to finish first17:55
fungihere's an idea... since the zk servers are redundant, we can use one to test particularly the parted command syntax17:56
fungiin case i got something subtly wrong17:56
clarkbfungi: sounds like a good idea. Do you want to drive that or should I? zk04 is a follower so a good candidate17:58
clarkb(we can do the leader last just in case)17:58
fungiclarkb: your zk plan doesn't include stopping/starting zk. that's needed, right?17:58
clarkbfungi: stopping zk shouldn't be needed because /opt isn't used by zk17:58
fungiohh17:58
clarkbthe only thing in /opt is a mostly empty /opt/containerd17:58
fungiokay cool17:59
clarkb(there are two subdirs of that dir and no files)17:59
fungiyeah plan there lgtm then17:59
funginone of them would be outages anyway i guess?17:59
clarkbya there shouldn't be any outages unless we do something very wrong17:59
corvusclarkb: did the zuul02 swap happen yet?  i'm wondering if we can squeeze the encrypt change into that sequence18:01
fungicorvus: it hasn't happened yet, let's add it18:01
clarkbya we're going to do at least one zk server first to make sure the parted commands are happy18:01
fungicorvus: can you plug the commands you want into the outline in https://etherpad.opendev.org/p/zuul-swapfs-2021-05-17 ?18:01
clarkbfungi: I've started a root screen on zk0418:02
clarkband I'm editing the emergency.yaml now to put all the zks in it18:02
fungijoined18:02
corvusfungi: sorry i just mean if we merge 791765 first, then we'll restart into the new decrypt-on-executor code18:02
clarkbfungi: note ^ that you need to do an image pull and a full restart for that18:03
corvusthough if we do that, we'll need to run the zuul stop/start playbooks18:03
clarkbcorvus: and ya I can review those changes after zks are cleaned up18:03
fungicorvus: got it, yeah the current plan was just to down and up the container. if we need executors restarted too than can rework the plan to include that18:04
fungis/than/then/18:04
clarkbfungi: I'm proceeding with zk04 now18:05
fungilgtm so far18:06
clarkbfungi: I edited the parted command for zk to use 4096 MB instead of 8192 to match memory18:06
fungiyup18:06
fungithat looks right18:07
clarkbfungi: you ready for me to run the parted command now?18:07
fungiyes18:07
fungiperfecto18:08
clarkbfungi: I think that went well18:11
clarkbI'll do zk05 and zk06 now, do you want me to do those in a root screen too?18:11
*** hasharAway is now known as hashar18:12
funginot necessary, that was straightforward and as you said nothing's actually using it18:12
clarkbfungi: zk05 is done now if you want to double check it18:16
clarkbdoing zk06 next18:16
fungilooking18:17
fungi05 lgtm18:18
fungifor zuul restarts, what's the playbook we normally run from bridge?18:18
fungii guess the process would be to stop all the containers on the executor as stated in the maintenance plan and do the filesystem work, then instead of just upping those run the pull and restart playbook(s)?18:19
clarkbfungi: there is a system-config/playbooks/zuul_pull.yaml and a zuul_start.yaml and a zuul_stop.yaml18:19
clarkbI think you want to do a zuul_pull.yaml then a zuul_stop.yaml then a zuul_start.yaml18:19
fungisince it's not a single restart playbook, i suppose i can replace the `docker-compose down` on zuul02 with the full stop playbook on the bridge instead18:20
clarkbyes18:21
clarkbzk06 is done now too if you can take a quick look then I'll remove the zks from emergency18:22
clarkbfungi: oh also the queue dumping and restoring is a bit different on zuul02 now18:24
fungiwe need the change merged and image updated before we pull, right?18:24
clarkbfungi: yes changes need to be merged and images promoted first18:24
fungiclarkb: i copied the queue dump/restore from root's command history on zuul02, are there missing steps?18:25
clarkbfungi: re queue dumping you need to run it out of a checkout on zuul02 since there isn't one in /opt anymore. Also you need to edit the commands to reenqueue to use docker exec18:25
clarkbfungi: if corvus set that up in roots homedir then it should work let me check18:25
clarkbyup looks like corvus did that for us, thank you corvus18:25
clarkbfungi: ^ you should be good18:26
clarkbI'm going to go and review changes for zuul now18:26
fungithanks!18:26
clarkbfungi: oh you want to remove zuul02 from the emergency file so that those playbooks function18:38
clarkbfungi: instead of using the emergency file we should use disable-ansible to prevent automated ansible from doing things while we do the human controlled ansible18:38
clarkbI'm going to remove zk04-06 from the emergency file now18:38
clarkbfungi: ^ I left zuul02 in the emergency file but I think you can go ahead and remove it18:39
clarkbI'm going to grab lunch while we wait for zuul to CI those changes18:40
*** artom has joined #opendev18:48
fungioh, yep, removing zuul02 from the emergency disable list18:48
fungithat would get in the way of us running those playbooks, for sure18:48
fungiwhere/when should we use disable-ansible in that sequence? now, i guess?18:50
clarkbfungi: maybe closer to when we are ready to run it18:50
clarkbsince that puts a big roadblock on the zuul jobs and they can pile up18:50
fungiyeah18:50
fungiokay, i've got the continuous deployment disable/resume added to the plan18:54
corvusclarkb, fungi: the zuul decrypt patches are approved, if all goes well they should land in ~1 hour.  if you need to restart zuul02 before then, that's fine.  if it happens after, i can update the plan with the extra commands.19:09
clarkbI think we can wait. All of the other hosts have been sorted as far as swap goes. zuul02 is the last one on my list (though I still need to do a wider check)19:11
fungicorvus: nah, no rush, and i think i got the commands right if you can just double-check19:11
clarkbcurrently plenty of memory free on zuul0219:11
corvusoh you already changed, cool i'll check19:11
corvusfungi: looks correct to me19:12
fungiawesome, thanks!19:12
clarkbonce we're done with that I think zuul01 will be up for deletion too19:13
openstackgerritClark Boylan proposed openstack/project-config master: Stop requiring registered nicks for IRC  https://review.opendev.org/c/openstack/project-config/+/79181819:28
clarkbI said I would push ^ up last meeting (I'm putting tomorrow's agenda together)19:28
fungiprobably worth keeping an extra close eye on the results of that once it's in place, given the recent drams19:29
fungidrama19:29
clarkb++19:31
clarkbwe also don't need to land it just yet since we've got a few other things in the fire19:32
*** hashar has quit IRC19:50
clarkblooking ahead at my week I'm thinking wednesday may be a good day to try the mailman ansible stuff19:54
clarkbcorvus: fungi: looks like the zuul changes hit a problem in the gate (the upload image job timed out)19:56
clarkbdo we want to dequeue then enqueue to speed things up? or just wait for unittest to finish and reapprove?19:56
fungii'm in no rush19:58
corvusi'll rejigger it19:59
corvusi ran zuul promote --tenant zuul --pipeline gate --changes 791514,220:01
clarkbthings look queued the way we want them20:01
clarkbcorvus: and you had to docker exec that?20:02
corvusya20:02
fungicool20:03
clarkbIf you have anything to add to the meeting agenda do it soon. I think all my edits are in now. Just need to mail it out20:07
corvusi'm out to run an errand; biab.20:34
*** sboyron has quit IRC21:01
*** gothicserpent has quit IRC21:05
*** whoami-rajat has quit IRC21:11
*** slaweq has quit IRC21:11
fungi791514 has merged and its zuul-promote-image build succeeded. is that all we were waiting for to be able to pull?21:18
clarkbfungi: I think we want the end of that stack to merge21:19
clarkbthere are 3 changes total21:19
fungior i guess we wanted the other two in as well, yeah21:19
fungilooks like they're merging now21:19
corvusand merged; let's check the promote job21:20
fungiwe want to see 791775 succeed its zuul-promote-image build i think21:20
clarkbya21:20
clarkbthe promote job just succeeded21:22
clarkbcorvus: do you also want to double check the info on docker hub? You did that last time but I think that was because the job failed?21:23
corvusthe job succeeded, so i think we're good21:24
clarkbfungi: you ready?21:25
fungiokay, yep, moving forward21:25
clarkbI guess let me know what I can do to help. I am around21:25
fungii have root screen sessions on both bridge and zuul02 if anyone wants to follow along21:26
fungistarting with disabling ansible21:26
clarkbI've attached to both of them21:26
fungiand pulling images21:27
fungilooks like it worked21:28
clarkbThe about an hour ago image update looks right to me21:29
fungithough `docker image ls` on zuul02 shows the most recent image is from "About an hour ago"21:29
fungibut yeah, i guess that's when the gate job to build the image completed21:29
clarkbyup because the image timestamp is when it was built which happened in the gate job21:29
fungischeduler image id is b6c06442196d21:29
fungii'll dump queues and send the status notice next21:30
clarkb++21:30
corvusyeah i believe that timestamp interpretation is correct21:30
fungi#status notice The Zuul service at zuul.opendev.org will be offline for a few minutes (starting now) in order for us to make some needed filesystem changes; if the outage lasts longer than anticipated we'll issue further notices21:31
openstackstatusfungi: sending notice21:31
-openstackstatus- NOTICE: The Zuul service at zuul.opendev.org will be offline for a few minutes (starting now) in order for us to make some needed filesystem changes; if the outage lasts longer than anticipated we'll issue further notices21:31
fungistopping services now21:31
fungisays it completed21:32
fungiworking on the fs changes to zuul02 next21:32
clarkbI double checked 02 and it indeed has no containers running21:32
openstackstatusfungi: finished sending notice21:34
fungiinterestingly the debug logs never updated after i called logrotate, but the non-debug logs have21:34
fungianyone want to double-check me on that before i umount the original fs?21:35
clarkbI'm not sure I understand what you mean by that21:35
clarkb-rw-r--r-- 1 zuuld zuuld 3225855785 May 17 14:18 debug.log.1 exists21:36
clarkbwhich is from when you rotated earlier today21:36
fungilast modified timestamp on /var/log/zuul/debug.log and /var/log/zuul.tmp/debug.log are 14:2321:36
clarkb-rw-r--r-- 1 zuuld zuuld 1625733762 May 17 21:31 debug.log is what I see21:36
fungisimilar for web-debug.log21:36
fungiokay that's super weird21:37
clarkbI think you are looking at the log.1 files?21:37
clarkbthose are from earlier today when you rotated by hand21:37
fungi-rw-r--r-- 1 zuuld zuuld   28966363 May 17 14:23 debug.log21:38
clarkbbut the current log files all seem to have current timestamps for me21:38
fungiif i ls -l the directory that's what it shows21:38
clarkbthat isn't getting truncated?21:38
funginope line after it is this21:38
fungi-rw-r--r-- 1 zuuld zuuld 3225855785 May 17 14:18 debug.log.121:38
fungiif i ls -l the file directly it shows a different timestamp21:38
clarkbthe file size isn't what I see either21:38
fungiit's like the output is cached/stale or something21:39
clarkbI am not able to reproduce that21:39
funginevermind21:40
corvusi see clarkb's21:40
fungii was scrolling back my tmux window which had an old ls -l in it :/21:40
funginot scrolling back the screen buffer21:40
fungiokay, moving ahead!21:40
clarkbit's me21:40
clarkbI'm out of the dir now21:41
fungiaha, thanks21:41
fungiit complains the partition is not aligned, do we care?21:42
clarkbI think make_swap.sh does log_2 math to avoid that (however 8192 should be log_2 aligned)21:42
clarkbI didn't get similar when doing 4096 on zks21:43
clarkboh it's sector alignments?21:43
fungilooks that way21:43
fungithe "s" suffix21:43
clarkbyou could tell it to do 2048 sectors for the first partition21:44
fungii don't know where/how it's inferring those sector numbers21:44
fungilike that?21:45
clarkbpresumably?21:46
funginope, still not aligned, plus new errors21:46
fungii don't know where it's getting the 1953 sector start21:47
clarkbheh it's still the same error for swap. Is the issue the 1 in 1 8192 ?21:47
fungii can try 021:48
clarkbfungi: I think we want to start at sector 204821:48
clarkbnot at 021:48
fungiahh21:48
fungiso shift the values by +2048 like that?21:49
fungior start at 2049 instead of 1?21:49
clarkbno because that is still bytes21:49
clarkbhttps://askubuntu.com/questions/201164/proper-alignment-of-partitions-on-an-advanced-format-hdd-using-parted says any multiple of 8 is probably fine so maybe we're ok with the original if we shift it by 8MB ?21:50
fungiso like that? i'm honestly not quite sure what you're suggesting, nor why we didn't see the same on other servers21:50
clarkbfungi: well I'm not sure what the command needs to be. But we want to express start at sector 2048 and end 8GB later. Then start the next partition from that point forward21:51
clarkbit seems that it zero indexes so you don't need to do the +121:51
fungithere we go21:51
fungiparted /dev/xvde --script -- mklabel msdos mkpart primary linux-swap 8 8200 mkpart primary ext2 8200 -121:51
fungithat did not error on me21:51
clarkbyup that lgtm (fungi did the shift by 8MB thing)21:51
fungi(8192+8=8200)21:51
clarkbI think that is good and we can proceed21:52
fungilsblk says xvde1  202:65   0   7.6G  0 part21:53
fungiclose enough i guess21:53
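One plausible reading of the 1953 sector start and the 7.6G size is that parted treats unitless numbers as decimal megabytes (10**6 bytes). The sketch below works through that arithmetic under two assumptions that are not confirmed in the log: decimal megabytes as the input unit and 512-byte logical sectors. It does not try to model parted's alignment check itself.

```python
SECTOR_BYTES = 512  # assumption: 512-byte logical sectors


def mb_to_sector(mb: int) -> int:
    # Decimal megabytes (10**6 bytes) to a whole sector number, rounded down.
    return (mb * 10**6) // SECTOR_BYTES


def mb_span_to_gib(start_mb: int, end_mb: int) -> float:
    # Size of a [start, end) span given in decimal MB, expressed in GiB (2**30 bytes).
    return (end_mb - start_mb) * 10**6 / 2**30


if __name__ == "__main__":
    print(mb_to_sector(1))                    # 1953
    print(round(mb_span_to_gib(8, 8200), 1))  # 7.6
```

Under those assumptions a start of "1" lands on sector 1953 rather than 2048, and an 8192 MB span shows up as about 7.6 GiB, which matches the lsblk output above.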
clarkbya21:54
fungilogfiles are moving back to the new partition now21:55
fungilooks like it finished quickly21:56
clarkbwe are ready to run the start playbook now?21:57
fungicontents of the new /var/log/zuul look correct to me21:57
clarkbcorvus: ^ fyi21:57
fungiyeah, switching over to the bridge screen to run that if everyone's ready21:58
clarkbI'm ready21:58
fungiin theory the excitement won't begin until it tries to run some jobs anyway21:58
fungistarting it now21:58
fungiand that's completed21:59
corvuslogs lgtm21:59
corvusscheduler starting22:00
clarkbcorvus: does the saving of keys double check that the file isn't already there or is it unconditional?22:00
clarkb(since they should already be there?)22:00
corvusunconditional22:00
fungionce it's clear of the cat jobs, i'll start reenqueuing22:00
corvusand it's not the slow part; reading them from zk is22:01
corvusso i don't think adding a condition would speed that up22:01
corvus(though, there might be a bit of extra computation happening to write them out)22:01
corvusanyway, we're going to drop the filesystem stuff soon anyway, so i don't think that part is worth digging into22:01
clarkbok22:02
fungiyeah, i figured that was transitional22:02
corvusi think as soon as we write an export utility, we can drop it22:02
fungilooks like we're through the cat jobs now?22:05
clarkbfungi: yup but it isn't done parsing yet I don't think22:05
fungiahh, no not yet22:05
fungii still see a few cats flashing by22:06
corvusmore tenants22:06
clarkbI think it is up now. The tenant list loads as does openstack status22:06
corvusyep22:06
fungiokay, starting to reenqueue22:07
openstackgerritClark Boylan proposed opendev/system-config master: Better swap alignment  https://review.opendev.org/c/opendev/system-config/+/79183222:07
fungisome builds are already running22:08
clarkbthey have console logs too22:08
fungiyup22:08
clarkbpreviously when we had trouble with the yaml it failed before it got that far22:08
fungiso i think the revert of the revert is good now22:09
corvus\o/  that's definitely more than last time :)22:09
fungithat was ~half an hour downtime, not terrible22:09
*** iurygregory has quit IRC22:10
clarkbcorvus: fungi: any objections to me deleting zuul01 and its dns records once we're happy with zuul02s restart?22:10
corvusclarkb:  no objection22:11
fungiclarkb: no objection22:11
fungii checked my homedir on it earlier22:11
clarkbcool I'll do that as soon as fungi gives the all clear on 0222:11
funginot that i have a habit of keeping things of any value on random servers22:11
fungireenqueue finished, doing cleanup now22:12
clarkbI'm out of both root screens now too fwiw (I think you can close those up when you are happy with them fungi)22:12
fungiall finished22:13
fungi#status log Updated swap and log filesystem sizes on zuul02, and restarted all Zuul services on cdc99a322:14
openstackstatusfungi: finished logging22:14
fungisome builds have already succeeded22:16
fungii think it's good22:17
clarkbinfra-root I will delete zuul01.openstack.org with id ef3deb18-e494-46eb-97a2-90fb8198b5d3 that look correct to you?22:17
clarkbfungi: the failures I see appear to be actual failures which is another good sign22:18
*** iurygregory has joined #opendev22:19
fungiclarkb: that uuid looks like what openstack server show gives me for zuul0122:19
clarkbcool I'm going to issue the delete command now, thank you for double checking22:19
clarkbthe deletion is done. Doing dns cleanup then will status log it22:22
clarkbalso I didn't delete the zk01-03 dns records so will do that next22:23
clarkb#status log Deleted zuul01.openstack.org (ef3deb18-e494-46eb-97a2-90fb8198b5d3) and its DNS records as zuul02.opendev.org has replaced it.22:25
openstackstatusclarkb: finished logging22:25
ianwclarkb: urgh, sorry about the missing *1024, what a mess22:26
clarkbianw: I reviewed the change too :) no worries22:27
clarkbianw: I think we are all done now as far as cleanup goes, but I want to see if I can figure out having ansible check all the hosts before I declare victory22:27
clarkbzk01-03.openstack.org A and AAAA records are now cleaned up too22:27
ianwfungi: yeah, on the permissions issue with haproxy, we do paper over a lot running as root22:29
ianwi believe the way to do it most securely is with the user namespace stuff; so in the haproxy model22:30
ianwhttps://review.opendev.org/c/opendev/system-config/+/791633/8/playbooks/roles/haproxy/tasks/main.yaml22:30
ianwit creates a haproxy user as UID 99, so making /var/haproxy/* owned by 100099 on disk running with the "zuul" subuid makes things work22:30
ianwalthough, the exact location the zuul subuid is created is still a bit of a mystery to me.  i'm assuming we do it in ansible, somewhere?22:31
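The UID arithmetic behind the 100099 figure can be sketched as follows. With user namespace remapping, a container UID is offset by the start of the subordinate UID range on the host. The range start of 100000 is inferred here from 100099 minus 99 and is not stated in the log; the range length of 65536 and the function name are also assumptions for illustration.

```python
def host_uid(container_uid: int, subuid_start: int, subuid_count: int = 65536) -> int:
    # Assumption: the remapped range is contiguous, starting at subuid_start.
    if not 0 <= container_uid < subuid_count:
        raise ValueError("container UID falls outside the subordinate range")
    return subuid_start + container_uid


if __name__ == "__main__":
    # haproxy runs as UID 99 inside the container, per the message above.
    print(host_uid(99, 100000))  # 100099
```

Files bind-mounted into the container would need to be owned by the resulting host UID for the in-container user to write to them.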
clarkbinfra-root I'm going to run bridge:~clarkb/playbooks/swap-inspector.yaml and see what that tells me to double check things22:35
clarkbit relies on ansible fact gathering and debug module to emit info when a host has less than 128MB of swap total22:36
clarkbfun that actually still shows zk04 as having too little swap because we cache facts22:37
clarkbI guess I run it once, then double check my list is what we already fixed, then rm the cached facts for those hosts and rerun22:37
clarkbthere are actually a few servers that have no swap22:40
clarkbthat was not caused by the issue we have had with make_swap.sh so I'll ignore those for now (but maybe we want to swapfile them)22:40
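The check described above (flag any host whose total swap is under 128MB) can be approximated outside Ansible as a filter over gathered facts. The swap-inspector playbook itself is not shown in the log, so this is only a sketch: it assumes facts are available as one dictionary per host with Ansible's `ansible_swaptotal_mb` key, and the host entries and values in the sample are made up.

```python
THRESHOLD_MB = 128


def low_swap_hosts(facts_by_host: dict, threshold_mb: int = THRESHOLD_MB) -> list:
    # Return hostnames whose reported swap total is below the threshold.
    flagged = []
    for host, facts in sorted(facts_by_host.items()):
        # A missing fact is treated as no swap at all.
        swap_mb = facts.get("ansible_swaptotal_mb", 0)
        if swap_mb < threshold_mb:
            flagged.append(host)
    return flagged


if __name__ == "__main__":
    # Hypothetical sample data, not real fact values.
    sample = {
        "zk04": {"ansible_swaptotal_mb": 7},       # e.g. a stale cached fact
        "zuul02": {"ansible_swaptotal_mb": 7812},
        "noswap-host": {"ansible_swaptotal_mb": 0},
    }
    print(low_swap_hosts(sample))  # ['noswap-host', 'zk04']
```

As the zk04 case shows, a filter like this is only as fresh as the facts it reads, so cached entries need to be cleared before rerunning it against repaired hosts.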
clarkbfungi: corvus check out the periodic jobs in openstack tenant. They are all listed as error22:42
clarkbexcept for one22:42
clarkbI'm deleting cache entries from /var/cache/ansible/facts for the hosts we just fixed swap on22:43
clarkbswap lgtm for the hosts that had the tiny swap problem based on that playbook22:45
corvusclarkb: ack22:46
clarkbwe should have the hourly opendev deploy jobs starting in about 11 minutes and we can cross check against that I guess22:49
clarkbbut there isn't a whole lot of pointers to why there were errors in the dashboard that I see22:50
clarkbthe hourly opendev deploy jobs seem to be running just fine (which gives more weight to corvus' explanation in #zuul)23:01
*** tosky has quit IRC23:43
*** hamalq has quit IRC23:56
*** hamalq has joined #opendev23:57
