Monday, 2020-10-19

00:10 *** hamalq has joined #opendev
00:15 *** hamalq has quit IRC
00:15 <openstackgerrit> Ian Wienand proposed opendev/system-config master: [wip] reprepro: convert to Ansible  https://review.opendev.org/757660
00:28 *** hamalq has joined #opendev
00:33 *** hamalq_ has joined #opendev
00:33 *** hamalq has quit IRC
00:38 *** hamalq_ has quit IRC
00:54 <openstackgerrit> Ian Wienand proposed opendev/system-config master: [wip] reprepro: convert to Ansible  https://review.opendev.org/757660
01:29 *** hamalq has joined #opendev
01:34 *** hamalq has quit IRC
01:45 *** DSpider has quit IRC
01:45 *** hamalq has joined #opendev
01:48 <openstackgerrit> Ian Wienand proposed opendev/system-config master: [wip] reprepro: convert to Ansible  https://review.opendev.org/757660
01:49 *** hamalq has quit IRC
02:27 <openstackgerrit> Ian Wienand proposed opendev/system-config master: reprepro: convert to Ansible  https://review.opendev.org/757660
03:08 <openstackgerrit> Ian Wienand proposed opendev/system-config master: reprepro: convert to Ansible  https://review.opendev.org/757660
03:08 <openstackgerrit> Ian Wienand proposed opendev/system-config master: mirror-update/reprepro : use common functions  https://review.opendev.org/758695
05:06 <openstackgerrit> Ian Wienand proposed opendev/system-config master: Remove rsyslogd xconsole workaround  https://review.opendev.org/756628
05:06 <openstackgerrit> Ian Wienand proposed opendev/system-config master: docker: install rsyslog to capture container output  https://review.opendev.org/756605
05:08 *** marios has joined #opendev
05:48 <openstackgerrit> Ian Wienand proposed opendev/system-config master: ARM64 : run base test on Focal too  https://review.opendev.org/756629
06:43 *** ralonsoh has joined #opendev
06:45 *** sboyron has joined #opendev
06:49 *** eolivare has joined #opendev
06:51 *** iurygregory has joined #opendev
06:57 *** slaweq has joined #opendev
06:59 *** andrewbonney has joined #opendev
07:00 *** sshnaidm is now known as sshnaidm|afk
07:26 *** hamalq has joined #opendev
07:26 *** rpittau|afk is now known as rpittau
07:29 *** tosky has joined #opendev
07:29 *** sgw has quit IRC
07:30 *** hamalq has quit IRC
07:47 *** sgw has joined #opendev
07:52 *** hashar has joined #opendev
08:00 *** sgw has quit IRC
08:19 *** hamalq has joined #opendev
08:24 *** hamalq has quit IRC
08:25 *** sgw has joined #opendev
08:40 *** hamalq has joined #opendev
08:45 *** hamalq has quit IRC
08:57 *** mkalcok has joined #opendev
09:37 *** sshnaidm|afk is now known as sshnaidm
09:57 *** DSpider has joined #opendev
10:30 *** marios has quit IRC
11:23 *** ysandeep is now known as ysandeep|coffee
11:30 *** marios has joined #opendev
11:50 <sshnaidm> hi, all
11:50 <sshnaidm> I saw that some vms have a private IPv4 address like 10.45.1.98 and a public IPv6 address like 2607:ff68:100:54:f816:3eff:fe91:803 - is that specific to certain cloud providers only? Can I connect from another vm that happens to be in a different cloud provider to the private IPv4 address 10.45.1.98?
11:50 <sshnaidm> is there routing between them?
11:56 <fungi> sshnaidm: multi-node jobs always satisfy all nodes for their nodeset from a single provider, and it's expected that the public interfaces of all nodes in that build will be able to communicate with each other
11:56 <sshnaidm> fungi, yeah, but we have a case with a dependency job
11:56 <sshnaidm> and it seems like they can be in different clouds
11:57 <sshnaidm> fungi, can we ensure dependent jobs will run in the same cloud as their "parent"?
12:03 <fungi> i thought we had solved that already. we have container building workflows which start a registry service in a paused build and then interact with it from others. we wouldn't be able to do that consistently mixing nodes from ipv4-only and ipv6-only providers
12:04 <sshnaidm> fungi, yeah, I think I have exactly the same job pattern - a registry and dependent jobs, and they happen to be on ipv4 and ipv6.
12:05 <sshnaidm> fungi, lemme find logs
12:05 <fungi> maybe clarkb or corvus can say for certain how that works when they wake up. i'm a bit swamped getting ready for summit sessions to start and don't have time to go digging in the docs right this moment
12:06 <fungi> but yeah, an example will help
12:07 <sshnaidm> fungi, hmm.. seems like I was wrong, the dependency also has IPv6. It just can't pull from the ipv6 registry..
12:07 <sshnaidm> fungi, maybe we need a rule in ip6tables as well
12:08 <sshnaidm> fungi, ack, will ping someone if I still have trouble, thanks
12:11 *** ysandeep|coffee is now known as ysandeep
12:19 *** slaweq has quit IRC
12:21 <fungi> sshnaidm: i think there may also be a firewall role we use from zuul-jobs which will open ports for both ipv4 and ipv6, but i may have also imagined it
12:21 *** slaweq has joined #opendev
12:22 <sshnaidm> fungi, yep, trying this now, but if the dependent jobs are in the same cloud, then I can use the private ipv4 as well
12:22 <sshnaidm> and not deal with ipv6 at all
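
A minimal sketch of the ip6tables point above: dual-stack hosts keep separate IPv4 and IPv6 firewall tables, so a port has to be opened in both. The port number here is an illustrative assumption, not taken from the jobs being discussed; the zuul-jobs firewall role fungi mentions (multi-node-firewall, if memory serves) does the equivalent for traffic between nodes in a nodeset.

    # open an assumed registry port on both address families; iptables only
    # affects IPv4 traffic and ip6tables only affects IPv6 traffic
    sudo iptables  -A INPUT -p tcp --dport 5000 -j ACCEPT
    sudo ip6tables -A INPUT -p tcp --dport 5000 -j ACCEPT
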
12:25 *** slaweq has quit IRC
12:30 *** priteau has joined #opendev
12:31 <openstackgerrit> sebastian marcet proposed opendev/puppet-openstackid master: Changed default queue driver from database to redis to prevent deadlocks  https://review.opendev.org/758806
12:31 *** slaweq has joined #opendev
12:54 *** Khodayar has joined #opendev
12:54 *** slaweq has quit IRC
12:55 <Khodayar> Hi, anyone with OpenStack Monitoring experience?
12:57 *** slaweq has joined #opendev
13:08 <TheJulia> Hey, is openstackid-resources.openstack.org run by the infra team?
13:08 <TheJulia> or is that purely OSF?
13:08 <fungi> TheJulia: it is not run by us, no, it's managed by the osf webdev team and contractors
13:08 <fungi> they're working to get it back on track from what i understand
13:09 <TheJulia> fungi: thanks
13:14 <TheJulia> fungi: any specific communication channel I should keep an eye on?
13:15 <fungi> TheJulia: they've been posting updates on the main "lobby" page for the conference
13:16 <TheJulia> fungi: almost nobody can even get that far... :\
13:16 <TheJulia> we just get "Checking credentials", at least for most people
13:16 <gouthamr> +++
13:16 <gouthamr> :(
13:16 <TheJulia> oh hey, it just loaded on one of my computers
13:16 * TheJulia looks at how many requests this is
13:17 <gouthamr> I'm moderating a session that was supposed to start a couple of minutes ago - the zoom room isn't working, sigh
13:17 <TheJulia> gouthamr: is it the link not working, or the supplied credential information?
13:18 <TheJulia> ugh, 138 successful of 185 attempted requests
13:18 <gouthamr> TheJulia: the zoom room error is "the meeting ID is invalid", dunno who to get hold of, sent an email to speakersupport
13:18 <gouthamr> speakersupport@openstack.org*
13:19 <TheJulia> gouthamr: have you updated zoom in, say, the last two months?
13:20 <TheJulia> maybe 3
13:20 <gouthamr> TheJulia: yep - the meeting URL is probably incorrect, i get a "This meeting link is invalid (3,001)" even from the browser
13:20 <fungi> as of a few minutes ago we've got a #openinfra-summit channel on freenode
13:20 <TheJulia> Have you tried entering the meeting id manually in the zoom client and then pasting the password from the url?
13:20 <TheJulia> oooh
13:22 <fungi> also yes, e-mail to speakersupport is the preferred way for the events staff to help you sort out session issues
13:23 <yoctozepto> TheJulia: I don't seem to be able to get the zoom link from anywhere for your session, only getting disqus... would you mind sharing the link if you have one?
13:23 *** sboyron_ has joined #opendev
13:23 <ttx> Let's centralize the discussion of those issues in #openinfra-summit
13:24 <yoctozepto> ++
13:25 <frickler> ttx: fungi: want to send a status notice for that?
13:25 <fungi> i can, sure
13:26 *** sboyron has quit IRC
13:26 <openstackgerrit> Merged opendev/puppet-openstackid master: Changed default queue driver from database to redis to prevent deadlocks  https://review.opendev.org/758806
13:26 <fungi> though we don't have logging set up for that channel since it was just created
13:27 <fungi> something like: #status notice open infrastructure summit platform status updates will be available in the #openinfra-summit channel (though it is presently not logged)
13:28 *** snbuback2 has joined #opendev
13:32 *** sboyron_ has quit IRC
13:33 <fungi> #status notice Open Infrastructure Summit platform issues are being worked on by OSF events and webdev teams, status updates will be available in the conference "lobby" page as well as the #openinfra-summit channel on Freenode (though it is presently not logged)
13:33 <openstackstatus> fungi: sending notice
13:33 -openstackstatus- NOTICE: Open Infrastructure Summit platform issues are being worked on by OSF events and webdev teams, status updates will be available in the conference "lobby" page as well as the #openinfra-summit channel on Freenode (though it is presently not logged)
13:36 <openstackstatus> fungi: finished sending notice
13:37 <fungi> reports in #openinfra-summit that etherpad is slow... looking into it now
13:44 <fungi> the etherpad server is up over 1.2k concurrent connections according to cacti, but seems to be holding up okay. rootfs utilization is kinda risky though, i'll see if there's something i should clean up in the short term
13:50 <clarkb> fungi: older db backups can likely go. We keep 7 locally iirc
13:51 <clarkb> docker logs going to the journal are another likely candidate
13:51 <clarkb> we can prune those with a journalctl command iirc
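
A sketch of the journal pruning clarkb is recalling; the size and age thresholds below are illustrative, not values taken from the etherpad server.

    # see how much disk the journal currently uses
    journalctl --disk-usage
    # trim it down to roughly the most recent 1G of entries
    sudo journalctl --vacuum-size=1G
    # or drop everything older than two weeks instead
    sudo journalctl --vacuum-time=2weeks
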
13:57 *** diablo_rojo has joined #opendev
13:58 <clarkb> the journal is 4GB
13:59 <clarkb> looks like we have space there now, I assume you cleaned some stuff up?
14:00 <fungi> no, it's just that the amount of available space is barely enough for an uncompressed db backup i think, so we nearly fill it daily when mysqldump runs
14:00 <fungi> (looking at the cacti graph)
14:00 <clarkb> ah
14:00 *** sgw has left #opendev
14:01 <clarkb> fungi: hrm, I'm not sure that's it either, unless gzip spools to disk
14:01 <clarkb> we do mysqldump | gzip > file
14:01 <clarkb> which should mean the uncompressed content is only ever in memory
14:01 <fungi> oh, maybe not then
14:02 <fungi> heh, apparently we're recompressing those?
14:03 <fungi> looking in /var/backups/etherpad-mariadb/ there's etherpad-mariadb.sql.gz.2.gz et cetera
14:03 <fungi> i think we have logrotate set to compress them when rotating, which would explain the spikes
14:03 <fungi> but of course it can't effectively compress them, so it just winds up being an extra copy while rotating
14:04 <clarkb> ah yup, it's set to compress
14:04 <clarkb> I bet that is a bug in our ansible conversion
14:07 *** sgw has joined #opendev
14:07 <openstackgerrit> Clark Boylan proposed opendev/system-config master: Don't recompress db backups  https://review.opendev.org/758824
14:07 <clarkb> fungi: ^ that may fix it
14:07 <fungi> aha, thanks!
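
A sketch of the pattern under discussion, with illustrative paths (the real playbook and logrotate stanza live in system-config and may differ in detail): the dump is compressed as it is written, so rotation should not compress it a second time.

    # nightly backup: gzip in the pipeline, so the uncompressed dump only ever
    # exists as a stream and never lands on the rootfs
    mysqldump etherpad | gzip > /var/backups/etherpad-mariadb/etherpad-mariadb.sql.gz
    # logrotate should then be told "nocompress" for these files; with
    # "compress" enabled it re-packs the already-gzipped dump, which is what
    # produces names like etherpad-mariadb.sql.gz.2.gz and the disk spikes
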
14:08 *** Khodayar has quit IRC
14:09 <fungi> indeed, i guess we were doing it on gitea as well
14:12 *** sshnaidm is now known as sshnaidm|afk
14:17 *** elod has quit IRC
14:19 *** elod has joined #opendev
14:47 <fungi> etherpad concurrent tcp connection count is up to 1.4k now
14:47 <fungi> server still looks reasonably happy
14:49 <fungi> server-status scoreboard has lots of open slots
14:51 <clarkb> I think we can do 8l
14:52 <clarkb> *8k
15:01 *** mlavalle has joined #opendev
15:18 *** ysandeep is now known as ysandeep|away
15:38 *** hashar has quit IRC
15:55 *** slaweq has quit IRC
15:59 *** slaweq has joined #opendev
16:00 *** hamalq has joined #opendev
16:01 *** marios has quit IRC
16:05 *** prometheanfire has quit IRC
16:12 *** tosky has quit IRC
16:14 *** prometheanfire has joined #opendev
16:29 *** dtroyer has joined #opendev
16:36 *** eolivare has quit IRC
16:41 *** snbuback has joined #opendev
16:43 *** rpittau is now known as rpittau|afk
16:45 *** snbuback92 has joined #opendev
16:55 <snbuback>
16:55 *** snbuback92 has quit IRC
16:55 <openstackgerrit> Clark Boylan proposed opendev/system-config master: Add two more openstackid servers so that we can load balance them  https://review.opendev.org/758846
16:56 <clarkb> I based ^ on the logrotate fix, let me rebase really quickly
16:56 <openstackgerrit> Clark Boylan proposed opendev/system-config master: Add two more openstackid servers so that we can load balance them  https://review.opendev.org/758846
16:56 <clarkb> mnaser: fungi ^ fyi, please double check the ips there
16:57 <fungi> yup
16:57 <clarkb> I'm finishing up dns changes for them. NOTE I cannot do reverse PTR records so that will need to happen vexxhost side if email is a concern
16:57 <clarkb> A and AAAA records for both should be up now
16:58 *** snbuback2 has quit IRC
16:59 <clarkb> note the groups already match the variable digits in the names so I think we should be good on that side of things
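
A quick way to check the records clarkb describes; the hostname follows the naming used in the change above but is written from memory, and the address is a placeholder, so treat both as assumptions.

    # forward records published by the DNS change
    dig +short A    openstackid02.openstack.org
    dig +short AAAA openstackid02.openstack.org
    # reverse PTR has to be created on the vexxhost side; verify with -x
    dig +short -x 203.0.113.10   # substitute the server's real IP
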
17:04 *** hamalq has quit IRC
17:05 *** mkalcok has quit IRC
17:06 <fungi> i've enqueued 758846,2 directly to the gate pipeline
17:31 *** hamalq has joined #opendev
17:36 *** hamalq has quit IRC
17:42 <openstackgerrit> Merged opendev/system-config master: Add two more openstackid servers so that we can load balance them  https://review.opendev.org/758846
17:49 *** andrewbonney has quit IRC
18:13 *** ralonsoh has quit IRC
18:22 *** priteau has quit IRC
18:36 <clarkb> infra-root: I've disabled ansible on bridge with `disable-ansible`. LE updates failed, which is causing us to not run puppet or anything else
18:37 <clarkb> we're going to manually run puppet on openstackid02 and 03 to pick up those changes
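
For readers unfamiliar with the mechanism: disable-ansible drops a sentinel file on bridge that the periodic ansible runs check before doing anything. The path below is an assumption for illustration; the real helper and sentinel location live in system-config.

    # pause automated ansible runs (assumed sentinel location)
    touch /home/zuul/DISABLE-ANSIBLE
    # ... do the manual remediation ...
    # re-enable automated runs afterwards
    rm /home/zuul/DISABLE-ANSIBLE
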
19:19 <clarkb> re the LE failure, I think it is due to nb01 filling its disk
19:19 <clarkb> ianw: ^ fyi since I think you were looking at that recently
19:24 <clarkb> also I had planned to try and do a quiet infra meeting tomorrow, but given today's fires I doubt I'll have the attention span for it
19:24 <clarkb> anyone object to cancelling the meeting?
19:25 <fungi> i have no burning desire for a meeting
19:32 <corvus> that's fine
19:35 *** tosky has joined #opendev
20:07 *** weshay has quit IRC
20:12 *** slaweq has quit IRC
20:14 *** slaweq has joined #opendev
20:32 *** hamalq has joined #opendev
20:36 *** hamalq has quit IRC
20:48 *** hamalq has joined #opendev
20:52 *** hamalq has quit IRC
20:55 <clarkb> openstackid 02 and 03 are puppeted now
20:55 <clarkb> fungi: should we rm the DISABLE-ANSIBLE file?
20:57 <fungi> yeah, i think we're all clear now
20:58 <clarkb> done
21:01 <ianw> hrm, was it 01's disk filling before?
21:01 <clarkb> ianw: I think so
21:03 <ianw> fungi: for the reprepro work i turned off the cron jobs as suggested, so applying it should be essentially a no-op now: https://review.opendev.org/#/c/757660/
21:03 <clarkb> https://review.opendev.org/758824 is another good one to get in
21:03 <ianw> clarkb: ^ if you have time to look it over, i can work on it
21:04 <clarkb> fwiw I'm still trying to coordinate some openstackid scale up
21:04 <clarkb> but once done I can try and take a quick look
21:04 <ianw> yeah, i saw that.  i have generic roles to set up a load-balancer that didn't go in, i think
21:05 <ianw> https://review.opendev.org/#/c/677903/
21:05 <clarkb> ah cool, that could be useful if we stop puppeting this server :)
21:07 <openstackgerrit> Ian Wienand proposed opendev/system-config master: Make haproxy role more generic  https://review.opendev.org/677903
21:09 <ianw> ahh, right.  well yeah, that at least makes it possible to more generically install haproxy
21:10 <ianw> that was the idea of https://review.opendev.org/#/c/678159/ - a generic-ish haproxy for situations like this.  it was going to go in front of static but we dropped that bit
21:13 *** ianw has quit IRC
21:15 *** ianw has joined #opendev
21:18 <openstackgerrit> Goutham Pacha Ravi proposed openstack/project-config master: Add manila client,ui,tempest plugin core teams  https://review.opendev.org/758868
21:30 <openstackgerrit> Goutham Pacha Ravi proposed openstack/project-config master: Add manila client,ui,tempest plugin core teams  https://review.opendev.org/758868
21:35 *** slaweq has quit IRC
21:35 *** slaweq has joined #opendev
21:41 *** slaweq has quit IRC
22:05 <ianw> there's a bunch of .raw images left on nb01
22:05 <ianw> i think all the interesting logs have scrolled away but i'm looking
22:06 <openstackgerrit> Merged opendev/system-config master: Don't recompress db backups  https://review.opendev.org/758824
22:06 <ianw> [Mon Oct 19 22:05:54 2020] EXT4-fs error (device dm-1): ext4_put_super:935: Couldn't clean up the journal
22:06 <ianw> [Mon Oct 19 22:05:54 2020] EXT4-fs (dm-1): Remounting filesystem read-only
22:08 <clarkb> that will do it
22:09 <ianw> the logs are full of bad stuff.  i'm starting to think maybe it's beyond fsck
22:10 <ianw> i think we can just mkfs /opt and an ansible run will restore everything
22:21 <ianw> i wonder if this is related to container updates.
22:22 <clarkb> we should have a step that does a container image cleanup, but maybe that isn't working or was missed on the nodepool builders?
22:23 <ianw> in theory a graceful stop of the container should wait for the current dib build to finish and shut down, but i doubt it does in practice
22:23 <clarkb> but that step keeps the running container image and the most up-to-date container images, if it's working
22:24 <ianw> the /opt fs fsck's clean, maybe all the dm-1 stuff is from dib's loopback mounts
22:25 <ianw> zuul/nodepool-builder   latest              7749c9547286        3 weeks ago         792MB
22:25 <ianw> i guess that's unlikely then
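
A sketch of the kind of image cleanup step clarkb describes; the actual system-config role may use different filters, and the 24h window is an illustrative assumption.

    # remove unreferenced images older than a day; images used by a running
    # container (and anything newer than the filter) are kept
    docker image prune --all --force --filter "until=24h"
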
22:27 <ianw> there are no leaked images, they are legitimately taking up 924G
22:27 *** DSpider has quit IRC
22:28 <clarkb> hrm, what does nb02 look like? is it not helping out?
22:30 <ianw> yeah, it seems to not be doing much
22:31 <ianw> openstack.exceptions.HttpException: HttpException: 500: Server Error for url: https://image.api.mtl01.cloud.iweb.com/v2/images/7360eac0-a157-4975-acb3-8b87bfbf53ee, The server has either erred or is incapable of performing the requested operation.: 500 Internal Server Error
22:31 <ianw> it just seems to be looping around doing that
22:34 <ianw> grep 'Deleting image build ' * | awk '{print $8}' | sort | uniq -c | sort
22:34 <ianw> ...     20 fedora-32-0000000038
22:34 <ianw>      20 ubuntu-bionic-0000120991
22:34 <ianw>   27468 debian-stretch-0000116039
22:34 <ianw> so not quite true, but it's certainly hung up on that image
22:38 <clarkb> that should be in a separate thread I think
22:39 <ianw> 2020-10-16 17:12:46.590 | DEBUG diskimage_builder.block_device.utils [-] exec_sudo: losetup: /opt/dib_tmp/dib_image.JPvC9WST/image0.raw: failed to set up loop device: No such file or directory exec_sudo /usr/local/lib/python3.7/site-packages/diskimage_builder/block_device/utils.py:135
22:40 <ianw> it is trying to build ... but getting weird errors
22:41 <ianw> there's a bunch of stuff in /dev/mapper, but no mounts
22:41 *** qchris has quit IRC
22:41 <ianw> i don't know, i think i'm going to reboot it, there's not much i can do with it now
22:41 <clarkb> ok
22:43 <ianw> we could also consider btrfs and dedup these, as i think the raw and vhd files are basically the same
22:44 <ianw> nb02 is trying centos-8 now and i'm watching it
22:54 *** qchris has joined #opendev
23:06 <clarkb> fungi: should I exit our root screen on bridge? I think we're at steady state on our side of things and now it's up to smarcet et al
23:07 <fungi> yeah, i already detached
23:07 <clarkb> and now I'm out too
23:16 *** tosky has quit IRC
23:18 <ianw> nb02 seems to have mounted its loop device and is making the image
23:23 <ianw> 60G free doesn't leave much headroom on nb01 i guess, as i think vhdutil makes about 3 copies at various points
23:23 <ianw> if i let nb02 go for a while and build a few images, presumably the images on nb01 will then be old and should free up some space when i start it
23:25 <ianw> i will re-enable ansible though
23:25 <clarkb> ya, usually we end up about 50:50 between them
23:26 <clarkb> I guess if one breaks the other does too after its disk fills
23:26 <ianw> yeah, i think that's at the root of this
23:27 <ianw> and i think all the corruption might be because we may make sparse .raw files?  and then when the disk fills ... bang, nothing can handle that
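
A quick check for the sparseness ianw suspects, with an illustrative path: a large gap between apparent size and allocated blocks means the .raw file is sparse, and writes into its holes can fail abruptly once the filesystem fills.

    # apparent size includes the holes in a sparse file
    du -h --apparent-size /opt/dib_tmp/dib_image.XXXXXX/image0.raw
    # plain du counts only the blocks actually allocated on disk
    du -h /opt/dib_tmp/dib_image.XXXXXX/image0.raw
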
23:41 <ianw> speaking of backups, "Connection closed by remote host. Is borg working on the server?" on ethercalc ... so something's up there
23:42 <ianw> ahh, i think i might know, i think the bup config for .ssh/config overwrites the borg one
23:47 <fungi> that seems entirely likely
23:47 <ianw> hrm, it uses blockinfile ... but the borg config still doesn't seem to be there
23:49 <ianw> oh, i think you have to set unique markers
23:56 <openstackgerrit> Ian Wienand proposed opendev/system-config master: borg-backup: use unique mark in .ssh/config  https://review.opendev.org/758879
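
For context on the fix above: Ansible's blockinfile module wraps the text it manages in BEGIN/END marker comments, and two roles using the default marker will overwrite each other's block in the same file. With distinct markers both blocks survive. The marker text below is a guess at the convention, shown only to illustrate the shape of the result.

    # ~/.ssh/config should end up with both managed blocks present
    grep 'ANSIBLE MANAGED BLOCK' ~/.ssh/config
    # BEGIN ANSIBLE MANAGED BLOCK bup
    # END ANSIBLE MANAGED BLOCK bup
    # BEGIN ANSIBLE MANAGED BLOCK borg-backup
    # END ANSIBLE MANAGED BLOCK borg-backup
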
