Monday, 2020-10-19

00:10 *** hamalq has joined #opendev
00:15 *** hamalq has quit IRC
00:15 <openstackgerrit> Ian Wienand proposed opendev/system-config master: [wip] reprepro: convert to Ansible  https://review.opendev.org/757660
00:28 *** hamalq has joined #opendev
00:33 *** hamalq_ has joined #opendev
00:33 *** hamalq has quit IRC
00:38 *** hamalq_ has quit IRC
00:54 <openstackgerrit> Ian Wienand proposed opendev/system-config master: [wip] reprepro: convert to Ansible  https://review.opendev.org/757660
01:29 *** hamalq has joined #opendev
01:34 *** hamalq has quit IRC
01:45 *** DSpider has quit IRC
01:45 *** hamalq has joined #opendev
01:48 <openstackgerrit> Ian Wienand proposed opendev/system-config master: [wip] reprepro: convert to Ansible  https://review.opendev.org/757660
01:49 *** hamalq has quit IRC
02:27 <openstackgerrit> Ian Wienand proposed opendev/system-config master: reprepro: convert to Ansible  https://review.opendev.org/757660
03:08 <openstackgerrit> Ian Wienand proposed opendev/system-config master: reprepro: convert to Ansible  https://review.opendev.org/757660
03:08 <openstackgerrit> Ian Wienand proposed opendev/system-config master: mirror-update/reprepro : use common functions  https://review.opendev.org/758695
05:06 <openstackgerrit> Ian Wienand proposed opendev/system-config master: Remove rsyslogd xconsole workaround  https://review.opendev.org/756628
05:06 <openstackgerrit> Ian Wienand proposed opendev/system-config master: docker: install rsyslog to capture container output  https://review.opendev.org/756605
05:08 *** marios has joined #opendev
05:48 <openstackgerrit> Ian Wienand proposed opendev/system-config master: ARM64 : run base test on Focal too  https://review.opendev.org/756629
06:43 *** ralonsoh has joined #opendev
06:45 *** sboyron has joined #opendev
06:49 *** eolivare has joined #opendev
06:51 *** iurygregory has joined #opendev
06:57 *** slaweq has joined #opendev
06:59 *** andrewbonney has joined #opendev
07:00 *** sshnaidm is now known as sshnaidm|afk
07:26 *** hamalq has joined #opendev
07:26 *** rpittau|afk is now known as rpittau
07:29 *** tosky has joined #opendev
07:29 *** sgw has quit IRC
07:30 *** hamalq has quit IRC
07:47 *** sgw has joined #opendev
07:52 *** hashar has joined #opendev
08:00 *** sgw has quit IRC
08:19 *** hamalq has joined #opendev
08:24 *** hamalq has quit IRC
08:25 *** sgw has joined #opendev
08:40 *** hamalq has joined #opendev
08:45 *** hamalq has quit IRC
08:57 *** mkalcok has joined #opendev
09:37 *** sshnaidm|afk is now known as sshnaidm
09:57 *** DSpider has joined #opendev
10:30 *** marios has quit IRC
11:23 *** ysandeep is now known as ysandeep|coffee
11:30 *** marios has joined #opendev
11:50 <sshnaidm> hi, all
11:50 <sshnaidm> I saw that some vms have a private IPv4 address like 10.45.1.98 and a public IPv6 address like 2607:ff68:100:54:f816:3eff:fe91:803 - is that specific to certain cloud providers only? Can I connect from another vm that happens to be in a different cloud provider to the private IPv4 address 10.45.1.98?
11:50 <sshnaidm> is there routing between them?
11:56 <fungi> sshnaidm: multi-node jobs always satisfy all nodes for their nodeset from a single provider, and it's expected that the public interfaces of all nodes in that build will be able to communicate with each other
11:56 <sshnaidm> fungi, yeah, but we have a case with a dependency job
11:56 <sshnaidm> and it seems like they can be in different clouds
11:57 <sshnaidm> fungi, can we ensure dependent jobs will run in the same cloud as their "parent"?
12:03 <fungi> i thought we had solved that already. we have container building workflows which start a registry service in a paused build and then interact with it from others. we wouldn't be able to do that consistently mixing nodes from ipv4-only and ipv6-only providers
12:04 <sshnaidm> fungi, yeah, I think I have exactly the same job pattern - a registry and dependent jobs, and they happen to be on ipv4 and ipv6.
12:05 <sshnaidm> fungi, lemme find logs
12:05 <fungi> maybe clarkb or corvus can say for certain how that works when they wake up. i'm a bit swamped getting ready for summit sessions to start and don't have time to go digging in the docs right this moment
12:06 <fungi> but yeah, an example will help
12:07 <sshnaidm> fungi, hmm.. seems like I was wrong, the dependency also has IPv6. It just can't pull from the ipv6 registry..
12:07 <sshnaidm> fungi, maybe we need a rule in ip6tables as well
12:08 <sshnaidm> fungi, ack, will ping someone if I still have trouble, thanks
12:11 *** ysandeep|coffee is now known as ysandeep
12:19 *** slaweq has quit IRC
12:21 <fungi> sshnaidm: i think there may also be a firewall role we use from zuul-jobs which will open ports for both ipv4 and ipv6, but i may have also imagined it
12:21 *** slaweq has joined #opendev
12:22 <sshnaidm> fungi, yep, trying this now, but if the dependent jobs are in the same cloud, then I can use the private ipv4 as well
12:22 <sshnaidm> and not deal with ipv6 at all
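
A minimal sketch of the ip6tables point above: dual-stack hosts keep separate IPv4 and IPv6 firewall tables, so a port has to be opened in both. The port number here is an illustrative assumption, not taken from the jobs being discussed; the zuul-jobs firewall role fungi mentions (multi-node-firewall, if memory serves) does the equivalent for traffic between nodes in a nodeset.

    # open an assumed registry port on both address families; iptables only
    # affects IPv4 traffic and ip6tables only affects IPv6 traffic
    sudo iptables  -A INPUT -p tcp --dport 5000 -j ACCEPT
    sudo ip6tables -A INPUT -p tcp --dport 5000 -j ACCEPT
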
12:25 *** slaweq has quit IRC
12:30 *** priteau has joined #opendev
12:31 <openstackgerrit> sebastian marcet proposed opendev/puppet-openstackid master: Changed default queue driver from database to redis to prevent deadlocks  https://review.opendev.org/758806
12:31 *** slaweq has joined #opendev
12:54 *** Khodayar has joined #opendev
12:54 *** slaweq has quit IRC
12:55 <Khodayar> Hi, anyone with OpenStack Monitoring experience?
12:57 *** slaweq has joined #opendev
13:08 <TheJulia> Hey, is openstackid-resources.openstack.org run by the infra team?
13:08 <TheJulia> or is that purely OSF?
13:08 <fungi> TheJulia: it is not run by us, no, it's managed by the osf webdev team and contractors
13:08 <fungi> they're working to get it back on track from what i understand
13:09 <TheJulia> fungi: thanks
13:14 <TheJulia> fungi: any specific communication channel I should keep an eye on?
13:15 <fungi> TheJulia: they've been posting updates on the main "lobby" page for the conference
13:16 <TheJulia> fungi: almost nobody can even get that far... :\
13:16 <TheJulia> we just get "Checking credentials", at least for most people
13:16 <gouthamr> +++
13:16 <gouthamr> :(
13:16 <TheJulia> oh hey, it just loaded on one of my computers
13:16 * TheJulia looks at how many requests this is
13:17 <gouthamr> I'm moderating a session that was supposed to start a couple of minutes ago - the zoom room isn't working, sigh
13:17 <TheJulia> gouthamr: is it the link not working, or the supplied credential information?
13:18 <TheJulia> ugh, 138 successful of 185 attempted requests
13:18 <gouthamr> TheJulia: the zoom room error is "the meeting ID is invalid", dunno who to get hold of, sent an email to speakersupport
13:18 <gouthamr> speakersupport@openstack.org*
13:19 <TheJulia> gouthamr: have you updated zoom in, say, the last two months?
13:20 <TheJulia> maybe 3
13:20 <gouthamr> TheJulia: yep - the meeting URL is probably incorrect, i get a "This meeting link is invalid (3,001)" even from the browser
13:20 <fungi> as of a few minutes ago we've got a #openinfra-summit channel on freenode
13:20 <TheJulia> Have you tried entering the meeting id manually in the zoom client and then pasting the password from the url?
13:20 <TheJulia> oooh
13:22 <fungi> also yes, e-mail to speakersupport is the preferred way for the events staff to help you sort out session issues
13:23 <yoctozepto> TheJulia: I don't seem to be able to get the zoom link from anywhere for your session, only getting disqus... would you mind sharing the link if you have one?
13:23 *** sboyron_ has joined #opendev
13:23 <ttx> Let's centralize the discussion of those issues in #openinfra-summit
13:24 <yoctozepto> ++
13:25 <frickler> ttx: fungi: want to send a status notice for that?
13:25 <fungi> i can, sure
13:26 *** sboyron has quit IRC
13:26 <openstackgerrit> Merged opendev/puppet-openstackid master: Changed default queue driver from database to redis to prevent deadlocks  https://review.opendev.org/758806
13:26 <fungi> though we don't have logging set up for that channel since it was just created
13:27 <fungi> something like: #status notice open infrastructure summit platform status updates will be available in the #openinfra-summit channel (though it is presently not logged)
13:28 *** snbuback2 has joined #opendev
13:32 *** sboyron_ has quit IRC
13:33 <fungi> #status notice Open Infrastructure Summit platform issues are being worked on by OSF events and webdev teams, status updates will be available in the conference "lobby" page as well as the #openinfra-summit channel on Freenode (though it is presently not logged)
13:33 <openstackstatus> fungi: sending notice
13:33 -openstackstatus- NOTICE: Open Infrastructure Summit platform issues are being worked on by OSF events and webdev teams, status updates will be available in the conference "lobby" page as well as the #openinfra-summit channel on Freenode (though it is presently not logged)
13:36 <openstackstatus> fungi: finished sending notice
13:37 <fungi> reports in #openinfra-summit that etherpad is slow... looking into it now
13:44 <fungi> the etherpad server is up over 1.2k concurrent connections according to cacti, but seems to be holding up okay. rootfs utilization is kinda risky though, i'll see if there's something i should clean up in the short term
13:50 <clarkb> fungi: older db backups can likely go. We keep 7 locally iirc
13:51 <clarkb> docker logs going to the journal are another likely candidate
13:51 <clarkb> we can prune those with a journalctl command iirc
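
A sketch of the journal pruning clarkb is recalling; the size and age thresholds below are illustrative, not values taken from the etherpad server.

    # see how much disk the journal currently uses
    journalctl --disk-usage
    # trim it down to roughly the most recent 1G of entries
    sudo journalctl --vacuum-size=1G
    # or drop everything older than two weeks instead
    sudo journalctl --vacuum-time=2weeks
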
13:57 *** diablo_rojo has joined #opendev
13:58 <clarkb> the journal is 4GB
13:59 <clarkb> looks like we have space there now, I assume you cleaned some stuff up?
14:00 <fungi> no, it's just that the amount of available space is barely enough for an uncompressed db backup i think, so we nearly fill it daily when mysqldump runs
14:00 <fungi> (looking at the cacti graph)
14:00 <clarkb> ah
14:00 *** sgw has left #opendev
14:01 <clarkb> fungi: hrm, I'm not sure that's it either, unless gzip spools to disk
14:01 <clarkb> we do mysqldump | gzip > file
14:01 <clarkb> which should mean the uncompressed content is only ever in memory
14:01 <fungi> oh, maybe not then
14:02 <fungi> heh, apparently we're recompressing those?
14:03 <fungi> looking in /var/backups/etherpad-mariadb/ there's etherpad-mariadb.sql.gz.2.gz et cetera
14:03 <fungi> i think we have logrotate set to compress them when rotating, which would explain the spikes
14:03 <fungi> but of course it can't effectively compress them, so it just winds up being an extra copy while rotating
14:04 <clarkb> ah yup, it's set to compress
14:04 <clarkb> I bet that is a bug in our ansible conversion
14:07 *** sgw has joined #opendev
14:07 <openstackgerrit> Clark Boylan proposed opendev/system-config master: Don't recompress db backups  https://review.opendev.org/758824
14:07 <clarkb> fungi: ^ that may fix it
14:07 <fungi> aha, thanks!
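
A sketch of the pattern under discussion, with illustrative paths (the real playbook and logrotate stanza live in system-config and may differ in detail): the dump is compressed as it is written, so rotation should not compress it a second time.

    # nightly backup: gzip in the pipeline, so the uncompressed dump only ever
    # exists as a stream and never lands on the rootfs
    mysqldump etherpad | gzip > /var/backups/etherpad-mariadb/etherpad-mariadb.sql.gz
    # logrotate should then be told "nocompress" for these files; with
    # "compress" enabled it re-packs the already-gzipped dump, which is what
    # produces names like etherpad-mariadb.sql.gz.2.gz and the disk spikes
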
14:08 *** Khodayar has quit IRC
14:09 <fungi> indeed, i guess we were doing it on gitea as well
14:12 *** sshnaidm is now known as sshnaidm|afk
14:17 *** elod has quit IRC
14:19 *** elod has joined #opendev
14:47 <fungi> etherpad concurrent tcp connection count is up to 1.4k now
14:47 <fungi> server still looks reasonably happy
14:49 <fungi> server-status scoreboard has lots of open slots
14:51 <clarkb> I think we can do 8l
14:52 <clarkb> *8k
15:01 *** mlavalle has joined #opendev
15:18 *** ysandeep is now known as ysandeep|away
15:38 *** hashar has quit IRC
15:55 *** slaweq has quit IRC
15:59 *** slaweq has joined #opendev
16:00 *** hamalq has joined #opendev
16:01 *** marios has quit IRC
16:05 *** prometheanfire has quit IRC
16:12 *** tosky has quit IRC
16:14 *** prometheanfire has joined #opendev
16:29 *** dtroyer has joined #opendev
16:36 *** eolivare has quit IRC
16:41 *** snbuback has joined #opendev
16:43 *** rpittau is now known as rpittau|afk
16:45 *** snbuback92 has joined #opendev
16:55 <snbuback>
16:55 *** snbuback92 has quit IRC
16:55 <openstackgerrit> Clark Boylan proposed opendev/system-config master: Add two more openstackid servers so that we can load balance them  https://review.opendev.org/758846
16:56 <clarkb> I based ^ on the logrotate fix, let me rebase really quickly
16:56 <openstackgerrit> Clark Boylan proposed opendev/system-config master: Add two more openstackid servers so that we can load balance them  https://review.opendev.org/758846
16:56 <clarkb> mnaser: fungi ^ fyi, please double check the ips there
16:57 <fungi> yup
16:57 <clarkb> I'm finishing up dns changes for them. NOTE I cannot do reverse PTR records so that will need to happen vexxhost side if email is a concern
16:57 <clarkb> A and AAAA records for both should be up now
16:58 *** snbuback2 has quit IRC
16:59 <clarkb> note the groups already match the variable digits in the names so I think we should be good on that side of things
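
A quick way to check the records clarkb describes; the hostname follows the naming used in the change above but is written from memory, and the address is a placeholder, so treat both as assumptions.

    # forward records published by the DNS change
    dig +short A    openstackid02.openstack.org
    dig +short AAAA openstackid02.openstack.org
    # reverse PTR has to be created on the vexxhost side; verify with -x
    dig +short -x 203.0.113.10   # substitute the server's real IP
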
17:04 *** hamalq has quit IRC
17:05 *** mkalcok has quit IRC
17:06 <fungi> i've enqueued 758846,2 directly to the gate pipeline
17:31 *** hamalq has joined #opendev
17:36 *** hamalq has quit IRC
17:42 <openstackgerrit> Merged opendev/system-config master: Add two more openstackid servers so that we can load balance them  https://review.opendev.org/758846
17:49 *** andrewbonney has quit IRC
18:13 *** ralonsoh has quit IRC
18:22 *** priteau has quit IRC
18:36 <clarkb> infra-root: I've disabled ansible on bridge with `disable-ansible`. LE updates failed, which is causing us to not run puppet or anything else
18:37 <clarkb> we're going to manually run puppet on openstackid02 and 03 to pick up those changes
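
For readers unfamiliar with the mechanism: disable-ansible drops a sentinel file on bridge that the periodic ansible runs check before doing anything. The path below is an assumption for illustration; the real helper and sentinel location live in system-config.

    # pause automated ansible runs (assumed sentinel location)
    touch /home/zuul/DISABLE-ANSIBLE
    # ... do the manual remediation ...
    # re-enable automated runs afterwards
    rm /home/zuul/DISABLE-ANSIBLE
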
19:19 <clarkb> re the LE failure, I think it is due to nb01 filling its disk
19:19 <clarkb> ianw: ^ fyi since I think you were looking at that recently
19:24 <clarkb> also I had planned to try and do a quiet infra meeting tomorrow, but given today's fires I doubt I'll have the attention span for it
19:24 <clarkb> anyone object to cancelling the meeting?
19:25 <fungi> i have no burning desire for a meeting
19:32 <corvus> that's fine
19:35 *** tosky has joined #opendev
20:07 *** weshay has quit IRC
20:12 *** slaweq has quit IRC
20:14 *** slaweq has joined #opendev
20:32 *** hamalq has joined #opendev
20:36 *** hamalq has quit IRC
20:48 *** hamalq has joined #opendev
20:52 *** hamalq has quit IRC
20:55 <clarkb> openstackid 02 and 03 are puppeted now
20:55 <clarkb> fungi: should we rm the DISABLE-ANSIBLE file?
20:57 <fungi> yeah, i think we're all clear now
20:58 <clarkb> done
21:01 <ianw> hrm, was it 01's disk filling before?
21:01 <clarkb> ianw: I think so
21:03 <ianw> fungi: for the reprepro work i turned off the cron jobs as suggested, so applying it should be essentially a no-op now: https://review.opendev.org/#/c/757660/
21:03 <clarkb> https://review.opendev.org/758824 is another good one to get in
21:03 <ianw> clarkb: ^ if you have time to look it over, i can work on it
21:04 <clarkb> fwiw I'm still trying to coordinate some openstackid scale up
21:04 <clarkb> but once done I can try and take a quick look
21:04 <ianw> yeah, i saw that.  i have generic roles to set up a load-balancer that didn't go in, i think
21:05 <ianw> https://review.opendev.org/#/c/677903/
21:05 <clarkb> ah cool, that could be useful if we stop puppeting this server :)
21:07 <openstackgerrit> Ian Wienand proposed opendev/system-config master: Make haproxy role more generic  https://review.opendev.org/677903
21:09 <ianw> ahh, right.  well yeah, that at least makes it possible to more generically install haproxy
21:10 <ianw> that was the idea of https://review.opendev.org/#/c/678159/ - a generic-ish haproxy for situations like this.  it was going to go in front of static but we dropped that bit
21:13 *** ianw has quit IRC
21:15 *** ianw has joined #opendev
21:18 <openstackgerrit> Goutham Pacha Ravi proposed openstack/project-config master: Add manila client,ui,tempest plugin core teams  https://review.opendev.org/758868
21:30 <openstackgerrit> Goutham Pacha Ravi proposed openstack/project-config master: Add manila client,ui,tempest plugin core teams  https://review.opendev.org/758868
21:35 *** slaweq has quit IRC
21:35 *** slaweq has joined #opendev
21:41 *** slaweq has quit IRC
22:05 <ianw> there's a bunch of .raw images left on nb01
22:05 <ianw> i think all the interesting logs have scrolled away but i'm looking
22:06 <openstackgerrit> Merged opendev/system-config master: Don't recompress db backups  https://review.opendev.org/758824
22:06 <ianw> [Mon Oct 19 22:05:54 2020] EXT4-fs error (device dm-1): ext4_put_super:935: Couldn't clean up the journal
22:06 <ianw> [Mon Oct 19 22:05:54 2020] EXT4-fs (dm-1): Remounting filesystem read-only
22:08 <clarkb> that will do it
22:09 <ianw> the logs are full of bad stuff.  i'm starting to think maybe it's beyond fsck
22:10 <ianw> i think we can just mkfs /opt and an ansible run will restore everything
22:21 <ianw> i wonder if this is related to container updates.
22:22 <clarkb> we should have a step that does a container image cleanup, but maybe that isn't working or was missed on the nodepool builders?
22:23 <ianw> in theory a graceful stop of the container should wait for the current dib build to finish and shut down, but i doubt it does in practice
22:23 <clarkb> but that step keeps the running container image and the most up-to-date container images, if it's working
22:24 <ianw> the /opt fs fsck's clean, maybe all the dm-1 stuff is from dib's loopback mounts
22:25 <ianw> zuul/nodepool-builder   latest              7749c9547286        3 weeks ago         792MB
22:25 <ianw> i guess that's unlikely then
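
A sketch of the kind of image cleanup step clarkb describes; the actual system-config role may use different filters, and the 24h window is an illustrative assumption.

    # remove unreferenced images older than a day; images used by a running
    # container (and anything newer than the filter) are kept
    docker image prune --all --force --filter "until=24h"
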
22:27 <ianw> there are no leaked images, they are legitimately taking up 924G
22:27 *** DSpider has quit IRC
22:28 <clarkb> hrm, what does nb02 look like? is it not helping out?
22:30 <ianw> yeah, it seems to not be doing much
22:31 <ianw> openstack.exceptions.HttpException: HttpException: 500: Server Error for url: https://image.api.mtl01.cloud.iweb.com/v2/images/7360eac0-a157-4975-acb3-8b87bfbf53ee, The server has either erred or is incapable of performing the requested operation.: 500 Internal Server Error
22:31 <ianw> it just seems to be looping around doing that
22:34 <ianw> grep 'Deleting image build ' * | awk '{print $8}' | sort | uniq -c | sort
22:34 <ianw> ...     20 fedora-32-0000000038
22:34 <ianw>      20 ubuntu-bionic-0000120991
22:34 <ianw>   27468 debian-stretch-0000116039
22:34 <ianw> so not quite true, but it's certainly hung up on that image
22:38 <clarkb> that should be in a separate thread I think
22:39 <ianw> 2020-10-16 17:12:46.590 | DEBUG diskimage_builder.block_device.utils [-] exec_sudo: losetup: /opt/dib_tmp/dib_image.JPvC9WST/image0.raw: failed to set up loop device: No such file or directory exec_sudo /usr/local/lib/python3.7/site-packages/diskimage_builder/block_device/utils.py:135
22:40 <ianw> it is trying to build ... but getting weird errors
22:41 <ianw> there's a bunch of stuff in /dev/mapper, but no mounts
22:41 *** qchris has quit IRC
22:41 <ianw> i don't know, i think i'm going to reboot it, there's not much i can do with it now
22:41 <clarkb> ok
22:43 <ianw> we could also consider btrfs and dedup these, as i think the raw and vhd files are basically the same
22:44 <ianw> nb02 is trying centos-8 now and i'm watching it
22:54 *** qchris has joined #opendev
23:06 <clarkb> fungi: should I exit our root screen on bridge? I think we're at steady state on our side of things and now it's up to smarcet et al
23:07 <fungi> yeah, i already detached
23:07 <clarkb> and now I'm out too
23:16 *** tosky has quit IRC
23:18 <ianw> nb02 seems to have mounted its loop device and is making the image
23:23 <ianw> 60G free doesn't leave much headroom on nb01 i guess, as i think vhdutil makes about 3 copies at various points
23:23 <ianw> if i let nb02 go for a while and build a few images, presumably the images on nb01 will then be old and should free up some space when i start it
23:25 <ianw> i will re-enable ansible though
23:25 <clarkb> ya, usually we end up about 50:50 between them
23:26 <clarkb> I guess if one breaks the other does too after its disk fills
23:26 <ianw> yeah, i think that's at the root of this
23:27 <ianw> and i think all the corruption might be because we may make sparse .raw files?  and then when the disk fills ... bang, nothing can handle that
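
A quick check for the sparseness ianw suspects, with an illustrative path: a large gap between apparent size and allocated blocks means the .raw file is sparse, and writes into its holes can fail abruptly once the filesystem fills.

    # apparent size includes the holes in a sparse file
    du -h --apparent-size /opt/dib_tmp/dib_image.XXXXXX/image0.raw
    # plain du counts only the blocks actually allocated on disk
    du -h /opt/dib_tmp/dib_image.XXXXXX/image0.raw
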
23:41 <ianw> speaking of backups, "Connection closed by remote host. Is borg working on the server?" on ethercalc ... so something's up there
23:42 <ianw> ahh, i think i might know, i think the bup config for .ssh/config overwrites the borg one
23:47 <fungi> that seems entirely likely
23:47 <ianw> hrm, it uses blockinfile ... but the borg config still doesn't seem to be there
23:49 <ianw> oh, i think you have to set unique markers
23:56 <openstackgerrit> Ian Wienand proposed opendev/system-config master: borg-backup: use unique mark in .ssh/config  https://review.opendev.org/758879
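
For context on the fix above: Ansible's blockinfile module wraps the text it manages in BEGIN/END marker comments, and two roles using the default marker will overwrite each other's block in the same file. With distinct markers both blocks survive. The marker text below is a guess at the convention, shown only to illustrate the shape of the result.

    # ~/.ssh/config should end up with both managed blocks present
    grep 'ANSIBLE MANAGED BLOCK' ~/.ssh/config
    # BEGIN ANSIBLE MANAGED BLOCK bup
    # END ANSIBLE MANAGED BLOCK bup
    # BEGIN ANSIBLE MANAGED BLOCK borg-backup
    # END ANSIBLE MANAGED BLOCK borg-backup
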
