Thursday, 2020-10-15

*** DSpider has quit IRC		00:15
*** ysandeep|away is now known as ysandeep		00:26
*** hamalq has quit IRC		00:43
clarkb	fungi: -v/opt/project-config/gerrit/projects.yaml:/home/gerrit2/projects.yaml in /usr/local/bin/manage-projects is the problem with the command I ran. We're ignoring the content in the gerrit2 homedir	01:13
clarkb	which is confusing because that is where we look according to defaults/config but we mount /opt/project-config to that location to solve that	01:14
clarkb	anyway we can either change the mount or change project-config contents and try again, but tomorrow	01:14
*** ykarel|away has joined #opendev		01:29
*** ysandeep is now known as ysandeep|afk		01:29
*** ykarel|away is now known as ykarel		01:31
*** fressi has joined #opendev		02:16
openstackgerrit	sebastian marcet proposed opendev/system-config master: OpenstackId v3.0.16 https://review.opendev.org/758322	03:24
openstackgerrit	Ian Wienand proposed zuul/zuul-jobs master: fetch-sphinx-tarball: explain what is happening https://review.opendev.org/758323	03:49
openstackgerrit	Ian Wienand proposed zuul/zuul-jobs master: fetch-sphinx-tarball: don't run merge-output-to-logs https://review.opendev.org/758324	03:49
openstackgerrit	Ian Wienand proposed zuul/zuul-jobs master: Revert "Revert "Refactor fetch-sphinx-tarball to be executor safe"" https://review.opendev.org/758325	03:49
openstackgerrit	Ian Wienand proposed zuul/zuul-jobs master: Revert "Revert "Refactor fetch-sphinx-tarball to be executor safe"" https://review.opendev.org/758325	03:56
*** ykarel_ has joined #opendev		04:12
*** ykarel has quit IRC		04:14
openstackgerrit	Merged opendev/system-config master: tarballs: remove incorrect redirects https://review.opendev.org/758259	04:32
*** ykarel has joined #opendev		04:49
*** auristor has quit IRC		04:50
*** ykarel_ has quit IRC		04:51
*** auristor has joined #opendev		04:53
*** ykarel has quit IRC		05:13
openstackgerrit	Ian Wienand proposed opendev/base-jobs master: Run merge-output-to-logs on the executor https://review.opendev.org/758341	05:29
*** sboyron has joined #opendev		05:31
openstackgerrit	Merged opendev/system-config master: Add four more gitea ddos UA strings https://review.opendev.org/758219	05:34
openstackgerrit	Ian Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code https://review.opendev.org/758325	05:39
*** fressi has quit IRC		05:53
*** jaicaa has quit IRC		06:00
openstackgerrit	Ian Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code https://review.opendev.org/758325	06:01
*** jaicaa has joined #opendev		06:02
*** mkalcok has joined #opendev		06:27
*** ykarel has joined #opendev		06:29
openstackgerrit	Ian Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code https://review.opendev.org/758325	06:34
*** slaweq has joined #opendev		06:35
*** eolivare has joined #opendev		06:36
*** roman_g has joined #opendev		06:38
*** ysandeep|afk is now known as ysandeep		06:38
openstackgerrit	Ian Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code https://review.opendev.org/758325	06:44
*** ralonsoh has joined #opendev		06:53
openstackgerrit	Ian Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code https://review.opendev.org/758325	06:53
*** andrewbonney has joined #opendev		07:01
openstackgerrit	Ian Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code https://review.opendev.org/758325	07:05
*** hashar has joined #opendev		07:05
*** ykarel is now known as ykarel|afk		07:10
openstackgerrit	Logan V proposed openstack/project-config master: Revert "Disable limestone provider" https://review.opendev.org/758354	07:10
openstackgerrit	Logan V proposed openstack/project-config master: Revert "Disable limestone provider" https://review.opendev.org/758354	07:11
*** ykarel|afk has quit IRC		07:11
*** rpittau|afk is now known as rpittau		07:22
*** fressi has joined #opendev		07:31
*** cloudnull has quit IRC		07:33
*** cloudnull has joined #opendev		07:34
frickler	mnaser: RIPE atlas confirms that opendev.org is not reachable from AS3320 (German incumbent ISP) via IPv6 https://atlas.ripe.net/measurements/27671479/#!probes	07:42
*** tosky has joined #opendev		07:42
frickler	mnaser: I'll try to dig further, but this may be related to you only announcing the /48 for that net and not the covering /32	07:43
*** dmellado has quit IRC		07:43
*** dmellado has joined #opendev		07:47
openstackgerrit	Merged openstack/project-config master: Revert "Disable limestone provider" https://review.opendev.org/758354	07:48
*** ysandeep is now known as ysandeep|lunch		08:03
AJaeger	infra-root, infra-prod-service-nodepool failed for this ^ https://zuul.opendev.org/t/openstack/build/a18272805d054fddb30d020bde35031c . Could you check, please?	08:11
frickler	AJaeger: looking	08:14
*** lourot has quit IRC		08:15
*** lourot has joined #opendev		08:16
frickler	infra-root: that log has nb03 unreachable, when I try to ssh to it, I get "ssh_exchange_identification: Connection closed by remote host". might be related to the rackspace ticket mentioned earlier.	08:17
AJaeger	thanks, frickler	08:19
AJaeger	frickler: the rackspace ticket mentioned different nodes - but that happens sometimes with rackspace AFAIR	08:19
frickler	actually nb03 isn't on rackspace, but on packethost according to the IP. but I can't find that instance there. will need to dig deeper later	08:23
frickler	also nb03.opendev.org seems to be missing in cacti, that would've possibly allowed to get a hint as to when the issue started	08:46
frickler	finally found the node, it's on linaro-us. console log shows "INFO: task jbd2/dm-3-8:17697 blocked for more than 120 seconds." so some kind of disk issue, will try to reboot via the api now	09:13
frickler	o.k., that seems to have worked, I can log in again	09:15
frickler	#status log rebooted nb03.opendev.org via openstack API after it seemed to have gotten stuck due to disk IO issues	09:16
openstackstatus	frickler: finished logging	09:16
*** tkajinam is now known as tkajinam|away		09:17
*** tkajinam|away is now known as tkajinam		09:17
*** ysandeep|lunch is now known as ysandeep		09:24
*** hashar has quit IRC		09:46
openstackgerrit	Tobias Henkel proposed zuul/zuul-jobs master: WIP: Create upload-logs-with-failover role https://review.opendev.org/758380	09:49
ianw	frickler: thanks, nb03 is the arm builder	09:49
*** DSpider has joined #opendev		09:56
*** ysandeep is now known as ysandeep|afk		11:07
*** sboyron has quit IRC		11:15
*** sboyron has joined #opendev		11:16
*** elod is now known as elod_afk		11:26
*** hashar has joined #opendev		11:45
*** ysandeep|afk is now known as ysandeep		12:35
*** elod_afk is now known as elod		12:39
*** ykarel has joined #opendev		13:01
fungi	clarkb: ianw: the additional ua strings in 758219 either don't seem to have taken effect or their addition has no clear impact on the established connections graph: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=66611&rra_id=all	13:18
fungi	in theory they were deployed by 06:00	13:18
mnaser	frickler: we announce a /48 in each region because we have one big /32 .. dont think that should be a wild problem	13:37
fungi	it would only be a problem if the isp in question has bgp filters excluding longer prefixes	13:39
fungi	(or if whoever they get their bgp feeds from does)	13:39
fungi	granted, it's not unheard of, and so not uncommon for folks to also announce their aggregates alongside the longer prefixes to help in such situations	13:40
fungi	worst case the packets make it most of the way to the wrong facility and get redirected via a less optimal path	13:41
fungi	but really they only have to get as far as the first hop which has a more complete bgp table and then they'll start getting matched against the longer prefix	13:42
fungi	it's worth noting that there are recommendations floating around to filter longer prefixes from ranges which are normally used to assign shorter ones	13:46
fungi	https://www.space.net/~gert/RIPE/ipv6-filters.html	13:46
fungi	on the expectation that you'd request a new allocation from your rir for each multi-homed facility	13:48
openstackgerrit	Aurelien Lourot proposed openstack/project-config master: Mirror charm-neutron-api-plugin-ironic to GitHub https://review.opendev.org/758429	13:49
lourot	^ sorry I forgot that one in yesterday's review	13:49
*** sshnaidm has quit IRC		13:54
*** fressi has quit IRC		13:55
fungi	mnaser: so in that example, i think following the "ipv6 prefix-list ipv6-ebgp-strict permit 2600::/12 ge 19 le 32" recommendation would cause your 2604:e100:3::/48 announcements to be dropped	13:56
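[Editor's sketch] The prefix-list reasoning above can be checked with a few lines of Python. This is only an illustration of the "ge 19 le 32" semantics as fungi describes them, not router code: the helper name prefix_list_permits is hypothetical, and 2604:e100::/32 is assumed to be the covering aggregate for the /48 quoted in the log.

```python
import ipaddress


def prefix_list_permits(route, covering, ge, le):
    """Return True if `route` lies inside `covering` and its prefix
    length is between `ge` and `le` inclusive (hypothetical helper)."""
    route_net = ipaddress.ip_network(route)
    covering_net = ipaddress.ip_network(covering)
    return route_net.subnet_of(covering_net) and ge <= route_net.prefixlen <= le


# The /48 announcement from the log: inside 2600::/12 but longer than /32.
print(prefix_list_permits("2604:e100:3::/48", "2600::/12", 19, 32))  # False
# Assumed covering aggregate: a /32 falls within the permitted range.
print(prefix_list_permits("2604:e100::/32", "2600::/12", 19, 32))  # True
```

Under these assumptions, a peer applying the strict filter would keep the aggregate and drop the per-region /48, which matches the symptom frickler reported.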
mnaser	honestly i don't know why they would be dropping /48 announcements -- its not like ipv6 tables are gigantic :)	13:59
*** sshnaidm has joined #opendev		14:00
fungi	i concur, my opinion is that isps should simply upgrade their routers if they need enough memory to carry a full bgp table, but then again i've been out of the business for nearly a decade	14:01
fungi	it's probably seen as insurance against someone with a /28 announcing all 2^20 of their possible /48 prefixes	14:03
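[Editor's sketch] The 2^20 figure follows from the difference in prefix lengths: each extra bit of prefix doubles the number of possible sub-prefixes. A minimal check, with a hypothetical helper name:

```python
def count_subprefixes(aggregate_len, announced_len):
    """Number of distinct prefixes of length `announced_len`
    that fit inside one prefix of length `aggregate_len`."""
    return 2 ** (announced_len - aggregate_len)


print(count_subprefixes(28, 48))  # 1048576, i.e. 2**20
```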
clarkb	fungi: check /var/gitea/logs/access.log to see if they are making it through	14:11
clarkb	note the connections have to establish long enough for us to 403 them and that ~2k connections may be steady state for that	14:12
*** sgw has left #opendev		14:16
*** tkajinam has quit IRC		14:20
*** hashar has quit IRC		14:25
*** ysandeep is now known as ysandeep|away		14:43
ttx	docs.openstack.org is down?	14:47
ttx	(from openstack-discuss)	14:47
icey	looks like a lot of the static bits are having trouble, I've had spotty access to tarballs.opendev.org for a while as well	14:48
ttx	probably all the Victoria downloads (kidding)	14:48
icey	(well, that's what _I'm_ doing :-P )	14:51
AJaeger	any infra-root around to check docs.o.o, please? ^	14:52
clarkb	[Thu Oct 15 07:11:15 2020] afs: Waiting for busy volume 536870992 () in cell openstack.org is the most recent afs complaint according to dmesg	14:53
*** sgw has joined #opendev		14:53
bbezak	tarballs.opendev.org is not accessible either	14:53
clarkb	bbezak: they are just different vhosts on the same server and serving out of afs too	14:54
clarkb	trying to sort out if the problem is afs or apache	14:54
clarkb	I think afs. afs01.dfw.openstack.org seems sad. I can't ssh to it. Going to check the console next	14:55
clarkb	(I really wish I understood why afs doesn't fail over like it is supposed to)	14:55
clarkb	(thats like half the reason we run afs)	14:55
clarkb	oh wait trying ssh again succeeds so maybe not afs	14:56
*** iurygregory has quit IRC		14:57
*** iurygregory has joined #opendev		14:57
clarkb	afs is navigable on static01	14:57
clarkb	time to look at apache I guess	14:58
fungi	yes, apache seems like it might have gotten sad somehow	14:59
clarkb	error logs don't seem to have anything useful though	14:59
fungi	scorecard full?	15:00
fungi	just guessing	15:00
clarkb	http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=68065&rra_id=all implies not (not sure we have the localhost debugging set up on this vhost to check directly)	15:00
corvus	we have no text browsers on static01? :(	15:01
clarkb	should we restart it ?	15:01
fungi	fungi@static01:~$ wget -O- https://security.openstack.org/	15:01
fungi	Connecting to security.openstack.org (security.openstack.org)|2001:4800:7818:101:be76:4eff:fe04:7c28|:443...	15:01
fungi	and it just sits there	15:02
corvus	strace on 6155 has it reading from a pipe	15:02
corvus	(maybe connection to master proc)	15:02
fungi	cacti graphs don't show much out of the ordinary other than increase in established tcp connections and a drop in tcp opens (both of those seem like symptoms not causes)	15:03
*** auristor has quit IRC		15:04
fungi	looks like whatever it is may have started around 12:45	15:04
corvus	the master process is in a select/wait loop	15:04
clarkb	apache was reloaded about 11 hours ago to pick up the fix for tarballs redirects?	15:04
clarkb	but the main process has been running for almost 2 weeks	15:05
fungi	looks like we may have mod_status plumbed but unable to connect to loopback port 80 to get at it	15:05
corvus	i'm out of debugging ideas and think we should restart	15:05
fungi	so yeah, at this point probably a service restart is our only next step	15:06
fungi	i concur	15:06
corvus	(i see no dmesg entries around 12:45)	15:06
clarkb	wfm	15:06
corvus	(nothing interesting in syslog around that time either)	15:07
clarkb	fwiw tail *_error.log shows some attempts at maybe bruteforcing /etc/passwd GETs about 20 minutes prior to 12:45 but that is all that stands out to me there	15:07
clarkb	apache happily logged those as invalid URIs	15:08
fungi	yeah, i saw nothing of note in any logs	15:08
fungi	it's like it just magically deadlocked	15:08
clarkb	who is doing the restart? should I do it?	15:08
corvus	not i	15:08
clarkb	I'll do it now	15:08
fungi	clarkb: if you can, i'm in a meeting right now	15:09
fungi	thanks!	15:09
clarkb	and tarballs.opendev.org loads for me now	15:09
clarkb	ttx: icey bbezak maybe you can confirm it is happier for you now too	15:10
clarkb	where it == thing you noticed was sad	15:10
icey	clarkb: curl jus tworked for me	15:10
icey	just worked	15:10
icey	unfortunately (or fortunately) I'm handing off the thing I was doing that caused me to notice to somebody else as it's EoD :)	15:10
*** auristor has joined #opendev		15:11
clarkb	fungi: do you know where we've configured mod status? grepping for status in system-config/playbooks/roles/static doesn't show it	15:12
fungi	clarkb: /etc/apache2/mods-enabled/status.conf	15:12
fungi	clarkb: wget -O- http://127.0.0.1/server-status	15:12
fungi	mmm, that seems to be getting overridden	15:13
fungi	ahh, no it's working	15:13
clarkb	ah ok I thought we had to manually configure that, didn't realize that ubuntu was providing it out of the box	15:14
clarkb	TIL and now I know to do a ssh proxy and check that out for next time	15:14
fungi	unsurprisingly the scoreboard shows a lot of available workers/slots for now	15:14
fungi	it was timing out for me like everything else before the restart	15:14
clarkb	oh ha	15:15
bbezak	yeah, looks ok now, thx	15:18
fungi	wish we knew why apache just seized up	15:18
fungi	that's gonna bug me now	15:18
fungi	looks like whatever started around 12:45 may have been a precursor, because graphs show the server basically stopped serving anything between 14:45 and 15:05 when it was restarted	15:22
openstackgerrit	Merged opendev/system-config master: OpenstackId v3.0.16 https://review.opendev.org/758322	15:22
fungi	so we probably want to keep a close eye on the scoreboard and access logs for a little while	15:22
*** mlavalle has joined #opendev		15:22
fungi	the e-mail to openstack-discuss was dated 14:41 utc	15:23
clarkb	I half wonder if we should cross check our gitea UAs against static apache logs too	15:23
clarkb	fungi: I'm not seeing those 4 UAs I added in the gitea access logs anymore. I think that level of connections is what we see when dealing out HTTP403s to the bad bot(s)	15:24
fungi	oh, entirely possible	15:24
*** mkalcok has quit IRC		15:24
clarkb	basically we're doing a bunch of cheap connections to send 403s back to the bot rather than a bunch of expensive connections trying to serve all those git commits and files to the bot	15:25
fungi	#status log restarted apache on static.opendev.org (serving most static content and documentation sites) at 15:09 utc to recover from an unexplained hung process causing site content not to be served	15:26
openstackstatus	fungi: finished logging	15:26
ttx	LGTM! thanks	15:26
fungi	outbound traffic levels for static.o.o have picked way up after the restart	15:27
fungi	around 80mbps at the moment, though nothing out of the ordinary for our weekly peaks	15:28
fungi	scoreboard still looks okay for now	15:29
*** marios has joined #opendev		15:34
*** sgw has left #opendev		15:39
*** sgw has joined #opendev		15:42
*** fressi has joined #opendev		15:43
fungi	the scoreboard is starting to fill up now, though not looking terrible yet	15:44
*** fressi has quit IRC		15:44
*** hashar has joined #opendev		15:44
fungi	and now it's not serving a response to me	15:44
fungi	oh, there it went, just took a moment	15:45
fungi	fwiw, there's a bunch of ip addresses retrieving many different files from tarballs.o.o/osf/openstackid/	15:45
fungi	almost all the slots are in a "sending reply" state now	15:46
*** rpittau is now known as rpittau|afk		15:52
fungi	i just got a connection timeout trying to get the server status	15:58
fungi	requests i'm seeing tailing the tarballs.opendev.org access log do look like the ones we'd expect from weibo :/	16:00
fungi	oh, i finally managed to get server status to reply to me agani	16:01
fungi	again	16:01
fungi	yeah, it's looking like weibo bots were each fetching copies of all the osf/openstackid tarballs and some are now moving on to starting to get the osf/groups tarballs	16:03
*** eolivare has quit IRC		16:10
*** ysandeep|away is now known as ysandeep		16:27
*** marios is now known as marios|out		16:32
openstackgerrit	Noam Angel proposed openstack/diskimage-builder master: kill chroot processes before "dib-block-device umount" https://review.opendev.org/758465	16:35
openstackgerrit	Noam Angel proposed openstack/diskimage-builder master: kill chroot processes before "dib-block-device umount" https://review.opendev.org/758465	16:36
*** marios|out has quit IRC		16:39
*** lpetrut has joined #opendev		16:44
*** hashar is now known as hasharTurns42		16:51
fungi	activity on the server seems to have calmed back down now, and the scoreboard is mostly open/waiting	16:52
*** lpetrut has quit IRC		16:59
*** hamalq has joined #opendev		17:02
openstackgerrit	Clark Boylan proposed opendev/system-config master: Update mirror apache configs to 2.4 acl primitives https://review.opendev.org/758469	17:07
clarkb	fungi: ^ something like that	17:08
clarkb	I've updated sjc1's mirror by hand to match ^ if people want to check it	17:13
fungi	i can confirm, my personal sites are all only using require directives, no order or satisfy anywhere	17:14
*** ykarel is now known as ykarel|away		17:19
*** andrewbonney has quit IRC		17:20
*** ysandeep is now known as ysandeep|away		17:24
*** roman_g has quit IRC		17:37
clarkb	related does anyone know why we have https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/gerrit/templates/gerrit.vhost.j2#L72-L81 ?	17:40
*** Vadmacs has joined #opendev		18:20
*** ralonsoh has quit IRC		18:22
dmsimard	pardon me for losing track but do git.o.o and opendev.org end up at the same place ? seeing clones on git.o.o hang but not opendev.org	18:38
clarkb	dmsimard: git.o.o should be a redirect to opendev.org	18:39
clarkb	opendev.org does not support git:// so if you're doing git://git.openstack.org that will break	18:39
clarkb	need to be http or https	18:39
dmsimard	over https yeah	18:39
clarkb	ah git.o.o is also hosted on static which has been sad today	18:40
clarkb	I think its been happier recently though, is this an ongoing problem? or was it only happening between ~12:45UTC and 15:10UTC?	18:40
dmsimard	I'm reproducing it right now	18:40
dmsimard	i.e, git clone https://git.openstack.org/openstack/ospurge	18:41
dmsimard	I can update our URLs to use opendev.org but I thought I'd point it out	18:41
clarkb	ya this seems to be the same overwhelmed server state	18:41
clarkb	I wonder if its actually the git vhost that is eating all the connections on that server	18:41
clarkb	you really should be using opendev.org	18:41
dmsimard	sure thing	18:42
dmsimard	I don't know who/what else might be using git.o.o out there though	18:42
clarkb	hrm it says we have idle workers in apache though	18:42
*** ykarel|away has quit IRC		18:43
clarkb	low system load, plenty of memory, and we have free connections whats up	18:43
dmsimard	are you seeing the issue as well ?	18:43
clarkb	ya	18:44
clarkb	I mean it works but its laggy	18:44
clarkb	like apache doesn't have a free connection to give me when I try to git clone from it	18:44
clarkb	(in reality its a free connection to serve the redirect)	18:44
clarkb	but its still on the order of a few seconds	18:45
clarkb	noticeable but not end of the world	18:45
clarkb	I wonder if it is iowait related to afs reads somehow?	18:46
clarkb	would that bog down the rest of apache when all it needs to do is serve a redirect?	18:46
dmsimard	good question	18:46
clarkb	(side note is this the first release where tarballs have all been in afs?)	18:46
dmsimard	sometimes it completes in a few seconds, others not http://paste.openstack.org/show/799096/	18:48
clarkb	fungi: when I can get server-status it shows a happy server then other times it times out	18:48
fungi	yeah, same here	18:48
clarkb	we could try tuning the mpm configs to allow many more connections (the server seems super idle)	18:49
clarkb	and maybe we're just getting syn flooded?	18:49
fungi	cacti graphs suggest the problem has spiked up again as of ~18:15	18:50
clarkb	top reports low wai which I think rules out the iowait theory?	18:50
fungi	yeah	18:51
clarkb	should we try adopting a mpm tuning config more similar to etherpad's?	18:51
fungi	also the server status extended report indicates most of the requests are still tarballs for osf/openstackid and osf/groups	18:51
clarkb	fungi: is that smarcet ddos'ing us?	18:52
fungi	if he's got hundreds of random ip addresses in china who all want to download old versions of openstackid, then maybe	18:52
clarkb	I can copy over the mpm config from etherpad and restart apache on static if we want to give that a shot	18:52
clarkb	(it should dramatically increase the number of allowed connections)	18:53
fungi	well, we don't seem to be at risk of filling them up at the moment	18:53
clarkb	it seems like maybe we are though? otherwise why does apache not respond when I try to connect sometimes?	18:54
dmsimard	conntrack ?	18:54
clarkb	(I guess that is something we could rule in or out by increasing the limits)	18:54
clarkb	I don't think we run any special conntrack rules here	18:54
clarkb	just basic iptables port and source ip based rules	18:54
fungi	the open tcp connections reported by cacti is hovering around the same level as the total connections for our apache workers	18:55
clarkb	ya that is why I wondered if it could be a syn flood.	18:55
clarkb	that may not show up on our graphs as readily?	18:55
fungi	and the scoreboard is reporting something like 100 idle connections as of a minute ago	18:56
*** ozzzo has joined #opendev		18:56
fungi	but yeah, can't hurt to try if we have the available cpu/ram for it	18:56
clarkb	14.8 requests/sec 158.4 kB/request	18:56
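[Editor's sketch] The two mod_status figures above multiply out to an average outbound rate, which helps compare them with the "80mbps" peak mentioned earlier. This is a rough back-of-the-envelope calculation; the helper name is hypothetical and bytes_per_kb=1000 is an assumption (using 1024 raises the result by about 2.4%).

```python
def average_mbit_per_sec(requests_per_sec, kb_per_request, bytes_per_kb=1000):
    """Convert a request rate and mean response size into megabits per second."""
    bytes_per_sec = requests_per_sec * kb_per_request * bytes_per_kb
    return bytes_per_sec * 8 / 1_000_000


# Figures quoted from the server-status page in the log.
rate = average_mbit_per_sec(14.8, 158.4)
print(f"{rate:.2f} Mbit/s")
```

Either way the lifetime average comes out around 19 Mbit/s, well below the peak fungi observed after the restart, so raw bandwidth does not look like the bottleneck here.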
fungi	looking at the uas for all these tarball requests the workers seem to be spending time on, they look like the same set we blocked in gitea's apache	18:57
clarkb	oh hahahahah	18:57
clarkb	I mean I have to laugh otherwise I'll be sad	18:58
clarkb	ok instead of changing our mpm tuning should we add the same set of UA blocking rules?	18:58
fungi	that might be more useful	18:58
clarkb	++ do you want to put that change together? I'm missing lunch right now but can write it if you're in a bad spot for it	18:59
fungi	connections has jumped way up too, we have one worker with 25 active connections and another 25 closing	18:59
fungi	yeah, should be able to give it a shot, though i guess we need to include the same deny rules into every vhost? i guess i could start with the tarballs	19:00
clarkb	ya	19:02
clarkb	looks like it uses mod headers and mod rewrite	19:02
fungi	we have both enabled already, looks like	19:03
fungi	i've manually inserted those rules into the tarballs.opendev.org vhost to see if they help	19:05
*** roman_g has joined #opendev		19:06
fungi	churn in that vhost's access log has died way down	19:06
fungi	connection counts are dropping in the scoreboard too	19:06
clarkb	ya I can refresh it quickly	19:08
clarkb	definitely looking healthier	19:08
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: Add nim roles and job https://review.opendev.org/747865	19:08
fungi	assuming this has a continued impact, i'll see if we can centralize the list of patterns so it can be included into multiple servers without carrying copies around	19:08
clarkb	that would be excellent	19:09
fungi	probably just install it as a dedicated config file defining a macro, and then call that macro in each vhost	19:09
fungi	http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=68065&rra_id=all has just updated and shows a dramatic drop	19:10
fungi	i can browse the tarballs site just fine still too	19:11
fungi	but the scoreboard is now showing almost no tarballs vhost request in progress, whereas before those were the vast majority	19:12
fungi	er, not the scoreboard but the extended status data	19:12
fungi	though the scoreboard is settling to around 1/8 of the earlier connection count	19:13
fungi	15 to 1/8	19:13
fungi	er, 1/5 to 1/8	19:13
fungi	not quite an order of magnitude drop, but substantial nonetheless	19:13
clarkb	cool I'm really going to take a break now for lunch and bike ride and all that, back later today	19:16
fungi	i'll keep an eye on this for a little longer and then look at a possible patch	19:18
*** hasharTurns42 is now known as hashar		19:22
*** hashar has quit IRC		19:26
*** roman_g has quit IRC		19:39
*** roman_g has joined #opendev		19:40
fungi	okay, things are looking a good sight better with this in place, so i'll write up a review	19:47
openstackgerrit	Jeremy Stanley proposed opendev/system-config master: Block restricted user agents for the tarballs site https://review.opendev.org/758495	20:01
fungi	so that's step 1	20:02
fungi	i'm going to wip it while i make sure doing it that way will work for the tarballs site	20:02
fungi	then i'll look at splitting it into a dedicated role and deduplicating the gitea use	20:03
fungi	seems to be working so far	20:07
fungi	ianw: when you're around, your input would be most appreciated, as you have the most experience with this so far	20:08
openstackgerrit	Jeremy Stanley proposed opendev/system-config master: Block restricted user agents for the tarballs site https://review.opendev.org/758495	20:27
fungi	okay, there it is done as a reusable ansible role added to the service-static playbook	20:28
openstackgerrit	Jeremy Stanley proposed opendev/system-config master: Use the apache-ua-filter role on Gitea servers https://review.opendev.org/758496	20:31
fungi	and that ^ will remove the copy in the gitea role	20:31
*** Vadmacs has quit IRC		20:36
ianw	fungi: hey, catching up ...	20:38
dmsimard	wow, https://github.com/mythsman/weiboCrawler/blob/master/opener.py is all kinds of messed up o_O	20:45
ianw	fungi: idea lgtm; but i do think the handler will cause issues mentioned in comment	20:45
ianw	dmsimard: as will be the bill someone eventually gets ...	20:54
fungi	ianw: oh, good catch	20:54
* clarkb keeps getting distracted is actually doing bike ride next		20:54
fungi	dmsimard: maybe you know... what happens if you have multiple roles installing the same handler (or handlers with the same name)?	20:56
dmsimard	I've never tried it, good question -- my guess would be that there's a matter of precedence and it would use the one from the current role before reaching out to other roles	20:57
fungi	i suppose it can't hurt to try in that case	20:58
dmsimard	you should be able to tell which role the handler came from from the task output -- it's prefixed by the role name	20:58
dmsimard	s/from//	20:59
openstackgerrit	Jeremy Stanley proposed opendev/system-config master: Block restricted user agents for the tarballs site https://review.opendev.org/758495	21:00
openstackgerrit	Jeremy Stanley proposed opendev/system-config master: Use the apache-ua-filter role on Gitea servers https://review.opendev.org/758496	21:00
fungi	guess we'll find out! ^	21:00
dmsimard	science \o/	21:00
fungi	science isn't science without splosions	21:01
ianw	"Use unique handler names. If you trigger more than one handler with the same name, the first one(s) get overwritten. Only the last one defined will run."	21:07
ianw	that isn't really clear what first and last means	21:07
fungi	but in this case they have the same actions, so probably fine?	21:08
openstackgerrit	Ian Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code https://review.opendev.org/758325	21:08
fungi	otherwise maybe we want to extract that handler into its own apache-config-handler role or something along those lines, and just include it where it's needed?	21:09
ianw	i think kiss and leave it where it is, unless it doesn't work :)	21:12
fungi	in the past little while, apache on static.o.o has scaled back from 6 to 3 process slots	21:12
fungi	connections graph in apache is looking stable too	21:13
fungi	ooh, only 2 now	21:14
fungi	most of the traffic is docs and legacy git redirects now	21:15
fungi	things seem stable so i'm going to disappear for a couple hours but will check back in later	21:18
openstackgerrit	Ian Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code https://review.opendev.org/758325	21:21
*** slaweq has quit IRC		21:31
*** roman_g has quit IRC		21:32
*** sboyron has quit IRC		21:47
openstackgerrit	Ian Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code https://review.opendev.org/758325	22:00
openstackgerrit	Ian Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code https://review.opendev.org/758325	22:16
*** slaweq has joined #opendev		22:22
*** hamalq has quit IRC		22:32
*** hamalq has joined #opendev		22:33
openstackgerrit	Ian Wienand proposed openstack/diskimage-builder master: WIP: boot test of containerfile image https://review.opendev.org/722148	22:38
*** slaweq has quit IRC		22:38
*** qchris has quit IRC		22:41
openstackgerrit	Ian Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code https://review.opendev.org/758325	22:53
clarkb	I'm back from my bike ride, is there anything I should review promptly?	22:54
*** qchris has joined #opendev		22:55
*** tkajinam has joined #opendev		22:59
openstackgerrit	Clark Boylan proposed opendev/system-config master: Update mirror apache configs to 2.4 acl primitives https://review.opendev.org/758469	22:59
ianw	clarkb: we can do the referrals thing on static if you like?	23:09
clarkb	ianw: referrals thing? also is gerrit being slow for anyone else ? I wonder if it is being hit by the same thing gitea and static were	23:10
ianw	sorry the blocking UA macro that fungi has put together	23:10
clarkb	looks like CI -1'd it	23:10
clarkb	but ya I'm for landing that in a general way particularly if gerrit is now being hit by it	23:11
clarkb	(not sure about that yet just noticing gerrit is very slow to respond to my requests, could be a local issue)	23:11
ianw	yes i have to agree on review.opendev.org	23:12
clarkb	there seems to be a dotbot and a stormcrawler hitting gerrit	23:14
ianw	yeah http://www.opensiteexplorer.org/dotbot	23:14
clarkb	not seeing the other UAs we've come to expect	23:14
ianw	it really doesn't seem to be under a lot of apache load	23:16
clarkb	ya static was in the same situation though	23:16
clarkb	it seemed fine except it wasnt	23:16
clarkb	http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=30&rra_id=all we can see the connection spike there	23:18
ianw	sigh	23:18
ianw	should we iptables out the ip-range manually for a bit, so we can at least merge the UA blocking?	23:19
clarkb	ianw: or we can manually apply the UA blocking assuming that will help	23:19
clarkb	I'm still not seeing those UAs on review though	23:20
clarkb	I also see "The Knowledge AI" hitting port 80 a lot	23:20
ianw	yeah, and when i connect it logs me in apache pretty quickly, but i don't get a response	23:20
ianw	ergo my request is getting to apache	23:21
clarkb	ya I think like gitea the issue here is gerrit is busy as a result	23:21
clarkb	system load is 12	23:21
clarkb	is there a specific IP you've identified as a bad actor? I don't think I've seen one that stands out as particularly bad yet	23:22
ianw	66.160.140.183 - - [15/Oct/2020:23:22:54 +0000] "GET /748684 HTTP/1.1" 302 558 "-" "The Knowledge AI"	23:23
ianw	is currently going nuts	23:23
clarkb	I'm good with blocking that one and see if things are happier	23:23
ianw	i dunno if that's the root cause though	23:23
*** mlavalle has quit IRC		23:23
*** tosky has quit IRC		23:23
ianw	66.160.140.183 - - [15/Oct/2020:07:34:43 +0000] "GET /robots.txt HTTP/1.1" 302 567 "-" "The Knowledge AI"	23:24
ianw	was the first occurrence of that	23:24
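[Editor's sketch] Spotting a noisy client like this by eye is tedious; a short script can tally requests per (client IP, user agent) pair from Apache "combined" format lines. This is a minimal sketch, not the tooling used in the channel: the function name and regular expression are assumptions, and the sample lines are the two quoted above.

```python
import re
from collections import Counter

# Tail of a "combined" log line: "request" status size "referer" "user-agent"
COMBINED_TAIL = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def count_by_client_and_agent(lines):
    """Count requests per (client IP, user agent); skip unparseable lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        match = COMBINED_TAIL.search(line)
        if match is None:
            continue
        client_ip = line.split(" ", 1)[0]
        counts[(client_ip, match.group("agent"))] += 1
    return counts


sample = [
    '66.160.140.183 - - [15/Oct/2020:23:22:54 +0000] "GET /748684 HTTP/1.1" 302 558 "-" "The Knowledge AI"',
    '66.160.140.183 - - [15/Oct/2020:07:34:43 +0000] "GET /robots.txt HTTP/1.1" 302 567 "-" "The Knowledge AI"',
]
for (ip, agent), hits in count_by_client_and_agent(sample).most_common(5):
    print(hits, ip, agent)
```

Fed a real access log (for example via open(path) as the lines argument), the top few entries would show whether a single address or a single user agent dominates the traffic.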
clarkb	hrm thats well before things got sad	23:24
ianw	so we've not been seeing problems that long	23:24
clarkb	I'm going to see if java melody will load and show us what is slow	23:25
clarkb	not getting my hopes up	23:25
clarkb	but maybe we can work backward from slow threads in melody to specific requests	23:25
ianw	iotop shows nothing of interest	23:26
TheJulia	I guess gerrit is down?	23:29
* TheJulia looks back a little		23:29
ianw	TheJulia: yes, it is unhappy unfortunately, debugging ongoing :/	23:36
TheJulia	:(	23:38
* TheJulia leaves folks be and goes to email		23:39
ianw	we are going to stop apache on review.opendev.org for a bit, to see if gerrit can clear the connections and slow down, then watch what happens if we re-enable	23:41
ianw	ok, apache is stopped	23:42
clarkb	ianw: maybe give it until 0000UTC and if it isn't happier by then we should consider restarting the service too?	23:45
ianw	clarkb: are there two containers running?	23:46
clarkb	oh looks like it	23:47
clarkb	wat	23:47
ianw	stupefied_bell	23:47
clarkb	oh wait no thats track upstream	23:47
clarkb	if you do a docker ps -a --no-trunc you can see the whole command	23:47
clarkb	I think we're good	23:47
clarkb	I mean other than that it is broken :)	23:47
ianw	doh, right	23:47
clarkb	theory: its ssh that is doing it not http	23:48
ianw	it had 211 connections open, now 210	23:48
clarkb	fwiw the gerrit server having a hard time happened the other day iirc	23:52
clarkb	fungi had to restart it	23:52
clarkb	then it was fine after that, I wonder if this is a repeat	23:52
ianw	is 102 mysql connections too many?	23:53
clarkb	aiui no	23:53
clarkb	at least in an openstack context you use a lot of connections	23:53
clarkb	whether that is the case for java and gerrit is possibly another matter	23:53
clarkb	quick google shows gerrit is happy with many connections	23:54
ianw	total open tcp connections still at 209, it's not like things are dropping out quickly	23:55
clarkb	but maybe our db isn't, like we could have a limit set on the db side that we're hitting	23:55
clarkb	and that backed up the queues while things spun on slow db lookups?	23:55
clarkb	(it would be so nice if melody was accessible)	23:55
clarkb	our pool limit is 225 in the config for mysql	23:56
*** ysandeep|away is now known as ysandeep		23:59
