Thursday, 2020-10-15

*** DSpider has quit IRC00:15
*** ysandeep|away is now known as ysandeep00:26
*** hamalq has quit IRC00:43
clarkbfungi: -v/opt/project-config/gerrit/projects.yaml:/home/gerrit2/projects.yaml in /usr/local/bin/manage-projects is the problem with the command I ran. We're ignoring the content in the gerrit2 homedir01:13
clarkbwhich is confusing because that is where we look according to defaults/config but we mount /opt/project-config to that location to solve that01:14
clarkbanyway we can either change the mount or change project-config contents and try again, but tomorrow01:14
*** ykarel|away has joined #opendev01:29
*** ysandeep is now known as ysandeep|afk01:29
*** ykarel|away is now known as ykarel01:31
*** fressi has joined #opendev02:16
openstackgerritsebastian marcet proposed opendev/system-config master: OpenstackId v3.0.16
openstackgerritIan Wienand proposed zuul/zuul-jobs master: fetch-sphinx-tarball: explain what is happening
openstackgerritIan Wienand proposed zuul/zuul-jobs master: fetch-sphinx-tarball: don't run merge-output-to-logs
openstackgerritIan Wienand proposed zuul/zuul-jobs master: Revert "Revert "Refactor fetch-sphinx-tarball to be executor safe""
openstackgerritIan Wienand proposed zuul/zuul-jobs master: Revert "Revert "Refactor fetch-sphinx-tarball to be executor safe""
*** ykarel_ has joined #opendev04:12
*** ykarel has quit IRC04:14
openstackgerritMerged opendev/system-config master: tarballs: remove incorrect redirects
*** ykarel has joined #opendev04:49
*** auristor has quit IRC04:50
*** ykarel_ has quit IRC04:51
*** auristor has joined #opendev04:53
*** ykarel has quit IRC05:13
openstackgerritIan Wienand proposed opendev/base-jobs master: Run merge-output-to-logs on the executor
*** sboyron has joined #opendev05:31
openstackgerritMerged opendev/system-config master: Add four more gitea ddos UA strings
openstackgerritIan Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code
*** fressi has quit IRC05:53
*** jaicaa has quit IRC06:00
openstackgerritIan Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code
*** jaicaa has joined #opendev06:02
*** mkalcok has joined #opendev06:27
*** ykarel has joined #opendev06:29
openstackgerritIan Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code
*** slaweq has joined #opendev06:35
*** eolivare has joined #opendev06:36
*** roman_g has joined #opendev06:38
*** ysandeep|afk is now known as ysandeep06:38
openstackgerritIan Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code
*** ralonsoh has joined #opendev06:53
openstackgerritIan Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code
*** andrewbonney has joined #opendev07:01
openstackgerritIan Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code
*** hashar has joined #opendev07:05
*** ykarel is now known as ykarel|afk07:10
openstackgerritLogan V proposed openstack/project-config master: Revert "Disable limestone provider"
openstackgerritLogan V proposed openstack/project-config master: Revert "Disable limestone provider"
*** ykarel|afk has quit IRC07:11
*** rpittau|afk is now known as rpittau07:22
*** fressi has joined #opendev07:31
*** cloudnull has quit IRC07:33
*** cloudnull has joined #opendev07:34
fricklermnaser: RIPE atlas confirms that is not reachable from AS3320 (German incumbent ISP) via IPv6!probes07:42
*** tosky has joined #opendev07:42
fricklermnaser: I'll try to dig further, but this may be related to you only announcing the /48 for that net and not the covering /3207:43
*** dmellado has quit IRC07:43
*** dmellado has joined #opendev07:47
openstackgerritMerged openstack/project-config master: Revert "Disable limestone provider"
*** ysandeep is now known as ysandeep|lunch08:03
AJaegerinfra-root,  infra-prod-service-nodepool failed for this ^ . Could you check, please?08:11
fricklerAJaeger: looking08:14
*** lourot has quit IRC08:15
*** lourot has joined #opendev08:16
fricklerinfra-root: that log has nb03 unreachable, when I try to ssh to it, I get "ssh_exchange_identification: Connection closed by remote host". might be related to the rackspace ticket mentioned earlier.08:17
AJaegerthanks, frickler08:19
AJaegerfrickler: the rackspace ticket mentioned different nodes - but that happens sometimes with rackspace AFAIR08:19
frickleractually nb03 isn't on rackspace, but on packethost according to the IP. but I can't find that instance there. will need to dig deeper later08:23
frickleralso seems to be missing in cacti, that would've possibly allowed to get a hint as to when the issue started08:46
fricklerfinally found the node, it's on linaro-us. console log shows "INFO: task jbd2/dm-3-8:17697 blocked for more than 120 seconds." so some kind of disk issue, will try to reboot via the api now09:13
fricklero.k., that seems to have worked, I can log in again09:15
frickler#status log rebooted via openstack API after it seemed to have gotten stuck due to disk IO issues09:16
openstackstatusfrickler: finished logging09:16
*** tkajinam is now known as tkajinam|away09:17
*** tkajinam|away is now known as tkajinam09:17
*** ysandeep|lunch is now known as ysandeep09:24
*** hashar has quit IRC09:46
openstackgerritTobias Henkel proposed zuul/zuul-jobs master: WIP: Create upload-logs-with-failover role
ianwfrickler: thanks, nb03 is the arm builder09:49
*** DSpider has joined #opendev09:56
*** ysandeep is now known as ysandeep|afk11:07
*** sboyron has quit IRC11:15
*** sboyron has joined #opendev11:16
*** elod is now known as elod_afk11:26
*** hashar has joined #opendev11:45
*** ysandeep|afk is now known as ysandeep12:35
*** elod_afk is now known as elod12:39
*** ykarel has joined #opendev13:01
fungiclarkb: ianw: the additional ua strings in 758219 either don't seem to have taken effect or their addition has no clear impact on the established connections graph:
fungiin theory they were deployed by 06:0013:18
mnaserfrickler: we announce a /48 in each region because we have one big /32 .. dont think that should be a wild problem13:37
fungiit would only be a problem if the isp in question has bgp filters excluding longer prefixes13:39
fungi(or if whoever they get their bgp feeds from does)13:39
fungigranted, it's not unheard of, and so not uncommon for folks to also announce their aggregates alongside the longer prefixes to help in such situations13:40
fungiworst case the packets make it most of the way to the wrong facility and get redirected via a less optimal path13:41
fungibut really they only have to get as far as the first hop which has a more complete bgp table and then they'll start getting matched against the longer prefix13:42
fungiit's worth noting that there are recommendations floating around to filter longer prefixes from ranges which are normally used to assign shorter ones13:46
fungion the expectation that you'd request a new allocation from your rir for each multi-homed facility13:48
openstackgerritAurelien Lourot proposed openstack/project-config master: Mirror charm-neutron-api-plugin-ironic to GitHub
lourot^ sorry I forgot that one in yesterday's review13:49
*** sshnaidm has quit IRC13:54
*** fressi has quit IRC13:55
fungimnaser: so in that example, i think following the "ipv6 prefix-list ipv6-ebgp-strict permit 2600::/12 ge 19 le 32" recommendation would cause your 2604:e100:3::/48 announcements to be dropped13:56
mnaserhonestly i don't know why they would be dropping /48 announcements -- its not like ipv6 tables are gigantic :)13:59
*** sshnaidm has joined #opendev14:00
fungii concur, my opinion is that isps should simply upgrade their routers if they need enough memory to carry a full bgp table, but then again i've been out of the business for nearly a decade14:01
fungiit's probably seen as insurance against someone with a /28 announcing all 2^20 of their possible /48 prefixes14:03
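The prefix-length reasoning above can be checked with a small sketch. It assumes the usual reading of a `ge`/`le` prefix-list entry (the announced prefix must fall inside the covering range and its length must lie between `ge` and `le`) and assumes no other entry in the list would permit the route. The function name `prefix_list_permits` is hypothetical, and `2604:e100::/32` is simply the /32 that contains the /48 quoted in the discussion.

```python
import ipaddress


def prefix_list_permits(announced, covering, ge, le):
    """Return True if `announced` matches a 'permit <covering> ge <ge> le <le>' entry."""
    net = ipaddress.ip_network(announced)
    cover = ipaddress.ip_network(covering)
    # The route must sit inside the covering range AND have an allowed length.
    return net.subnet_of(cover) and ge <= net.prefixlen <= le


# The /48 announcement is longer than /32, so the entry does not permit it.
print(prefix_list_permits("2604:e100:3::/48", "2600::/12", 19, 32))
# The covering /32 aggregate would be permitted by the same entry.
print(prefix_list_permits("2604:e100::/32", "2600::/12", 19, 32))
# Number of distinct /48 prefixes inside a single /28 allocation.
print(2 ** (48 - 28))
```

This only models the single quoted entry; a real filter would be evaluated entry by entry with an implicit deny at the end.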
clarkbfungi: check /var/gitea/logs/access.log to see if they are making it through14:11
clarkbnote the connections have to establish long enough for us to 403 them and that ~2k connections may be steady state for that14:12
*** sgw has left #opendev14:16
*** tkajinam has quit IRC14:20
*** hashar has quit IRC14:25
*** ysandeep is now known as ysandeep|away14:43
is down?14:47
ttx(from openstack-discuss)14:47
iceylooks like a lot of the static bits are having trouble, I've got spotty access to for a while as well14:48
ttxprobably all the Victoria downloads (kidding)14:48
icey(well, that's what _I'm_ doing :-P )14:51
AJaegerany infra-root around to check docs.o.o, please? ^14:52
clarkb[Thu Oct 15 07:11:15 2020] afs: Waiting for busy volume 536870992 () in cell is the most recent afs complaint according to dmesg14:53
*** sgw has joined #opendev14:53
is not accessible either14:53
clarkbbbezak: they are just different vhosts on the same server and serving out of afs too14:54
clarkbtrying to sort out if the problem is afs or apache14:54
clarkbI think afs. seems sad. I can't ssh to it. Going to check the console next14:55
clarkb(I really wish I understood why afs doesn't fall over like it is supposed to)14:55
clarkb(thats like half the reason we run afs)14:55
clarkboh wait trying ssh again succeeds so maybe not afs14:56
*** iurygregory has quit IRC14:57
*** iurygregory has joined #opendev14:57
clarkbafs is navigable on static0114:57
clarkbtime to look at apache I guess14:58
fungiyes, apache seems like it might have gotten sad somehow14:59
clarkberror logs don't seem to have anything useful though14:59
fungiscorecard full?15:00
fungijust guessing15:00
clarkb implies not (not sure we have the localhost debugging set up on this vhost to check directly)15:00
corvuswe have no text browsers on static01? :(15:01
clarkbshould we restart it ?15:01
fungifungi@static01:~$ wget -O-
fungiConnecting to (|2001:4800:7818:101:be76:4eff:fe04:7c28|:443...15:01
fungiand it just sits there15:02
corvusstrace on 6155 has it reading from a pipe15:02
corvus(maybe connection to master proc)15:02
fungicacti graphs don't show much out of the ordinary other than increase in established tcp connections and a drop in tcp opens (both of those seem like symptoms not causes)15:03
*** auristor has quit IRC15:04
fungilooks like whatever it is may have started around 12:4515:04
corvusthe master process is in a select/wait loop15:04
clarkbapache was reloaded about 11 hours ago to pick up the fix for tarballs redirects?15:04
clarkbbut the main process has been running for almost 2 weeks15:05
fungilooks like we may have mod_status plumbed but unable to connect to loopback port 80 to get at it15:05
corvusi'm out of debugging ideas and think we should restart15:05
fungiso yeah, at this point probably a service restart is our only next step15:06
fungii concur15:06
corvus(i see no dmesg entries around 12:45)15:06
corvus(nothing interesting in syslog around that time either)15:07
clarkbfwiw tail *_error.log shows some attempts at maybe bruteforcing /etc/passwd GETs about 20 minutes prior to 12:45 but that is all that stands out to me there15:07
clarkbapache happily logged those as invalid URIs15:08
fungiyeah, i saw nothing of note in any logs15:08
fungiit's like it just magically deadlocked15:08
clarkbwho is doing the restart? should I do it?15:08
corvusnot i15:08
clarkbI'll do it now15:08
fungiclarkb: if you can, i'm in a meeting right now15:09
clarkband loads for me now15:09
clarkbttx: icey bbezak maybe you can confirm it is happier for you now too15:10
clarkbwhere it == thing you noticed was sad15:10
iceyclarkb: curl jus tworked for me15:10
iceyjust worked15:10
iceyunfortunately (or fortunately) I'm handing off the thing I was doing that caused me to notice to somebody else as it's EoD :)15:10
*** auristor has joined #opendev15:11
clarkbfungi: do you know where we've configured mod status? grepping for status in system-config/playbooks/roles/static doesn't show it15:12
fungiclarkb: /etc/apache2/mods-enabled/status.conf15:12
fungiclarkb: wget -O-
fungimmm, that seems to be getting overridden15:13
fungiahh, no it's working15:13
clarkbah ok I thought we had to manually configure that, didn't realize that ubuntu provided it out of the box15:14
clarkbTIL and now I know to do a ssh proxy and check that out for next time15:14
fungiunsurprisingly the scoreboard shows a lot of available workers/slots for now15:14
fungiit was timing out for me like everything else before the restart15:14
clarkboh ha15:15
bbezakyeah, looks ok now, thx15:18
fungiwish we knew why apache just seized up15:18
fungithat's gonna bug me now15:18
fungilooks like whatever started around 12:45 may have been a precursor, because graphs show the server basically stopped serving anything between 14:45 and 15:05 when it was restarted15:22
openstackgerritMerged opendev/system-config master: OpenstackId v3.0.16
fungiso we probably want to keep a close eye on the scoreboard and access logs for a little while15:22
*** mlavalle has joined #opendev15:22
fungithe e-mail to openstack-discuss was dated 14:41 utc15:23
clarkbI half wonder if we should cross check our gitea UAs against static apache logs too15:23
clarkbfungi: I'm not seeing those 4 UAs I added in the gitea access logs anymore. I think that level of connections is what we see when dealing out HTTP403s to the bad bot(s)15:24
fungioh, entirely possible15:24
*** mkalcok has quit IRC15:24
clarkbbasically we're doing a bunch of cheap connections to send 403s back to the bot rather than a bunch of expensive connections trying to serve all those git commits and files to the bot15:25
fungi#status log restarted apache on (serving most static content and documentation sites) at 15:09 utc to recover from an unexplained hung process causing site content not to be served15:26
openstackstatusfungi: finished logging15:26
ttxLGTM! thanks15:26
fungioutbound traffic levels for static.o.o have picked way up after the restart15:27
fungiaround 80mbps at the moment, though nothing out of the ordinary for our weekly peaks15:28
fungiscoreboard still looks okay for now15:29
*** marios has joined #opendev15:34
*** sgw has left #opendev15:39
*** sgw has joined #opendev15:42
*** fressi has joined #opendev15:43
fungithe scoreboard is starting to fill up now, though not looking terrible yet15:44
*** fressi has quit IRC15:44
*** hashar has joined #opendev15:44
fungiand now it's not serving a response to me15:44
fungioh, there it went, just took a moment15:45
fungifwiw, there's a bunch of ip addresses retrieving many different files from tarballs.o.o/osf/openstackid/15:45
fungialmost all the slots are in a "sending reply" state now15:46
*** rpittau is now known as rpittau|afk15:52
fungii just got a connection timeout trying to get the server status15:58
fungirequests i'm seeing tailing the access log do look like the ones we'd expect from weibo :/16:00
fungioh, i finally managed to get server status to reply to me again16:01
fungiyeah, it's looking like weibo bots were each fetching copies of all the osf/openstackid tarballs and some are now moving on to starting to get the osf/groups tarballs16:03
*** eolivare has quit IRC16:10
*** ysandeep|away is now known as ysandeep16:27
*** marios is now known as marios|out16:32
openstackgerritNoam Angel proposed openstack/diskimage-builder master: kill chroot processes before "dib-block-device umount"
openstackgerritNoam Angel proposed openstack/diskimage-builder master: kill chroot processes before "dib-block-device umount"
*** marios|out has quit IRC16:39
*** lpetrut has joined #opendev16:44
*** hashar is now known as hasharTurns4216:51
fungiactivity on the server seems to have calmed back down now, and the scoreboard is mostly open/waiting16:52
*** lpetrut has quit IRC16:59
*** hamalq has joined #opendev17:02
openstackgerritClark Boylan proposed opendev/system-config master: Update mirror apache configs to 2.4 acl primitives
clarkbfungi: ^ something like that17:08
clarkbI've updated sjc1's mirror by hand to match ^ if people want to check it17:13
fungii can confirm, my personal sites are all only using require directives, no order or satisfy anywhere17:14
*** ykarel is now known as ykarel|away17:19
*** andrewbonney has quit IRC17:20
*** ysandeep is now known as ysandeep|away17:24
*** roman_g has quit IRC17:37
clarkbrelated does anyone know why we have ?17:40
*** Vadmacs has joined #opendev18:20
*** ralonsoh has quit IRC18:22
dmsimardpardon me for losing track but do git.o.o and end up at the same place ? seeing clones on git.o.o hang but not opendev.org18:38
clarkbdmsimard: git.o.o should be a redirect to opendev.org18:39
does not support git:// so if you're doing git:// that will break18:39
clarkbneed to be http or https18:39
dmsimardover https yeah18:39
clarkbah git.o.o is also hosted on static which has been sad today18:40
clarkbI think its been happier recently though, is this an ongoing problem? or was it only happening between ~12:45UTC and 15:10UTC?18:40
dmsimardI'm reproducing it right now18:40
dmsimardi.e, git clone
dmsimardI can update our URLs to use but I thought I'd point it out18:41
clarkbya this seems to be the same overwhelmed server state18:41
clarkbI wonder if its actually the git vhost that is eating all the connections on that server18:41
clarkbyou really should be using opendev.org18:41
dmsimardsure thing18:42
dmsimardI don't know who/what else might be using git.o.o out there though18:42
clarkbhrm it says we have idle workers in apache though18:42
*** ykarel|away has quit IRC18:43
clarkblow system load, plenty of memory, and we have free connections whats up18:43
dmsimardare you seeing the issue as well ?18:43
clarkbI mean it works but its laggy18:44
clarkblike apache doesn't have a free connection to give me when I try to git clone from it18:44
clarkb(in reality its a free connection to serve the redirect)18:44
clarkbbut its still on the order of a few seconds18:45
clarkbnoticeable but not end of the world18:45
clarkbI wonder if it is iowait related to afs reads somehow?18:46
clarkbwould that bog down the rest of apache when all it needs to do is serve a redirect?18:46
dmsimardgood question18:46
clarkb(side note is this the first release where tarballs have all been in afs?)18:46
dmsimardsometimes it completes in a few seconds, others not
clarkbfungi: when I can get server-status it shows a happy server then other times it times out18:48
fungiyeah, same here18:48
clarkbwe could try tuning the mpm configs to allow many more connections (the server seems super idle)18:49
clarkband maybe we're just getting syn flooded?18:49
fungicacti graphs suggest the problem has spiked up again as of ~18:1518:50
clarkbtop reports low wai which I think rules out the iowait theory?18:50
clarkbshould we try adopting a mpm tuning config more similar to etherpad's?18:51
fungialso the server status extended report indicates most of the requests are still tarballs for osf/openstackid and osf/groups18:51
clarkbfungi: is that smarcet ddos'ing us?18:52
fungiif he's got hundreds of random ip addresses in china who all want to download old versions of openstackid, then maybe18:52
clarkbI can copy over the mpm config from etherpad and restart apache on static if we want to give that a shot18:52
clarkb(it should dramatically increase the number of allowed connections)18:53
fungiwell, we don't seem to be at risk of filling them up at the moment18:53
clarkbit seems like maybe we are though? otherwise why does apache not respond when I try to connect sometimes?18:54
dmsimardconntrack ?18:54
clarkb(I guess that is something we could rule in or out by increasing the limits)18:54
clarkbI don't think we run any special conntrack rules here18:54
clarkbjust basic iptables port and source ip based rules18:54
fungithe open tcp connections reported by cacti is hovering around the same level as the total connections for our apache workers18:55
clarkbya that is why I wondered if it could be a syn flood.18:55
clarkbthat may not show up on our graphs as readily?18:55
fungiand the scoreboard is reporting something like 100 idle connections as of a minute ago18:56
*** ozzzo has joined #opendev18:56
fungibut yeah, can't hurt to try *if* we have the available cpu/ram for it18:56
clarkb14.8 requests/sec 158.4 kB/request18:56
fungilooking at the uas for all these tarball requests the workers seem to be spending time on, they look like the same set we blocked in gitea's apache18:57
clarkboh hahahahah18:57
clarkbI mean I have to laugh otherwise I'll be sad18:58
clarkbok instead of changing our mpm tuning should we add the same set of UA blocking rules?18:58
fungithat might be more useful18:58
clarkb++ do you want to put that change together? I'm missing lunch right now but can write it if you're in a bad spot for it18:59
fungiconnections has jumped way up too, we have one worker with 25 active connections and another 25 closing18:59
fungiyeah, should be able to give it a shot, though i guess we need to include the same deny rules into every vhost? i guess i could start with the tarballs19:00
clarkblooks like it uses mod headers and mod rewrite19:02
fungiwe have both enabled already, looks like19:03
fungii've manually inserted those rules into the vhost to see if they help19:05
*** roman_g has joined #opendev19:06
fungichurn in that vhost's access log has died way down19:06
fungiconnection counts are dropping in the scoreboard too19:06
clarkbya I can refresh it quickly19:08
clarkbdefinitely looking healthier19:08
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Add nim roles and job
fungiassuming this has a continued impact, i'll see if we can centralize the list of patterns so it can be included into multiple servers without carrying copies around19:08
clarkbthat would be excellent19:09
fungiprobably just install it as a dedicated config file defining a macro, and then call that macro in each vhost19:09
fungi has just updated and shows a dramatic drop19:10
fungii can browse the tarballs site just fine still too19:11
fungibut the scoreboard is now showing almost no tarballs vhost request in progress, whereas before those were the vast majority19:12
fungier, not the scoreboard but the extended status data19:12
fungithough the scoreboard is settling to around 1/8 of the earlier connection count19:13
fungi15 to 1/819:13
fungier, 1/5 to 1/819:13
funginot quite an order of magnitude drop, but substantial nonetheless19:13
clarkbcool I'm really going to take a break now for lunch and bike ride and all that, back later today19:16
fungii'll keep an eye on this for a little longer and then look at a possible patch19:18
*** hasharTurns42 is now known as hashar19:22
*** hashar has quit IRC19:26
*** roman_g has quit IRC19:39
*** roman_g has joined #opendev19:40
fungiokay, things are looking a good sight better with this in place, so i'll write up a review19:47
openstackgerritJeremy Stanley proposed opendev/system-config master: Block restricted user agents for the tarballs site
fungiso that's step 120:02
fungii'm going to wip it while i make sure doing it that way will work for the tarballs site20:02
fungithen i'll look at splitting it into a dedicated role and deduplicating the gitea use20:03
fungiseems to be working so far20:07
fungiianw: when you're around, your input would be most appreciated, as you have the most experience with this so far20:08
openstackgerritJeremy Stanley proposed opendev/system-config master: Block restricted user agents for the tarballs site
fungiokay, there it is done as a reusable ansible role added to the service-static playbook20:28
openstackgerritJeremy Stanley proposed opendev/system-config master: Use the apache-ua-filter role on Gitea servers
fungiand that ^ will remove the copy in the gitea role20:31
*** Vadmacs has quit IRC20:36
ianwfungi: hey, catching up ...20:38
dmsimardwow, is all kinds of messed up o_O20:45
ianwfungi: idea lgtm; but i do think the handler will cause issues mentioned in comment20:45
ianwdmsimard: as will be the bill someone eventually gets ...20:54
fungiianw: oh, good catch20:54
* clarkb keeps getting distracted is actually doing bike ride next20:54
fungidmsimard: maybe you know... what happens if you have multiple roles installing the same handler (or handlers with the same name)?20:56
dmsimardI've never tried it, good question -- my guess would be that there's a matter of precedence and it would use the one from the current role before reaching out to other roles20:57
fungii suppose it can't hurt to try in that case20:58
dmsimardyou should be able to tell which role the handler came from from the task output -- it's prefixed by the role name20:58
openstackgerritJeremy Stanley proposed opendev/system-config master: Block restricted user agents for the tarballs site
openstackgerritJeremy Stanley proposed opendev/system-config master: Use the apache-ua-filter role on Gitea servers
fungiguess we'll find out! ^21:00
dmsimardscience \o/21:00
fungiscience isn't science without splosions21:01
ianw"Use unique handler names. If you trigger more than one handler with the same name, the first one(s) get overwritten. Only the last one defined will run."21:07
ianwthat isn't really clear what first and last means21:07
fungibut in this case they have the same actions, so probably fine?21:08
openstackgerritIan Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code
fungiotherwise maybe we want to extract that handler into its own apache-config-handler role or something along those lines, and just include it where it's needed?21:09
ianwi think kiss and leave it where it is, unless it doesn't work :)21:12
fungiin the past little while, apache on static.o.o has scaled back from 6 to 3 process slots21:12
fungiconnections graph in apache is looking stable too21:13
fungiooh, only 2 now21:14
fungimost of the traffic is docs and legacy git redirects now21:15
fungithings seem stable so i'm going to disappear for a couple hours but will check back in later21:18
openstackgerritIan Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code
*** slaweq has quit IRC21:31
*** roman_g has quit IRC21:32
*** sboyron has quit IRC21:47
openstackgerritIan Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code
openstackgerritIan Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code
*** slaweq has joined #opendev22:22
*** hamalq has quit IRC22:32
*** hamalq has joined #opendev22:33
openstackgerritIan Wienand proposed openstack/diskimage-builder master: WIP: boot test of containerfile image
*** slaweq has quit IRC22:38
*** qchris has quit IRC22:41
openstackgerritIan Wienand proposed zuul/zuul-jobs master: merge-output-to-logs: convert to untrusted executor safe code
clarkbI'm back from my bike ride, is there anything I should review promptly?22:54
*** qchris has joined #opendev22:55
*** tkajinam has joined #opendev22:59
openstackgerritClark Boylan proposed opendev/system-config master: Update mirror apache configs to 2.4 acl primitives
ianwclarkb: we can do the referrals thing on static if you like?23:09
clarkbianw: referrals thing? also is gerrit being slow for anyone else ? I wonder if it is being hit by the same thing gitea and static were23:10
ianwsorry the blocking UA macro that fungi has put together23:10
clarkblooks like CI -1'd it23:10
clarkbbut ya I'm for landing that in a general way particularly if gerrit is now being hit by it23:11
clarkb(not sure about that yet just noticing gerrit is very slow to respond to my requests, could be a local issue)23:11
ianwyes i have to agree on review.opendev.org23:12
clarkbthere seems to be a dotbot and a stormcrawler hitting gerrit23:14
clarkbnot seeing the other UAs we've come to expect23:14
ianwit really doesn't seem to be under a lot of apache load23:16
clarkbya static was in the same situation though23:16
clarkbit seemed fine except it wasnt23:16
clarkb we can see the connection spike there23:18
ianwshould we iptables out the ip-range manually for a bit, so we can at least merge the UA blocking?23:19
clarkbianw: or we can manually apply the UA blocking assuming that will help23:19
clarkbI'm still not seeing those UAs on review though23:20
clarkbI also see "The Knowledge AI" hitting port 80 a lot23:20
ianwyeah, and when i connect it logs me in apache pretty quickly, but i don't get a response23:20
ianwergo my request is getting to apache23:21
clarkbya I think like gitea the issue here is gerrit is busy as a result23:21
clarkbsystem load is 1223:21
clarkbis there a specific IP you've identified as a bad actor? I don't think I've seen one that stands out as particularly bad yet23:22
ianw66.160.140.183 - - [15/Oct/2020:23:22:54 +0000] "GET /748684 HTTP/1.1" 302 558 "-" "The Knowledge AI"23:23
ianwis currently going nuts23:23
clarkbI'm good with blocking that one and see if things are happier23:23
ianwi dunno if that's the root cause though23:23
*** mlavalle has quit IRC23:23
*** tosky has quit IRC23:23
ianw66.160.140.183 - - [15/Oct/2020:07:34:43 +0000] "GET /robots.txt HTTP/1.1" 302 567 "-" "The Knowledge AI"23:24
ianwwas the first occurrence of that23:24
clarkbhrm thats well before things got sad23:24
ianwso we've not been seeing problems that long23:24
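The kind of access-log triage being done here (who is hitting the server, and with which user agent) can be sketched as below. This is a minimal illustration, assuming the log uses the Apache "combined" format seen in the two lines quoted above; `LINE_RE` and `count_agents` are hypothetical names, and the sample data is just those two quoted lines.

```python
import re
from collections import Counter

# Apache "combined" log format: ip, identity, user, [time], "request",
# status, size, "referer", "user agent".
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_agents(lines):
    """Count requests per (client IP, user agent) pair, skipping unparsable lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[(match.group("ip"), match.group("agent"))] += 1
    return counts


sample = [
    '66.160.140.183 - - [15/Oct/2020:07:34:43 +0000] "GET /robots.txt HTTP/1.1" 302 567 "-" "The Knowledge AI"',
    '66.160.140.183 - - [15/Oct/2020:23:22:54 +0000] "GET /748684 HTTP/1.1" 302 558 "-" "The Knowledge AI"',
]

# Busiest (IP, user agent) pairs first.
for (ip, agent), hits in count_agents(sample).most_common():
    print(hits, ip, agent)
```

On a real server the `sample` list would be replaced by lines read from the access log file; the path depends on the vhost configuration and is not assumed here.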
clarkbI'm going to see if java melody will load and show us what is slow23:25
clarkbnot getting my hopes up23:25
clarkbbut maybe we can work backward from slow threads in melody to specific requests23:25
ianwiotop shows nothing of interest23:26
TheJuliaI guess gerrit is down?23:29
* TheJulia looks back a little23:29
ianwTheJulia: yes, it is unhappy unfortunately, debugging ongoing :/23:36
* TheJulia leaves folks be and goes to email23:39
ianwwe are going to stop apache on for a bit, to see if gerrit can clear the connections and slow down, then watch what happens if we re-enable23:41
ianwok, apache is stopped23:42
clarkbianw: maybe give it until 0000UTC and if it isn't happier by then we should consider restarting the service too?23:45
ianwclarkb: are there two containers running?23:46
clarkboh looks like it23:47
clarkboh wait no thats track upstream23:47
clarkbif you do a docker ps -a --no-trunc you can see the whole command23:47
clarkbI think we're good23:47
clarkbI mean other than that it is broken :)23:47
ianwdoh, right23:47
clarkbtheory: its ssh that is doing it not http23:48
ianwit had 211 connections open, now 21023:48
clarkbfwiw the gerrit server having a hard time happened the other day iirc23:52
clarkbfungi had to restart it23:52
clarkbthen it was fine after that, I wonder if this is a repeat23:52
ianwis 102 mysql connections too many?23:53
clarkbaiui no23:53
clarkbat least in an openstack context you use a lot of connections23:53
clarkbwhether that is the case for java and gerrit is possibly another matter23:53
clarkbquick google shows gerrit is happy with many connections23:54
ianwtotal open tcp connections still at 209, it's not like things are dropping out quickly23:55
clarkbbut maybe our db isn't like we could have a limit set on the db side that we're hitting23:55
clarkband that backed up the queues while things spun on slow db lookups?23:55
clarkb(it would be so nice if melody was accessible)23:55
clarkbour pool limit is 225 in the config for mysql23:56
*** ysandeep|away is now known as ysandeep23:59
