Thursday, 2016-08-25

*** mgagne_ is now known as mgagne12:54
*** ChanServ changes topic to "gerrit tuning"19:02
jeblairssh review gerrit show-caches --show-threads19:03
jeblairThreads: 16 CPUs available, 377 threads19:03
jeblair                                    NEW       RUNNABLE        BLOCKED        WAITING  TIMED_WAITING     TERMINATED19:03
jeblair  SSH git-upload-pack                 0              0              0             14              0              019:03
jeblair  SSH-Stream-Worker                   0              0              0             17              0              019:03
jeblair  HTTP                                0              5              0              0             20              019:03
jeblair  SSH-Interactive-Worker              0              0              0            182              0              019:03
jeblair  Other                               0             26              0             66             29              019:03
fungiaha19:03
jeblair  ReceiveCommits                      0              0              0             16              0              019:03
jeblair  SshCommandStart                     0              0              0              2              0              019:03
jeblairi get that19:03
*** zaro has joined #openstack-infra-incident19:04
fungiahh, yeah so it's gerrit show-caches --show-threads19:04
jeblairi'm not quite certain how to read that yet.19:04
jeblairand i need to get lunch.19:04
fungiif you add up all the numbers in the http row, they come out to 25 which is what the documentation says the max threads default for httpd is19:05
fungii've polled it a few times and the numbers in RUNNABLE and TIMED_WAITING vary a bit, but always seem to add up to 2519:06
fungii caught it dipping down to 24 once19:07
fungiso i take this as confirmation that the default max mentioned in the configuration docs is actually being enforced here19:08
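[editor's sketch] The row-sum check fungi describes can be reproduced offline. The snippet below is a minimal sketch, not part of the original discussion: it assumes the table layout pasted above (a pool name followed by six per-state counts), and the names parse_thread_table and SAMPLE are hypothetical helpers introduced only for illustration. The sample rows are copied from jeblair's paste.

```python
# Column order as printed by `gerrit show-caches --show-threads` in the paste above.
STATES = ["NEW", "RUNNABLE", "BLOCKED", "WAITING", "TIMED_WAITING", "TERMINATED"]

# Rows copied from the log; in practice this text would come from the ssh command.
SAMPLE = """\
Threads: 16 CPUs available, 377 threads
                                    NEW       RUNNABLE        BLOCKED        WAITING  TIMED_WAITING     TERMINATED
  SSH git-upload-pack                 0              0              0             14              0              0
  SSH-Stream-Worker                   0              0              0             17              0              0
  HTTP                                0              5              0              0             20              0
  SSH-Interactive-Worker              0              0              0            182              0              0
  Other                               0             26              0             66             29              0
  ReceiveCommits                      0              0              0             16              0              0
  SshCommandStart                     0              0              0              2              0              0
"""


def parse_thread_table(text):
    """Return {pool name: {state: count}} for every data row in the table."""
    pools = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) <= len(STATES):
            continue  # summary line, header line, or blank line
        counts = fields[-len(STATES):]
        if not all(c.isdigit() for c in counts):
            continue  # not a data row
        # Everything before the six counts is the pool name (may contain spaces).
        name = " ".join(fields[:-len(STATES)])
        pools[name] = dict(zip(STATES, (int(c) for c in counts)))
    return pools


pools = parse_thread_table(SAMPLE)
http_total = sum(pools["HTTP"].values())
all_total = sum(sum(row.values()) for row in pools.values())
print("HTTP threads:", http_total)
print("all threads:", all_total)
```

For the pasted sample, the HTTP row sums to 25 (5 RUNNABLE + 20 TIMED_WAITING), and all rows together sum to the 377 threads reported on the first line.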
jeblairhttp://help.collab.net/topic/teamforge80-git-gerrit210x/reference/Gerrit-Performance-Tuning-Cheat-Sheet.pdf19:08
jeblairthat may also be helpful19:08
jeblairi think some of that information is not entirely correct, but it may help fill in some missing gaps.19:09
jeblairnow lunch for real19:09
fungithat's an interesting document19:09
fungizaro: you have a feel for any of this?19:10
zaroi can't tell from the documentation what the correct number should be. but probably higher than the default.19:11
zarohigher than default would be good.19:11
fungiyeah, that's where i am too at this point ;)19:12
fungiit's likely going to involve some trial and error, but performance is also at this point being impacted by the elevation in git gc activity again19:12
zaroi guess it depends on a lot of factors so maybe just pick one and try it?19:13
fungiso we are unlikely to be able to effectively iterate on it19:13
fungior iterate on it quickly anyway19:13
zaroyeah, i'm guessing it's something that takes a few tries and may require time to know what the correct number is.19:14
fungithat cheatsheet is suggesting 100 is a reasonable "large site" value for httpd.maxThreads19:14
clarkbwell tweaking those numbers will require a gerrit restart anyways which will avoid the GC issue temporarily. Then we should compare to see if GC happens quicker than normal (I think it's like once every couple weeks now)19:14
fungiand that 25 (the default) is "small"19:14
fungiclarkb: yeah, that's basically what i wanted to try19:15
clarkbI think we should bump min httpd threads too just to avoid delays when things spike19:15
clarkbwe could 4x the defaults and do 5 -> 20 and 25 -> 10019:15
zarowell at least it's already setup in puppet19:16
fungiclarkb: okay, so you're in favor of upping the base and max values then, not just the max?19:16
fungii guess that may make ramp-up a little more snappy19:16
zaro++19:16
clarkbfungi: ya I think we should do both19:16
fungiwfm19:16
clarkbfungi: yup for when things spike19:16
clarkbwe also need to increase the db threads as described in review.opp19:17
zarohow about acceptorthreads?19:17
clarkber review.pp. basically the sshd threads + httpd threads must be less than db threads19:17
fungiahh, yep looks like we're at database.poolLimit=150 right now19:17
clarkbzaro: the docs say that 2 acceptor threads should be sufficient for most high traffic sites19:18
fungiso should probably bump it to 250 to give some breathing room? (that's 125% of sshd+httpd max)19:18
clarkbfungi: 225 would maintain the same headroom19:18
clarkbright now it's 100 + 25 = 12519:18
fungifair enough--i'm fine with 22519:19
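[editor's sketch] The arithmetic behind the 225 figure can be written out as follows. This is a rough sketch under stated assumptions: the sshd thread count of 100 is inferred from clarkb's "100 + 25 = 125" (with 25 being the httpd max default), and the function name pool_limit is hypothetical, introduced only for this illustration.

```python
def pool_limit(sshd_threads, httpd_max_threads, headroom):
    """Database pool size: every sshd and httpd thread plus some spare connections."""
    return sshd_threads + httpd_max_threads + headroom


# Values in effect before the change, as quoted in the discussion.
old_sshd = 100        # assumption: inferred from "100 + 25 = 125"
old_httpd_max = 25    # httpd.maxThreads default
old_pool_limit = 150  # database.poolLimit at the time

# Headroom the old settings left above sshd + httpd.
headroom = old_pool_limit - (old_sshd + old_httpd_max)

# Proposed change: 4x the httpd max, sshd unchanged, same headroom.
new_httpd_max = 4 * old_httpd_max
new_pool_limit = pool_limit(old_sshd, new_httpd_max, headroom)

print("headroom:", headroom)
print("new httpd max:", new_httpd_max)
print("new pool limit:", new_pool_limit)
```

Keeping the same headroom of 25 connections gives 100 + 100 + 25 = 225, whereas fungi's initial suggestion of 250 corresponds to 125% of the combined 200 sshd and httpd threads.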
zaro++19:19
fungiwe've apparently already tuned httpd/maxqueued to 3x the default of 5019:20
fungier, 4x19:20
clarkbapparently 200 is the new default for maxqueued19:20
zaroit's from this https://review.openstack.org/#/c/285588/19:20
fungiin 2.12+?19:20
clarkbso maybe we want to increase that a bit too? that one is the one I really don't have ideas for19:20
zaro200 is the new default19:21
fungiit's likely fine to leave as-is19:21
clarkbwfm to leave as is19:21
fungii guess these are enough values i should propose the change first19:22
* mordred joins the party ...19:22
fungion the way19:22
zarowonder what luca means by this 'If you have over 200 incoming requests queued, possibly there is19:22
zarosomething more serious to investigate..'19:22
clarkbzaro: probably that you are under attack of some sort19:22
zaroahh yeah, that's completely possible19:23
fungiyeah, like you're not handling requests fast enough (either because you've tuned the other values poorly, your system is under-sized, or you're in the middle of a denial of service attack)19:23
fungiokay, as zaro pointed out (and i just confirmed), the parameters are already all plumbed through19:28
fungihttps://review.openstack.org/36074419:28
fungiclarkb: zaro: mordred: jeblair: ^ does that make sense then?19:28
clarkblooking19:28
fungiif you approve, i'll hand-patch the result into gerrit.config and restart the service19:30
fungijust making sure we're on the same page with the suggested values19:30
zarodidn't we agree on 100 for maxthreads?19:32
clarkbya I think that should be 100 not 20019:34
mordredlgtm - other than the 100/200 from zaro clarkb19:35
fungigah, yep19:35
fungithat was a typo19:35
fungiokay, correction is up as patchset 219:36
fungii got thrown off by copying and editing the httpd_maxqueued line and neglected to switch the 2 to a 119:37
fungiclarkb: zaro: mordred: jeblair: ^19:39
zarolgtm19:40
mordredfungi: +219:41
clarkbtrying to get it to load19:41
fungioh the irony ;)19:42
clarkbI keep getting proxy errors19:43
clarkbI am just going to trust you replaced the 200 with 100 and everything else stayed the same19:43
fungiyep, i did19:44
fungi#status notice The Gerrit service on review.openstack.org is restarting to implement some performance tuning adjustments, and should return to working order momentarily.19:44
openstackstatusfungi: sending notice19:44
fungicool, on its way back up with the new values applied19:46
-openstackstatus- NOTICE: The Gerrit service on review.openstack.org is restarting to implement some performance tuning adjustments, and should return to working order momentarily.19:46
fungii'm keeping an eye on javamelody19:46
openstackstatusfungi: finished sending notice19:47
fungithe threadcount graph dropped significantly, but not for long. it's already climbing back up almost to where it left off19:52
clarkband will likely go past it19:53
jeblairo/20:07
fungiyeah, it's just now gotten back to the old level20:11
fungiunfortunately we're only around 20 httpd threads in use according to show-caches20:11
fungii'm waiting to see that go over 2520:12
funginow i'm worried that i mistyped max in there twice, but puppet has already reverted the config so i can't tell20:13
fungiso particularly eager to see it go over 2020:13
fungithough i guess unless demand increases past 20 it's just going to have 20 threads regardless20:14
clarkband we probably have to wait for one of those spikes we were seeing to see it really push up20:16
clarkbsince under the normal load it seemed happy with the old params20:17
fungilater on this evening after 360744 merges and is reflected in the config on disk i'll do another quick gerrit restart just to be doubly certain it's applied as written20:19
clarkbfungi: you can also see the threads in the java melody thread listing; it expands in the page with a little + button20:20
* jeblair helps by enqueuing those changes from earlier20:21
fungiclarkb: yeah, though the ssh api is a little easier to get counts from20:23
clarkbhttps://review.openstack.org/monitoring?part=graph&graph=httpSystemErrors shows that the errors have dropped off. I think there is always sort of a baseline error count with gerrit since it throws exceptions for things that are relatively normal too20:23
fungithe threads count graph shows it's flatlined right about where it was before the restart20:30
fungiand still only totalling 20 httpd threads20:30
clarkbhuh20:32
jeblairdid puppet restart it?20:32
jeblair(does not look like it; current proc is from 19:45)20:34
fungiyay! fears abated... up to 23 httpd threads now20:45
fungii'll check again after dinner20:46
fungiwrong time of day i guess. back down to 20 httpd threads22:39
*** ChanServ changes topic to "situation normal"22:39
