Thursday, 2018-11-22

*** betherly has joined #openstack-infra00:01
clarkbianw: reading that maybe what we really want to test here is that each of the individual rules we care about are in the policy?00:04
clarkbianw: we can likely ignore the -P lines, check that -N openstack-INPUT is present and that -A for ssh, snmp, and the ending reject rule are present?00:05
clarkboh and icmp is important too00:05
clarkborder technically matters but I think we can fudge that for testing this00:06
clarkbthen basically ignore the docker ruleset since it doesn't appear to interfere with us00:06
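
The check clarkb describes (ignore the -P lines and the docker chains, then confirm each rule we care about is present) could be sketched roughly as below. This is only an illustration: the rule strings in REQUIRED and the SAMPLE ruleset are hypothetical stand-ins, not the real system-config rules, and ordering is deliberately not checked, as suggested above.

```python
def missing_rules(ruleset_text, required):
    """Return the required rules that no line of the ruleset satisfies.

    ruleset_text is expected to look like `iptables -S` output.
    """
    lines = []
    for raw in ruleset_text.splitlines():
        tokens = raw.split()
        # Skip blank lines and default policies (-P), which we choose to ignore.
        if not tokens or tokens[0] == "-P":
            continue
        # Skip anything that touches a docker-managed chain.
        if any(tok.startswith("DOCKER") for tok in tokens):
            continue
        lines.append(tokens)

    def is_subsequence(needle, haystack):
        # True if every token of needle appears in haystack, in order.
        remaining = iter(haystack)
        return all(tok in remaining for tok in needle)

    return [rule for rule in required
            if not any(is_subsequence(rule.split(), line) for line in lines)]


# Hypothetical sample data, for illustration only.
SAMPLE = """\
-P INPUT ACCEPT
-P FORWARD DROP
-N DOCKER
-N openstack-INPUT
-A INPUT -j openstack-INPUT
-A FORWARD -j DOCKER-USER
-A openstack-INPUT -p icmp -m icmp --icmp-type any -j ACCEPT
-A openstack-INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A openstack-INPUT -j REJECT --reject-with icmp-host-prohibited
"""

REQUIRED = [
    "-N openstack-INPUT",
    "-A openstack-INPUT -p icmp -j ACCEPT",
    "-A openstack-INPUT -p tcp --dport 22 -j ACCEPT",
    "-A openstack-INPUT -p udp --dport 161 -j ACCEPT",
    "-A openstack-INPUT -j REJECT",
]

print(missing_rules(SAMPLE, REQUIRED))
```

With this sample only the snmp rule is reported missing, since the sample ruleset has no udp/161 entry.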
*** betherly has quit IRC00:06
clarkboh there is a "conflict"00:07
clarkbwe set -P FORWARD ACCEPT on our default policy? but docker sets it to DROP?00:07
clarkbwe might actually want to reconcile that00:08
clarkb(I think we can safely set -P FORWARD DROP since the only place it matters is our test nodes and those are dib configured)00:08
fungithat's for routing, correct?00:08
clarkbfungi: ya00:08
clarkbfungi: so it would matter for infracloud or test nodes running openstack00:08
clarkbfungi: but I don't think we route in control plane anywhere00:08
fungiagreed, none of our control plane servers should be acting as routers anyway00:08
fungii expect that was there because we were originally using the same base ruleset for test nodes running devstack00:10
clarkbianw: fungi the other thing we should test is if bouncing the network/iptables-persist services results in removal of the docker rules00:10
clarkbwe can sort that out later though since this whole thing is a learning experience00:10
clarkbfungi: ya00:10
fungioh, yep, iptables-persistent may clear any rules not in /etc/firewall/rules00:11
clarkbdocker uses its own chains but we dump the whole set with iptables-persistent00:11
clarkbinternet seems to agree this is an issue and the fix is to have iptables-persistent manage the docker rules for it00:12
clarkbhttps://blog.daknob.net/debian-firewall-docker/00:12
*** rkukura has quit IRC00:13
clarkbugh this could get complicated very quickly00:14
clarkbI wonder if restarting the docker daemon will update iptables00:17
clarkba better fix here might be to have those two services depend on each other in such a way that restarting one restarts the other00:18
clarkbI guess it's unidirectional. If we restart iptables-persistent then restart docker00:18
clarkband existing connections should keep running through that since we are nice to established connections00:18
clarkb(except maybe nat will have a sad?)00:18
*** betherly has joined #openstack-infra00:21
*** rh-jelabarre has quit IRC00:22
fungii doubt it will clear the conntrack tables so shouldn't?00:22
clarkbfungi: actually reading more the sad is that restarting docker daemon restarts your containers00:23
clarkbbecause bonghits00:23
fungioh :/00:24
clarkbbut restarting docker after restarting iptables would address it00:24
*** betherly has quit IRC00:25
clarkbhttps://unrouted.io/2017/08/15/docker-firewall/ is another approach via the DOCKER-USER chain and explicit flushes (that don't flush existing docker rules)00:26
clarkbreality is that we update firewalls so rarely that restarting docker after updating iptables is probably fine for us00:26
clarkband yet another option is to put docker in its own network namespace (lol)00:27
clarkbthen you punch holes from real interface namespace to docker namespace00:27
clarkbI wonder how neutron addresses this because it presumably has the same problem00:27
fungiDOCKER-USER looks like the easiest solution, but it's unfortunately a break from how we've been firewalling our servers in the past00:31
fungiand also i expect we'll have to revisit yet again when the inevitable userland transition from iptables to nftables finally happens00:34
*** betherly has joined #openstack-infra00:34
clarkbreading this I want pf00:36
fungitell me about it. i was helping my brother troubleshoot a rather complex pf ruleset earlier today, and it was a breeze00:36
fungiit had zone-specific inclusions, tables loaded from additional files, interface and host aliases out the wazoo00:37
clarkbit's been a while since I had to read a ruleset but I seem to recall that pf rules always applied on packets going in a single direction00:39
clarkbmuch easier to reason about because you didn't have different types of chains, you just evaluate at a given interface and direction00:40
fungithey certainly can if you write them relative to particular interfaces (or interface groups)00:40
*** betherly has quit IRC00:40
fungibut you can also define rules generally directionless and have them evaluated on ingress to all interfaces00:40
fungicoupled with state tracking and urpf it ends up being much more efficient00:41
clarkbpass on tl0 from $web_serv_int to any binat-to $web_serv_ext00:42
fungithen you let your routing tables/interface configs worry about which networks live where and let the firewall apply sensible traffic policy00:42
clarkbcompared to iptables -t nat -A POSTROUTING -s ${IPADDR_OUTSIDE} -j MASQUERADE00:42
clarkbthese days I let pfsense build my rules for me because elazy but once upon a time I had the dual proc pentium 3 router running pf on openbsd00:43
fungialso modern openbsd has automagic interface group names like "egress" which applies to any interface through which a default route lies00:43
clarkbfwiw I think we should probably punt on this particular concern for now. We can restart docker (and containers) if iptables rules change00:43
clarkbthe other thing to keep in mind is that any service we don't actually want to expose may end up being fun to firewall (think elasticsearch)00:44
*** longkb has joined #openstack-infra00:47
*** longkb has quit IRC00:48
fungifor a more modern example, here's the pf.conf on my home firewall (only uses two physical interfaces and doesn't have many services behind it, performs overload nat outbound for ipv4 and allows ipv6 out without translation): http://paste.openstack.org/show/735924/00:48
clarkbwe'd need to use DOCKER-USER chain to filter access to elasticsearch nodes I think00:49
*** mordred has quit IRC00:49
fungiinbound ipv4 port redirection for ssh/mosh to my workstation (.64) and access to my jukebox icecast stream (.65)00:51
fungioh, and a udp port range on the jukebox because i occasionally ran a bittorrent client there00:51
clarkbsuddenly though I'm reminded why people cling to their centralized firewalls00:53
clarkbtldr docker makes this hard on your network leaves so punt to the routers00:53
clarkbmordred[m]: ianw ^ it's probably worth an update to our spec to outline whatever firewall design choices we've made once we've made them01:01
clarkbmordred[m]: ianw in particular I think what will happen is any service running in a docker container with exposed ports will be available on the internet because the forwarding chain will bypass any input rules we already have01:02
clarkbfor most of our services this is fine since we expose them to the internet, but for others like gearman and elasticsearch we don't and those should be carefully managed01:02
*** tosky has quit IRC01:07
*** jamesmcarthur has quit IRC01:12
*** jamesdenton has quit IRC01:12
openstackgerritIan Wienand proposed openstack/diskimage-builder master: package-installs: provide for skip from env var  https://review.openstack.org/61911901:13
openstackgerritIan Wienand proposed openstack/diskimage-builder master: simple-init: allow for NetworkManager support  https://review.openstack.org/61912001:13
openstackgerritIan Wienand proposed openstack/diskimage-builder master: Fix unit tests for elements  https://review.openstack.org/61938701:13
clarkbhttps://github.com/containernetworking/plugins/pull/75#issuecomment-398332210 remains an unsolved problem on podman too (in case anyone is wondering if that will solve the problem)01:13
clarkbapparently our existing rules would work if k8sing because they expose the services directly and not via nat?01:14
clarkbI think the ideal we want here is to have DOCKER-USER explicitly allow traffic we want and drop anything else (so that we don't accidentally expose an elasticsearch). Then we can add that to our existing rules and restart docker for easy mode. Or update iptables rule setting to only drop and reset the chains we control and not affect the other docker* chains01:16
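
A rough sketch of the "explicitly allow, drop everything else" DOCKER-USER idea, written as a small generator of rule lines. The interface name eth0 and the port list are hypothetical, and this is not an agreed design, just one possible shape. It matches on the conntrack original destination port (--ctorigdstport) on the assumption that, by the time a packet reaches the FORWARD chain, docker's DNAT has already rewritten the destination to the container's port. RETURN is used instead of ACCEPT so allowed packets still pass through docker's own chains.

```python
def docker_user_rules(allowed_tcp_ports, external_iface="eth0"):
    """Build iptables rule lines for the DOCKER-USER chain.

    allowed_tcp_ports: published host ports that should stay reachable.
    external_iface: hypothetical name of the internet-facing interface.
    """
    rules = [
        # Be nice to established connections.
        "-A DOCKER-USER -m conntrack --ctstate RELATED,ESTABLISHED -j RETURN",
    ]
    for port in sorted(allowed_tcp_ports):
        # Match the port the client originally connected to, before DNAT.
        rules.append(
            f"-A DOCKER-USER -i {external_iface} -p tcp "
            f"-m conntrack --ctorigdstport {port} -j RETURN"
        )
    # Anything else arriving on the external interface is dropped, so an
    # accidentally published port (think elasticsearch) is not exposed.
    rules.append(f"-A DOCKER-USER -i {external_iface} -j DROP")
    return rules


for rule in docker_user_rules({443, 80}):
    print(rule)
```

Traffic entering from the docker bridge is not matched by the final DROP, so containers can still reach out, and their replies are covered by the first rule.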
openstackgerritTristan Cacqueray proposed openstack-infra/zuul-jobs master: run-test-command: support list in test_command variable  https://review.openstack.org/61088801:23
imacdonnNeed validation on an assumption ... I have a change which spans glance and glance_store ... if I put the gerrit URL of the glance_store change as a Depends-On in the commit message for the glance change, it should pull in both, right ?01:30
openstackgerritIan Wienand proposed openstack/diskimage-builder master: Fix unit tests for elements  https://review.openstack.org/61938701:31
openstackgerritIan Wienand proposed openstack/diskimage-builder master: package-installs: provide for skip from env var  https://review.openstack.org/61911901:31
openstackgerritIan Wienand proposed openstack/diskimage-builder master: simple-init: allow for NetworkManager support  https://review.openstack.org/61912001:31
ianwimacdonn: yes ... as long as the test is configured to install glance_store from the zuul checkout01:32
ianwwhich it most likely is01:32
imacdonnhmm01:32
imacdonnwhere would that be configured?01:32
*** rkukura has joined #openstack-infra01:33
ianwnot so much configured, but the test is written to do that.   tox based tests, devstack based tests should pretty much all get this right out of the box01:33
imacdonnthe test (module) just goes "import glance_store"01:34
ianwimacdonn: right, what particular test are you talking about?  it's where glance_store gets installed01:34
*** rkukura has quit IRC01:35
imacdonnopenstack-tox-py27 and openstack-tox-py35 both failed, but, oddly-enough, openstack-tox-py36 didn't01:35
ianwright, for those, you want to make sure glance_store is a required-project in your zuul config.  the tox runners in zuul-jobs will then install things correctly01:36
ianwin particular, see tox_install_siblings in http://git.openstack.org/cgit/openstack-infra/zuul-jobs/tree/roles/tox/README.rst01:37
imacdonnI don't see any mention of "openstack-tox-py" in the glance repo01:39
*** rcernin has quit IRC01:39
*** rcernin has joined #openstack-infra01:39
imacdonnnor "py36" for that matter ... I must be looking in the wrong place01:41
ianwthose jobs come in via the templates01:42
ianwimacdonn: i think what you're probably going to have to do is add glance-store to the jobs01:44
imacdonndo you have an example handy? ... of a project that does this properly01:44
ianwimacdonn: you can see a few places where it's done like http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/projects.yaml#n537401:45
ianwyou'll have to add entries like01:45
ianwopenstack-tox-py27:01:46
ianw  required-projects:01:46
ianw  - openstack/glance-store01:46
ianwthis is probably right, as i see glance-store is in the requirements.txt for glance, so it's a hard dependency01:46
ianw... but then again ...01:47
ianwyou're going to create a glance master that requires glance-store master to work01:47
imacdonnright01:48
ianwok, just pointing out that currently it tests against the pip version, where if you add it to required-projects, the tox role is going to install it from a git checkout01:52
imacdonnright .. hmm ... I guess I need to discuss this with the glance team01:58
*** annp has quit IRC02:04
*** mrsoul has quit IRC02:31
*** bhavikdbavishi has joined #openstack-infra02:46
*** jamesmcarthur has joined #openstack-infra03:13
openstackgerritIan Wienand proposed openstack/diskimage-builder master: Fix unit tests for elements  https://review.openstack.org/61938703:14
openstackgerritIan Wienand proposed openstack/diskimage-builder master: package-installs: provide for skip from env var  https://review.openstack.org/61911903:14
openstackgerritIan Wienand proposed openstack/diskimage-builder master: simple-init: allow for NetworkManager support  https://review.openstack.org/61912003:14
openstackgerritIan Wienand proposed openstack/diskimage-builder master: Caputure ramdisk test run logs  https://review.openstack.org/61940003:14
*** jamesmcarthur has quit IRC03:17
*** psachin has joined #openstack-infra03:19
*** fuentess has quit IRC03:20
*** BOKALDO has joined #openstack-infra03:25
*** bhavikdbavishi has quit IRC03:33
*** eernst has joined #openstack-infra03:33
*** bhavikdbavishi has joined #openstack-infra03:38
*** roman_g has quit IRC03:38
*** eernst has quit IRC03:38
*** ykarel|away has joined #openstack-infra03:49
*** ykarel|away is now known as ykarel03:56
openstackgerritRui Chen proposed openstack-infra/system-config master: Add #askopenlab into meetbot channels  https://review.openstack.org/61943203:57
*** BOKALDO has left #openstack-infra04:12
*** udesale has joined #openstack-infra04:15
*** rlandy|bbl is now known as rlandy04:21
*** janki has joined #openstack-infra04:32
*** yamamoto has quit IRC04:49
openstackgerritIan Wienand proposed openstack-infra/nodepool master: Add Fedora 29 testing  https://review.openstack.org/61867104:58
openstackgerritIan Wienand proposed openstack-infra/glean master: Add NetworkManager distro plugin support  https://review.openstack.org/61896404:59
openstackgerritIan Wienand proposed openstack-infra/system-config master: Remove pypi symlink  https://review.openstack.org/61943505:15
ianwinfra-root: ^ if we have a sec, this will make sure the volume is unreferenced05:16
*** armax has quit IRC05:17
*** ykarel has quit IRC05:24
*** yamamoto has joined #openstack-infra05:29
*** chandankumar has joined #openstack-infra05:33
*** chandankumar is now known as chkumar|ruck05:33
*** yamamoto has quit IRC05:39
*** ykarel has joined #openstack-infra05:40
*** rkukura has joined #openstack-infra05:40
AJaegergmann: could you review https://review.openstack.org/617662, please? That adds a new grenade py35 job to integrated gate06:01
openstackgerritMerged openstack-infra/project-config master: Add operations-docs to storyboard  https://review.openstack.org/61872206:05
openstackgerritTobias Henkel proposed openstack-infra/nodepool master: Only setup zNode caches in launcher  https://review.openstack.org/61944006:07
*** bhavikdbavishi has quit IRC06:07
openstackgerritTobias Henkel proposed openstack-infra/nodepool master: Fix print-zk tool for python3  https://review.openstack.org/61944106:10
*** yamamoto has joined #openstack-infra06:14
*** noama has joined #openstack-infra06:16
gmannAJaeger:  +1 on running it on integrated gate but one question inline for setting the branch.06:19
*** ifat_afek has joined #openstack-infra06:20
*** ahosam has joined #openstack-infra06:31
*** jamesmcarthur has joined #openstack-infra06:34
openstackgerritRui Chen proposed openstack-infra/irc-meetings master: Add OpenLab team meeting  https://review.openstack.org/61944606:39
*** jamesmcarthur has quit IRC06:39
AJaegerthanks, gmann - agree with your comment06:42
*** bhavikdbavishi has joined #openstack-infra06:46
*** bhavikdbavishi has quit IRC06:51
*** ahosam has quit IRC06:57
openstackgerritKartikeya Jain proposed openstack/diskimage-builder master: Adding support for SLES 15 in element 'sles'  https://review.openstack.org/61918606:57
*** slaweq has joined #openstack-infra06:59
*** quiquell|off is now known as quiquell07:01
openstackgerritTristan Cacqueray proposed openstack-infra/nodepool master: Implement an OpenShift resource provider  https://review.openstack.org/57066707:02
openstackgerritTristan Cacqueray proposed openstack-infra/nodepool master: Implement an OpenShift Pod provider  https://review.openstack.org/59033507:02
openstackgerritRui Chen proposed openstack-infra/project-config master: Add OpenLab channel #askopenlab into accessbot list  https://review.openstack.org/61945007:05
openstackgerritIan Wienand proposed openstack-infra/nodepool master: Add Fedora 29 testing  https://review.openstack.org/61867107:08
openstackgerritRui Chen proposed openstack-infra/irc-meetings master: Add OpenLab team meeting  https://review.openstack.org/61944607:08
*** aojea has joined #openstack-infra07:16
*** pcaruana has joined #openstack-infra07:22
*** ramishra has joined #openstack-infra07:30
*** jaosorior has joined #openstack-infra07:41
*** ykarel is now known as ykarel|lunch07:44
*** rpittau has joined #openstack-infra07:44
*** eumel8 has joined #openstack-infra07:51
*** kjackal has joined #openstack-infra07:56
*** xarses_ has quit IRC07:56
*** xarses_ has joined #openstack-infra07:57
*** ginopc has joined #openstack-infra08:11
*** bhavikdbavishi has joined #openstack-infra08:12
*** rcernin has quit IRC08:13
*** florianf has joined #openstack-infra08:15
*** jbadiapa has quit IRC08:16
*** ralonsoh has joined #openstack-infra08:18
*** shardy has joined #openstack-infra08:26
*** jbadiapa has joined #openstack-infra08:32
*** ykarel|lunch is now known as ykarel08:36
*** jpena|off is now known as jpena08:37
*** jbadiapa has quit IRC08:37
*** imacdonn has quit IRC08:38
*** xek has joined #openstack-infra08:42
*** tosky has joined #openstack-infra08:49
*** slaweq has quit IRC08:56
*** jpich has joined #openstack-infra09:01
openstackgerritIan Wienand proposed openstack/diskimage-builder master: Capture ramdisk test run logs  https://review.openstack.org/61940009:09
openstackgerritIan Wienand proposed openstack/diskimage-builder master: Fix unit tests for elements  https://review.openstack.org/61938709:09
openstackgerritIan Wienand proposed openstack/diskimage-builder master: package-installs: provide for skip from env var  https://review.openstack.org/61911909:09
openstackgerritIan Wienand proposed openstack/diskimage-builder master: simple-init: allow for NetworkManager support  https://review.openstack.org/61912009:09
openstackgerritIan Wienand proposed openstack-infra/glean master: Add NetworkManager distro plugin support  https://review.openstack.org/61896409:13
*** e0ne has joined #openstack-infra09:17
*** jbadiapa has joined #openstack-infra09:19
*** mrhillsman is now known as openlab09:19
*** openlab is now known as mrhillsman09:20
*** yboaron_ has joined #openstack-infra09:28
*** ifat_afek has quit IRC09:31
*** gfidente has joined #openstack-infra09:31
*** derekh has joined #openstack-infra09:34
*** Emine has joined #openstack-infra09:41
*** electrofelix has joined #openstack-infra09:52
*** dtantsur|afk is now known as dtantsur09:54
*** zigo has joined #openstack-infra10:03
*** apetrich has joined #openstack-infra10:06
openstackgerritHemanth Nakkina proposed openstack-infra/project-config master: Create airship-spyglass repo  https://review.openstack.org/61949310:07
*** bhavikdbavishi has quit IRC10:10
*** yamamoto has quit IRC10:14
*** ifat_afek has joined #openstack-infra10:15
*** ianychoi has quit IRC10:25
*** dpawlik has quit IRC10:37
*** dpawlik_ has joined #openstack-infra10:37
*** jpena is now known as jpena|off10:37
*** bhavikdbavishi has joined #openstack-infra10:41
*** xarses_ has quit IRC10:43
*** ramishra has quit IRC10:43
*** xarses_ has joined #openstack-infra10:43
*** toabctl has joined #openstack-infra10:54
*** sshnaidm|afk is now known as sshnaidm10:54
*** ifat_afek has quit IRC10:55
*** yamamoto has joined #openstack-infra10:57
*** bhavikdbavishi has quit IRC11:01
*** yamamoto has quit IRC11:05
*** hemanth_n has joined #openstack-infra11:07
*** ccamacho has quit IRC11:08
*** ccamacho has joined #openstack-infra11:10
*** yamamoto has joined #openstack-infra11:11
hemanth_nAJeager: Regarding https://review.openstack.org/#/c/619493 the code will be ready by next monday... once the project is created i would like to upload the codebase11:12
hemanth_nAjaeger: Regarding https://review.openstack.org/#/c/619493 the code will be ready by next monday... once the project is created i would like to upload the codebase11:13
*** udesale has quit IRC11:15
*** ifat_afek has joined #openstack-infra11:19
*** ianychoi has joined #openstack-infra11:31
AJaegerhemanth_n: commented on review - I think you better WIP the change (workflow-1) and use the upstream field to import code you have...11:32
hemanth_nAjaeger: Ok thanks..11:33
hemanth_nAJaeger: One query on upstream field.. is it possible to just pull one branch instead of all branches11:35
AJaegerno, we import whole repo. But you can delete branches... Clone the repo to another place, delete what you don't need...11:35
hemanth_nAJaeger: Okie.. thanks11:36
openstackgerritFabien Boucher proposed openstack-infra/nodepool master: Add ip-pool option to the openstack provider  https://review.openstack.org/61952511:38
*** janki has quit IRC11:43
*** dims has quit IRC11:45
*** e0ne has quit IRC11:46
*** yboaron_ has quit IRC11:50
openstackgerritBrendan proposed openstack-infra/zuul master: Update change URL for Gerrit v2.16  https://review.openstack.org/61953311:55
*** ralonsoh has quit IRC11:58
*** hemanth_n has quit IRC11:58
*** e0ne has joined #openstack-infra11:59
*** ralonsoh has joined #openstack-infra11:59
*** yamamoto has quit IRC11:59
*** yamamoto has joined #openstack-infra12:00
openstackgerritJesse Pretorius (odyssey4me) proposed openstack-infra/openstack-zuul-jobs master: Correct MariaDB proxy suffix  https://review.openstack.org/61953712:04
odyssey4memnaser ^12:04
*** kjackal has quit IRC12:07
mnaserthanks odyssey4me12:07
openstackgerritJakub Sliva proposed openstack-infra/project-config master: Create repository for ceilometer-dashboard plugin  https://review.openstack.org/61923512:13
*** ginopc has quit IRC12:15
*** ramishra has joined #openstack-infra12:17
*** apetrich has quit IRC12:18
*** kjackal has joined #openstack-infra12:18
*** bobh has joined #openstack-infra12:20
*** ginopc has joined #openstack-infra12:29
*** yamamoto has quit IRC12:30
*** kjackal has quit IRC12:31
*** kjackal has joined #openstack-infra12:31
*** apetrich has joined #openstack-infra12:38
fricklerinfra-root: review.o.o is pretty slow for me currently. there is also a peak of traffic at http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=45&rra_id=all12:39
* prometheanfire just realized he needed to restart neomutt to see the new folder for the new list...12:42
AJaegeroh, "gerrit show-queue" shows a stuck queue since 2nd of November for openstack/networking-odl - anything we need to care about?12:42
*** witek has joined #openstack-infra12:51
openstackgerritChandan Kumar proposed openstack-infra/elastic-recheck master: Add query for ssh timeout in tempest scenario for bug 1802971  https://review.openstack.org/61757912:52
openstackbug 1802971 in tripleo "tempest volume_boot_pattern and basic_ops running concurrently causing timeouts" [Critical,Triaged] https://launchpad.net/bugs/180297112:52
*** yamamoto has joined #openstack-infra13:01
mnaseris it me or gerrit seems pretty slow13:05
kaiokmomnaser: same13:05
*** yamamoto has quit IRC13:06
*** yamamoto has joined #openstack-infra13:08
*** yamamoto has quit IRC13:08
openstackgerritSorin Sbarnea proposed openstack-infra/elastic-recheck master: Made elastic-recheck py3 compatible  https://review.openstack.org/61657813:15
*** bobh has quit IRC13:25
*** yamamoto has joined #openstack-infra13:26
fricklerinfra-root: gerrit still seems to suffer from memory shortage, started swapping at 10:30 when also the load peaked http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=29&rra_id=all13:28
fricklerI'm not sure whether I should leave it that way in the hope that it will either recover or someone will find time to debug further. the other option would be a restart which may or may not resolve the issue13:29
fricklerand I'll be pretty much offline for a long weekend soon myself13:30
fungii'm not in a position to troubleshoot it yet, but it may recover on its own if the traffic burst doesn't persist13:32
fungiotherwise i probably have to dig into logs and state tables to find abusers and block them with iptables rules13:33
fricklerfungi: the traffic seems to have normalized, while memory usage hasn't. the logs didn't show anything obvious for me. there seems to be one bot at work, but I see requests from it also before the issue started13:34
fungifrickler: if it helps, gerrit admins can also view the javamelody monitoring interface for gerrit at https://review.openstack.org/monitoring once authenticated13:39
*** slaweq has joined #openstack-infra13:41
*** yamamoto has quit IRC13:41
fungisluggishness usually transpires when there is heavy jvm garbage collection activity13:42
fungi(you have to expand "other charts" to see that)13:43
fungilooks like there was a gc burst around 12:15z13:43
fungibut it hasn't continued13:43
fricklerfungi: it's not garbage collection, that shows only a single peak an hour ago13:43
*** udesale has joined #openstack-infra13:44
fungiso this isn't the sort of thing we were seeing regularly in the past13:44
*** pcaruana has quit IRC13:50
*** fuentess has joined #openstack-infra13:52
*** ifat_afek has quit IRC13:58
*** chkumar|ruck has quit IRC14:02
*** zul has joined #openstack-infra14:06
*** slaweq has quit IRC14:07
*** mnencia has quit IRC14:07
*** pbourke has quit IRC14:09
*** mnencia has joined #openstack-infra14:09
*** pbourke has joined #openstack-infra14:11
*** dims has joined #openstack-infra14:14
*** rfolco has quit IRC14:15
*** rfolco has joined #openstack-infra14:16
*** pcaruana has joined #openstack-infra14:25
*** emine__ has joined #openstack-infra14:30
*** Emine has quit IRC14:30
*** ykarel is now known as ykarel|away14:34
openstackgerritTobias Henkel proposed openstack-infra/nodepool master: Add second level cache of nodes  https://review.openstack.org/61902514:37
openstackgerritTobias Henkel proposed openstack-infra/nodepool master: Add second level cache to node requests  https://review.openstack.org/61906914:37
openstackgerritTobias Henkel proposed openstack-infra/nodepool master: Only setup zNode caches in launcher  https://review.openstack.org/61944014:37
openstackgerritTobias Henkel proposed openstack-infra/nodepool master: Asynchronously update node statistics  https://review.openstack.org/61958914:37
*** ykarel|away has quit IRC14:47
*** quiquell is now known as quiquell|off14:53
*** yamamoto has joined #openstack-infra14:55
*** psachin has quit IRC14:55
*** bhavikdbavishi has joined #openstack-infra14:55
*** emine__ has quit IRC15:02
*** yamamoto has quit IRC15:03
*** quiquell|off is now known as quiquell15:15
mnaseris it possible to get a bigger instance for gerrit?15:15
dmsimarddepends what the problem is15:16
dmsimardit's probably not very loaded right now, a lot of people are off because of thanksgiving15:16
dmsimardI did get a 502 proxy error earlier and frickler mentioned issues as well15:16
*** kjackal has quit IRC15:17
*** kjackal_v2 has joined #openstack-infra15:17
fungiit's a 60gb ram instance already15:18
funginormally it doesn't (and shouldn't) need anywhere near that amount of memory15:18
dmsimardjava (gerrit) is at ~500% CPU and 86% memory15:19
dmsimardwhich is probably not normal15:19
fungithe javamelody graph says it's been hovering around 8gb of used memory in the jvm up until around 10:30z today when it started to grow to around 40gb15:19
*** jistr is now known as jistr|call15:20
mnaser<insert joke about thanksgiving and gerrit is also hungry and eating a lot of memory instead of turkey>15:20
fungiso a fivefold growth in memory utilization over the course of ~4 hours15:20
fungibloated! ;)15:20
dmsimardfungi: looks like there's something going on *right now* according to cacti http://cacti.openstack.org/cacti/graph.php?action=zoom&local_graph_id=25&rra_id=0&view_type=tree&graph_start=1542035940&graph_end=154289994015:20
dmsimarda very abnormal cpu spike15:20
fungii'll see what i can spot15:20
dmsimardSeeing a lot of these: [Thu Nov 22 15:21:12.292112 2018] [mpm_event:error] [pid 48415:tid 140553527216000] AH00485: scoreboard is full, not at MaxRequestWorkers15:21
*** kjackal_v2 has quit IRC15:21
*** kjackal has joined #openstack-infra15:22
fungiyeah, that likely indicates we have a lot of additional established connections to apache than usual15:22
fungier, a lot more than usual15:22
fungiwhich could also just be a symptom of gerrit being slow to respond for some other reason, clients end up connected longer waiting for responses so they pile up15:23
dmsimardyeah, was thinking the same thing15:24
dmsimardI feel like I remember something about Zuul switching to the http API for gerrit (instead of SSH) -- or was it the other way around ?15:26
dmsimardI ask because there's a lot of ssh login/logout from zuul although it's probably normal/expected if we're still using SSH and my memory is failing me15:27
*** jistr|call is now known as jistr15:28
fungii think it still needs ssh for the event stream15:28
fungibut uses the rest api for reporting now15:29
dmsimardah, makes sense15:30
fungi>600 established connections to the gerrit ssh api right now, though i don't know whether that's typical15:31
*** ccamacho has quit IRC15:31
*** ccamacho has joined #openstack-infra15:32
fungihighest offending ip address only has 34 connections. not great, but not necessarily indicative of a runaway system15:33
dmsimardwhat's the url for melody again? doesn't seem mentioned in https://docs.openstack.org/infra/system-config/gerrit.html15:33
fungiit's /monitoring15:34
mnaseri think it's completely fallen over now15:34
fungihighest offending account for established ssh api connections is 53, so also not near our set max limit15:35
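
For reference, a per-address tally like the one fungi mentions could be produced along these lines. This is a sketch, not necessarily how it was done here: it assumes text in the column layout of plain `ss -tn` output (State, Recv-Q, Send-Q, local, peer), filters on Gerrit's default ssh port 29418, and the SAMPLE addresses are made-up documentation addresses.

```python
from collections import Counter


def connections_per_peer(ss_output, local_port=29418):
    """Count established connections to local_port, grouped by peer address."""
    counts = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header and any socket that is not established.
        if len(fields) < 5 or fields[0] != "ESTAB":
            continue
        local, peer = fields[3], fields[4]
        # The port is whatever follows the last colon (also true for IPv6).
        if local.rsplit(":", 1)[-1] != str(local_port):
            continue
        counts[peer.rsplit(":", 1)[0]] += 1
    return counts


# Hypothetical sample data, for illustration only.
SAMPLE = """\
State  Recv-Q Send-Q Local Address:Port  Peer Address:Port
ESTAB  0      0      203.0.113.5:29418   198.51.100.7:40112
ESTAB  0      0      203.0.113.5:29418   198.51.100.7:40113
ESTAB  0      0      203.0.113.5:29418   [2001:db8::9]:51000
ESTAB  0      0      203.0.113.5:443     198.51.100.7:40200
TIME-WAIT 0   0      203.0.113.5:29418   198.51.100.8:40300
"""

counts = connections_per_peer(SAMPLE)
print(sum(counts.values()), counts.most_common(1))
```

The total gives the overall established count and most_common(1) gives the "highest offending" address.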
dmsimardfungi: in /monitoring you see a particular request that has been running for a very long time -- should we attempt to kill that ?15:36
dmsimardsays it's been running for 312 minutes15:36
dmsimard~5 hours ago15:36
fungithe /changes/?q=is:watched+is:merged&n=25&O=81 one?15:37
scaschiming in to mention that i'm also seeing a 502 proxy error15:37
dmsimardfungi: there's a few, actually: https://screenshots.firefox.com/wCp17ZUj4NP5JEFg/review.openstack.org15:38
scassomething something holidays15:38
dmsimardscas: we're on it :)15:38
scas<315:38
AJaegerdmsimard: there are 5 of these - interesting15:39
fungiif you expand the request details there are 5 of those, all long-running15:39
fungiyeah15:39
dmsimardfungi: may I kill them ?15:39
AJaeger+115:39
dmsimardit sounds like it's an odd query as well... "is watched and is merged" ? that'd return a LOT of changes15:40
fungii wish it told you the account as well15:40
scasif someone's filters weren't set up, yeah, it would. i use that query on rare occasion, but i have filters15:40
fungii'm suspecting the active stackalytics account is running those, but it's not clear15:41
dmsimardwe can search the thread numbers in the logs15:41
scasit was not me issuing that, however, as i'm just a bystander15:41
dmsimardfor example the first one turns up15:41
dmsimard[2018-11-22 03:14:33,810] [HTTP-4277706] WARN  com.google.gerrit.server.extensions.events.EventUtil : Error in listener com.google.gerrit.server.events.StreamEventsApiListener for event com.google.gerrit.server.extensions.events.CommentAdded: null15:41
fungioh, actually not stackalytics as it still queries via the ssh api15:42
fungigo ahead and kill those long-running rest api queries, though i have doubts it will help, it's a fair test15:43
dmsimardeh, getting a 502 proxy error when attempting to kill15:44
fungifun!15:44
*** psachin has joined #openstack-infra15:44
dmsimardI guess we're looking at a restart ?15:44
*** mdbooth has joined #openstack-infra15:44
mdboothI'm very conscious that I only ever join here when things are going wrong :)15:44
fungithe kill button warns me "Killing a Java thread is not recommended and unsafe."15:45
mdboothOr at least I suspect they are. Is gerrit down for you, too?15:45
AJaegermdbooth: team is on it...15:45
mdboothAJaeger: Awesome, thanks.15:45
fungithe 502 proxy error is generally just apache giving up after the java web service takes too long to return a response15:46
AJaegerfungi, shall I send an alert out?15:46
dmsimardfungi: this is what searching for the http thread IDs in the error log turns up: http://paste.openstack.org/show/735947/15:46
fungii'm managing to kill them15:46
AJaegerWhat about? #status alert review.openstack.org is currently under high load, we're investigating who works on Thanksgiving...15:47
AJaegerOr notice?15:47
dmsimardfungi: load has already significantly gone down15:47
dmsimardand gerrit is reachable again15:48
fungii think it started to become reachable on its own before i killed those threads, but it's hard to be certain15:48
*** psachin has quit IRC15:48
dmsimardAJaeger: I think we should be good -- a notice to mention the issue, the recovery and that we're keeping an eye on things is probably appropriate15:49
*** quiquell is now known as quiquell|off15:49
fungimost of those threads get reused for different things over time, so the entries closer to when the issue started (~10:00z today) are likely more relevant15:49
AJaegerSo, #status notice review.openstack.org was under heavy load and should recover now; infra team keeps monitoring ?15:50
AJaegerOr just leave it, since we hope it's fine?15:50
scasas a drive-by observer, a quick note making people aware that you're aware goes a long way to making sure when it happens again it won't be a (big) shock15:51
fungidmsimard: looking at that subset of the error log, i don't think any of those are relevant15:51
funginone are for queries15:52
fungithey're storyboard plugin and stream events and comment added15:52
AJaegerfungi, dmsimard, memory is still high, isn't it?15:52
fungiin theory it should get garbage-collected15:53
AJaegerquestion is when...15:53
fungithe gc thread wakes up periodically, or we can trigger it through javamelody on demand15:53
dmsimardfungi: yeah, neither do I -- I looked at the context too and there's no traces or anything.. searching for the actual error did turn up a bug in "its-base" which might be used by storyboard ?15:53
dmsimardhttps://bugs.chromium.org/p/gerrit/issues/detail?id=743515:53
* AJaeger needs to step out15:56
fungiits-base is the issue tracker base plugin, which the its-storyboard plugin is built on15:56
*** tobias-urdin has joined #openstack-infra15:58
dmsimard#status log We have recovered from high cpu usage on review.openstack.org by killing several requests in melody that had been running for several hours and brought gerrit to a crawl with proxy errors. Requests looked like this: "/changes/?q=is:watched+is:merged&n=25&O=81 GET" but we haven't been able to identify where these requests came from.15:58
openstackstatusdmsimard: finished logging15:58
fungii'm going to check for those query strings in the gerrit ssh and apache logs16:00
dmsimardoh, good idea16:00
dmsimardthere's matches in gerrit-ssl-access.log16:00
*** mgutehall has quit IRC16:02
fungioh, cool, i'll leave you to that. still trying to deal with things in the kitchen here at the same time16:02
dmsimardThere's requests from that IP with the gertty user agent but those particular hits came from a chrome agent16:02
dmsimardsure, I'll try and find out16:02
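The log search discussed above can be sketched roughly as follows. This is a minimal illustration, not the commands actually run: it assumes gerrit-ssl-access.log uses Apache's "combined" log format (the real layout may differ), and `summarize`, `LINE_RE`, and `NEEDLE` are hypothetical names chosen for this example. It counts requests whose path contains the suspicious query string, grouped by client IP, authenticated user, and user agent, which is the distinction drawn between the gertty and chrome hits.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format. Adjust the pattern if the
# server's LogFormat differs.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Query string taken from the status log entry in this channel.
NEEDLE = "q=is:watched+is:merged"


def summarize(lines, needle=NEEDLE):
    """Count matching requests per (ip, user, user agent) tuple."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        # Lines that do not parse are skipped rather than treated as errors.
        if match and needle in match.group("path"):
            key = (match.group("ip"), match.group("user"), match.group("agent"))
            hits[key] += 1
    return hits
```

Passing an open file object to `summarize` and printing `hits.most_common()` would list the heaviest requesters first.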
*** mgutehall has joined #openstack-infra16:03
dmsimardfungi: you're off today anyway right?16:03
*** aojea has quit IRC16:08
*** jamesmcarthur has joined #openstack-infra16:09
fungisome of today16:09
openstackgerritMerged openstack-infra/openstack-zuul-jobs master: Correct MariaDB proxy suffix  https://review.openstack.org/61953716:10
fungiit's a good excuse to get together with friends who have more rigid work schedules and are off for a holiday16:10
*** ykarel|away has joined #openstack-infra16:11
funginot one of my preferred holidays personally what with the oppressive religious overtones and whitewashing of imperialist genocide16:11
dmsimard++16:11
dmsimardI've identified the username who ran the queries in question16:12
dmsimardBut I'm not entirely positive on how to match that username to a name/email16:12
fungiyou can start a query on the gerrit webui like reviewer:thatusername and see what it tries to autocomplete with16:18
fungithough there are also rest api methods to get user info, and as a last resort the external_id column of the account_external_ids table in the backend db can be queried for a match on "username:thatusername"16:19
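A rough sketch of the REST API route mentioned above, under stated assumptions: it assumes the Gerrit accounts endpoint accepts a username as the account identifier (worth checking against the documentation for the deployed Gerrit version), and that the caller has permission to see the fields of interest. `account_url` and `parse_gerrit_json` are hypothetical helper names. Gerrit prefixes JSON responses with a `)]}'` line to guard against cross-site script inclusion, so that prefix is stripped before parsing.

```python
import json
from urllib.parse import quote

# Gerrit prepends this marker to JSON response bodies.
XSSI_PREFIX = ")]}'"


def account_url(base_url, username):
    """Build the account lookup URL for a username (assumed identifier form)."""
    return f"{base_url.rstrip('/')}/accounts/{quote(username, safe='')}"


def parse_gerrit_json(body):
    """Strip the XSSI guard prefix, if present, and decode the JSON payload."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)
```

The response body fetched from `account_url(...)` with any HTTP client can be handed to `parse_gerrit_json`; which fields come back (name, email) depends on server configuration and the caller's permissions.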
*** bhavikdbavishi has quit IRC16:19
dmsimardI was ~75% certain of a match but an IRC whois confirmed it :p16:20
*** bhavikdbavishi has joined #openstack-infra16:20
dmsimardSent a PM to see if we can figure out what happened16:20
fungithe jvm gc seems to have woken up and started to free up references too16:21
fungikeep in mind that we only have at best anecdotal evidence to suggest those query threads were the problem16:21
dmsimardyup16:22
fungijust because it started to recover roughly when i killed them doesn't mean they were the cause, or in any way related for that matter16:22
dmsimardI /could/ try to reproduce the issue heh16:23
*** bhavikdbavishi1 has joined #openstack-infra16:23
fungiwe probably should try that at some point, yes16:23
*** bhavikdbavishi has quit IRC16:24
*** bhavikdbavishi1 is now known as bhavikdbavishi16:24
dmsimardsearching for "is:watched is:merged" returns instantly for me, so might be a red herring16:28
*** e0ne has quit IRC16:28
*** lbragstad has joined #openstack-infra16:28
*** roman_g has joined #openstack-infra16:28
clarkbdmsimard: is:watched means watched by you, I think16:30
clarkbso different users' watch lists could have different costs to query16:30
*** ramishra has quit IRC16:32
*** Emine has joined #openstack-infra16:35
*** lbragstad has quit IRC16:39
*** roman_g has quit IRC16:41
*** armax has joined #openstack-infra16:41
*** kjackal has quit IRC16:44
*** kjackal has joined #openstack-infra16:44
*** bhavikdbavishi has quit IRC16:48
*** udesale has quit IRC16:48
*** diablo_rojo has joined #openstack-infra16:58
*** ginopc has quit IRC17:01
*** shardy has quit IRC17:05
*** mdbooth_ has joined #openstack-infra17:16
*** mdbooth has quit IRC17:19
*** jpich has quit IRC17:20
*** e0ne has joined #openstack-infra17:25
*** mdbooth_ has quit IRC17:27
*** Emine has quit IRC17:34
*** dtantsur is now known as dtantsur|afk17:35
*** jamesmcarthur has quit IRC17:35
*** jamesmcarthur has joined #openstack-infra17:39
openstackgerritDirk Mueller proposed openstack-infra/openstack-zuul-jobs master: use opensuse15 as generic name instead of opensuse150  https://review.openstack.org/61962817:40
*** rossella_s has quit IRC17:40
tobias-urdincan you request op in a channel using accessbot? would like to be able to set topic in #puppet-openstack if that's allowed since it displays CI status17:42
*** gfidente is now known as gfidente|afk17:44
fungiwe just add people as chanops for specific channels on request. in this case as ptl for a project where that's the preferred irc channel for discussions, i have no problem adding you to the access list there17:44
fungithe global access lists are managed by accessbot more for bootstrapping purposes, so we don't end up with channels where nobody can add you as an op17:45
openstackgerritMarkus Hosch proposed openstack-infra/zuul master: Add config parameters for WinRM timeouts  https://review.openstack.org/61963017:46
fungitobias-urdin: freenode -- ChanServ (ChanServ@services.): Flags +Aefortv were set on tobasco in #puppet-openstack.17:46
fungitobias-urdin: you should now be able to `/msg chanserv #puppet-openstack topic This is our fancy new topic!`17:47
fungier, that should be `/msg chanserv topic #puppet-openstack This is our fancy new topic!`17:47
*** e0ne has quit IRC17:49
*** kjackal has quit IRC17:49
*** kjackal_v2 has joined #openstack-infra17:49
*** jamesmcarthur has quit IRC17:56
*** jamesmcarthur has joined #openstack-infra18:00
fungi#status log manually triggered gerrit's jvm garbage collection from the javamelody interface, freeing some 40gb of used memory within the jvm18:00
openstackstatusfungi: finished logging18:00
*** derekh has quit IRC18:01
*** jamesdenton has joined #openstack-infra18:04
*** e0ne has joined #openstack-infra18:07
*** jamesmcarthur has quit IRC18:08
*** ykarel|away has quit IRC18:09
AJaegerwow, 40GB! Thanks, fungi !18:15
*** e0ne has quit IRC18:15
funginp18:18
* fungi disappears for some hours18:18
*** armax has quit IRC18:21
AJaegerfungi, I don't see that one yet on http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=27&rra_id=all18:33
AJaegerfungi, enjoy Thanksgiving18:33
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Fix manual dequeue of github items  https://review.openstack.org/61927218:39
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Retry queries for commits  https://review.openstack.org/57513718:40
*** jamesmcarthur has joined #openstack-infra18:45
*** rpittau has quit IRC18:46
*** lpetrut has joined #openstack-infra18:46
*** electrofelix has quit IRC18:53
*** jamesmcarthur has quit IRC18:53
*** jamesmcarthur has joined #openstack-infra18:57
*** yamamoto has joined #openstack-infra19:00
*** jamesmcarthur has quit IRC19:02
*** armax has joined #openstack-infra19:03
*** yamamoto has quit IRC19:05
*** noama has quit IRC19:05
*** Emine has joined #openstack-infra19:08
*** apetrich has quit IRC19:16
*** auristor has quit IRC19:16
*** apetrich has joined #openstack-infra19:17
dmsimardAJaeger: I believe the RAM is still allocated to the JVM; it's within the JVM that it has been freed up19:35
dmsimard"Java memory used: 8,111 Mb / 47,378 Mb "19:35
AJaegerI see - thanks19:40
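To make the used-versus-allocated distinction concrete, here is a small sketch that parses a memory line in the form quoted above and reports how much of the JVM's heap is in use. The line format is taken from the quoted text only; `heap_usage` is a hypothetical helper and other javamelody versions may word the line differently.

```python
import re


def heap_usage(line):
    """Return (used_mb, max_mb, percent_used) from a 'used / max' memory line."""
    match = re.search(r"([\d,]+)\s*Mb\s*/\s*([\d,]+)\s*Mb", line)
    if not match:
        raise ValueError("unrecognized memory line")
    # Drop thousands separators before converting to integers.
    used, total = (int(group.replace(",", "")) for group in match.groups())
    return used, total, 100.0 * used / total
```

For the quoted line, this puts in-use heap at roughly 17% of what the JVM has reserved, which fits with the host-level graph not showing a drop.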
*** Emine has quit IRC19:48
*** diablo_rojo has quit IRC19:59
*** kjackal_v2 has quit IRC20:10
*** kjackal has joined #openstack-infra20:11
openstackgerritMerged openstack-infra/nodepool master: Move k8s install to pre playbook  https://review.openstack.org/61769920:12
openstackgerritMerged openstack-infra/zuul master: Add allowed-triggers and allowed-reporters tenant settings  https://review.openstack.org/55408220:14
openstackgerritMerged openstack-infra/zuul master: encrypt_secret: support self-signed certificates via --insecure argument  https://review.openstack.org/61728120:14
openstackgerritMerged openstack-infra/zuul-jobs master: upload-logs-swift: Turn FileList into a context manager  https://review.openstack.org/59285020:41
openstackgerritMerged openstack-infra/zuul-jobs master: upload-logs-swift: Keep the FileList in the indexer class  https://review.openstack.org/59285120:41
openstackgerritIan Wienand proposed openstack-infra/nodepool master: [wip] Add Fedora 29 testing  https://review.openstack.org/61867120:55
*** gfidente|afk has quit IRC21:04
*** bobh has joined #openstack-infra21:13
*** yamamoto has joined #openstack-infra21:22
*** lpetrut has quit IRC21:23
*** slaweq has joined #openstack-infra21:23
*** bobh has quit IRC21:26
*** yamamoto has quit IRC21:37
*** kjackal has quit IRC21:38
*** yamamoto has joined #openstack-infra21:40
*** ralonsoh has quit IRC21:47
*** lbragstad has joined #openstack-infra21:54
*** rcernin has joined #openstack-infra22:05
*** xek_ has joined #openstack-infra22:06
*** yamamoto has quit IRC22:07
*** yamamoto has joined #openstack-infra22:07
*** yamamoto has quit IRC22:08
*** xek has quit IRC22:09
*** slaweq has quit IRC22:15
*** auristor has joined #openstack-infra22:19
*** jtomasek has quit IRC22:20
*** xek__ has joined #openstack-infra22:25
*** xek_ has quit IRC22:27
*** xarses_ has quit IRC22:42
*** xarses_ has joined #openstack-infra22:42
*** yamamoto has joined #openstack-infra22:46
*** ahosam has joined #openstack-infra22:47
*** ahosam has quit IRC22:57
*** yamamoto has quit IRC22:57
*** pfallenop has quit IRC23:04
*** pfallenop has joined #openstack-infra23:05
*** yamamoto has joined #openstack-infra23:12
*** lbragstad has quit IRC23:23
*** tosky has quit IRC23:31
*** jamesmcarthur has joined #openstack-infra23:58
