Thursday, 2018-10-25

*** gyee has quit IRC00:14
*** jamesmcarthur has joined #openstack-infra00:17
*** openstackgerrit has joined #openstack-infra00:24
openstackgerritMarcH proposed openstack-infra/git-review master: Make it possible to configure draft as default push mode  https://review.openstack.org/22042600:24
*** smarcet has joined #openstack-infra00:26
*** longkb has joined #openstack-infra00:55
*** jamesmcarthur has quit IRC00:58
*** jamesmcarthur has joined #openstack-infra01:01
*** diablo_rojo has quit IRC01:07
*** rlandy|bbl is now known as rlandy01:09
*** longkb has quit IRC01:10
*** carl_cai has joined #openstack-infra01:18
*** mrsoul has quit IRC01:19
*** diablo_rojo has joined #openstack-infra01:20
*** smarcet has quit IRC01:20
*** hongbin has joined #openstack-infra01:21
*** jamesmcarthur has quit IRC01:23
*** smarcet has joined #openstack-infra01:26
*** efried has quit IRC01:49
*** jamesmcarthur has joined #openstack-infra01:51
*** jamesmcarthur has quit IRC01:56
*** efried has joined #openstack-infra02:01
*** smarcet has quit IRC02:02
*** anteaya has quit IRC02:04
*** felipemonteiro has joined #openstack-infra02:08
*** tinwood has quit IRC02:10
*** tinwood has joined #openstack-infra02:11
*** bobh has joined #openstack-infra02:13
*** agopi has joined #openstack-infra02:13
*** apetrich has quit IRC02:16
*** longkb has joined #openstack-infra02:18
*** jamesmcarthur has joined #openstack-infra02:20
*** jamesmcarthur_ has joined #openstack-infra02:27
*** jamesmcarthur has quit IRC02:27
*** munimeha1 has quit IRC02:30
*** roman_g_ has quit IRC02:47
*** psachin has joined #openstack-infra02:53
*** jamesmcarthur_ has quit IRC03:10
*** bhavikdbavishi has joined #openstack-infra03:18
*** diablo_rojo has quit IRC03:21
*** jesusaur has joined #openstack-infra03:21
*** lpetrut has joined #openstack-infra03:30
*** bobh has quit IRC03:32
*** felipemonteiro has quit IRC03:32
*** ramishra has joined #openstack-infra03:37
*** cfriesen has quit IRC03:50
*** ykarel|away has joined #openstack-infra03:54
*** ykarel|away is now known as ykarel03:54
*** lpetrut has quit IRC03:56
*** lbragstad has quit IRC04:00
*** hongbin has quit IRC04:08
*** janki has joined #openstack-infra04:12
*** udesale has joined #openstack-infra04:27
*** felipemonteiro has joined #openstack-infra04:28
*** ykarel has quit IRC04:30
*** armax has quit IRC04:38
openstackgerritMerged openstack-infra/irc-meetings master: Remove Glare meeting  https://review.openstack.org/61269304:40
*** ykarel has joined #openstack-infra04:49
*** spsurya has joined #openstack-infra04:50
*** larainema has joined #openstack-infra04:55
openstackgerritTobias Henkel proposed openstack-infra/zuul-jobs master: WIP: Add sar logging roles  https://review.openstack.org/61311204:57
openstackgerritTobias Henkel proposed openstack-infra/zuul-jobs master: Pin flake8  https://review.openstack.org/61319405:07
*** armax has joined #openstack-infra05:08
*** carl_cai has quit IRC05:08
openstackgerritIan Wienand proposed openstack-infra/nodepool master: Prepend exception output with time, date and thread  https://review.openstack.org/61319605:10
*** kjackal has joined #openstack-infra05:16
*** felipemonteiro has quit IRC05:16
openstackgerritTobias Henkel proposed openstack-infra/zuul-jobs master: WIP: Add sar logging roles  https://review.openstack.org/61311205:22
*** rlandy has quit IRC05:31
openstackgerritSimon Westphahl proposed openstack-infra/zuul master: Fix issue in Github connection with large diffs  https://review.openstack.org/61298905:32
*** armax has quit IRC05:39
openstackgerritSimon Westphahl proposed openstack-infra/zuul master: Fix issue in Github connection with large diffs  https://review.openstack.org/61298905:49
openstackgerritTobias Henkel proposed openstack-infra/zuul-jobs master: Add sar logging roles  https://review.openstack.org/61311206:01
*** tobiash_ has quit IRC06:03
*** lpetrut has joined #openstack-infra06:03
*** tobiash has joined #openstack-infra06:04
openstackgerritAndreas Jaeger proposed openstack-infra/zuul-jobs master: Fix flake8 3.6.0 errors  https://review.openstack.org/61320506:11
openstackgerritOpenStack Proposal Bot proposed openstack-infra/project-config master: Normalize projects.yaml  https://review.openstack.org/61320606:13
*** gfidente has joined #openstack-infra06:26
*** kjackal_v2 has joined #openstack-infra06:34
*** kopecmartin|off is now known as kopecmartin06:34
*** slaweq has joined #openstack-infra06:35
*** kjackal has quit IRC06:37
openstackgerritTobias Henkel proposed openstack-infra/zuul-jobs master: Add sar logging roles  https://review.openstack.org/61311206:42
*** AJaeger has quit IRC06:43
*** aojea has joined #openstack-infra06:45
*** quiquell|off is now known as quiquell06:49
*** xek has joined #openstack-infra06:49
*** AJaeger has joined #openstack-infra06:57
*** xek has quit IRC06:59
*** rcernin has quit IRC07:00
*** apetrich has joined #openstack-infra07:00
*** pcaruana has joined #openstack-infra07:04
*** cfriesen has joined #openstack-infra07:08
*** ccamacho has joined #openstack-infra07:09
openstackgerritTobias Henkel proposed openstack-infra/zuul-jobs master: DNM: Run tox with eatmydata  https://review.openstack.org/61322107:11
*** SpamapS has quit IRC07:11
openstackgerritTobias Henkel proposed openstack-infra/zuul master: DNM: Enable sar logging for unit tests  https://review.openstack.org/61311707:11
openstackgerritMerged openstack-infra/project-config master: Normalize projects.yaml  https://review.openstack.org/61320607:15
*** jpena|off is now known as jpena07:15
*** ginopc has joined #openstack-infra07:16
openstackgerritTobias Henkel proposed openstack-infra/zuul master: DNM: pass LD_PRELOAD and LD_LIBRARY_PATH vars  https://review.openstack.org/61322207:20
*** SpamapS has joined #openstack-infra07:24
*** hashar has joined #openstack-infra07:25
openstackgerritTobias Henkel proposed openstack-infra/zuul-jobs master: DNM: Run tox with eatmydata  https://review.openstack.org/61322107:37
openstackgerritMerged openstack-infra/zuul-jobs master: Fix flake8 3.6.0 errors  https://review.openstack.org/61320507:37
openstackgerritTobias Henkel proposed openstack-infra/zuul master: DNM: Enable sar logging for unit tests  https://review.openstack.org/61311707:37
openstackgerritTobias Henkel proposed openstack-infra/zuul master: DNM: pass LD_PRELOAD and LD_LIBRARY_PATH vars  https://review.openstack.org/61322207:37
*** carl_cai has joined #openstack-infra07:42
*** Emine has joined #openstack-infra07:46
*** cfriesen has quit IRC07:48
*** ykarel is now known as ykarel|lunch07:59
*** jpich has joined #openstack-infra08:00
*** ccamacho has quit IRC08:00
*** ccamacho has joined #openstack-infra08:01
openstackgerritTobias Henkel proposed openstack-infra/zuul-jobs master: DNM: Run tox with eatmydata  https://review.openstack.org/61322108:03
*** bhavikdbavishi has quit IRC08:11
openstackgerritNatal Ngétal proposed openstack/gertty master: [Documentation] Add a link for aur.  https://review.openstack.org/61323808:21
*** roman_g has joined #openstack-infra08:32
*** ykarel|lunch is now known as ykarel08:34
*** e0ne has joined #openstack-infra08:42
*** electrofelix has joined #openstack-infra08:54
*** ccamacho has quit IRC08:54
*** ccamacho has joined #openstack-infra08:55
*** xek has joined #openstack-infra08:57
*** dtantsur|afk is now known as dtantsur09:09
*** tosky has joined #openstack-infra09:16
ssbarnea|bkp2hi! i want to test some commands on the f28 image we use in CI. How can I do this?09:26
quiquellianw: ^09:26
quiquellianw: Can we get the image and start it up at a local host ?09:27
ssbarnea|bkp2f28 images have some customizations that affect what we do, and I can't really wait for CI for these. Currently I am using a ~clean f28, which is good for the generic use case, but i need to cover CI too.09:27
*** ccamacho has quit IRC09:27
ianwquiquell ssbarnea|bkp2 : you can grab the images from https://nb01.openstack.org/images/ ... they boot with config-drive + glean, so should pick up root keys via that09:30
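ianw's suggestion, sketched below. This is a minimal, hedged outline, not a tested recipe: the image filename, SSH key, and tool choices are assumptions, so check https://nb01.openstack.org/images/ for the real names. glean reads an OpenStack-format config-drive (volume label "config-2") to install root's authorized_keys at first boot.

```shell
# Build a config-drive skeleton that glean can consume; only this part runs
# here, the download/boot commands below are left commented since they need
# network and KVM.
set -eu
mkdir -p config-drive/openstack/latest
cat > config-drive/openstack/latest/meta_data.json <<'EOF'
{"uuid": "local-f28-test",
 "name": "f28-local",
 "public_keys": {"root": "ssh-rsa AAAA...replace-with-your-key"}}
EOF
# Then, by hand (filenames are placeholders):
#   wget https://nb01.openstack.org/images/fedora-28-<build-id>.qcow2
#   genisoimage -o config.iso -V config-2 -r config-drive
#   qemu-system-x86_64 -m 2048 -enable-kvm \
#     -drive file=fedora-28-<build-id>.qcow2,format=qcow2 \
#     -drive file=config.iso,format=raw \
#     -net nic -net user,hostfwd=tcp::2222-:22
echo "config-drive skeleton ready: $(ls config-drive/openstack/latest)"
```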
quiquellianw: Do they have the exclusion in dnf.conf ?09:33
quiquellianw: Or is this done later in some ansible role ?09:33
ianwquiquell: that will all be in the base image09:33
quiquellianw: ack09:34
quiquellykarel: ^09:34
*** psachin has quit IRC09:35
ykarelquiquell, yes, that is what i referred to09:35
*** panda|off has quit IRC09:35
ykarelin #oooq09:35
ykarelok, you referred to the dnf.conf thing09:36
*** kopecmartin is now known as kopecmartin|afk09:38
*** panda has joined #openstack-infra09:38
*** derekh has joined #openstack-infra09:39
*** psachin has joined #openstack-infra09:40
ykarelquiquell, so if i got it correct https://github.com/openstack/diskimage-builder/blob/e796b3bc1884cbb0a7259be486d835ca114cca9e/diskimage_builder/elements/pip-and-virtualenv/install.d/pip-and-virtualenv-source-install/04-install-pip#L29-L30 and https://github.com/openstack/diskimage-builder/blob/e796b3bc1884cbb0a7259be486d835ca114cca9e/diskimage_builder/elements/pip-and-virtualenv/install.d/pip-and-virtualenv-source-install/04-install-pip#L156 does add excludes09:41
ykareland the images in upstream are build using diskimage builder iirc, ianw ?09:41
quiquellykarel: I think that's it, and it's by design so we don't mess around with those09:42
ykarelquiquell, yes if we know what we are doing, we can hack i think09:43
*** yamamoto has quit IRC09:44
quiquellykarel: You mean changing dnf.conf at our jobs ?09:45
ianwykarel: yes, it's using those images09:45
*** yamamoto has joined #openstack-infra09:45
ykarelianw, ack09:45
ianwi mean elements09:45
ykarelquiquell, yes if it's really required when using nodepool images09:45
ykarelianw, ack09:46
ianwit might be that "yum install python-virtualenv" when it's held does nothing, and "dnf install python-virtualenv" fails09:46
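An illustration of the hold ianw mentions. The package list here is an assumption, the authoritative one is in the 04-install-pip script linked earlier: the DIB element appends an exclude line so the distro packages cannot clobber the source-installed pip/virtualenv, and with dnf, installing an excluded package errors out while yum tends to treat it as simply unavailable.

```shell
# Stand-in for /etc/dnf/dnf.conf on the image; writing a local copy so this
# is safe to run anywhere.
CONF=./dnf.conf
printf '[main]\ngpgcheck=True\n' > "$CONF"
# Package names below are illustrative, not the element's exact list:
echo 'exclude=python2-pip,python2-virtualenv,python3-pip,python3-virtualenv' >> "$CONF"
# Quick check for whether a host was built from such an image:
grep '^exclude=' "$CONF"
```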
quiquellykarel: We can fix our stuff by just not installing things that are already present on the system, so we use the nodepool versions09:46
quiquellykarel: I mean, for example, if virtualenv is already installed, don't install python*-virtualenv, and the same for pip and setuptools09:47
quiquellssbarnea|bkp2: ^09:47
ykarelquiquell, that's what i said: if it's really required; if we can fix it another way, that's fine09:47
ykarelbut remember we need to add support for non nodepool images09:47
quiquellykarel: I will try to do that at my review09:47
ykarelquiquell, ack09:48
quiquellykarel: yep, just want to make the job for f28 pass and then productify the changes so they work for all the environments09:48
ykarelquiquell, cool09:48
quiquellykarel: Puff let's see09:48
*** yamamoto has quit IRC09:49
*** carl_cai has quit IRC09:52
*** longkb has quit IRC09:58
*** jbadiapa has quit IRC10:02
*** psachin has quit IRC10:03
*** jbadiapa has joined #openstack-infra10:04
*** psachin has joined #openstack-infra10:05
*** yamamoto has joined #openstack-infra10:17
openstackgerritTristan Cacqueray proposed openstack-infra/nodepool master: Add tox functional testing for drivers  https://review.openstack.org/60951510:20
*** bhavikdbavishi has joined #openstack-infra10:20
*** psachin has quit IRC10:25
openstackgerritTristan Cacqueray proposed openstack-infra/nodepool master: Implement a Kubernetes driver  https://review.openstack.org/53555710:25
openstackgerritTristan Cacqueray proposed openstack-infra/nodepool master: Add tox functional testing for drivers  https://review.openstack.org/60951510:25
*** psachin has joined #openstack-infra10:27
openstackgerritTobias Henkel proposed openstack-infra/zuul-jobs master: Add prepare-workspace-git role  https://review.openstack.org/61303610:31
*** yamamoto has quit IRC10:34
*** yamamoto has joined #openstack-infra10:35
*** udesale has quit IRC10:36
*** Emine has quit IRC10:38
*** Emine has joined #openstack-infra10:39
*** yamamoto has quit IRC10:39
ssbarnea|bkp2what is the best place to talk about bindep? ... and its inability to have conditions based on distro version.10:41
openstackgerritBenoît Bayszczak proposed openstack-infra/zuul master: Disable Nodepool nodes lock for SKIPPED jobs  https://review.openstack.org/61326110:43
ssbarnea|bkp2https://storyboard.openstack.org/#!/story/2004176 -- bindep has no support for distro versions10:44
openstackgerritBenoît Bayszczak proposed openstack-infra/zuul master: Disable Nodepool nodes lock for SKIPPED jobs  https://review.openstack.org/61326110:47
*** pbourke has quit IRC10:48
*** pbourke has joined #openstack-infra10:48
*** psachin has quit IRC11:02
*** ccamacho has joined #openstack-infra11:07
*** dave-mccowan has joined #openstack-infra11:15
*** carl_cai has joined #openstack-infra11:17
*** florianf is now known as florianf|pto11:19
*** yamamoto has joined #openstack-infra11:24
*** adriancz has quit IRC11:31
*** jpena is now known as jpena|lunch11:34
*** rh-jelabarre has joined #openstack-infra11:34
*** jesusaur has quit IRC11:43
*** lpetrut has quit IRC11:44
*** jesusaur has joined #openstack-infra11:46
*** bhavikdbavishi has quit IRC11:56
*** ldnunes has joined #openstack-infra12:01
*** haleyb has joined #openstack-infra12:02
*** fuentess has joined #openstack-infra12:03
*** quiquell is now known as quiquell|lunch12:05
openstackgerritTristan Cacqueray proposed openstack-infra/nodepool master: Add tox functional testing for drivers  https://review.openstack.org/60951512:05
*** fuentess has quit IRC12:09
*** adrianreza has quit IRC12:09
*** jistr_ is now known as jistr12:14
*** jamesmcarthur has joined #openstack-infra12:14
*** jamesmcarthur has quit IRC12:19
*** zul has joined #openstack-infra12:19
*** boden has joined #openstack-infra12:21
*** pcaruana has quit IRC12:26
*** rlandy has joined #openstack-infra12:27
*** ykarel is now known as ykarel|afk12:28
*** beagles is now known as beagles_mtg12:29
*** janki has quit IRC12:31
*** janki has joined #openstack-infra12:31
*** ykarel|afk has quit IRC12:33
*** jpena|lunch is now known as jpena12:33
fungissbarnea|bkp2: here is probably the best place to talk about bindep (or on the infra ml)12:34
*** pcaruana has joined #openstack-infra12:39
Shrewscorvus: clarkb: before we merge the zk cluster stuff to the launchers and zuul, I think we need a plan of action on how to handle the current provider instances. If we just switch, we'll have a LOT of instances we'll have to manually clean up (and rather quickly to free up quota).12:41
Shrewscorvus: clarkb: nodepool won't see those as leaked instances since they'll have the right metadata12:41
Shrewsmaybe we should first set max-servers to 0 for all providers and let most of them go away naturally?12:41
*** gfidente has quit IRC12:42
*** gfidente has joined #openstack-infra12:42
fungissbarnea|bkp2: i've commented on your story12:42
fungissbarnea|bkp2: we've made extensive use of that feature in the past when, say, packages were renamed, split, combined, et cetera between different distro versions12:43
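Roughly the bindep selector syntax fungi is referring to. Package names below are illustrative and the full selector grammar should be checked against the bindep docs; the point is that bracketed profiles can match a whole package family or a specific distro release:

```shell
# Write a sample bindep.txt showing family-level and release-level selectors.
cat > bindep.txt <<'EOF'
libffi-dev [platform:dpkg]
libffi-devel [platform:rpm]
# same library, renamed between Ubuntu releases:
libmysqlclient-dev [platform:ubuntu-xenial]
default-libmysqlclient-dev [platform:ubuntu-bionic]
EOF
grep -c 'platform:' bindep.txt    # -> 4
```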
*** quiquell|lunch is now known as quiquell12:43
sshnaidm|ruckfungi, clarkb do you know if it's possible to check whether the docker proxy works fine? We have some jobs (but not all) failing because container preparation takes a long time; I'd like to ensure we still download them from the proxy, not from docker.io12:44
sshnaidm|ruckfungi, clarkb or maybe you know a way to check it in jobs - whether we download from the proxy or from docker.io directly12:45
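One place to look for sshnaidm|ruck's question: the docker daemon config. The mirror URL below is a placeholder, the real per-region value comes from the job's mirror setup. If a mirror is configured, docker tries it first and only falls back to docker.io on a miss, and `docker info` reports it under "Registry Mirrors:"; the proxy's access log is the ground truth.

```shell
# Local stand-in for /etc/docker/daemon.json on a job node.
cat > daemon.json <<'EOF'
{
  "registry-mirrors": ["http://mirror.regionone.example.org:8081/registry-1.docker/"]
}
EOF
# On a real node: cat /etc/docker/daemon.json; docker info | grep -A2 'Registry Mirrors'
grep registry-mirrors daemon.json
```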
* Shrews steps away momentarily for bfast before the mass zoo migration12:46
fungii'm not familiar enough with what sort of debug output docker commands provide. does it not tell you the urls it's using?12:46
fungiShrews: clarkb: i'm similarly going to go try to catch early voting while it's hopefully quiet, and then be back as quickly as possible. what time did we say the zk migration was starting?12:46
*** smarcet has joined #openstack-infra12:47
*** smarcet has quit IRC12:48
*** jcoufal has joined #openstack-infra12:50
*** ykarel has joined #openstack-infra12:50
corvusShrews, clarkb: or if we set min-ready to 0 then stop zuul, [almost] all of the nodes should be deleted12:50
*** ansmith has joined #openstack-infra12:50
*** janki has quit IRC12:56
*** bnemec has joined #openstack-infra12:56
*** yamamoto has quit IRC12:56
*** yamamoto has joined #openstack-infra12:56
*** bobh has joined #openstack-infra13:02
*** lpetrut has joined #openstack-infra13:03
*** rascasoft has quit IRC13:05
*** _ari_ has quit IRC13:05
*** rascasoft has joined #openstack-infra13:05
*** kgiusti has joined #openstack-infra13:05
*** agopi has quit IRC13:12
*** hashar is now known as hasharAway13:12
*** kgiusti has quit IRC13:15
*** rascasoft has quit IRC13:15
*** kgiusti has joined #openstack-infra13:17
*** rascasoft has joined #openstack-infra13:17
*** eharney has joined #openstack-infra13:19
openstackgerritFabien Boucher proposed openstack-infra/zuul master: WIP - Pagure driver  https://review.openstack.org/60440413:20
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Filter file comments for existing files  https://review.openstack.org/61316113:21
*** felipemonteiro has joined #openstack-infra13:23
Shrewsfungi: i think t-35 minutes?13:25
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Collect docker logs after quick-start run  https://review.openstack.org/61302713:25
Shrewscorvus: yes, that would be faster i think. then we'd just have to delete the ready nodes that are left13:26
openstackgerritJames E. Blair proposed openstack-infra/system-config master: Add opendev nameservers (2/2)  https://review.openstack.org/61006613:27
*** ansmith has quit IRC13:28
*** ansmith_ has joined #openstack-infra13:28
*** jistr is now known as jistr|call13:29
openstackgerritJames E. Blair proposed openstack-infra/zuul master: WIP: support foreign required-projects  https://review.openstack.org/61314313:30
*** tobberydberg has joined #openstack-infra13:30
openstackgerritDavid Shrewsbury proposed openstack-infra/project-config master: Disable all providers in nodepool launchers  https://review.openstack.org/61332913:31
Shrewsclarkb: corvus: ^^^ in case that's the route we choose13:31
*** rascasoft has quit IRC13:31
*** yamamoto has quit IRC13:31
*** yamamoto has joined #openstack-infra13:31
*** d0ugal has quit IRC13:33
*** lbragstad has joined #openstack-infra13:33
*** rascasoft has joined #openstack-infra13:33
*** d0ugal has joined #openstack-infra13:34
clarkbwe need to restart zuul before we restart the launchers then?13:35
clarkb(thats fine, making sure I understand)13:36
*** pcaruana has quit IRC13:37
clarkbfungi: re docker it doesn't even use urls, it is more than a bit frustrating13:37
clarkbwith current docker it's just a hostname iirc, and images must be served relative to the http root at that location13:38
fungiokay, i'm back13:39
fungiwith 20 minutes to spare13:40
*** agopi has joined #openstack-infra13:40
Shrewsclarkb: i think we (1) set max-servers to 0, (2) stop zuul, (3) delete (or record) any instances we need to manually cleanup, (4) merge your zk change to launchers & zuul, (5) revert max-servers change, (6) start launchers & zuul13:40
Shrewsi *think* ???13:41
* Shrews would like a logic check there13:41
Shrewsor if someone has a better plan...13:41
clarkbShrews: I think there is a 2.5 of stop launchers13:42
Shrewsclarkb: oh yes13:42
Shrewsactually, if we stop launchers, do we need to set max-servers to 0?13:42
Shrewsoh, yes. we need them to delete the USED instances13:43
Shrewsso we need a 2.1 step to wait for that to happen13:43
clarkbyup13:45
clarkbshould I go ahead and put everything in the emergency file now, then we can approve and merge stuff and use kick.sh to apply things?13:45
clarkbwe won't be able to rely on zuul merging stuff while we do the work13:45
clarkb(we may also want to set max-servers: 0 by hand, since that is a short temporary state?)13:46
Shrewsi'm fine with setting max-servers by hand13:48
Shrewswill be shorter downtime13:48
*** jcoufal has quit IRC13:48
Shrewsmaybe we need an announcement?13:48
clarkbnl*, ze*, zm*, and zuul* are in the emergency file now13:48
clarkbShrews: ya we can #status notice as soon as we start rolling and I'll let the release team know13:49
clarkbas long as we capture the queues and restore them, only state changes in gerrit that happen while zuul-scheduler is off will be a problem13:49
*** d0ugal has quit IRC13:49
clarkbhttps://review.openstack.org/#/c/612443/ and https://review.openstack.org/#/c/612442/ should be safe to approve now with those hosts in the emergency file. Any objections to doing that now?13:50
*** beagles_mtg is now known as beagles13:51
Shrewslet's start Operation Cattle Drive  \o/13:51
*** d0ugal has joined #openstack-infra13:51
*** jcoufal has joined #openstack-infra13:51
clarkbnote I did use the * glob in the emergency file which I think works?13:51
corvuswe will find out13:52
clarkbheh I can list them too if we want13:52
*** rpittau has quit IRC13:52
clarkbpretty sure the * should work13:52
Shrewsshould we go ahead and set max-servers to 0 in the configs?13:56
*** efried has quit IRC13:56
clarkbShrews: we should let those two changes merge first (they are waiting on node allocations)13:56
Shrewso rite13:56
* Shrews enables zuul --turbo option13:57
*** efried has joined #openstack-infra13:57
*** smarcet has joined #openstack-infra13:58
clarkbsshnaidm|ruck: when we are done with this zuul and nodepool work, I can take a look at docker things13:58
sshnaidm|ruckclarkb, thanks13:59
clarkblooks like the tripleo gate just did an almost-full restart ahead of our changes :/13:59
clarkbwe might consider direct merging if we are on a tight time schedule, I think corvus was the one with the time bounds?14:01
clarkbcorvus: do you think  we should bypass the gate on those two changes? they did both pass check14:01
corvusthe good news is it just did another partial reset14:02
clarkbI expect we'll move fairly quickly once those two changes merge. The biggest time sink is likely waiting for executors to stop and launcher to delete nodes14:03
corvusso we're getting nodes now; i think we can just let them merge14:03
clarkbwfm14:03
*** smarcet has quit IRC14:04
openstackgerritSimon Westphahl proposed openstack-infra/zuul master: Use branch for grouping in supercedent manager  https://review.openstack.org/61333514:07
*** ykarel is now known as ykarel|away14:08
clarkbok I don't think the * glob in the emergency file worked14:09
clarkbI'm going to list out the nodes instead14:09
clarkb(puppet just ran on zuul01)14:09
clarkb(which is fine at this point, nothing has merged yet)14:10
corvusthey might be regexes, but listing is good now i think :)14:12
clarkbthough project-config merging might make things interesting with the launchers racing ansible, arg14:13
clarkbthe launchers should puppet in about 15 minutes and project config will merge before then14:13
fungijust a heads up, the etherpad system cpu bump from yesterday has returned as of 14:00z from the looks of it14:13
openstackgerritSimon Westphahl proposed openstack-infra/zuul master: Use branch for grouping in supercedent manager  https://review.openstack.org/61333514:13
openstackgerritSimon Westphahl proposed openstack-infra/zuul master: Use branch for grouping in supercedent manager  https://review.openstack.org/61333514:15
fungii suspect it's memory pressure and etherpad is using a bunch of cache memory for its operation14:16
clarkbfungi: ya that is what made me think about the hwe kernels because memory was weird on the xenial kernel on our executors and switching to hwe fixed it14:16
openstackgerritMerged openstack-infra/project-config master: Switch nodepool launchers to use new zk cluster  https://review.openstack.org/61244214:16
fungilooking at the graph, restarting nodejs yesterday the cache memory usage spiked back up to ~3.5 out of 4 gb immediately, and as of nowish it's about out of free memory14:16
clarkbok ^ may or may not apply to the launchers depending on whether or not the globs work for the launchers14:16
clarkbShrews: ^ fyi, I'm not sure there is anything we can do to control that other than to force stop ansible on bridge right now14:17
fungiso we may want to think about resizing etherpad.o.o to 8gb?14:17
clarkboh nevermind14:17
clarkbI did my math wrong and puppet ran on the launchers a few minutes ago?14:17
Shrewslaunchers still point to nodepool.o.o14:17
clarkbShrews: yup I think we ended up having the timing just work out afterall14:18
clarkbso now just waiting on the zuul config update and we can do the manual steps after that14:18
clarkbfungi: what is odd is that the version of etherpad and the version of nodejs haven't changed, and we kept the flavor fixed on the upgrade14:18
mwhahahais gerrit ssh broken or is it just me?14:19
clarkbmwhahaha: I can ssh to gerrit from here14:19
clarkbthe ls-projects command in particular works for me14:19
mwhahahahmmm ok14:19
corvusditto14:19
mwhahahahrm it seems to be trying ipv614:20
mwhahahathat's odd14:20
corvusmwhahaha: v6 wfm; maybe your v6 route is sad?14:21
clarkbfungi: before we bump the memory I'd be inclined to try the hwe kernel14:21
clarkbfungi: then if that doesn't help a rebuild on bigger flavor will give us the normal kernel14:21
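The HWE switch clarkb is suggesting, sketched. linux-generic-hwe-16.04 is the stock xenial HWE metapackage; the install and reboot lines are commented out here since they modify the host, and whether the newer kernel actually fixes the memory behavior is the hypothesis being tested, not a known outcome.

```shell
# Check the running kernel first; stock xenial is a 4.4 kernel, HWE moves it
# to a 4.15 series.
uname -r
# sudo apt-get update
# sudo apt-get install -y --install-recommends linux-generic-hwe-16.04
# sudo reboot
```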
mwhahahacorvus: yea something was odd, i started a ping which was delayed for a bit, then when it kicked in it worked.14:21
mwhahahasorry for the false alarm14:21
corvusfalse alarms better than real ones14:22
dmsimardclarkb: out of curiosity, have we tried etherpad on 18.04 ?14:22
clarkbdmsimard: no, because we don't have deployment stuff for etherpad that works on 18.04 currently14:22
dmsimardack14:22
clarkbif someone wants to invest in that nowish we could do that too. It isn't a terribly complicated system once you get the nodejs and npm stuff working (I have no idea if that is a solved problem in ansible land, but containers theoretically make that better too)14:23
clarkbShrews: sounds like you are watching the launchers, are you planning to set max-servers to 0 by hand? corvus did you want to do the zuul shutdown? I can run the kick.sh commands and help watch the cleanup that happens14:24
fungiclarkb: i agree, trying hwe kernel next would be good14:24
clarkbalso how does this look: #status notice Zuul and Nodepool services are being restarted to migrate them to a new Zookeeper cluster. This brings us an HA database running on newer servers.14:26
fungilgtm14:26
clarkbI'll send that as soon as we start making changes to the running services14:27
Shrewsclarkb: i can do the launcher configs14:27
fungii guess it would be redundant to also say we're taking our quotas down to zero14:27
Shrewsclarkb: are we ready to set max-servers to 0 now?14:27
clarkbShrews: lets let the last job finish just in case it has to restart or something14:27
Shrewsclarkb: awaiting the go signal...14:27
corvusi can do the zuul shutdown14:29
clarkbI think that job is currently compiling afs modules14:30
clarkbmight be a couple minutes more if so14:30
corvusi should just do a full system restart, yeah?14:31
corvusjust to go ahead and get everything current14:31
clarkbcorvus: yes, but do a stop, then we'll pause for a sec to make sure configs are updated then we'll do a start14:31
corvus(we only *need* to do the scheduler, but since that's the disruptive one)14:31
clarkbcorvus: but this way we have good data on current zuul tree so maybe zuul can do a release next week14:32
gnuoyHi, does https://review.openstack.org/#/c/608866/ need re-approval now that the dependant change has landed ?14:32
corvusclarkb: ack.  we still have the zuul-web pid bug, so i'll run the restart playbook and wait to remove the pidfile until we're ready.14:32
clarkbcorvus: shrews didn't want to apply the zk changes to running processes. So we are stopping everything, updating config, then starting everything14:32
*** gfidente has quit IRC14:32
corvus++14:33
clarkbgnuoy: a recheck will work too.14:33
fungiclarkb: do rechecks work now even when there's already a verified +1?14:33
fungidid zuul v3 solve that?14:34
clarkbfungi: they should, I think it was gerrit 2.13 that fixed that14:34
fungioh, interesting14:34
clarkbfungi: the problem before was that older gerrit only sent vote deltas. So if you reapplied a +1 that info wasn't sent to zuul14:34
clarkbzaro fixed it so that gerrit sends the entire event content14:34
gnuoyclarkb, excellent, thanks, will do14:35
openstackgerritMerged openstack-infra/system-config master: Switch zuul scheduler to new zk cluster  https://review.openstack.org/61244314:36
clarkbShrews: ^ I think you can set max-servers to 0 now.14:36
Shrewsok14:36
clarkb#status notice Zuul and Nodepool services are being restarted to migrate them to a new Zookeeper cluster. This brings us an HA database running on newer servers.14:38
openstackstatusclarkb: sending notice14:38
Shrewsok, done14:38
Shrewsgood to stop zuul now14:38
clarkbcorvus: ^14:38
corvusstopping zuul14:38
clarkbI'm going to update system-config on bridge.o.o so that we are ready to run kick.sh14:38
-openstackstatus- NOTICE: Zuul and Nodepool services are being restarted to migrate them to a new Zookeeper cluster. This brings us an HA database running on newer servers.14:39
corvusoh neat all the zuul hosts are disabled...14:40
corvustrying again14:41
openstackstatusclarkb: finished sending notice14:41
corvusscheduler stopped14:41
clarkbI'm watching nodepool list now to see nodes hopefully get cleaned up14:42
Shrewssame14:42
clarkbyup a lot of deleting in there now14:42
*** maciejjozefczyk has quit IRC14:42
corvusclarkb: want to go ahead and kick the zuul servers?14:42
Shrewswe should have only READY and HOLD nodes left eventually14:43
corvusor, well, at least zuul0114:43
fungignuoy: the reason i recommended having the change reapproved is that it's non-urgent (just bookkeeping), so mnaser or dhellmann will get to it when they're available14:43
*** otherwiseguy has joined #openstack-infra14:43
*** munimeha1 has joined #openstack-infra14:43
Shrewscorvus: all requests will be declined14:43
fungia reapproval will run fewer jobs since the change already passed the check pipeline once14:43
clarkbShrews: corvus in that case maybe we wait for the launchers to move over first?14:43
Shrewsyeah14:44
clarkbok I will wait on the kick.sh then14:44
corvusShrews: i'm asking clarkb to kick zuul01 so it has the correct config in place.  i was not planning on starting zuul.14:44
gnuoyfungi, ah, ok. I didn't appreciate there was a mechanism for requesting re-approval14:44
corvusit takes a long time to kick14:44
Shrewsoh14:44
clarkbcorvus: do we know if puppet will start the scheduler?14:45
corvusclarkb: it .... well better not.14:45
*** kopecmartin|afk is now known as kopecmartin14:46
*** gfidente has joined #openstack-infra14:46
*** ramishra has quit IRC14:46
*** smarcet has joined #openstack-infra14:47
clarkbwe are down to 28 nodes in the launcher. I expect we can just wait, since we are almost ready to stop the launchers then kick and start them?14:47
corvuswe have a policy of not having our config management start user-facing services.  i really hope we have not decided to violate that.14:47
Shrewsone more to delete14:48
clarkbcorvus: I'm skimming the puppet and I think it will actually do the right thing14:48
clarkbcorvus: we ensure => undef but enable => true in the scheduler service definition14:48
clarkbcorvus: however I'm not sure if ensure => undef has weird puppet default behavior like ensure running?14:48
*** Swami has joined #openstack-infra14:49
corvusclarkb: can you just run it and we'll find out?14:49
Shrewswow, we've created (or attempted to create) over 3 million nodes since running nodepool v314:49
clarkbcorvus: I can14:49
clarkbcorvus: doing that now14:49
*** jistr|call is now known as jistr14:50
Shrewshrm, vexxhost is being slow with that last delete14:50
*** Swami has quit IRC14:50
clarkbShrews: I've recorded the nodepool list output and since there are held and ready nodes to delete anyway maybe lets move ahead with stopping the launchers now?14:50
clarkbShrews: then I can kick.sh the launchers too14:51
Shrewsclarkb: yeah, we can get that last one manually too if we need to14:51
clarkbShrews: ya lets do that14:51
Shrewsstopping launchers...14:51
Shrewsjust fyi, http://paste.openstack.org/raw/733050/14:52
clarkbcorvus: puppet says it is done on zuul0114:52
corvusclarkb: i agree.  config looks good, no procs running14:52
clarkbcorvus: let me know when you think I should run it on ze* and zm*14:52
Shrewsclarkb: corvus: launchers stopped14:52
clarkbShrews: ok kicking launchers now14:52
corvusclarkb: they won't use it so it's not important to run on the other z servers14:53
*** gema has joined #openstack-infra14:53
clarkbcorvus: oh right. We can let normal puppet update that then14:53
Shrewsclarkb: that should take care of resetting max-servers too, right?14:53
clarkbShrews: it should14:53
Shrewsi'll make sure14:53
clarkbShrews: the max-servers thing ended up working really well. Much smaller list of things to cleanup this way14:54
Shrewsclarkb: yeah14:55
*** bobh has quit IRC14:55
clarkbShrews: kick.sh is done14:55
Shrewsclarkb: we would have quickly had quota issues, too14:55
clarkbShrews: I think you are good to start launchers when you are happy with their configs14:56
Shrewschecking configs...14:56
*** tobberydberg has quit IRC14:56
Shrewsclarkb: configs look good. i'm going to start nl02 first since it has the lowest setting for max-servers14:57
clarkbShrews: ok14:57
ShrewsMarking for delete leaked instance ubuntu-bionic-limestone-regionone-0002677659 (819013fc-1051-4655-bc61-1769bdc1af4d) in limestone-regionone (unknown node id 0002677659)14:59
Shrewsoh, maybe we validate node IDs???14:59
* Shrews looks14:59
clarkbok not much activity on nl02 because we set min ready with nl0114:59
clarkbnl01 should be the next one to start14:59
clarkbnl02 looks happy in its idling though15:00
Shrewsclarkb: ok, maybe this max-servers step was unnecessary15:00
Shrewsstarting nl01 now15:00
clarkbah the alien cleanup is more sophisticated than anticipated?15:01
corvuswell, it helped reduce churn.  but yeah, nodepool should do the cleanup for us.15:01
*** cfriesen has joined #openstack-infra15:01
*** jtomasek has quit IRC15:01
clarkbcorvus: thats a good point, we avoid the shock of it having to do it all at once15:02
clarkbwe have a bunch of building nodes now. Do we want to wait to see them go ready before starting the other launchers?15:02
corvus(we set metadata with the nodepool id, and if that id isn't in the db, it's a leaked instance)15:02
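corvus's description above — boot-time metadata carrying the nodepool id, compared against the ids the database knows — can be sketched as follows; the metadata key and data shapes here are illustrative, not the real nodepool internals:

```python
# Sketch of the leak-detection idea: every server we boot carries its
# nodepool node id in its metadata; any server whose id is not in the
# ZooKeeper db is treated as leaked (compare the "unknown node id" log
# line above). Names here are illustrative.

def find_leaked_servers(cloud_servers, known_node_ids):
    """Return servers whose recorded node id is unknown to the db.

    cloud_servers: iterable of dicts with a 'metadata' mapping, as a
    server list would give us.
    known_node_ids: set of node ids currently tracked in the db.
    """
    leaked = []
    for server in cloud_servers:
        node_id = server.get("metadata", {}).get("nodepool_node_id")
        # Servers without our metadata key were not booted by us; skip.
        if node_id is not None and node_id not in known_node_ids:
            leaked.append(server)
    return leaked
```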
Shrewsclarkb: yeah, give it a minute15:02
clarkbShrews: these first boots may be slower than normal because the images are new and haven't been used yet (which caches them on hypervisors)15:03
*** eernst has joined #openstack-infra15:03
*** jpena is now known as jpena|brb15:04
Shrewsclarkb: yeah, i just wanted to validate some stuff first. good to start the others now15:04
*** tobberydberg has joined #openstack-infra15:04
clarkbShrews: are you going to start them or should I help with that?15:04
*** ccamacho has quit IRC15:05
Shrewsi can do it15:05
clarkbok15:05
Shrews03 and 04 started now15:05
clarkbwe have ready nodes15:05
corvusshall i continue with zuul?15:06
clarkbcorvus: Shrews  I think we can start zuul now that ^ is in place15:06
Shrewsi see some ready nodes now15:06
clarkbcorvus: I'm good to start zuul if shrews is15:06
clarkbhttp://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=64704&rra_id=all is going to be a fun graph to watch15:07
corvuszuul is starting, but we're still a few mins away from node requests.15:07
Shrewsup to corvus at this point. nodepool is priming, but ready15:07
Shrewsclarkb: unless ianw updated the graphs for the new stat names, some may be empty15:08
clarkbShrews: that is cacti network bw usage on the current zk leader. Independent of the statsd stuff15:08
Shrewsoh, that's cacti15:08
Shrewsyeah15:08
*** yamamoto has quit IRC15:09
*** yamamoto has joined #openstack-infra15:10
*** smarcet has quit IRC15:10
clarkbcorvus: how is zuul doing?15:12
clarkbiirc its about a 5 minute startup for zuul? should start seeing stuff zoon?15:12
corvusclarkb: scheduler is running, executors are stopping15:12
clarkbheh zoon15:12
corvusnode requests are being submitted15:12
clarkbsee them on the nodepool side, lots of building requests15:13
corvuswe're up to 145 node reqs15:13
*** tobberydberg has quit IRC15:13
corvusi'll re-enqueue now15:13
clarkbthere are in-use nodes now too15:13
corvusthat's surprising15:13
corvusoh, they're marked in use in nodepool, but zuul hasn't actually begun execution yet15:14
corvus(it has no executors online)15:14
clarkbhuh15:14
corvusit's normal -- zuul has claimed the nodes15:15
corvusexecutors have started15:15
Shrewsclarkb: so i don't think the nodes in 'hold' state from before the migration will be cleaned up by nodepool15:15
Shrewsso those will have to be tracked15:15
*** jpena|brb is now known as jpena15:15
*** jamesmcarthur has joined #openstack-infra15:15
clarkbShrews: noted, we also need to clean up the old image builds on the builders15:15
corvusShrews: i think they should be deleted as leaks too15:15
corvuszuul is completely restarted; re-enqueue is in progress15:16
*** yamamoto has quit IRC15:16
Shrewscorvus: oh, i think you're right15:16
Shrewsi hope they weren't needed!  :)15:16
clarkbShrews: we can always hold new ones15:17
corvusour max servers line in grafana is lower than before15:17
Shrewsyup, i know. just sayin...15:17
corvusmax went from 1034 -> 93415:18
clarkbthat may be ovh gra115:18
clarkbwe've been sort of manually editing it and if the kick.sh undid that we'd lose almost a hundred nodes15:18
* clarkb looks15:18
corvusoh there we go it's back now15:18
clarkbnope gra1 is correct15:18
corvusif you refresh, you can see it's back at 1034 after a step up15:19
clarkboh that must've been stats lag due to starting launchers one at a time?15:19
clarkbso it stepped up for each launcher started15:19
*** quiquell is now known as quiquell|off15:21
clarkbthe in-use number keeps going up according to grafana15:22
clarkbzk stat continue to look happy15:22
*** smarcet has joined #openstack-infra15:22
clarkbzk network graph spiking but not crazy15:22
clarkbcorvus: Shrews thoughts on removing all of these hosts from the emergency file at this point?15:23
Shrewsi think nodepool is happy15:23
clarkb`echo stat | nc localhost 2181` is the zk monitoring hack I learned if anyone else wants to look at zk too15:24
clarkbthe Mode: and Outstanding: fields are interesting. Mode tells you if the cluster is set up right (followers and leaders) and Outstanding shows you if you have a sync backlog I think15:25
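The two fields clarkb calls out can be pulled out of the `stat` output mechanically; a minimal sketch (field labels match what zookeeper's four-letter `stat` command prints):

```python
# Parse the output of `echo stat | nc localhost 2181` and extract the
# two fields of interest: Mode (leader/follower/standalone) and
# Outstanding (queued, not-yet-processed requests).

def parse_zk_stat(stat_output):
    """Return (mode, outstanding) from zookeeper `stat` command output."""
    mode = None
    outstanding = None
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            mode = line.split(":", 1)[1].strip()
        elif line.startswith("Outstanding:"):
            outstanding = int(line.split(":", 1)[1].strip())
    return mode, outstanding
```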
*** tobberydberg has joined #openstack-infra15:25
clarkbIf we are good removing the nodes from the emergency file I'll send #status notice The Zuul and Nodepool database transition is complete.15:27
corvuswfm.  reenqueue is still proceeding (busy mergers), but that's fine.15:27
clarkbok15:27
Shrewsi'm now going to find a fedex drop off point to send a laptop back to HQ15:29
clarkb#status notice The Zuul and Nodepool database transition is complete. Changes updated during the Zuul outage may need to be rechecked.15:29
openstackstatusclarkb: sending notice15:29
*** bobh has joined #openstack-infra15:29
clarkbemergency file is updated. I left the builders in it due to the need for an sdk release15:29
-openstackstatus- NOTICE: The Zuul and Nodepool database transition is complete. Changes updated during the Zuul outage may need to be rechecked.15:31
clarkbcorvus: the zuul status page doesn't seem to want to load for me. Is that reconfigure induced slowness that we've seen before? I expect it is15:31
*** armax has joined #openstack-infra15:32
corvusclarkb: yes, i expect that to continue as long as the re-enqueue is happening -- the same gearman worker handles both things15:32
clarkbcorvus: roger15:32
corvusclarkb: it should eventually load -- like, it shouldn't take more than a few minutes (you may need some amount of refreshing due to js stuff)15:32
openstackstatusclarkb: finished sending notice15:33
clarkbcorvus: ya it did eventually reload15:33
corvusre-enqueue is finished15:33
corvusi think that's everything then15:33
clarkbya the only other outstanding item on my list is cleaning out the old images from the nodepool builders15:34
clarkbthat isn't urgent but I shall try to get around to it today while shrews is around15:34
clarkbthen I need to delete nodepool.o.o (say on monday?)15:34
clarkbinfra-root ^ if you have anything on nodepool.o.o you want to keep please grab it now :)15:34
clarkbcorvus: did you write down the zuul sha1 that is running?15:35
*** bobh has quit IRC15:35
clarkbcorvus: probably want to grab that for a possible zuul reelase?15:35
corvusclarkb: it's on the status page now15:35
corvus(it's the scheduler sha)15:35
clarkboh nice15:35
clarkbthank you everyone for helping with this. Also pabelanger did a bunch of the prep work a while back15:36
clarkbsshnaidm|ruck: hey, I'm about to context switch to docker things. I did notice that https://review.openstack.org/608319 is failing pep8 in the gate due to a name not being valid15:38
*** ccamacho has joined #openstack-infra15:38
clarkbsshnaidm|ruck: can you point me to a specific job that you'd like to learn more about the docker setup for?15:38
clarkbsshnaidm|ruck: I'd like to start with the logs to either confirm we log the important bits or if not, understand what is missing15:38
fungimtreinish: do you think any of the security fixes mentioned in recent releases at https://github.com/eclipse/mosquitto/blob/master/ChangeLog.txt are relevant to our occasional crashes? especially the cve-2017-7651 fix in 1.4.15 looks suspicious15:38
*** bobh has joined #openstack-infra15:39
fungidebian just backported a bunch of security fixes per https://security-tracker.debian.org/tracker/DSA-4325-1 https://security-tracker.debian.org/tracker/DLA-1409-1 https://security-tracker.debian.org/tracker/DLA-1334-115:40
fungiin theory ubuntu ought to be able to import those updates15:40
sshnaidm|ruckclarkb, I'd like to ensure that afs docker proxies really cache the image, we have this config: http://logs.openstack.org/87/610087/4/gate/tripleo-ci-centos-7-scenario001-multinode-oooq-container/2e409e1/logs/undercloud/etc/docker/daemon.json.txt.gz15:44
sshnaidm|ruckclarkb, I think it's enough to use docker proxy, right? But I can't really check where I download the image from in reality15:45
*** apetrich has quit IRC15:45
clarkbsshnaidm|ruck: in the job logs can you point me to where the images are fetched? we can then cross check against the mirror configured there15:45
clarkbsshnaidm|ruck: the ability to check where you downloaded the image from would be a logging function of whatever you use to pull the image15:45
mordredclarkb: it would require starting the docker daemon with debug logging I believe15:45
mordredthen there will be http trace entries in the logs that will indicate from where docker actually fetched things15:46
clarkbsshnaidm|ruck: but we can check the mirror node logs too since I guess docker doesn't tell you by default15:46
sshnaidm|ruckclarkb, yeah, for example here: http://logs.openstack.org/87/610087/4/gate/tripleo-ci-centos-7-scenario001-multinode-oooq-container/2e409e1/logs/undercloud/home/zuul/install-undercloud.log.txt.gz#_2018-10-25_03_07_03_97515:46
sshnaidm|ruckdocker.io/tripleomaster/centos-binary-rsyslog-base in 2018-10-25 03:07:03.975 from http://mirror.bhs1.ovh.openstack.org:8081/registry-1.docker/15:47
clarkbsshnaidm|ruck: sha256:7810f63ac7ce7026eb5bcb308fd485fb7aa3224707bb2c57c24d2dedd7992cbb looks like the hash for that image right ? (all of the image serving is from hashes so I'll be grepping that in the logs)15:47
clarkboh that is specifically the centos base image. I'll check that one to start15:47
clarkbsha256:a55bd98df50363f394ecbb21d19aade7e250590211dd64e83019f8b9cc5273ea looks like a layer for rsyslog specifically15:48
clarkbneither sha256 is in the apache logs15:50
*** yamamoto has joined #openstack-infra15:50
*** eernst has quit IRC15:51
sshnaidm|ruckclarkb, I see this, maybe it's it:                 "docker.io/tripleomaster/centos-binary-rsyslog-base@sha256:19ff38dcdc12a167bcf8dcbef4cb55247194b101d8fc1c4aff781ce73a794756"15:51
sshnaidm|ruckand this: "Id": "sha256:5455eec0649474d22cd21dc3a08f9a80659973551c4c5ecbf675609926489c80"15:52
sshnaidm|ruckso many shas15:52
mordredsshnaidm|ruck: don't you know - shas make everything better :)15:53
clarkbsshnaidm|ruck: sha256:5455eec0649474d22cd21dc3a08f9a80659973551c4c5ecbf675609926489c80 shows up a bunch in the logs as cache hits. But the previous one does not15:54
sshnaidm|ruckclarkb, ok, let's hope it is :D15:54
sshnaidm|ruckclarkb, thanks!15:54
*** sshnaidm|ruck is now known as sshnaidm|bbl15:54
fungithe system cpu spike on the etherpad server has died back off. i wonder if it corresponded to the board call which also just wrapped up?15:55
clarkbfungi: that would be an unfortunate regression if using the service made it slow :)15:56
clarkbfungi: but that should be testable at least15:56
fungiespecially concerning since there weren't _that_ many people using that particular pad15:56
smcginnisIf we need a few folks to all hit an etherpad around the same time, I can help.15:56
clarkbsshnaidm|bbl: as far as I can tell given that id it should be doing what we expect. If you want to be double sure adding the  extra docker logging that mordred pointed out is probably worthwile15:56
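The cross-check clarkb just did — grepping the mirror's apache access logs for a layer digest to confirm the proxy actually served it — can be sketched as a small helper; the log format and digests below are illustrative:

```python
# Count how many times a given image-layer digest appears in the mirror's
# access log lines. A digest that never appears was not fetched through
# the proxy; one that appears repeatedly is being served from the cache.

def count_digest_hits(log_lines, digest):
    """Return the number of log lines mentioning the given sha256 digest."""
    return sum(1 for line in log_lines if digest in line)
```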
clarkbwe keep using steadily less quota in bhs115:57
clarkbI wonder if the port cleanups failed there?15:57
*** derekh has quit IRC15:58
clarkb#status log Zuul and Nodepool running against the new three node zookeeper cluster at zk01 + zk02 + zk03 .openstack.org. Old server at nodepool.openstack.org will be deleted in the near future15:59
openstackstatusclarkb: finished logging15:59
*** e0ne has quit IRC15:59
clarkbalso inap doesn't look happy. I'm going to start with inap since bhs1 is still mostly working16:00
clarkbthe inap errors appear to be timeouts, possibly related to our switch to new images?16:01
mgagneclarkb: we redeployed a new minor version of Nova. Didn't expect that much impact. Is there anything I can look at for now?16:01
mgagnenew packages were promoted ~55m ago16:02
*** carl_cai has quit IRC16:02
clarkbmgagne: from our side it looks like we may have timed out some boots because we transitiioned to new images globally. Then those boots are now timing out trying to delete16:02
clarkbmgagne: I can get you uuids in just a momment16:02
mgagneclarkb: I never remember what I do to fix those issues :-/16:03
clarkbe064b5bf-dfca-48aa-8b02-b3da37509688 bdf1a01f-1e95-47ec-8e72-827d0180140a 637d76be-930a-4ea2-b145-96c8501d03f416:03
clarkbare three examples16:03
mgagnechecking16:03
clarkbmgagne: thank you!16:04
mgagneI think it was restarting nova-compute?16:04
clarkbmgagne: ya that sounds familiar16:04
mgagneclarkb: ok, one is now in error, I think it's now in a state where nodepool can retry its delete and it will work16:05
clarkbmgagne: great, nodepool should do that automatically16:06
mgagneI guess I will restart the whole region in that case16:06
clarkbI don't know enough about your cloud to advise one way or the other. But your help is greatly appreciated :)16:07
mgagnehehe16:07
clarkbopenstack.exceptions.SDKException: Error in creating the server. Compute service reports fault: No valid host was found. There are not enough hosts available. is the ovh bhs1 usage reduction cause16:11
clarkbamorin: dpawlik ^ fyi if you happen to be around (this may be some side effect of us restarting zuul which creates a rush of demand)16:12
clarkbI wonder if the next nodepool feature is going to be a launch throttle16:13
*** smarcet has quit IRC16:14
*** ginopc has quit IRC16:14
*** bhavikdbavishi has joined #openstack-infra16:15
*** ccamacho has quit IRC16:15
*** gyee has joined #openstack-infra16:16
clarkbother than those two cloud related issues (which could theoretically also be related to new openstacksdk?) we appear to be quite stable16:16
clarkbzookeeper seems to be keepign up with the demand in its new 3 node configuration as well16:16
*** weshay has joined #openstack-infra16:18
clarkbmgagne: I see the deleting count falling in inap16:19
mgagneyea, instances are now in ERROR state.16:19
*** emine__ has joined #openstack-infra16:21
*** dtantsur is now known as dtantsur|afk16:22
*** Emine has quit IRC16:24
*** jhesketh has joined #openstack-infra16:25
*** shardy has quit IRC16:25
*** mnaser has quit IRC16:26
*** yamamoto has quit IRC16:26
*** yamamoto has joined #openstack-infra16:26
*** mnaser has joined #openstack-infra16:26
*** jhesketh_ has quit IRC16:27
*** jpich has quit IRC16:28
clarkbfor the bhs1 thing we have 116 instances according to quota but only ~68 instances according to server list16:30
clarkbI think quota may have gotten out of sync there and so the hypervisors think they are used (and possibly are)16:31
*** trown is now known as trown|lunch16:32
fungiclarkb: amorin said yesterday that was a known issue in bhs1 i think? they're still working on trying to get gra1 back in okay shape16:32
*** smarcet has joined #openstack-infra16:32
clarkbgotcha16:32
*** panda is now known as panda|off16:38
openstackgerritHervé Beraud proposed openstack/gertty master: Introduce security checks with bandit and fix it  https://review.openstack.org/61337116:38
clarkbwe have 1 node available in inap. I think we may be turning the corner there.16:39
mgagneclarkb: most are stuck in building right?16:40
clarkbmgagne: ya I think that is due to new images?16:40
mgagneyea16:40
mgagnejust making sure there is no other issue I can fix16:41
clarkbmgagne: we should know in another 10-15 minutes.16:41
clarkb2 available now. Makes me think in 10-15 minutes we'll be operating normally16:41
mgagne+116:41
*** fuentess has joined #openstack-infra16:42
*** gfidente has quit IRC16:42
*** imacdonn has quit IRC16:42
*** imacdonn has joined #openstack-infra16:43
ssbarnea|bkp2fungi clarkb: question regarding basepython=python3 : please read https://github.com/tox-dev/tox/issues/1072 -- I am curious how openstack plans to cover this aspect.16:44
clarkbssbarnea|bkp2: you have to set basepython on the docs and linting to python3 then let py35/py36 etc do the right thing16:47
ssbarnea|bkp2centos-7 concerns me because on it python3 -> python3.4, which was dropped by ansible, see https://github.com/ansible/ansible/blob/devel/setup.py#L24716:48
fungiclarkb: ssbarnea|bkp2: where's the context? you likely also need to set ignore_basepython_conflict = True16:48
clarkbfungi: the context is centos7 using python3.4 I guess?16:49
ssbarnea|bkp2clarkb yeah, this is where I saw the failure to install ansible on python3, because it was incompatible.16:49
clarkbfwiw that seems more like a distro problem16:49
fungiotherwise setting basepython = python3 if python3 is 3.4 will result in the implicit py35 and py36 testenvs using 3.416:49
clarkbnot a tox problem16:49
clarkbfungi: aha16:49
clarkbssbarnea|bkp2: this is why we carefully run things on a variety of distros to make sure that their python versions line up with what we expect16:50
clarkb(it can be painful at times, but does work)16:50
fungihttps://github.com/tox-dev/tox/issues/47716:51
ssbarnea|bkp2for testing purposes it is a PITA, not on my main (macos) machine where I have the freedom to juggle them, but if you want to run just "tox" across multiple platforms, you soon realize that conflict.16:51
clarkbssbarnea|bkp2: but then setting basepython to python3 means we can explicitly run the docs job on xenial to get 3.5, on bionic to get 3.6. Then when the next release comes out we don't have to update tox.ini just add the job to run on that distro release16:51
fungifixed by https://github.com/tox-dev/tox/pull/84116:52
clarkbssbarnea|bkp2: right we don't support tox on multiple platforms generally. We support specific versions of python informed by what is on distros (which we use to test) then you have to get the right version of python16:52
fungistephenfin did excellent work there16:52
clarkbmgagne: up to 5 available now. Slow going but trending the right direction16:53
ssbarnea|bkp2clarkb: just to be clear: I am not trying to say that setting it to python3 is bad. i am going to test the ignore_basepython_conflict16:53
clarkbssbarnea|bkp2: ya. I'm just trying to point out that just because an openstack tox.ini says python3 doesn't mean it will work with any python3. We do that for convenience to avoid needing to update tox.ini frequently. You still need a valid python3 version16:54
*** lpetrut has quit IRC16:54
*** lpetrut has joined #openstack-infra16:55
ssbarnea|bkp2clarkb: we are not in conflict here :D16:55
ssbarnea|bkp2now I only need to explain to others that we still have to use basepython for some tasks, like https://review.openstack.org/#/c/613083/2/tox.ini16:58
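Putting together the pieces from the discussion above (basepython = python3 plus fungi's ignore_basepython_conflict, which needs tox >= 3.1), a tox.ini sketch might look like this; the env contents are illustrative, not taken from any particular repo:

```ini
[tox]
minversion = 3.1.0
# Without this, basepython below would force the implicit py35/py36
# environments onto whatever "python3" resolves to on the host
# (e.g. python3.4 on centos-7).
ignore_basepython_conflict = True

[testenv]
basepython = python3

[testenv:linters]
deps = flake8
commands = flake8

[testenv:docs]
deps = sphinx
commands = sphinx-build -b html doc/source doc/build/html
```

With this in place, `tox -e py35` and `tox -e py36` still pick their own interpreters, while docs and linting run under whatever python3 the job's distro provides.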
*** jpena is now known as jpena|off16:58
*** xek has quit IRC16:58
*** xek has joined #openstack-infra16:59
*** chandankumar is now known as chkumar|off17:02
*** ccamacho has joined #openstack-infra17:02
*** pcaruana has joined #openstack-infra17:03
ssbarnea|bkp2now I have a cosmetic question about zuul html output not wrapping at screen width, doing horizontal scrolling in the browser sucks. Is this by design, or a known bug?17:06
*** jamesmcarthur has quit IRC17:06
clarkbssbarnea|bkp2: at least on mobile it does one column without horizontal scrolling. I also don't have horizontal scrolling on current browser /me tries resizing17:07
*** jamesmcarthur has joined #openstack-infra17:07
clarkbssbarnea|bkp2: it seems to resize without doing horizontal scrolling on firefox for me17:07
ssbarnea|bkp2clarkb firefox on http://logs.openstack.org/83/613083/2/check/openstack-tox-linters/10deb24/job-output.txt.gz -- desktop17:07
clarkboh the job logs not the zuul status web page17:08
ssbarnea|bkp2to be exact http://logs.openstack.org/83/613083/2/check/openstack-tox-linters/10deb24/job-output.txt.gz#_2018-10-24_21_57_28_54908917:08
clarkbssbarnea|bkp2: my personal opinion on that is that is desired. It is a txt file not an html file17:08
clarkbit is the raw output17:08
*** jamesmcarthur has quit IRC17:08
clarkbif we want something to render that differently we should do that on top of the raw data17:08
ssbarnea|bkp2i think it has wrapping only on spaces which prevents the wrapping from occurring.17:09
clarkbit will be however firefox wraps text file lines17:09
clarkb(might even be configurable?)17:09
ssbarnea|bkp2clarkb i am sure css can change behavior but i wanted to know if this was desired or fixable :D17:09
clarkbI think we want to make the raw data available, but if we also render it nicely for people with browsers that is good too17:10
clarkbthe reason I say that is some log files are massive (hundreds of meg) and i have to view them with vim locally17:10
clarkbwe also index the raw data in elasticsaerch so you want to be able to support use cases like that17:10
ssbarnea|bkp2this is 100% css issue, i do not expect the lines to be wrapped server side.17:11
ssbarnea|bkp2as you said: they should be as close as possible to raw17:11
clarkbssbarnea|bkp2: except there is no css in txt files17:11
*** eharney has quit IRC17:11
clarkboh except that os loganalyze is sending some. I understand now17:12
clarkbso ya you could update os-loganalyze to change the html rendering. Sorry I've been using vim a lot lately because too many large files.17:12
clarkbos-loganalyze will serve the raw data if you don't set accept encoding to html17:13
clarkbor accept-type? whatever the header is17:13
fungiinfra-root: just a heads up, i have to disappear for a few hours to deal with insurance company stuff in person, but will be back on later today17:13
clarkbfungi: gl17:13
ssbarnea|bkp2I found the fix, it's missing: word-break: break-all;17:13
ssbarnea|bkp2now i only have to find the place to add that code.17:14
fungiclarkb: supposedly they'll be handing me a briefcase full of unmarked bills at the end, so totally worth it (okay, really just a paper check, but regardless...)17:14
clarkbfungi: ah you are past the point of arguing over what was insured then :)17:14
fungiyep!17:14
clarkbssbarnea|bkp2: look in openstack-infra/os-loganalyze17:15
fungiwell, except for the wind damage claim which we haven't finished yet. but flood and care are done17:15
*** sshnaidm|bbl is now known as sshnaidm|off17:15
fungier, flood and car are done17:15
* fungi vanishes in a puff of errands17:15
*** apetrich has joined #openstack-infra17:17
*** apetrich has quit IRC17:17
*** apetrich has joined #openstack-infra17:18
openstackgerritSorin Sbarnea proposed openstack-infra/os-loganalyze master: Assures that wrapping on PRE occurs on any kind of characters  https://review.openstack.org/61338317:18
ssbarnea|bkp2this reminded me that i hate the timestamp column, too much screen real estate taken by it. I would personally prefer to transform it into a line-number and have the time value as a tooltip.17:20
clarkbssbarnea|bkp2: I find the timestamps to be invaluable17:20
ssbarnea|bkp2but obviously that I would need support for such change.17:20
*** smarcet has quit IRC17:21
ssbarnea|bkp2it is valuable, but not sure if it needs to be visible by default and all the time. maybe expandable or something similar.17:21
clarkbssbarnea|bkp2: I think if you want to do something like that then we want a render layer that allows you to toggle things like that. I don't think we should remove that from the raw txt17:22
clarkbit is really useful to understand when things happen in a distributed system17:22
ssbarnea|bkp2clarkb: sure I was referring to the display layer17:22
clarkbto the point where it is the one requirement I push on people to use the elasticsearch/logstash system17:22
clarkbssbarnea|bkp2: I also think it is important because it helps remind people that their jobs have a time cost17:23
clarkbthat time cost impacts everyone else's ability to use those test resources17:23
ssbarnea|bkp250% of timestamp is spam = first and last. year and month are useless as we don't even keep logs for so long, and sub second divisions ...17:24
clarkbsub second is very useful. The year may not be necessary. But the rest of it is I think17:24
clarkbwe want to keep logs for ~6 months again which is why the swift work is happening17:24
ssbarnea|bkp2another thing that I could fix with css alone, almost for sure.17:25
*** diablo_rojo has joined #openstack-infra17:26
*** eharney has joined #openstack-infra17:26
*** lpetrut has quit IRC17:28
clarkbya I think we can fiddle with overlay type stuff to make it toggleable to user preference, but I also think being clear about how long jobs are taking and how long specific job tasks take is important particularly when we run behind the curve with constricted resources17:29
clarkbotherwise as soon as I ask someone to make their jobs run faster the response will be but I can't tell where the time is spent17:29
ssbarnea|bkp2i am building now a proposal, and I will show it to you.17:30
clarkbmgagne: hrm it seems to have gone back to unhappy deleting nodes again17:30
ssbarnea|bkp2i got the idea, i will try to cover all use cases17:30
*** eharney has quit IRC17:35
clarkbmgagne: I'm going to find lunch/breakfast soon but let me know if I can help with any debugging17:36
hogepodgeI'm looking through some Loci code, and there are notes saying things like "Remove this when infra starts signing their mirrors" for the apt repositories.17:40
hogepodgeJust curious, is this something that infra is now doing or plans on doing?17:40
clarkbhogepodge: it is not something we are doing now, and I know of no plan to do so. The problem there is apt repo updates are race prone and can lead to broken repos/clients. What happens is you can have packages removed from disk that are still in the index then your clients fail to install the package. The other fail mode is you update the index on a client then the package is removed from the repo17:41
clarkbhogepodge: to address this we use reprepro to build a new valid index based on what is on disk (and tell it to not clean up old packages for some hours). Unfortunately this means the indexes we produce are different than those from upstream and so the upstream keys aren't valid17:42
hogepodgeOk, thanks. I'm thinking I'm going to make that bit configurable so we're doing secure by default, but do insecure in the gate17:42
clarkbhogepodge: We could sign our repos and you could trust the keys, but we also want to avoid people treating those repos as consumable outside of testing17:42
clarkbor if someone can figure out a way to use the upstream signed indexes and mirror them without breaking clients we'd probably do that17:43
hogepodgeNo, I'm just going through our notes and trying to get TODOs out of code.17:43
hogepodgeIt's not a strong requirement, just wanted to see if the note reflected reality, and it kind of doesn't. :-) We don't require it to be signed.17:43
hogepodgeBut I can imagine a downstream user not wanting to trust unsigned repos for producing production packages.17:44
hogepodgeIn the gate, it's not critical ¯\_(ツ)_/¯17:44
hogepodgethanks clarkb17:45
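One way to express the "secure by default, insecure in the gate" split hogepodge describes is with apt's stock `[trusted=yes]` option on the sources entry for the unsigned reprepro-built mirror; the mirror hostname below is just an example taken from the logs earlier in this discussion:

```
# default (secure): signatures verified against the distro keyring
deb http://archive.ubuntu.com/ubuntu xenial main universe
# gate only: explicitly trust the unsigned CI mirror
deb [trusted=yes] http://mirror.bhs1.ovh.openstack.org/ubuntu xenial main universe
```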
*** smarcet has joined #openstack-infra17:46
clarkbok food is here. I'm out for a bit to eat17:47
*** eharney has joined #openstack-infra17:49
mgagnechecking17:52
*** felipemonteiro has quit IRC17:52
*** jamesmcarthur has joined #openstack-infra17:54
*** trown|lunch is now known as trown17:54
openstackgerritAakarsh proposed openstack-infra/project-config master: Move openstack-browbeat zuul jobs to project repository  https://review.openstack.org/61309217:54
*** betherly has joined #openstack-infra17:55
*** jamesmcarthur has quit IRC17:58
*** zzzeek_ has joined #openstack-infra17:59
*** betherly has quit IRC17:59
*** eumel8 has joined #openstack-infra18:04
*** apetrich has quit IRC18:07
*** tung_comnets has joined #openstack-infra18:08
tung_comnetsCan someone give one more +2 to this patch: https://review.openstack.org/#/c/612962/18:10
tung_comnetsThanks :)18:10
*** jamesmcarthur has joined #openstack-infra18:10
openstackgerritAakarsh proposed openstack-infra/project-config master: Move openstack-browbeat zuul jobs to project repository  https://review.openstack.org/61309218:11
*** jamesmcarthur has quit IRC18:13
*** apetrich has joined #openstack-infra18:20
*** jamesmcarthur has joined #openstack-infra18:23
openstackgerritPete Birley proposed openstack-infra/project-config master: New Repo - OpenStack-Helm Images  https://review.openstack.org/61189218:28
mgagneclarkb: so I'm not sure what to do next. centos image looks fine, maybe because there aren't many instances based on it. but xenial is having a hard time.18:29
*** bnemec has quit IRC18:29
openstackgerritPete Birley proposed openstack-infra/project-config master: New Repo: OpenStack-Helm Docs  https://review.openstack.org/61189318:30
*** felipemonteiro has joined #openstack-infra18:30
*** electrofelix has quit IRC18:31
openstackgerritTobias Henkel proposed openstack-infra/nodepool master: Support node caching in the nodeIterator  https://review.openstack.org/60464818:32
openstackgerritTobias Henkel proposed openstack-infra/nodepool master: Support node caching in the nodeIterator  https://review.openstack.org/60464818:35
*** munimeha1 has quit IRC18:37
*** bhavikdbavishi has quit IRC18:37
*** felipemonteiro has quit IRC18:37
openstackgerritJames E. Blair proposed openstack-infra/zuul master: DNM: Enable sar logging for unit tests  https://review.openstack.org/61311718:38
*** bobh has quit IRC18:40
clarkbmgagne: is it just timing out?18:45
clarkbmgagne: maybe we should be patient with it and see if the caching is able to get in place?18:47
mgagnecould be it, some are active now18:47
clarkbnormally we don't rotate all images at once so this shouldn't be a common thing18:47
clarkb(we did it in this case because it was easier when we moved zookeeper clusters to not migrate the data)18:47
clarkbShrews: I'm going to look at nb01 now18:48
clarkbfor cleaning up old images18:48
Shrewsk k18:48
clarkbI'm just going to delete the stuff in /opt/nodepool_dib that is old18:49
clarkbShrews: then after that we need to delete the images on the cloud side18:51
clarkbmordred: re ^ if you get a chance could you look at rax swift and glance to see if those are all cleaned up properly? I worry the sdk issue caused extra weirdness there18:54
openstackgerritJames E. Blair proposed openstack-infra/zuul master: quick-start: add a note about github  https://review.openstack.org/61339819:00
clarkb#status log Old dib images cleared out of /opt/nodepool_dib on nb01, nb02, and nb03. Need to remove them from cloud providers next.19:02
openstackstatusclarkb: finished logging19:02
clarkbI'm going to start looking at image cleanup in $clouds19:02
Shrewsclarkb: finishing up a required training thing, but can assist when i'm done (if you haven't finished by then)19:05
clarkbShrews: ok.19:06
mordredclarkb: heya - once the images are imported the swift objects that are created are no longer needed - so we can clean any of those out - I can do a cleanup pass tomorrow19:06
clarkbOne thing I've noticed is that in bhs1 some images can't be deleted because they have snapshots. Odd19:07
clarkbthese are images older than the ones I expected to need to clean. The images I expected to clean appear to delete ok19:07
clarkbmordred: ya, I just have no idea if that was working when we were having that error happen19:07
clarkbI want to say we don't catch the exception in the image create path and so it may not happen automatically19:07
*** hasharAway is now known as hashar19:07
*** ykarel|away has quit IRC19:11
*** smarcet has quit IRC19:12
*** rlandy is now known as rlandy|brb19:13
*** bobh has joined #openstack-infra19:15
clarkbBHS1 is done, except for all the images that can't be deleted because they have snapshots (I expect that is something cloud side we should look into later)19:17
clarkbI don't think we made any snapshots ourselves19:17
mordredclarkb: yeah - that's weird, I can't think of any reason we'd make snapshots of images19:17
*** bobh has quit IRC19:20
AJaegerconfig-core, could you review these two changes, please? https://review.openstack.org/612820 and https://review.openstack.org/61296219:23
clarkbGRA1 list of images to delete is running now19:25
clarkbit seems to be failing less with snapshots than BHS119:26
*** jcoufal_ has joined #openstack-infra19:26
*** jcoufal has quit IRC19:27
*** bobh has joined #openstack-infra19:31
*** rlandy|brb is now known as rlandy19:31
clarkbgra1 is done now too. Going to do inap next19:40
*** lbragstad has quit IRC19:43
*** lbragstad has joined #openstack-infra19:43
clarkbinap is going to take a while. I may run a few of these in parallel. I'll look at vexxhost sjc1 next19:50
clarkbwhat I'm doing is an openstack image list --private. Trimming out any images we want to keep and putting that in a file. Then doing a for loop catting that file and openstack image deleting19:50
clarkbit's not very elegant, but I'm finding there are just enough new corner cases in each cloud that trying to automate it would take all day19:51
clarkblike for some reason there are cloud specific private images we didn't upload19:51
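The manual flow clarkb describes above can be sketched as a short shell loop. This is a hedged sketch rather than the exact commands run: the `images.txt` filename is hypothetical, and the real `openstack` client is stubbed out here so the loop shape runs standalone.

```shell
# Sketch of the per-cloud cleanup loop described in the chat. The
# images.txt filename is hypothetical; "openstack" is stubbed so this
# runs without a cloud. The real flow uses the actual OpenStackClient.
openstack() { echo "image delete $3"; }  # stub standing in for the real CLI

# Real flow: openstack image list --private > images.txt, then hand-trim
# the file down to only the images that should be removed.
printf '%s\n' old-xenial-0001 old-centos-0002 > images.txt

# Delete each remaining image, one at a time.
while read -r img; do
    openstack image delete "$img"
done < images.txt
```

In practice deleting by image ID rather than name is safer, since image names are not guaranteed to be unique within a project.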
*** irclogbot_1 has joined #openstack-infra20:01
*** mriedem has joined #openstack-infra20:03
*** kgiusti has left #openstack-infra20:03
openstackgerritJames E. Blair proposed openstack-infra/zuul master: WIP: support foreign required-projects  https://review.openstack.org/61314320:09
*** smarcet has joined #openstack-infra20:16
mordredclarkb: that might be osc not having full support for the new shared state?20:16
clarkbmordred: oh maybe20:16
clarkbI'm through all clouds but rax, packethost, citycloud (I think we upload but don't boot there), and inap20:18
clarkbinap is in progress20:18
clarkbthe arm clouds were nice and tidy. Only the two we leaked by changing DBs had to be deleted looks like20:18
* clarkb does packethost next20:18
mgagneclarkb: I think there is a network bottleneck somewhere. But there is little I can do except trying to tell our netadmin that it's "normal".20:18
*** jcoufal_ has quit IRC20:19
clarkbmgagne: what's odd is it didn't do that before. So either our start-it-all-at-once shock to the system or your update (or some other change?) must've changed the behavior?20:19
clarkbmgagne: I'm happy to help however we can20:19
mgagneclarkb: the package contained unrelated changes to some management tools.20:20
clarkbah20:20
mgagnemaybe the network gear is much more overloaded than last time all images were updated.20:20
*** jcoufal has joined #openstack-infra20:21
*** irclogbot_1 has quit IRC20:22
clarkbpossible20:23
mgagne;)20:26
*** smarcet has quit IRC20:27
*** hashar has quit IRC20:29
clarkbmordred: on the glance side of things we did seem to leak a bunch of images20:31
*** imacdonn has quit IRC20:31
clarkbmordred: I'm going to go ahead and delete all but the ones we are using now since it should be safe to cleanup swift later by just clearing things out20:32
clarkbstarting with rax-iad20:32
mordredclarkb: ++20:32
mgagneclarkb: we would need to put a hold on mtl01; this is affecting some other critical systems.20:33
clarkbmgagne: ok, if you write the change I'll go ahead and put it in place manually20:34
clarkbmgagne: max-servers: 0? (or I can write the change too)20:34
mgagneyes, 0 please20:34
*** anteaya has joined #openstack-infra20:34
clarkbok I've put that in place manually and will make sure puppet doesn't undo it while we wait for the change to merge20:35
mgagnethanks20:36
openstackgerritMathieu Gagné proposed openstack-infra/project-config master: Disable inap-mtl01 provider  https://review.openstack.org/61341820:36
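The disable change referenced above amounts to zeroing max-servers for the provider in nodepool's configuration. A hypothetical fragment (only the provider name comes from the chat; the pool layout is assumed for illustration):

```yaml
# Hypothetical nodepool.yaml fragment -- only the provider name comes
# from the chat; the pool name and surrounding structure are assumed.
providers:
  - name: inap-mtl01
    pools:
      - name: main
        max-servers: 0   # stop launching new nodes without removing the provider
```

With max-servers at 0, nodepool stops booting new nodes in that provider but the rest of the configuration stays in place, so re-enabling later is just a revert of the same change.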
clarkbmgagne: I also have an out of band image cleanup running against inap. Should I stop that too? It is running openstack image delete serially one after another to cleanup some images that we leaked (some were stuck in saving and others are from us changing DBs)20:38
clarkb(I don't expect this is doing much to your cloud since it is running one at a time serially and cleaning things up, but happy to stop it too if you think it will help)20:38
mgagneclarkb: I don't think this will affect the network performance as this shouldn't pull much bandwidth20:38
*** ansmith_ has quit IRC20:39
clarkbya I don't expect it would cause that20:41
*** xek has quit IRC20:42
*** jcoufal has quit IRC20:54
*** betherly has joined #openstack-infra20:56
clarkbrax-iad is done. Now on to ord20:57
*** betherly has quit IRC21:01
*** larainema has quit IRC21:02
clarkbheh I've been deleting by name. A few of the delete failures in rax were due to non-unique names. I'll make a second pass on iad and ord21:13
clarkbShrews: ^ is that a nodepool bug? I wouldn't expect us to reuse a name21:14
*** fuentess has quit IRC21:14
*** irclogbot_1 has joined #openstack-infra21:15
*** trown is now known as trown|outtypewww21:15
*** bobh has quit IRC21:16
*** betherly has joined #openstack-infra21:16
*** ldnunes has quit IRC21:18
*** betherly has quit IRC21:21
*** kjackal_v2 has quit IRC21:22
*** kjackal has joined #openstack-infra21:22
*** tung_comnets has quit IRC21:28
*** kjackal has quit IRC21:36
*** jamesmcarthur has quit IRC21:37
*** betherly has joined #openstack-infra21:37
*** betherly has quit IRC21:42
*** kopecmartin is now known as kopecmartin|off21:43
*** efried is now known as pot21:43
*** pot is now known as efried21:43
*** jamesmcarthur has joined #openstack-infra21:44
clarkbok doing rax-dfw now then I think I am done21:46
*** jamesmcarthur has quit IRC21:48
*** bobh has joined #openstack-infra21:51
*** bobh has quit IRC21:56
*** betherly has joined #openstack-infra21:58
*** betherly has quit IRC22:02
*** armax has quit IRC22:03
*** armax has joined #openstack-infra22:03
*** boden has quit IRC22:13
*** betherly has joined #openstack-infra22:18
*** gema has quit IRC22:18
*** mriedem has quit IRC22:21
*** betherly has quit IRC22:23
*** emine__ has quit IRC22:24
openstackgerritMerged openstack-infra/project-config master: New Airship project - Utils  https://review.openstack.org/61282022:25
ianwclarkb: does the drop in http://grafana.openstack.org/d/8wFIHcSiz/nodepool-rackspace?panelId=15&fullscreen&orgId=1&from=now-7d&to=now correlate about when something nodepoolish was restarted?22:25
openstackgerritMerged openstack-infra/project-config master: Add release tag and remove python jobs for Apmec  https://review.openstack.org/61296222:27
clarkbianw: yes22:28
clarkbianw: I sent the notice we were taking zuul down about 14:38UTC and we were done about an hour later22:29
ianwclarkb: hrm, well i guess i have something to look into now then :)22:30
clarkbianw: part of the rename in stats that you did?22:31
ianwclarkb: this was certainly not an intended result of that, but yeah, it's the suspect22:32
ianwhrm, although these stats are coming from openstacksdk via ... magic ... i wonder if this task thread etc has changed things22:33
clarkb#status log Old nodepool images cleared out of cloud providers as part of the post ZK db transition cleanup.22:33
openstackstatusclarkb: finished logging22:33
clarkbianw: possible, openstacksdk did update22:33
clarkbianw: also I've disabled inap at the request of mgagne22:35
clarkbovh is looking happy22:35
clarkbseems like asking inap to use a bunch of new images all at once made networking there sad22:36
ianwahh, i wondered what that drop was.  yeah, occasionally we cleanup a few ports on ovh, but not much22:36
mgagneclarkb: we will perform some tests tomorrow to see how we can improve the network performance for mtl01. For now, it should stay disabled.22:36
clarkbmgagne: ok, if it would help we can also turn it back on with a lower max-servers value to reduce thrashing but still induce the behavior if you need it22:37
clarkbsay to 5 or 10. Not sure if that is desirable on your end22:37
ianwlike in the last 2 hours on ovh gra1, we found 1 DOWN port that had been sitting around for 3+ minutes22:37
clarkbianw: not bad22:37
mgagneclarkb: we need to avoid sending traffic to a specific network hardware. so we will test on our end first and enable back when we are sure the problem is mitigated.22:38
clarkbmgagne: roger22:38
ianwamorin: we might be at a point where it would make sense for us to modify the script to keep track of the leaked IDs?  it might be practical from your side to trace through just one port allocation and see why it leaked22:38
ianw2018-10-25 12:12:27,814 DEBUG nodepool.TaskManager: Manager rax-iad ran task ComputeGetServersDetail in 1.608656644821167s22:40
ianwso the name is being mangled correctly ... this leaves the possibility that stats are being produced but not making it to statsd22:41
*** tosky has quit IRC22:45
*** tpsilva has quit IRC22:47
ianwE..P..@.@.....c=h........<r.nodepool.task.rax-ord.ComputePostServers:0.000000|ms22:48
*** eharney has quit IRC22:48
ianwthere's your problem ... it's sending zeros22:48
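The payload ianw pasted is a plain statsd timer datagram, `<metric>:<value>|ms` over UDP. Splitting the captured payload with shell parameter expansion (the metric name is from the packet; the parsing itself is just illustrative) shows the zero value directly:

```shell
# Split a statsd timer line into name, value, and type using shell
# parameter expansion. The line is the payload from the captured packet.
line='nodepool.task.rax-ord.ComputePostServers:0.000000|ms'
name=${line%%:*}    # metric name
rest=${line#*:}
value=${rest%%|*}   # duration in milliseconds -- here always zero
mtype=${rest#*|}    # "ms" marks a statsd timer
echo "$name $value $mtype"
```

A real ComputePostServers call takes on the order of seconds, so a constant 0.000000 means the timing value is lost before emission, not that the calls are fast.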
clarkbthat will do it22:49
*** betherly has joined #openstack-infra22:49
clarkbpossibly related to the sdk update in that case22:49
clarkbfwiw I've yet to see anything that would indicate a problem with the new zk cluster22:49
ianwHA FTW22:50
clarkbya the only two spofs now are gerrit and zuul scheduler22:50
clarkb(I guess technically log copies too, but that is being worked on with swift uploads)22:50
clarkbI had to get up at a ridiculously early hour this morning so I may begin to call it a day at this point. Anything else I should look at or help with before doing so? I did AJaeger's review requests22:51
ianwno ... vice versa anything i should watch particularly?22:53
*** betherly has quit IRC22:53
clarkbianw: I would keep an eye on zk periodically just to make sure it hasn't done anything weird (cacti is probably good enough for that). Otherwise I don't think so22:53
ianwclarkb: ok, no worries.  result will probably be gate grinding to a halt, so that's also a good canary :)22:54
clarkbindeed22:54
*** yamamoto has quit IRC23:01
*** yamamoto has joined #openstack-infra23:01
*** yamamoto has quit IRC23:06
*** betherly has joined #openstack-infra23:09
*** markmcd has joined #openstack-infra23:09
*** betherly has quit IRC23:14
*** rlandy has quit IRC23:23
*** carl_cai has joined #openstack-infra23:39
*** yamamoto has joined #openstack-infra23:44
*** agopi is now known as agopi|brb23:56
*** gyee has quit IRC23:57

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!