Monday, 2017-10-02

*** stakeda has joined #openstack-infra00:03
*** vaidy has quit IRC00:04
*** isviridov_away has quit IRC00:04
*** lnxnut has quit IRC00:07
*** markvoelker has joined #openstack-infra00:12
*** yamamoto has joined #openstack-infra00:14
*** yamamoto has quit IRC00:14
*** yamamoto has joined #openstack-infra00:14
*** dave-mccowan has joined #openstack-infra00:15
*** yamamoto has quit IRC00:19
*** lukebrowning has joined #openstack-infra00:28
*** lukebrowning_ has joined #openstack-infra00:32
*** lukebrowning has quit IRC00:33
*** lukebrowning_ has quit IRC00:37
*** edmondsw has joined #openstack-infra00:39
*** psachin has quit IRC00:40
*** psachin has joined #openstack-infra00:41
*** lukebrowning has joined #openstack-infra00:43
*** edmondsw has quit IRC00:44
*** markvoelker has quit IRC00:45
*** lukebrowning has quit IRC00:47
*** lukebrowning has joined #openstack-infra00:49
*** kiennt26 has joined #openstack-infra00:50
*** lukebrowning has quit IRC00:54
*** lukebrowning has joined #openstack-infra00:56
*** lukebrowning has quit IRC01:00
<openstackgerrit> Merged openstack-infra/openstack-zuul-jobs master: Remove legacy loci jobs  https://review.openstack.org/508556  01:00
*** jascott1 has joined #openstack-infra01:03
*** claudiub has quit IRC01:05
*** lukebrowning has joined #openstack-infra01:06
<SamYaple> is it possible to give jobs priorities? for example if i have a job that i know takes 20 minutes to run, i want that to start before a job that i know takes 10 minutes to run (instead of sitting "queued")  01:07
*** lukebrowning_ has joined #openstack-infra01:09
*** cuongnv has joined #openstack-infra01:09
*** lukebrowning has quit IRC01:11
*** lukebrowning_ has quit IRC01:13
*** hongbin has joined #openstack-infra01:15
*** lukebrowning has joined #openstack-infra01:15
*** yamamoto has joined #openstack-infra01:19
*** lukebrowning has quit IRC01:20
*** lukebrowning has joined #openstack-infra01:21
*** yamamoto has quit IRC01:25
*** lukebrowning has quit IRC01:26
*** cshastri has joined #openstack-infra01:26
*** lukebrowning has joined #openstack-infra01:27
*** dave-mccowan has quit IRC01:30
*** lukebrowning has quit IRC01:32
*** dave-mccowan has joined #openstack-infra01:33
*** lukebrowning has joined #openstack-infra01:34
*** lnxnut has joined #openstack-infra01:34
*** edmondsw has joined #openstack-infra01:37
<jeblair> SamYaple: they're run in the order listed, so you can put the long ones first  01:38
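
A minimal sketch of what jeblair describes, with hypothetical project and job names: jobs listed earlier in a project's pipeline definition get their node requests submitted first, so the long job goes at the top.

    - project:
        name: example/my-project             # hypothetical project
        check:
          jobs:
            - my-project-long-integration    # ~20 min job, listed first
            - my-project-quick-unit          # ~10 min job
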
*** lukebrowning has quit IRC01:39
*** lihi has quit IRC01:39
*** dimak has quit IRC01:39
*** lihi has joined #openstack-infra01:39
*** dimak has joined #openstack-infra01:40
*** leyal has quit IRC01:40
*** lukebrowning has joined #openstack-infra01:40
*** leyal has joined #openstack-infra01:40
<SamYaple> jeblair: if thats the design, thats not the way its working. they start in seemingly random order  01:41
<SamYaple> ive had the long job start first sometimes  01:41
<SamYaple> see the job in the queue right this second  01:41
<SamYaple> first job is queued, last job is running  01:41
*** edmondsw has quit IRC01:41
<SamYaple> and a mix in between  01:41
*** markvoelker has joined #openstack-infra01:42
*** vaidy has joined #openstack-infra01:44
*** lukebrowning has quit IRC01:44
*** isviridov_away has joined #openstack-infra01:45
*** lukebrowning has joined #openstack-infra01:46
*** dave-mcc_ has joined #openstack-infra01:49
*** lukebrowning_ has joined #openstack-infra01:50
*** dave-mccowan has quit IRC01:51
*** lukebrowning has quit IRC01:51
<jeblair> SamYaple: well, that's the order in which the nodes are requested.  the order they arrive is determined by the cloud.  :)  01:53
*** lukebrowning has joined #openstack-infra01:53
<SamYaple> oh i see. well then i suppose it doesn't make much of a difference! ill still move it to the top though  01:54
*** lukebrowning_ has quit IRC01:54
<SamYaple> did zuulv3 add a way to move artifacts from one job to anout (say from a job in the gate queue to a job in the post queue)?  01:55
<SamYaple> s/anout/another/  01:55
<jeblair> yeah, the only time it would make a difference is if we're at capacity with no turnover.  that generally never happens (even when we're at capacity, we turn over like 10 nodes a minute), so you'll never notice.  01:55
<jeblair> SamYaple: not yet, that's still via tarballs.o.o for now  01:55
<openstackgerrit> Tristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Implement an OpenContainer driver  https://review.openstack.org/468753  01:56
<SamYaple> ok. (in this case i was hoping to publish something from the gate queue to tarballs.o.o )  01:56
*** dave-mccowan has joined #openstack-infra01:57
<SamYaple> its not a big thing. i just have to rebuild it in post  01:57
*** lukebrowning has quit IRC01:58
*** dave-mcc_ has quit IRC01:59
<SamYaple> oh man. so excited. very happy with zuulv3  02:00
*** lnxnut has quit IRC02:01
<SamYaple> by the end of next week i should be publishing images to dockerhub based on changes to the cinder/keystone/nova/etc repo  02:01
*** shu-mutou-AWAY is now known as shu-mutou02:01
<openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP add repl  https://review.openstack.org/508793  02:06
*** efried_thbagh has quit IRC02:07
*** lukebrowning has joined #openstack-infra02:07
<jeblair> i'm going to restart zuul again  02:07
*** lukebrowning has quit IRC02:12
<SamYaple> monday its going to get hammered so hard  02:13
*** lukebrowning has joined #openstack-infra02:14
<openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP add repl  https://review.openstack.org/508793  02:15
*** markvoelker has quit IRC02:16
<openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP add repl  https://review.openstack.org/508793  02:17
*** dave-mcc_ has joined #openstack-infra02:17
*** lukebrowning has quit IRC02:18
*** dave-mccowan has quit IRC02:18
*** yamamoto has joined #openstack-infra02:21
<jeblair> infra-root: i've restarted zuul with a locally applied patch to add a repl so we can do some interactive debugging tomorrow ^  02:23
*** lukebrowning has joined #openstack-infra02:26
*** esberglu has joined #openstack-infra02:26
*** yamamoto has quit IRC02:27
*** ijw has joined #openstack-infra02:28
*** lukebrowning_ has joined #openstack-infra02:30
*** lukebrowning has quit IRC02:30
*** esberglu has quit IRC02:31
*** ijw has quit IRC02:32
*** lukebrowning_ has quit IRC02:35
*** lukebrowning has joined #openstack-infra02:37
*** lukebrowning has quit IRC02:41
*** lukebrowning has joined #openstack-infra02:43
*** bobh has quit IRC02:46
*** lukebrowning has quit IRC02:47
*** ericyoung has quit IRC02:48
*** lukebrowning has joined #openstack-infra02:49
*** baoli has quit IRC02:52
*** lukebrowning has quit IRC02:54
*** dave-mcc_ has quit IRC02:55
*** lukebrowning has joined #openstack-infra02:56
<Jeffrey4l> could anybody check this in-project job? zuulv3 shows that the kolla-build-centos-source job is in the queue all the time.  02:59
<Jeffrey4l> https://review.openstack.org/#/c/508768/1/.zuul.yaml  02:59
*** ekcs has quit IRC03:00
*** lukebrowning has quit IRC03:00
*** ekcs has joined #openstack-infra03:01
*** bobh has joined #openstack-infra03:01
*** lukebrowning has joined #openstack-infra03:02
*** rkukura has joined #openstack-infra03:05
*** lukebrowning has quit IRC03:06
*** lukebrowning has joined #openstack-infra03:08
*** dhajare has joined #openstack-infra03:12
*** markvoelker has joined #openstack-infra03:13
*** lukebrowning has quit IRC03:13
*** nicolasbock_ has quit IRC03:16
*** lukebrowning has joined #openstack-infra03:17
<openstackgerrit> Merged openstack-infra/openstack-zuul-jobs master: Extract releasenotes build into a role  https://review.openstack.org/508765  03:17
<openstackgerrit> Merged openstack-infra/openstack-zuul-jobs master: Run release note jobs only when they change  https://review.openstack.org/508762  03:17
*** rossella_s has quit IRC03:19
*** rossella_s has joined #openstack-infra03:20
*** lukebrowning has quit IRC03:22
<SpamapS> SamYaple: we _could_ give you a way to upload stuff to dockerhub in gate. But the git sha will change between gate and post since Zuul just tells gerrit to merge...  03:23
*** yamamoto has joined #openstack-infra03:23
<boris_42_> heh =( this patch didn't help with Rally gates  https://review.openstack.org/#/c/508630/ ;(  03:24
<boris_42_> not sure how to add dib-utils to legacy jobs =(  03:24
<boris_42_> I took a look at other projects, they just add them to Projects ...  03:24
<SpamapS> boris_42_: let me look  03:25
<clarkb> boris_42_: look at required-projects  03:25
<clarkb> also the comment about gnocchi was correct  03:25
<clarkb> that wont work  03:25
*** edmondsw has joined #openstack-infra03:25
<clarkb> you need to install it from wherever they publish it now  03:25
<SpamapS> github  03:26
<SpamapS> so technically  03:26
<clarkb> SpamapS: its orthogonal to zuul  03:26
<clarkb> devstack-gate assumes things  03:27
<clarkb> so it wont work  03:27
<SpamapS> does it assume openstack as an org?  03:27
<clarkb> yes  03:27
<boris_42_> clarkb: could you elaborate about required-projects? where is that thing?  03:27
<clarkb> boris_42_: its a job setting, should be plenty of examples to grep for in openstack-zuul-jobs  03:28
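
A minimal sketch of the required-projects setting clarkb points at, roughly in the shape 508799 ends up using (the job name is illustrative); Zuul checks out every listed repository onto the node alongside the project under test:

    - job:
        name: legacy-rally-dsvm-example    # hypothetical job name
        parent: legacy-dsvm-base
        required-projects:
          - openstack/rally
          - openstack/dib-utils            # the extra repo the legacy job needs on disk
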
*** lnxnut has joined #openstack-infra03:28
*** yamamoto has quit IRC03:29
<SpamapS> clarkb: oh, so we can't just add https://github.com/gnocchixyz/gnocchi as a project?  03:29
*** ijw has joined #openstack-infra03:29
*** lukebrowning has joined #openstack-infra03:30
<SpamapS> (probably not the best time to be trying out new trigger drivers ;)  03:30
<boris_42_> clarkb: ah ok  03:30
*** edmondsw has quit IRC03:30
<SamYaple> SpamapS: hmmm. theres a thought.  03:30
<SamYaple> SpamapS: nah you know its probably fine. it takes 15m to build the wheels that i need, its not a huge issue right now. i can wait until there are better methods  03:31
<clarkb> SpamapS: not to solve boris_42_'s problem I dont think  03:32
<clarkb> SpamapS: we'd have to sort out how it also breaks in devstack-gate  03:33
*** ijw has quit IRC03:33
*** lukebrowning has quit IRC03:34
<leyal> Hi, i got -1 from zuul on this patch - https://review.openstack.org/#/c/508785/1 - where can i check why it failed? it doesn't contain references to any failed test ..  03:34
*** lukebrowning has joined #openstack-infra03:36
<clarkb> leyal: did you see the comment from zuul?  03:36
<boris_42_> clarkb: so I can enable gnocchi as a devstack plugin  03:37
<clarkb> I think it may be a yaml indentation thing, after a : the next line needs 2 spaces of indentation  03:37
<leyal> clarkb, nope - just - Verified "-1  03:37
<clarkb> leyal: toggle ci at the bottom  03:38
<clarkb> there is a comment from zuul that tries to explain what is going on  03:39
<leyal> clarkb, thanks :)  03:39
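
For reference, the indentation rule clarkb mentions looks like this in a .zuul.yaml job entry; everything nested under a key that ends in ':' is indented two more spaces (names are illustrative):

    - job:
        name: example-dsvm-job        # hypothetical
        parent: legacy-dsvm-base
        vars:
          devstack_plugin: example    # nested mapping: two spaces deeper than "vars:"
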
*** lukebrowning has quit IRC03:41
*** lukebrowning has joined #openstack-infra03:42
<openstackgerrit> Boris Pavlovic proposed openstack-infra/openstack-zuul-jobs master: Fix Rally jobs: Add dib-utils to required projects  https://review.openstack.org/508799  03:43
<boris_42_> clarkb: ^ so something like this?  03:44
<clarkb> boris_42_: ya  03:44
<boris_42_> clarkb: okay it will take some time to get familiar with this new system =)  03:45
*** isaacb has joined #openstack-infra03:46
*** markvoelker has quit IRC03:46
*** lukebrowning has quit IRC03:47
*** lukebrowning has joined #openstack-infra03:48
*** links has joined #openstack-infra03:48
<SamYaple> trying to make a commit to the requirements repo, nova functional test is failing, http://logs.openstack.org/91/508791/1/check/legacy-cross-nova-func/f469b5d/job-output.txt.gz#_2017-10-02_02_57_39_951015  03:51
<SamYaple> any ideas?  03:51
<openstackgerrit> Eyal Leshem proposed openstack-infra/project-config master: Add neutron to required-projects for dragonflow  https://review.openstack.org/508785  03:52
*** lukebrowning has quit IRC03:53
*** isaacb has quit IRC03:54
<sshnaidm|off> cores, please check problem with XStatic package on pypi mirror on rh1 cloud: https://bugs.launchpad.net/tripleo/+bug/1720721  03:54
<openstack> Launchpad bug 1720721 in tripleo "CI: OVB jobs fail because can't install XStatic from PyPI mirror on rh1 cloud" [Critical,Triaged] - Assigned to Paul Belanger (pabelanger)  03:54
*** sshnaidm|off is now known as sshnaidm03:54
*** lukebrowning has joined #openstack-infra03:55
*** gouthamr has joined #openstack-infra03:55
*** lnxnut has quit IRC03:55
<clarkb> SamYaple: I think that job may be running tox in the requirements repo and not nova  03:56
<clarkb> sshnaidm: that package isnt in our mirror index. Where are your jobs finding it? is it a constraint? (but likely means our bandersnatch isnt updating properly for some reason and will have to be debugged)  03:58
<openstackgerrit> Omer Anson proposed openstack-infra/project-config master: Add neutron to required-projects for dragonflow  https://review.openstack.org/508785  03:58
*** lukebrowning has quit IRC03:59
*** dhajare has quit IRC04:04
*** lukebrowning has joined #openstack-infra04:11
*** lukebrowning_ has joined #openstack-infra04:14
<SamYaple> clarkb: i am unsure how to fix that... ill start digging into the legacy job i suppose  04:14
*** esberglu has joined #openstack-infra04:14
*** lukebrowning has quit IRC04:15
<sshnaidm> clarkb, it happens when zuul starts to install ara, it's even before job code runs, it's an infra step  04:17
<SamYaple> clarkb: https://github.com/openstack-infra/openstack-zuul-jobs/blob/master/playbooks/legacy/cross-nova-func/run.yaml#L70 so this should run in the nova src_dir correct?  04:17
*** esberglu has quit IRC04:19
*** jaosorior has joined #openstack-infra04:19
*** lukebrowning_ has quit IRC04:19
<clarkb> SamYaple: ya, but workspace root will be for the repo the change is made against I think  04:19
*** lukebrowning has joined #openstack-infra04:19
<clarkb> sshnaidm: hrm, I wonder why ara works for us elsewhere  04:20
<SamYaple> so ive got to basically check that it is nova and run it in workspace root, otherwise run it in the cloned nova dir?  04:20
<SamYaple> wait if this is just the cross-nova-func, then it shouldnt run in nova at all  04:22
<SamYaple> and nova should always be in the cloned dir.. right  04:22
<clarkb> I dont think so as the job runs against requirements changes only  04:24
<clarkb> I think that is what confuses it  04:24
<openstackgerrit> Sagi Shnaidman proposed openstack-infra/tripleo-ci master: DNM: test CI job  https://review.openstack.org/508660  04:24
*** lukebrowning has quit IRC04:24
<clarkb> I think you put nova in required-projects then just cd into the nova repo dir  04:24
*** yamamoto has joined #openstack-infra04:25
<SamYaple> nova is in required-projects http://logs.openstack.org/91/508791/1/check/legacy-cross-nova-func/f469b5d/job-output.txt.gz#_2017-10-02_02_55_22_768865  04:26
<SamYaple> so i just need to update the chdir  04:26
*** hongbin_ has joined #openstack-infra04:26
<clarkb> ya  04:27
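
A sketch of the kind of fix being discussed for the legacy cross-nova-func playbook (the actual change is in 508803; this task is only an approximation): run tox from the nova checkout under the zuul v3 source layout instead of the workspace of the triggering requirements change.

    - shell:
        cmd: |
          set -e
          set -x
          tox -e functional
        # run in the nova checkout, not the requirements repo workspace
        chdir: '{{ ansible_user_dir }}/src/git.openstack.org/openstack/nova'
      environment: '{{ zuul | zuul_legacy_vars }}'
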
*** yamamoto has quit IRC04:29
*** lukebrowning has joined #openstack-infra04:30
*** hongbin has quit IRC04:30
*** gouthamr has quit IRC04:30
<openstackgerrit> Merged openstack-infra/openstack-zuul-jobs master: Remove obsolete releasenotes jobs  https://review.openstack.org/508709  04:32
<SamYaple> is there a variable preset with /home/zuul/src/git.openstack.org/ ? or do i just need to put in /home/zuul/src/git.openstack.org/openstack/nova  04:33
<openstackgerrit> Sam Yaple proposed openstack-infra/openstack-zuul-jobs master: Move to the appropriate directory before tox  https://review.openstack.org/508803  04:34
*** lukebrowning has quit IRC04:34
*** lukebrowning_ has joined #openstack-infra04:34
*** yamahata has joined #openstack-infra04:35
*** ykarel has joined #openstack-infra04:36
<clarkb> I'm not sure, would have to look  04:37
<clarkb> and its time to call it a day. Will check back in the morning  04:37
<SamYaple> ok  04:38
<SamYaple> thanks  04:38
*** lukebrowning_ has quit IRC04:39
*** lukebrowning has joined #openstack-infra04:40
*** markvoelker has joined #openstack-infra04:43
*** lukebrowning has quit IRC04:45
<SamYaple> clarkb: that was not the issue. its definitely running in the correct directory.... something else is going on and im afraid its beyond me  04:46
<SamYaple> im not familiar with the tests  04:46
*** hongbin_ has quit IRC04:46
*** lukebrowning has joined #openstack-infra04:47
*** lukebrowning has quit IRC04:51
*** lukebrowning has joined #openstack-infra04:53
*** lukebrowning has quit IRC04:58
*** bobh has quit IRC04:58
*** lukebrowning has joined #openstack-infra04:59
<AJaeger> SamYaple: see also https://review.openstack.org/508783  04:59
*** bobh has joined #openstack-infra04:59
<openstackgerrit> Sam Yaple proposed openstack-infra/openstack-zuul-jobs master: Move to the appropriate directory before tox  https://review.openstack.org/508803  05:02
*** numans has quit IRC05:03
*** lukebrowning has quit IRC05:04
<SamYaple> AJaeger: ill give it a go  05:04
*** lukebrowning has joined #openstack-infra05:06
*** numans has joined #openstack-infra05:06
*** jascott1 has quit IRC05:07
*** jascott1 has joined #openstack-infra05:08
<AJaeger> ianw, yolanda, jhesketh, could you put https://review.openstack.org/#/c/508598/ https://review.openstack.org/508706 https://review.openstack.org/#/c/508764/ on your review queue, please?  05:08
<AJaeger> SamYaple: sorry, can't help with your change, hope others will review later  05:09
<SamYaple> AJaeger: not a problem i get it. its really late here for me, im going to sleep soon  05:09
<SamYaple> thanks!  05:09
*** lukebrowning has quit IRC05:10
<AJaeger> SamYaple: early morning here ;) Good night!  05:10
*** lukebrowning has joined #openstack-infra05:12
<AJaeger> jlk: the current translation jobs are broken, you have two changes up (https://review.openstack.org/#/c/502207/ and https://review.openstack.org/#/c/502208/), should we get these in and migrate to your new ones instead of debugging the legacy ones?  05:13
* AJaeger will rebase 502208 now  05:14
* AJaeger will rebase 502208 now05:14
*** lukebrowning_ has joined #openstack-infra05:16
*** lukebrowning has quit IRC05:16
*** markvoelker has quit IRC05:17
*** edmondsw has quit IRC05:18
*** logan- has quit IRC05:18
<openstackgerrit> Andreas Jaeger proposed openstack-infra/project-config master: Add translation jobs  https://review.openstack.org/502208  05:19
*** lukebrowning_ has quit IRC05:20
*** logan- has joined #openstack-infra05:21
<openstackgerrit> yatin proposed openstack-infra/project-config master: Remove legacy magnum jobs from pipeline  https://review.openstack.org/508804  05:21
*** lnxnut has joined #openstack-infra05:22
*** kiennt26 has quit IRC05:26
*** yamamoto has joined #openstack-infra05:26
<sshnaidm> clarkb, does this change resolve the hardlinks problem? https://review.openstack.org/#/c/508772/  05:29
<jeblair> infra-root: i've started some zuulv3 memory analysis which is taking a lot of cpu time, and is likely to affect performance while it runs.  it'd be nice if we can leave it running for at least a few hours, but if it hasn't finished by the time things get busy later today and is causing errors, feel free to restart zuul-scheduler.  05:29
*** nunchuck has quit IRC05:31
*** deduped has quit IRC05:31
*** yamamoto has quit IRC05:31
<openstackgerrit> Sagi Shnaidman proposed openstack-infra/tripleo-ci master: Use sub_nodes_private  https://review.openstack.org/508752  05:32
*** bobh has quit IRC05:32
*** bobh has joined #openstack-infra05:34
*** bobh has quit IRC05:34
<AJaeger> pabelanger, mordred, https://review.openstack.org/#/q/project:openstack-infra/zuul-jobs+status:open shows a couple of changes from both of you that are in merge conflict - which ones do we need to move forward and which can be abandoned? I hope there's nothing we really need that just needs updating...  05:40
*** thorst has joined #openstack-infra05:41
<openstackgerrit> Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Increase ansible internal_poll_interval  https://review.openstack.org/508805  05:42
*** ethfci has joined #openstack-infra05:44
<ethfci> fungi thanks for fixing legacy-horizon-selenium-headless requirements for Zuul v3  05:47
*** logan- has quit IRC05:48
*** lnxnut has quit IRC05:50
<frickler> jeblair: how long would you expect zuul to take until it starts processing stuff again?  05:50
*** dizquierdo has joined #openstack-infra05:50
*** thorst has quit IRC05:51
*** thorst has joined #openstack-infra05:51
<AJaeger> frickler: it should still process - just slower  05:52
*** thorst has quit IRC05:52
<AJaeger> frickler: and looking at queue length: It is processing...  05:53
*** logan- has joined #openstack-infra05:53
<openstackgerrit> Eyal Leshem proposed openstack-infra/project-config master: Add neutron to required-projects for dragonflow  https://review.openstack.org/508785  05:53
<frickler> AJaeger: hmm, indeed, seems I need to be more patient with snail zuul  05:57
<openstackgerrit> Eyal Leshem proposed openstack-infra/project-config master: Add neutron to required-projects for dragonflow  https://review.openstack.org/508785  06:01
<AJaeger> frickler: it's fast compared to yesterday evening ;)  06:02
*** esberglu has joined #openstack-infra06:02
*** bobh has joined #openstack-infra06:05
*** esberglu has quit IRC06:07
*** bobh has quit IRC06:09
<openstackgerrit> garyk proposed openstack-infra/project-config master: Add missing projects to vmware-nsxlib  https://review.openstack.org/508779  06:11
<frickler> has anyone seen a "cannot create hard link" chain of errors when cloning repos? http://logs.openstack.org/81/508781/1/check/legacy-neutron-dynamic-routing-dsvm-functional/fed6569/job-output.txt.gz#_2017-10-02_06_07_32_299327  06:13
<openstackgerrit> Yuval Brik proposed openstack-infra/openstack-zuul-jobs master: Remove karborclient LIBS_FROM_GIT from Karbor gate  https://review.openstack.org/508807  06:13
*** markvoelker has joined #openstack-infra06:14
<openstackgerrit> Merged openstack-infra/openstack-zuul-jobs master: Require requirements prj for legacy-requirements  https://review.openstack.org/508460  06:14
*** lukebrowning has joined #openstack-infra06:17
*** kzaitsev1pi has quit IRC06:20
*** kzaitsev_pi has joined #openstack-infra06:22
*** jtomasek has joined #openstack-infra06:24
*** eumel8 has joined #openstack-infra06:24
*** ekcs has quit IRC06:25
<ykarel> frickler, this should fix the hardlink issue: https://review.openstack.org/#/c/508772/  06:27
<openstackgerrit> Jens Harbott (frickler) proposed openstack-infra/project-config master: Add neutron to required-projects for neutron-dynamic-routing  https://review.openstack.org/508775  06:28
*** yamamoto has joined #openstack-infra06:28
*** lukebrowning has quit IRC06:30
<frickler> ykarel: thx, added that to my test-patch  06:32
*** yamamoto has quit IRC06:33
<openstackgerrit> Omer Anson proposed openstack-infra/project-config master: Add neutron to required-projects for dragonflow  https://review.openstack.org/508785  06:34
*** stakeda has quit IRC06:36
*** shardy has joined #openstack-infra06:37
*** lukebrowning has joined #openstack-infra06:38
*** lukebrowning has quit IRC06:43
*** lukebrowning has joined #openstack-infra06:44
*** markvoelker has quit IRC06:47
*** ykarel has quit IRC06:48
*** lukebrowning has quit IRC06:49
*** lukebrowning has joined #openstack-infra06:51
<openstackgerrit> Yuval Brik proposed openstack-infra/openstack-zuul-jobs master: Remove karborclient LIBS_FROM_GIT from Karbor gate  https://review.openstack.org/508807  06:51
*** lukebrowning has quit IRC06:55
<sshnaidm> I use the patch for fixing hardlinks (508772), but still half of jobs have errors: Invalid cross-device link  06:56
*** lukebrowning has joined #openstack-infra06:57
<openstackgerrit> Numan Siddique proposed openstack-infra/project-config master: ovn: Make scenario007-multinode-oooq-container voting  https://review.openstack.org/502899  06:58
<openstackgerrit> Eyal Leshem proposed openstack-infra/project-config master: Add neutron to required-projects for dragonflow  https://review.openstack.org/508785  06:58
*** jpena has joined #openstack-infra06:59
*** edmondsw has joined #openstack-infra06:59
*** lukebrowning has quit IRC07:01
*** lukebrowning has joined #openstack-infra07:03
*** rcernin has joined #openstack-infra07:04
*** edmondsw has quit IRC07:04
*** lukebrowning has quit IRC07:08
*** pcaruana has joined #openstack-infra07:08
*** hashar has joined #openstack-infra07:09
*** lukebrowning has joined #openstack-infra07:10
<openstackgerrit> Eyal Leshem proposed openstack-infra/project-config master: Add neutron to required-projects for dragonflow  https://review.openstack.org/508785  07:14
*** lukebrowning has quit IRC07:14
*** dizquierdo has quit IRC07:15
<openstackgerrit> Omer Anson proposed openstack-infra/project-config master: Add neutron to required-projects for dragonflow  https://review.openstack.org/508785  07:16
*** lnxnut has joined #openstack-infra07:17
<frickler> hmm, seems my hard link errors are related to running with "sudo -u stack" rather than different file systems  07:18
*** electrofelix has joined #openstack-infra07:18
*** lukebrowning has joined #openstack-infra07:19
*** martinkopec has joined #openstack-infra07:20
<openstackgerrit> Eyal Leshem proposed openstack-infra/project-config master: Add neutron to required-projects for dragonflow  https://review.openstack.org/508785  07:21
*** tesseract has joined #openstack-infra07:22
*** lukebrowning has quit IRC07:23
*** martinkopec has quit IRC07:24
<openstackgerrit> Eyal Leshem proposed openstack-infra/project-config master: Add neutron to required-projects for dragonflow  https://review.openstack.org/508785  07:24
*** lukebrowning has joined #openstack-infra07:25
*** martinkopec has joined #openstack-infra07:26
<dmellado> I'm having issues using LIBS_FROM_GIT in my jobs since the zuulv3 migration  07:26
<dmellado> i.e. 2017-09-29 08:37:04.664 | + inc/python:check_libs_from_git:404       :   die 404 'The following LIBS_FROM_GIT were not installed correct:  python-octaviaclient diskimage-builder'  07:26
*** yamamoto has joined #openstack-infra07:29
*** lukebrowning has quit IRC07:30
<frickler> dmellado: it's broken, see https://review.openstack.org/508344  07:30
*** tmorin has joined #openstack-infra07:31
<dmellado> frickler: I see, thanks!  07:31
<dmellado> I had a workaround patch which I'll stall for me  07:31
*** lukebrowning has joined #openstack-infra07:31
*** yamamoto has quit IRC07:36
*** lukebrowning has quit IRC07:36
*** lukebrowning has joined #openstack-infra07:38
*** lukebrowning has quit IRC07:42
*** lin_yang has quit IRC07:43
*** egonzalez has joined #openstack-infra07:43
*** lukebrowning has joined #openstack-infra07:44
*** lnxnut has quit IRC07:44
*** markvoelker has joined #openstack-infra07:44
*** yamamoto has joined #openstack-infra07:45
*** seanhandley has left #openstack-infra07:47
*** lukebrowning has quit IRC07:48
*** jpich has joined #openstack-infra07:50
*** lukebrowning has joined #openstack-infra07:50
*** esberglu has joined #openstack-infra07:51
*** kiennt26 has joined #openstack-infra07:52
*** lukebrowning has quit IRC07:54
<oanson> Hi. Do Depends-On flags in commit messages work for project-config? e.g. the Depends-On here: https://review.openstack.org/#/c/508761 - would it make a difference?  07:54
*** esberglu has quit IRC07:56
*** slaweq has joined #openstack-infra07:56
*** kiennt26 has quit IRC07:56
*** lukebrowning has joined #openstack-infra07:56
<openstackgerrit> Merged openstack-infra/project-config master: Add releasenotes publication job  https://review.openstack.org/508764  08:00
*** slaweq has quit IRC08:00
*** lukebrowning has quit IRC08:01
<tmorin> infra-root: I have a few (non-legacy) jobs (e.g. pep8 in networking-bgpvpn) that fail because of zuul-cloner not finding the repo to clone; I understand that I need to add the repo to required-projects and that this happens somewhere in openstack-zuul-jobs/zuul.d. I've found out how I would do this for a legacy job, but not how to do this for a non-legacy generic/templated job such as pep8. Can someone provide guidance?  08:01
*** hashar has quit IRC08:02
*** hashar has joined #openstack-infra08:02
*** kiennt26 has joined #openstack-infra08:02
*** lukebrowning has joined #openstack-infra08:03
*** seanhandley has joined #openstack-infra08:03
<seanhandley> I have a question about Sphinx docs  08:03
*** lucas-pto is now known as lucasagomes08:03
<seanhandley> I'm building out RST documents at https://github.com/openstack/publiccloud-wg using the openstack theme. I can view the build output locally and it looks fine. I'm missing something though - how does that compiled doc get picked up and published on o.o?  08:05
*** rossella_s has quit IRC08:05
*** slaweq has joined #openstack-infra08:05
<seanhandley> for instance, the API WG has this RST doc defined in their docs repo: https://github.com/openstack/api-wg/blob/master/guidelines/discoverability.rst  08:05
<seanhandley> and it appears over at specs.o.o http://specs.openstack.org/openstack/api-wg/guidelines/discoverability.html  08:06
*** Hal has joined #openstack-infra08:06
*** bobh has joined #openstack-infra08:06
*** Hal is now known as Guest6230408:07
*** lukebrowning has quit IRC08:07
*** lukebrowning has joined #openstack-infra08:09
<sshnaidm> Did something happen to the cirros image? I have an error: No such file or directory: '/opt/stack/cache/files/cirros-0.3.5-x86_64-disk.img  08:09
*** garyk has joined #openstack-infra08:11
<openstackgerrit> Merged openstack-infra/openstack-zuul-jobs master: Remove legacy-requirements-python34  https://review.openstack.org/508598  08:11
<openstackgerrit> Merged openstack-infra/openstack-zuul-jobs master: Remove old publish-openstack-python-docs templates  https://review.openstack.org/508691  08:11
*** bobh has quit IRC08:11
*** lnxnut has joined #openstack-infra08:11
<garyk> Is there anyone version in the required-project support? I posted https://review.openstack.org/#/c/508779/ and then a patch depending on that and the same issue was hit.  08:12
<openstackgerrit> Sagi Shnaidman proposed openstack-infra/tripleo-ci master: DNM: look for info  https://review.openstack.org/508820  08:12
<frickler> oanson: garyk: I'm seeing the same issue, seems either the dependencies don't work or the fix must be different  08:12
<garyk> frickler: thanks!  08:13
*** lukebrowning has quit IRC08:13
<oanson> garyk, frickler, I see in https://docs.openstack.org/infra/manual/developers.html#limitations-and-caveats that Depends-On isn't supported. I was hoping it was solved in Zuulv3, which is why I asked.  08:13
<garyk> hmm. so we are between a rock and a hard place here - wonder how we can test the update to required-projects?  08:14
<frickler> oanson: it should be solved indeed, working for other patches. maybe the solution is wrong, let me do a different version  08:15
<frickler> tmorin: see https://review.openstack.org/508775 but it doesn't seem to work yet  08:15
*** lnxnut has quit IRC08:15
* tmorin looking  08:16
*** ralonsoh has joined #openstack-infra08:16
<oanson> frickler, sure. Thanks.  08:16
*** namnh has joined #openstack-infra08:16
<tmorin> frickler: ok, got the idea, will now try to apply... thanks!  08:17
<garyk> AJaeger: can you please look at https://review.openstack.org/#/c/508779/ and let us know if there is anything missing?  08:18
*** markvoelker has quit IRC08:18
*** lukebrowning has joined #openstack-infra08:19
*** dizquierdo has joined #openstack-infra08:21
<eumel8> seanhandley: maybe something like this: https://review.openstack.org/#/c/507660/  08:21
*** lukebrowning has quit IRC08:23
*** bauwser is now known as bauzas08:25
*** akscram1 has quit IRC08:25
<seanhandley> thanks eumel8  08:25
<seanhandley> I'll try to make some sense of that  08:25
*** lukebrowning has joined #openstack-infra08:26
*** akscram1 has joined #openstack-infra08:27
*** lukebrowning has quit IRC08:30
*** derekh has joined #openstack-infra08:30
*** lukebrowning has joined #openstack-infra08:32
*** stakeda has joined #openstack-infra08:32
<openstackgerrit> Jens Harbott (frickler) proposed openstack-infra/openstack-zuul-jobs master: Add tox jobs including neutron repo  https://review.openstack.org/508822  08:33
<frickler> garyk: oanson: tmorin: https://docs.openstack.org/infra/manual/zuulv3.html#installation-of-sibling-requirements makes me think that we may need different job names, so I'm trying this now ^^  08:33
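
The approach frickler is trying in 508822 amounts to a project-specific variant of the generic tox job that carries the extra repository, roughly like this (the variant name is hypothetical):

    - job:
        name: openstack-tox-pep8-neutron     # hypothetical variant name
        parent: openstack-tox-pep8
        required-projects:
          - openstack/neutron                # sibling repo made available to the tox env
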
*** lukebrowning has quit IRC08:36
<openstackgerrit> Jens Harbott (frickler) proposed openstack-infra/project-config master: Add neutron to required-projects for neutron-dynamic-routing  https://review.openstack.org/508775  08:38
*** lukebrowning has joined #openstack-infra08:38
<openstackgerrit> Zhiyuan Cai proposed openstack-infra/openstack-zuul-jobs master: Fix nodesets for tricircle multi-region job  https://review.openstack.org/508824  08:40
<openstackgerrit> Merged openstack-infra/zuul-jobs master: Change in to work dir before executing tox  https://review.openstack.org/508783  08:40
<garyk> frickler: i do not think that will address our issues as we need to include a number of different neutron projects. In addition to this I am not sure how you can test that this actually addresses the issues in the dynamic-routing project.  08:40
<openstackgerrit> Thomas Morin proposed openstack-infra/project-config master: Add projects to required-projects for networking-(bagpipe|bgpvpn)  https://review.openstack.org/508825  08:41
<garyk> first the project-config patch will need to be approved, then you can post a patch in dynamic-routing, then rinse, recycle.  08:41
<seanhandley> eumel8: The docs team recommends a word with clarkb  08:41
*** lukebrowning has quit IRC08:43
<tmorin> frickler: can you have a look at https://review.openstack.org/508825 ? (I've seen https://review.openstack.org/508822 but in n8g-bagpipe and n8g-bgpvpn we need other repos than just neutron)  08:43
*** jascott1 has quit IRC08:44
*** jascott1 has joined #openstack-infra08:44
*** lukebrowning has joined #openstack-infra08:45
<eumel8> seanhandley: yes, then you have to wait for the US wake up time, I think :)  08:45
<openstackgerrit> Eyal Leshem proposed openstack-infra/project-config master: Add neutron to required-projects for dragonflow  https://review.openstack.org/508785  08:45
<frickler> garyk: with zuulv3 depends-on should be working for this kind of tests, it did for a couple of other things. except we might be hitting another zuul bug here  08:46
<seanhandley> ok eumel8 - I can wait :)  08:47
*** edmondsw has joined #openstack-infra08:48
<frickler> tmorin: zuul/layout.yaml is/was for zuul v2, you need to look at stuff in zuul.d now. also I'm still testing stuff for neutron-dynamic-routing, might be easier if you wait a moment until that one is working  08:49
*** jascott1 has quit IRC08:49
<garyk> frickler: the depends-on does not seem to be working.  08:49
*** lukebrowning has quit IRC08:49
*** lukebrowning has joined #openstack-infra08:51
<openstackgerrit> Thomas Morin proposed openstack-infra/project-config master: neutron-lib: add openstack/requirements to required projects  https://review.openstack.org/508827  08:51
*** edmondsw has quit IRC08:52
<tmorin> frickler: wops, didn't notice the directory was different...  08:54
* aspiers is at the Gerrit User Summit in London. Any other stackers attending?  08:54
*** lukebrowning has quit IRC08:55
<tmorin> frickler: will modify so that the work progresses in parallel; but understood: I'll wait for your confirmation that the neutron-dynamic-routing thing is fixed before I start bugging you on the other one :)  08:55
*** yamamoto has quit IRC08:56
*** lukebrowning has joined #openstack-infra08:57
<frickler> garyk: seems you are correct about the dependency not working :(  08:57
<openstackgerrit> Thomas Morin proposed openstack-infra/project-config master: neutron-lib: add openstack/requirements to required projects  https://review.openstack.org/508827  08:58
*** bhavik1 has joined #openstack-infra09:00
*** lukebrowning has quit IRC09:02
*** lukebrowning has joined #openstack-infra09:03
<openstackgerrit> Thomas Morin proposed openstack-infra/project-config master: Add projects to required-projects for networking-(bagpipe|bgpvpn)  https://review.openstack.org/508825  09:05
<openstackgerrit> Eyal Leshem proposed openstack-infra/project-config master: Add neutron to required-projects for dragonflow  https://review.openstack.org/508785  09:06
*** bobh has joined #openstack-infra09:07
*** lukebrowning has quit IRC09:08
<frickler> AJaeger: yolanda: could you consider speculatively merging https://review.openstack.org/508775 and https://review.openstack.org/508822 ? seems the speculative testing on https://review.openstack.org/508781 isn't working, I'm still seeing the old jobs being run there  09:08
*** lukebrowning has joined #openstack-infra09:09
<oanson> AJaeger, yolanda: Same for https://review.openstack.org/#/c/508785/ , if possible.  09:10
*** bobh has quit IRC09:11
<openstackgerrit> Thomas Morin proposed openstack-infra/project-config master: neutron-lib: add openstack/requirements to required projects  https://review.openstack.org/508827  09:13
*** lnxnut has joined #openstack-infra09:13
*** witek has joined #openstack-infra09:13
*** lukebrowning has quit IRC09:14
*** markvoelker has joined #openstack-infra09:15
<openstackgerrit> Thomas Morin proposed openstack-infra/project-config master: Add projects to required-projects for networking-(bagpipe|bgpvpn)  https://review.openstack.org/508825  09:17
*** e0ne has joined #openstack-infra09:18
*** lukebrowning has joined #openstack-infra09:20
*** panda|bbl is now known as panda09:23
*** tosky has joined #openstack-infra09:23
*** dizquierdo has quit IRC09:24
*** lukebrowning has quit IRC09:24
*** lukebrowning has joined #openstack-infra09:26
*** sambetts|afk is now known as sambetts09:28
<openstackgerrit> Thomas Morin proposed openstack-infra/project-config master: Add projects to required-projects for networking-(bagpipe|bgpvpn)  https://review.openstack.org/508825  09:30
<openstackgerrit> Sagi Shnaidman proposed openstack-infra/tripleo-ci master: WIP: Fix TripleO CI jobs  https://review.openstack.org/508660  09:30
*** lukebrowning has quit IRC09:31
*** dizquierdo has joined #openstack-infra09:31
*** lukebrowning has joined #openstack-infra09:33
*** stakeda has quit IRC09:35
<openstackgerrit> Sagi Shnaidman proposed openstack-infra/tripleo-ci master: WIP: Fix TripleO CI jobs  https://review.openstack.org/508660  09:36
*** lukebrowning has quit IRC09:37
*** esberglu has joined #openstack-infra09:39
*** lukebrowning has joined #openstack-infra09:39
*** lnxnut has quit IRC09:39
<openstackgerrit> Andrey Kurilin proposed openstack-infra/project-config master: Remove legacy-rally-dsvm-keystone-v2api-rally job  https://review.openstack.org/508833  09:40
*** greghaynes has quit IRC09:40
*** esberglu has quit IRC09:43
*** lukebrowning has quit IRC09:43
*** greghaynes has joined #openstack-infra09:45
*** lukebrowning has joined #openstack-infra09:45
<openstackgerrit> Andrey Kurilin proposed openstack-infra/project-config master: [rally] fix cases when *verify* job should be launched  https://review.openstack.org/508834  09:46
<aspiers> AJaeger: who are our main folks looking after Gerrit?  09:48
*** markvoelker has quit IRC09:48
<openstackgerrit> David Pursehouse proposed openstack-infra/lodgeit master: Removes unnecessary utf-8 encoding  https://review.openstack.org/418748  09:49
*** lukebrowning has quit IRC09:50
*** vsaienk0 has joined #openstack-infra09:50
*** kiennt26 has quit IRC09:51
*** lukebrowning has joined #openstack-infra09:51
<vsaienk0> infra team could you please help to figure out why the timeout for the ironic-grenade job is set to 110 min according to the logs http://logs.openstack.org/83/506983/3/check/legacy-grenade-dsvm-ironic/783379c/job-output.txt.gz#_2017-10-02_07_23_28_887978 while it should be 180 min according to the job definition https://git.openstack.org/cgit/openstack-infra/openstack-zuul-jobs/tree/zuul.d/zuul-legacy-jobs.yaml?h=refs/heads/master#n2791  09:53
*** egonzalez has quit IRC09:55
<tosky> uhm uhm, was the issue related to jobs running on trusty instead of xenial fixed?  09:56
*** lukebrowning has quit IRC09:56
*** yamamoto has joined #openstack-infra09:56
<frickler> tosky: should be fixed, yes  09:57
*** lukebrowning has joined #openstack-infra09:58
<tosky> frickler: thanks, then I can safely recheck :)  09:58
*** lukebrowning has quit IRC10:02
*** yamamoto has quit IRC10:02
<openstackgerrit> Julie Pichon proposed openstack-infra/openstack-zuul-jobs master: Add required-projects for Cliff legacy tox jobs  https://review.openstack.org/508837  10:06
*** lnxnut has joined #openstack-infra10:07
*** bobh has joined #openstack-infra10:08
*** shu-mutou is now known as shu-mutou-AWAY10:08
*** cshastri has quit IRC10:08
*** lukebrowning has joined #openstack-infra10:09
*** lnxnut has quit IRC10:11
*** bobh has quit IRC10:13
*** lukebrowning has quit IRC10:13
*** cuongnv has quit IRC10:13
<vsaienk0> clarkb: could you please help to figure out why the job timeout is not applied to the ironic-grenade job ^  10:14
<openstackgerrit> Merged openstack-infra/openstack-zuul-jobs master: Add multinode integration jobs and integration tests for known_hosts  https://review.openstack.org/504787  10:14
*** lukebrowning has joined #openstack-infra10:15
<frickler> vsaienk0: clarkb: maybe zuul sets BUILD_TIMEOUT but doesn't export it, so it isn't set when "./safe-devstack-vm-gate-wrap.sh" is executed?  10:17
*** dtantsur|afk is now known as dtantsur10:18
<vsaienk0> frickler: it might be, as according to the job BUILD_TIMEOUT was set to 120 min which is the default  10:18
*** lukebrowning has quit IRC10:19
<frickler> vsaienk0: the zuul/timeout variable seems to be correctly set here http://logs.openstack.org/83/506983/3/check/legacy-grenade-dsvm-ironic/783379c/zuul-info/inventory.yaml  10:20
<openstackgerrit> Merged openstack-infra/project-config master: Remove legacy-rpm-packaging-tox-lint  https://review.openstack.org/508610  10:20
<openstackgerrit> Merged openstack-infra/openstack-zuul-jobs master: Add integration tests for multi-node-hosts-file  https://review.openstack.org/505789  10:20
<vsaienk0> frickler: but I don't see where BUILD_TIMEOUT is set  10:21
*** ociuhandu has joined #openstack-infra10:21
*** lukebrowning has joined #openstack-infra10:21
<frickler> vsaienk0: in zuul/ansible/filter/zuul_filters.py L30  10:22
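
frickler's hypothesis, sketched very roughly and not taken from the actual playbook (units and the exact variable plumbing glossed over): the wrapper only honours the job timeout if the value derived from zuul.timeout reaches the environment of safe-devstack-vm-gate-wrap.sh; if it stays a plain shell variable, the wrapper falls back to its 120 minute default.

    - shell:
        cmd: |
          set -e
          set -x
          # hypothetical: without the export the wrapper never sees the value
          export BUILD_TIMEOUT={{ zuul.timeout }}
          ./safe-devstack-vm-gate-wrap.sh
        chdir: '{{ ansible_user_dir }}/workspace'
      environment: '{{ zuul | zuul_legacy_vars }}'
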
*** namnh has quit IRC10:23
<openstackgerrit> Merged openstack-infra/project-config master: Remove legacy shade jobs from os-client-config  https://review.openstack.org/508773  10:25
<openstackgerrit> Merged openstack-infra/openstack-zuul-jobs master: Fix Rally jobs: Add dib-utils to required projects  https://review.openstack.org/508799  10:25
*** lukebrowning has quit IRC10:25
*** lukebrowning has joined #openstack-infra10:27
<openstackgerrit> Jens Harbott (frickler) proposed openstack-infra/openstack-zuul-jobs master: Really only copy logs dir for chef-rake-integration  https://review.openstack.org/508841  10:29
*** jkilpatr has quit IRC10:29
*** lukebrowning has quit IRC10:32
*** lukebrowning has joined #openstack-infra10:33
*** psachin has quit IRC10:34
<yolanda> AJaeger, question... look at https://review.openstack.org/#/c/506138/ . See that legacy-bifrost-integration-tinyipa-opensuse-423 is being triggered. But according to the zuul.d/projects.yaml it should not even be triggered: http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/projects.yaml#n5111  10:35
<yolanda> are we missing something?  10:35
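
For context, the branch filter yolanda is referring to in projects.yaml looks roughly like this (regex and entries abbreviated); the expectation is that the job does not run for the listed stable branches:

    - project:
        name: openstack/bifrost
        check:
          jobs:
            - legacy-bifrost-integration-tinyipa-opensuse-423:
                branches: ^(?!stable/(newton|ocata)).*$
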
*** edmondsw has joined #openstack-infra10:36
*** lukebrowning has quit IRC10:38
*** bhavik1 has quit IRC10:39
*** lukebrowning has joined #openstack-infra10:40
*** edmondsw has quit IRC10:40
*** vsaienk0 has quit IRC10:44
*** lukebrowning has quit IRC10:44
*** vsaienk0 has joined #openstack-infra10:45
*** markvoelker has joined #openstack-infra10:45
*** jascott1 has joined #openstack-infra10:46
*** lukebrowning has joined #openstack-infra10:46
*** lukebrowning has quit IRC10:50
*** pbourke has quit IRC10:50
*** adisky has quit IRC10:51
*** lukebrowning has joined #openstack-infra10:52
*** pbourke has joined #openstack-infra10:52
*** slaweq has quit IRC10:54
<Shrews> yolanda: that job is also listed in 'check'  10:54
*** slaweq has joined #openstack-infra10:54
<openstackgerrit> Eyal Leshem proposed openstack-infra/project-config master: Add neutron to required-projects for dragonflow  https://review.openstack.org/508785  10:55
<openstackgerrit> Merged openstack-infra/openstack-zuul-jobs master: Remove rpm-packaging-tox-lint legacy job  https://review.openstack.org/508609  10:57
*** lukebrowning has quit IRC10:57
<yolanda> Shrews, but also with stable/newton|ocata there  10:58
<yolanda> to block it from running for those branches  10:58
<yolanda> what do i miss  10:58
<yolanda> ?  10:58
*** lukebrowning has joined #openstack-infra10:58
*** yamamoto has joined #openstack-infra10:58
*** slaweq has quit IRC10:59
<Shrews> yolanda: oh yeah, i missed the branch. looks like a bug in branch matching. jeblair: mordred: ^^^  11:01
<AJaeger> yolanda: looking at http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/projects.yaml#n5138 - that's the check definition - something is indeed wrong  11:01
*** slaweq has joined #openstack-infra11:01
<AJaeger> Yeah, as Shrews said  11:01
*** nicolasbock_ has joined #openstack-infra11:01
<yolanda> ok, shall i report a bug? or is just the mention here ok?  11:02
<AJaeger> yolanda: best to ask jeblair later what to do...  11:02
<AJaeger> yolanda: https://review.openstack.org/#/c/508689 and https://review.openstack.org/#/c/508697 are ready for review if time permits  11:02
<openstackgerrit> Andrey Kurilin proposed openstack-infra/openstack-zuul-jobs master: Remove legacy-rally-dsvm-keystone-v2api-rally definition  https://review.openstack.org/508850  11:03
*** lukebrowning has quit IRC11:03
*** yamamoto has quit IRC11:04
<openstackgerrit> Frank Kloeker proposed openstack-infra/puppet-zanata master: Preparation for Zanata 4 version  https://review.openstack.org/506795  11:04
*** sdague has joined #openstack-infra11:05
*** lukebrowning has joined #openstack-infra11:05
*** slaweq has quit IRC11:05
*** jkilpatr has joined #openstack-infra11:07
*** slaweq has joined #openstack-infra11:07
*** dave-mccowan has joined #openstack-infra11:07
*** lnxnut has joined #openstack-infra11:08
*** bobh has joined #openstack-infra11:09
*** lukebrowning has quit IRC11:09
*** nicolasbock_ has quit IRC11:10
*** bcafarel has joined #openstack-infra11:11
*** slaweq has quit IRC11:11
*** edmondsw has joined #openstack-infra11:13
<andreykurilin> AJaeger: hi! should we enable new jobs in the local repo or should it still be done in project-config?  11:13
*** bobh has quit IRC11:13
<openstackgerrit> Javier Peña proposed openstack-infra/project-config master: Remove legacy Packstack integration jobs  https://review.openstack.org/508851  11:13
*** edmondsw has quit IRC11:14
*** lukebrowning has joined #openstack-infra11:15
*** yamamoto has joined #openstack-infra11:17
*** yamamoto has quit IRC11:18
*** markvoelker has quit IRC11:19
*** lucasagomes is now known as lucas-hungry11:19
*** alexchadin has joined #openstack-infra11:19
*** lukebrowning has quit IRC11:20
<garyk> AJaeger: is there a timeout for the unit tests? ours take about 40 minutes and we are getting a timeout at 35 minutes? Any way of tweaking this?  11:20
<AJaeger> andreykurilin: I suggest to first consolidate the existing jobs before tackling new ones ;) The infra manual explains how to do that - once you moved them over to your repo, new jobs should go to your repo...  11:20
<AJaeger> garyk: you should be able to override this - similar to how some changes are in the queue for adding required-repositories  11:21
<garyk> AJaeger: do you have an example by chance?  11:21
*** lukebrowning has joined #openstack-infra11:21
*** bobh has joined #openstack-infra11:22
*** bobh has quit IRC11:23
*** nicolasbock_ has joined #openstack-infra11:23
<AJaeger> garyk: similar to what you do for vmware-nsxlib, just add timeout.  11:23
<AJaeger> garyk: this is new for all of us ;)  11:23
<AJaeger> garyk: so, don't have a working example for this specific case  11:23
<garyk> AJaeger: gracias!  11:24
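
What AJaeger suggests, sketched: the project adds a timeout (in seconds) on its job variant, much like adding required-projects (job name and value here are illustrative):

    - job:
        name: openstack-tox-py27-vmware-nsxlib   # hypothetical variant for the slow unit tests
        parent: openstack-tox-py27
        timeout: 3600                            # raise the limit to 60 minutes
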
*** egonzalez has joined #openstack-infra11:24
*** mrunge has quit IRC11:24
*** mrunge has joined #openstack-infra11:24
*** dhajare has joined #openstack-infra11:25
<frickler> garyk: do you have an example of a job run with a timeout? there seem to be issues that configured timeouts sometimes do not get applied correctly, see my conversation with vsaienk0 earlier  11:26
<garyk> frickler: https://review.openstack.org/508840 and https://review.openstack.org/508809  11:26
<openstackgerrit> Merged openstack-infra/project-config master: Remove old publish-openstack-python-docs jobs  https://review.openstack.org/508689  11:26
*** lukebrowning has quit IRC11:26
*** yamamoto has joined #openstack-infra11:28
*** lukebrowning has joined #openstack-infra11:28
*** bobh has joined #openstack-infra11:28
* AJaeger runs some errands, will be back later  11:28
*** kjackal_ has joined #openstack-infra11:29
<openstackgerrit> Merged openstack-infra/project-config master: Fix typo in comment  https://review.openstack.org/508697  11:32
*** lukebrowning has quit IRC11:32
<openstackgerrit> Andrey Kurilin proposed openstack-infra/openstack-zuul-jobs master: Fix typo in descr of build-openstack-sphinx-docs  https://review.openstack.org/508854  11:32
<sshnaidm> cat /etc/nodepool/sub_nodes_private - 15.184.65.108, it doesn't seem like a private address..  11:32
*** ociuhandu has quit IRC11:33
*** ociuhandu has joined #openstack-infra11:33
*** lukebrowning has joined #openstack-infra11:34
*** ijw has joined #openstack-infra11:34
*** lnxnut has quit IRC11:36
<openstackgerrit> Javier Peña proposed openstack-infra/openstack-zuul-jobs master: Remove legacy Packstack jobs  https://review.openstack.org/508855  11:38
*** lukebrowning has quit IRC11:38
<eumel8> what does that mean: ERROR! /home/zuul/src/git.openstack.org/openstack-infra/project-config not found  11:38
*** ijw has quit IRC11:39
<eumel8> in legacy-openstackci-beaker-ubuntu-trusty job  11:39
*** jamesdenton has quit IRC11:39
*** lukebrowning has joined #openstack-infra11:40
<openstackgerrit> Jens Harbott (frickler) proposed openstack-infra/openstack-zuul-jobs master: Really only copy logs dir for chef-rake-integration  https://review.openstack.org/508841  11:41
*** yamamoto has quit IRC11:43
<frickler> eumel8: it means that you need to add the missing project to your job definition, see https://review.openstack.org/508799 for an example  11:44
*** lukebrowning has quit IRC11:45
*** yamamoto has joined #openstack-infra11:45
*** lukebrowning has joined #openstack-infra11:46
<eumel8> frickler: so the legacy jobs will also run in the future? or is there a plan to migrate this to normal jobs?  11:47
<eumel8> I'm affected since this morning in https://review.openstack.org/#/c/506795/  11:49
<openstackgerrit> Javier Peña proposed openstack-infra/project-config master: Remove legacy Packstack integration jobs  https://review.openstack.org/508851  11:50
<openstackgerrit> Javier Peña proposed openstack-infra/openstack-zuul-jobs master: Remove legacy Packstack jobs  https://review.openstack.org/508855  11:51
*** lukebrowning has quit IRC11:51
*** mat128 has joined #openstack-infra11:52
*** slaweq has joined #openstack-infra11:52
<frickler> eumel8: you may want to read https://docs.openstack.org/infra/manual/zuulv3.html for the big picture  11:53
*** lukebrowning has joined #openstack-infra11:53
<openstackgerrit> Eyal Leshem proposed openstack-infra/openstack-zuul-jobs master: Add neutron to dragonflow requiered-project  https://review.openstack.org/508856  11:55
*** lukebrowning has quit IRC11:57
*** jamesdenton has joined #openstack-infra11:58
*** dprince has joined #openstack-infra11:59
<eumel8> frickler: thx for the link. First I'm on a puppet module change for I18n and now I have a Zuul-v3 migration at the neck :)  11:59
*** lukebrowning has joined #openstack-infra11:59
*** pblaho has joined #openstack-infra11:59
*** yamamoto has quit IRC11:59
*** tpsilva has joined #openstack-infra12:00
<openstackgerrit> Eyal Leshem proposed openstack-infra/project-config master: Add neutron to required-projects for dragonflow  https://review.openstack.org/508785  12:01
*** ldnunes has joined #openstack-infra12:02
*** yamamoto has joined #openstack-infra12:02
*** tmorin has quit IRC12:02
*** jamesdenton has quit IRC12:02
*** lnxnut has joined #openstack-infra12:03
*** lukebrowning has quit IRC12:04
<leyal> AJaeger - is that the right place to insert the required project of a specific job - https://review.openstack.org/#/c/508856/1 ?  12:04
*** lukebrowning has joined #openstack-infra12:05
*** alexchadin has quit IRC12:05
*** jamesdenton has joined #openstack-infra12:06
*** markvoelker has joined #openstack-infra12:06
*** alexchadin has joined #openstack-infra12:06
<openstackgerrit> Frank Kloeker proposed openstack-infra/openstack-zuul-jobs master: Add missing repo to legacy-openstackci-beaker jobs  https://review.openstack.org/508857  12:08
*** sshnaidm is now known as sshnaidm|afk12:08
<eumel8> frickler: maybe it helps ^^  12:08
*** edmondsw has joined #openstack-infra12:09
*** lukebrowning has quit IRC12:10
*** lucas-hungry is now known as lucasagomes12:10
*** lukebrowning has joined #openstack-infra12:11
*** trown|outtypewww is now known as trown12:14
*** lukebrowning has quit IRC12:16
*** claudiub has joined #openstack-infra12:17
*** sshnaidm|afk is now known as sshnaidm12:20
*** mkostrzewa has joined #openstack-infra12:21
*** lnxnut has quit IRC12:21
*** lukebrowning has joined #openstack-infra12:21
*** tmorin has joined #openstack-infra12:22
<mkostrzewa> Hi guys! I've heard that there is some work going on for Nodepool Windows support?  12:23
<openstackgerrit> Frank Kloeker proposed openstack-infra/openstack-zuul-jobs master: Add missing repo to legacy-openstackci-beaker jobs  https://review.openstack.org/508857  12:24
<Shrews> infra-root: / on nodepool.o.o is full again  12:24
*** aviau has quit IRC12:24
*** aviau has joined #openstack-infra12:24
*** jpena is now known as jpena|lunch12:24
<openstackgerrit> Eyal Leshem proposed openstack-infra/project-config master: Add neutron to required-projects for dragonflow  https://review.openstack.org/508785  12:25
*** lukebrowning has quit IRC12:26
*** yamamoto has quit IRC12:26
<fungi> Shrews: happen to remember what the command was to clean up the excess zk snapshots?  12:27
<Shrews> fungi: https://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_administering  12:27
<Shrews> fungi: i'm not sure about the values we should use though  12:27
<Shrews> reading now  12:27
<tobiash> mkostrzewa: yes, https://review.openstack.org/#/q/status:open++topic:windows-support  12:28
<fungi> i think we determined that we only need to keep at most a couple since the data is unimportant/ephemeral  12:28
*** lukebrowning has joined #openstack-infra12:28
<tobiash> mkostrzewa: not complete yet, configuring client keys on the executor is missing (but hello world worked already with hard coded keys hacked in)  12:29
<Shrews> fungi: yeah. anything more than 2 and less than the current 12000 is probably good  12:29
<Shrews> fungi: /var/lib/zookeeper/version-2$ ls -l snap* | wc -l  12:30
<Shrews> 12004  12:30
<Shrews> fungi: i'll let you have the honors  :)  12:30
<fungi> i'm still pulling up the documentation  12:30
*** jaypipes has joined #openstack-infra12:30
<Shrews> i'm not even sure where the zk jar file lives  12:31
*** lukebrowning has quit IRC12:32
<Shrews> fungi: there is a /usr/share/zookeeper/bin/zkCleanup.sh, but i'm not sure what that does  12:32
* Shrews googles  12:33
<Shrews> fungi: https://community.hortonworks.com/content/supportkb/48795/how-to-purge-old-zookeeper-directory-files.html  12:33
*** lukebrowning has joined #openstack-infra12:34
<Shrews> that script does use the PurgeTxnLog command  12:35
<openstackgerrit> Sagi Shnaidman proposed openstack-infra/tripleo-ci master: WIP: Fix TripleO CI jobs  https://review.openstack.org/508660  12:35
<Shrews> i wonder if snapshot dir and log dir are the same (given that it seems those files exist in the same directory)  12:35
*** ijw has joined #openstack-infra12:36
*** rtjure has quit IRC12:36
*** bobh has quit IRC12:36
<fungi> maybe. the paths to the jarfiles seem to be present in the command line for the zk daemon as it appears in the ps output  12:36
<Shrews> # the directory where the snapshot is stored.  12:36
<Shrews> dataDir=/var/lib/zookeeper  12:36
<Shrews> # Place the dataLogDir to a separate physical disc for better performance  12:37
<Shrews> # dataLogDir=/disk2/zookeeper  12:37
<Shrews> from /etc/zookeeper/conf/zoo.cfg  12:37
<Shrews> ^^^  12:37
<fungi> i guess i need to sub in the path to the log4j jar and conffile from the example config as well  12:37
<Shrews> so maybe: zkCleanup.sh /var/lib/zookeeper /var/lib/zookeeper 3  12:37
<Shrews> fungi: i think the .sh script takes care of it all for you  12:38
<openstackgerrit> Sagi Shnaidman proposed openstack-infra/tripleo-ci master: WIP: Fix TripleO CI jobs  https://review.openstack.org/508660  12:38
*** lukebrowning has quit IRC12:38
*** rtjure has joined #openstack-infra12:38
*** rlandy has joined #openstack-infra12:39
<Shrews> fungi: it uses zkEnv.sh to grab the proper paths, it seems  12:39
<fungi> Shrews: oh, i'll pull up that url next  12:39
<fungi> i'm not actually at my computer so this is taking way too long. just a sec i'll go downstairs  12:40
*** lukebrowning has joined #openstack-infra12:40
*** ijw has quit IRC12:40
<andreaf> mordred, jeblair, clarkb: I guess this may be due to remote not being defined in zuulv3 cloned repos? I think this specific issue is not on the known issues list? http://logs.openstack.org/32/504232/4/gate/legacy-requirements/b740a6a/job-output.txt.gz#_2017-10-02_10_46_14_454791 but it probably affects more than tempest?  12:40
<mnaser> afaik there is no origin with how zuulv3 clones things  12:42
*** hemna_ has joined #openstack-infra12:42
andreafmnaser: yeah that's my understanding too12:43
*** hemna_ has quit IRC12:44
fungi#status log ran `sudo -u zookeeper ./zkCleanup.sh /var/lib/zookeeper 3` in /usr/share/zookeeper/bin on nodepool.openstack.org to free up 22gib of space for its / filesystem12:44
openstackstatusfungi: finished logging12:45
fungiinfra-root: ^12:45
*** lukebrowning has quit IRC12:45
*** kgiusti has joined #openstack-infra12:45
Shrews44% usage. now to see if the nodepool launchers recover12:45
*** dhajare has quit IRC12:46
*** lukebrowning has joined #openstack-infra12:46
funginow we only have 3 snapshot files in /var/lib/zookeeper/version-212:47
fungiwhich is what the outcome was supposed to be, so i think that worked12:47
Shrewsyeah, but the launchers don't seem to be recovering. may have to manually restart them12:48
*** mat128 has quit IRC12:48
mkostrzewatobiash: thx for the link; I can see you use Ansible for Windows? Were you thinking about using DSC anywhere?12:49
tobiashmkostrzewa: dsc?12:50
tobiashmkostrzewa: yes, ansible works for windows12:50
Shrews#status log Restarted nodepool-launcher on nl01 and nl02 to fix zookeeper connection12:50
openstackstatusShrews: finished logging12:50
mkostrzewaDesired State Configuration, Microsoft's Ansible equivalent12:50
*** lukebrowning has quit IRC12:51
tobiashmkostrzewa: well, my home is not windows ;)12:51
Shrewsfungi: it seems to be a bug in the launchers where the suspended ZK connection never revives. i'll have to work on debugging that.12:51
Shrewsinfra-root: fyi ^^^12:52
mkostrzewatobiash: sure. I was wondering because we had lots of little issues with Ansible for Windows, and the module support just wasn't there...12:52
tobiashmkostrzewa: zuul3 is based entirely on ansible so dsc would probably not be supported (other than running ansible to kick dsc)12:52
*** lukebrowning has joined #openstack-infra12:53
Shrewsfungi: i think we need that cleanup command added to nodepool.o.o cron12:53
mkostrzewatobiash: do you install any Windows Features / Roles via Ansible?12:53
tobiashmkostrzewa: no, I just did a hello world job so far to prove that it is possible to integrate windows in zuulv312:55
*** rhallisey has joined #openstack-infra12:55
pabelangermorning!12:55
pabelangerready to get to work12:55
pabelangerwhat should I be looking at?12:55
fungiShrews: the documentation suggested it could be added to the zk config to have it automagically clean up old snapshots when it creates new ones12:56
*** jcoufal has joined #openstack-infra12:56
fungiinclude the following in zoo.cfg and restart the zookeeper servers to automatically purge older files:12:57
fungiautopurge.snapRetainCount=3 [number of logs to retain, in this example 3]12:57
fungiautopurge.purgeInterval= 168 [in hours, in this example 1 week]12:57
Shrewsfungi: whatever works!  :)12:57
fungihttps://community.hortonworks.com/content/supportkb/48795/how-to-purge-old-zookeeper-directory-files.html12:57
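Putting fungi's two settings together, the relevant zoo.cfg stanza would look roughly like this (values exactly as quoted above; snapRetainCount is the number of recent snapshots, plus their transaction logs, to keep, and purgeInterval is in hours):

    # /etc/zookeeper/conf/zoo.cfg
    autopurge.snapRetainCount=3
    autopurge.purgeInterval=168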
*** lukebrowning has quit IRC12:57
fungianybody who's less confused by ansible able to take a guess as to what this job did? http://logs.openstack.org/22/2293d561bf37b71b14a7b89e2ada1a5552fc2168/release-post/tag-releases/7a41b08/12:58
pabelangerlooking12:58
*** alexchadin has quit IRC12:58
fungii can't even tell whether it succeeded or failed12:59
sshnaidmis there any reason not to see the patch https://review.openstack.org/#/c/508660/ on the zuul status page? http://zuulv3.openstack.org/12:59
*** lukebrowning has joined #openstack-infra12:59
*** links has quit IRC12:59
sshnaidmpabelanger, please take a look at my mail in your time12:59
pabelangerfungi: that job didn't have a run playbook, so only did pre-run and post-run.  Why is another question. Looking into job layout now12:59
fungisshnaidm: "Queue lengths: 532 events, 202 results"12:59
mkostrzewatobiash: OK.. we had lots of little issues, like Powershell printing something to stderr causing Ansible to stop. Stuff like that...13:00
pabelangersshnaidm: yup, will take me a bit to catch up13:00
mkostrzewatobiash: thx for help anyways13:00
sshnaidmfungi, so it's just a queue?13:00
*** tosky_ has joined #openstack-infra13:01
fungisshnaidm: if you just pushed the change in the past few minutes, zuul probably hasn't processed that trigger event yet13:01
*** tosky has quit IRC13:01
*** tosky_ is now known as tosky13:01
sshnaidmfungi, ok, will wait then, thanks13:01
*** thorst has joined #openstack-infra13:02
fungizuul v3 is a lot slower processing events at this scale than v2 was, so there's still some debugging and profiling going on to attempt to speed it up13:02
fungilooks like the next reconfiguration event will likely push the server back over into swap space again13:03
*** mriedem has joined #openstack-infra13:03
*** scottda_ has joined #openstack-infra13:03
*** lukebrowning has quit IRC13:03
*** hemna_ has joined #openstack-infra13:04
fricklerfungi: jeblair commented earlier about memory debugging and a possible restart being needed13:04
fungiyep, i read all the scrollback13:04
*** lukebrowning has joined #openstack-infra13:05
fungithough i'm still not entirely awake yet. need to get back to morning routine stuff and pour some coffee13:05
openstackgerritPaul Belanger proposed openstack-infra/project-config master: Fix typo in tag-release pre playbook  https://review.openstack.org/50887113:06
*** tmorin has quit IRC13:06
pabelangerfungi: there was an error in the log you posted, but it didn't bubble up to logs properly.  If you look on ze08.o.o, you'll see ^ is the reason for the failure.13:06
fungipabelanger: thanks!13:07
*** lbragstad has joined #openstack-infra13:07
Shrewsfungi: oh, you know what? i think maybe i just didn't give the np launchers enough time. zuul uses almost the exact same zk code and it only just now re-established the zk connection.13:07
andreafit looks like this http://git.openstack.org/cgit/openstack-infra/project-config/tree/jenkins/scripts/project-requirements-change.py#n205 will never work with a zuul cloned repo - I suppose we should change that to checking out local branches?13:08
fungiahh13:08
fungiShrews: so maybe it's okay in that case?13:08
Shrewsfungi: i think so. i'm going to poke around and see if maybe zuul uses different timeout settings (i think these were changed recently)13:09
Shrewsfor zk, that is13:09
*** lukebrowning has quit IRC13:09
pabelangerandreaf: ya, all of the jenkins/scripts are going to need to be converted into ansible playbooks / roles. So, it is possible things won't work, like you found, in zuulv3.13:09
*** esberglu has joined #openstack-infra13:09
*** mkostrzewa has quit IRC13:09
fungiandreaf: ahh, the `git checkout remotes/origin/...` definitely won't work, correct13:10
mnaseris it possible zuul might need a restart as well re: that zk issue?13:10
pabelangerandreaf: I'm not sure if we have discussed a plan recently, but what I was doing was move the bash script into an ansible role, then start refactoring it from bash to ansible over a series of commits13:10
mnasernothing added to queue in the past hour, 0 cpu usage13:10
*** tmorin has joined #openstack-infra13:10
dmsimardmnaser: yeah Shrews found a disk full issue again13:10
andreaffungi, pabelanger: before we ansiblelize that, can we just drop the remote/origin bit out of it? I'm happy to make a patch for that13:11
fungiandreaf: but those branches should be pushed to the server for you with the target branch already checked out, so you should be able to just omit that entirely13:11
*** erlon has joined #openstack-infra13:11
pabelangermnaser: I know jeblair is collecting memory stats, which affects performance13:11
*** bobh_ has joined #openstack-infra13:11
Shrewsmnaser: zuul should be doing things now13:11
pabelangeruntil jeblair comes online, I don't think we should restart anything on zuulv3.o.o13:11
*** lukebrowning has joined #openstack-infra13:11
mnaserdmsimard i think nodepool was restarted to reconnect but zuul (maybe?) was not13:12
Shrewszuul only came back to life 5 minutes ago13:12
mnasercause cacti shows the server pretty much idle13:12
mnaserah okay13:12
mnasermaybe that doesn't show it yet then13:12
*** isaacb has joined #openstack-infra13:13
andreaffungi: heh ok - I was not sure if I could assume that13:14
andreaffungi: I guess not having the remote set is by design :)13:14
*** yamamoto has joined #openstack-infra13:15
toskyso... should we wait for zuul to notice the jobs that were not triggered so far, or should we wait a bit and retrigger?13:15
*** lukebrowning has quit IRC13:15
fungiandreaf: yes, the repositories are no longer pulled on the nodes, they're pushed to the nodes, so there is no git remote url involved13:15
pabelangertosky: best to wait for a bit13:16
toskypabelanger: ack13:16
*** thorst has quit IRC13:17
fungiandreaf: and dependency changes are merged in sequence to their respective target branches in the pushed copy of the repo, with the target branch for the change which triggered the job checked out in the worktree, so in most cases you should just be able to cd into it and use it, or explicitly checkout a local branch name if you need to work on a different branch than the one which triggered it13:17
*** alexchadin has joined #openstack-infra13:18
*** lnxnut has joined #openstack-infra13:19
pabelangerinfra-root: I'm going to see why graphite.o.o is no longer collecting statsd info13:19
openstackgerritAndrea Frittoli proposed openstack-infra/project-config master: Do not checked from remote branch  https://review.openstack.org/50887513:20
andreaffungi: ^^^13:20
andreaffungi: as far as I understood the logic there, we need to temporarily switch to the branch (from HEAD), so I think using the local branch is the right approach there13:21
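The shape of the change being discussed, sketched as the underlying git commands (the real fix lives in the Python helper script linked above; <branch> is a placeholder):

    # before: assumes a configured remote, which zuul v3 workspaces do not have
    git checkout remotes/origin/<branch>
    # after: zuul pushes the needed branches into the workspace as local branches
    git checkout <branch>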
openstackgerritPaul Belanger proposed openstack-infra/system-config master: Add zuulv3.o.o to graphite.o.o  https://review.openstack.org/50887613:21
*** yamamoto has quit IRC13:21
andreaffungi: I guess that fix will require a new nodepool image if/when it merges?13:21
*** lukebrowning has joined #openstack-infra13:22
pabelangerinfra-root: okay, I reloaded firewall on graphite.o.o, that has done something. I've also noticed zuulv3.o.o missing^13:22
*** jcoufal_ has joined #openstack-infra13:22
*** jcoufal has quit IRC13:24
*** Guest62304 has quit IRC13:24
*** jpena|lunch is now known as jpena13:24
fungiandreaf: oh, yes that will certainly be a slow one to iterate on if we continue trying to fix up that legacy job while it's relying on scripts baked into the images13:24
*** eharney has joined #openstack-infra13:25
*** Hal has joined #openstack-infra13:25
*** Hal is now known as Guest1059413:26
*** lukebrowning has quit IRC13:26
andreaffungi: you mean we should have an ansible role that contains that script and runs it, and migrate the legacy-requirements job to a new zuulv3 native job that uses it?13:28
andreaffungi: that sounds like some work too :)13:28
fungiandreaf: i think that's what was done elsewhere, but i don't know for sure13:28
*** lukebrowning has joined #openstack-infra13:28
*** thorst has joined #openstack-infra13:29
*** lukebrowning has quit IRC13:32
*** baoli has joined #openstack-infra13:32
*** martinkopec has quit IRC13:32
*** jcoufal has joined #openstack-infra13:32
*** jcoufal_ has quit IRC13:33
*** baoli_ has joined #openstack-infra13:33
esbergluAnyone know how the readthedocs pages are generated? I think we have something misconfigured for nova-powervm. The stable branches docs aren't showing up since newton13:33
esbergluhttps://readthedocs.org/projects/nova-powervm/versions/13:33
mriedemcan someone link me to the reviewday rendered hosted page? i always seem to lose it - unless it doesn't exist anymore, and i'm thinking of the bug smash page13:34
*** lukebrowning has joined #openstack-infra13:34
*** Goneri has joined #openstack-infra13:34
*** alex_xu has quit IRC13:35
*** ijw has joined #openstack-infra13:36
*** baoli has quit IRC13:37
openstackgerritSagi Shnaidman proposed openstack-infra/tripleo-ci master: WIP: Fix TripleO CI jobs  https://review.openstack.org/50866013:37
fungimriedem: http://status.openstack.org/reviews/13:38
edmondswesberglu they are showing up, but inactive...13:38
*** baoli_ has quit IRC13:38
*** ihrachys has joined #openstack-infra13:38
*** alex_xu has joined #openstack-infra13:38
edmondswdoes anyone know if there's a way to mark them active automatically, or does that have to be done manually?13:38
*** Anticimex has quit IRC13:40
mriedemfungi: thanks, but i thought we had something that looked like this, but for reviews http://status.openstack.org/bugday/13:40
*** ijw has quit IRC13:41
*** Anticimex has joined #openstack-infra13:41
fungimriedem: not that i remember, though i think sdague maybe did some custom pages for something along those lines at one point? i'm really not sure13:42
mriedemok np13:43
fungilike maybe some nova-specific burndown charts for certain sprint-like events13:43
sdaguefungi: yeh, that was a bit different13:43
mriedemwe've had those burndown pages for some other things13:43
sdaguemriedem: what kind of view do you want?13:43
openstackgerritAnastasia Kravets proposed openstack-infra/project-config master: add ec2-api to unified doc build jobs  https://review.openstack.org/50888013:44
mriedemi'll see if https://github.com/openstack-infra/reviewday.git generates it13:44
*** rloo has joined #openstack-infra13:44
*** lukebrowning has quit IRC13:45
*** thorst has quit IRC13:45
*** lnxnut has quit IRC13:45
*** sileht has joined #openstack-infra13:46
sdaguethat's a big rails app13:47
sshnaidmfungi, maybe it's a problem with the zuul status page, but it seems like something is stuck there.. patch 492224,3 finished but still appears there an hour later, and I don't see any new patches arrive there13:47
sdagueoh, maybe it changed up13:47
mordredandreaf, fungi: yes - we should definitely shift that script to being in a new non-legacy role so that it's not baked in to images anymore13:47
edmondswanother issue with readthedocs is that http://nova-powervm.readthedocs.io redirects to latest, so how is someone supposed to determine what other branches are available?13:47
*** thorst has joined #openstack-infra13:47
*** thorst has quit IRC13:48
openstackgerritVasyl Saienko proposed openstack-infra/openstack-zuul-jobs master: Set BUILD_TIMEOUT for ironic grenade jobs  https://review.openstack.org/50888213:48
dmsimardI'm a bit confused. Is there a directory on the *node* that is uploaded to the executor by default before logs are uploaded to the logserver ? Say, "{{ ansible_user_dir }}/logs" for example ? I'm trying to find if there's one but I can't ?13:48
*** thorst has joined #openstack-infra13:48
dmsimardIt seems what ends up being uploaded to logs is explicitly uploaded13:48
*** wolverineav has joined #openstack-infra13:49
dmsimards/explicitly uploaded/explicitly uploaded to the executor/13:49
mordreddmsimard: there is a directory on the *executor* that is automatically uploaded to logs.o.o13:49
*** lukebrowning has joined #openstack-infra13:49
mordreddmsimard: things are (currently) explicitly placed in the logs dir on the executor13:49
dmsimardmordred: right, but nothing is uploaded to the executor by default ?13:49
*** yamamoto has joined #openstack-infra13:50
*** yamamoto_ has joined #openstack-infra13:50
*** garyk has quit IRC13:52
mordreddmsimard: that is correct13:52
mordreddmsimard: we've had a few discussions about changing that - but haven't had the brainspace yet to do so13:52
andreafmordred: does anything exist yet for a requirements job in zuulv3 native format? Is there a WIP on that?13:53
*** bnemec has joined #openstack-infra13:53
dmsimardmordred: right, I was thinking about that just now. I guess "feature parity" with v2 includes uploading "{{ ansible_user_dir }}/logs" by default but anyway it's not really important for the time being.13:53
*** lukebrowning has quit IRC13:53
dmsimardAJaeger: fyi ^ this was me validating the comment from https://review.openstack.org/#/c/508434/15/playbooks/upload-logs.yaml13:54
*** yamamoto has quit IRC13:54
pabelangermordred: dmsimard: Ya, I think originally I didn't like the idea of auto upload back to executor, but after PTG, I might have reconsidered it. Would make writing jobs a little easier13:54
openstackgerritAlfredo Moralejo proposed openstack-infra/tripleo-ci master: Use infra proxy server for trunk.r.o in delorean-deps  https://review.openstack.org/50888413:54
*** lukebrowning has joined #openstack-infra13:55
mordredpabelanger: yah - I agree with you on both sides of that - I liked not doing it originally, but now I think I've become more of a fan of doing it13:56
pabelanger++13:56
dmsimardmordred: adding it to the base job makes it easy and convenient IMO, the only thing jobs have to do is put their logs there and they'll be picked up.. instead of offloading the logic of uploading their things first13:56
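A rough sketch of the pattern dmsimard is describing, assuming a conventional node-side logs directory and Zuul's executor log root variable (this is not the actual base job, just an illustration):

    - hosts: all
      tasks:
        # Pull the agreed-upon logs directory from the node onto the executor;
        # whatever sits under the executor's log root is what gets published.
        - name: Collect node-side logs onto the executor
          synchronize:
            mode: pull
            src: "{{ ansible_user_dir }}/logs/"
            dest: "{{ zuul.executor.log_root }}/"
          ignore_errors: yes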
*** kgiusti has left #openstack-infra13:57
mordredandreaf: no - I don't think there is - but I can help put one together real quick if you like13:58
pabelangerThe other way, which I also like, is to create a whitelist of the logs you want, and the post-run role will go out and fetch them.  This way, it forces job authors to be specific about logs, and not glob everything13:59
andreafmordred: sure that would be great13:59
*** thorst has quit IRC13:59
*** lukebrowning has quit IRC13:59
*** mat128 has joined #openstack-infra14:01
*** thorst has joined #openstack-infra14:01
Jeffrey4lhow to handle the current blocked jobs?14:01
*** lukebrowning has joined #openstack-infra14:01
dmsimardpabelanger: I don't think there's much of a difference between putting the whitelist logic inside the job role parameter or inside some other log collection bash whatever14:02
vsaienk0frickler: clarkb could you please check https://review.openstack.org/#/c/508882 this should help with applying correct timeout for ironic jobs14:02
dmsimardpabelanger: for example, devstack and other projects already have their log collection scripts which is pretty much a whitelist :p14:02
*** martinkopec has joined #openstack-infra14:03
*** thorst has quit IRC14:04
pabelangerdmsimard: right, clarkb is a fan of whitelist approach, since we have limited storage on logs.o.o14:04
*** thorst has joined #openstack-infra14:04
*** efried has joined #openstack-infra14:05
dmsimardinfra-root: is there anyone actively working on keeping zuul/nodepool healthy right now? Doesn't seem clear from recent backlog14:05
pabelangerdmsimard: not until jeblair comes online14:06
*** lukebrowning has quit IRC14:06
pabelangerfor now, I'm just in pending mode14:06
dmsimardok, probably worth a #status notice or something, there's people asking all over the place14:06
mnaseri think it's probably just stuck because it lost connectivity to zk (rather than stuck leaking memory)14:07
mnaseraccording to cacti graphs14:07
*** hongbin has joined #openstack-infra14:08
*** lukebrowning has joined #openstack-infra14:08
pabelangerYah, we likely should ask people for some patience this week until we get everything optimized14:08
mnaserhttp://cacti.openstack.org/cacti/graph_view.php?action=tree&tree_id=1&leaf_id=557&page=2 idle with no cpu usage, in line with losing zookeeper connectivity14:08
mnasernodepool was restarted14:08
mnaserbut yeah i guess we can wait14:09
dansmithit just flushed I think14:09
dansmithalthough things I had in the backlog didn't seem to get queued14:09
pabelangeryah, debug log on zuul is now doing merging14:09
pabelangerlet me see if I can find out why14:10
mnaseroh it did14:10
mnaserof course it does14:10
*** lukebrowning has quit IRC14:12
*** srobert has joined #openstack-infra14:13
*** vsaienk0 has quit IRC14:13
dmsimardpabelanger: fyi I removed -W on https://review.openstack.org/#/c/505233/ -- looked at logstash and we have occurrences.14:14
*** lukebrowning has joined #openstack-infra14:14
dmsimard^ I'm seeing ansible sudo timeouts, it's an elastic-recheck query to track it14:14
dmsimardFor example http://logs.openstack.org/71/323971/59/check/openstack-tox-py27/9ad5128/job-output.txt#_2017-10-01_23_58_01_69330914:15
pabelangerI don't see anything specific in debug log for zuul.14:15
pabelangerSeems after: 2017-10-02 14:00:54,509 DEBUG zuul.IndependentPipelineManager: Build <Build 1235b8f2fd404df483bad805885a651d of legacy-grenade-dsvm-neutron-multinode-live-migration on <Worker ze02.openstack.org>> completed14:15
pabelangerthings started moving again14:15
pabelangernode requests being processed14:15
pabelangeretc14:15
mordredandreaf: remote:   https://review.openstack.org/508891 Add requirements-check job14:18
toskyhttps://review.openstack.org/#/c/508847/ (sahara-tests) is both in check and check-tripleo queues, is it expected?14:18
mordredandreaf: (that includes your change, fwiw)14:18
*** lukebrowning has quit IRC14:18
dmsimardtosky: if you look at the check tripleo change, you'll see that there are no actual jobs14:18
dmsimardtosky: that is zuul evaluating if there are any jobs to run in that pipeline14:19
*** kiennt26 has joined #openstack-infra14:19
dmsimardtosky: usually it'd be fast enough that you wouldn't notice it but there are performance issues right now.14:19
toskydmsimard: oh, oki14:19
toskythanks14:19
toskyit's still chewing the delayed events14:20
pabelangerhttp://grafana.openstack.org/dashboard/db/nodepool14:21
pabelangernodepool-launchers working again14:21
dmsimardpabelanger: btw for that dashboard, https://review.openstack.org/#/c/508349/14:21
pabelangermordred: mind adding https://review.openstack.org/508876/ to your review pipeline14:21
pabelangerdmsimard: cool14:22
*** lukebrowning has joined #openstack-infra14:22
jeblair13:53 < dmsimard> mordred: right, I was thinking about that just now. I guess "feature parity" with v2 includes uploading "{{ ansible_user_dir }}/logs" by default but anyway it's not really important for the time being.14:23
jeblairdmsimard: that's not how jenkins worked at all14:23
pabelangerdmsimard: left question14:23
pabelangerdmsimard: nevermind, reading commit message now14:23
jeblairdmsimard: jobs have always had to explicitly save logs14:24
dmsimardjeblair: pretty sure all I have to do in a v2 job is to put the stuff I want collected in $WORKSPACE/logs and the scp log publisher took care of rsync'ing it to the logserver14:24
andreafmordred: cool, thanks14:24
*** eharney has quit IRC14:24
jeblairdmsimard: if you were using devstack-gate, that's because it had an scp publisher set up for $WORKSPACE/logs.14:24
jeblairdmsimard: in other words, the jjb config said "copy $WORKSPACE/logs to the log server"14:25
dmsimardjeblair: fair, I guess I took that publisher as default/granted14:25
dmsimardas in, a vast proportion of jobs used that publisher, not just devstack/devstack-gate14:25
mordredandreaf: and a follow up, remote:   https://review.openstack.org/508894 Stop using zuul-cloner in project-requirements-change14:26
*** eharney has joined #openstack-infra14:26
*** lukebrowning has quit IRC14:27
jeblairpabelanger: my query that was taking a long time is finished; feel free to restart zuulv3 if needed (i thought i said to do that anyway)14:27
dmsimardjeblair: it looks like things have started properly dequeuing once again for the time being14:27
andreafmordred: thanks. The original job passed ZUUL_BRANCH to the script, any reason for not passing it anymore?14:28
dmsimardpabelanger: so you want to keep zuul launchers for the time being ? I think two of your comments are contradicting each other :/14:28
*** lukebrowning has joined #openstack-infra14:29
*** rbrndt has joined #openstack-infra14:29
dmsimardpabelanger: ok, reading them in a chronological order makes sense :P14:29
pabelangerjeblair: okay, I wanted to wait until you were online to check the state of zuul. But, we seem to be processing again. Possible it recovered?14:29
pabelangerdmsimard: ya, sorry. 2 reviews14:29
jeblairpabelanger: no idea, i've been asleep14:30
mordredandreaf: yah - I missed that in patch one - did it in patch 214:30
pabelangerI need to relocate to the library, will be back in a few minutes14:30
mnaserhttps://review.openstack.org/#/c/508763 -- can i borrow a +W here? to help fix releasenote jobs and move projects to the new non legacy job14:31
jeblairfungi: you said the next reconfiguration event will push the server into swap -- have we determined that reconfiguration events cause memory increase?14:31
jeblairfungi: i thought i looked over the weekend and could not make that correlation, but if someone has, that would be great14:31
mordredandreaf: although - we could go further with this and rework the script to not do any checking out - since zuul should be setting up the branches already how that script expects / desires14:31
jeblairat the moment, i have no idea why memory use jumps.  if anyone sees any correlations, please let me know.  :)14:32
openstackgerritDavid Moreau Simard proposed openstack-infra/project-config master: Update Nodepool graphite metric names  https://review.openstack.org/50834914:32
mordredjeblair: I'm not sure we're 100% satisfied it is a correlation - but it seemed like we were seeing memory jumps during reconfigurations14:32
jeblairmordred: which kind of reconfiguration?  full reconfig, tenant reconfig, or creating a dynamic layout?14:32
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Increase ansible internal_poll_interval  https://review.openstack.org/50880514:32
mordredjeblair: but given the weekend nature of the observations, I would not stand by them strongly14:33
mordredjeblair: one sec ...14:33
mnaseri think we saw hangs during dynamic layout reconfig (and i assume the high volume of zuul.yaml changes contributes to the hangs)14:33
*** lukebrowning has quit IRC14:33
mnaserfor a while it was stuck during phase 1, according to logs14:33
jeblairmnaser: that for sure -- it takes about 1 minute to make a dynamic config14:33
andreafmordred: yeah - so if I want to use this now I need to setup a .zuul.yaml job in tempest, and whatever is there will be combined with the jobs from layout.yaml, right?14:34
*** lukebrowning has joined #openstack-infra14:35
mordredjeblair: http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2017-10-01.log.html#t2017-10-01T16:52:44 is where clarkb was talking about what he was seeing14:35
jeblairmordred, clarkb: thanks, i'll dig into that timeframe14:37
mordredandreaf: that's right - you can make that tempest patch a depends-on on that patch - and that could be a good way to verify it works properly - once we're happy with that patch and land it, we can update the check-requirements project-template in openstack-zuul-jobs14:37
mordredjeblair: I remember noticing a dynamic config stuck in phase 1 (and before phase 1) for quite some time that was actually processing a shade patch14:38
andreafmordred: ok I'll do that14:39
mordredjeblair: I think if you search for 500365,27 in the logs you should be able to see that14:39
mordredandreaf: alternately ...14:39
*** lukebrowning has quit IRC14:39
andreafmordred?14:39
mordredandreaf: you could make a patch to openstack-zuul-jobs updating the project template and then just make some other patch in a different repo that depends on that... project-template changes themselves are speculative14:39
mordredandreaf: (instead of making a patch to tempest adding it to the check pipeline there)14:39
pabelangero/14:40
mordredandreaf: in fact - why don't I make the o-z-j patch real quick, since we'll need it anyway14:40
andreafmordred: ok yeah that makes more sense14:40
andreafmordred: sure even easier for me :)14:40
*** lukebrowning has joined #openstack-infra14:41
andreafjeblair, mordred: on the tempest native job, are you happy to move on with this series in d-g https://review.openstack.org/#/q/topic:run_tempest+(status:open+OR+status:merged)? I can move the relevant roles to zuul-jobs as a follow-up14:41
jeblairandreaf: fyi https://review.openstack.org/50425914:43
mordredandreaf: sorry - I haven't looked at those much yet - I think  roles/process-test-results probably wants to move into zuul-jobs and be merged with roles/fetch-stestr-output and roles/fetch-testr-output14:43
*** ericyoung has joined #openstack-infra14:43
jeblairandreaf: if you have time to take over 504259, that would be great14:44
mordredandreaf: (like, I think one role that finds testr / stestr results and processes them as appropriate would be great)14:44
*** dizquierdo has quit IRC14:44
mordredandreaf: the other stuff should, I believe, actually move into the tempest repo - since it's about running tempest and whatnot14:44
mordredjeblair: (does that sound right to you?)14:44
jeblairmordred: yeah -- work on this started before the cutover so... :)  i still have a todo to move the devstack job into devstack as well14:45
*** lukebrowning has quit IRC14:46
mordredjeblair: oh - we should do that before I land the shade patch that consumes it14:46
mordredjeblair: since I think the shade patch is the only consumer of that atm14:46
*** tesseract has quit IRC14:46
mordredjeblair: I'll put moving it on my todo-list for today14:46
jeblairmordred: i thought you were already consuming it14:46
jeblairmordred: thanks14:46
mordredjeblair: nah - haven't landed the patch yet14:46
jeblairandreaf: i'm still surprised devstack files directory needs perm changes :)14:47
*** lukebrowning has joined #openstack-infra14:47
*** ccamacho has left #openstack-infra14:48
openstackgerritMonty Taylor proposed openstack-infra/openstack-zuul-jobs master: Replace legacy-requirements with requirements-check  https://review.openstack.org/50889814:48
jeblairdmsimard, pabelanger, mordred: see the save-file role in 506835 for another component of the log conversation14:48
mordredandreaf: ^^ that updates the template - so a depends-on from any project that should trigger that job should do the right thing14:48
*** wolverineav has quit IRC14:49
dmsimardandreaf: include_role works14:49
dmsimardandreaf: if you don't use it with a conditional and static: no14:49
fungijeblair: i had an anecdotal correlation of memory surging right after a .zuul.yaml change merged, and there hadn't been any obvious job-related changes merging for nearly an hour prior to that; not a particularly strong correlation: http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2017-10-01.log.html#t2017-10-01T17:37:1614:49
andreafjeblair: yeah I can rebase https://review.openstack.org/#/c/504259/14:49
mordredjeblair, pabelanger, andreaf: agree - I think we should tee up a discussion about that log/artifact collection stuff first thing once we have the current zuul memory/config issue under control14:49
jeblairdmsimard, pabelanger, mordred, andreaf: i was also thinking that we could try out setting a job variable on devstack jobs to specify files for save-file.  so in the same way you can enable a plugin with a job var, you can also add extra files to save.14:50
andreafdmsimard: cool, so I should be able to use it in loop right? did a think land for that?14:50
mordredjeblair: ++14:51
dmsimardandreaf: in a loop ? why would you run an include_role task with items ?14:51
mordreddmsimard: to collect a list of files14:51
openstackgerritAnastasia Kravets proposed openstack-infra/project-config master: add ec2-api to unified doc build jobs  https://review.openstack.org/50888014:51
andreafdmsimard:  include_role: save-file with_items: fileA fileB14:51
mordredandreaf: I *think* the suggestion from ansible people would be to make the role handle a list of inputs instead of doing include_role in a loop - so that each of the tasks in save-file has the with-items in it14:52
dmsimardandreaf, mordred: I don't like that logic, I would run the role once with a list of files and the iteration occurs inside the role14:52
mordreddmsimard: jinx14:52
dmsimardyou should not be including the role like 500 times14:52
*** lukebrowning has quit IRC14:52
openstackgerritAnastasia Kravets proposed openstack-infra/project-config master: add ec2-api to unified doc build jobs  https://review.openstack.org/50888014:52
jeblairdmsimard: can you find an infra-root that's not me to help you fix up include_role in zuul_json?  pabelanger maybe?14:52
dmsimardlike include_role: name: foo vars: files: - one - two14:53
dmsimard(oneline yaml is ew)14:53
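Expanded out of dmsimard's one-liner, that reads roughly:

    - include_role:
        name: foo
      vars:
        files:
          - one
          - two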
jeblairdmsimard: (sorry, i think my dance card is full for the next bit)14:53
dmsimardjeblair: sure14:53
*** alexchadin has quit IRC14:53
mordredandreaf: the thing that is counter-intuitive with roles is that they aren't like functions/methods - with one definition and multiple calls - they are actually just fancy include statements or macro expansions - so an include_role in a loop winds up like you've got a copy of every task in the role for each item in the loop14:53
*** lukebrowning has joined #openstack-infra14:53
dmsimardandreaf, mordred, andreaf: if we're going to have a 'save-file' role, I'd rather have it in zuul-jobs btw14:54
dmsimardinstead of including andreaf twice, that last nickname should've been jeblair14:54
jeblairfungi: thanks -- that helps -- .zuul.yaml changes landing are tenant reconfiguration events.14:54
mordreddmsimard: yes - I think save-file and also process-test-results are great candidates for zuul-jobs14:54
andreafyeah that could go in zuul-jobs14:55
openstackgerritPaul Belanger proposed openstack-infra/openstack-zuul-jobs master: Add project-config to legacy-openstackci-beaker job  https://review.openstack.org/50890014:55
fungijeblair: i was really only able to isolate that enough for it to be remotely suspicious because of the relative infrequency of memory jumps over the weekend coupled with the relative infrequency of changes merging. if there is actually a causal relationship there, i doubt it's one we can easily isolate at our current change frequency14:56
dmsimardI'd also name it something like collect-logs, but that's bikeshedding :)14:56
mordredandreaf: fwiw, in my crazy utopian future world, I would love it if we had a process-test-results role that was smart enough to find the test results from whatever test runner was used, and to transform them to html as appropriate14:56
andreafmordred, dmsimard: so is one role and a loop that uses a task better? I can put a task in the same folder... the thing is that I need to do a sequence of things for each file...14:56
pabelangermordred: fungi: Shrews: https://review.openstack.org/508876/ should allow zuulv3.o.o to post statsd info to graphite.o.o now14:56
*** wolverineav has joined #openstack-infra14:56
mordredandreaf: so that "process-test-results" could also be called at the end of a go job or a javascript job or a java job and things would just work - that may take a while, but I think handling testr/stestr as appropriate is a great first step14:57
mordredpabelanger: +A14:57
fungithanks pabelanger!14:57
dmsimardandreaf: what you want to do is fairly simple, right ? put all the logs in one place and then you archive them14:57
andreafdmsimard14:58
dmsimardandreaf: so it's like one "aggregate logfiles" task which pulls all the logs to a single location and then another task to archive them14:58
*** xarses has joined #openstack-infra14:58
*** yamahata has quit IRC14:58
*** lukebrowning has quit IRC14:58
dmsimardthis aggregate logfiles task would be the one with the with_items, the archive would be something recursive that doesn't need an iteration14:58
dmsimardthat's how I would see things working imo14:59
*** yamahata has joined #openstack-infra14:59
andreafdmsimard: I need to compress them individually not all in one archive14:59
*** baoli has joined #openstack-infra14:59
andreafthe sequence is: check that file exists, rename if applicable, compress - and copy15:00
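A minimal sketch of that sequence done inside a single role call, looping over a list variable rather than looping include_role itself; the role layout and variable names are hypothetical and the rename step is left out:

    # roles/save-file/tasks/main.yaml
    - name: Check which requested files exist
      stat:
        path: "{{ item }}"
      with_items: "{{ save_file_sources }}"
      register: save_file_stats

    - name: Compress the files that exist into the staging logs directory
      shell: "gzip -c {{ item.stat.path }} > {{ ansible_user_dir }}/logs/{{ item.stat.path | basename }}.gz"
      when: item.stat.exists
      with_items: "{{ save_file_stats.results }}"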
dmsimardandreaf: https://github.com/openstack-infra/zuul-jobs/blob/master/roles/emit-ara-html/tasks/main.yaml#L1815:00
*** lukebrowning has joined #openstack-infra15:00
*** wolverineav has quit IRC15:00
*** wolverineav has joined #openstack-infra15:00
eumel8pabelanger: 508900 seems duplicated. I've already invited you to https://review.openstack.org/#/c/508857/ :)15:00
andreafdmsimard: nice I'll try that15:01
dmsimardandreaf: renaming files seems overkill, what's the purpose ?15:01
dmsimardandreaf: for mimetypes on logs.o.o ?15:02
dmsimardjeblair, mordred: seeing a lot of merger errors on zuulv3.o.o15:02
dmsimard"merger failure"15:02
dmsimardfor example 50866015:02
dansmithyeah tons15:03
pabelangereumel8: +3 Thanks!15:03
eumel8:)15:03
*** lukebrowning has quit IRC15:05
*** wolverineav has quit IRC15:05
pabelangerI15:05
pabelangererr15:05
pabelangerI've noticed glance jobs in gate that are not in the integrated change queue, is that expected?15:05
pabelanger507957 is the change15:06
*** tesseract has joined #openstack-infra15:07
openstackgerritMonty Taylor proposed openstack-infra/devstack-gate master: Remove new-style devstack job  https://review.openstack.org/50890515:09
mordredandreaf: remote:   https://review.openstack.org/508906 Add devstack base job for zuul v315:09
mordredandreaf: ^^ those two above shift the new-style devstack job to the devstack repo15:10
*** lukebrowning has joined #openstack-infra15:11
pabelangerjeblair: mordred: I believe zm05 and zm06 are currently stuck, did you want to check first before I ask to restart them?15:11
openstackgerritMerged openstack-infra/project-config master: Fix typo in tag-release pre playbook  https://review.openstack.org/50887115:12
pabelangerboth seem to be having issues cloning openstack/glance-specs15:12
jeblairpabelanger: if it's just a stuck git process, go ahead and kill it.  i think we need to add some timeouts.15:13
pabelangerjeblair: ack15:13
mordredpabelanger: yah- you can just kill the git process itself15:13
*** lnxnut has joined #openstack-infra15:13
mordredthis is, btw, the second time that the thing it's been stuck on is cloning glance-specs15:13
mordredso we might want to check to see if glance-specs is broken somehow15:13
mordred(and also add some timeouts)15:13
lucasagomeshi all, I'm getting an error in the networking-ovn gate jobs saying I explicitly need to add openstack/neutron to "required-projects"... Which repository do I need to submit a change to in order to add it? See: http://logs.openstack.org/31/507031/1/gate/openstack-tox-py27/8511859/tox/py27-1.log15:13
mordredlucasagomes: that's a great question! (we should add that to the error message...)15:14
clarkblucasagomes: there is an example change, let me dig it up15:14
lucasagomesmordred, ++ that would be very useful15:14
lucasagomesclarkb, thanks a lot15:14
pabelangerokay, mergers merging again15:14
mordredlucasagomes: hrm. ...15:14
mordredlucasagomes, clarkb: this is a little weird15:15
clarkblucasagomes: https://review.openstack.org/#/c/508775/15:15
mordredthat's an openstack-tox-py27 job - I'm confused why it's trying to zuul-cloner anything15:15
clarkbmordred: yes nuetron is firmly in the weird location15:15
mnasercan i get a very quick/simple review on https://review.openstack.org/#/c/508742 ? just moving release-note-jobs template to use the new job?15:15
clarkbmordred: because neutron15:15
*** lukebrowning has quit IRC15:15
clarkbmordred: basically everything in neutron land deps on neutron15:15
mordredoh ... there's a zuul-cloner getting run in the tox itself?15:15
clarkbyes15:16
*** isaacb has quit IRC15:16
jeblairyeah, this is something we should rework with v315:16
jeblairprobably have it use tox-siblings or something15:16
mordredyah - agree. for now I think openstack-python-jobs-neutron is good15:16
mordredjeblair: ++15:16
clarkbbut also stop depending on a server in general15:16
*** lukebrowning has joined #openstack-infra15:17
jeblairmordred: won't it use tox-siblings if we add neutron to r-p?15:17
mordredwell - there's a few issues- one is that we don't release servers to pypi15:17
lucasagomesmordred, clarkb thanks a lot, I will submit one for networking-ovn then!15:17
mordredjeblair: it will - but neutron still has to get installed the first time - so I'm guessing the neutron tarball is in the requirements file?15:17
* frickler is getting 502s from gerrit again like yesterday15:18
clarkbjeblair: http://logs.openstack.org/28/501128/6/gate/build-openstack-sphinx-docs/4945316/job-output.txt.gz#_2017-10-02_15_13_41_96561215:18
mordredjeblair, clarkb: k - I've read the script in there - I think we should put it on the list of things to do something about for sure15:19
clarkbfrickler: java melody says memory use did incrase and just feel significantly and no significant garbage collection overhead. I wonder if there is something else going on15:19
jeblairclarkb: do you need me to look at something there?15:19
clarkbjeblair: looks like the executor ran out of disk15:19
mordredbut for now the openstack-python-jobs-neutron should keep us fine15:19
jeblairclarkb: any other evidence of that?15:20
clarkbjeblair: not that I have, just noticed that it restarted the gate15:20
openstackgerritVasyl Saienko proposed openstack-infra/openstack-zuul-jobs master: Set BUILD_TIMEOUT for ironic grenade jobs  https://review.openstack.org/50888215:20
jeblairclarkb: i'd appreciate it if you could dig into that15:20
aspiersjeblair: I'm currently at the Gerrit User Summit in London. If there are any questions etc. from the OpenStack side I'll do my best to represent them15:20
jeblairi've been at work for 1.5 hours and still haven't spent more than 2 minutes looking at the memory leak15:20
jeblairin fact15:20
jeblairinfra-root: can we have a quick conversation in #openstack-infra-incident ?15:21
clarkbaspiers: an updated tuning guide would probably be nice since our memory usage seems significantly higher than it used to be15:21
aspiersclarkb: OK I'll see what I can find out15:21
clarkbaspiers: eg is our gerrit just small and we need to tune for it better or is there something wrong15:21
*** lukebrowning has quit IRC15:21
aspiersgoogle currently demoing 2.15 (rc0 was cut last night)15:21
openstackgerritLucas Alvares Gomes proposed openstack-infra/project-config master: Add neutron to required-projects for networking-ovn  https://review.openstack.org/50891115:22
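Schematically, the fix lucasagomes is proposing amounts to adding neutron to required-projects for the networking-ovn job, something like the following (the exact file and job layout in project-config may differ; this is illustrative):

    - project:
        name: openstack/networking-ovn
        check:
          jobs:
            - openstack-tox-py27:
                required-projects:
                  - openstack/neutron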
aspiersin particular the migration to NoteDB15:22
mordredjeblair: yes15:22
pabelanger++15:23
aspiersPolyGerrit looks really nice15:23
mordredjeblair: when we're done with that - if you could give me a quick thumbs-up/thumbs-down on whether https://review.openstack.org/#/c/508767/ will do the right thing (that is - will file matchers work on post jobs with the merge commits involved)15:23
*** lukebrowning has joined #openstack-infra15:23
*** rhallisey has quit IRC15:23
*** rhallisey has joined #openstack-infra15:24
mordredaspiers: yah - there are several nice things in 2.14 and 2.15 that we're looking forward to making use of15:24
jeblairclarkb, Shrews, pabelanger: can you join #openstack-infra-incident please?15:24
Shrewsjeblair: there already15:24
*** dizquierdo has joined #openstack-infra15:24
aspiersclarkb, mordred: I talked to Ericsson about their pretty large internal instance. They have an active/passive HA setup.15:25
mordredaspiers: one of them is that zaro added a thing for us so that we can register test results in a structured fashion, so we don't need to have hacky javascript scraping the comment we leave to produce that table of test results15:25
*** srobert_ has joined #openstack-infra15:25
pabelangerjeblair: yes15:25
fricklercould the full disk earlier also have triggered this post_failure during log collection? http://logs.openstack.org/28/501128/6/gate/build-openstack-sphinx-docs/4945316/job-output.txt.gz#_2017-10-02_15_13_41_96561215:27
chandankumarAJaeger: regarding this review https://review.openstack.org/#/c/508502/4 in order to import tags also what i need to change ?15:27
*** pcaruana has quit IRC15:27
*** lukebrowning has quit IRC15:28
*** srobert has quit IRC15:28
*** e0ne has quit IRC15:31
*** lukebrowning has joined #openstack-infra15:32
*** dave-mccowan has quit IRC15:32
*** lnxnut has quit IRC15:32
*** dave-mcc_ has joined #openstack-infra15:32
*** lukebrowning has quit IRC15:35
openstackgerritChandan Kumar proposed openstack-infra/project-config master: Add python-tempestconf project  https://review.openstack.org/50850215:35
*** lukebrowning has joined #openstack-infra15:35
*** lennyb_ has joined #openstack-infra15:36
*** lennyb_ has quit IRC15:36
*** kiennt26 has quit IRC15:36
toskycan anyone help explain what's wrong with this migrated job, which fails with "[WARNING]: No hosts matched, nothing to do" on pip installation? http://logs.openstack.org/47/508847/2/check/legacy-sahara-cli/5de4cbe/job-output.txt.gz15:38
*** martinkopec has quit IRC15:40
*** rcernin has quit IRC15:41
mordredandreaf: you're working on some of the issues - we've spun up https://etherpad.openstack.org/p/zuulv3-issues to track things15:42
*** dhill_ has quit IRC15:43
*** dhill__ has joined #openstack-infra15:43
jlvillalOur Ironic grenade jobs are failing since Zuul v3 changeover. Suspicion is that the BUILD_TIMEOUT variable is not being passed through. Proposed patch for that issue: https://review.openstack.org/#/c/508882/15:46
openstackgerritBernard Cafarelli proposed openstack-infra/project-config master: Add neutron to required-projects for networking-sfc  https://review.openstack.org/50891615:46
*** dbecker has joined #openstack-infra15:47
*** jaypipes_ has joined #openstack-infra15:47
mnaserBUILD_TIMEOUT is passed in the env15:47
AJaegerchandankumar: tags are imported by default - remove what you do *not* want15:47
*** apevec has joined #openstack-infra15:47
chandankumarAJaeger: i have cleaned up the things which is not needed15:47
openstackgerritClark Boylan proposed openstack-infra/zuul-jobs master: Handle z-c shim copies across filesystems  https://review.openstack.org/50877215:47
mriedemfyi, the links in http://status.openstack.org/reviews/ are busted for the new version of gerrit. i can't reproduce getting the list locally, since it looks like reviewday scripts are all written to run server side (they require ssh to the gerrit server i think)15:47
mriedemi can hit the gerrit REST API but that doesn't give you the same info as the CLI being used to query the database15:48
*** jaypipes has quit IRC15:48
*** jistr is now known as jistr|off|mtg15:48
*** jistr|off|mtg is now known as jistr15:49
*** thorst has quit IRC15:49
fungimriedem: if it has ,n,z on the end of the urls and they may need to be stripped15:50
fungis/and //15:50
mriedemno, it's like this: https://review.openstack.org/#change,46398715:50
mriedemi don't know why the reviewday code even formats the url that way, b/c that's not what comes back from the gerrit CLI15:51
mriedemi'm just going to remove the splitting15:51
openstackgerritMatt Riedemann proposed openstack-infra/reviewday master: Fix change URL links with latest review.openstack.org gerrit  https://review.openstack.org/50891915:52
mriedem^ but someone with access would have to test it15:52
*** egonzalez has quit IRC15:54
*** gouthamr has joined #openstack-infra15:54
AJaegerchandankumar: please comment - if you haven't done so - in the review15:55
chandankumarAJaeger: done15:57
*** lucasagomes is now known as lucas-afk15:57
*** jascott1 has quit IRC15:58
*** rpittau_ has joined #openstack-infra15:58
*** ijw has joined #openstack-infra16:00
*** xyang1 has joined #openstack-infra16:01
*** rpittau has quit IRC16:01
openstackgerritJoe D'Andrea proposed openstack-infra/system-config master: Add openstack-valet to statusbot and meetbot  https://review.openstack.org/50892416:02
*** scottda_ has quit IRC16:03
*** trown is now known as trown|lunch16:04
*** caphrim007 has joined #openstack-infra16:06
*** chlong has joined #openstack-infra16:06
*** vhosakot has joined #openstack-infra16:08
openstackgerritSagi Shnaidman proposed openstack-infra/tripleo-ci master: WIP: Fix TripleO CI jobs  https://review.openstack.org/50866016:09
*** thorst has joined #openstack-infra16:10
*** tmorin has quit IRC16:12
*** thorst has quit IRC16:13
*** finucannot is now known as stephenfin16:13
AJaegerregarding release notes: Could somebody +2A https://review.openstack.org/#/c/508763/  , please? We need to merge that stack ...16:15
andreafmordred: the error message in https://review.openstack.org/#/c/508891 is not so verbose, but I'm guessing it doesn't like the empty project dict16:15
*** jaypipes_ is now known as jaypipes16:15
openstackgerritPaul Belanger proposed openstack-infra/system-config master: Bring ze09.o.o and ze10.o.o online  https://review.openstack.org/50892916:16
dansmithdo we know what the cause for the "merger_failure" is? like, is it a thing we should recheck for, or is that another fix that needs to be applied before we should expect it to work?16:16
pabelangerclarkb: fungi: mordred: ^ bring 2 more zuul-executors online to help with load issues. Booting servers now16:17
clarkbdansmith: I think that is probably a new one. Will add to the list at https://etherpad.openstack.org/p/zuulv3-issues16:17
fungithanks pabelanger16:17
dansmithclarkb: okay seems to be affecting a large number of jobs to me16:17
dansmithclarkb: also seeing retry_limit, dunno what that means16:18
fungidansmith: means that zuul tried to run the job several times but it aborted unnaturally every time so it stopped retrying16:18
pabelangerdansmith: clarkb: fungi: this is likely a result of the large load times on executors16:19
pabelangerI've seen zuul-executor killing ansible-playbook tasks because of 10min timeout for pre / post runs16:19
AJaegerFor releasenotes, https://review.openstack.org/#/c/508742 and  https://review.openstack.org/#/c/508763/ look ready..16:19
clarkbpabelanger: ok, I guess step 1 is to roll out load fixes and add executors, then reassess16:19
fungiagreed, if we get load down on the executors this instability may subside16:19
pabelangerclarkb: yah16:20
pabelangercurrent load on ze01 for example: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=63999&rra_id=all16:20
dansmithso don't kill me, genuinely just asking: has rolling back been reconsidered? it seems like we're in a worse spot now than we were on saturday...16:21
openstackgerritJavier Peña proposed openstack-infra/project-config master: Remove legacy Packstack integration jobs  https://review.openstack.org/50885116:21
clarkbfungi: any idea why cacti wouldn't be picking up the /var/lib/zuul fs on ze0X nodes?16:21
fungidansmith: worse spot than saturday (because saturday was low load) but not worse than friday when we contemplated rolling back16:21
dansmithfungi: okay16:21
openstackgerritJavier Peña proposed openstack-infra/project-config master: Remove legacy Packstack integration jobs  https://review.openstack.org/50885116:22
fungiclarkb: were those filesystems added since the last time snmpd restarted?16:22
clarkbfungi: I've restarted snmpd about 15 minutes ago at this point.16:22
fungik16:22
clarkbfungi: and poll interval is 5 minutes iirc16:22
clarkbit's possible that systemctl restart snmpd isn't doing what I expect. I should double check the process timestamps16:22
fungisnmp poll interval is, but i'm not sure how often cacti rechecks to determine what new counters to poll16:22
pabelangerYah, I think once we get load under control on zuul-executors, we'll start seeing more consistent job runs16:22
mordredandreaf: oh - ha - I really suck there16:23
andreafmordred: also https://review.openstack.org/#/c/508906/1 has an error coming back from zuul while parsing .zuul.yaml16:23
fungidansmith: right now i think we have a handle on what problems are platform-related (which we're prioritizing) and which ones are job migration related (which we're also considering important but can more easily source fixes/reviews from a broader segment of the community for)16:24
mordredandreaf: ah- yes on the other one - one sec16:24
openstackgerritLogan V proposed openstack-infra/project-config master: Remove duplicate definition of OSA integrated AIO job  https://review.openstack.org/50893116:24
dansmithfungi: okay, it seemed like on friday we'd have one or two jobs fail per patch and now it's a lot, which is why it feels like a bit of a backslide. But, I know you have a better view than me, so I was just wondering16:25
clarkbfungi: ya snmpd processes did actually restart, I guess next step is checking how often cacti refreshes its oid listings16:25
mordredandreaf: thanks16:25
clarkbdansmith: fungi I think on friday things were so completely broken that it helped self manage load16:26
clarkbdansmith: fungi then over the weekend we fixed a bunch of things which is allowing the system to run itself over right now16:26
dansmithclarkb: oh that's... good? :)16:27
fungi"things seem worse because they're getting better" ;)16:27
pabelangerindeed16:27
pabelangerwe are running a lot of ansible-playbook tasks now16:27
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Limit concurrency in zuul-executor under load  https://review.openstack.org/50864916:28
mordredpabelanger: awesome ^^ we should roll that out now I think16:29
SpamapSwhat's load like now?16:29
toskyehm, sorry for asking again in a short time: I see a failure in a converted job which I can't decode ("[WARNING]: No hosts matched, nothing to do" when installing with pip)16:29
toskywhat could it be? http://logs.openstack.org/47/508847/2/check/legacy-sahara-cli/5de4cbe/job-output.txt.gz16:29
inc0dmsimard: man Ara is awesome.16:29
SpamapSalso did we merge the thing that makes ansible less-UI-responsive-but-also-less-CPU-guzzling ?16:29
pabelangermordred: ah, good idea. Reading up on it now16:29
*** thorst has joined #openstack-infra16:29
*** lnxnut has joined #openstack-infra16:30
*** thorst has quit IRC16:30
pabelangerSpamapS: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=63999&rra_id=all is ze01.o.o atm16:30
pabelangerze03.o.o is currently running the fixes from tobiash16:31
mordredpabelanger: awesome16:31
pabelangerwe can start to rotate other executors after puppet runs on the servers16:31
SpamapSpabelanger: is ze03 more healthy?16:32
fungitosky: http://logs.openstack.org/47/508847/2/check/legacy-sahara-cli/5de4cbe/logs/devstack-gate-setup-workspace-new.txt16:32
SpamapSthat was like, an epic find16:32
SpamapS"hey Ansible, take a chill pill, k?"16:32
*** panda is now known as panda|bbl16:32
*** jtomasek has quit IRC16:32
electrofelixunusual error appearing for a pep8 job for jjb - POST_FAILURE https://review.openstack.org/#/c/134307/ -> http://logs.openstack.org/07/134307/13/check/openstack-tox-pep8/708607e/job-output.txt.gz16:32
fungitosky: maybe openstack-dev/devstack isn't in the required-projects list for the legacy-sahara-cli job?16:32
inc0hey guys, qq, why is zuul -1 when only a non-voting job is red? https://review.openstack.org/#/c/508661/3916:33
pabelangerSpamapS: hard to say right now16:33
*** thorst has joined #openstack-infra16:33
toskyfungi: oh, let me check16:33
*** thorst has quit IRC16:33
pabelangerSpamapS: currently 166 ansible-playbook processes, with 36 load16:33
electrofelixseeing a number of UNREACHABLE errors being reported from the ansible jobs as part of those POST_FAILURES16:34
pabelangeroh16:34
clarkbinc0: look in the review comments16:34
fungiinc0: kolla-build-centos-source kolla-build-centos-source : NODE_FAILURE16:34
pabelangerand we are swapping on ze0316:34
*** yamamoto_ has quit IRC16:34
mnaserhttps://review.openstack.org/#/c/508742/ could use a very quick +2 to fix puppet's release note jobs :>16:34
SpamapSyeah I think the concurrency limiter is going to be needed even with a bit of CPU help from the ansible internal polling factor16:34
fungiinc0: toggle ci and then look at zuul's comment. i guess we don't pipe NODE_FAILURE results up into the ci results table we generate in gerrit16:34
SpamapS155 ansible processes is going to chew up a ton of RAM16:34
inc0ah ok, thanks16:34
inc0it's also voting, so probably that's why16:35
pabelangerSpamapS: mordred: Ya, we are swapping on all zuul-executors currently. So more servers and patch hopefully helps16:35
*** hashar has quit IRC16:35
*** efried is now known as efried_afk16:36
fungielectrofelix: looks like fetch-tox-output is where that first started going wrong: "ssh: connect to host 15.184.67.23 port 22: Connection timed out"16:39
fungii wouldn't be surprised if this is also fallout from executor load starving rsync/ssh of cycles16:39
*** yamamoto has joined #openstack-infra16:39
toskyfungi: in the requirements there is openstack-infra/devstack-gate, I guess it's not enough16:39
toskyrequired-jobs16:39
pabelangerokay, ze09.o.o launched, /etc/fstab cleaned up, rebooting to confirm16:40
fungitosky: openstack-dev/devstack is a different project from openstack-infra/devstack-gate, so probably not enough no16:40
*** thorst has joined #openstack-infra16:40
toskyfungi: I was expecting a transitive dependency (openstack-infra/devstack-gate bringing in openstack-dev/devstack), but no problem, going to add it16:41
*** jpena is now known as jpena|away16:41
pabelangermoving to launch ze10.o.o now16:41
openstackgerritAnastasia Kravets proposed openstack-infra/project-config master: add ec2-api to unified doc build jobs  https://review.openstack.org/50888016:42
clarkbpabelanger: maybe before adding them to zuul we make sure the new code is already running on them too?16:42
clarkbthat means fewer restarts of services16:42
openstackgerritJoe D'Andrea proposed openstack-infra/irc-meetings master: Propose OpenStack Valet meeting date/time  https://review.openstack.org/50893316:42
pabelangerclarkb: good idea, checking16:42
clarkbok snmpwalk seems to show that snmp sees the filesystems. So guessing it must be on the cacti side as far as grabbing that data16:42
*** lnxnut has quit IRC16:43
openstackgerritLuigi Toscano proposed openstack-infra/openstack-zuul-jobs master: Add openstack-dev/devstack to all dsvm legacy sahara jobs  https://review.openstack.org/50893416:44
*** thorst has quit IRC16:44
inc0can I turn off legacy jobs with local .zuul.yaml?16:45
fungitosky: there are plenty of jobs which use devstack-gate but don't use devstack, so hard to make assumptions there. if this had been a traditional "dsvm" job we likely would have guessed the required-projects list for it automatically, but it seems the sahara-cli job was a little bit special16:45
fungiinc0: no, we'll need a patch removing them from the global configuration where they're defined instead16:46
mordredinc0: you cannot - BUT - you can totally remove the legacy jobs from your project by submitting a patch to project-config and removing them16:46
*** dave-mcc_ has quit IRC16:46
toskyfungi: it should have had dsvm in the name probably; but also other jobs with dsvm in the name did not get openstack-dev/devstack as dependency (see my patch above)16:46
inc0well what I wanted to do is to turn them off in gate patch locally and remove from project-config when it merges16:46
mordredinc0: I made a doc section: https://docs.openstack.org/infra/manual/zuulv3.html#howto-update-legacy-jobs16:47
openstackgerritSagi Shnaidman proposed openstack-infra/project-config master: Fix OVB jobs config for TripleO  https://review.openstack.org/50893616:47
toskyfungi: out of curiosity, what is the use case of using devstack-gate without devstack? I thought that devstack-gate extends devstack with some default settings & co16:47
electrofelixfungi: so does that mean, just wait a while?16:47
mordredinc0: you'll need to either remove from project-config first, leaving you with a moment where you're not running either legacy or new versions - or land new jobs to zuul.yaml first (double-jobs) then remove from project-config16:47
pabelangerfungi: clarkb: mordred: any objections if we make legacy-logstash-filters-ubuntu-trusty non-voting for the moment on system-config? Otherwise, we'll need somebody to fix the job. http://logs.openstack.org/29/508929/1/check/legacy-logstash-filters-ubuntu-trusty/efec67f/job-output.txt.gz#_2017-10-02_16_38_36_696318 Currently blocking patches to system-config16:47
mordredpabelanger: fine by me16:48
mordredpabelanger: let's put it on the list of jobs to fix though16:48
fungielectrofelix: yes, we have a several-pronged effort underway to address load on the executors (a patch from tobiash to make ansible run slightly less frequently, one from SpamapS implementing a load-average-based governor, and also adding more executor servers to spread the current load out better)16:48
*** slaweq_ has joined #openstack-infra16:48
mordredpabelanger: I have added it to the list16:48
inc0ehh, I don't like this gate-less limbo, but double jobs will make it a nightmare to run16:48
pabelangermordred: thanks16:49
sshnaidmdoes anybody know where the logs of legacy periodic jobs are now?16:49
inc0what if I redeclare legacy jobs locally and run them with exit 0?16:49
electrofelixfungi: thanks, will sit tight, it's not a critical fix16:49
fungitosky: devstack-gate is a poorly-named bit of software which provides a general job environment framework for setting up trees of repositories and doing log collection... you can completely remove any dependency on devstack by defining your own gate_hook function for it16:49
toskyfungi: I see, thanks16:50
SpamapSmordred: couldn't inc0 Depends-On: the project-config patch?16:50
mordredinc0: you can turn them non-voting in your .zuul.yaml16:50
mordredSpamapS: no. project-config patches are non speculative16:50
SpamapSmordred: I mean, yeah he'd be double-gated for _one single patch_ but that would be the end of it.16:50
inc0they're largely non-voting, it's just that the time they take to run is quite a problem16:50
tobiashSpamapS: maybe a ram based limiter also makes sense16:50
SpamapStobiash: already written.. testing locally :)16:50
tobiash:)16:50
fungipabelanger: i'd be okay with disabling that job temporarily if we're not merging the sorts of changes that's supposed to be testing (they seem likely to be a lower priority right now anyway)16:51
mordredinc0: yah - it's a scenario that is not optimized at the moment - I'd recommend making the project-config patch and following up with a .zuul.yaml patch with a depends-on as SpamapS suggests - the .zuul.yaml patch won't actually drop the jobs until we land the project-config patch - but it should lower the amount of time you're exposed as much as possible16:51
openstackgerritPaul Belanger proposed openstack-infra/openstack-zuul-jobs master: Set legacy-logstash-filters-ubuntu-trusty non-voting  https://review.openstack.org/50894016:52
pabelangerfungi: mordred: thanks! ^ if you don't mind reviewing16:52
*** jascott1 has joined #openstack-infra16:52
inc0ok and I'll probably freeze non-gate, non-critical changes in meantime in kolla16:52
SpamapSmordred: I kind of like it actually. You get one patch where the old and new both pass.16:52
tobiashIf load is still a concern with the limiters, a poll interval of 0.1 could also be considered, which could reduce the load a bit further at the expense of about 10s longer jobs per 100 tasks16:53
SpamapSI guess there might be a scenario where other things in the gate for your project behind the project-config patch could land w/o tests16:53
fungiSpamapS: what it's especially non-optimal for is teams who just want to abandon the legacy jobs in favor of native jobs in their repos, rather than fixing the legacy jobs first16:53
fungisince they end up needing to do it the other way around16:54
SpamapSfungi: agreed!16:54
SpamapSso yeah, for that you're exposed while the legacy jobs are gone16:54
*** slaweq_ has quit IRC16:55
SpamapSsince you can't Depends-On -> the .zuul.yaml patch since it won't pass16:55
sshnaidmpabelanger, can you please look in your time? thanks https://bugs.launchpad.net/tripleo/+bug/172072116:56
openstackLaunchpad bug 1720721 in tripleo "CI: OVB jobs fail because can't install XStatic from PyPI mirror on rh1 cloud" [Critical,Triaged] - Assigned to Paul Belanger (pabelanger)16:56
pabelangersshnaidm: won't be right now, working on another issue. However, any infra-root will be able to see what is going on16:56
*** thorst has joined #openstack-infra16:57
sshnaidmany infra root, please look at https://bugs.launchpad.net/tripleo/+bug/172072116:57
openstackLaunchpad bug 1720721 in tripleo "CI: OVB jobs fail because can't install XStatic from PyPI mirror on rh1 cloud" [Critical,Triaged] - Assigned to Paul Belanger (pabelanger)16:57
SpamapSThough I guess you could do a three-step ... #1- Submit patch to project-config disabling legacy jobs, #2- Submit .zuul.yaml patch that Depends-On #1, #3- Approve #2 and #1 in sequence.16:57
fungigranted, most (if not all) awake infra-root sysadmins are also busy trying to stabilize zuul16:57
*** baoli has quit IRC16:57
SpamapSThen scream at anyone who +A's before #2 lands.16:58
fungiSpamapS: yeah, i think that's what SamYaple ended up doing for the loci jobs16:58
*** baoli has joined #openstack-infra16:58
*** chlong has quit IRC16:58
SpamapSfungi: we should write that down as a Zuul v3.1 feature request. :)16:58
SpamapShm, memory governor..16:59
SamYaplei didn't do a depends on, but about the same thing yea16:59
SamYaplepurge project-config jobs then drop in a quick noop, then i built real gates in-repo16:59
*** derekh has quit IRC16:59
SpamapSwould people rather specify the minimum available memory as a percentage of total system memory, or something like minimum available MiB?16:59
*** jpich has quit IRC16:59
SpamapSlike "stop accepting jobs when there is less than 10% of available memory" or "stop accepting jobs when there is less than 2500 MB of available system memory" ?17:00
pabelangerSpamapS: does bubblewrap have any ability to do that?17:00
*** dklyle has joined #openstack-infra17:00
SpamapSpabelanger: that would fail the job too late17:00
SpamapSpabelanger: we need to stop accepting jobs entirely17:00
pabelangerah17:00
pabelangerright17:01
fungipabelanger: the idea is to implement this around the same code paths as the load average governor (could probably go in the same thread and just become an additional check and tuning config parameter)17:01
SpamapSjust let them sit in gearman17:01
*** thorst has quit IRC17:01
SpamapSIt does go in the same thread17:01
SpamapSI have it working17:01
SpamapSbut I did it as psutil.virtual_memory().percent17:01
SpamapSand I'm just wondering if that's too abstract17:01
*** david-lyle has quit IRC17:02
fungias for how to configure it, i like the percent implementation. probably not worth bikeshedding over for now and we can find out if it needs tweaking later17:02
SpamapSthat said, it's a nice default behavior, as it never needs to be tuned... just don't do concurrent jobs if you have less than 10% of whatever system RAM available.17:02
SpamapSjeblair: ^ ?17:03
SpamapS10% could be a _lot_ of RAM tho ;)17:03
SpamapSmaybe 5% is better.17:03
SpamapSalso haven't looked at how psutil accounts for swap in that17:03
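A minimal sketch of the percentage-based check being discussed, assuming psutil is available; the names here are illustrative and not Zuul's actual implementation:

    # Stop accepting work when available memory drops below a threshold.
    import logging
    import psutil

    log = logging.getLogger("example.memory_governor")

    MIN_AVAIL_MEM_PCT = 5.0  # refuse new jobs below this much free memory


    def accepting_work():
        """Return True if enough memory is free to take another job."""
        avail_pct = 100.0 - psutil.virtual_memory().percent
        if avail_pct < MIN_AVAIL_MEM_PCT:
            log.info("Unregistering due to low avail memory %.1f%% < %.1f",
                     avail_pct, MIN_AVAIL_MEM_PCT)
            return False
        return True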
fungiSpamapS: we're trying to let him focus on tracking down the memory leak/performance issues for now, so can probably just push an initial version into review17:03
SpamapSfungi: (He had some strong thoughts on the load average implementation so I figured I'd ask ;)17:04
SpamapSI'll let review do the talking17:04
fungifair17:04
*** bhavik1 has joined #openstack-infra17:04
fungiSpamapS: i have a feeling we can just get working implementations of these governors and then work out whether we need to change the defaults/scaling factors/config options before 3.0.0 gets tagged17:05
*** yamahata has quit IRC17:05
fungithis is not a binding api until we tag17:05
pabelangerSpamapS: mordred: fungi: clarkb: i propose we restart ze01.o.o, which has been updated with both patches to zuul from tobiash and SpamapS17:05
openstackgerritMichal Jastrzebski (inc0) proposed openstack-infra/project-config master: Remove Kolla and Kolla-Ansible jobs  https://review.openstack.org/50894417:07
pabelangermordred: fungi: clarkb: I also propose we consider a force merge of https://review.openstack.org/508929/ to bypass the load issues in the gate. That will bring online 2 new zuul-executors17:08
fungipabelanger: i concur, sounds good17:08
fungion both fronts17:08
fungii can merge 50892917:08
pabelangergreat, ty17:08
inc0I'll remove legacy job definition when we're done with migrations17:08
openstackgerritboden proposed openstack-infra/openstack-zuul-jobs master: Add neutron-lib as required to legacy-tempest-dsvm-neutron-src  https://review.openstack.org/50894517:08
clarkbpabelanger: sounds like a plan17:09
*** yamahata has joined #openstack-infra17:09
*** harlowja has joined #openstack-infra17:10
openstackgerritMerged openstack-infra/system-config master: Bring ze09.o.o and ze10.o.o online  https://review.openstack.org/50892917:11
fungipabelanger: ^ merged17:11
pabelangerfungi: thanks!17:11
pabelangerI'm preparing to stop ze01.o.o now17:11
*** claudiub|2 has joined #openstack-infra17:11
*** trown|lunch is now known as trown17:13
*** ykarel has joined #openstack-infra17:13
*** claudiub has quit IRC17:14
pabelangerokay, should be stopping now17:15
*** chlong has joined #openstack-infra17:15
SpamapSpabelanger: how long does that take? ;)17:17
pabelangerSpamapS: still running17:17
pabelangerunder zuulv2.5 it was much faster when we stopped17:17
pabelangeralmost instant17:18
*** bhavik1 has quit IRC17:18
*** yamahata has quit IRC17:18
SamYapleyea lets just rollback17:18
pabelangerpossible that post-runs still need to complete on an abort. Will look into logs in a moment17:19
*** efried_afk is now known as efried17:19
*** gouthamr has quit IRC17:21
SpamapSdurn.. psutil.virtual_memory().percent is kinda lame17:21
SpamapSincludes buffers/cache17:21
pabelangeralmost done...17:22
*** slaweq_ has joined #openstack-infra17:22
SpamapSpabelanger: it's also possible that killing 155*3 processes takes a while17:22
pabelangerand stopped :D17:22
pabelangerSpamapS: ya, post-run was the reason17:22
SpamapSactually 155*5 .. ssh agents.. ssh bin17:22
pabelangerit was uploading logs and stuff17:22
pabelangerwhich is a little different than v2.517:23
SpamapSah rsync didn't obey?17:23
pabelangereither way, making note and going to start ze0117:23
*** thorst has joined #openstack-infra17:23
pabelangerand ze01.o.o started17:24
* SpamapS hoping load governor 'splodes less than it helps ;)17:24
*** Guest10594 has quit IRC17:27
openstackgerritMichal Jastrzebski (inc0) proposed openstack-infra/openstack-zuul-jobs master: Removal of kolla legacy jobs  https://review.openstack.org/50895017:27
openstackgerritMichal Jastrzebski (inc0) proposed openstack-infra/project-config master: Remove Kolla and Kolla-Ansible jobs  https://review.openstack.org/50894417:27
openstackgerritMonty Taylor proposed openstack-infra/project-config master: Use publish-to-pypi and friends for python releasing  https://review.openstack.org/50895117:27
mordredinfra-root: ^^ that patch should fix all of the issues with python tarball jobs and pypi release jobs17:28
pabelangerSpamapS: mordred: we only have 13 ansible-playbook processes currently. Looking to see why now on ze0117:29
SpamapSpabelanger: load?17:29
SpamapSalso you do have to wait for stuff to be triggered/retried ;)17:29
pabelanger38 currently17:29
AJaegermordred: +471, -13006 lines? WOW!17:29
*** chlong has quit IRC17:30
*** baoli_ has joined #openstack-infra17:30
SpamapSpabelanger: are these 8cpu flavors?17:30
pabelangerSpamapS: climbing now, 5017:30
pabelangeri don't think I gave it enough time17:30
SpamapSpabelanger: also it takes 30s to notice high load :-P17:30
mordredAJaeger: right? nice patch right?17:31
*** chlong has joined #openstack-infra17:31
*** tosky has quit IRC17:31
jdandreaReceived a -1 from Zuul in reviewing Change 508933. "legacy-irc-meetings-tox-ical TIMED_OUT in 35m 37s" - should I wait or is there something I need to do/change?17:31
SpamapSpabelanger: there's an INFO log line that it will print when it unregisters17:31
AJaegermordred: do a few more of these and maybe our memory problems disappear ;)17:31
*** baoli has quit IRC17:31
pabelangerSpamapS: ya, looking for that now17:32
mordredAJaeger: :)17:32
pabelanger2017-10-02 17:28:56,232 INFO zuul.ExecutorServer: Unregistering due to high system load 29.44 > 20.017:32
pabelangerpossible we want to up that from 2017:32
SpamapSpabelanger: \o/17:33
*** e0ne has joined #openstack-infra17:33
pabelangerlet me check what we have on zuul-launchers17:33
*** slaweq_ has quit IRC17:33
SpamapSpabelanger: it's ncpu*2.5 ;)17:33
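A rough sketch of the load-average cap just quoted, using the 2.5-per-CPU multiplier mentioned above; the structure and names are illustrative rather than Zuul's actual code:

    # Refuse new jobs while the 1-minute load average exceeds ncpu * 2.5.
    import logging
    import multiprocessing
    import os

    log = logging.getLogger("example.load_governor")

    MAX_LOAD = multiprocessing.cpu_count() * 2.5


    def accepting_work():
        """Return True if the 1-minute load average is under the cap."""
        load = os.getloadavg()[0]
        if load > MAX_LOAD:
            log.info("Unregistering due to high system load %.2f > %.1f",
                     load, MAX_LOAD)
            return False
        return True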
*** sshnaidm is now known as sshnaidm|afk17:33
pabelangerwith a limit of 53 ansible-playbooks, that is down from 150 concurrent ansible-playbook processes on zuulv2.517:33
pabelangerbut, let me first roll out the change to all zuul-executors, and bring online ze09.o.o and ze10.o.o17:34
AJaegermordred: you added an empty initial line ;(17:34
pabelangerthen evaluate17:34
mordredAJaeger: booo17:34
mordredAJaeger: want me to fix? or just send in a followup?17:35
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: DNM: Fix branch matching logic  https://review.openstack.org/50895517:35
*** dizquierdo has quit IRC17:35
AJaegermordred: the job is still in the queue, fix might be nice. But followup works as well...17:36
pabelangerze02.o.o stopping17:36
dansmithclarkb: I put something on that etherpad you linked me to a bit ago. Should I put my name next to it if I reported it? Can't tell if those are locks or authors on the other ones17:36
dhill__ hi guys17:37
dhill__what can make addresses be {} when we do a "nova list --debug" but we clearly see the neutron ports are still there?17:37
dhill__if we try to detach the port, it says it's being used17:37
dhill__this happened after the last newton update17:37
dhill__older VMs almost all have addresses {} if they have a given net/subnet with ipv4 and ipv617:37
dhill__but we can create new vms in that same network17:37
AJaegerdhill__: #openstack is the general channel for OpenStack related questions.17:37
SpamapSpabelanger: if jobs queue, jobs queue. That's life.17:37
AJaegerdhill__: This channel is about the infrastructure the project runs on17:37
AJaegerdhill__: please ask again on #openstack17:38
openstackgerritMonty Taylor proposed openstack-infra/project-config master: Use publish-to-pypi and friends for python releasing  https://review.openstack.org/50895117:38
SpamapSdo we graph gear status somewhere?17:38
chandankumarAJaeger: regarding this review https://review.openstack.org/#/c/506669/ should I add these jobs to openstack-zuul-jobs till neutron moves their legacy zuul v3 jobs under the neutron repo?17:38
dhill__AJaeger, oh sorry17:38
pabelangerSpamapS: yup, i figure it will take a few days to even everything out17:38
mordredAJaeger: I went ahead and updated the patch17:39
AJaegerchandankumar: this is all new to us ;) Why not start adding those directly to neutron repo - and then neutron can move the rest later...17:39
chandankumarAJaeger: ok17:39
pabelangerze02.o.o started; ze03.o.o stopping17:39
mordredfrickler: I think we should roll forward with 508822 as the approach for neutron jobs - but I think we can learn from 508785 and just make variants in the project-template rather than full new jobs17:40
AJaegerfrickler: are you still around and want to give it a try?17:40
*** lnxnut has joined #openstack-infra17:40
SpamapSso the next thing to watch is this http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=64005&rra_id=all <-- swap on ze0117:41
SpamapSlooks like it's staying low so that's good17:41
clarkbdansmith: names there are people willing to work to fix the problem. Feel free to volunteer or just put your name down as potentially having more info17:41
AJaegermordred: +2A on your change17:42
dansmithclarkb: ack17:42
pabelangerze03.o.o started; ze04.o.o stopping17:43
*** andreww has joined #openstack-infra17:44
*** andreww has quit IRC17:44
AJaegerfrickler: I'll update your change now following mordred's suggestion17:44
*** dtantsur is now known as dtantsur|afk17:44
*** ekcs has joined #openstack-infra17:44
*** xarses has quit IRC17:45
mordredfrickler, AJaeger: updating frickler's change real quick17:45
mordredAJaeger: oh - I'm already on it ...17:45
openstackgerritMonty Taylor proposed openstack-infra/openstack-zuul-jobs master: Add tox jobs including neutron repo  https://review.openstack.org/50882217:45
AJaegermordred: you're too quick ;)17:46
*** andreww has joined #openstack-infra17:46
mordredAJaeger: I'm going to make a project-config patch to make that change to all openstack/networking-* repos17:46
AJaegermordred: so, one change instead of many? OK!17:48
AJaegermordred: once you're done, I'll abandon the others and add links17:48
*** ralonsoh has quit IRC17:50
*** camunoz has joined #openstack-infra17:50
mordredAJaeger: yah - I've got a script locally I can edit for doing some of these big edits - so when there's a decent pattern it's fairly easy17:51
*** shardy has quit IRC17:52
AJaegerconvenient ;)17:52
*** yamahata has joined #openstack-infra17:53
AJaegermordred: fungi likes the plan as well - he just +2A 508822. Now we have to wait for merging...17:53
*** sambetts is now known as sambetts|afk17:54
*** lnxnut has quit IRC17:55
*** tesseract has quit IRC17:55
*** electrofelix has quit IRC17:56
mordredpabelanger: we should make ourselves a really good playbook for doing rolling restarts of ze0*17:56
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add memory awareness to system load governor  https://review.openstack.org/50896017:57
pabelangermordred: ya, we have something for zuul-launchers, shouldn't take much to update for zuul-executors17:57
*** gouthamr has joined #openstack-infra17:57
mordredpabelanger: maybe one that does a one-at-a-time version of 'stop ; no-really-stop ; run-puppet ; start'17:57
mordredpabelanger: also - there are times when service zuul-executor stop doesn't, you know, stop things - when things have calmed down - we should really sort that out too :)17:58
SpamapSmordred: and maybe a wait_for after start ... and serial: 117:58
mordredSpamapS: yes!17:58
pabelangermordred: yah17:58
mordredSpamapS: turns out we can wait_for the finger port to be open, since that's a thing the executors do when they start17:58
SpamapSso 508960 above is the memory governor. I'm throwing it up to have it run through tests. I'll also test it on my local zuul box.17:58
SpamapSmordred: nice17:58
*** tosky has joined #openstack-infra17:59
mnaserhave there been any issues @ 137.74.26.164 ?18:00
mnaseri dont see anything in buffer but i see this sort of failure - fatal: [centos-7]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to remote host \"137.74.26.164\". Make sure this host can be reached over ssh", "unreachable": true}18:00
*** thorst_ has joined #openstack-infra18:01
pabelangerHmm, ze07 has 3 zuul-executor processes again18:02
mnaser(another similar case if this helps)18:02
mnaserUNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to remote host \"15.184.66.218\". Make sure this host can be reached over ssh", "unreachable": true}18:02
pabelangermnaser: we are just stopping / starting executors, so possible it is related18:02
mnaserok cool18:02
mnaserfigured as much18:02
SpamapSso, ze01 I see has dropped way below 20.018:03
pabelangermordred: ze07.o.o might be in a weird state. Did you want to look at it before I consider killing processes?18:03
AJaegerargh, https://review.openstack.org/#/c/508822/2 got "Unknown configuration error" ;(18:03
*** Swami has joined #openstack-infra18:03
*** thorst has quit IRC18:03
pabelangerSpamapS: ya, down to 36 ansible-playbook processes18:03
*** jpena|away is now known as jpena|off18:04
*** bnemec has quit IRC18:04
SpamapSmemory usage is still pretty high, but without the swapping18:04
*** nikhil has joined #openstack-infra18:04
sc`frickler kicked off a job for chef, and the integration gate said 'aborted' from ze06. is this relevant or meaningful?18:06
jlko/18:06
mordredinfra-root, AJaeger, SpamapS, dmsimard: we have issues with tox flake8 envs in a bunch of infra repos - project-config is a good example ...18:06
jlkany fires I can help with?18:06
*** dklyle is now known as david-lyle18:06
mordredjlk: we've made an etherpad ...18:06
*** kzaitsev_pi has quit IRC18:07
mordredjlk: https://etherpad.openstack.org/p/zuulv3-issues18:07
jlkword.18:07
*** kzaitsev_pi has joined #openstack-infra18:07
AJaegerjlk, all translation jobs are broken - and I saw that you started working on them earlier. Those are more camp fires ;) I'd like to discuss whether to continue your changes instead of fixing the existing ones. But have a look whether there are forest fires first ;)18:07
jlkah yeah, it was... difficult if not impossible to test those translation jobs before merging at the time.18:08
mordredinfra-root, AJaeger, SpamapS, dmsimard, jlk: they are failing open - one of the issues is that we have a 'select = H231' which means ONLY run that test - but also when I try to run locally flake8 seems to actually run on nothing, while if I run pyflakes directly it works as expected18:08
jlkmight want to continue that work18:08
AJaegerjlk: I added this to etherpad18:08
* mordred is bringing this up in channel because it's possible there is a deeper systemic issue with flake8 right now and our general usage in tox in openstack that might be worth figuring out18:09
clarkbSpamapS: pabelanger so initial indications are that it is working?18:09
mordredAJaeger, jlk: I think switching translations to the new jobs is the route to go - the old auto-converted jobs aren't going to be able to run properly no matter what we do18:09
SpamapSclarkb: it certainly stopped accepting jobs. I don't know that it has re-started as they completed.18:09
mordredSpamapS: \o/18:10
dmsimardmordred: I guess I'm missing context, the linters job for project-config is passing right now18:10
pabelangerclarkb: SpamapS: Load has been limited, seems our cap is 20.0 right now18:10
mordreddmsimard: right - it's not actually doing anything18:10
mordreddmsimard: for reasons I do not understand18:10
mordreddmsimard: so it's not ACTUALLY running flake8 on the files we think it is, and it's not showing errors when they exist18:10
pabelangerjust finishing up with ze07.o.o, need to kill processes then all zuul-executors running the code18:10
pabelangerand ze09.o.o and ze10.o.o are also online18:10
mordreddmsimard: I can verify that just running flake8 in the project-config repo when there are files that very much should be failing - it just exits 018:11
*** slaweq_ has joined #openstack-infra18:12
clarkbmordred: SpamapS before I end up in completely the wrong direction debugging this disk issue, are the executors enforcing job disk limits? and if they are would those limits be enforced as disk out of space errors?18:12
dmsimardmordred: but that's not something that was changed due to v3, right ? I mean, it's just tox18:12
SpamapSclarkb: there's a thread that runs du and will stop jobs that use too much disk18:14
*** slaweq_ has quit IRC18:14
clarkbSpamapS: do you know off the top of your head if that du will manifest errors in this way http://logs.openstack.org/28/501128/6/gate/build-openstack-sphinx-docs/4945316/job-output.txt.gz#_2017-10-02_15_13_41_965612 I am thinking not, as that appears to be the write system call exploding18:15
*** hemna__ has joined #openstack-infra18:15
SpamapSclarkb: no18:15
*** thorst_ has quit IRC18:15
SpamapSthat probably wrote faster than the disk accountant could kill it18:15
SpamapSwould expect that to have caused widespread explosion on full disk18:16
SpamapSthough maybe it immediately unlinked and released the space18:16
mnaserSpamapS clarkb fwiw i saw this happen once on a debian image18:16
clarkbya I think bwrap likely cleaned it up quickly if it actually did fill the disk18:16
mnaserfurther investigation showed that the growroot stuff didn't actually run18:16
openstackgerritMonty Taylor proposed openstack-infra/openstack-zuul-jobs master: Add tox jobs including neutron repo  https://review.openstack.org/50882218:16
SpamapSit COULD be in the tmpfs18:17
mnaserso the partition was super tiny18:17
clarkbmnaser: I think this was on our zuul executors, but good to know we might have that problem on the test nodes18:17
SpamapSI forget where tmpfs's get mounted18:17
SpamapSmnaser: this was on localhost18:17
mnaserah okay my bad18:17
openstackgerritMonty Taylor proposed openstack-infra/project-config master: Use the openstack-python-jobs-neutron templates  https://review.openstack.org/50896118:17
mordredAJaeger: ^^ there's the other side of that ozj patch18:17
*** hemna_ has quit IRC18:18
AJaegermordred: great, thanks!18:19
openstackgerritJeremy Stanley proposed openstack-infra/system-config master: Set ZooKeeper purge_interval for on nodepool.o.o  https://review.openstack.org/50896218:20
jeblairinfra-root: i've got another debug routine going that may impact performance; please don't restart zuul until i give the all-clear, even if it's seemingly dead (the info i get may be valuable)18:20
clarkbat least on ze04 I am seeing lower disk consumption since the restart, which makes sense because fewer jobs running18:20
pabelangerack18:20
fungiroger jeblair!18:20
pabelangerclarkb: yah18:20
clarkbjeblair: noted18:20
fungiShrews: https://review.openstack.org/508962 should hopefully address the nodepool disk filling up18:20
mordredjeblair: roger18:21
Shrewsfungi: spectacular18:22
clarkbfungi: Shrews I've confirmed that the version of zk on nodepool.o.o is new enough for that feature18:22
fungiclarkb: thanks for double-checking (dpkg told me 3.4.518:23
fungi)18:23
*** jbadiapa_ has joined #openstack-infra18:23
AJaegermordred: gave -1 on all related changes and linked to the new one.18:23
clarkbyup and 3.4 added it. I would approve but someone already beat me to it :)18:23
SpamapSare the logs from executors somewhere searchable? Kibana?18:23
*** rhallisey has quit IRC18:23
clarkbSpamapS: they are not18:24
mordreddhellmann, fungi, smcginnis: ok - I thought putting the new requirements job in the requirements repo straightaway was a good idea, but we have other jobs we need to fix there before we can land patches - so I'm going to add it to openstack-zuul-jobs and we can circle back and do a rename/move dance later18:24
dhellmannrequirements or release?18:24
mordreddhellmann: requirements - sorry, I'm pinging the wrong people :)18:24
fungiprometheanfire: ^18:25
clarkbSpamapS: is there something specific you'd like to see? I can work to collect data18:25
*** ykarel has quit IRC18:25
*** jbadiapa has quit IRC18:25
AJaegerfungi, do you want to +2A https://review.openstack.org/#/c/508822/ again, please?18:25
fungiwow, what happened to patchset 3 there?18:26
fungidid gerrit lose it?18:26
clarkbpossible db incrementation issue?18:27
AJaegerfungi, mordred : https://review.openstack.org/508822 still errors - no need to +2A18:27
prometheanfirefungi: ?18:27
fungiprometheanfire: mordred's comments about requirements jobs18:27
prometheanfirek18:27
*** claudiub|2 has quit IRC18:27
prometheanfireya, we've been broken since the switch I think :|18:27
openstackgerritMonty Taylor proposed openstack/os-client-config master: DNM Testing that new releasenotes job works  https://review.openstack.org/50896518:29
mordredprometheanfire: fixes coming up for you soon18:29
prometheanfireyep, thanks18:29
mordreddhellmann, smcginnis: ^^ https://review.openstack.org/508965 should verify the new releasenotes build jobs work18:29
dhellmannmordred : until we can release reno, those jobs are going to continue to fail.18:30
*** e0ne has quit IRC18:30
dhellmannI was watching job status for patches merging into the releases repo at http://zuulv3.openstack.org but now it's blank. Is that related to other changes?18:30
SpamapSclarkb: got distracted.. but the disk accountant should log when it stops jobs18:30
mordreddhellmann: ah - good point - well - the patch for fixing pypi jobs is up and has been approved: https://review.openstack.org/#/c/508951/218:30
mordreddhellmann: are there any other issues you are aware of affecting releasing reno?18:31
AJaegerregarding release notes, let's merge https://review.openstack.org/#/c/508742 and https://review.openstack.org/#/c/508763 , please18:31
dmsimardpabelanger: for https://review.openstack.org/#/c/508940/ .. would it not have been the same amount of effort to just add the required-project ?18:31
dhellmannmordred : we had weird post failures in the jobs associated with the releases repo; let me find a link18:31
mordredAJaeger: are you comfortable landing those without seeing a job pass with them first? I guess we have to be becaue we need the reno fix ...18:32
dhellmannmordred : http://logs.openstack.org/22/2293d561bf37b71b14a7b89e2ada1a5552fc2168/release-post/tag-releases/7a41b08/18:32
pabelangerdmsimard: possible, though there is more than 1 project to add. If you want to propose a fix, that would be great18:32
dmsimardpabelanger: ok, I'll do it18:32
dhellmannmordred : there doesn't appear to be a "run" step in that job18:32
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: DNM: Fix branch matching logic  https://review.openstack.org/50895518:32
AJaegermordred: if we really can test them, let's do it...18:32
mordreddhellmann: I'm sorry - too many balls in the air - which job is missing a run step?18:33
AJaegermordred: ah, I see...18:33
mordreddhellmann: ah - I can read scrollback better now18:33
dhellmannsmcginnis : were you looking into the release job failures?18:33
dhellmannI thought someone else was, so I didn't...18:33
pabelangerdhellmann: have a log handy?18:34
dhellmannpabelanger : http://logs.openstack.org/22/2293d561bf37b71b14a7b89e2ada1a5552fc2168/release-post/tag-releases/7a41b08/18:34
pabelangerdhellmann: oh, that is fixed18:34
pabelanger1 sec18:34
pabelangerdhellmann: mordred: https://review.openstack.org/508871/ for tag-releases job18:35
pabelangershould be able to try again18:35
pabelangermaking note to bubble up error messages from debug.log18:35
dhellmannok. it looks like the bot that reports when changes merge isn't reporting in #openstack-release18:35
*** vsaienk0 has joined #openstack-infra18:36
*** yamamoto has quit IRC18:37
jlkmordred: I'm fixing some pep8 issues with playbooks/files/project-requirements-change.py in openstack/requirements. This may be the first time py3 pep8 has been run on it, so some stuff outside your change needs updating. Fixing.18:37
*** wolverineav has joined #openstack-infra18:39
fungidhellmann: what merged change wasn't reported to the channel? i see openstackgerrit mention 508913 merging at 16:5318:39
dhellmannoh, maybe I had a client blip18:39
pabelangerI think I dropped before posting18:41
pabelangerdmsimard: "No package matching 'emacs' is available" - do you know why? http://logs.openstack.org/40/508940/1/check/base-integration-ubuntu-xenial/3e2f551/ara/result/01e9f809-b683-4470-8d25-3f91ab56b765/18:41
dmsimardpabelanger: there is only this one required-project for the logstash job.18:41
dmsimardpabelanger: should I send another patchset on top of yours so we don't end up doing 3 commits ?18:41
dhellmannfungi : ignore that, I need new glasses18:41
dmsimardpabelanger: that's the configure-mirror integration test18:41
dmsimardpabelanger: hm, there should be a "clear cache" part somewhere18:42
SpamapSclarkb:     log = logging.getLogger("zuul.ExecutorDiskAccountant")18:42
SpamapSclarkb: that logger should show anything done by the accountant.18:43
jlkGAH I fucked that up18:43
SpamapSbut a job has to be over limits for a while18:43
dmsimardpabelanger: there's a notify update apt cache but it doesn't look like it ran18:43
clarkbSpamapS: thanks18:43
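A rough sketch of the du-based accounting described above (a thread that periodically measures each job's work directory and flags the ones over their limit); the paths, names, and callback are examples, not Zuul's implementation:

    # Periodically run du over job work dirs and report any over the limit.
    import logging
    import os
    import subprocess
    import time

    log = logging.getLogger("example.DiskAccountant")


    def du_mb(path):
        """Return disk usage of path in megabytes using du."""
        out = subprocess.check_output(['du', '-sm', path])
        return int(out.split()[0])


    def accountant_loop(jobs_root, limit_mb, over_limit_cb, interval=60):
        """Check every job dir under jobs_root and call back on offenders."""
        while True:
            for job_dir in os.listdir(jobs_root):
                path = os.path.join(jobs_root, job_dir)
                used = du_mb(path)
                if used > limit_mb:
                    log.info("Job dir %s is over limit: %d MB > %d MB",
                             path, used, limit_mb)
                    over_limit_cb(path)
            time.sleep(interval)


    # Example: run it in a background thread, e.g.:
    #   import threading
    #   threading.Thread(target=accountant_loop,
    #                    args=('/var/lib/zuul/builds', 5000, print),
    #                    daemon=True).start()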
pabelangerdmsimard: why would that be?18:43
pabelangerHmm, that is ze09.o.o18:44
pabelangerwhich is a new executor18:44
dmsimardpabelanger: trying to see, hang on18:44
clarkbhttp://logs.openstack.org/92/489492/14/gate/openstack-tox-pep8/36e3dcd/job-output.txt.gz#_2017-10-02_18_39_16_680481 maybe we should shut off infracloud while we are debugging?18:44
clarkbat least in nodepool18:45
*** chlong has quit IRC18:45
pabelangerdmsimard: fwiw, haven't used handlers much for running things like apt-get update. Personally prefer just using a task to handle that18:45
dmsimardpabelanger: it's for idempotency18:45
*** vsaienk0 has quit IRC18:46
*** thorst has joined #openstack-infra18:46
*** thorst has quit IRC18:46
*** chlong has joined #openstack-infra18:46
pabelangerdmsimard: sure, but not really applicable for jobs, since we run once and delete the node18:47
dmsimardpabelanger: fair, but I write things to be idempotent by default to keep best practices and all18:48
dmsimardpabelanger: so the configure repo step didn't return a 'changed' status and that's why the handler didn't fire18:48
*** thorst has joined #openstack-infra18:48
dmsimardThat's weird. I don't see why the repositories would have already been set up and the template task would return ok18:49
openstackgerritMonty Taylor proposed openstack-infra/openstack-zuul-jobs master: Replace legacy-requirements with new requirements-check  https://review.openstack.org/50889818:49
clarkbinfra-root ya I'm seeing a lot of failures to ssh to infracloud18:50
clarkbI'll push up a change to set max-servers to 018:50
pabelangerclarkb: confirmed, I am seeing it too18:50
clarkbpabelanger: SpamapS general observation of the zuul status page is looking much better though18:50
mordredjlk: oh - you updated https://review.openstack.org/#/c/508891 - let's see if your latest update works - if it does land it - if not, we can land https://review.openstack.org/50889818:51
mordredAJaeger: ^^18:51
jlkyeah I'm fixing it right18:51
jlkmordred: I changed the wrong one. so I'm changing the beginning of the stack now and rebasing18:51
jlksorry for the noise :(18:51
SpamapSyay18:51
*** baoli_ has quit IRC18:51
mordredjlk: no worries! thanks for the help18:52
dmsimardpabelanger: looking at a xenial job that isn't in base-integration, the template task for setting up the mirror properly returns 'changed': http://logs.openstack.org/04/508704/2/check/legacy-ara-integration-py35-latest/c29d413/ara/reports/664af4eb-7bd3-4453-8c0b-2efccca3379b.html18:52
jeblairzuul is swapping now and has slowed to a crawl.  my query still hasn't returned.  i have no idea how long it might take.  should we cut losses and restart now?18:52
pabelangerdmsimard: right, i think it is nice to have.  Like I said, I haven't used handlers much myself for things like update apt. FWIW: we likely don't need to do that anymore either, since we added python-apt to our images.  We might be able to have the package / apt task do it directly18:52
*** lnxnut has joined #openstack-infra18:52
mordredjeblair: what query were you doing?18:53
openstackgerritClark Boylan proposed openstack-infra/project-config master: Disable infracloud  https://review.openstack.org/50896918:53
pabelangerclarkb: yah, executors seems happier now18:53
dmsimardpabelanger: that particular handler is not really to install python-apt, it's to make sure the image doesn't run with stale apt cache after modifying the apt repos18:53
fungijeblair: would being able to run it immediately/soon following a restart increase the chances of getting data back? or would the data you get back likely be less useful?18:53
jeblairmordred: an objgraph path search.  there are 59 more layout objects than i expect; i'm trying to find their reference path.18:54
*** rcernin has joined #openstack-infra18:54
mordredjeblair: also - I'm on board with restarting the scheduler if you are- also ok to wait longer to see how it goes18:54
mordredjeblair: nod18:54
dmsimardpabelanger: but anyway, I confirm that the handler works as expected outside that job, so maybe there's something specific to the base-integration job.18:54
*** baoli has joined #openstack-infra18:54
mordredjeblair: fwiw - over the weekend I observed swap usage go back down after spiking - so it's not impossible for it to come back - but it also may take longer than is reasonable18:54
jeblairfungi: i'd need it to at least have one memory bump.  they don't seem to take long to happen when we're busy.18:55
jeblairmordred: yeah, i even noted a reduction in layout objects earlier18:55
fungijeblair: so maybe restart, give it a bit, then try to query when it's not yet in danger of halting and catching fire?18:55
jeblairfungi: yeah, i think that's how i'm leaning18:56
fungiwfm18:56
jeblairunfortunately, it's so dead i can't grab a copy of the queues18:56
pabelangerdmsimard: well, for the sake of keeping things simple, can we change it to a task instead of a notify handler? It is preventing jobs in openstack-zuul-jobs from passing18:56
*** slaweq_ has joined #openstack-infra18:56
fungijeblair: and no way to abort the query to help it regain enough oomph to grab a copy of the pipelines?18:57
pabelangerthis sounds like what happened this morning too18:57
*** ociuhandu has quit IRC18:57
jeblairfungi: i think the swapping is what's killing it18:57
fungicollateral damage i guess18:57
pabelanger+1 to what jeblair thinks is best18:57
*** lukebrowning has quit IRC18:58
jeblairi mean, the query is probably causing some swapping...18:58
*** kjackal_ has quit IRC18:58
jeblairoh hey i got a queue list18:59
fungiwoo!18:59
*** dave-mccowan has joined #openstack-infra18:59
*** SumitNaiksatam has quit IRC19:00
dmsimardpabelanger: I'd like to understand why the mirror configuration task did not return 'changed' first19:00
jeblairit stopped actively swapping19:00
*** SumitNaiksatam has joined #openstack-infra19:00
dmsimardpabelanger: it implies that the mirrors had already been configured which should not happen19:00
*** gtmanfred has joined #openstack-infra19:00
dmsimardpabelanger: that job has been passing for as long as I can remember so I want to know if something new has been introduced19:01
jeblairfungi, mordred: what do you think?  leave it running for a bit more since it stopped swapping (for now), or take advantage of the fact we have a queue dump and restart?19:02
fungijeblair: i'm leaning toward the restart anyway, since we may have lucked out in getting queue dumps19:02
mordredjeblair: it's a tough call - if you think you can still get valid data after the restart - then I say restart with the queue dump19:02
jeblairokay, will restart now19:03
fungiand hopefully having it under less memory/io pressure will allow you to more quickly get back debugging details19:03
*** e0ne has joined #openstack-infra19:03
*** e0ne has quit IRC19:03
jeblairare there any scheduler fixes we need in place?19:03
pabelangerdmsimard: that is fine, but honestly i don't want to spend too much time on making things idempotent for configure-mirror tasks. I'd rather just lay down files and always apt-get update and move on to running jobs.19:03
jeblair(i'm running from a personal branch with extra debugging)19:04
mordredjeblair: your patches for yappi and stack dumping are the only relevant patches outstanding19:04
jeblairk, i'll use the same version as before19:05
jeblairit's starting up19:05
clarkbmordred: jeblair one sec19:05
clarkboh neverming19:05
clarkb(its not super urgent)19:05
*** lnxnut has quit IRC19:05
jeblairclarkb: ok. i can always restart  :)19:05
clarkbmordred: was going to ask where the BUILD_TIMEOUT env injection change ended up. Was that in zuul?19:05
pabelangerwe should maybe send out a status notice for people not in channel19:05
clarkbbecause ironic apparently has some tests with that not working19:05
mordredclarkb: yes19:05
jeblairclarkb: should be executor though19:05
clarkbjeblair: ah ok so different daemon, got it19:05
mordredclarkb: it's in zuul/ansible/filter/zuul_filters.py19:06
jeblairamusingly, our 3 components are almost never running the same version :)19:06
mordredjeblair: :)19:06
fungipabelanger: i guess we can, though that's probably not a huge upset given the continual disruption we've been inflicting over the past 4-5 days so far19:06
clarkbmordred: in what situations is the timeout not set? eg could the ironic jobs be failing because they don't have a timeout set?19:06
mordredclarkb: well - BUILD_TIMEOUT itself is just set based on zuul.timeout19:07
dmsimardzuulv3.o.o is empty ?19:07
jlkrestarting19:07
mordredclarkb: which comes from the job's config19:07
dmsimardah19:07
clarkbmordred: https://review.openstack.org/#/c/508882 <- that is their proposed fix19:07
clarkbmordred: which I think likely misses the point if we are fixing it that way19:07
jeblairokay, running re-enqueues now19:07
mordredclarkb: do we have an example of one that failed?19:08
jeblairi'm going to grab lunch while memory use expands19:08
pabelangerthis will be a good test for zuul-executors and load19:08
dmsimardthe polling fix is in ?19:09
clarkbmordred: not that I've seen (just going through our list and trying to distill what we can push through at this point)19:09
clarkbmordred: so maybe we defer on that19:09
mordredclarkb: http://logs.openstack.org/32/507632/4/check/legacy-grenade-dsvm-ironic-multinode-multitenant/62cb405/zuul-info/inventory.yaml19:09
SpamapSone thing I find interesting, with the 20.0 cap on 1-minute load, 5 minute load average seems to be staying down around 3.5 - 4.0 or so.  http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=63999&rra_id=all19:09
mordredclarkb: if you look at the bottom, you can see timeout: 1080019:09
*** chlong_ has joined #openstack-infra19:09
mordredclarkb: so there IS a timeout set in that job, so I'd expect the environment passed to the shell tasks to include BUILD_TIMEOUT=1080000019:10
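A hedged sketch of the mapping being described, where the job's zuul.timeout (in seconds) is exposed to legacy playbooks as a BUILD_TIMEOUT value in milliseconds; the real filter in zuul/ansible/filter/zuul_filters.py may differ in detail:

    def zuul_legacy_vars(zuul):
        """Sketch: derive legacy environment values from the zuul job vars."""
        params = {}
        timeout = zuul.get('timeout')
        if timeout is not None:
            # e.g. timeout: 10800 -> BUILD_TIMEOUT=10800000, matching the
            # inventory linked above
            params['BUILD_TIMEOUT'] = str(int(timeout) * 1000)
        return params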
pabelangerSpamapS: I've noticed that too, but want to see what happens now that zuulv3.o.o was restarted19:10
SpamapSbut if you set that next to the memory graph, I think the reason for this is that we have more available RAM for caching/buffers, so it's an exponential improvement http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=64003&rra_id=all19:10
pabelangerSpamapS: ze01.o.o is up to 10.38 currently19:10
clarkbSpamapS: ya fewer tires to spin19:10
openstackgerritDavid Moreau Simard proposed openstack-infra/openstack-zuul-jobs master: Set legacy-logstash-filters-ubuntu-trusty non-voting  https://review.openstack.org/50894019:10
openstackgerritDavid Moreau Simard proposed openstack-infra/openstack-zuul-jobs master: Add required-projects to legacy-logstash-filters jobs  https://review.openstack.org/50897119:10
openstackgerritDavid Moreau Simard proposed openstack-infra/openstack-zuul-jobs master: Revert "Set legacy-logstash-filters-ubuntu-trusty non-voting"  https://review.openstack.org/50897219:10
*** lukebrowning has joined #openstack-infra19:11
dmsimardpabelanger: doh, mistakenly rebased your patch19:11
dmsimardsorry :/19:11
mordredSpamapS: ++ ... turns out there are times when the way to improve both throughput and concurrency is to reduce concurrency19:11
pabelanger100 ansible-playbook processes on ze01 currently, load of 10.4119:12
SpamapSwould graphite have the throughput of jobs per-executor?19:12
pabelangerit should, IIRC19:12
pabelangerbut we need to land firewall change first19:12
pabelanger50887619:12
clarkbpabelanger: please add it to the list (also I've got a list of changes that should be good to review now if that one is ready for review)19:13
clarkball on the one etherpad19:13
pabelangerclarkb: it's approved, trying to land fixes to system-config first. See 50894019:13
SpamapSI see nothing in graphite unfortunately19:14
jdandreaAm I doing anything wrong? A lot of timeouts/limits/failures. Or something else might not be working ATM? https://review.openstack.org/#/c/50892419:14
pabelangerSpamapS: ya, firewall19:14
SpamapSohwell ;)19:14
*** e0ne has joined #openstack-infra19:14
clarkbpabelanger: ah ok 508940 had a failure but should get reenqueued so hopefully it gets in and then we can make progress19:14
jlkjdandrea: we just restarted zuul, so there may have been some oddness19:15
jdandreajlk Ah, got it, thx. Will wait a bit.19:15
dmsimardmordred: do you happen to have an answer for my question in https://review.openstack.org/#/c/507558/1/zuul.d/jobs.yaml@15 ?19:15
*** lukebrowning has quit IRC19:15
pabelangerclarkb: ya, there was a failure in integration-configure-mirror job too. dmsimard is looking into why19:15
mordreddmsimard: probably?19:15
AJaegerdmsimard: can't we merge https://review.openstack.org/#/c/508971/1 without the non-voting change?19:16
pabelangerbut, ya. we should do 507558 :)19:16
dmsimardAJaeger: yeah 508971 should address the issue with the job but pabelanger submitted a non-voting change first19:16
dmsimard508971 should have already merged but it hasn't (flap in a job) and zuul restarts etc19:17
dmsimarder, I meant 508940 should have already merged19:17
mordreddmsimard: I do, in fact19:17
pabelangerdmsimard: at this rate, I'd just rebase 971 to master and I'll abandon mine19:17
dmsimardpabelanger: ok let's do that19:18
* AJaeger agrees with pabelanger19:18
openstackgerritDavid Moreau Simard proposed openstack-infra/openstack-zuul-jobs master: Add required-projects to legacy-logstash-filters jobs  https://review.openstack.org/50897119:18
dmsimardrebased ^19:18
dmsimardmordred: it's possible to filter on files in different repos ?19:18
SpamapSFYI memory governor works for me in my private zuul here19:18
AJaegerpabelanger: I abandoned yours19:18
pabelangerAJaeger: thanks19:18
mordreddmsimard: yes19:19
AJaegerpabelanger: do you want to +2A https://review.openstack.org/#/c/508971 later?19:19
smcginnisdhellmann: Sorry, was gone for a bit, but yes, I was. The patch Paul linked to I thought would resolve the issues.19:19
dmsimardmordred: how? the path is not relative to the project19:19
dmsimard... or is it ?19:19
mordreddmsimard: it's relative to the root of the proposed patch19:19
dmsimardpabelanger: if you have time, I'd like to troubleshoot https://review.openstack.org/#/c/504238/ for the finger zuul-stream-functional finger://ze05.openstack.org/40c5ebd0467545ad9fa37fac6fc12e2819:20
dmsimardpabelanger: it's the truncated json issue we looked at during the ptg.19:20
mordreddmsimard: so if base-integration is added to a pipeline for zuul-jobs, then whether patches proposed to zuul-jobs match the files in the files: section will control whether that patch should run the integration job or not19:20
*** lukebrowning has joined #openstack-infra19:20
mordreddmsimard: but if a given patch doesn't have ^roles/configure-mirrors in it, whether because it's a patch to openstack-zuul-jobs so can't have that file, or it's a patch to zuul-jobs and doesn't touch that file, then that file matcher won't match19:21
pabelangerdmsimard: -1 on 50897119:21
mordreddmsimard: does that make sense? (I may have said too many or not enough words)19:21
*** lukebrowning has quit IRC19:22
dmsimardpabelanger: you're linking something from legacy-openstackci-beaker-ubuntu-trusty19:22
dmsimardpabelanger: that's not logstash filters ??19:22
*** lukebrowning has joined #openstack-infra19:22
pabelangerlooking19:22
clarkbSpamapS: pabelanger it's not a direct mapping to job throughput but 245 nodes in use right now19:22
clarkb191 not in use19:23
dmsimardmordred: yeah that makes sense, I thought we needed to have a relative path from o-z-j to z-j19:23
dmsimardmordred: your explanation makes more sense, i.e., relative to the project triggering the job19:23
pabelangerdmsimard: the legacy-openstackci-beaker-ubuntu-trusty is what we need to fix in system-config, I am not sure where the logstash-filters job is coming from19:24
pabelangerI've lost track currently19:24
dmsimardpabelanger: that's the one you made non-voting lol19:24
dmsimardpabelanger: we can fix that one too, hang on, I'll submit another patch19:24
pabelangerHmm, checking19:24
dmsimardhttps://review.openstack.org/#/c/508940/ Set legacy-logstash-filters-ubuntu-trusty non-voting19:25
pabelangerdmsimard: yup, i did it wrong to start with :(19:25
dmsimardsending another patchset, hang on19:25
mordredpabelanger, dmsimard: so ... BOTH of those jobs should be fixed :)19:25
dmsimardmordred: yeah ++19:25
clarkbnow 286 in use and 201 not in use19:25
pabelangermordred: ya, i think I crossed the wires19:25
mordredpabelanger: there are many wires - easy to cross them ;)19:26
*** e0ne has quit IRC19:26
clarkbload on a spot check of servers including nodepool.o.o and nl01/2 lgtm19:26
mordreddmsimard: I believe you should be able to just add it to legacy-infra-puppet-apply-base yeah?19:26
*** e0ne has joined #openstack-infra19:27
SpamapSclarkb: with 10 executors?19:27
mordredclarkb: that's good - so the incoming executor load management seems to be doing its job19:27
clarkbSpamapS: ya19:27
*** hashar has joined #openstack-infra19:27
*** e0ne has quit IRC19:27
SpamapSclarkb: and geard queue length?19:27
mordredclarkb: oh - you were talking about launchers19:27
clarkbmordred: I think so, obviously we need to watch it more and see if we are running enough jobs (but considering it doesn't sit at load of 20 the whole time I expect we are)19:27
SpamapS(you should be able to run the status admin command without blowing up your terminal)19:27
*** e0ne has joined #openstack-infra19:27
SpamapSsince it will just have like, 6 queues19:27
*** devananda has quit IRC19:28
* clarkb looks at geard19:28
SpamapSwell, 16, 1 for each executor's cancels19:28
*** e0ne has quit IRC19:28
SpamapSalso lol, I inverted the logic on the memory governor.. fixing19:28
pabelangerclarkb: fungi: mordred: I'd not object if we wanted to force merge: https://review.openstack.org/508876/ which opens firewalls for zuulv3.o.o to graphite.o.o19:28
pabelangerthen we'd be able to get better counts on online nodes19:28
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add memory awareness to system load governor  https://review.openstack.org/50896019:28
openstackgerritDavid Moreau Simard proposed openstack-infra/openstack-zuul-jobs master: Add required-projects to logstash-filters and openstackci-beaker jobs  https://review.openstack.org/50897119:28
*** e0ne has joined #openstack-infra19:29
mordreddmsimard: lgtm19:29
clarkbSpamapS: geard closed the connection on me when I asked for status, guessing because we use ssl auth now?19:29
*** e0ne has quit IRC19:29
clarkbis there an easy way to connect? guessing with openssl sclient?19:29
*** e0ne has joined #openstack-infra19:30
*** devananda has joined #openstack-infra19:30
*** e0ne has quit IRC19:30
smcginnisWas there a restart or something? https://review.openstack.org/#/c/508867/1 was approved, but I don't see it in zuul3.o.o.19:30
clarkbpabelanger: I'm ok with it as operating blind is annoying19:31
fungipabelanger: yeah, i agree that getting information on the health of the system is a high enough priority; i'm willing to bypass ci on that patch given the triviality of it19:31
clarkbsmcginnis: there was, if it wasn't processed yet then it wouldn't have been readded to the queues, you can just approve it again19:31
*** e0ne has joined #openstack-infra19:31
smcginnisclarkb: ack, thanks19:31
mordredpabelanger: I'm also ok merging it19:31
*** e0ne has quit IRC19:32
SpamapSclarkb: oh yeah, you can connect with s_client19:32
mordredsmcginnis: you may want to hold off on that patch until https://review.openstack.org/#/c/508951/ has landed19:32
openstackgerritMerged openstack-infra/system-config master: Add zuulv3.o.o to graphite.o.o  https://review.openstack.org/50887619:32
fungipabelanger: ^19:33
pabelangerfungi: danke!19:33
*** e0ne has joined #openstack-infra19:33
*** jascott1 has quit IRC19:33
*** e0ne has quit IRC19:33
smcginnismordred: Hard to keep track of what to wait for. :)19:33
clarkbSpamapS: `openssl s_client -cert $wherever_zuul_keep_it -connect localhost:4730` look right to you?19:34
*** jascott1 has joined #openstack-infra19:34
fungiclarkb: i think you'll probably need to tell s_client where your private key is too so it will be able to receive19:35
mordredsmcginnis: yah - I know - sorry about that...19:35
mordredsmcginnis: also - while I'm looking- the legacy-releases-python35 should have just been a normal tox python35 job right?19:35
SpamapSclarkb: yep19:36
SpamapSoh yeah the key19:36
smcginnismordred: I think probably. Let me take a closer look.19:36
SpamapS2017-10-02 12:31:03,760 INFO zuul.ExecutorServer: Unregistering due to low avail memory 3.700000000000003% < 5.019:37
SpamapSperhaps I should format that float...19:37
mnasernah19:37
mnaserwe're precise in openstack world19:37
*** yamamoto has joined #openstack-infra19:38
SpamapSdamn right!19:38
jlkprecise AF19:38
fungimnaser: just not terribly accurate? ;)19:38
clarkbSpamapS: http://paste.openstack.org/show/622497/ the two statuses are about 30 seconds apart19:38
mnaserthere's a pun about floats to be made but19:38
SpamapSprecise to the quintillionth19:38
mnaserill leave it for someone else to find out19:38
jlkwe all Float down here19:38
smcginnismordred: I think you are correct that that job could just be the normal py35 job. But maybe dhellmann and others with more history can confirm.19:39
dhellmannI don't think there's anything special about that job19:39
SpamapSclarkb: ok, so executor:execute is the one that gets governed. Interesting that there are 10 registered in both cases.19:39
clarkb`sudo openssl s_client -connect localhost:4730 -cert /etc/zuul/ssl/client.pem -key /etc/zuul/ssl/client.key -CAfile /etc/zuul/ssl/ca.pem` is the command btw19:39
SpamapSactually IIRC that also counts active jobs, not waiting workers19:40
SpamapSclarkb: that's a handy command19:40
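A sketch of pulling the same status programmatically, assuming geard speaks the standard gearman text admin protocol over its SSL port (lines of "function<TAB>total<TAB>running<TAB>available_workers" terminated by a lone "."); the host, port, and certificate paths are the examples used above, not canonical:

    import socket
    import ssl


    def gear_status(host='localhost', port=4730,
                    cert='/etc/zuul/ssl/client.pem',
                    key='/etc/zuul/ssl/client.key',
                    ca='/etc/zuul/ssl/ca.pem'):
        """Send the admin 'status' command and parse the reply."""
        ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca)
        ctx.check_hostname = False
        ctx.load_cert_chain(cert, key)
        with socket.create_connection((host, port)) as sock:
            with ctx.wrap_socket(sock) as ssock:
                ssock.sendall(b'status\n')
                data = b''
                # The response ends with a line containing only "."
                while not data.endswith(b'.\n'):
                    chunk = ssock.recv(4096)
                    if not chunk:
                        break
                    data += chunk
        stats = {}
        for line in data.decode().splitlines():
            if line == '.':
                break
            name, total, running, workers = line.split('\t')
            stats[name] = (int(total), int(running), int(workers))
        return stats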
*** thorst has quit IRC19:41
mordreddhellmann, smcginnis: ok - I'm going to fix that - and also a couple of other things for the openstack/releases repo19:41
clarkbSpamapS: now it's 369 369 1019:41
clarkbSpamapS: so it's not going off in either direction quickly (but maybe that is a good thing)19:41
dhellmannmordred : ok, thanks19:41
smcginnismordred: Thank you!19:42
SpamapSclarkb: should be fine to let it go up during busy periods. Just means jobs will sit queued for a while.19:43
SpamapSclarkb: but if we're not utilizing nodes.. that's concerning.19:44
*** yamamoto has quit IRC19:44
SpamapSprobably can raise the load limit up. Maybe go to 24.0 by raising the multiplier to 3.019:44
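
Roughly the check being discussed (the executor stops registering for new work when the one-minute load average crosses a multiplier times the CPU count). This is only an illustrative sketch, not zuul's actual governor code; the multiplier is the knob SpamapS means by 3.0.

    # Illustrative sketch of the load governor check being tuned here -- not
    # zuul's actual implementation. On an 8-core executor a multiplier of 3.0
    # gives the 24.0 ceiling mentioned above.
    import multiprocessing
    import os

    LOAD_MULTIPLIER = 2.5

    def should_accept_work():
        one_minute_load = os.getloadavg()[0]
        max_load = LOAD_MULTIPLIER * multiprocessing.cpu_count()
        return one_minute_load < max_load
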
clarkbSpamapS: ya let me check current node usage19:46
fungiSpamapS: clarkb: i recommend also passing -quiet to s_client if you're not debugging ssl/tls negotiation19:46
clarkb493 in use and 129 not in use19:47
clarkbso thats about 2/3 our capacity I think19:47
sdaguereleasenotes not working is a known issue at this point - http://logs.openstack.org/22/490722/11/gate/legacy-releasenotes/c83671e/job-output.txt.gz#_2017-10-02_19_22_20_821466 ?19:47
openstackgerritMonty Taylor proposed openstack-infra/project-config master: Update releases repo to use openstack-python35-jobs template  https://review.openstack.org/50897819:47
clarkbsdague: it is, mordred is working to fix it19:47
clarkbI think there are changes I should probably review related to that too19:47
fungiclarkb: you can also pipe commands into s_client, like so: `echo status|sudo openssl s_client -quiet -connect localhost:4730 -cert /etc/zuul/ssl/client.pem -key /etc/zuul/ssl/client.key -CAfile /etc/zuul/ssl/ca.pem 2>/dev/null`19:47
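
For reference, a minimal Python equivalent of that s_client status check, assuming the same cert/key/CA paths under /etc/zuul/ssl; gearman's admin protocol answers a bare "status" with tab-separated function/queued/running/workers lines, ending with a lone ".".

    #!/usr/bin/env python3
    # Sketch: ask the gearman server for its "status" over TLS, assuming the
    # client cert, key and CA live under /etc/zuul/ssl (adjust as needed).
    import socket
    import ssl

    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH,
                                         cafile='/etc/zuul/ssl/ca.pem')
    context.load_cert_chain('/etc/zuul/ssl/client.pem',
                            '/etc/zuul/ssl/client.key')
    context.check_hostname = False  # internal/self-signed CA

    with socket.create_connection(('localhost', 4730)) as sock:
        with context.wrap_socket(sock, server_hostname='localhost') as tls:
            tls.sendall(b'status\n')
            data = b''
            while not data.endswith(b'.\n'):  # listing ends with a lone "."
                chunk = tls.recv(4096)
                if not chunk:
                    break
                data += chunk
    print(data.decode())
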
sdagueok, trying to figure out what kinds of patches are safe to review, and which aren't19:48
clarkbfungi: nice19:48
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Fix branch matching logic  https://review.openstack.org/50895519:48
openstackgerritMerged openstack-infra/openstack-zuul-jobs master: Switch release-note-jobs project-template to use new jobs  https://review.openstack.org/50874219:48
clarkbsdague: I believe that nova and devstack should be generally safe except for use of release notes19:48
clarkbtempest I'm not sure19:49
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add memory awareness to system load governor  https://review.openstack.org/50896019:50
sdagueclarkb: ok, good enough. Knowing that nova should be good except for release notes is handy19:51
inc0can I get +3 on https://review.openstack.org/#/c/508944/ please? Will make everyone's life better19:51
openstackgerritAndreas Jaeger proposed openstack-infra/openstack-zuul-jobs master: Remove un-used legacy-releasenotes job  https://review.openstack.org/50876619:51
clarkbinc0: I think it's changes like that that create the memory explosion in zuul? mordred is that the case? I guess we can't really avoid them19:52
fungiclarkb: i don't think we _want_ to avoid them19:52
mnaserclarkb sdague https://review.openstack.org/#/c/508763 release note jobs should be fixed anytime now when that change merges19:53
inc0hmm, yeah it would validate *everything*19:53
clarkbfungi: well we might be picky about using those that fix bugs first?19:53
fungimaybe19:53
jdandreajlk Progress! One of the two changes passed. The other has two failures in the ubuntu-trusty department. https://review.openstack.org/#/c/508924/19:53
fungiclarkb: but also we might should get a few merged before jeblair returns from lunch so he has some memory explosions to analyze19:53
clarkbfungi: ha ok, I'll review mordreds for release notes and inc0's19:53
jlkjdandrea: was that a misfire?19:53
clarkboh fungi beat me to the mordred release notes fix19:54
mnaserclarkb that weirdly enough got the +2 from zuul 13 minutes ago now19:55
mnaserbut hasnt merged?19:55
fungiclarkb: also note i've added Shrews's 508955 fix for branch matching to the priority list on the etherpad. we should try to get that merged soon19:55
clarkbmnaser: zuul is processing the event queues I think19:55
jdandreajlk Oh this is interesting. "ERROR! /home/zuul/src/git.openstack.org/openstack-infra/project-config not found" http://logs.openstack.org/24/508924/1/check/legacy-openstackci-beaker-ubuntu-trusty/a1746d2/job-output.txt.gz19:55
clarkbmnaser: so need to wait for zuul to catch up then run those jobs19:55
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Fix branch matching logic  https://review.openstack.org/50895519:55
mnaserclarkb oh i see19:56
*** thorst has joined #openstack-infra19:56
AJaegerjdandrea: should be fixed by dmsimard's https://review.openstack.org/50897119:56
*** dhill__ has quit IRC19:57
*** dhill_ has joined #openstack-infra19:58
clarkbAJaeger: comment on 508706, mordred can you double check my comment there is accurate?19:58
AJaegerclarkb: I commented - that step is the complete retiring of repo, so it should get removed from the file, shouldn't it?20:00
clarkbAJaeger: oh I read it as how to migrate for some reason20:00
* clarkb rereads20:00
*** thorst has quit IRC20:00
clarkbah yup I'm wrong, sorry for the noise20:01
AJaegerclarkb: better safe than sorry - thanks for reviewing20:01
*** trown is now known as trown|brb20:01
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Simplify TestSchedulerBranchMatcher  https://review.openstack.org/50898020:01
*** lnxnut has joined #openstack-infra20:03
* AJaeger calls it a day and waves good night20:04
fungithanks for all the help AJaeger!20:05
fungihave a good evening20:05
jdandreaAJaeger Thanks for that, I'll give it a shot once that merges.20:06
*** gouthamr has quit IRC20:06
clarkbSpamapS: down to 202 202 10 now (I think because zuul is busy processing results queues and so not generating new jobs yet)20:06
openstackgerritMatthew Treinish proposed openstack-infra/openstack-zuul-jobs master: Add missing projects for openstackci beaker jobs  https://review.openstack.org/50898120:08
openstackgerritMatthew Treinish proposed openstack-infra/puppet-subunit2sql master: Ensure that build_names are unique per project  https://review.openstack.org/50825820:09
mtreinishclarkb, fungi: the beaker jobs on ^^^ are still failing, hopefully that'll fix it20:09
openstackgerritMonty Taylor proposed openstack-infra/project-config master: Create static publication base job  https://review.openstack.org/50898220:12
openstackgerritMonty Taylor proposed openstack-infra/project-config master: Add jobs for special static publication targets  https://review.openstack.org/50898320:12
mordredclarkb, pabelanger, fungi, dhellmann, smcginnis: ^^ I think that stack should take care of most of the things around releases repo20:13
*** dave-mcc_ has joined #openstack-infra20:13
dhellmannmordred : is the idea that eventually the stuff in https://review.openstack.org/#/c/508978/1/zuul.d/projects.yaml will move to an in-tree file?20:14
mordreddhellmann: yes - by and large. there will still be some things that will stay in there - like defining that openstack python projects run the openstack-python-jobs template, for instance20:15
dhellmannok20:15
mordreddhellmann: also things that need to share a change queue, like the integrated gate, need to keep definitions in project-config ...20:16
dhellmannsmcginnis and I were talking earlier about how to update the validation that we have in the releases repo to ensure that project repos have a release job attached. It sounds like we'll need to look in 2 places for a while20:16
mordreddhellmann: but for things that aren't those two cases, the rest should just migrate to in-repo20:16
*** dave-mccowan has quit IRC20:16
mordreddhellmann: yah - I believe that is true20:16
dhellmannk20:16
*** dave-mcc_ is now known as dave-mccowan20:17
dhellmannmordred : are all (or some?) of the variables used by https://review.openstack.org/#/c/508982/1/playbooks/publish/static.yaml defined somewhere?20:17
dhellmannfileserver.path, for example20:18
*** windmoon has joined #openstack-infra20:18
*** dprince has quit IRC20:18
mordreddhellmann: they are - in this particular case 'fileserver' is defined by the 'add-fileserver' role20:18
dhellmannok, so if I find the docs for that it will explain that it creates that variable?20:19
mordreddhellmann: oh - actually - no - that's totally a buggy patch from me :)20:19
dhellmannand I guess the "artifacts" directory use is somehow standard? or is that something made up for this job?20:19
mordreddhellmann: http://git.openstack.org/cgit/openstack-infra/zuul-jobs/tree/roles/add-fileserver is where the add-fileserver role is - but is also documented here: https://docs.openstack.org/infra/zuul-jobs/roles.html#role-add-fileserver20:20
*** windmoon has quit IRC20:20
dhellmannaha, ok, I think I came across that earlier and didn't recognize the significance20:20
mordreddhellmann: it's halfway in between - all of our fetch-docs roles put docs into artifacts/ on the executor20:21
dhellmannI thought those were inputs to the job, not outputs20:21
* mordred fixing patches20:21
*** lnxnut has quit IRC20:21
*** jascott1 has quit IRC20:22
*** jascott1 has joined #openstack-infra20:23
*** gouthamr has joined #openstack-infra20:25
*** jascott1 has quit IRC20:25
openstackgerritMonty Taylor proposed openstack-infra/project-config master: Create static publication base job  https://review.openstack.org/50898220:26
*** jascott1 has joined #openstack-infra20:26
mordreddhellmann: ok - that ^^ might read a little better20:26
openstackgerritClark Boylan proposed openstack-infra/zuul feature/zuulv3: Fix Gearman UnknownJob handler  https://review.openstack.org/50899220:26
*** ccamacho has joined #openstack-infra20:27
openstackgerritMohammed Naser proposed openstack-infra/openstack-zuul-jobs master: Switch puppet unit tests base job  https://review.openstack.org/50899420:31
mnaser^ small one liner to change a base job to fix things temporarily till we move things over ^20:32
*** jungleboyj has joined #openstack-infra20:32
jungleboyjAfternoon oh Infra Gods.  We have a patch that can fix many of the problems we are seeing with Cinder but for some reason Zuul isn't picking it up.  https://review.openstack.org/#/c/508541/20:33
jungleboyjAny guidance you can provide?20:34
mnaserjungleboyj things are a bit slow in zuul land esp around reconfigs and what not20:34
mnaserexpect delays in things showing up in the queue20:35
clarkbjungleboyj: zuul is busy with "config generation and an object graph walk" according to #zuul20:35
jungleboyjmnaser:  Ok, even delays of an hour plus?20:35
mnaseryes, i've seen it just over an hour and it'll click and pop in20:35
fungi_though_ zuul has never voted on that change20:35
openstackgerritMonty Taylor proposed openstack-infra/project-config master: Add jobs for special static publication targets  https://review.openstack.org/50898320:35
fungiwill it actually act on an approval vote if there was never a verify +1 from zuul?20:35
smcginnisShould be a full new patch now.20:36
fungior does it need a recheck first?20:36
smcginnischeck and gate.20:36
fungioh. yep new patchset at 19:5520:36
*** Goneri has quit IRC20:36
fungiso right, it's probably just in the event queue still20:36
jungleboyjfungi:  Yep.  Did that to add the bug and try to give it a kick.20:36
jungleboyjWill just keep watching for it.20:37
*** thorst has joined #openstack-infra20:37
*** bnemec has joined #openstack-infra20:37
jeblairi think we should restart it again20:37
*** trown|brb is now known as trown20:37
fungijeblair: do you think the objgraph calls are killing it?20:37
jeblairfungi: likely a contributing factor this time at least20:38
fungiwe only got up around 5gib ram used since the last restart20:38
jeblairfungi: it's doing that and making a dynamic config at the same time.  that's been going on for about 40m now20:38
openstackgerritMonty Taylor proposed openstack-infra/openstack-zuul-jobs master: Add openstack-tox-validate job  https://review.openstack.org/50899620:39
fungiif you think we should restart it again, i won't argue. just don't want to make it harder for you to debug20:39
* mnaser would much rather have a slow moving gate today but useful information to solve things for the long term19:40
jeblairi'm not getting any useful info right now :|20:40
jeblairthe ones that returned before it got really slow provided no info :(20:40
fungishould we get Shrews' branch matcher fix pulled locally before restartnig?20:40
jeblairgood idea20:41
mnaserjeblair ouch, this is really difficult, i ran into this the other day - https://pypi.python.org/pypi/mem_top -- maybe could be interesting (if you haven't already run into it)?20:41
mnaserthe funny thing is they caught a gearman related memory leak as well with it :-P20:41
mnasersurprisingly close to home.. maybe if it has any usefulness, it can be added pre-restart?20:41
mnasermaybe after every dynamic reconfig or reconfig it can throw a logging.debug(mem_top()) ... just trying to throw some ideas that could be of help20:42
jeblairmnaser: that's like objgraph.show_most_common_types() or objgraph.show_growth() which i added to zuul's sigusr2 debug handler over the weekend20:42
*** esberglu has quit IRC20:43
mnaserjeblair ah i see20:43
jeblairmnaser: the things i'm doing now are asking objgraph to show me the reference chain to objects that i don't think should be in memory any more.  i'm still coming up empty.20:43
mnaserso technically according to objgraph the memory consumption is "normal" and everything has a refcount of non zero i guess?20:44
mnaserhow hard would it be to wire up a zuul instance or a test case that just reruns dynamic reconfig using openstack repos over and over again?20:45
jeblairmnaser: yeah, the 'graph' part of it is that it helps find where you have references to objects still sitting around20:45
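
The sort of objgraph spelunking being described, assuming the suspect objects are instances of a class named Layout (the class name here is just for illustration):

    # Sketch of the objgraph queries described above; "Layout" is assumed to
    # be the type of object that should have been freed.
    import objgraph

    objgraph.show_growth(limit=10)          # which types grew since last call?

    layouts = objgraph.by_type('Layout')    # instances still alive
    if layouts:
        # Walk back from one instance to a module-level reference, if any.
        chain = objgraph.find_backref_chain(layouts[0],
                                            objgraph.is_proper_module)
        for obj in chain:
            print(type(obj), repr(obj)[:80])
        # Or draw the back-reference graph for a few of them.
        objgraph.show_backrefs(layouts[:3], max_depth=4,
                               filename='layout-refs.png')
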
harlowjapython and dynamic debugging == the suck, lol20:45
openstackgerritMonty Taylor proposed openstack-infra/project-config master: Remove centrally defined jobs for openstack/releases  https://review.openstack.org/50899820:45
jeblairmnaser: i already tried doing this locally.  i have not figured out the trigger yet.20:45
Shrewsfungi: jeblair: my change does pass the TestScheduler suite of tests (and my newly added test), but i did not verify it against the entire test suite. so *fairly* confident it fixes more than it breaks20:45
jeblairShrews: any chance you can run the whole suite real quick?20:46
mnaserjeblair ouch, this is a real tricky one :(20:46
Shrewsjeblair: sure20:46
harlowjahttps://github.com/mgedmin/dozer#tracking-down-memory-leaks may also be fun20:46
jeblairwe all want to say it has something to do with configuration, but every simple theory i've examined hasn't held up based on logs.20:46
harlowjaif anyone wants to try that too20:46
Shrewsjeblair: running now20:46
mordreddhellmann, smcginnis: https://review.openstack.org/508998 is at the end of a depends-on chain that should, best I can tell, have your repo completely migrated20:46
Shrewsugh, i have no mysql20:47
mnaserjeblair i'd love to give a hand, if there are some instructions/docs on how to wire up a zuul instance locally (connected to openstack repos, etc), i'd gladly throw a hand20:47
jeblairit probably does, but maybe it's more like "a syntax error in dynamic reconfiguration for re-enqueueing a change that failed once already after a tenant reconfig" or something crazy like that.  :)20:47
SpamapSThere's likely a piece of the old config that gets carried into the new one.20:48
jeblairso i'm still trying to extract "what's happening in production" out of this20:48
SpamapSOr into running events20:48
*** jkilpatr has quit IRC20:48
SpamapSbut that's going to take a ton of digging to find20:48
openstackgerritMonty Taylor proposed openstack-infra/openstack-zuul-jobs master: Remove legacy static- and releases- jobs  https://review.openstack.org/50899920:48
mnaserwell the only thing i guess we've all noticed is the memory usage goes up when zuul is stuck and (from what i understand) it seems to be in a state of dynamic reconfig during that memory growth20:49
openstackgerritBrian Haley proposed openstack-infra/devstack-gate master: Support an IPv6 underlay network  https://review.openstack.org/34304120:49
SpamapSRight, so it's building the new tree, and not letting go of the old one. Possibly even making copies of objects for transformation purposes that get kept around deep in some object.20:50
*** hashar has quit IRC20:50
jeblairon sunday zuul performed 100 dynamic reconfigs from 0900 to 1520 with no increase in memory usage.20:50
SpamapSThat's why I suspect it may be tied to ongoing events.20:50
funginot necessarily. it's been "stuck" for the last 45-ish minutes doing a dynamic reconfiguration and ram is holding fairly steady20:50
SpamapSAs in, a piece of config gets carried along forward with a long running event20:50
jeblairSpamapS: yep.  that's what i'm trying to find20:51
SpamapSAnd until the job finishes, no free for you20:51
clarkbjeblair: that is an interesting statistic (I didn't know that)20:51
jeblairso earlier, there were 77 layouts in memory, but i could only find 19 of them by trawling buildsets.  so what's holding on to the other 58?20:52
mnaserwould it be hard to extract logs of # of events queued by type and then graph that on top of the memory usage graphs we have?20:52
jeblairthat was the question i was trying to answer before the last restart.20:52
jeblairand "what was holding on to the other 7" was the question i was trying to answer this time.  :)20:52
mordredjeblair: ++20:52
mordredjeblair: out of curiosity - do you know what the resident memory size is after constructing the initial config but before any event processing or reconfigurations?20:52
SpamapSjeblair: maybe we should hack in a uuid for each config generation (or maybe just "all the shas hashed") and then you could look for that string in all the other objects.20:52
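
A tiny sketch of the "all the shas hashed" fingerprint idea, purely illustrative: tag each config generation with a short hash so stray copies are easy to grep for in object dumps.

    # Illustrative only: derive a short, stable tag for a config generation
    # from the set of project shas that produced it.
    import hashlib

    def config_fingerprint(project_shas):
        """project_shas: mapping of project name -> commit sha."""
        digest = hashlib.sha256()
        for name in sorted(project_shas):
            digest.update(('%s:%s\n' % (name, project_shas[name])).encode())
        return digest.hexdigest()[:12]
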
jeblairmordred: not yet; it's on my list to find out20:53
mnaserif the logs aren't sensitive and include when an event is queued, i would volunteer to try and graph out the different types of events queued over the memory usage recorded (if this is beneficial information?)20:54
*** caphrim007 has quit IRC20:54
SpamapSjeblair: oh also, have you checked that the layouts hanging around don't run into any of the gc cycle fails?20:54
jeblairSpamapS: no; can you elaborate?20:54
*** caphrim007 has joined #openstack-infra20:54
SpamapSI recall python3.4+ have most of them resolved, but a few scenarios still cause self-referential objects to be un-gc'able20:54
SpamapSjeblair: I will go get some backing info.20:55
Shrewsjeblair: test_scheduler.py and test_v3.py all pass. not sure which of the tests are requiring mysql, so i didn't run the entire suite. if we have time to wait, i can  install the db and get it setup for testing20:55
*** esberglu has joined #openstack-infra20:55
jeblairSpamapS: ugh.  i know we have a couple of cycles, but am expecting them to be okay.  if that's not something we can assume that may be worth checking.20:55
*** hichihara has joined #openstack-infra20:55
SpamapSjeblair: Yeah I remember seeing a pycon talk specifically about the ones that are left and how it's kind of hard to get into that situation. But.. we're REALLY good at getting into nigh-impossible situations ;)20:56
dhellmannjeblair, SpamapS : it's pretty hard in 3.5 to have objects that can't be collected. even cycles seem to need a reference from the outside, now.20:56
dhellmannI had to do a lot of work to rewrite that section of pymotw for the gc module.20:56
SpamapSdhellmann: zomg you are the right person to discuss this with. My swiss cheese brain can't remember the specific situations that can still get you into an un-collectable but seemingly unreferenced object.20:56
SpamapSThis was like, 2014 so.. ugh20:57
jeblairSpamapS, dhellmann: is there a way to check?  like, if i have a reference to an object i think should be collected, can i ask the gc about it?20:57
dhellmannhttps://pymotw.com/3/gc/#finding-references-to-objects-that-cannot-be-collected20:57
*** ijw has quit IRC20:58
SpamapSThere you go20:58
dhellmannthere are a couple of different scenarios on that page for showing what the gc has that might be interesting if you're debugging a memory leak20:58
SpamapSalso isn't there a way to basically tell python to try harder?20:58
jeblairShrews: i see failures for tests.unit.test_change_matcher.TestBranchMatcher.test_matches_returns_true_on_matching_ref and tests.unit.test_change_matcher.TestBranchMatcher.test_matches_returns_false_for_missing_attrs20:58
*** mat128 has quit IRC20:59
dhellmannthe python 2 version of that had a lot more cases, but I couldn't get all of them to "work" in the same way for 320:59
SpamapSgc.set_threshold() or something?20:59
jeblairShrews: they may just be unit tests that need updates20:59
*** trown is now known as trown|outtypewww20:59
Shrewsjeblair: looking20:59
dhellmannSpamapS : yeah, scroll down the page a bit to the next section and it shows how to use that20:59
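
A sketch of the gc-module checks that pymotw section walks through: DEBUG_SAVEALL parks whatever the collector found unreachable in gc.garbage instead of freeing it, and get_referrers answers "what is still pointing at this?".

    import gc

    # Keep everything the collector deems unreachable so we can inspect it.
    gc.set_debug(gc.DEBUG_SAVEALL)
    found = gc.collect()
    print('collector found %d unreachable objects' % found)
    for obj in gc.garbage[:20]:
        print(type(obj), repr(obj)[:80])

    # For a specific object you think should already be gone:
    suspect = gc.garbage[0] if gc.garbage else None
    if suspect is not None:
        for referrer in gc.get_referrers(suspect):
            print(type(referrer))

    # And the "tell python to try harder" knob: collect more aggressively by
    # lowering the generation thresholds (the default is roughly (700, 10, 10)).
    print(gc.get_threshold())
    gc.set_threshold(100, 5, 5)
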
*** dave-mccowan has quit IRC21:01
*** srobert_ has quit IRC21:01
*** rockyg has joined #openstack-infra21:01
SpamapSdhellmann: thanks for the assist.21:01
SpamapSI have to run do non-infra things for a bit21:01
dhellmannSpamapS, jeblair : I have to drop off, but I hope that's helpful21:01
jeblairdhellmann: thanks21:02
jeblairlooks like zuul has resumed work21:04
fungiis the objgraph dump still in progress i guess?21:05
jeblairyes21:05
jeblairi think it may be finished now21:06
fungiooh!21:06
mnasercan i get a very short and quick +W here for puppet unit tests? https://review.openstack.org/#/c/508994/ (sorry for asking so much lately :()21:08
clarkbmnaser: done21:09
mnaserclarkb tyvm21:09
* clarkb is waiting on those three project-config changes to merge21:10
*** dhill_ has quit IRC21:11
*** dhill_ has joined #openstack-infra21:11
clarkbso release notes should work now? and pypi releasing fix is on its way. Infracloud will be turned off until we can sort out its ssh problems21:12
mordredclarkb: releasenotes and pypi-releasing will work as soon as their patches land21:12
clarkbmordred: I think releasenotes landed21:12
clarkb93 nodes in use now21:12
mordredhttps://review.openstack.org/#/c/508951/ is publish-to-pypi21:12
clarkbmordred: ya that one is in the gate with test passed, just needs to merge21:13
mordredwoot21:13
*** jkilpatr has joined #openstack-infra21:13
mordredclarkb: https://review.openstack.org/#/c/508763 and https://review.openstack.org/#/c/508769 apply them21:13
mordredthe new releasenotes jobs21:13
*** ccamacho has quit IRC21:14
Shrewsjeblair: ok, apparently change.ref can be None in the tests. i'll add that check back in21:14
mordredclarkb: releasenotes jobs themselves need a new reno release, which needs the publish-to-pypi fixes landed21:14
clarkbmordred: 8763 may need to be re-enqueued to the gate? it says ready to submit but didn't actually get submitted21:14
jeblairShrews: i think it may never be none anymore; maybe double check that and alter the tests?21:14
*** edmondsw has quit IRC21:14
inc0https://review.openstack.org/#/c/508661/ <- these two jobs just hung on some random line...is it possibly related to zuul memory explosion?21:15
jeblairShrews: (this is why i don't like unit tests)21:15
mordredclarkb: let's wait til the pypi jobs thing lands21:15
mordredclarkb: since the jobs will be bork either way without new reno21:15
mordredclarkb: then once new reno is cut - we can either retrigger or just click the merge button (since it did actually get its +2)21:16
openstackgerritMerged openstack-infra/project-config master: Disable infracloud  https://review.openstack.org/50896921:16
openstackgerritMerged openstack-infra/project-config master: Use publish-to-pypi and friends for python releasing  https://review.openstack.org/50895121:16
mordred\o/21:16
clarkbmordred: oh its because its based on a non current patchset21:16
Shrewsjeblair: well the test is _explicitly_ checking what happens when ref is None (test_matches_returns_false_for_missing_attrs), so maybe delete the test? i try to never do that, but maybe it makes sense here?21:16
clarkbmordred: so you'll need to rebase those two I think21:16
mordredclarkb: ah. that. kk21:16
openstackgerritMonty Taylor proposed openstack-infra/project-config master: Switch jobs to use new release notes job  https://review.openstack.org/50876321:16
openstackgerritMonty Taylor proposed openstack-infra/project-config master: Collapse releasenotes jobs to using project template  https://review.openstack.org/50876921:16
jeblairShrews: yes, i think that's the correct thing to do if there are no subclasses of Ref without a ref21:16
mordredclarkb: rebased21:16
clarkbmordred: should I reapprove? the +2's carried over21:18
jlvillalSo we keep trying rechecks on this patch to openstack-infra/openstack-zuul-jobs and the POST_FAILURE message seems to jump between jobs between rechecks: https://review.openstack.org/#/c/50888221:18
mordredclarkb: yah - might as well - it isn't going to break WORSE than it is right now21:18
*** ldnunes has quit IRC21:18
*** lnxnut has joined #openstack-infra21:19
jlvillalIs the POST_FAILURE a known issue?21:19
jeblairShrews: those are the only test failures for your change21:19
mordredjlvillal: oh - clarkb and I were looking at that patch and now I forget where we got21:19
clarkbjlvillal: backing up a step I think that is the wrong fix for the problem21:19
Shrewsjeblair: ack21:19
clarkbmordred: I think we need to check that BUILD_TIMEOUT is actually in the build env21:19
jlvillalmordred: Thanks21:19
Shrewsthx21:19
jlvillalclarkb: Okay. We aren't picky on how it is fixed. We would just like it fixed :)21:19
*** chlong_ has quit IRC21:20
*** chlong has quit IRC21:20
clarkbjlvillal: basically BUILD_TIMEOUT should already be there21:20
clarkbjlvillal: do you have an example log where a job failed because it wasn't?21:20
* jlvillal looks21:20
clarkbjlvillal: and yes as far as the POST_FAILURES go we are now throttling zuul executors based on load so they should be better at a whole bunch of things including servicing those ssh connections21:21
jlvillalclarkb: Here is a failed job: https://review.openstack.org/#/c/505837/521:22
mordredjlvillal: the legacy-grenade-dsvm-ironic one?21:22
jlvillalhttp://logs.openstack.org/37/505837/5/check/legacy-grenade-dsvm-ironic/c666daf/job-output.txt.gz#_2017-10-02_10_45_58_17471921:22
jlvillalmordred: yes21:22
clarkbjlvillal: that is a change :) is http://logs.openstack.org/37/505837/5/check/legacy-grenade-dsvm-ironic/c666daf/ the right log? ya ok, that one, thanks21:22
smcginnismordred: Should we be OK on releases? Or are there other patches still outstanding?21:23
mordredk. so - http://logs.openstack.org/37/505837/5/check/legacy-grenade-dsvm-ironic/c666daf/zuul-info/inventory.yaml shows zuul.timeout to be 1080021:23
smcginnismordred: Wasn't sure if we should get this through first: https://review.openstack.org/#/c/508997/21:24
mordreddmsimard: you don't magically collect environment variables passed to shell tasks in ara do you?21:25
clarkbmordred: jlvillal declare -x DEVSTACK_GATE_TIMEOUT="110" is in reproduce.sh for that job21:25
mordredsmcginnis: no - that's just cleaning thethings up - to my knowledge all of the patches needed for releasing things to work should be in place21:25
clarkbmordred: jlvillal the buffer is 10, so that would imply a BUILD_TIMEOUT of 120 (not 180 which is what inventory.yaml says)21:26
mordredclarkb: I see BUILD_TIMEOUT being set to 120 in shell output21:27
smcginnismordred: Excellent, thanks.21:27
clarkbmordred: in job-output?21:27
jlvillalclarkb: mordred: So we had this before: https://github.com/openstack-infra/project-config/blob/master/jenkins/jobs/ironic.yaml#L36-L3721:27
mordredhttp://logs.openstack.org/37/505837/5/check/legacy-grenade-dsvm-ironic/c666daf/job-output.txt.gz#_2017-10-02_09_08_09_42110821:27
clarkbjlvillal: those are independent, those apply to specific tests not the entire job21:27
mordredyes - I see those in the current output21:27
clarkbmordred: jlvillal I think those are unrelated21:28
mordredI agree21:28
clarkbyou can see at http://logs.openstack.org/37/505837/5/check/legacy-grenade-dsvm-ironic/c666daf/job-output.txt.gz#_2017-10-02_09_09_27_571148 that the job timeout is set properly at some point21:29
clarkbwell at least as a number and not a raw var name21:30
*** rloo has quit IRC21:30
clarkb(but that number isn't what we expect)21:30
jlvillalOkay, yeah I'm not sure where 104 comes from21:30
clarkbjlvillal: its 120 - 10 = 110, then that is ~6 minutes into the job so 10421:30
jlvillalOh, I think it subtracts how long the job has been running for up to this point21:30
clarkbyup21:30
clarkbok looking at the output again its totally set and timeout is doing what we expect21:31
*** lnxnut has quit IRC21:31
clarkbthe problem is the timeout value is too short21:31
mnaserjeblair just got another "Job base not defined" on https://review.openstack.org/#/c/508994/21:31
clarkbso exporting a raw BUILD_TIMEOUT is likely not going to change any of that21:31
jlvillalThe job starts at 9:03 and then dies at 10:45. That isn't 120 minutes.21:31
jlvillalIf the timeout is in minutes.21:31
mnaserShrews ^ maybe that helps as a hint of a possibly .. problem (I guess)21:31
mordredclarkb: so we should toss in an 'env' call at the top of one of the shell snippets to verify what's getting passed in to the job21:32
Shrewsmnaser: eh?21:32
clarkbjlvillal: correct its shorter21:32
mnaserShrews got a syntax error that job base is not defined ... maybe that might help hint towards memory / event issues21:32
clarkbjlvillal: because we intentionally subtract cleanup overhead from that number21:32
Shrewsmnaser: oh, maybe. i'm not tracking that issue atm  :)21:32
mnaserokay :>21:33
clarkbjlvillal: that is, however, ~104 minutes after the point where the job reports the timeout as 104 minutes21:33
clarkbjlvillal: so it is functioning properly in that regard21:33
*** rhallisey has joined #openstack-infra21:33
clarkbjlvillal: the problem here is that job should have a timeout of 180 minutes not 120 minutes21:33
*** jascott1 has quit IRC21:33
clarkbah ok the default BUILD_TIMEOUT is 12021:33
Shrewsjeblair: fyi, going to squash that test simplification change on the branch fix21:33
jlvillalclarkb: okay21:33
*** jascott1 has joined #openstack-infra21:33
*** eharney has quit IRC21:34
clarkbok I see the problem (maybe)21:35
*** jascott1 has quit IRC21:35
openstackgerritMonty Taylor proposed openstack-infra/openstack-zuul-jobs master: Print environment for legacy-grenade-dsvm-ironic  https://review.openstack.org/50900721:35
clarkbmordred: timeout is in seconds21:35
openstackgerritMonty Taylor proposed openstack-infra/openstack-zuul-jobs master: Print environment for legacy-grenade-dsvm-ironic  https://review.openstack.org/50900721:35
clarkbmordred: but it should be in milliseconds?21:35
mordredclarkb: the filter plugin does that21:35
mordred        params['BUILD_TIMEOUT'] = str(int(zuul['timeout']) * 1000)21:35
clarkbmordred: ah ok21:35
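
Putting the numbers from this thread together (illustrative arithmetic only, on the assumption that the exported BUILD_TIMEOUT never reached devstack-gate and its 120-minute default applied):

    # zuul.timeout from inventory.yaml, in seconds
    zuul_timeout_s = 10800
    # what the filter plugin line quoted above exports, in milliseconds
    build_timeout_ms = zuul_timeout_s * 1000
    print(build_timeout_ms // 1000 // 60)   # 180 minutes -- what was expected

    # what the job actually behaved like: devstack-gate's default of 120
    # minutes, minus the 10 minute cleanup buffer, minus the ~6 minutes
    # already spent before the timeout was computed
    print(120 - 10 - 6)                     # 104 -- the value seen in the log
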
mordredclarkb, jlvillal: ^^ if an ironic change adds a Depends-On to https://review.openstack.org/509007, we can see what is actually getting put into the environment21:36
clarkbthen ya printing the env is likely best next step21:36
mordredclarkb: honestly - putting an env in a step like that in the legacy-base pre playbook might not be a terrible idea while we're debugging things like that21:37
jlvillalmordred: clarkb: I can spin up a test ironic patch21:37
clarkbmordred: ya21:37
*** claudiub|2 has joined #openstack-infra21:37
*** jcoufal has quit IRC21:37
jlvillalmordred: So just make a dummy patch to Ironic and depend on the patch https://review.openstack.org/#/c/509007/  ?21:37
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: Fix branch matching logic  https://review.openstack.org/50895521:38
*** ijw_ has joined #openstack-infra21:38
*** ijw_ has quit IRC21:38
*** ijw has joined #openstack-infra21:38
openstackgerritMonty Taylor proposed openstack-infra/openstack-zuul-jobs master: Print environment in pre playbook of legacy-base job  https://review.openstack.org/50900921:38
mordredclarkb: there ya go ^^21:39
Shrewsjeblair: there ya go ^^^  all test_v3, test_scheduler, and test_change_matcher tests passing, and pep821:39
mordredjlvillal: yes please21:39
clarkbmordred: does that zuul_legacy_vars thing in the filter module get flattened?21:39
* Shrews squashes mordred's "there ya go" because he can21:39
jlvillalmordred: clarkb: Test patch is at: https://review.openstack.org/#/c/509010/21:39
*** jascott1 has joined #openstack-infra21:40
mordredShrews: lgtm21:41
openstackgerritMichal Jastrzebski (inc0) proposed openstack-infra/project-config master: Remove Kolla and Kolla-Ansible jobs  https://review.openstack.org/50894421:41
inc0can I get re +3 on ^ please?21:41
*** rhallisey has quit IRC21:42
inc0also mordred please check if the publisher jobs I removed are correct21:42
inc0https://review.openstack.org/#/c/508944/2..3/zuul.d/projects.yaml mordred21:43
*** askb has joined #openstack-infra21:44
inc0well looks good, hopefully I didn't break our release mechanism;)21:44
mordredinc0: yes- that's correct21:44
inc0thanks21:44
mordredinc0: we just landed  https://review.openstack.org/508951 - so it's possible you might have to rebase if git complains - but since you're removing the same thing the base patch is I think it should just work21:44
mtreinishfungi, clarkb, mordred, jeblair: do you know what the syntax error on: https://review.openstack.org/#/c/508981/ is coming from?21:45
mordredinc0: oh - wait - I grok what you're showing me now ...21:45
inc0well I just rebased21:45
mordredinc0: yes -that is the correct resolution of that rebase21:45
inc0cool, thanks21:45
mordredmtreinish: no I do not - and that is fairly disturbing :(21:46
jeblairmtreinish: not yet; mnaser pointed out that bug yesterday.  it's definitely a zuul bug and you should recheck.21:46
* jeblair adds to etherpad21:46
smcginnisSeeing a lot of POST_FAILURES now.21:47
eumel8just wondering, there are no more translation sync jobs running since last Friday: http://status.openstack.org/openstack-health/#/g/build_queue/periodic?groupKey=build_queue&searchJob=translation21:47
*** hichihara has quit IRC21:48
jlkthe translation jobs are currently broken, and we've got new jobs proposed to fix them21:49
jlkbut it'll be an iterative process.21:49
*** jascott1 has quit IRC21:50
*** Swami has quit IRC21:51
*** jascott1 has joined #openstack-infra21:51
eumel8ok, something where I can help?21:51
mordredjeblair: did you wind up getting data from the objdump?21:51
mordreds/objdump/objgraph/21:52
jeblairmordred: still nothing useful21:52
clarkbmordred: in environment: '{{ zuul | zuul_legacy_vars }}' what is zuul | doing?21:53
jlkeumel8: probably not yet. :(  There are https://review.openstack.org/#/c/502208 and https://review.openstack.org/#/c/502207 as the start, then later I think we'll have to update all the jobs to make use of them.21:53
mordredclarkb: | applies a jinja filter21:53
*** jascott1 has quit IRC21:53
mordredclarkb: so that's saying "apply the filter zuul_legacy_vars to the variable zuul"21:53
eumel8jlk: thx, will take a look21:54
clarkbmordred: gotcha, so that filter is basically returning just the bits we want in the environment from the large zuul dict21:56
jeblairmordred: well, let me rephrase that to 'no smoking gun yet'21:56
jlkclarkb: it takes the large zuul var, runs some python on it to transpose it the way we need it, and re-exposes it.21:56
jlkor a subset of it21:56
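
For anyone following along, this is the general shape of an ansible filter plugin like the one being described -- a simplified illustration, not the real zuul_legacy_vars implementation, and the variable names are examples:

    # Simplified illustration of a filter that flattens a subset of the big
    # "zuul" dict into the flat variables legacy jobs expect.
    def zuul_legacy_vars(zuul):
        params = {
            'ZUUL_PIPELINE': zuul.get('pipeline', ''),
            'ZUUL_BRANCH': zuul.get('branch', ''),
            'ZUUL_PROJECT': zuul.get('project', {}).get('name', ''),
        }
        if 'timeout' in zuul:
            # the line mordred quoted earlier: seconds -> milliseconds
            params['BUILD_TIMEOUT'] = str(int(zuul['timeout']) * 1000)
        return params

    class FilterModule(object):
        """Expose the filter so playbooks can write {{ zuul | zuul_legacy_vars }}."""
        def filters(self):
            return {'zuul_legacy_vars': zuul_legacy_vars}
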
*** jascott1 has joined #openstack-infra21:57
mtreinishjeblair: ok, will do22:02
*** jascott1 has quit IRC22:02
*** jascott1 has joined #openstack-infra22:02
*** rcernin has quit IRC22:02
*** rockyg has quit IRC22:05
*** baoli has quit IRC22:05
openstackgerritMonty Taylor proposed openstack-infra/openstack-zuul-jobs master: Add openstack-tox-validate job  https://review.openstack.org/50899622:06
jlkeumel8: is there a particular project that we could use as a test case to see if we can make the jobs work?22:06
openstackgerritMonty Taylor proposed openstack-infra/openstack-zuul-jobs master: Remove legacy static- and releases- jobs  https://review.openstack.org/50899922:06
*** jascott1 has quit IRC22:06
*** bobh_ has quit IRC22:07
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Move shadow layout to item  https://review.openstack.org/50901422:07
*** thiagolib has joined #openstack-infra22:09
mordredinfra-root: I +2'd ^^ but didn't +3 in case anyone else wanted to read/review22:11
ianwsorry, so much scrollback ... should i be debugging why a recheck on 508367 hasn't made it into the queue in ~15 minutes?22:11
smcginnisianw: I saw one that took over an hour.22:12
clarkbianw: its because zuul is very slowly doing dynamic reconfigurations22:12
eumel8jlk: searchlight-ui https://review.openstack.org/#/c/504768/ it's a small one with priority low22:12
clarkbyou'll notice the queues on the static page grow quite large then after time drop22:13
*** Swami has joined #openstack-infra22:13
clarkb(looks like they just dropped \o/ )22:13
jlkah22:13
*** xyang1 has quit IRC22:13
ianwyep and there it is22:13
*** ihrachys has quit IRC22:14
ianwok, i'm just going to try and keep an eye on etherpad etc and try and help if i can ... but i've been out for 3 days so mostly i'll just try to not make things worse :)22:15
jlkhrm, I wonder if we could make this job depend on the other job that is trying to add the translation jobs22:15
*** thorst has quit IRC22:15
jlkor if we just have to land the translation jobs and then iterate on them.22:16
*** thorst has joined #openstack-infra22:18
eumel8I'm not completely through 502208. there's a lot of stuff in it22:19
*** jascott1 has joined #openstack-infra22:20
jlkyeah, took a while to tear the old job apart to build the new ones22:20
*** thorst has quit IRC22:22
*** jascott1 has quit IRC22:22
eumel8but good to see someone is still working on it and the topic didn't get lost :)22:25
SpamapSinteresting.. kind of looks like the executors are still swapping a little, even with the reduced concurrency22:27
*** lnxnut has joined #openstack-infra22:28
eumel8zuul syntax error: Job base not defined. What could have happened after the recheck?22:29
jlkmordred: I don't really follow what's going on with the requirements-check job. Looks like it's first getting moved into openstack/requirements, and then moved into openstack-zuul-jobs22:30
jlkor is it different, you're first moving it to openstack-zuul-jobs, and THEN moving it to openstack-requirements?22:30
*** thorst has joined #openstack-infra22:31
clarkblooks like node_failure means a node request failed?22:32
*** ijw has quit IRC22:33
clarkbShrews: http://paste.openstack.org/show/622511/22:34
clarkbis that a known issue?22:34
*** thorst has quit IRC22:35
clarkb804 nodes are in use right now22:35
clarkbwhich is a significant chunk of our quota22:36
clarkbSpamapS: ^ we may not need to scale up the governor at all. I think the reconfigures have been slowing down job assignments more than anything else22:36
*** lbragstad has quit IRC22:36
Shrewsclarkb: first i've seen that.22:37
SpamapSclarkb: Looking at the system graphs, it looks like zuul executors are going to be memory bound more than anything.22:37
*** rlandy is now known as rlandy|brb22:37
*** jascott1 has joined #openstack-infra22:38
Shrewsclarkb: i did notice a few moments ago (before i started dinner) that nodepool had *many* outstanding requests, but also had many fulfilled requests that weren't going away quickly.22:38
SpamapSthe swapping we saw was the main source of load AFAICT. Tons of CPU available still right now, but not much room for more user memory usage when you take buffers and cache into account, which seem to be important given all the network and pipes Zuul does.22:38
*** gouthamr has quit IRC22:38
* Shrews returns to dinner22:38
*** lnxnut has quit IRC22:38
openstackgerritMerged openstack-infra/openstack-zuul-jobs master: Add required-projects to logstash-filters and openstackci-beaker jobs  https://review.openstack.org/50897122:40
clarkbup to 896 nodes in use according to nl0222:42
sc`huzzah. chef gates finally passed22:43
jeblairi've started to look into the 'base not defined' error because it's not far from where i've been focusing on memory usage and think that understanding it may help22:44
*** nicolasbock_ has quit IRC22:45
*** tpsilva has quit IRC22:47
eumel8jeblair: you can look into https://review.openstack.org/#/c/508857/222:47
jeblaireumel8: i know it is a zuul bug (not an actual error), and a recheck may clear it22:48
eumel8ok, I triggered the recheck. the last one took up to 75 minutes22:52
*** claudiub|2 has quit IRC22:52
*** rbrndt has quit IRC22:54
clarkbmordred: is pip install -e just completely broken if you don't have a remote in your git config? http://logs.openstack.org/92/489492/14/gate/legacy-tempest-dsvm-cells/ed2d586/job-output.txt.gz#_2017-10-02_22_38_37_663923 /me experiments locally22:56
*** apevec has quit IRC22:56
fungiclarkb: check pip freeze... that's the biggest difference that i could find22:56
fungiit doesn't show a git url, and instead includes a comment line about not finding any remote named origin22:57
openstackgerritMerged openstack-infra/infra-manual master: Update retire instructions for Zuul 3  https://review.openstack.org/50870622:58
mgagnemordred: remember that time I updated a network to be marked as "external" and everyone could see it? It looks like you could update the policy to make it not happen. Like you would need to mark it as external AND shared to be visible. (not just one or the other)23:00
mriedemhmm, looks like py35 jobs are running against newton, where it's not supported23:01
mriedemand grenade jobs are running on newton but there is no mitaka23:01
mriedemtonyb: ^23:01
clarkbmriedem: there is a fix for that in the gate right now https://review.openstack.org/50895523:01
mriedemthanks23:01
*** rlandy|brb is now known as rlandy23:02
*** andreww has quit IRC23:04
clarkbfungi: I see so tox is doing a freeze as part of its logging and hitting that23:05
*** mrunge_ has joined #openstack-infra23:05
*** mrunge has quit IRC23:07
openstackgerritMerged openstack-infra/openstack-zuul-jobs master: Add tox jobs including neutron repo  https://review.openstack.org/50882223:08
SpamapSSeems like 30s between load checks may be too long. the executor can gobble up a lot of jobs in 30s.23:08
SpamapSI notice ze01 spiked up to a load of 40 and started swapping again23:08
*** tosky has quit IRC23:09
SpamapSand cache went down to almost nothing23:09
fungiSpamapS: a big part of the problem is that the function we're checking returns a one-minute load average23:10
fungiso it can go from zero to crazy before the one-minute load average (plus up to a half-minute polling delay) notices23:11
funginot to mention, the imposed load may creep up once an accepted job is underway rather than immediately23:11
fungiso lots of fuzz there23:12
*** ijw has joined #openstack-infra23:14
*** hemna__ has quit IRC23:15
*** baoli has joined #openstack-infra23:16
SpamapSfungi: it seems to be holding the line despite its flaws... so I'll just say that this teal bike shed is good enough to keep the bikes safe for now.23:16
fungiSpamapS: yeah i think it's useful but based on observations so far the memory percentage governor may make even more sense23:17
SpamapSfungi: the two in tandem should prevent any major catastrophe.23:17
SpamapS5% is still pretty conservative, the system will likely be healthy at 2%, but this gives us some head room.23:18
SpamapSalso I wondered about making a third check which was basically to just stop accepting jobs if swap goes above like 1%.23:19
SpamapSsince the major predictor of massive load and 400x slowdown is when we start copying RAM to the 400x slower disks. ;)23:19
*** gildub has joined #openstack-infra23:20
fungiwell, keying off swap utilization is dicey since it can grow and then not shrink even if regular memory gets freed23:21
*** esberglu has quit IRC23:21
fungiso you might end up deadlocking an executor to indefinite non-concurrency23:22
fungiif the kernel ends up paging out some things it doesn't really need anyway23:22
*** pahuang has joined #openstack-infra23:22
*** ijw has quit IRC23:22
fungimaybe in conjunction with tuning kernel swappiness knobs it would make a little sense23:23
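
A sketch of the extra memory/swap check being floated here; psutil is used only for illustration (an assumption, not necessarily what the executor uses), and fungi's caveat stands: swap can stay occupied after the pressure passes, so a governor keyed on it could pin an executor at zero concurrency.

    import psutil  # assumption: used purely to illustrate the checks

    AVAIL_MEMORY_PERCENT_MIN = 5.0   # the existing "low avail memory" limit
    SWAP_PERCENT_LIMIT = 1.0         # the proposed third check

    def memory_allows_more_work():
        mem = psutil.virtual_memory()
        swap = psutil.swap_memory()
        avail_percent = 100.0 * mem.available / mem.total
        return (avail_percent > AVAIL_MEMORY_PERCENT_MIN
                and swap.percent < SWAP_PERCENT_LIMIT)
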
SamYapleso question about the encryption/secrets. Are these secrets only available in the post job?23:24
SamYaplealso, encrypt_secret.py doesnt seem to exist anywhere23:26
fungithey're only usable by jobs defined in the same repo they are, and only when running in pipelines whitelisted for secrets consumption23:26
SamYaple(docs say it should be in zuul repo)23:26
jeblairSamYaple: tools/ dir23:27
fungiSamYaple: check the feature/zuulv3 branch (that hasn't merged back to master yet)23:27
jeblairthat too23:27
SamYaplegot it. that's what i needed23:27
SamYaplewhat pipelines are whitelisted?23:29
SamYapleim assuming post, yes? but the others?23:29
jeblairSamYaple: look for 'post-review' in the pipeline config23:30
*** hongbin has quit IRC23:30
*** armax has joined #openstack-infra23:30
fungiyeah, in openstack-infra/project-config zuul.d/pipelines.yaml23:31
SamYapleawesome sauce23:31
SamYapleim going to get dockerhub pushing going23:31
SamYapleis there any way to retrigger a post job if it fails?23:32
fungiat the moment you have to ask for a zuul admin to help with that, or merge another change23:32
fungiwe have a utility to reenqueue refs in pipelines23:33
SamYapleunderstood. and im assuming access to the logs is also protected for those jobs and an admin would need to help me with that?23:33
fungithe logs for zuul itself aren't currently exposed, but the job logs are published like normal23:33
SamYapleright i meant for, say, a failed post job23:34
fungiyeah, you can get to those just like you could in zuul v223:34
jeblairso be careful not to have anything log the credential23:34
SamYaplethats what i was getting at :)23:35
fungiright, we _do_ have some tests to try and ensure that zuul itself won't accidentally leak decrypted secrets as part of normal operation, but mistakes writing the job are still a potential foot-cannon23:35
jeblairansible no_log may be helpful here23:36
jeblairbut still not a panacea23:36
*** lnxnut has joined #openstack-infra23:36
fungiso also, as i'm sure you already know, you should make separate credentials for the jobs to use so that you can revoke them easily if accidentally leaked23:37
SamYaplewell in my case i just need to log in to docker, docker saves this info to a file and then references that to push in the future. so i think i can safely lock that down23:37
fungiif the service with which you want to interact allows that23:37
fungiand limit the access to only what the job needs to be able to accomplish23:37
fungiyeah23:37
SamYapleyea ill be making an infra user and assigning it access to only the repo it needs23:38
fungiall the usual security belts and braces23:38
*** stakeda has joined #openstack-infra23:38
SamYaplei believe i can limit it so it cant delete23:38
mnaserthe puppet-nova change has had the build openstack releasenotes job queued for close to 4.5 hours23:39
clarkbmriedem: depending on that fix likely won't help, it needs to merge and zuul needs to restart23:42
SamYapleim having trouble with encrypt_secret.py which repo are the public keys in?23:43
clarkbSamYaple: they aren't in a repo, they are served by the zuul server23:43
jeblairSamYaple: 'source' should be 'gerrit' (sorry, we're fixing this soon).  the project should be your own project eg 'openstack/foo'23:43
SamYaplegot it, url is zuul.openstack.org?23:44
jeblairzuulv3.openstack.org23:44
SamYapleok thanks23:44
SamYaple:O  "urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:748)>"23:45
*** lnxnut has quit IRC23:45
SamYapletsk tsk tsk pulling encryption keys over http23:45
clarkbhttps does work, but apparently its still a self signed cert23:46
jeblairyeah, it'll be over https after we finish migrating and have a cert for zuul.o.o23:46
*** aeng has joined #openstack-infra23:46
fungiahh, yeah we have a signed cert for zuul.o.o which zuulv3 can use once we rename it to that23:46
SamYapleclarkb: yea i just have to hack the script to hit it23:46
jeblairSamYaple: this part was intentionally made hard :)23:47
SamYaplecool23:47
SamYaplejeblair: i now see your TODO :)23:47
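
For the record, "hack the script" here presumably amounts to something like fetching the key with certificate verification disabled -- a temporary local workaround for the self-signed cert, sketched below with the actual key URL elided:

    # Temporary local workaround only: skip certificate verification while
    # zuulv3.openstack.org still serves a self-signed cert. The key URL is
    # elided; use whatever encrypt_secret.py was trying to fetch.
    import ssl
    import urllib.request

    url = 'https://zuulv3.openstack.org/...'
    insecure = ssl._create_unverified_context()
    with urllib.request.urlopen(url, context=insecure) as resp:
        pubkey = resp.read()
    print(pubkey.decode())
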
tonybI have: failed: Read-only file system (30)\nrsync error: error in file IO (code 11) at main.c(674) [Receiver=3.1.1] from http://logs.openstack.org/a1/a1411cd50c74234692e0fd64c7df365663d37067/post/legacy-static-election-publish/e02d4e2/job-output.txt.gz#_2017-10-01_22_33_45_968313 is that something I can fix? It *looks* to me like it's outside the sphere I can influence23:48
jeblairinfra-root: ^ can someone look at that (could be urgent)23:48
*** Swami has quit IRC23:49
fungilooks like that was a job run from over 24 hours ago?23:49
clarkbit tried to write into the trusted/ dir of the bwrap overlay23:49
clarkbiirc we only make the working dir writeable?23:50
jeblairoh ok.  read-only filesystem caught my attention.23:50
clarkbso likely the job def just has a bad dest for the copy23:50
fungiyeah, so i guess these are alarming-looking errors from bubblewrap's security model23:50
tonybclarkb: okay, I'll go look for that.23:50
clarkbtonyb: playbooks/legacy/static-election-publish/post.yaml in ozj23:52
clarkblooks like it copies to election/23:52
* clarkb looks for other post copy examples23:52
*** mat128 has joined #openstack-infra23:52
jeblairclarkb, fungi, mordred: ^ is that something that slipped past our first-level defense and hit the second level?  as in, do we need to close a hole in the ansible plugin security barrier?23:53
jeblairSpamapS: ^23:54
clarkbjeblair: possibly? though I think that first line of defence only applies for sync if copying from localhost23:54
jeblairi thought it should catch dest outside of work/ too23:55
clarkbtonyb: dest: "{{ zuul.executor.work_root }}/artifacts/" is what the tarball publishing job uses23:55
clarkbtonyb: so I think you can update the playbook for election publishing to use that work root23:56
tonybclarkb: literally that? or "{{ zuul.executor.work_root }}/election" ?23:56
jeblairi added to etherpad under 'further debugging' section23:56
clarkbtonyb: "{{ zuul.executor.work_root }}/election"23:56
tonybclarkb: Thanks.23:57
SpamapSdoes it maybe call something on the system that infers a path outside work?23:59
* SpamapS hasn't looked closely23:59
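
A sketch of the kind of dest-containment check the first-level ansible plugin barrier is meant to perform (illustrative, not the actual zuul plugin code):

    import os

    def dest_inside_work_root(dest, work_root):
        """Return True if dest resolves to a path under work_root."""
        dest = os.path.realpath(dest)
        work_root = os.path.realpath(work_root)
        return os.path.commonpath([dest, work_root]) == work_root
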
