Tuesday, 2017-08-08

*** thorst has joined #openstack-infra00:00
fungiianw: i suppose i could work an option for longer-term entries into http://git.openstack.org/cgit/openstack-infra/puppet-exim/tree/templates/aliases.erb but likely we actually need to do something more manageable if this persists much longer (like an actual spam identification system)00:00
fungii've resisted the pressure to do that so far, but things have never been anywhere near this bad until the past few weeks00:01
*** jamesmcarthur has quit IRC00:02
*** thorst has quit IRC00:02
*** slaweq has quit IRC00:02
pabelangerdmsimard: Ah, yes. It was limited to the CR repo for 7.400:03
fungiokay, mysqldump just finished, gerrit restarting now00:05
*** thorst has joined #openstack-infra00:05
fungigerrit webui seems to be working again00:06
fungi#status log Gerrit on review.openstack.org restarted just now, and is no longer using contact store functionality or configuration options00:07
openstackstatusfungi: finished logging00:07
fungii'll get a notice out to the infra ml tomorrow about https://review.openstack.org/49109000:09
fungiother than that, i think the gerrit-contactstore-removal spec is done00:09
*** jamesmcarthur has joined #openstack-infra00:12
*** jkilpatr has quit IRC00:13
*** dingyichen has joined #openstack-infra00:17
*** jamesmcarthur has quit IRC00:17
*** gmann has quit IRC00:18
*** gmann has joined #openstack-infra00:18
*** slaweq has joined #openstack-infra00:19
*** thorst has quit IRC00:23
*** slaweq has quit IRC00:23
*** thorst has joined #openstack-infra00:23
*** harlowja has quit IRC00:25
openstackgerritMerged openstack/diskimage-builder master: Bump fedora/fedora-minimal DIB_RELEASE 26  https://review.openstack.org/48257000:26
*** thorst has quit IRC00:27
*** slaweq has joined #openstack-infra00:29
*** claudiub has quit IRC00:34
*** slaweq has quit IRC00:36
*** armax has quit IRC00:37
*** thorst has joined #openstack-infra00:38
pabelangerianw: clarkb: thanks, elastic-recheck seems to be detecting tripleo failures now00:39
*** slaweq has joined #openstack-infra00:41
*** Apoorva_ has joined #openstack-infra00:42
*** bobh has joined #openstack-infra00:45
*** Apoorva has quit IRC00:45
*** slaweq has quit IRC00:46
*** LindaWang has joined #openstack-infra00:46
*** Apoorva_ has quit IRC00:47
*** armax has joined #openstack-infra00:48
*** liujiong has joined #openstack-infra00:48
*** slaweq has joined #openstack-infra00:51
*** markvoelker has joined #openstack-infra00:55
*** slaweq has quit IRC00:56
ianwclarkb: any thoughts on http://logs.openstack.org/78/480778/2/check/gate-tempest-dsvm-neutron-full-centos-7-nv/8d9e9cc/logs/screen-n-cpu.txt.gz#_Aug_01_14_20_47_80833600:56
ianwunfortunately (?) your name comes up when looking for proxy errors in devstack logs :)00:57
*** markvoelker_ has joined #openstack-infra00:57
ianwit might be a red herring though, maybe it's a real neutron issue that bubbles up to nova like this ...00:57
*** markvoelker has quit IRC01:01
*** slaweq has joined #openstack-infra01:01
*** gouthamr has quit IRC01:02
*** armax has quit IRC01:07
*** slaweq has quit IRC01:07
*** tuanluong has joined #openstack-infra01:10
*** shu-mutou-AWAY is now known as shu-mutou01:10
*** zhurong has joined #openstack-infra01:11
*** pahuang has quit IRC01:18
*** slaweq has joined #openstack-infra01:23
*** thorst has quit IRC01:24
*** slaweq has quit IRC01:28
*** rwsu has quit IRC01:32
ianw23.253.166.156 - - [01/Aug/2017:14:20:39 +0000] "GET /identity/v3/auth/tokens HTTP/1.1" 200 371401:32
ianw23.253.166.156 - - [01/Aug/2017:14:19:47 +0000] "GET /v2.0/auto-allocated-topology/f6806985392e4ece8ac13fb6784131b6 HTTP/1.1" 200 17401:32
ianw23.253.166.156 - - [01/Aug/2017:14:20:39 +0000] "GET /identity/v3/auth/tokens HTTP/1.1" 200 358601:32
ianwnothing good ever happens when time goes backwards01:32
*** slaweq has joined #openstack-infra01:34
*** pahuang has joined #openstack-infra01:35
*** dougwig has quit IRC01:36
*** cuongnv has joined #openstack-infra01:37
*** slaweq has quit IRC01:40
*** rwsu has joined #openstack-infra01:44
*** camunoz has quit IRC01:48
*** pahuang has quit IRC01:54
*** slaweq has joined #openstack-infra01:56
openstackgerritsebastian marcet proposed openstack-infra/openstackid-resources master: External Calendar Sync  https://review.openstack.org/48768302:00
*** bobh has quit IRC02:00
*** slaweq has quit IRC02:00
*** ramishra has quit IRC02:03
*** iyamahat has quit IRC02:06
*** slaweq has joined #openstack-infra02:06
*** yamahata has quit IRC02:07
*** pahuang has joined #openstack-infra02:07
*** jamesmcarthur has joined #openstack-infra02:13
*** slaweq has quit IRC02:13
*** gildub has joined #openstack-infra02:14
*** dhill_ has quit IRC02:15
*** dhill_ has joined #openstack-infra02:15
*** Marx314 has quit IRC02:16
*** mtreinish has quit IRC02:17
*** fbouliane has quit IRC02:17
*** gtmanfred has quit IRC02:17
*** rbergeron has quit IRC02:18
*** lifeless has quit IRC02:18
*** tnarg has quit IRC02:18
*** rodrigods has quit IRC02:19
*** rbergeron has joined #openstack-infra02:19
*** lifeless has joined #openstack-infra02:19
*** mtreinish has joined #openstack-infra02:22
*** gtmanfred has joined #openstack-infra02:23
*** rodrigods has joined #openstack-infra02:23
*** fbouliane has joined #openstack-infra02:23
*** gcb has joined #openstack-infra02:29
*** ramishra has joined #openstack-infra02:34
*** sree has joined #openstack-infra02:34
*** bobh has joined #openstack-infra02:35
*** sree has quit IRC02:39
*** jamesmcarthur has quit IRC02:40
*** slaweq has joined #openstack-infra02:40
*** armax has joined #openstack-infra02:41
*** armax has quit IRC02:44
*** slaweq has quit IRC02:45
*** yamamoto_ has joined #openstack-infra02:46
*** yamamoto has quit IRC02:46
*** jamesmcarthur has joined #openstack-infra02:48
*** hongbin_ has joined #openstack-infra02:49
*** hongbin has quit IRC02:49
*** hongbin_ has quit IRC02:49
*** hongbin has joined #openstack-infra02:49
*** tnovacik has joined #openstack-infra02:50
*** slaweq has joined #openstack-infra02:51
openstackgerritjimmygc proposed openstack/diskimage-builder master: Fix ubuntu minimal build failure  https://review.openstack.org/49165302:55
*** slaweq has quit IRC02:56
*** jamesmcarthur has quit IRC02:56
*** slaweq has joined #openstack-infra03:01
*** slaweq has quit IRC03:05
*** ramineni has joined #openstack-infra03:06
*** ramineni has left #openstack-infra03:07
*** slaweq has joined #openstack-infra03:11
*** spzala has quit IRC03:16
*** david-lyle has quit IRC03:16
*** slaweq has quit IRC03:18
*** tnovacik has quit IRC03:23
*** david-lyle has joined #openstack-infra03:23
*** jascott1_ has quit IRC03:24
*** nicolasbock has joined #openstack-infra03:25
*** jascott1 has joined #openstack-infra03:25
*** jascott1 has quit IRC03:26
*** jascott1 has joined #openstack-infra03:27
*** slaweq has joined #openstack-infra03:33
*** slaweq has quit IRC03:38
*** bobh has quit IRC03:38
*** nicolasbock has quit IRC03:39
*** slaweq has joined #openstack-infra03:43
openstackgerritMerged openstack-infra/zuul-jobs master: Update the zuul-sphinx extension config  https://review.openstack.org/49113403:44
*** baoli has quit IRC03:44
*** Dinesh_Bhor has joined #openstack-infra03:45
*** dave-mccowan has quit IRC03:47
*** slaweq has quit IRC03:49
*** nicolasbock has joined #openstack-infra03:50
*** links has joined #openstack-infra03:53
*** hongbin has quit IRC03:56
*** esberglu has quit IRC03:59
*** EricGonczer_ has joined #openstack-infra04:02
*** ykarel has joined #openstack-infra04:02
*** EricGonczer_ has quit IRC04:20
*** adisky__ has joined #openstack-infra04:21
*** thorst has joined #openstack-infra04:25
*** thorst has quit IRC04:30
*** harlowja has joined #openstack-infra04:35
*** spzala has joined #openstack-infra04:47
*** esberglu has joined #openstack-infra04:49
*** spzala has quit IRC04:51
*** pahuang has quit IRC04:52
*** esberglu has quit IRC04:53
*** jamesmcarthur has joined #openstack-infra04:57
*** sflanigan has quit IRC04:59
*** slaweq has joined #openstack-infra05:00
*** claudiub has joined #openstack-infra05:01
*** hareesh has joined #openstack-infra05:01
*** jamesmcarthur has quit IRC05:02
*** pahuang has joined #openstack-infra05:05
*** slaweq has quit IRC05:05
*** eranrom has quit IRC05:08
*** slaweq has joined #openstack-infra05:10
*** nicolasbock has quit IRC05:11
*** harlowja has quit IRC05:14
*** slaweq has quit IRC05:15
*** waynr has joined #openstack-infra05:19
*** waynr has left #openstack-infra05:20
*** slaweq has joined #openstack-infra05:20
*** slaweq has quit IRC05:27
*** psachin has joined #openstack-infra05:33
*** yamahata has joined #openstack-infra05:41
openstackgerritAkihiro Motoki proposed openstack-infra/project-config master: Add release permission for neutron-vpnaas and dashboard  https://review.openstack.org/49167005:43
*** sree has joined #openstack-infra05:49
*** nicolasbock has joined #openstack-infra05:53
*** cshastri has joined #openstack-infra05:53
*** thorst has joined #openstack-infra05:58
*** markus_z has joined #openstack-infra05:58
*** sflanigan has joined #openstack-infra06:03
*** thorst has quit IRC06:03
*** bhavik1 has joined #openstack-infra06:05
*** pgadiya has joined #openstack-infra06:18
*** rcernin has joined #openstack-infra06:21
openstackgerritTobias Henkel proposed openstack-infra/nodepool feature/zuulv3: Fix detail headers order for nodepool list  https://review.openstack.org/49167806:25
*** coolsvap has joined #openstack-infra06:26
*** kjackal_ has joined #openstack-infra06:28
*** bhavik1 has quit IRC06:53
*** stevebaker has quit IRC06:54
*** slaweq has joined #openstack-infra06:58
*** zhurong has quit IRC06:59
*** pcaruana has joined #openstack-infra07:00
*** markvoelker_ has quit IRC07:01
*** stevebaker has joined #openstack-infra07:02
*** spzala has joined #openstack-infra07:04
*** slaweq has quit IRC07:04
*** jascott1 has quit IRC07:05
*** jascott1 has joined #openstack-infra07:05
*** markvoelker has joined #openstack-infra07:07
*** markvoelker has quit IRC07:08
*** markvoelker has joined #openstack-infra07:08
*** spzala has quit IRC07:09
*** aarefiev has joined #openstack-infra07:10
*** jascott1 has quit IRC07:10
*** gtrxcb has quit IRC07:11
*** florianf has joined #openstack-infra07:15
*** aviau has quit IRC07:19
*** aviau has joined #openstack-infra07:19
*** tesseract has joined #openstack-infra07:21
*** ralonsoh has joined #openstack-infra07:22
*** Swami has quit IRC07:27
*** slaweq has joined #openstack-infra07:30
*** Douhet has quit IRC07:31
*** Douhet has joined #openstack-infra07:32
*** slaweq has quit IRC07:36
*** ccamacho has joined #openstack-infra07:38
*** yamamoto_ has quit IRC07:44
*** sflanigan has quit IRC07:48
*** yamamoto has joined #openstack-infra07:50
*** alexchadin has joined #openstack-infra07:55
*** e0ne has joined #openstack-infra07:56
*** ralonsoh_ has joined #openstack-infra07:57
*** ralonsoh has quit IRC07:57
*** rtjure has quit IRC07:58
*** thorst has joined #openstack-infra07:59
*** ralonsoh_ is now known as ralonsoh08:02
*** arturb has quit IRC08:02
*** thorst has quit IRC08:04
*** shardy has joined #openstack-infra08:06
*** seanhandley has left #openstack-infra08:07
*** gildub has quit IRC08:09
*** priteau has joined #openstack-infra08:09
*** mwarad has joined #openstack-infra08:15
*** _mwarad_ has joined #openstack-infra08:15
*** _mwarad_ has quit IRC08:15
*** derekh has joined #openstack-infra08:20
*** dizquierdo has joined #openstack-infra08:20
*** slaweq has joined #openstack-infra08:25
*** dingyichen has quit IRC08:25
*** lucas-afk is now known as lucasagomes08:26
openstackgerritMerged openstack-infra/project-config master: Make neutron functional job non-voting  https://review.openstack.org/49154808:27
*** esberglu has joined #openstack-infra08:28
bauzasmmm, can't we now provide HTTP links in a gerrit comment ?08:29
*** slaweq has quit IRC08:29
*** esberglu has quit IRC08:32
*** slaweq has joined #openstack-infra08:35
*** slaweq has quit IRC08:41
dimakhey08:41
strigaziianw yt?08:41
dimakI have an error with Babel from openstack mirror08:42
dimakhttp://logs.openstack.org/00/489000/2/gate/gate-dragonflow-python35/44a33cb/console.html#_2017-08-08_07_25_10_20007308:42
dimakAnyone noticed this?08:42
*** electrofelix has joined #openstack-infra08:46
*** rtjure has joined #openstack-infra08:46
*** yamamoto has quit IRC08:49
openstackgerritSpyros Trigazis (strigazi) proposed openstack-infra/system-config master: [magnum] Cache fedorapeople.org  https://review.openstack.org/49146608:49
openstackgerritSpyros Trigazis (strigazi) proposed openstack-infra/system-config master: [magnum] Cache fedorapeople.org  https://review.openstack.org/49146608:50
*** mwarad has quit IRC08:59
openstackgerritSpyros Trigazis (strigazi) proposed openstack-infra/project-config master: [magnum] Cache fedorapeople.org  https://review.openstack.org/49172408:59
*** alexchadin has quit IRC09:03
*** spzala has joined #openstack-infra09:05
*** ykarel is now known as ykarel|lunch09:08
*** stakeda has quit IRC09:09
*** spzala has quit IRC09:10
*** alexchadin has joined #openstack-infra09:15
*** nicolasbock has quit IRC09:15
*** yamamoto has joined #openstack-infra09:15
*** slaweq has joined #openstack-infra09:19
*** sambetts|afk is now known as sambetts09:20
*** yamamoto has quit IRC09:22
*** slaweq has quit IRC09:25
*** pgadiya has quit IRC09:26
*** tosky has joined #openstack-infra09:29
*** pgadiya has joined #openstack-infra09:29
*** slaweq has joined #openstack-infra09:29
ianwstrigazi: for a bit09:33
strigaziianw https://review.openstack.org/#/q/topic:cache-fedorapeople-magnum09:33
*** slaweq has quit IRC09:34
ianwstrigazi: ok cool, get pabelanger to take a look too but LGTM09:35
strigaziianw he is in canada?09:38
ianwusually :)09:39
*** slaweq has joined #openstack-infra09:39
strigaziianw yes he is, you in AU afaik and me in Switzerland, very convenient setup :)09:40
*** shardy has quit IRC09:43
*** nicolasbock has joined #openstack-infra09:43
*** slaweq has quit IRC09:46
*** kornicameister has quit IRC09:47
*** cuongnv has quit IRC09:52
*** yamamoto has joined #openstack-infra09:54
*** shu-mutou is now known as shu-mutou-AWAY09:54
*** shardy has joined #openstack-infra09:56
*** jamesmcarthur has joined #openstack-infra09:57
*** alexchadin has quit IRC09:59
*** thorst has joined #openstack-infra10:00
*** kornicameister has joined #openstack-infra10:00
*** slaweq has joined #openstack-infra10:02
*** jamesmcarthur has quit IRC10:02
*** thorst has quit IRC10:05
*** yamamoto has quit IRC10:07
*** pgadiya has quit IRC10:07
*** slaweq has quit IRC10:08
*** liujiong has quit IRC10:09
*** dtantsur|afk is now known as dtantsur10:09
*** yamamoto has joined #openstack-infra10:12
*** slaweq has joined #openstack-infra10:12
*** igormarnat has quit IRC10:16
*** ruhe has quit IRC10:16
*** tnarg has joined #openstack-infra10:17
*** markvoelker has quit IRC10:17
*** yamamoto has quit IRC10:17
*** igormarnat has joined #openstack-infra10:17
*** Odd_Bloke has quit IRC10:17
*** abelur has quit IRC10:17
*** esberglu has joined #openstack-infra10:18
*** yamamoto has joined #openstack-infra10:18
*** ruhe has joined #openstack-infra10:18
*** abelur has joined #openstack-infra10:18
*** hareesh has quit IRC10:18
*** odyssey4me has quit IRC10:18
*** abelur_ has quit IRC10:18
*** Odd_Bloke has joined #openstack-infra10:19
*** slaweq has quit IRC10:19
*** hareesh has joined #openstack-infra10:19
*** pgadiya has joined #openstack-infra10:19
*** odyssey4me has joined #openstack-infra10:19
*** yamamoto has quit IRC10:20
*** yamamoto has joined #openstack-infra10:20
*** esberglu has quit IRC10:21
*** zhurong has joined #openstack-infra10:24
*** AJaeger is now known as AJaeger_10:26
*** pgadiya has quit IRC10:28
*** tojuvone has joined #openstack-infra10:34
*** tojuvone has left #openstack-infra10:35
*** katkapilatova has joined #openstack-infra10:36
openstackgerritMerged openstack-infra/project-config master: Make grenade-linuxbridge-multinode job experimental  https://review.openstack.org/49099310:38
openstackgerritMark Korondi proposed openstack-infra/project-config master: Bringing upstream training virtual environment over here  https://review.openstack.org/49020210:40
openstackgerritMerged openstack-infra/project-config master: [Kuryr] Turn python3 job to voting  https://review.openstack.org/49162710:40
*** ykarel|lunch is now known as ykarel10:41
*** pgadiya has joined #openstack-infra10:41
*** thorst has joined #openstack-infra10:42
openstackgerritMerged openstack-infra/project-config master: [Fuxi] Turn python3 job to voting  https://review.openstack.org/49162810:44
openstackgerritMerged openstack-infra/project-config master: [Zun] Make python3 dsvm job as voting  https://review.openstack.org/49162310:44
openstackgerritMark Korondi proposed openstack-infra/project-config master: Bringing upstream training virtual environment over here  https://review.openstack.org/49020210:45
*** igormarnat has quit IRC10:48
*** igormarnat has joined #openstack-infra10:48
openstackgerritMerged openstack-infra/project-config master: [Zun] Move multinode job to experimental  https://review.openstack.org/49162410:50
openstackgerritMerged openstack-infra/project-config master: Reduce yum-config-manager output  https://review.openstack.org/49107610:50
openstackgerritMerged openstack-infra/project-config master: Upgrade the ARA fedora jobs to fedora 26  https://review.openstack.org/49163310:51
*** thorst has quit IRC10:54
*** thorst has joined #openstack-infra10:54
*** jkilpatr has joined #openstack-infra10:58
*** yamamoto has quit IRC10:58
*** lrossetti_ has joined #openstack-infra10:58
*** thorst has quit IRC10:59
*** lrossetti has quit IRC10:59
*** slaweq has joined #openstack-infra10:59
*** yamamoto has joined #openstack-infra10:59
*** sdague has joined #openstack-infra10:59
*** yamamoto has quit IRC11:05
*** slaweq_ has joined #openstack-infra11:07
*** spzala has joined #openstack-infra11:07
*** jascott1 has joined #openstack-infra11:07
*** yamamoto has joined #openstack-infra11:07
*** yamamoto has quit IRC11:08
*** yamamoto has joined #openstack-infra11:10
*** sree has quit IRC11:10
*** jascott1 has quit IRC11:12
*** spzala has quit IRC11:12
*** slaweq_ has quit IRC11:12
*** yamamoto has quit IRC11:13
*** yamamoto has joined #openstack-infra11:15
*** huanxie has quit IRC11:15
*** yamamoto has quit IRC11:16
*** yamamoto has joined #openstack-infra11:16
*** slaweq_ has joined #openstack-infra11:17
*** alexchadin has joined #openstack-infra11:19
*** slaweq_ has quit IRC11:23
*** gildub has joined #openstack-infra11:24
*** EricGonczer_ has joined #openstack-infra11:33
*** gordc has joined #openstack-infra11:37
*** EricGonczer_ has quit IRC11:38
*** EricGonczer_ has joined #openstack-infra11:39
*** dave-mccowan has joined #openstack-infra11:40
*** ldnunes has joined #openstack-infra11:41
*** lucasagomes is now known as lucas-hungry11:46
*** abelur_ has joined #openstack-infra11:50
*** thorst has joined #openstack-infra11:51
*** slaweq_ has joined #openstack-infra11:51
*** slaweq_ has quit IRC11:56
*** psachin has quit IRC11:58
*** jrist has joined #openstack-infra11:58
openstackgerritTobias Rydberg proposed openstack-infra/irc-meetings master: Changed to correct chairs for the publiccloud_wg  https://review.openstack.org/49176912:00
*** psachin has joined #openstack-infra12:00
pabelangerlooks like we are hitting quota issues in citycloud-lon112:01
pabelangerOpenStackCloudHTTPError: (403) Client Error for url: https://lon1.citycloud.com:8774/v2/bed89257500340af8d0fbe7141b1bfd6/servers Quota exceeded for cores, instances: Requested 8, 1, but already used 400, 50 of 400, 50 cores, instances12:01
pabelangeralso, that error message is super confusing12:01
*** slaweq_ has joined #openstack-infra12:01
*** jpena|off is now known as jpena12:03
*** esberglu has joined #openstack-infra12:04
*** trown|outtypewww is now known as trown12:05
*** rlandy has joined #openstack-infra12:06
*** slaweq_ has quit IRC12:06
*** tuanluong has quit IRC12:07
*** hareesh has quit IRC12:08
*** esberglu has quit IRC12:09
*** slaweq_ has joined #openstack-infra12:12
*** slaweq_ has quit IRC12:16
*** yamamoto has quit IRC12:18
pabelangerclarkb: any idea why we'd see this warning http://logs.openstack.org/49/491749/1/check/gate-tripleo-ci-centos-7-undercloud-oooq/5edaa28/console.html#_2017-08-08_11_03_02_67640012:19
pabelangerclarkb: I mean, I know why it is there but how should I go about fixing it12:20
*** yamamoto has joined #openstack-infra12:20
*** yamamoto has quit IRC12:20
*** slaweq_ has joined #openstack-infra12:22
mnaserhttps://review.openstack.org/#/c/491466/ can someone give this a bit of love by any chance12:23
mnasermost magnum jobs are timing out due to this12:23
mnaserso hopefully if we can get some caching in, it'll become significantly less12:23
pabelangermnaser: strigazi: which images are specifically needed?12:24
mnaserpabelanger right now the one that keeps timing out in master https://fedorapeople.org/groups/magnum/fedora-atomic-latest.qcow212:24
mnaserit downloads at ~30Kb/s so it just times out12:25
mnaserhttp://logs.openstack.org/11/488511/4/check/gate-functional-dsvm-magnum-api-ubuntu-xenial/25369a5/logs/devstacklog.txt.gz < warning, big log file, but you can see it there12:25
pabelangermnaser: right, what is the difference between that and atomic images shipped by fedora?12:25
*** jcoufal has joined #openstack-infra12:25
pabelangermnaser: for example, http://mirror.regionone.infracloud-vanilla.openstack.org/fedora/releases/26/CloudImages/x86_64/images/12:26
*** Goneri has joined #openstack-infra12:26
mnaserpabelanger good question, i'll defer to strigazi for that.  however, as a deployer, I use the atomic images shipped by fedora and they work12:27
mnaserhowever we're testing/running against fedora 25 right now12:27
mnaserand i dont see that in the mirrors for some reason12:27
mnaserhttp://mirror.regionone.infracloud-vanilla.openstack.org/fedora/releases/25/CloudImages/x86_64/images/12:27
*** slaweq_ has quit IRC12:28
*** rwsu has quit IRC12:28
*** rwsu has joined #openstack-infra12:29
mnaserhttp://mirror.math.princeton.edu/pub/alt/atomic/stable/12:29
mnaserokay, thats a specific mirror but that seems to be where they are stored, /pub/alt/atomic/ .. dont think we already cache that?12:29
pabelangerya, looking. Fedora-26 seems to ship them now12:30
pabelangertrying to see where fedora-25 is12:30
*** zhurong has quit IRC12:30
*** ralonsoh has quit IRC12:31
*** ralonsoh has joined #openstack-infra12:32
strigazipabelanger we need fedora-atomic-latest, which is a symlink to Fedora-Atomic-25-20170719.qcow2; fedora-kubernetes-ironic-latest.tar.gz -> fedora-25-kubernetes-ironic-20170620.tar.gz; and ubuntu-mesos-latest.qcow2 -> ubuntu-14.04.3-mesos-0.25.0.qcow212:32
strigazipabelanger mnaser the images are stock images12:33
pabelangerright, so lets see if we can just mirror them directly from source12:34
pabelangerATM, fedora-26 atomic we get for free12:34
strigazipabelanger we use fedorapeople so we can use a symlink when we update the image and not add commits to our repo12:34
*** jaypipes has joined #openstack-infra12:35
*** sbezverk has joined #openstack-infra12:35
mnaserimho its probably cleaner to show which upstream ones we're using exactly, making it easy for potential users to know the exact image that is being used12:35
strigazipabelanger but we can always commit there if it makes our life easier and we gain performance12:35
pabelangerseems like a lot of pressure on fedorapeople.org12:35
strigazipabelanger if we get f26 for free we can change to the official repo.12:36
mnaserstrigazi https://github.com/openstack-infra/system-config/blob/master/modules/openstack_project/files/mirror/fedora-mirror-update.sh we can edit this to get the fedora 25 images12:36
mnaserhttp://mirrors.kernel.org/fedora-alt/atomic/stable/12:36
pabelangerstrigazi: right, I mean, if you want to test fedora-26, we already mirror that to AFS12:37
mnaserstrigazi fyi f26 comes with docker 1.13.1 and i had to push up a few things to make it work, so just keep that in mind (mainly k8s 1.6.7 and a patch to set the default policy for iptables forward to accept)12:37
pabelangerotherwise, we should be able to add mirror for https://dl.fedoraproject.org/pub/alt/atomic/stable/12:37
strigazipabelanger sounds good, but for stable branches we are slower to update; we still need f25 until we do.12:37
openstackgerritAlexander Chadin proposed openstack-infra/project-config master: Remove gate job from watcherclient  https://review.openstack.org/49178412:38
*** kgiusti has joined #openstack-infra12:38
pabelangerstrigazi: why isn't Fedora-Atomic-25-20170719.qcow2 listed at https://dl.fedoraproject.org/pub/alt/atomic/stable/ ?12:39
mnaserpabelanger based on my simple math it seems to be around ~4GB per fedora atomic release so mirroring should use ~36gb12:39
strigazipabelanger deleted?12:39
*** yamamoto has joined #openstack-infra12:40
pabelangerstrigazi: would one of the listed images work for you? How do you decide when you need to replace Fedora-Atomic-25-20170719.qcow212:41
strigazipabelanger I'll give it a go with f26 and if it works we can see what to do with our ubuntu image and stable branches.12:41
robcresswello/ Just setting up 3rd party CI, is it expected that the noop-check-communication job doesnt receive params like LOG_PATH? Seems to be able to find the log server, but isn't populating that build param.12:42
pabelangerstrigazi: sure, lets see if that works, if so, then you get the mirror for free. Looking at other images now12:42
*** rhallisey has joined #openstack-infra12:42
pabelangerrobcresswell: see http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul/openstack_functions.py how we set it up today for zuulv2.512:43
pabelangerrobcresswell: you'll need to create a job like: http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul/layout.yaml#n1119 to call the function12:43
*** slaweq_ has joined #openstack-infra12:44
*** dprince has joined #openstack-infra12:45
robcresswellthanks pabelanger. Little out of my depth atm. That's really helpful.12:45
*** lucas-hungry is now known as lucasagomes12:45
pabelangernp12:46
*** jpena is now known as jpena|mtg12:46
*** slaweq_ has quit IRC12:49
*** Goneri has quit IRC12:50
*** mandre_away is now known as mandre_mtg12:51
*** pradk has joined #openstack-infra12:54
*** abelur_ has quit IRC12:54
*** slaweq_ has joined #openstack-infra12:54
*** felipemonteiro_ has joined #openstack-infra12:55
*** coolsvap has quit IRC12:56
*** jpena|mtg is now known as jpena|off12:56
*** felipemonteiro__ has joined #openstack-infra12:57
*** esberglu has joined #openstack-infra12:58
*** jamesmcarthur has joined #openstack-infra12:58
*** slaweq_ has quit IRC13:00
*** felipemonteiro_ has quit IRC13:01
mnaserpabelanger whats the decision making process when deciding if something will be mirrored or cached?13:01
mnaserhttp://mirror.regionone.infracloud-vanilla.openstack.org/fedora/releases/26/CloudImages/x86_64/images/ -- the image there is from 2017-07-05.. there's been a few newer images since (such as one released on the 23rd of july)13:02
mnaserso i suspect we're not going to get access to fresh images :(13:02
pabelangermnaser: usually, if we can rsync, we mirror.  However, if the contents change too fast (like rdo), then we reverse proxy cache13:02
*** jrist has quit IRC13:02
*** clayton has quit IRC13:03
mnaserpabelanger i would guess then that images are something we can consider more on the stable-content side13:04
mnaserand it would involve a small change here only https://github.com/openstack-infra/system-config/blob/master/modules/openstack_project/files/mirror/fedora-mirror-update.sh13:04
*** clayton has joined #openstack-infra13:05
mnaseri can propose a small change and what ill do is ill exclude all the older releases so we only have f25 atomic latest + f26 atomic latest and then new releases moving forward13:05
mnaserit'll save a bunch of disk space on images we likely wont use13:05
*** gildub has quit IRC13:06
*** Julien-zte has joined #openstack-infra13:06
*** pradk has quit IRC13:06
fungii'm curious why the content is so stale. we run rsync from the official copy ~daily?13:07
*** sbezverk has quit IRC13:07
mnaserfungi i dont think its the content thats stale, i think the atomic team doesnt publish images there officially13:08
mnaserthey probably release in /pub/alt/atomic and there might have been some old reason why that ended up there (for fedora 25, it doesnt even exist)13:08
*** spzala has joined #openstack-infra13:08
*** rlandy has quit IRC13:08
openstackgerritGael Chamoulaud proposed openstack-infra/tripleo-ci master: Enable tripleo-validations tests  https://review.openstack.org/48108013:09
pabelangerYa, I don't think ISO content (or any content) changes in release directory13:09
pabelangerwe'd likely need to mirror: https://dl.fedoraproject.org/pub/alt/atomic/stable/13:09
fungigot it13:10
funginow i'm less confused ;)13:10
*** links has quit IRC13:11
*** markvoelker has joined #openstack-infra13:12
*** pgadiya has quit IRC13:13
numanspabelanger, hi, can you please add this to your review queue - https://review.openstack.org/#/c/490622/13:13
*** LindaWang has quit IRC13:13
*** dizquierdo is now known as dizquierdo_afk13:14
*** slaweq_ has joined #openstack-infra13:16
strigazipabelanger mnaser Will someone push a change to mirror https://dl.fedoraproject.org/pub/alt/atomic/stable/ ?13:17
*** mpranjic has joined #openstack-infra13:17
mnaserstrigazi working on it!13:17
mnaseri'm making sure we dont mirror useless stuff like isos etc13:17
*** Liuqing has joined #openstack-infra13:18
strigazimnaser cool13:18
*** bobh has joined #openstack-infra13:18
mpranjichello! I have issues with login to wiki.openstack.org with openID.13:19
strigazimnaser they don't have ISOs i think13:19
mpranjicI get the error:13:19
mpranjic OpenID error13:19
mpranjicAn error occurred: an invalid token was found.13:19
mnaserstrigazi http://mirrors.kernel.org/fedora-alt/atomic/stable/Fedora-Atomic-26-20170723.0/Atomic/x86_64/iso/13:19
mpranjiccan someone help me out with that?13:19
mnaserand there is stuff like libvirt boxes and blabla, i'll get it addressed shortly13:19
mpranjicmy Ubuntu One username is: mpranjic13:19
strigazimnaser we only need /CloudImages, not /Atomic13:20
mnaseryep thats why im doing all the excludes in the rsync mirroring13:20
mnaserso we get all the .qcow2s pretty much13:20
*** ldnunes has quit IRC13:20
strigaziand raw I guess13:21
*** slaweq_ has quit IRC13:21
*** sshnaidm|afk is now known as sshnaidm13:21
openstackgerritMohammed Naser proposed openstack-infra/system-config master: Add Fedora Atomic mirrors  https://review.openstack.org/49180013:22
mnaserpabelanger fungi ^ i also added the output of a dry run in the comments so it should work :)13:23
*** baoli has joined #openstack-infra13:23
*** xyang1 has joined #openstack-infra13:24
*** slaweq_ has joined #openstack-infra13:26
*** sree has joined #openstack-infra13:27
openstackgerritMohammed Naser proposed openstack-infra/project-config master: Add NODEPOOL_ATOMIC_MIRROR to configure_mirror.sh  https://review.openstack.org/49180113:28
*** LindaWang has joined #openstack-infra13:28
*** jamesmcarthur has quit IRC13:28
openstackgerritMonty Taylor proposed openstack-infra/project-config master: Add mapping file containing v2 to v3 mappings  https://review.openstack.org/49180413:29
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Add migration tool for v2 to v3 conversion  https://review.openstack.org/49180513:30
*** slaweq_ has quit IRC13:33
*** ldnunes has joined #openstack-infra13:33
*** cshastri has quit IRC13:33
slaweqmordred: hello13:34
slaweqmordred: can you take a look at https://review.openstack.org/#/c/491266/13:34
mordredslaweq: yes!13:34
slaweqmordred: I think that it's enough to do it like I did but please check if maybe yamamoto is right13:35
slaweqmordred: thx in advance :)13:35
mordredoh - sorry - I had this reviewed in my browser but didn't actually click submit ...13:35
*** cshastri has joined #openstack-infra13:36
mordredslaweq: review left - but basically we need to copy the ENABLE_IDENTITY_V2 pattern for now (this will be better in a couple of weeks)13:37
*** alexchadin has quit IRC13:39
ssbarneaWhat could I do to make the release of JJB 2.0 happen before the apocalypse? https://storyboard.openstack.org/#!/story/200074513:39
*** alexchadin has joined #openstack-infra13:40
*** alexchadin has quit IRC13:40
slaweqmordred: thx13:40
*** alexchadin has joined #openstack-infra13:40
mordredssbarnea: hi! so - I think we'd like to hold off until we've migrated openstack to zuul v3 which is planned for september 1113:40
fungimordred: well, _we_ pin the version we're using13:41
*** alexchadin has quit IRC13:41
mordredoh.13:41
mordredwell13:41
mordredignore me13:41
fungiso i don't expect they need to hold off releasing13:41
fungiwe haven't asked them not to13:41
*** alexchadin has joined #openstack-infra13:41
*** bh526r has joined #openstack-infra13:41
*** alexchadin has quit IRC13:41
ssbarneathe fact that 2.0 has been in pre-release for so long hurts it a lot, as I cannot 'persuade' others to use the pre-release in production.13:42
*** wznoinsk_ is now known as wznoinsk13:42
*** alexchadin has joined #openstack-infra13:42
fungissbarnea: have you asked in #openstack-jjb? the devs/reviewers on that repo have been mostly autonomous for a while, the infra team only provides a bit of oversight13:42
sshnaidmclarkb, ping13:42
fungiwe stopped exerting much control over it when we ceased using jenkins (roughly a year ago)13:43
*** ldnunes_ has joined #openstack-infra13:43
ssbarneafungi: thanks for the hint. I didn't know about that channel, joined and going to cross post now.13:44
*** ldnunes has quit IRC13:44
odyssey4mehi all, I'd like to understand more about how we can cache an image onto the nodepool nodes13:44
*** camunoz has joined #openstack-infra13:45
*** jtomasek has joined #openstack-infra13:46
*** alexchadin has quit IRC13:46
*** hongbin has joined #openstack-infra13:47
*** felipemonteiro__ has quit IRC13:48
*** slaweq_ has joined #openstack-infra13:48
*** ociuhandu has joined #openstack-infra13:50
*** slaweq_ has quit IRC13:53
*** ociuhandu has quit IRC13:56
robcresswello/ Sorry, back with more questions; nodepool list seems to be "stuck" with a list of instances in the delete state, but the provider has already deleted them. Is there a way to nudge nodepool to figure that out?13:57
*** EricGonczer_ has joined #openstack-infra13:57
*** Liuqing has quit IRC13:58
*** slaweq_ has joined #openstack-infra13:59
*** dizquierdo_afk is now known as dizquierdo13:59
*** gouthamr has joined #openstack-infra14:00
*** EricGonc_ has joined #openstack-infra14:01
*** xinliang has quit IRC14:01
dimakAJaeger_, yolanda there are a lot of queued jenkins jobs, any chance there are more node-pool issues?14:02
fungiodyssey4me: we cache the _small_ images and similar files devstack declares it wants by running its image_list.sh utility script from this element: https://git.openstack.org/cgit/openstack-infra/project-config/tree/nodepool/elements/cache-devstack/extra-data.d/55-cache-devstack-repos#n10714:02
*** EricGonczer_ has quit IRC14:02
fungiodyssey4me: obviously baking too many or too large images onto the filesystems of our worker images makes them unwieldy, so we do try to keep it to a minimum and infrequently-used/larger images can instead be grabbed through our afs-backed mirrors or our caching reverse proxies14:04
odyssey4mefungi it's probably a bit big to cache, and putting into the afs mirror or reverse proxying might work fine14:04
*** marst has joined #openstack-infra14:05
fungifor example, kolla publishes their images onto tarballs.o.o and then we have a reverse proxy they pull them through in each provider/region14:05
fungiand their largest images are over 4gib14:05
*** slaweq_ has quit IRC14:06
fungithe ones we cache onto our image filesystems are more things like cirros which if memory serves is in the tens of mib14:06
mtreinishfungi: it's 13MB14:09
*** mriedem has joined #openstack-infra14:09
*** slaweq has quit IRC14:10
fungicool, i was within margin of error/order of magnitude anyway ;)14:10
*** sree has quit IRC14:10
*** sree has joined #openstack-infra14:11
mtreinishwell at least on x86_64, maybe other arches are bigger :)14:11
odyssey4mefungi oh no, let me check on the size - but it's less than 300MB IIRC14:13
*** xinliang has joined #openstack-infra14:13
*** rbrndt has joined #openstack-infra14:14
odyssey4mefungi ah it seems it's around ~90MB per platform14:14
fungiwe carve out 100gib for the afs cache and another separate 100gib for the apache reverse proxy cache now, so should have plenty of room to cache things local to workers either way, but files in the neighborhood of 100mib is probably pushing the bounds of what we'd want to cache unless a substantial percentage of all jobs we're running will use it14:15
*** jtomasek has quit IRC14:15
odyssey4mefungi so it'd be preferred as a file on AFS, rather than a reverse proxy?14:16
fungithat mostly depends on how often the file is expected to change14:17
odyssey4mefungi we'd be happy to refresh it daily, or even weekly14:17
openstackgerritClaudiu Belu proposed openstack-infra/project-config master: cloudbase-init: Adds releasenotes jobs  https://review.openstack.org/49182114:17
fungiis this something you're producing and reconsuming, or something you're consuming which is published outside our ci system by some other community (and how often, roughly)?14:18
odyssey4mefungi it's the base lxc cache which is published once every 6 hours IIRC onto images.lxccontainers.org14:18
odyssey4mesorry - images.linuxcontainers.org14:19
fungithat seems like a better fit for the reverse proxy cache, yeah14:19
odyssey4meyeah, that would be a lot easier for us to consume I think, because we'd still be able to use the API instead of creating a code path to use a file path or custom URL14:19
fungiwell, it'll still be a custom url because it's not a transparent proxy14:20
odyssey4meyep, but we have a code path for that already14:20
fungioh, awesome14:20
odyssey4meso, how do I add a new reverse proxy?14:20
fungitwo places need updating:14:20
fungihttp://git.openstack.org/cgit/openstack-infra/system-config/tree/modules/openstack_project/templates/mirror.vhost.erb14:21
fungihttps://git.openstack.org/cgit/openstack-infra/project-config/tree/nodepool/scripts/configure_mirror.sh14:21
*** Guest13936 is now known as med_14:21
*** med_ has quit IRC14:21
*** med_ has joined #openstack-infra14:21
*** med_ is now known as medberry14:21
fungiit should be pretty clear from surrounding context what needs to be added, but if you have questions then ask away14:22
odyssey4methanks fungi - I'll take a closer look shortly and ping any further questions, thanks so much for your expertise and assistance14:23
*** jpena|off is now known as jpena14:25
fungiodyssey4me: just glad i could help14:25
hongbinhi, i want to know if there is a way to dump non-devstack systemd logs (i.e. docker logs) to the gate, i tried to do this: https://review.openstack.org/#/c/480306/1/contrib/post_test_hook.sh , but it looks like the logs are not there if the job is killed by a timeout14:26
*** admcleod_ is now known as admcleod14:26
openstackgerritMatthew Treinish proposed openstack-infra/subunit2sql master: Switch to using stestr  https://review.openstack.org/49107414:27
fungidimak: i think we're just backlogged. the osic environment was finally turned off last week, we've got a couple of citycloud regions offline for different issues, and our voucher for ovh expired so we're waiting for that to get re-upped14:28
openstackgerritMatthew Treinish proposed openstack-infra/subunit2sql master: Update python3 versions in tox.ini envlist  https://review.openstack.org/49182714:29
fungithe post pipeline's only about 4 hours behind, so the situation's not terrible (yet anyway)14:29
openstackgerritMatthew Treinish proposed openstack-infra/subunit2sql master: Switch to using stestr  https://review.openstack.org/49107414:34
openstackgerritMatthew Treinish proposed openstack-infra/subunit2sql master: Update python3 versions in tox.ini envlist  https://review.openstack.org/49182714:34
*** felipemonteiro_ has joined #openstack-infra14:37
*** armax has joined #openstack-infra14:37
*** jpena is now known as jpena|off14:38
*** florianf has quit IRC14:40
*** alexchadin has joined #openstack-infra14:42
*** medberry is now known as med_14:43
*** LindaWang has quit IRC14:43
*** slaweq has joined #openstack-infra14:43
*** dtantsur is now known as dtantsur|brb14:44
*** katkapilatova has left #openstack-infra14:45
*** alexchadin has quit IRC14:47
*** gyee has joined #openstack-infra14:48
openstackgerritMerged openstack/os-client-config master: Update globals safely  https://review.openstack.org/49161814:48
*** slaweq has quit IRC14:48
*** links has joined #openstack-infra14:48
*** links has quit IRC14:49
*** cshastri has quit IRC14:50
openstackgerritsebastian marcet proposed openstack-infra/openstackid-resources master: External Calendar Sync  https://review.openstack.org/48768314:53
*** florianf has joined #openstack-infra14:53
*** slaweq has joined #openstack-infra14:53
*** annegentle has joined #openstack-infra14:56
*** xarses_ has joined #openstack-infra14:57
clarkbsshnaidm: hi14:58
*** EricGonc_ has quit IRC14:58
*** slaweq has quit IRC14:59
sshnaidmclarkb, do we have any problem with logs now? EmilienM told me you have some issues14:59
clarkbodyssey4me: fungi note that if the lxc images arent served with ttls they will be cached for roughly 24 hours, which is 4x their update cycle15:00
clarkbsshnaidm: yes there are still ~27 copies of /etc in every job15:00
sshnaidmclarkb, which job? do you have an url?15:00
*** dmsimard is now known as dmsimard|afk15:00
odyssey4meclarkb our issue for testing is not really getting the latest image, but getting one at all15:00
*** EricGonczer_ has joined #openstack-infra15:01
odyssey4mebetween the dns failures, and slow download speeds, we're not getting them reliably done and getting job timeouts/failures15:01
odyssey4meso we're hoping just to get something more reliable in place15:01
*** psachin has quit IRC15:01
clarkbodyssey4me: sure just noting that that may be a drawback15:02
clarkbsshnaidm: gate-tripleo-ci-centos-7-undercloud-containers is the one I looked at but assuming the others are that way too15:02
odyssey4meclarkb appreciate the heads up - for us it won't be an issue15:02
fungiclarkb: odyssey4me: if it becomes an issue, convincing the hosts of that image to start employing a cache ttl header is probably not an entirely wasted effort either15:03
*** slaweq has joined #openstack-infra15:04
*** gyee has quit IRC15:04
*** pradk has joined #openstack-infra15:05
*** jrist has joined #openstack-infra15:05
*** pradk has quit IRC15:07
*** mattmceuen has joined #openstack-infra15:07
sshnaidmclarkb, fyi https://bugs.launchpad.net/tripleo/+bug/170933915:07
openstackLaunchpad bug 1709339 in tripleo "CI: duplicate /etc directories in logs for containers" [Critical,Triaged]15:07
clarkbsshnaidm: http://logs.openstack.org/95/480395/5/check/gate-tripleo-ci-centos-7-undercloud-containers/1a14f5d/logs/var/log/extra/docker/containers/neutron_ovs_agent/etc/ http://logs.openstack.org/95/480395/5/check/gate-tripleo-ci-centos-7-undercloud-containers/1a14f5d/logs/var/log/extra/docker/containers/mysql/etc/15:08
clarkbhttp://logs.openstack.org/95/480395/5/check/gate-tripleo-ci-centos-7-undercloud-containers/1a14f5d/logs/var/log/extra/docker/containers/mistral_executor/etc/ and so on15:08
sshnaidmclarkb, I see, it's described in the bug I submitted right now15:09
*** jascott1 has joined #openstack-infra15:09
clarkbsshnaidm: its not just for the containers btw http://logs.openstack.org/95/480395/5/check/gate-tripleo-ci-centos-7-undercloud-containers/1a14f5d/logs/etc/15:09
*** slaweq has quit IRC15:10
sshnaidmclarkb, this directory is from main subnode, it's not duplicated15:11
openstackgerritsebastian marcet proposed openstack-infra/openstackid-resources master: External Calendar Sync  https://review.openstack.org/48768315:13
clarkbsshnaidm: but it is, because its the same stuff from the containers15:13
clarkbsshnaidm: and its got redundant info we should never be collecting like DIR_COLORS15:14
clarkbsshnaidm: we need to stop collecting all of that15:14
clarkbmake 1 copy of the necessary data and thats it15:14
sshnaidmclarkb, yeah, but problem is collecting /etc in containers, not this one15:14
*** jascott1 has quit IRC15:14
clarkbits both...15:14
sshnaidmclarkb, we need 1 /etc directory anyway15:14
clarkbyes but we don't need all the extra crap in it15:14
sshnaidmclarkb, if it takes 1KB, I'm not sure it's worth the effort to overcomplicate the code15:15
clarkbsshnaidm: but it is15:16
clarkbbecause we've already had problems where overcollecting results in grabbing potentially massive content you don't want15:16
clarkbthis is why we keep asking you to only copy what you want15:16
clarkbrather than copying everything then reducing from everything15:16
openstackgerritsebastian marcet proposed openstack-infra/openstackid-resources master: Fixed Typo on Summit Service  https://review.openstack.org/49183615:17
sshnaidmclarkb, we excluded everything you told us to exclude last time: https://github.com/openstack-infra/tripleo-ci/blob/master/toci-quickstart/config/collect-logs.yml#L35-L6415:17
clarkbsshnaidm: right but we've also asked you to invert the way you collect logs and only collect what you want15:17
clarkbsshnaidm: so first step was stop collecting absolutely everything but moving forward we should be collecting what we want/need explicitly15:18
openstackgerritMerged openstack-infra/openstackid-resources master: Fixed Typo on Summit Service  https://review.openstack.org/49183615:18
clarkbbut also you are still collecting all of /etc multiple times including things like dir colors and bashcompletion which we've asked you to stop for weeks now15:18
sshnaidmclarkb, there are too many projects in tripleo together, so it's not always possible to maintain a relevant up-to-date list; please consider the fact that it's not one project like you usually have, but a lot of them15:19
clarkbyes that is the same situation we are in for devstack-gate and it hasn't been a problem15:19
sshnaidmclarkb, I'm not sure it's the same situation15:20
clarkbsshnaidm: and I'd like you to consider that you have effectively ddos'd our filesystem multiple times15:20
fungisshnaidm: why not put together a list of files you've needed to look at when troubleshooting job failures for those in the past? certainly you haven't looked at every copy of every file in /etc?15:20
clarkbone single job uses 10% of all our disk15:20
clarkbmore than the next two jobs combined15:20
sshnaidmclarkb, we're talking now about files a few KB in size; that's not what kills the fs15:20
clarkbsshnaidm: no the collect everything attitude is what kills the fs15:20
clarkbsshnaidm: because when centos 7.4 happens and some new thing sneaks in we break again15:21
clarkband then again for 7.5 and then 8 and so on15:21
clarkbif instead you collect what you need this risk is greatly reduced15:21
fungiit's better to realize you're not collecting a file you need and then make a change to start including it than to collect files you won't ever need15:21
sshnaidmclarkb, not sure I understand how centos is related to collecting /etc files15:21
clarkbsshnaidm: because the contents of /etc will change as centos changes over time15:22
fungisshnaidm: because each new update or release of the distro can move files around in /etc or add new ones15:22
sshnaidmclarkb, yeah, but how would it break anything?15:22
*** Julien-zte has quit IRC15:22
clarkbsshnaidm: if a large file shows up all of a sudden we fill the disk again just like we already did with the java stuff15:22
fungiwhen rh decides it should include some new large set of files in /etc you start collecting that automatically and fill up our logserver again15:22
*** sbezverk has joined #openstack-infra15:23
fungilet me put this another way... if we stopped hosting logs for tripleo jobs, we could provide the community with several months of log retention instead of just one15:23
fungido 2/3 of our community benefit from being able to look at logs for tripleo job failures?15:23
sshnaidmfungi, yes, because most projects use one dir for logs and one /etc folder, because they have only one process15:24
*** iyamahat has joined #openstack-infra15:24
fungisshnaidm: that's not an answer to my question15:24
sshnaidmfungi, they would15:24
*** sbezverk_ has joined #openstack-infra15:24
sshnaidmfungi, because we test their projects too15:24
fungiwhat percentage of our community do you think look at logs for tripleo jobs? i doubt it's even close to the proportion of space you're using on the logs site15:25
sshnaidmfungi, every project which is part of tripleo will benefit from our jobs15:25
*** Julien-zte has joined #openstack-infra15:25
sshnaidmfungi, we have jobs running in about 10 other projects; some of them are voting, some are not, and some are experimental15:25
*** Julien-zte has quit IRC15:26
*** slaweq has joined #openstack-infra15:26
*** marst_ has joined #openstack-infra15:26
sshnaidmfungi, and we are working to have many more voting and relevant jobs there to prevent failures and to help with integration15:26
*** marst has quit IRC15:27
*** iyamahat has quit IRC15:27
sshnaidmfungi, it's neutron, nova, ironic, etc, etc, and tripleo jobs for some of them are the only way to be tested in "real life"15:27
sshnaidmso yes, I think we do something useful for all community15:27
openstackgerritsebastian marcet proposed openstack-infra/openstackid-resources master: External Calendar Sync  https://review.openstack.org/48768315:27
*** sbezverk has quit IRC15:28
clarkbsshnaidm: the point is that we do that in other jobs as well without redundantly copying unnecessary data in every job15:28
clarkbsshnaidm: you can do both things, they do not conflict with each other15:28
*** vhosakot has joined #openstack-infra15:28
sshnaidmclarkb, I handled the problem with all these /etc, it's a bug and will be solved, but I'm against whitelist of logs15:30
*** dtantsur|brb is now known as dtantsur15:30
*** slaweq has quit IRC15:30
sshnaidmclarkb, from all my investigations in recent years, it's really hard to determine which log will give you the info; it could be any of them15:30
clarkbsshnaidm: there is no reason to collect bash completion or dir colors and so on15:30
clarkbwhat is the argument that you need those?15:31
*** lrossetti_ has quit IRC15:31
sshnaidmclarkb, right, we don't need this, I can add them to exclude list right now15:31
*** lrossetti has joined #openstack-infra15:31
sshnaidmclarkb, maintaining a config list for tens of services is much more complicated and a source of breakages and failures15:32
clarkbbut it isn't...15:32
clarkbwe have done it successfully for years15:32
*** lrossetti has quit IRC15:33
fungidevstack-gate specifically has done it for years15:33
*** camunoz has quit IRC15:34
sshnaidmclarkb, fungi ok, I will raise this question on next tripleo meeting, please come and let's discuss there, I hope we'll find something suitable for everybody15:34
sshnaidmclarkb, fungi does it work for you?15:34
clarkbtuesday at 1400UTC is a bit early for me but I can try15:34
clarkband I think fungi is traveling that day15:34
clarkbsshnaidm: I can do my best to get up early15:36
*** slaweq has joined #openstack-infra15:36
*** rhallisey has quit IRC15:37
jeblairwhat's the current disk space used per build?15:37
clarkbseems to be between ~75MB and 100MB based on the job15:37
clarkb(so it is a massive improvement over where we were, but it would be nice to make it robust so that we don't have to worry as much about it exploding in the future)15:38
*** ccamacho has quit IRC15:38
*** ccamacho has joined #openstack-infra15:38
jeblairk15:38
fungipart of it is also the number of tripleo jobs and number of times they're run multiplied by their average size15:38
*** tosky has quit IRC15:40
*** tosky has joined #openstack-infra15:40
clarkbbasically we know that collecting everything is problematic because you end up with what you don't expect (there are multiple cases of this) so to avoid it in the future I personally would like to see a more whitelist approach to collecting logs than a grab everything + blacklist15:41
fungidoing a quick analysis of the sample data clarkb collected, jobs with "tripleo" in the name account for 33% of the data we're storing right now, so trying to figure out how to get that reduced15:41
clarkbwith a single job (the one linked to above) being ~10% of the total15:42
clarkbwhich is more than the next two jobs combined15:42
*** armax has quit IRC15:42
*** iyamahat has joined #openstack-infra15:42
*** slaweq has quit IRC15:42
*** armax has joined #openstack-infra15:43
fungiat least it's down from previously, where some 70% of the data we were storing were tripleo job logs15:43
*** pstack has joined #openstack-infra15:43
*** dougwig has joined #openstack-infra15:44
*** e0ne has quit IRC15:45
sshnaidmclarkb, fungi I added the item, if you can please join, if not - I'll present your point: https://etherpad.openstack.org/p/tripleo-meeting-items15:46
*** markus_z has quit IRC15:46
fungithanks sshnaidm!15:46
fungiand clarkb is correct, i'll be driving a car during that next meeting15:47
*** camunoz has joined #openstack-infra15:47
fungiotherwise i would gladly attend15:47
*** jamesdenton has quit IRC15:49
*** jamesdenton has joined #openstack-infra15:50
*** slaweq_ has joined #openstack-infra15:52
*** krtaylor has quit IRC15:54
*** yamamoto has quit IRC15:54
*** hamzy has quit IRC15:55
*** yamamoto has joined #openstack-infra15:58
*** felipemonteiro__ has joined #openstack-infra15:58
mwhahahaquestion, so i seem to have puppet-tripleo-puppet-unit logs in my puppet-mistral-puppet-lint results: http://logs.openstack.org/52/491352/1/gate/gate-puppet-mistral-puppet-lint/9b68f25/console.html#_2016-12-16_09_24_02_24095016:00
mwhahahaalso they are results from 201616:01
mwhahahaany thoughts on how that happened?16:02
*** felipemonteiro_ has quit IRC16:02
clarkbthe timestamp on the file itself is from today, the 8th of august; possibly a node booted with a bad clock, resulting in the 2016 problem16:02
pabelangerya, that is odd16:03
*** yamamoto has quit IRC16:03
mwhahahahttp://logs.openstack.org/52/491352/1/gate/gate-puppet-mistral-puppet-lint/9b68f25/console.html#_2016-12-16_09_29_54_87389616:03
mwhahahai like the success/failure/failure16:04
clarkband the other builds there have the correct content so it isn't a consistent problem16:04
mwhahahawonder if there's a node that never fully cleared or something16:04
*** jamesmcarthur has joined #openstack-infra16:04
mwhahahabecause it looks like it's got a fail from back in march as well16:04
clarkbthinking about how console logs work, could it be a uuid collision? (that seems very unlikely, but we are only using the short version of the uuid in the path, at least)16:05
pabelangerya16:05
clarkbjeblair: ^16:05
pabelangerit also doesn't explain why http://logs.openstack.org/52/491352/1/gate/gate-puppet-mistral-puppet-lint/9b68f25/_zuul_ansible/scripts/07-4094c726a11441b9b73ac0c6dde28be6.sh was actually called16:05
pabelangerbecause console log is different16:06
clarkbpabelanger: ya that's why I'm wondering if it's a collision on the uuids and we ended up copying some old file left around on the launcher maybe16:06
pabelangermaybe16:06
clarkbexcept we clear those out too don't we? they don't have a life on the launcher beyond the job?16:06
pabelangerlet me look at zl0216:07
pabelangersee if anything is odd16:07
*** Apoorva has joined #openstack-infra16:07
clarkbmwhahaha: I highly doubt that the test node itself managed to survive for 8 months. Nodepool is pretty good about keeping things cleaned up after its timeouts so 8 months would be a long time to survive16:08
pabelangerclarkb: look at the node ID16:08
mwhahaha¯\_(ツ)_/¯ stranger things have happened :D16:08
pabelangercentos-7-infracloud-chocolate-623113816:08
pabelangerthat is way wrong16:08
pabelangercentos-7-infracloud-vanilla-1032497516:09
pabelangerwhat is should have been16:09
pabelangerso, I wonder if we somehow booted an old VM again in infracloud16:09
clarkbpabelanger: ya that and the appended statuses makes me think there may be a collision somewhere and we end up picking up the old data16:09
clarkboh it could be cached on the remote somehow hrm?16:10
*** armax has quit IRC16:10
*** armax has joined #openstack-infra16:11
openstackgerritGabriele Cerami proposed openstack-infra/tripleo-ci master: WIP: containers periodic test  https://review.openstack.org/47574716:11
openstackgerritSlawek Kaplonski proposed openstack-infra/project-config master: Add QoS service plugin to be enabled in shade tests  https://review.openstack.org/49126616:12
clarkbin any case I think it likely is running the correct job but when the console log is collected we are grabbing the old file somehow16:12
clarkbmwhahaha: ^16:12
mwhahahawell it failed after passing the check so who knows16:12
mwhahahai rechecked and we'll see16:12
pabelangerclarkb: mwhahaha: so, job timed out for some reason. And zuul killed ansible, collected logs16:12
pabelangerso, possible something was wrong with node16:13
pabelangerSSH hostkeys didn't change, so it was the right node16:13
clarkbwe don't check hostkeys though16:13
pabelangerwhich makes me think it was just a collision with logs16:13
pabelangerclarkb: ya, we set them up16:13
clarkbin 2.5?16:13
clarkbpretty sure we don't16:14
pabelangerya, 1 sec16:14
clarkbbased on the sequence number of the node, that node definitely does look like it would've been booted in december though. So I think we are just getting logs from december somehow16:14
pabelangerclarkb: http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/launcher/ansiblelaunchserver.py#n139916:15
*** EricGonczer_ has quit IRC16:15
clarkbpabelanger: http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/launcher/ansiblelaunchserver.py#n128316:16
fungii wonder if it could have been on a down hypervisor until very recently, and we ran into an ip address collision rather than a uuid collision (which seems far less likely)16:16
clarkbso we don't have any idea what the hostkey should be we just grab whatever we get and trust it16:16
clarkbso we aren't really checking it in a way to know we got the right node16:16
pabelangerclarkb: Right, we keyscan to make sure node doesn't disappear between playbook runs16:16
pabelangerbut you are right, we blindly assume16:16
clarkbfungi: that is an interesting theory16:17
clarkbfungi: basically arp wins on old hosts and we pick up preexisting console log there?16:17
fungithough still surprising that nodepool's cleanup wouldn't have dealt with it given the way we tag instances16:17
pabelangerclarkb: I wonder if we should try to match node via hostname too? Nodepool sets it centos-7-infracloud-vanilla-10324975, we could then have ansible task validate correct hostname16:18
fungiit should be deleting old nodes it finds in the server list even if it has lost track of them in its db16:18
clarkbfungi: ya but that runs on a 15 minute cron iirc so there is a window where your theory could happen16:18
clarkbsmall window but possible16:18
pabelangeror some other form of meta data in config-drive16:18
clarkbpabelanger: the idea would be for nova/neutron to provide the hostkey to us then we check that16:18
clarkbpabelanger: that work is slowly in progress aiui but nothing we can control today unfortunately16:19
pabelangerclarkb: ya16:19
fungithere is also the possibility to have glean echo the hostkey to the console on boot and then get nodepool to scrape it from the nova console log, but not all our providers support the necessary api method16:20
*** ccamacho has left #openstack-infra16:20
*** ykarel has quit IRC16:22
clarkbwe should be able to do a nova list and see if any nodes with ancient sequence numbers show up /me does this16:22
*** krtaylor has joined #openstack-infra16:23
*** lucasagomes is now known as lucas-afk16:24
clarkbubuntu-xenial-infracloud-vanilla-889531316:25
clarkbubuntu-xenial-infracloud-chocolate-891163216:25
clarkbthose may be held nodes?16:25
clarkbeverything else looks fairly new16:26
fungiwhat are the possibilities that there could be a lost instance which isn't tracked in nova's db, and so is squatting an ip address but never getting cleaned up since it doesn't appear in the server list?16:27
clarkbnope they've been in a delete state for 81 and 78 days16:27
*** ggillies_ has quit IRC16:27
clarkbfungi: I'm guessing it is theoretically possible16:27
*** pcaruana has quit IRC16:27
clarkbbut don't know enough about nova to know under what circumstances that could happen if any16:27
fungihere in openstack, anything is theoretically possible!16:27
clarkbwe could run a virsh list --all and compare16:28
*** ggillies has joined #openstack-infra16:30
*** dizquierdo has quit IRC16:31
*** rcernin has quit IRC16:31
*** dizquierdo has joined #openstack-infra16:31
*** slagle has quit IRC16:31
*** tesseract has quit IRC16:31
pabelangerfungi: any feedback from OVH and our collections?16:32
fungipabelanger: i hadn't seen anything back from jean-daniel yet as of a few minutes ago16:33
*** kjackal_ has quit IRC16:33
fungilooking back through the discussion history from the last time this happened, our most recent voucher expired in january and we began to get notifications at that time16:34
fungibetween them being in french and going to infra-root@ address which nobody was monitoring regularly until i started keeping an eye on it a couple months ago, i didn't realize these were how we're supposed to know it's time to re-up the voucher16:34
pabelangerack16:35
fungiso once we get this squared away, the _next_ time we start getting messages in french from ovh to infra-root@ we should reach out to jean-daniel at that point to ask to have the voucher re-upped16:36
*** markvoelker has quit IRC16:36
fungibut it would be great if more people than just me set up their imap clients to keep an eye on the various mailboxes under that account too16:36
clarkbscanning libvirt for rogue instances is not as easy as it sounds (or I'm missing the virsh command that tells you what the nova uuid is)16:38
*** LindaWang has joined #openstack-infra16:39
clarkbnova doesn't set instance descriptions16:40
openstackgerritMerged openstack-infra/irc-meetings master: Changed to correct chairs for the publiccloud_wg  https://review.openstack.org/49176916:40
fungii smell a ranty summit talk in the works16:40
clarkbaha! virsh list --uuid16:41
*** bh526r has quit IRC16:41
*** slaweq_ has quit IRC16:43
*** dmsimard|afk is now known as dmsimard16:43
*** LindaWang has quit IRC16:43
*** alexchadin has joined #openstack-infra16:44
*** pstack has quit IRC16:44
*** slaweq has joined #openstack-infra16:45
*** shardy has quit IRC16:46
*** voipmonk has left #openstack-infra16:47
*** annegentle has quit IRC16:49
*** alexchadin has quit IRC16:49
openstackgerritMatthew Treinish proposed openstack/os-testr master: Switch to stestr under the covers  https://review.openstack.org/48844116:50
*** rhallisey has joined #openstack-infra16:50
*** Apoorva_ has joined #openstack-infra16:50
*** derekh has quit IRC16:52
*** rhallisey has quit IRC16:52
*** yamahata has quit IRC16:52
*** rhallisey has joined #openstack-infra16:52
*** iyamahat has quit IRC16:53
*** Apoorva has quit IRC16:53
*** pstack has joined #openstack-infra16:53
*** ralonsoh has quit IRC16:53
*** trown is now known as trown|lunch16:54
clarkbfungi: pabelanger http://paste.openstack.org/show/617807/ that is what we have on reachable hypervisors16:58
clarkbnow to cross check against the nodepool logs to see if any don't belong16:58
*** camunoz has quit IRC16:58
fungii have a feeling i'm going to be disappointed but still unsurprised by the result16:58
*** camunoz has joined #openstack-infra16:59
*** baoli has quit IRC17:00
openstackgerritChris Dent proposed openstack-infra/project-config master: Publish placement-api-ref  https://review.openstack.org/49186017:00
*** slaweq has quit IRC17:02
clarkbfungi: pabelanger http://paste.openstack.org/show/617809/ a few of them don't show in today's logs. Will cross check those against nova listings next17:03
clarkband I guess nodepool listings as they may be older than today17:03
*** tosky has quit IRC17:04
*** rwsu has quit IRC17:04
clarkbhttp://paste.openstack.org/show/617811/ is that cleaned up a bit17:06
clarkbonly one of those shows up in nodepool listings17:08
clarkbnow we check nova17:08
*** baoli has joined #openstack-infra17:09
*** iyamahat has joined #openstack-infra17:10
fungifunny, ovh says our instance with ip address 158.69.77.16 was reported conducting a brute force attack against someone's ssh server at 00:38:10 CEST today17:13
*** annegentle has joined #openstack-infra17:13
fungii can't find evidence that nodepool's booted any instance with that ip address in the past ~10 days of launcher debug logs17:15
fungiand it's not an ip address for anything in our ansible inventory17:16
clarkband I can't ssh to it implying it never was one of ours17:16
clarkbor rather if it still was around it wasn't ours17:16
clarkbfungi: pabelanger http://paste.openstack.org/show/617815/ all of those VMs appear to be leaked. Actually now that I say that I didn't find which one is our mirror node so need to clean it out of the list17:17
*** dtantsur is now known as dtantsur|afk17:17
clarkbhttp://paste.openstack.org/show/617816/ doesn't include the mirror17:18
*** sree has quit IRC17:18
clarkbcan you maybe double check that list and make sure I'm not missing some other VMs? but I think next step is dumpxml on them to see if we can get any more info about why they exist then possibly virsh destroy them17:19
clarkband virsh undefine them17:19
*** sree has joined #openstack-infra17:19
clarkbfungi: I also think ^ lends weight to your IP addr theory17:19
*** dizquierdo has quit IRC17:19
clarkbthat first node says <nova:creationTime>2016-12-15 16:34:33</nova:creationTime>17:20
* fungi sighs17:20
*** bobh has quit IRC17:21
clarkbwhich is suspiciously close to the log timestamp that mwhahaha pointed out17:21
clarkbalso it is a nodepool host according to dumpxml17:21
*** rbrndt has quit IRC17:21
fungiyeah, i have a sinking feeling something happened to the environment around that time and whatever was done to recover from it caused us to lose track of those17:21
fungigiven the close clustering of timestamps17:21
*** baoli has quit IRC17:22
fungicould also explain why we've been getting a little less performance out of it than we thought we should for the number of instances we were booting i suppose (though with them being idle, probably not)17:22
*** baoli has joined #openstack-infra17:22
*** sree has quit IRC17:23
clarkbI'll gather what info I can for each one but ya I think we just delete them if nova doesn't know about them17:23
clarkb(so please help me double check that aspect of it)17:23
*** hamzy has joined #openstack-infra17:24
fungii agree, it's more a warning that we should have some way of spotting leaks in those clouds17:24
*** electrofelix has quit IRC17:24
fungii don't see much reason to keep them, though i also don't know as much about what else might be a vm in that environment17:24
*** kjackal_ has joined #openstack-infra17:24
fungidoes the bifrost deployment create virtual machines on the hypervisor nodes outside nova's control?17:25
fungiseems unlikely, but i'm not too familiar with its architecture17:25
fungialso, i guess if we log into each of them and they all look like test nodes, then deletesky17:26
fungiany way to easily tease hostnames out of them?17:26
clarkbI'm pretty sure bifrost doesn't17:28
clarkbfungi: not sure, but we can in theory attach to their consoles17:28
*** jamesmcarthur has quit IRC17:28
*** sambetts is now known as sambetts|afk17:29
*** spzala has quit IRC17:29
clarkbof the 4 VMs I have dumpxml'd 3 are from 12/15 and one is from 12/1417:31
clarkband they all use flavor nodepool17:31
clarkbfungi: I'm not able to connect to the console, get error: internal error: character device console0 is not using a PTY so that may not be possible17:33
clarkboh that is because nova redirects it to a file which we can read17:33
clarkbthe one on 009 is ubuntu-xenial-infracloud-chocolate-620515717:33
clarkbnow to cross check all these against mwhahaha's log17:33
mwhahahadid i find some long lost vms? :D17:34
*** yamahata has joined #openstack-infra17:34
fungimwhahaha: you taught us that apparently nova leaks like a sieve ;)17:34
mwhahaha:o17:35
funginot really like a sieve. looks like we had some issue back in mid-december that caused us to lose track of some dozen or so instances in that cloud17:35
clarkbugh the one on 12 appears to be a failed boot and its just appending to its log file constantly17:35
mwhahahasounds like we need to invest in some flex tape17:35
*** pstack has quit IRC17:35
fungimwhahaha: if it works like in the infomercials, i'll pick up a few cases17:36
*** florianf has quit IRC17:36
clarkbit is 4GB large17:36
*** 94KAA7YW9 has joined #openstack-infra17:37
*** sbezverk_ has quit IRC17:37
*** markvoelker has joined #openstack-infra17:37
clarkbcentos-7-infracloud-chocolate-6200254 on 01117:37
clarkbubuntu-xenial-infracloud-chocolate-6200221 on 01317:38
pabelangerclarkb: wow, nice work17:38
clarkbubuntu-xenial-infracloud-chocolate-6193047 on 02817:39
clarkbubuntu-xenial-infracloud-chocolate-6192475 on 02617:40
*** sbezverk has joined #openstack-infra17:42
clarkbthe node on 024 isn't running17:42
clarkbubuntu-xenial-infracloud-chocolate-6198347 on 03617:43
*** markvoelker has quit IRC17:44
*** slagle has joined #openstack-infra17:45
*** alexchadin has joined #openstack-infra17:45
* clarkb stops posting all of them here (I think this is enough info to show the leaks are from nodepool and such)17:45
*** annegentle has quit IRC17:45
fungiagreed, no need to keep any of those in my opinion17:46
*** pradk has joined #openstack-infra17:48
clarkbI don't see the node that ran mwhahaha's job though. So possibly that is the one that is appending to its console log such that I can't really see what it is. Going to try and grep through that log now17:49
*** alexchadin has quit IRC17:49
clarkbsdague: dansmith in the case of nova "leaking" libvirt VMs. Is it safe to virsh destroy and undefine the nodes under nova? Or are there bits of the database we should check again?17:51
*** Swami has joined #openstack-infra17:52
*** pradk has quit IRC17:52
*** bobh has joined #openstack-infra17:52
clarkbI am going to start by virsh shutdowning the instances so they stop trying to do things17:52
clarkbor maybe even that isn't safe it if gets nova out of sync somewhere?17:53
*** florianf has joined #openstack-infra17:53
dansmithclarkb: nova is leaking libvirt vms? how?17:54
fungiseems to me like nova's already out of sync?17:54
clarkbdansmith: we don't know how, but there are VMs from December in infracloud that don't show up in nova listings17:54
mnaserclarkb that behaviour can happen if things get messy in the cloud17:54
clarkbdansmith: they are all from december 14 and 15 so guessing something went sideways17:54
mnaserdo you get warnings in the nova compute log with VM and database count not matching?17:54
clarkbmnaser: let me see17:55
dansmithclarkb: you can tell nova to reap things it doesn't know about, but I also don't know that I've ever seen that happen17:55
fungidansmith: best guess is that cloud suffered some sort of trauma months ago and didn't record these in the db or was unable to fully destroy them when it deleted them. also keep in mind this is still old code (mitaka-based i believe?)17:55
*** tnovacik has joined #openstack-infra17:55
clarkbmnaser: 2017-08-08 17:52:56.780 12787 WARNING nova.compute.manager [req-b15f50fc-47af-4f7e-971a-40d3e232d89e - - - - -] While synchronizing instance power states, found 0 instances in the database and 1 instances on the hypervisor. yup17:55
mnaserthere ya go17:55
mnaserthose warnings will be your big hint17:55
mnaserand as dansmith said there is a setting to delete them but i'm too scared to run that so manual cleanup would be easier17:56
clarkbmnaser: given that indicates the db doesn't know about the instance I should be safe to just destroy it manually ya?17:56
fungigood that we have something to look for in the future17:56
mnaseri mean if you want to verify17:56
mnaservirsh dumpxml <foo> the ID of the VM17:56
mnaserwill be the instance uuid17:56
mnaseryou can cross check that with the nova instances table17:56
mnaserit should be marked as deleted, if it is, you can virsh destroy <foo> and delete remains in /var/lib/nova/instances (that's what i do most of the time and nothing blew up)17:57
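For reference, a rough sketch of the cross-check mnaser describes; the database name, credentials and exact column names here are assumptions from memory rather than anything quoted in the discussion:

    # on the compute host: the libvirt domain uuid is the nova instance uuid
    sudo virsh dumpxml "$UUID" | grep -E '<uuid>|nova:name'
    # on the controller: confirm nova considers the instance deleted
    mysql -e "select uuid, vm_state, deleted from nova.instances where uuid='$UUID'\G"
    # a non-zero "deleted" value means the hypervisor copy is an orphan; it can be
    # virsh destroyed and its /var/lib/nova/instances/$UUID directory removed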
*** jrist has quit IRC17:57
clarkbmnaser: does virsh undefine not clean up /var/lib/nova/instances?17:57
dansmithclarkb: not images17:58
clarkbah ok17:58
clarkbI will start with a shutdown of the instances across the board so they stop running at least then we can go through and clean up17:58
*** pushkaraj__ has joined #openstack-infra17:58
*** 94KAA7YW9 has quit IRC17:58
*** pvaneck has joined #openstack-infra17:59
*** spzala has joined #openstack-infra17:59
*** spzala has quit IRC18:01
*** rbrndt has joined #openstack-infra18:01
*** spzala has joined #openstack-infra18:01
*** makowals has quit IRC18:06
clarkbmnaser: I shouldn't need to modify the nova db at all right? just clean up hypervisor disk contents?18:07
mnaserclarkb correct, if the API is returning "instance does not exist" it means for all the nova knows, that VM is supposed to be terminated18:08
clarkbperfect, thanks18:09
clarkbI'm almost done shutting down/destroying the instances so they stop running. Does anyone else want to look at infracloud and see if we have the same problem there?18:11
clarkbfungi: pabelanger ^18:11
*** makowals has joined #openstack-infra18:13
mnaserclarkb https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L6639-L671418:13
clarkbfungi: pabelanger first step was running `sudo virsh list --all --uuid` against all the reachable hypervisors. Then take that list and remove any nodes that show up in today's nodepool log or in nodepool list data. Then remove our mirror node and cross check against the nova listing18:14
*** slaweq has joined #openstack-infra18:14
pabelangerclarkb: not at the moment, chasing down zuul change queue questions18:14
mnaserlooks like that code was added 4-6 years ago18:14
mnaserand the option to look for is running_deleted_instance_action18:15
mnaserso you can set that to log or shutdown (or reap if you want)18:15
*** kjackal_ has quit IRC18:15
fungiclarkb: i can probably take a look after the infra meeting18:15
clarkbmnaser: oh if shutdown is an option we probably want to set it to that. Thank you18:15
*** EricGonczer_ has joined #openstack-infra18:16
mnaserthe odd thing is it seems to default to reap18:16
clarkb(though I'm finding our images don't shutdown they have to be destroyed... guessing -minimal image builds don't have the acpi bits to handle a graceful shutdown request)18:16
mnaserso it should be deleting them.. somehow18:16
mnaserbut i guess its not18:16
* mnaser shrugs18:16
clarkbya double checking our config we don't seem to set any value18:17
mnaseri dont know enough to know why its not getting reap'd but i know we dont have it set to anything (i think) and we get orphan instances sometimes18:17
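For context, the knob mnaser and dansmith mention lives in nova.conf on the compute nodes; a minimal sketch of what setting it might look like (that our deployment currently leaves it unset is based only on clarkb's comment above):

    [DEFAULT]
    # accepted values: noop, log, shutdown, reap
    running_deleted_instance_action = shutdown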
*** jamesmcarthur has joined #openstack-infra18:18
clarkbok I've got all of them in a non running state18:19
clarkbin chocolate18:19
clarkbmnaser: so you are saying clear out the content in /var/lib/nova/instances and virsh undefine the domains?18:20
pabelangerjeblair: question on changequeue merging: http://logs.openstack.org/33/491633/1/gate/gate-project-config-layout/3bc9763/console.html#_2017-08-08_10_50_23_258797 The reason networking-bagpipe is merged into the other tripleo queue is because it shares a job gate-tempest-dsvm-networking-bgpvpn-bagpipe-ubuntu-xenial with networking-bgpvpn, which in turn shares a job with tripleo-ci?18:20
mnaserclarkb yep, and just to be clear the contents of that specific instance id, not all of /var/lib/nova/instances :p18:20
clarkbya18:21
clarkb/var/lib/nova/instances/$uuid18:21
mnaseryeah that should be okay18:22
clarkbok compute064 is done18:22
clarkbI'm going to tail the nova compute log there and just doulbe check we don't get that warning again or any new errors before going to other hypervisors18:23
*** jamesmcarthur has quit IRC18:23
clarkbmnaser: dansmith thanks for the help18:24
*** jamesmcarthur has joined #openstack-infra18:24
mnasernp18:24
*** trown|lunch is now known as trown18:30
*** jamesmcarthur has quit IRC18:31
*** jamesmcarthur has joined #openstack-infra18:32
*** nicolasbock has quit IRC18:32
clarkblogs look good on 064 and we have successfully booted new instances in that cloud. Going to move forward finishing up cleanup for the rest of these VMs18:34
*** dprince has quit IRC18:36
openstackgerritMathieu Gagné proposed openstack-infra/project-config master: Bump internap-mtl01 capacity to 190  https://review.openstack.org/49188218:37
*** florianf has quit IRC18:39
*** jascott1 has joined #openstack-infra18:39
pabelangermgagne: +218:40
pabelangerand danke!18:40
mgagne=)18:40
*** markvoelker has joined #openstack-infra18:40
fungithanks clarkb and mnaser!18:41
fungibig thanks mgagne!18:42
*** alexchadin has joined #openstack-infra18:46
*** markvoelker has quit IRC18:48
*** EricGonczer_ has quit IRC18:48
*** Apoorva_ has quit IRC18:49
*** Apoorva has joined #openstack-infra18:50
*** alexchadin has quit IRC18:50
*** EricGonczer_ has joined #openstack-infra18:51
clarkbok I think chocolate is all cleaned up assuming my list of leaked VMs was complete18:53
openstackgerritwes hayutin proposed openstack-infra/tripleo-ci master: fix random broken pipe on du command  https://review.openstack.org/49188418:54
clarkblibvirt domains are all undefined and the nova instances dirs for each has been dleeted18:54
*** florianf has joined #openstack-infra18:54
clarkbfungi: I can likely tackle vanilla after meeting and lunch but would be good if more than one person is familiar with this :)18:55
fungiclarkb: i don't disagree, though i've18:56
fungidone so little with infra-cloud so far that my learning curve will be steeeeep18:56
clarkbI'll walk you through it :)18:57
fungimuch appreciated18:57
clarkbthis particular issue isn't too bad especially once mnaser pointed out that log warning we can grep for18:57
fungithere's no tc meeting today, so in an hour i should have time to take it for a spin18:57
clarkbjust lots of listing and cross referencing stuff18:57
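The warning clarkb is referring to is the one quoted earlier from nova-compute; a quick check for it on a compute host might look like this (log path assumed for our deployment):

    sudo grep 'While synchronizing instance power states' /var/log/nova/nova-compute.log | tail -n 5
    # any "found 0 instances in the database and N instances on the hypervisor"
    # hit marks a candidate leaked VM on that host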
jeblairpabelanger: that sounds plausible, though i haven't looked at the details.  that is how the queue merging works.18:58
dmsimardThe publishers I see i project-config are all scp, ftp, afs -- is there no way to run shell inside a publisher ? I see one instance of "postbuildscript" used here: https://github.com/openstack-infra/project-config/blob/master/jenkins/jobs/infra.yaml#L299 but I doubt it works18:58
jeblairdmsimard: yes that's complicated and best avoided.18:59
jeblairdmsimard: super easy in v3.18:59
*** vhosakot has quit IRC18:59
dmsimardjeblair: context is to run log collection outside of the job and inside a publisher instead so that if the job times out, logs are available19:00
fungiit's that (weekly infra team meeting) time again! find us in #openstack-meeting for the next hour19:00
clarkbdmsimard: the way we do that with devstack-gate is to timeout the main test process 5 minutes before the job timeout19:00
clarkbdmsimard: then you have 5 minutes to collect logs19:00
jeblairdmsimard: right.  devstack-gate has support for that.19:00
jeblairdmsimard: otherwise, there isn't a good way for that in v2.19:00
*** slaweq has quit IRC19:00
dmsimardclarkb: yikes19:00
*** vhosakot has joined #openstack-infra19:00
dmsimardokay, then, thanks :)19:00
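A minimal sketch of the pattern clarkb and jeblair describe: bound the test run a few minutes short of the job timeout so log collection still gets to run. The script names and numbers are illustrative, not the actual devstack-gate code:

    #!/bin/bash
    JOB_TIMEOUT_MINUTES=120
    LOG_GRACE_MINUTES=5
    # stop the tests early enough that the log collection step still has time
    timeout -s TERM $(( (JOB_TIMEOUT_MINUTES - LOG_GRACE_MINUTES) * 60 )) ./run-tests.sh
    TEST_RC=$?
    ./collect-logs.sh   # runs whether the tests passed, failed, or hit the inner timeout
    exit $TEST_RC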
*** slaweq has joined #openstack-infra19:01
openstackgerritwes hayutin proposed openstack-infra/tripleo-ci master: fix random broken pipe on du command  https://review.openstack.org/49188419:03
*** vhosakot has quit IRC19:05
*** slaweq has quit IRC19:06
*** sslypushenko_ has joined #openstack-infra19:06
*** slaweq has joined #openstack-infra19:06
*** vhosakot has joined #openstack-infra19:10
*** kjackal_ has joined #openstack-infra19:20
*** baoli has quit IRC19:22
sdaguefungi / clarkb interesting git review edge case I just ran into19:23
sdague3 patch series, in merge conflict19:23
sdaguerebase the first patch in gerrit ui, second is a merge conflict so you can't19:23
sdaguepull them down, rebase on master19:23
sdaguegit review... failed19:23
sdaguebecause the bottom ref did not change19:23
sdagueit will not push the other two19:24
clarkbsdague: what you can do is rebase the other two onto the updated base and that should work19:24
fungistrange, if the bottom patch isn't different from what's already in gerrit, it should ignore it and push the others19:24
clarkbbecause then it will have the same sha1 and not attempt a zero delta update (it will just recognize it as existing)19:24
fungioh, yeah if you somehow changed the bottom sha locally after that19:25
clarkbfungi: its different in its sha1 because of timestamps and such but the patch diff is nil19:25
fungiright19:25
sdaguehttp://paste.openstack.org/show/617823/19:25
sdagueyeh19:25
sdagueI ended up just making a random change in the base patch in the gerrit ui19:25
fungii agree that's a tough one to automate away19:25
sdaguethen pushed over19:25
sdagueyeh, it's definitely very edge case19:25
clarkbyou don't need to make a random change in the base patch19:25
sdaguebut it seemed interesting enough to at least tell someone19:25
clarkbyou just have to rebase second and third patch on what is in gerrit for first patch19:26
sdagueclarkb: sure, but that's actually more work than gerrit random change && git review19:26
clarkbthis is also why git review -x can be problematic because you can easily end up with updates to changes that are considered nil changes19:26
clarkbits also not something git review can really do anything about, its gerrit behavior19:26
sdagueyep, that's fine19:27
sdaguelike I said, it's just an interesting edge condition19:27
fungimore work, but does avoid yet one more patchset on that change at least19:29
fungiso maybe a tradeoff19:30
openstackgerritsebastian marcet proposed openstack-infra/openstackid-resources master: External Calendar Sync  https://review.openstack.org/48768319:30
sdagueyeh, at this point I was optimizing for time19:32
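A sketch of the workaround clarkb suggests for sdague's edge case: leave the base change exactly as gerrit has it and reparent only the two children onto it. The change number and commit counts are illustrative:

    git review -d 123456               # download gerrit's current patchset of the base change
    BASE=$(git rev-parse HEAD)
    git checkout my-topic              # local branch holding the rebased 3-patch series
    git rebase --onto "$BASE" HEAD~2   # assumes the two children are the top two commits
    git review                         # base sha now matches gerrit, so only the children push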
openstackgerritMonty Taylor proposed openstack-infra/project-config master: Add mapping file containing v2 to v3 mappings  https://review.openstack.org/49180419:39
*** markvoelker has joined #openstack-infra19:44
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add comments about base jobs  https://review.openstack.org/49189719:45
openstackgerritsebastian marcet proposed openstack-infra/openstackid-resources master: External Calendar Sync  https://review.openstack.org/48768319:49
*** markvoelker has quit IRC19:51
*** pushkaraj__ has quit IRC19:56
clarkbfungi: with meeting winding down. `OS_CLIENT_CONFIG_FILE=/etc/openstack/all-clouds.yaml venv/bin/openstack --os-cloud admin-infracloud-vanilla compute service list` is what you run to get a list of all the nova services, we want to filter out the compute hosts from that. Then for each up compute host I ran an ssh for loop to get `echo $hostname && sudo virsh list --all --uuid`19:56
clarkbfungi: for the computes that are down I manually attempted sshing to them and in chocolate none of them responded19:57
clarkbthen with that list I removed any uuid that showed up in the nodepool launcher log from today and any that show up in nodepool list. Also remove the mirror node19:57
clarkbfungi: then the remaining uuids I ssh'd into each compute host with one of them and ran virsh dumpxml $uuid to get info about the node (shows you the flavor and creation time)19:58
fungiclarkb: from the puppetmaster presumably19:58
clarkbonce confirmed that the instances are old and not needed via dumpxml you do `virsh undefine $uuid` and then delete /var/lib/nova/instances/$uuid19:58
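Condensed, the procedure clarkb walks fungi through looks roughly like this; the cloud name, helper files and paths are illustrative, the openstack step runs from the puppetmaster and the virsh steps run on the relevant compute hosts:

    # 1. list nova services and note which compute hosts are up
    OS_CLIENT_CONFIG_FILE=/etc/openstack/all-clouds.yaml \
      venv/bin/openstack --os-cloud admin-infracloud-vanilla compute service list
    # 2. collect the libvirt domain uuids from every reachable compute host
    for host in $(cat reachable-computes.txt); do
      ssh "$host" 'echo $(hostname) && sudo virsh list --all --uuid'
    done > all-domains.txt
    # 3. drop uuids seen in today's launcher log, in `nodepool list`, and the mirror,
    #    then inspect and remove each survivor on its compute host
    sudo virsh dumpxml "$uuid" | grep nova:creationTime   # sanity-check age first
    sudo virsh destroy "$uuid"                            # only if it is still running
    sudo virsh undefine "$uuid"
    sudo rm -rf "/var/lib/nova/instances/$uuid"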
pabelangerneat, just got Your OpenStack Summit Sydney Registration Code email19:59
clarkbfungi: the nodepool checking I did on nodepool.o.o but the nova list on puppetmaster yes19:59
fungipabelanger: from me or the full discount one?19:59
pabelangerfungi: from kendall@openstack.org20:00
fungipabelanger: okay, so the full one. good ;)20:00
fungisince you were a ptg attendee in atlanta you shouldn't have gotten one from me20:00
clarkbfungi: I'm going to grab food now but will watch irc if you have questions about ^20:00
*** pushkaraj__ has joined #openstack-infra20:00
fungii only sent the us$300 codes, and those were de-duped to not include ptg attendees20:01
*** baoli has joined #openstack-infra20:01
fungiclarkb: will do, this is enough to get me started, thanks!20:01
pabelangerfungi: cool20:01
fungiclarkb: quick question though, any reason you're calling openstackclient from a virtualenv rather than using the globally-installed one we have on the puppetmaster?20:01
*** baoli_ has joined #openstack-infra20:02
fungii've usually used the globally installed one, and it's been working fine, but wondering if i shouldn't be for some reason20:02
fungilooks like we have 7 compute hosts down in vanilla20:03
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Add migration tool for v2 to v3 conversion  https://review.openstack.org/49180520:03
pabelangerfungi: that sounds about right20:04
fungicehcking the down ones, so far they don't even respond to ping (while the working ones do)20:04
*** baoli has quit IRC20:05
*** pushkaraj__ has quit IRC20:05
clarkbfungi: no reason for the venv, I like controlling the client versions20:05
fungiokay, so not for any particular bug20:06
pabelangerfungi: ya, I know the last few compute hosts I couldn't access via ilo either.20:06
fungicompute035.vanilla.ic is responding to ping but refused my first ssh connection (tcp rst)20:06
funginow it's timing out subsequent ssh attempts20:07
*** jamesdenton has quit IRC20:07
fungiwonder if i asploded it trying to ssh in20:07
pabelangerI know a few had HDDs that look to be dying20:07
funginah, some ssh attempts are refused by it, others time out20:07
fungiany good way to make ironic reboot these? openstack baremetal reboot or something?20:08
fungipublic endpoint for baremetal service in RegionOne region not found20:09
fungipoop20:09
*** jamesdenton has joined #openstack-infra20:10
fungilooks like maybe we don't have it in the catalog. i'm trying our instructions for hitting it from the controller20:11
fungithat seems to get it20:11
*** e0ne has joined #openstack-infra20:12
pabelangerfungi: clarkb: so, I think we might have a bad hypervisor in citycloud-lon1, incoming logstash query20:13
pabelangerhttp://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22%5C%5C%5C%22msg%5C%5C%5C%22%3A%20%5C%5C%5C%22Timer%20expired%5C%5C%5C%22%5C%22%20AND%20message%3A%5C%22%5C%5C%5C%22rc%5C%5C%5C%22%3A%20257%5C%22%20AND%20filename%3A%5C%22console.html%5C%22%20AND%20voting%3A1&from=864000s20:13
clarkbya you have to hit it on the main baremetal node20:13
clarkbbecause bifrost is not a full openstack deployment20:14
clarkbso no real auth and it isn't exposed with the other apis20:14
pabelangerfungi: clarkb: we should see about passing the info to citycloud and have them confirm20:14
clarkbpabelanger: we probably want to give them our VM uuids so they can track it to a hypervisor20:14
pabelangerYa, I can get a list here in a few minutes20:14
fungigonna try to reboot all the controllers that i can't ssh into20:15
fungier, compute nodes i mean20:15
fungiall the ones listed by nova as being down anyway20:15
openstackgerritJames E. Blair proposed openstack-infra/project-config master: Use new syntax for base jobs  https://review.openstack.org/49190620:16
openstackgerritJames E. Blair proposed openstack-infra/zuul-jobs master: Remove base job  https://review.openstack.org/49190720:16
*** jkilpatr has quit IRC20:16
*** jamesmcarthur has quit IRC20:19
*** kgiusti has left #openstack-infra20:19
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Require a base job  https://review.openstack.org/49161020:20
pabelangerclarkb: last 4 UUIDs http://paste.openstack.org/show/617830/20:21
pabelangerI'll compose an email shortly20:22
*** e0ne has quit IRC20:23
*** adisky__ has quit IRC20:23
fungii've confirmed i couldn't ssh into any of the down compute nodes in vanilla, and have asked ironic to reboot them. giving it a few minutes (none are up in nova's service list just yet)20:25
*** jamesmcarthur has joined #openstack-infra20:25
*** kjackal_ has quit IRC20:25
pabelangerfungi: thanks!20:25
fungiafter that i'll start collecting instance lists20:26
*** jcoufal has quit IRC20:27
*** jamesmcarthur has quit IRC20:28
pabelangerclarkb: fungi: emails sent20:31
*** rossella_s has quit IRC20:31
fungithanks pabelanger!20:33
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Add zuul.project.src_dir variable  https://review.openstack.org/49191520:33
fungioh, hey, compute40 came back online after rebooting20:33
*** rossella_s has joined #openstack-infra20:35
clarkbpabelanger: thanks, I see it20:36
clarkbfungi: nice20:36
*** jkilpatr has joined #openstack-infra20:37
*** e0ne has joined #openstack-infra20:37
*** jamesmcarthur has joined #openstack-infra20:38
*** felipemonteiro__ has quit IRC20:40
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Add zuul.project.src_dir variable  https://review.openstack.org/49191520:43
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Add zuul.project.src_dir variable  https://review.openstack.org/49191520:44
fungicompute35 came back up into a state where it again responds to ping and refuses ssh access. no clue what's up there20:44
fungimaybe it's missing a host key or something20:44
*** marst_ has quit IRC20:44
sshnaidmclarkb, fungi fyi, solution for bug with multiple /etc is merging here: https://review.openstack.org/#/c/481233/20:45
openstackgerritMerged openstack-infra/project-config master: Bump internap-mtl01 capacity to 190  https://review.openstack.org/49188220:47
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Add zuul.project.src_dir variable  https://review.openstack.org/49191520:49
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Add zuul.project.src_dir variable  https://review.openstack.org/49191520:50
*** baoli_ has quit IRC20:50
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Add migration tool for v2 to v3 conversion  https://review.openstack.org/49180520:50
*** krtaylor has quit IRC20:51
fungisshnaidm: that looks like it could be an effective reduction. thanks20:52
*** sbezverk has quit IRC20:53
openstackgerritMonty Taylor proposed openstack-infra/project-config master: Add mapping file containing v2 to v3 mappings  https://review.openstack.org/49180420:55
*** dprince has joined #openstack-infra20:55
openstackgerritMatthew Treinish proposed openstack/os-testr master: Switch to stestr under the covers  https://review.openstack.org/48844120:55
*** e0ne has quit IRC20:55
*** marst has joined #openstack-infra20:55
openstackgerritMerged openstack-infra/project-config master: Use new syntax for base jobs  https://review.openstack.org/49190620:56
fungiclarkb: okay, i've confirmed that the remaining down computes after attempting to reboot them are still inaccessible via ssh, so proceeding to the instance lists collection phase20:57
clarkbok20:57
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Add migration tool for v2 to v3 conversion  https://review.openstack.org/49180520:58
openstackgerritMerged openstack-infra/zuul-jobs master: Create fetch-tox-output role  https://review.openstack.org/49064320:59
*** camunoz has quit IRC21:01
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Add migration tool for v2 to v3 conversion  https://review.openstack.org/49180521:01
*** spzala has quit IRC21:02
*** iyamahat has quit IRC21:02
*** baoli has joined #openstack-infra21:03
*** trown is now known as trown|outtypewww21:03
*** iyamahat has joined #openstack-infra21:03
fungiclarkb: pabelanger: should i find it odd that puppetmaster isn't recognizing the ssh host keys for a lot of infracloud vanilla compute nodes?21:04
fungistarting to wonder how ansible has been dealing with them21:04
*** dprince has quit IRC21:05
clarkbya I would expect the root user to be able to ssh to them as part of the ansibling21:05
clarkbI ssh'ed from my local desktop when doing the virsh listings in chocolate though21:05
fungid'oh, operator error21:06
fungii was missing the sudo on ssh21:06
fungiso it was trying to add them to my ~/.ssh/known_hosts on puppetmaster21:07
openstackgerritGabriele Cerami proposed openstack-infra/tripleo-ci master: WIP: containers periodic test  https://review.openstack.org/47574721:08
*** yamamoto has joined #openstack-infra21:09
fungiclarkb: optimization, for my own sense of laziness... gonna generate two uuid lists a little while apart and only check entries which appear in both lists21:10
fungineed to grab a bite to eat anyway, so i'll put that delay to good use21:10
clarkbok21:10
openstackgerritPaul Belanger proposed openstack-infra/openstack-zuul-jobs master: WIP: Add upload-pypi job  https://review.openstack.org/49192621:11
clarkbfungi: another way is to do xml parsing and only look at domains for which the creation time is older than say a week21:12
clarkbbut that is likely far more work because xml21:12
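The grep-level version of that check clarkb mentions might look like this; the cut-off date is illustrative, and it assumes the nova metadata block is present in the domain xml as it was for the leaked VMs above:

    for uuid in $(sudo virsh list --all --uuid); do
      created=$(sudo virsh dumpxml "$uuid" | grep -o '<nova:creationTime>[^<]*' | cut -d'>' -f2)
      [[ "$created" < "2017-08-01" ]] && echo "$uuid created $created"
    done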
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Add migration tool for v2 to v3 conversion  https://review.openstack.org/49180521:13
clarkbfungi: pabelanger we might also want to followup with citycloud on the state of sto2 (I think it was that region) as that is 50 instance quota we are not able to use currently21:13
*** ldnunes_ has quit IRC21:15
*** sslypushenko_ has quit IRC21:16
pabelangerclarkb: Ya, I haven't heard anything back myself21:16
*** camunoz has joined #openstack-infra21:17
*** slaweq has quit IRC21:18
openstackgerritPaul Belanger proposed openstack-infra/openstack-zuul-jobs master: WIP: Add upload-pypi job  https://review.openstack.org/49192621:18
*** jamesmcarthur has quit IRC21:20
*** EricGonczer_ has quit IRC21:21
*** yamamoto has quit IRC21:21
*** sbezverk has joined #openstack-infra21:22
openstackgerritJames E. Blair proposed openstack-infra/project-config master: Zuulv3: update sql reporter syntax  https://review.openstack.org/49193221:23
jeblairpabelanger, mordred: ^ we need that in to restart zuulv321:23
pabelanger+221:25
*** rockyg has joined #openstack-infra21:27
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Add comments about base jobs  https://review.openstack.org/49189721:29
*** annegentle has joined #openstack-infra21:32
*** felipemonteiro_ has joined #openstack-infra21:33
*** felipemonteiro__ has joined #openstack-infra21:35
*** pvaneck_ has joined #openstack-infra21:35
*** markvoelker has joined #openstack-infra21:37
*** thorst has quit IRC21:37
*** pvaneck has quit IRC21:38
*** felipemonteiro_ has quit IRC21:38
openstackgerritMerged openstack-infra/project-config master: Zuulv3: update sql reporter syntax  https://review.openstack.org/49193221:43
*** markvoelker has quit IRC21:44
*** jascott1 has quit IRC21:45
*** jascott1 has joined #openstack-infra21:45
*** jascott1 has quit IRC21:46
*** jascott1 has joined #openstack-infra21:46
*** jascott1 has quit IRC21:49
*** jascott1 has joined #openstack-infra21:50
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Add migration tool for v2 to v3 conversion  https://review.openstack.org/49180521:53
openstackgerritMonty Taylor proposed openstack-infra/project-config master: Run zuul-migrate job on changes to mapping file  https://review.openstack.org/49193721:54
mordredjeblair, pabelanger: ^^ also there's the project-config change to run that job on changes to the mapping file21:54
*** jascott1 has quit IRC21:54
*** yamamoto has joined #openstack-infra21:59
*** florianf has quit IRC21:59
*** dprince has joined #openstack-infra21:59
*** priteau has quit IRC22:02
*** esberglu has quit IRC22:03
*** markvoelker has joined #openstack-infra22:04
*** esberglu has joined #openstack-infra22:04
*** markvoelker_ has joined #openstack-infra22:05
*** esberglu has quit IRC22:08
*** markvoelker has quit IRC22:09
clarkbI eyeballed the PTG walk poorly. its .8 miles according to google (still walkable but quite a bit more than 1/4 mile)22:14
*** camunoz has quit IRC22:16
*** esberglu has joined #openstack-infra22:16
fungiyeah, no concerns with that on my part22:17
fungimy luggage is a backpack, so i could do miles on foot with it uphill if needed22:17
*** slaweq has joined #openstack-infra22:19
*** slaweq has quit IRC22:24
*** rockyg has quit IRC22:24
openstackgerritTim Burke proposed openstack-infra/project-config master: Add release notes jobs for python-swiftclient  https://review.openstack.org/49194022:26
*** Julien-zte has joined #openstack-infra22:27
jeblairclarkb, mordred, fungi: are there other reports of infracloud being slow?22:29
clarkbjeblair: I think we've heard it about other clouds but haven't seen infracloud necessarily22:29
clarkbwe are also tracking job timeouts with e-r and they are up since turning off osic22:29
clarkbyou might want to pull up that query and see what the cloud distribution is22:30
*** spzala has joined #openstack-infra22:30
fungialso we still keep fiddling with the max-servers in infra-cloud to figure out how hard we can push it (and as we run at capacity for a while we've still needed to reduce it a couple times)22:30
*** dprince has quit IRC22:31
*** felipemonteiro__ has quit IRC22:31
clarkbhttp://blog.ffwll.ch/2017/08/github-why-cant-host-the-kernel.html completely unrelated but potentially interesting22:32
jeblairi'm trying to figure out if we should scale infracloud back some more22:33
jeblairi don't really have the time to tune it myself.  so if we think this is a signal that we're still oversubscribed, maybe we should lower our usage some.22:33
jeblairbut if we think it's an errant signal, i'll just ignore it for now and see if zuulv3 jobs run faster when we're less busy.22:34
clarkbvanilla is 28% of job timeouts, chocolate is 17%22:34
clarkbcitycloud lon1 is 15% and rax ord is 11%22:34
*** aeng_ has joined #openstack-infra22:34
*** xyang1 has quit IRC22:34
clarkbso vanilla is significantly more likely to timeout than other regions but chocolate seems to be in line (if high) with other regions22:35
clarkbalso that isn't scaled against total server quota, just percentage of total fails22:35
jeblairvanilla is 12% of capacity and chocolate is 9%22:35
jeblairso they're vaguely hand-wavey 2x as represented in timeouts as they should be based on their proportion of quota22:36
*** markvoelker has joined #openstack-infra22:36
*** gordc has quit IRC22:37
*** bobh has quit IRC22:37
*** markvoel_ has joined #openstack-infra22:37
clarkbwe've also done about 10 timeouts per hour based on logstash data22:38
*** thorst has joined #openstack-infra22:38
clarkbout of 600-900 jobs launched per hour22:38
clarkbbased on that I think I would tune vanilla back22:40
*** markvoelker_ has quit IRC22:40
clarkbchocolate maybe less so? but likely needs it as well22:40
*** markvoelker has quit IRC22:41
*** Julien-zte has quit IRC22:41
*** jascott1 has joined #openstack-infra22:41
*** jaypipes has quit IRC22:42
*** Julien-z_ has joined #openstack-infra22:42
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Don't pass self to a bound method  https://review.openstack.org/49194622:43
*** thorst has quit IRC22:43
clarkbI think that cleaning up any leaked instances will help too22:44
clarkbchocolate should start being better in that regard, but to be determined if vanilla has a problem22:44
openstackgerritSwaminathan Vasudevan proposed openstack/diskimage-builder master: Failed to open parameter YAML error while trying to unmount imagedir  https://review.openstack.org/49063722:45
fungiyeah, we've got one node stuck in a delete state in vanilla for several months i'm trying to work out how to clean up22:45
fungilooks like it's active according to nova so i'm going to attempt to delete it through the api22:46
clarkbfungi: we have one of those in chocolate too if you find out how to clear the one in vanilla we can do that one next22:46
*** rbrndt has quit IRC22:46
fungiand nova continues to list it in an active state22:47
funginot reachable via ssh22:48
fungithe uuid also isn't showing up in the virsh list22:48
fungiso i think this one's the inverse of the others from earlier. nova still thinks it exists but it doesn't appear to (or maybe it's on a dead compute node?)22:49
*** vhosakot has quit IRC22:49
clarkbah22:49
clarkbif you nova show it as admin I think you can get the hypervisor info from nova22:50
clarkbyou might also be able to tell nova to forget about it as admin?22:50
fungicool, will try that in a jiffy22:50
*** krtaylor has joined #openstack-infra22:50
jeblairokay, so ze01 can rsync a repo to mtl01 at 6mbps, but it can rsync the same repo to vanilla at something like 160kbps22:51
* fungi wonders if we need to switch our isdn to dual-channel22:51
fungibtw, we're down to 8 uuids in vanilla which appeared in my initial list. i'm about to check whether we have any left in nodepool older than the initial list i made (other than the months-old ghost that is)22:53
jeblaira wget of a large file saving to /dev/null on a vanilla node is reporting 70kbps22:53
jeblairsame file on mtl01 is 30mbps22:54
jeblairsame file on compute032.vanilla is also 70kbps22:55
fungilooks like we still have a handful of nodes in vanilla running jobs since before i pulled the uuids from virsh, so odds are once these age out we're left with very few (if any) leaked from nova22:55
jeblairso i think we're saturating our network link22:55
clarkbjeblair: check chocolate too as its the same networking I think22:55
clarkb(would help rule out hardware problems as it is different base hardware on roughyl the same networking)22:56
jeblairclarkb: yeah, i checked a chocolate node and it's reporting about 180kbps22:56
jeblairso, erm, twice as fast?  :)22:56
jeblairbut it varies a lot, so could be approx the same22:57
fungiso the phantom instance nova says is on compute012 but virsh list on that host doesn't include it22:59
clarkbfungi: and you are using virsh list --all?22:59
clarkbwithout --all you only see running instances22:59
clarkbalso check if /var/lib/nova/instances has a dir for it23:00
fungiyup, --all --uuid23:00
*** markvoelker has joined #openstack-infra23:01
fungithere are in fact only two instances listed on compute012 and neither matches this uuid23:01
fungiubuntu-xenial-infracloud-vanilla-8895313 created 2017-05-19T10:07:18Z23:02
openstackgerritJames E. Blair proposed openstack-infra/project-config master: Reduce infra-cloud usage  https://review.openstack.org/49194923:03
*** markvoel_ has quit IRC23:03
jeblairclarkb, fungi: there's a shot in the dark reduction ^23:04
jeblairclarkb, fungi: do we have any information about the network there and what we should expect?23:04
clarkbjeblair: after the flood I know that local networking went to 1gig instead of 10Gbe in vanilla. But unsure of the internet connectivity23:04
clarkbits also possible they are just throttling the hell out of us23:04
*** xarses_ has quit IRC23:05
*** spzala has quit IRC23:05
*** spzala has joined #openstack-infra23:05
fungisame, all i know is what i can read on https://docs.openstack.org/infra/system-config/infra-cloud.html23:05
*** spzala has quit IRC23:05
*** spzala has joined #openstack-infra23:06
*** marst has quit IRC23:06
*** spzala has quit IRC23:06
clarkbrcarrillocruz: and cmurphy (possibly jesusaur) may know more23:06
*** jascott1 has quit IRC23:06
*** rhallisey has quit IRC23:06
*** spzala has joined #openstack-infra23:07
*** spzala has quit IRC23:07
*** spzala has joined #openstack-infra23:07
*** pbourke has quit IRC23:07
*** spzala has quit IRC23:07
*** spzala has joined #openstack-infra23:08
*** spzala has quit IRC23:08
*** pbourke has joined #openstack-infra23:09
jeblairi feel like i'm missing something23:09
jeblairwe set max-servers to 96 on vanilla -- we have 45 compute hosts -- that's almost down to two vms per host23:10
fungiwe're now down to 4 uuids which were present in vanilla when i first checked, and two of those are known to nodepool, meaning we have two needing cleanup23:10
openstackgerritIsaku Yamahata proposed openstack-infra/project-config master: networking-odl: retire boron task  https://review.openstack.org/49195123:10
jeblaircacti says the compute hosts average about 25mbit continuous inbound traffic23:10
jeblair2x150kbps != 25mbps23:11
jeblairhow is it anything other than way *undercommitted*?23:11
clarkbfwiw its 35 compute hosts in vanilla that are operational23:11
jeblairso nearly 3 nodes / host23:12
clarkbya 3 is 1:1 cpu ration23:12
clarkb*ratio23:12
jeblairand in fact, compute032 is running 3 instances right now23:12
*** sflanigan has joined #openstack-infra23:13
jeblairhttp://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=5981&rra_id=all23:13
clarkbthe base data is likely rabbitmq traffic and glance image transfers which is all on the same layer 2 so we should be running at 1gig for that23:14
jeblairthat graph makes it look like we've been running flat out at 25mbps for nearly 24h23:14
jeblairimage transfers should be a spike, and i hope we're not doing 25mbps of rabbit23:15
clarkbya I think the spike to 100Mbps must be image transfers23:15
jeblairsounds reasonable23:15
fungiso interestingly, these two "leaked" instances in vanilla are known to nova, neither is leaked. one is the mirror and the other is pabelanger-test1... so other than the phantom instance that we can't delete because it doesn't actually exist, we have no discrepancies there23:15
clarkbthat also makes me worry that we have been given 100mbps connectivity there and not gigabit23:16
jeblairclarkb: indeed23:16
clarkbjeblair: I know that SpamapS and greghaynes found rabbit to be very chatty. It wouldn't surprise me if it was doing 25mbps but I also think it is insane to be doing that23:16
openstackgerritIsaku Yamahata proposed openstack-infra/project-config master: networking-odl: retire boron task  https://review.openstack.org/49195123:16
*** markvoelker has quit IRC23:18
*** markvoelker has joined #openstack-infra23:18
fungitrying to see if i can tease interface speeds out of one of the controllers23:19
fungier, computes23:19
jeblairclarkb: when i run iftop on compute032, i see a *lot* of connections between zuul/git.o.o and many different infracloud ips23:19
jeblairclarkb: i would only expect to see connections to the 3 ips of the nodes running on compute03223:19
*** pvaneck_ has quit IRC23:19
*** rhallisey has joined #openstack-infra23:20
jeblairclarkb: eg: http://paste.openstack.org/show/617852/23:20
clarkbjeblair: as if we are on a hub not a switch23:20
fungion compute12 (i had to install the ethtool package) i see eth2 is the only physical interface with link detected and it claims to be operating at 10000baseT/Full23:21
jeblairclarkb: ya23:21
*** aeng has quit IRC23:21
fungiwhich i find dubious23:21
clarkbfungi: ya eth2 is our only link23:21
clarkbits why we have the weird bridge thing for neutron23:21
fungithe 10gb link speed i find dubious i mean23:22
clarkbbut the weird bridge thing for neutron should be a proper switch23:22
clarkbfungi: oh ya23:22
*** sdague has quit IRC23:22
fungii guess we could check the bridge table in the kernel and make sure only local macs are showing up on the local interfaces?23:23
fungi(to rule out bridge loops)23:23
*** markvoelker has quit IRC23:23
* fungi plays around with brctl23:23
mnaserdoes all openstack infra testing happen on 8 core machines?23:24
fungibr-vlan2551 and brq85ba3bb6-1f (on compute12) both seem to have a lot of macs showing on them23:25
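For reference, the check fungi describes amounts to something like this, with the bridge names taken from compute012 above:

    sudo brctl showmacs br-vlan2551
    sudo brctl showmacs brq85ba3bb6-1f
    # "is local?" yes entries belong to this host and its guests; a long tail of
    # non-local macs means frames for other hosts are reaching this port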
clarkbmnaser: it does not but hasn't always been the case (nor will it necessarily be the case in the future)23:25
persiamnaser: That is the default request from infrastructure donors.23:25
clarkbs/not/now/23:25
*** Swami has quit IRC23:25
fungimnaser: per https://docs.openstack.org/infra/manual/testing.html#known-differences-to-watch-out-for it can vary23:26
clarkbfungi: the rough setup is eth2 - eth2.$vlan - br-$vlan - veth1 - veth2 iirc23:26
mnaserso.. we're testing some new flavors with fully dedicated cores and i kinda wanted to throw in a small 10-15 nodes to see how it copes (and also help the gate a tiny bit if the setup is still there)23:26
mnaserfor example you'd get 2 cores + 8gb of memory, but 2 fully dedicated cores23:26
clarkbfungi: the reason for that is we need an interface on $vlan for the hypervisor but need to put neutron on the same vlan without letting it manage the vlan (so neutron is all untagged because if we let neutron manage it then it borks the hypervisor interface on the same vlan)23:26
clarkbfungi: brq$stuff is the bridge neutron manages23:27
fungii was about to ask, that was my first guess though23:27
clarkbI think what we want to do is tcpdump eth2 and see if we are getting hub-like behavior (but that's roughly what iftop was doing for me)23:28
clarkbbecause eth2 should be the raw ethernet connection and we should only see stuff destined to hosts behind it on the hypervisor23:28
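A sketch of that check: capture a bit of unicast traffic on eth2 and see how much of it is addressed to macs that do not belong to this hypervisor or its guests. The mac placeholders are illustrative:

    # -e prints the link-level header so destination macs are visible
    sudo tcpdump -eni eth2 -c 200 'not broadcast and not multicast' \
      | grep -v -e '<local-mac>' -e '<guest-mac-1>' -e '<guest-mac-2>'
    # a steady stream of leftover lines means the upstream switch is flooding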
fungiclarkb: okay, so given that br-vlan2551 only shows a couple local macs and the rest (147 at the moment) are all showing nonlocal23:28
fungii'm guessing we don't see any sort of reflection to account for a storm23:29
clarkbbut could we be contending for access to the bus if we are plugged into a hub?23:30
clarkbwe shouldn't see 147 non local IPs I don't think23:30
clarkbcontroller, upstream router, and VMs are all we should have right?23:30
clarkboh it could be hypervisor to hypervisor because of multinode23:30
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Don't pass self to a bound method  https://review.openstack.org/49194623:30
clarkbso maybe this is ok23:30
*** hongbin has quit IRC23:31
fungiby "plugged into a hub" i assume you mean a switch which has given up trying to track macs in its bridge table. i can't imagine where they'd find an ethernet hub in this day and age23:31
jeblairand the mirror23:31
clarkbbut possibly 100Mbps not 1gig or 10gig23:31
clarkbfungi: right that, cam table filled and you lose23:31
fungii can certainly imagine some scenarios where certain switches may run into issues if we cycle through random macs faster than they get aged out of the table and end up filling them up23:32
fungiyes, that23:32
jeblair48/96 nodes are being used for multinode jobs23:32
clarkbjeblair: what is a transfer between hypervisors like23:33
clarkbthat should be our best case transfer23:33
jeblairwill check23:33
fungicompute017 is the one hosting the mirror, if that helps to compare against23:33
fungiin vanilla23:34
fungiand wow does it show signs of packet loss!23:34
fungihttp://cacti.openstack.org/cacti/graph_view.php?action=tree&tree_id=3&leaf_id=42223:34
fungivery high error count on eth2 as well23:35
fungimost of its eth2 traffic is from eth2.255123:36
jeblairclarkb: 160mbps23:36
fungii can see on the graph where the mirror vm got rebuilt since it seems like that's probably when it landed on this compute node23:37
jeblairthat's reading from disk23:38
clarkbfungi: ya eth2.2552 is where we tag the vlan for all traffic outbound23:39
clarkbfungi: so that includes all the hypervisor communication and all the VM communication23:39
jeblairclarkb: sorry, 160mBps so 1280mbps23:39
fungipicking another compute node at random, also seeing gaps in snmp responses and pretty high error rate on eth223:40
clarkbjeblair: cool so we likely aren't having global issues. The best case comes out pretty well23:40
fungiso i do agree it's more likely we're saturating the uplink rather than intra-cloud links23:40
clarkbit's possible that linux is hating us for the chained bridges (yay software switches), or upstream devices have trouble with VM macs changing frequently, or the problems are largely out on the internet side?23:40
clarkbfungi: ya23:41
fungibaremetal00 shows similar packet loss and errors on eth223:41
fungihrm23:42
fungithough it's seeing a solid 20mbps inbound on eth2.255123:42
fungiit shouldn't really be consuming anything, right?23:42
clarkbfungi: what does iftop show it talking to or just netstat?23:43
fungithat one probably makes a good control group if we're looking for signs of a storm23:43
clarkbbifrost/ironic will do heartbeats to the nodes23:43
clarkbbut ya 20mbps for heartbeats seems really high23:43
clarkband yes it should be good control group23:43
*** mattmceuen has quit IRC23:45
*** soliosg has quit IRC23:46
fungithis doesn't look good. didn't even have to go that far23:46
clarkbthats interesting I see nb03 and nb04 comms23:46
clarkbto controller00 from compute02123:46
fungiyeah, a tcpdump on eth2.2551 showed me a centos7 test node talking to git.o.o23:47
fungithat should _never_ make it to baremetal0023:47
clarkbsudo tcpdump -i any host 199.19.215.9 shows nb03 to controller0023:48
clarkbso ya23:48
fungiso definitely looks like the switch layer in that rack at least is falling back to flooding behavior23:48
clarkbya23:48
fungiwe can e-mail hpe and ask them to power-cycle the switches i guess, though as i understand it we're not the only machines plugged into them23:49
clarkbmaybe have them check the theory at least23:49
clarkbI wonder what our router is though23:50
fungi15.184.64.123:50
fungithat's probably not what you meant23:50
clarkbwell sort of I know when tripleo was using this setup they used linux as a router23:51
clarkbbut that doesn't appear to be one of ours23:51
clarkbso I think this is entirely upstream of us23:51
fungibig surprise, the oui of the router's mac (bceafa) is assigned to... [drumroll]23:52
fungiHewlett Packard23:52
fungiso could be linux on a "newer" proliant (post-compaq) or something i guess23:53
jeblairso likely two things: 1) switch acting as hub -- annoying, taxes each compute node with an extra 20mbps it has to ignore, but probably not killing performance.  2) upstream bandwidth limit.23:54
jeblairthat sound right?23:54
*** gildub has joined #openstack-infra23:55
fungiyeah, that's the best i can piece together23:55
jeblair(interestingly, i wonder if the 20-25mbps we're seeing on all the nodes because of the switch behavior clues us into our upstream bandwidth?  25mbps/96=260kbps which is not too far off from the 160kbps we measured earlier)23:56
clarkbalso I bet image uploads tank that bw23:57
fungiprobably so23:57
fungiwe _could_ test the theory by dialing down max-servers to 0 in both clouds and then doing some bulk transfers to/from baremetal00 or the mirror or something23:57
fungiif we really wanted a more accurate picture23:58
fungialso possible we're not being throttled, but are sharing an uplink from that pod with some much more network-consuming neighbors hogging the available bandwidth23:58
*** slagle has quit IRC23:59
