Thursday, 2018-09-27

*** dpawlik has joined #openstack-infra00:02
*** dpawlik has quit IRC00:06
ianw# ping www.google.com00:07
ianwping: www.google.com: Name or service not known00:07
ianwi guess that's good00:07
clarkbianw: is there an ip address? (I'm curious to see too)00:08
ianw23.253.218.24100:08
ianwunbound is running00:08
clarkbhost google.com returns an address00:09
ianw# host www.google.com00:10
ianwwww.google.com has address 216.58.192.16400:10
ianwHost www.google.com not found: 2(SERVFAIL)00:10
ianw?00:10
ianw# host www.google.com00:10
ianwwww.google.com has address 216.58.192.16400:10
ianwwww.google.com has IPv6 address 2607:f8b0:4009:80f::200400:10
clarkbI dropped the www00:10
ianwnow it works ...?00:10
*** jamesmcarthur has joined #openstack-infra00:10
ianwclarkb: did you restart anything?  it certainly seems like it started out borked, but now looks ok00:11
*** slaweq has joined #openstack-infra00:11
clarkbSep 27 00:07:29 unbound[1355:1] debug: sending to target: <.> 2620:0:ccc::2#53 then Sep 27 00:07:29 unbound[1355:1] notice: sendto failed: Network is unreachable00:11
clarkbianw: I did not restart anything00:11
clarkbI wonder if it is round robining and failing on some of the resolvers00:11
ianwntpd[822]: error resolving pool 2.fedora.pool.ntp.org: Name or service not known (-2)00:12
clarkbI wonder if the issue is using ipv6 and relying on RAs to configure the interface00:13
clarkbwe statically configure the ipv4 address with glean but rely on RAs for ipv6 aiui00:13
clarkbso there could be a lag after boot where ipv6 isn't working00:13
*** ijw has joined #openstack-infra00:14
clarkbianw: oh we don't have ipv6 configured at all on that host00:15
*** slaweq has quit IRC00:15
*** jamesmcarthur has quit IRC00:15
ianwyeah ... only link local addrs00:15
clarkbok so we don't have working dns on boot because we don't have working ipv6 on boot like we expect00:16
clarkbbut some clouds never have working ipv600:16
pabelangerglean doesn't support ipv6 for fedora / centos00:16
pabelangerjust ubuntu00:16
clarkbah maybe we do statically configure ipv6 on debuntu then00:16
pabelangeryah, only clouds with dhcp ipv6 will be good00:16
pabelangerwhich I think is inap?00:16
clarkbvexxhost does RAs, I don't think inap does ipv600:17
pabelangerkk00:17
clarkbovh has ipv6 but the instances don't know about it00:17
pabelangerokay, so that might be why ovh is failing on fedora00:17
pabelangerif you look in forwarding.conf for unbound, we setup ipv600:17
pabelangerbut when we run configure-unbound role, we check for ipv6 / ipv4 first00:18
pabelangerthen drop in right forwarding.conf00:18
clarkbya I think we assumed that unbound wouldn't use ipv6 addrs if it couldn't do ipv600:18
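
A sketch of the selection logic pabelanger describes, for readers following along. This is not the actual configure-unbound ansible role, just a minimal Python rendering of the idea (check for a global ipv6 address before including ipv6 forwarders), with the resolver addresses borrowed from elsewhere in this log:

    import ipaddress
    import subprocess

    def has_global_ipv6():
        # 'ip -6 -o addr show' prints one address per line; field 4 is the
        # address/prefix. link-local (fe80::/10) and loopback don't count.
        out = subprocess.check_output(
            ['ip', '-6', '-o', 'addr', 'show'], universal_newlines=True)
        for line in out.splitlines():
            addr = ipaddress.ip_interface(line.split()[3]).ip
            if not (addr.is_link_local or addr.is_loopback):
                return True
        return False

    FORWARDERS_V4 = ['208.67.222.222', '8.8.8.8']
    FORWARDERS_V6 = ['2620:0:ccc::2', '2001:4860:4860::8888']

    forwarders = list(FORWARDERS_V4)
    if has_global_ipv6():
        forwarders += FORWARDERS_V6

    print('forward-zone:')
    print('  name: "."')
    for ip in forwarders:
        print('  forward-addr: %s' % ip)

The failure being debugged here is consistent with the ipv6 forwarders landing in forwarding.conf even though the host never got a global ipv6 address.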
ianwin theory it should just ignore that?  it seems to come good after a little bit, however ... maybe there's a timeout00:18
pabelangerclarkb: yah, maybe that changed in fedora recently00:18
ianwclarkb: ok if i reboot and see if dns is borked right at boot again?00:18
pabelangerbecause centos still works00:18
clarkbianw: well I think it round robins the different resolvers and half of them are ipv400:18
clarkbianw: ya I'm off of the host00:18
*** mriedem_away has quit IRC00:18
clarkbpabelanger: oh could be00:19
pabelangerthat would explain why fedora-28 just started randomly failing too, when fedora-27 worked fine00:19
pabelangerI can check unbound changelog and see if anything pops up00:20
*** felipemonteiro has joined #openstack-infra00:22
ianwyep ...00:22
ianw1028  sendto(42, "\200T\1\0\0\1\0\0\0\0\0\1\3www\3abc\3net\2au\0\0\1\0\1"..., 43, 0, {sa_family=AF_INET6, sin6_port=htons(53), inet_pton(AF_INET6, "2001:4860:4860::8888", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=0}, 28) = -1 ENETUNREACH00:22
ianw (Network is unreachable)00:22
ianwthat was stracing unbound00:23
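
The ENETUNREACH in that strace is reproducible with a bare socket; a minimal sketch, assuming (as on this host) only link-local ipv6 is configured:

    import socket

    sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    try:
        # same destination unbound was trying: google public dns, port 53
        sock.sendto(b'', ('2001:4860:4860::8888', 53))
    except OSError as exc:
        print(exc)  # [Errno 101] Network is unreachable
    finally:
        sock.close()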
pabelangerso, guess we need to rework DIB them00:26
pabelangerthen*00:26
clarkbwe could do ipv4 only by default since that works in ipv6 "only" clouds00:26
clarkbit is just less reliable because NAT00:26
pabelangerhttps://www.nlnetlabs.nl/svn/unbound/tags/release-1.8.0/doc/Changelog00:27
ianwsigh, seems *every* part is unreliable00:27
pabelangerhaven't found anything yet related to forwarding00:27
ianwif it only has link-local ipv6 addresses, why would it be trying this ... time to git pull ...00:28
ianwoh don't tell me it's in svn.00:29
pabelangeroldschool00:29
*** jamesmcarthur has joined #openstack-infra00:31
*** jamesmcarthur has quit IRC00:35
*** ansmith has quit IRC00:42
ianwhttp://paste.openstack.org/show/730975/00:50
ianwso it seems unbound has a rather complicated algorithm for choosing the forwarding server to talk to based on ping times etc, and there is some caching period for this00:50
*** felipemonteiro has quit IRC00:51
ianwmy inclination here is that *something* is slightly different about fedora startup and maybe when unbound queries ipv4 it times out, etc ... basically it ends up looking as good/as bad as ipv600:51
*** jamesmcarthur has joined #openstack-infra00:52
ianwso the ipv6 servers get added into the mix ... at least until networking is fully 100% and enough time has passed that unbound starts to requery its performance considerations00:52
ianwat which point it now notices the ipv4 servers are really fast and the ipv6 servers don't exist00:52
* ianw is waving hands madly on this ...00:53
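
To make the hand-waving slightly more concrete, here is a toy model of unbound's forwarder selection. This is not unbound source; the constants mirror unbound's documented defaults (a 376ms starting estimate for never-queried servers, a 400ms selection band) and should be treated as assumptions:

    import random

    UNKNOWN_RTT = 376  # optimistic starting estimate for unqueried servers (ms)
    BAND = 400         # servers within this much of the fastest stay eligible

    def pick_forwarder(rtts):
        fastest = min(rtts.values())
        candidates = [s for s, rtt in rtts.items() if rtt <= fastest + BAND]
        return random.choice(candidates)

    # freshly booted: the ipv4 server has answered once, the ipv6 one has
    # never been queried, so the unreachable forwarder is still a candidate
    print(pick_forwarder({'8.8.8.8': 30, '2001:4860:4860::8888': UNKNOWN_RTT}))

    # once enough timeouts push the ipv6 estimate out of the band, only the
    # ipv4 server is ever picked -- matching the "comes good later" behaviour
    print(pick_forwarder({'8.8.8.8': 30, '2001:4860:4860::8888': 12000}))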
*** Emine has quit IRC00:55
*** jamesmcarthur has quit IRC00:56
ianwyeah, for example01:00
ianwSep 27 00:58:15 ianw-test ntpd[593]: error resolving pool 2.fedora.pool.ntp.org: Name or service not known (-2)01:00
ianwSep 27 00:58:19 ianw-test network[663]: Bringing up interface eth0:  [  OK  ]01:00
ianwSep 27 00:58:23 ianw-test network[663]: Bringing up interface eth1:  [  OK  ]01:00
ianwSep 27 00:58:24 ianw-test systemd[1]: Starting Unbound recursive Domain Name Server...01:03
ianwi dunno, even more confused.  unbound is starting after eth101:03
*** adriancz has quit IRC01:07
*** longkb has joined #openstack-infra01:13
*** hongbin has joined #openstack-infra01:19
*** rlandy has quit IRC01:31
*** openstackgerrit has joined #openstack-infra01:34
openstackgerritMerged openstack-infra/system-config master: Only replicate gtest-org and kdc  https://review.openstack.org/60549001:34
*** ijw has quit IRC01:41
*** mrsoul has joined #openstack-infra01:54
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: Uncap cherrypy  https://review.openstack.org/60113602:13
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: web: rewrite interface in react  https://review.openstack.org/59160402:14
*** Bhujay has joined #openstack-infra02:19
*** Bhujay has quit IRC02:19
*** Bhujay has joined #openstack-infra02:20
*** armax has quit IRC02:22
*** annp has joined #openstack-infra02:23
ianwok, i've filed https://www.nlnetlabs.nl/bugs-script/show_bug.cgi?id=4188 , i'm sort of assuming the response, if any, will be "well durr, don't do that"02:25
openstackwww.nlnetlabs.nl bug 4188 in server "IPv6 forwarders without ipv6 result in SERVFAIL" [Enhancement,New] - Assigned to unbound-team02:25
*** graphene has quit IRC02:33
*** diablo_rojo has quit IRC02:33
*** apetrich has quit IRC02:40
*** psachin has joined #openstack-infra02:48
*** imacdonn has quit IRC02:50
*** markvoelker has joined #openstack-infra02:50
*** imacdonn has joined #openstack-infra02:51
*** rfolco has quit IRC02:54
*** rkukura has quit IRC02:55
*** Bhujay has quit IRC02:59
openstackgerritSergey Vilgelm proposed openstack-dev/pbr master: Special case long_description_content_type  https://review.openstack.org/56517703:06
*** felipemonteiro has joined #openstack-infra03:15
*** harlowja has quit IRC03:20
*** ramishra has joined #openstack-infra03:27
*** jiapei has joined #openstack-infra03:42
*** rcernin_ has quit IRC03:42
*** rcernin has joined #openstack-infra03:43
*** dave-mccowan has quit IRC03:46
*** hongbin has quit IRC03:47
*** dpawlik has joined #openstack-infra04:02
*** haleyb has quit IRC04:07
*** dpawlik has quit IRC04:07
*** vivsoni_ has joined #openstack-infra04:13
*** vivsoni has quit IRC04:15
*** felipemonteiro has quit IRC04:16
*** udesale has joined #openstack-infra04:18
*** ijw has joined #openstack-infra04:23
*** ijw has quit IRC04:27
openstackgerritIan Wienand proposed openstack-infra/project-config master: elements/nodepool-base: only initially populate ipv4 nameservers  https://review.openstack.org/60558304:27
openstackgerritMerged openstack-infra/project-config master: Added twine check functionality to python-tarball playbook  https://review.openstack.org/60509604:32
*** slaweq has joined #openstack-infra05:11
*** slaweq has quit IRC05:15
openstackgerritIan Wienand proposed openstack-infra/system-config master: [wip] initial port of install-docker role  https://review.openstack.org/60558505:16
*** apetrich has joined #openstack-infra05:27
*** quique|off is now known as quiquell05:31
*** Bhujay has joined #openstack-infra05:35
openstackgerritIan Wienand proposed openstack-dev/pbr master: Special case long_description_content_type  https://review.openstack.org/56517705:39
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Fix unreachable nodes detection  https://review.openstack.org/60282905:39
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Also retry the job if a post job failed with unreachable  https://review.openstack.org/60283005:39
*** bnemec has quit IRC05:39
*** ijw has joined #openstack-infra05:42
*** rkukura has joined #openstack-infra05:44
quiquellAny infra-root here? we need to prioritize a promotion blocker in the zuul queue05:44
*** ijw has quit IRC05:47
*** rkukura has quit IRC05:49
ianwquiquell: what do you need done?05:49
quiquellianw: this one https://review.openstack.org/#/c/605039/05:50
quiquellianw: Can we wait, I have doubts about one thing05:53
ianwwell yeah, there's a lot ahead of it running ... i know you realise it will reset the queue05:53
quiquellianw: Yes... maybe it's more hurt than gain05:54
quiquellianw: people have been waiting to merge stuff for a long time05:54
quiquellianw: nah leave it05:54
quiquellianw: sorry about the noise05:54
quiquellWe don't have timeouts now05:54
*** gfidente has joined #openstack-infra06:00
*** roman_g has quit IRC06:03
AJaegerconfig-core, two small cleanups for your consideration, please: https://review.openstack.org/605077 and https://review.openstack.org/605344 . Also, https://review.openstack.org/605128 is ready06:05
*** kopecmartin|off is now known as kopecmartin|ruck06:08
*** ijw has joined #openstack-infra06:09
*** longkb has quit IRC06:13
*** njohnston has quit IRC06:14
*** longkb has joined #openstack-infra06:14
*** ijw has quit IRC06:14
*** AJaeger has quit IRC06:15
*** njohnston has joined #openstack-infra06:15
*** adriancz has joined #openstack-infra06:17
*** AJaeger has joined #openstack-infra06:18
*** slaweq has joined #openstack-infra06:23
*** dpawlik has joined #openstack-infra06:23
*** pcaruana has joined #openstack-infra06:33
*** jamesdenton has quit IRC06:35
*** graphene has joined #openstack-infra06:39
*** aojea has joined #openstack-infra06:40
mnasiadkaHi - I'm trying to fetch periodic jobs failure metrics from graphite.openstack.org - and it seems the ones under stats_counts.zuul.tenant.pipeline.periodic.project.* are empty, but the all_jobs and total_changes are populated - should I look somewhere else in the tree to get those?06:46
*** diablo_rojo has joined #openstack-infra06:55
*** quiquell is now known as quiquell|brb06:55
*** ginopc has joined #openstack-infra07:07
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: web: add build page  https://review.openstack.org/59702407:10
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: web: add job page  https://review.openstack.org/59704807:10
*** graphene has quit IRC07:10
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: web: add config-errors notifications drawer  https://review.openstack.org/59714707:10
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: web: add change status page  https://review.openstack.org/59947207:11
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: web: add project page  https://review.openstack.org/60426607:11
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: web: add labels page  https://review.openstack.org/60468207:11
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: web: add nodes page  https://review.openstack.org/60468307:11
*** rcernin has quit IRC07:12
*** graphene has joined #openstack-infra07:12
*** chkumar|off is now known as chandankumar07:18
openstackgerritTristan Cacqueray proposed openstack-infra/nodepool master: Implement a Kubernetes driver  https://review.openstack.org/53555707:19
*** shardy has joined #openstack-infra07:21
*** diablo_rojo has quit IRC07:27
*** graphene has quit IRC07:30
*** graphene has joined #openstack-infra07:31
openstackgerritIan Wienand proposed openstack-infra/system-config master: [wip] initial port of install-docker role  https://review.openstack.org/60558507:35
*** quiquell|brb is now known as quiquell07:38
*** florianf|afk is now known as florianf07:40
*** longkb has quit IRC07:41
*** longkb has joined #openstack-infra07:42
*** tosky has joined #openstack-infra07:45
*** jpich has joined #openstack-infra07:47
openstackgerritMerged openstack-infra/openstack-zuul-jobs master: Remove tricircle dsvm jobs  https://review.openstack.org/60534407:49
mnasiadkaok, found those in stats.zuul.tenant... - thanks for help ;)07:55
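
For anyone wanting to repeat that, a hedged example of pulling one of those counters out of graphite's render API; the exact target below is an assumption extrapolated from the paths discussed above:

    import json
    import urllib.request

    target = 'stats.zuul.tenant.openstack.pipeline.periodic.all_jobs'
    url = ('http://graphite.openstack.org/render'
           '?format=json&from=-1d&target=%s' % target)
    with urllib.request.urlopen(url) as resp:
        for series in json.load(resp):
            points = [p for p in series['datapoints'] if p[0] is not None]
            print(series['target'], '%d non-null points' % len(points))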
*** jpena|off is now known as jpena08:01
openstackgerritMatthieu Huin proposed openstack-infra/zuul master: web: add tenant and project scoped, JWT-protected actions  https://review.openstack.org/57690708:04
openstackgerritMatthieu Huin proposed openstack-infra/zuul master: CLI: add create-web-token command  https://review.openstack.org/60538608:09
*** e0ne has joined #openstack-infra08:12
*** alexchadin has joined #openstack-infra08:21
*** kashyap has joined #openstack-infra08:22
kashyapcmurphy: Morning08:22
cmurphymorning kashyap08:23
kashyapcmurphy: When you get a moment, can you update the FIXME for SLES here: https://wiki.openstack.org/wiki/LibvirtDistroSupportMatrix08:23
cmurphykashyap: will do, thanks for the reminder08:23
kashyapThanks!08:23
openstackgerritDmitry Tantsur proposed openstack/diskimage-builder master: Add an element to configure iBFT network interfaces  https://review.openstack.org/39178708:26
openstackgerritFilippo Inzaghi proposed openstack-infra/bindep master: fix tox python3 overrides  https://review.openstack.org/60561308:26
*** xinliang has joined #openstack-infra08:27
*** janki has joined #openstack-infra08:27
*** jiapei has quit IRC08:31
*** lpetrut has joined #openstack-infra08:33
openstackgerritDmitry Tantsur proposed openstack/diskimage-builder master: Add an element to configure iBFT network interfaces  https://review.openstack.org/39178708:35
*** verdurin has quit IRC08:36
*** dtantsur|afk is now known as dtantsur08:36
openstackgerritFilippo Inzaghi proposed openstack-infra/elastic-recheck master: fix tox python3 overrides  https://review.openstack.org/57449408:39
*** verdurin has joined #openstack-infra08:39
openstackgerritFilippo Inzaghi proposed openstack-infra/elastic-recheck master: fix tox python3 overrides  https://review.openstack.org/60561808:41
*** derekh has joined #openstack-infra08:43
*** kashyap has left #openstack-infra08:47
*** roman_g has joined #openstack-infra08:48
*** panda|off is now known as panda09:00
*** ykarel has joined #openstack-infra09:04
*** electrofelix has joined #openstack-infra09:05
*** janki has quit IRC09:13
*** Bhujay has quit IRC09:19
openstackgerritMerged openstack-infra/openstack-zuul-jobs master: add Gentoo jobs and vars and also fix install test  https://review.openstack.org/60243909:20
*** pbourke has quit IRC09:24
*** pbourke has joined #openstack-infra09:24
*** Emine has joined #openstack-infra09:24
*** Bhujay has joined #openstack-infra09:25
*** longkb has quit IRC09:38
*** longkb has joined #openstack-infra09:38
*** alexchadin has quit IRC09:42
*** calbers has quit IRC10:00
*** calbers has joined #openstack-infra10:03
*** longkb has quit IRC10:03
*** yamamoto has quit IRC10:17
*** shardy is now known as shardy_mtg10:18
*** yamamoto has joined #openstack-infra10:25
*** yamamoto has quit IRC10:46
*** yamamoto has joined #openstack-infra10:48
*** yamamoto has quit IRC10:49
openstackgerritMiguel Angel Ajo proposed openstack-infra/project-config master: Enable storyboard for os-log-merger  https://review.openstack.org/60564410:52
*** udesale has quit IRC10:54
ajonot sure if that's the right way to do it though ^11:00
*** jpena is now known as jpena|lunch11:06
*** alexchadin has joined #openstack-infra11:22
fungijust a heads up, i'm on the road until probably ~19:00z, but will try to catch up once i'm back at the computer again11:22
*** yamamoto has joined #openstack-infra11:23
*** joabdearaujo has joined #openstack-infra11:26
*** quiquell is now known as quiquell|lunch11:27
*** felipemonteiro has joined #openstack-infra11:30
*** ykarel_ has joined #openstack-infra11:37
*** ssbarnea|bkp has joined #openstack-infra11:37
*** ykarel has quit IRC11:39
*** quiquell|lunch is now known as quiquell11:43
*** tosky__ has joined #openstack-infra11:44
*** tosky has quit IRC11:44
*** tosky has joined #openstack-infra11:45
*** olivierb has joined #openstack-infra11:47
*** pcaruana has quit IRC11:50
*** Bhujay has quit IRC11:51
*** Bhujay has joined #openstack-infra11:52
*** Bhujay has quit IRC11:53
*** Bhujay has joined #openstack-infra11:53
*** tpsilva has joined #openstack-infra11:54
*** Bhujay has quit IRC11:54
*** rfolco has joined #openstack-infra11:56
*** olivierb has quit IRC11:58
*** olivierb has joined #openstack-infra11:59
openstackgerritSimon Westphahl proposed openstack-infra/zuul master: Use merger to get list of files for pull-request  https://review.openstack.org/60328712:00
*** annp has quit IRC12:03
*** felipemonteiro has quit IRC12:04
fricklerslaweq: hi, how are you doing with debugging the dvr job? do you still need the held nodes?12:05
fricklerpabelanger: there's also a node held for you, 19d old12:05
*** trown|outtypewww is now known as trown12:06
*** e0ne has quit IRC12:09
*** e0ne has joined #openstack-infra12:09
openstackgerritSimon Westphahl proposed openstack-infra/zuul master: Use merger to get list of files for pull-request  https://review.openstack.org/60328712:13
*** shardy_mtg is now known as shardy12:14
*** ijw has joined #openstack-infra12:14
*** yamamoto has quit IRC12:14
*** tosky__ has quit IRC12:15
*** yamamoto has joined #openstack-infra12:16
*** jamesdenton has joined #openstack-infra12:16
*** ijw has quit IRC12:18
mordredmorning humans!12:19
slaweqfrickler: hi, no I don't need them12:21
slaweqfrickler: sorry, I thought that they would be removed after a few hours12:21
*** mriedem has joined #openstack-infra12:22
*** rlandy has joined #openstack-infra12:22
fricklerslaweq: no, they are kept until someone deletes them manually. will clean up now, thanks for your feedback.12:23
slaweqfrickler: so please remove them now :) and once again sorry for keeping it so long12:24
slaweqI will remember now that :)12:24
mordredfrickler: oh - sorry - that's my bad12:31
mordredfrickler: I, like slaweq, thought they had a timeout associated with themselves12:31
mordredso didn't delete them manually when he said he was done with them12:31
slaweqmordred: I thought like that because You told me that actually :)12:31
*** mriedem has quit IRC12:34
*** jpena|lunch is now known as jpena12:34
mordredslaweq: well, I guess now you learn to not listen to me :)12:35
slaweqmordred: LOL12:36
*** pcaruana has joined #openstack-infra12:39
*** alexchadin has quit IRC12:43
*** ykarel_ is now known as ykarel12:44
*** ansmith has joined #openstack-infra12:44
openstackgerritMonty Taylor proposed openstack-infra/system-config master: Remove snapd from servers  https://review.openstack.org/60567612:46
*** mriedem has joined #openstack-infra12:49
mordredinfra-root: I just restarted gerrit on review-dev to test the update to the replication rules12:53
mordredBUT12:53
mordredinfra-root: gerrit does not like the regexes we have in the config file that contain # characters12:54
*** ramishra has quit IRC12:54
mordredI'm not sure what happened / why this has stopped working12:54
mordredbut gerrit will not start unless they're removed12:54
mordredyou can replicate the error just by running git config --file=/home/gerrit2/review_site/etc/gerrit.config --list12:54
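
A sketch of that repro, under the assumption that the offending values were regexes carrying single-backslash escapes like \# -- git's config parser only accepts a handful of backslash escapes inside quoted values, which is what the backslash-doubling fix proposed below works around:

    import subprocess
    import tempfile

    for value in (r'^foo\#bar$', r'^foo\\#bar$'):
        with tempfile.NamedTemporaryFile(
                'w', suffix='.config', delete=False) as cfg:
            cfg.write('[replication]\n\tprojects = "%s"\n' % value)
        result = subprocess.run(
            ['git', 'config', '--file', cfg.name, '--list'],
            capture_output=True, text=True)
        # \# is an invalid escape and fails to parse; \\# round-trips to \#
        print(repr(value), '->', result.returncode,
              (result.stdout or result.stderr).strip())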
pabelangerfrickler: that can be deleted if you like12:56
*** agopi has quit IRC12:58
*** bobh has joined #openstack-infra12:59
*** kgiusti has joined #openstack-infra13:00
*** haleyb has joined #openstack-infra13:03
*** boden has joined #openstack-infra13:08
*** dhill_ has quit IRC13:13
*** scroll has joined #openstack-infra13:13
*** dhill_ has joined #openstack-infra13:14
*** njohnston has quit IRC13:15
*** njohnston has joined #openstack-infra13:16
openstackgerritMonty Taylor proposed openstack-infra/system-config master: Double the gerrit regex backslashes  https://review.openstack.org/60569713:17
mordredinfra-root, cmurphy: ^^13:18
mordredcmurphy: I'm *thinking* that's futureparser related, as that's really the only thing that has changed in the gerrit puppet13:18
*** bnemec has joined #openstack-infra13:18
pabelangermordred: Can never have enough backslashes13:19
cmurphymordred: hmm13:19
*** agopi has joined #openstack-infra13:20
*** ramishra has joined #openstack-infra13:22
*** e0ne has quit IRC13:22
cmurphymordred: here's where that gets passed down to http://git.openstack.org/cgit/openstack-infra/puppet-gerrit/tree/templates/gerrit.config.erb#n165 there's no reason that should be different with the future parser13:22
*** aidin has joined #openstack-infra13:22
cmurphymordred: iirc clarkb said when we turned the future parser on there was one minor change in the config file and it wasn't this13:23
mordredcmurphy: spectacular13:24
mordredcmurphy: I am 100% dumbfounded13:24
mordredcmurphy: fwiw - I tested that by applying it on review-dev via puppet and it made the error go away13:24
cmurphymordred: well okay then that's good enough13:25
mordredcmurphy: well - except - what the heck happened?13:25
*** e0ne has joined #openstack-infra13:26
cmurphygremlins13:27
mordredcmurphy: don't feed them after midnight13:27
cmurphymordred: i think i saw something similar in how the command parameter of an exec was being interpolated but the problem magically went away after clarkb asked about it and now i don't know which patchset of which change it was in13:30
*** yamamoto has quit IRC13:31
*** yamamoto has joined #openstack-infra13:31
mordred\o/13:36
*** psachin has quit IRC13:36
*** yumiriam has quit IRC13:41
*** rh-jelabarre has joined #openstack-infra13:43
*** aidin has quit IRC13:53
openstackgerritDoug Hellmann proposed openstack-infra/project-config master: switch all official python projects to python3 publishing job  https://review.openstack.org/59832313:55
*** ykarel has quit IRC13:59
*** fuentess has joined #openstack-infra14:08
openstackgerritMerged openstack-infra/project-config master: Add zone-opendev.org project  https://review.openstack.org/60509514:30
*** jistr is now known as jistr|call14:31
*** yamamoto has quit IRC14:32
*** yamamoto has joined #openstack-infra14:33
*** yamamoto has quit IRC14:33
*** yamamoto has joined #openstack-infra14:34
*** lpetrut has quit IRC14:36
*** lpetrut has joined #openstack-infra14:36
*** yamamoto has quit IRC14:38
*** smarcet has joined #openstack-infra14:38
openstackgerritsebastian marcet proposed openstack-infra/openstackid-resources master: Fix on CFP presentation submission process  https://review.openstack.org/60576014:43
*** yamamoto has joined #openstack-infra14:43
openstackgerritMerged openstack-infra/openstackid-resources master: Fix on CFP presentation submission process  https://review.openstack.org/60576014:44
*** evrardjp has quit IRC14:45
corvusmordred, fungi: can you review https://review.openstack.org/605092 when you have a moment?14:45
*** evrardjp has joined #openstack-infra14:47
*** HenryG has quit IRC14:49
*** armax has joined #openstack-infra14:52
*** yamamoto has quit IRC14:52
*** HenryG has joined #openstack-infra14:55
*** rkukura has joined #openstack-infra14:57
*** _erlon_ has joined #openstack-infra14:58
*** sthussey has joined #openstack-infra14:58
*** evrardjp has quit IRC14:58
*** Swami has joined #openstack-infra14:59
*** hamzy_ has quit IRC15:00
*** shardy is now known as shardy_afk15:00
openstackgerritsebastian marcet proposed openstack-infra/system-config master: OpenStackId production release 1.0.25  https://review.openstack.org/60577015:06
*** shardy_afk is now known as shardy15:07
openstackgerritSalvador Fuentes Garcia proposed openstack-infra/openstack-zuul-jobs master: Add zuul envar for kata tests  https://review.openstack.org/60577315:09
mordredcorvus: done15:14
*** jbadiapa has quit IRC15:20
*** jamesmcarthur has joined #openstack-infra15:21
pabelangercorvus: mordred: clarkb: fungi: is there any interest in having manage-projects in jeepyb grow support for github-only things?  This might be more in the context of a zuul user using github than openstack-infra. I'm just starting to look at how we can use some of the concepts here in openstack-infra but for ansible-network, rather than going out and building something one-off15:22
mordredpabelanger: I don't pesonally have any interest in that - but I can understand how it would be desirable for someone in github land15:25
*** hamzy_ has joined #openstack-infra15:26
*** shardy has quit IRC15:26
pabelangerI guess the question is more, will people -2 it or just not review it15:26
*** jbadiapa has joined #openstack-infra15:27
AJaegerdhellmann: comment on https://review.openstack.org/598323 - are we ready with PyPi?15:27
*** quiquell is now known as quiquell|off15:28
*** yamamoto has joined #openstack-infra15:28
mordredpabelanger: I would not -2 - but I don't believe I have context to review. I wonder - we have a base class in jeepyb now for reading the config - I wonder if we could do a little more work on the codebase to make it something that you could make a jeepyb-github project that depends on jeepyb but doesn't need to add logic directly to jeepyb itself?15:29
*** hamzy_ has quit IRC15:30
pabelangermordred: yes, that would work also. And I think it's a fine solution15:30
*** kopecmartin|ruck is now known as kopecmartin|off15:31
pabelangerit is more the existing config layout / github bits that are interesting, rather than duplicating that all into something else15:31
pabelangerand, imagine, say, somebody like kata might be interested in using it, to help sync things like labels across all projects15:32
pabelangerthat's what I am really looking for right now15:32
pabelangerautomating the creation of repos is icing on the cake :)15:32
*** eernst has joined #openstack-infra15:34
pabelangerthe other idea I had, was maybe just move this into a zuul-job and improve ansible modules for github15:34
pabelangerkinda like we use ansible-role-cloud-launcher today15:34
*** graphene has quit IRC15:35
mordredpabelanger: I actually think that sounds like a better approach - more generally manageable15:36
*** graphene has joined #openstack-infra15:36
*** dave-mccowan has joined #openstack-infra15:41
dhellmannAJaeger : You're right. I think we can work those out as we find them.15:45
*** shardy has joined #openstack-infra15:46
*** chandankumar is now known as chkumar|off15:46
*** jistr|call is now known as jistr15:46
*** e0ne has quit IRC15:51
AJaegermordred, dhellmann, smcginnis, please look at https://review.openstack.org/531825 and https://review.openstack.org/598323 - should we merge 598323 (and abandon 531825) - and work these out? (see dhellmann 's comment above as well)15:52
dhellmannAJaeger , mordred : yeah, I think we want to just switch to the new job instead15:53
smcginnis++15:53
*** gyee has joined #openstack-infra15:53
*** rpittau has quit IRC15:54
*** gyee has quit IRC15:54
dhellmannI'll go through the list of projects and register os-$foo names for the ones where we don't own the regular name15:54
*** dpawlik has quit IRC15:54
smcginnisI seem to recall some discussion in the past of trying to get control of at least one or two of those names as they appeared to be abandoned.15:55
AJaegerdhellmann: I remember somebody was trying to claim keystone (and others). Let's see what mordred remembers...15:55
smcginnisThough I also recall hearing that can be a lengthy process.15:55
AJaegersmcginnis: yes, so did I - let's figure status out on that one.15:55
AJaegerdhellmann's change leaves one occourence of release-openstack-server template, should we remove it completely?15:57
dhellmannwhich one did I miss?15:57
AJaegermordred, clarkb, fungi, I'd like you to chime in and have alignment ^15:57
*** gyee has joined #openstack-infra15:57
AJaegerdhellmann: openstack/ansible-role-tripleo-congress15:57
*** dpawlik has joined #openstack-infra15:58
dhellmannwhy are we using that job for ansible roles?15:58
AJaegerno idea ;( Might just be wrong...15:58
*** graphene has quit IRC15:59
*** dpawlik has quit IRC15:59
mordredAJaeger, dhellmann: there is a person with a project published to keystone which seems abandoned - I contacted the author several times to see if he'd transfer with no response15:59
AJaegersorry, need to step out for a bit, will read later15:59
*** dpawlik has joined #openstack-infra15:59
mordredI think the next step in that process was to contact the pypi folks and see if they'd do it for us15:59
mordredafter having shown good-faith effort to contact the existing owner16:00
*** graphene has joined #openstack-infra16:00
dhellmannok, do you still want to do that?16:00
mordredI contacted the maintainer, Dan Crosta on 11/17/2017 and again on 01/08/2018 - the second time i copied dhellmann, ttx, fungi and smcginnis16:02
*** yamamoto has quit IRC16:02
mordreddhellmann: I can - unless you feel like it's a more appropriate request coming from the release team16:02
dhellmannI'm trying to figure out if they have documented the process16:03
*** ykarel has joined #openstack-infra16:04
mordreddhellmann: https://www.python.org/dev/peps/pep-0541/16:04
mordredlinked from an issue on gh I found: https://github.com/pypa/pypi-legacy/issues/68216:05
*** ginopc has quit IRC16:05
dhellmannmordred : ok, I'll see about doing that today16:05
mordreddhellmann: cool - let me know if I can help in any way16:07
*** dave-mccowan has quit IRC16:08
dhellmannmordred : there are currently 28 open requests to have package maintainership changed https://github.com/pypa/warehouse/issues?q=is%3Aissue+is%3Aopen+label%3A%22PEP+541%2216:09
dhellmannsome as old as january16:09
dhellmannI'm not sure we want to block on having preferred names16:10
*** efried has quit IRC16:10
*** efried has joined #openstack-infra16:10
pabelangerhttps://github.com/pypa/warehouse/issues/4610 looks to have been transferred in a day16:11
pabelangernot sure why so fast vs others16:12
*** jpich has quit IRC16:12
*** graphene has quit IRC16:14
dhellmannhttps://github.com/pypa/warehouse/issues/477016:14
dhellmannI don't appear to have permission to tag it16:15
*** graphene has joined #openstack-infra16:15
pabelangerI've found recently, you need to have special permissions on github repo to tag labels16:15
dhellmannthat seems likely16:16
pabelangerwoah, somebody tagged it16:16
*** diablo_rojo has joined #openstack-infra16:17
mordredwoot! responses16:18
corvusmordred: hey look what dcrosta's been working on: https://github.com/tox-dev/tox-docker16:19
mordredcorvus: yah - I think I looked at that last time I looked at him :)16:19
*** ykarel_ has joined #openstack-infra16:19
*** ykarel has quit IRC16:22
*** florianf is now known as florianf|afk16:25
AJaegerdhellmann: will you do the process for https://pypi.org/project/magnum/ and https://pypi.org/project/congress as well?16:26
*** dpawlik has quit IRC16:26
dhellmannAJaeger : I'm making a list so I can do them all at once16:26
AJaegerdhellmann: thanks!16:26
clarkbI'm having a slow start this morning. Haven't seen a response on the BHS1 email yet16:28
AJaegermordred: will you abandon https://review.openstack.org/#/c/531825/ then?16:29
AJaegermordred: with the two stacked on top of it?16:29
mordredAJaeger: abandoned16:30
*** dpawlik has joined #openstack-infra16:30
* mordred afks for a bit16:30
AJaegerthanks, mordred16:30
openstackgerritMerged openstack-infra/zuul master: Uncap cherrypy  https://review.openstack.org/60113616:33
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: WIP: Playing with k8s install  https://review.openstack.org/60580316:34
openstackgerritAndreas Jaeger proposed openstack-infra/openstack-zuul-jobs master: Remove release-openstack-server template  https://review.openstack.org/53183016:34
clarkbcorvus: on https://review.openstack.org/#/c/605092/1/manifests/site.pp the node comment annotation is for bionic which means our puppet tests won't run there (also not sure we can puppet those at all?) is that supposed to be xenial like the existing dns servers?16:34
openstackgerritAndreas Jaeger proposed openstack-infra/project-config master: Remove release-openstack-python-without-pypi  https://review.openstack.org/53182916:35
corvusclarkb: oh, for some reason i thought we were able to use bionic now16:35
corvusi guess i need to change it16:35
AJaegerdhellmann, mordred, those are the cleanups we can do once dhellmann's change is in ^16:35
clarkbcorvus: and the other question I have is do we need to update iptables or is that going to happen in a followup change?16:36
*** bobh has quit IRC16:37
corvusclarkb: followup.  bootstrapping here is a bit messy.  i'll probably do at least one manual iptables thing to get it going.16:37
openstackgerritJames E. Blair proposed openstack-infra/system-config master: Add opendev nameservers  https://review.openstack.org/60509216:37
clarkbok16:37
corvusmordred: ^ can you re-review pls?16:37
dmsimardWhy would pip/tox apparently fail on a 404 for a wheel ? Should it not attempt to retrieve the package if a wheel isn't found ?16:38
AJaegerdhellmann: So, I'll +2 your change once you update all repos - including the missed one.16:39
clarkbdmsimard: it is likely 404ing on a file that is in the index16:39
clarkbdmsimard: in which case it has already decided which file is most appropriate to install from16:39
dmsimardclarkb: I have three jobs which failed on two separate mirrors but the package is there and it works locally :/16:40
dmsimardhttp://paste.openstack.org/show/731034/16:40
clarkbdmsimard: our mirrors are proxy caches now for pypi because our disks can't keep up with how quickly machine learning produces pypi packages16:40
dmsimardlol16:41
dmsimardno more bandersnatch ?16:41
dmsimardI've been quite a bit disconnected :/16:41
clarkbdmsimard: no more bandersnatch, however you 404'd on our wheel mirrors16:41
*** bobh has joined #openstack-infra16:42
clarkbhttp://mirror.dfw.rax.openstack.org/wheel/ubuntu-16.04-x86_64/black/ which is a proper 40416:42
dmsimardright16:42
clarkbit should install from the pypi/simple/black path instead in that case16:42
dmsimardbut it sort of gives up16:42
dmsimardhands in the air and all that16:42
*** Swami has quit IRC16:46
openstackgerritMerged openstack-infra/openstack-zuul-jobs master: Remove migrated legacy-glare-dsvm job  https://review.openstack.org/60507716:47
*** mdbooth has quit IRC16:49
dmsimardI'll do a recheck to see what happens... wanted to avoid doing one blindly considering the current state of the zuul queues16:50
clarkbdmsimard: I wonder if the reason it failed is it requires python>=3.6 and pip is smart enough to check that16:51
*** e0ne has joined #openstack-infra16:52
clarkbxenial doesn't have 3.6, it has 3.5 and I'm not sure what that tox target is running under either16:52
clarkbbut that is my best guess for why it couldn't find a package to install at this point16:52
dmsimardoh, does it ?16:53
dmsimardWell "It requires Python 3.6.0+ to run"16:53
dmsimardhuh, okay16:54
clarkbhttps://pypi.org/project/black/#history and pypi is aware of this on the left panel16:54
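
A minimal sketch of the check clarkb is describing, using the packaging library that pip's resolver relies on (the specifier is black's published python_requires):

    from packaging.specifiers import SpecifierSet

    python_requires = SpecifierSet('>=3.6')  # from black's metadata on pypi

    for interpreter in ('3.5.2', '3.6.0'):
        status = 'installable' if interpreter in python_requires else 'skipped'
        print('python %s: %s' % (interpreter, status))

On a xenial node running 3.5, every release of black gets skipped, so pip ends up with no installable candidate at all rather than falling back further.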
*** dtantsur is now known as dtantsur|afk16:54
*** e0ne has quit IRC16:54
dhellmannAJaeger : "update all repos"?16:54
openstackgerritMichael McCune proposed openstack-infra/irc-meetings master: update api-sig meeting times  https://review.openstack.org/60580816:57
*** dpawlik has quit IRC16:58
*** ykarel_ is now known as ykarel16:58
*** gfidente has quit IRC16:59
*** dpawlik has joined #openstack-infra16:59
dmsimardclarkb: TIL17:01
dmsimardthanks for the second pair of eyes :)17:01
dmsimardbtw is there anything specific causing the zuul queues backlog right now?17:01
*** gfidente has joined #openstack-infra17:02
clarkbdmsimard: same story as last week. We're down a cloud region, and tripleo and openstack gate resets are eating up large amounts of capacity17:03
dmsimardyeah nodepool is running at max capacity for sure17:04
*** derekh has quit IRC17:04
*** diablo_rojo has quit IRC17:05
dmsimardPackethost is the one that is down ?17:05
*** jamesmcarthur has quit IRC17:06
clarkbwell it may be down too (I haven't checked if the port situation there is happier) but no it is bhs1 in ovh17:07
clarkbwe're still trying to sort out the fallout of their upgrade there17:07
dmsimardpackethost has net zero VMs, everything is failing17:07
dmsimardwell, according to http://grafana.openstack.org/d/U462abNik/nodepool-packethost?orgId=117:07
clarkbour VM images boot with working networking now, but we can't reliably boot them due to nova having trouble talking to the neutron ports api17:07
*** _erlon_ has quit IRC17:08
*** jpena is now known as jpena|off17:11
dmsimardmordred, dhellmann: interesting bug in pbr https://github.com/openstack-dev/pbr/blob/master/pbr/packaging.py#L104-L11017:11
dmsimardI have a package "ara-clients" so the egg name was "ara-clients" (perhaps should have been ara_clients?) but anyway, it led to the installation of ara itself and >=clients as the version17:12
openstackgerritClark Boylan proposed openstack-infra/system-config master: Add zuul user to bridge.openstack.org  https://review.openstack.org/60492517:12
openstackgerritClark Boylan proposed openstack-infra/system-config master: Manage user ssh keys from urls  https://review.openstack.org/60493217:12
*** trown is now known as trown|lunch17:12
clarkbmordred: corvus ^ I think I have testing sorted out for that now so won't bother WIPing the second change there17:12
*** ramishra has quit IRC17:13
*** agopi_ has joined #openstack-infra17:13
clarkbI'm going to clean up bhs1 ports now, then increase max-servers to 8 in that region again to see if it works. That seemed to be ok last time; it wasn't until we went to 80 that it had a sad17:13
clarkbalso mnaser did you see my paste of weird volumes in sjc1? I think they leaked but wanted to make sure you had an opportunity to look at them if you want before I delete them17:15
*** e0ne has joined #openstack-infra17:16
*** agopi has quit IRC17:16
clarkbinfra-root config-core: for talking about opendev messaging with the foundation I'm looking at picking a time next week. I know many of us are going to be at ansiblefest, which may be a good or bad thing for trying to talk about this (as it is in austin where many of the foundation folk are). Would we rather try for Friday next week after we return?17:18
clarkbI'm thinking that an IRC meeting/discussion like we had for the naming discussion would work well?17:19
mnaserclarkb: can you delete them please?17:19
clarkbmnaser: I can17:19
*** agopi_ is now known as agopi17:21
*** Emine has quit IRC17:21
clarkbmnaser: actually I can't because volume delete says the volume is attached, but the server it is attached to no longer exists, so server remove volume fails. volume set can change the state to error or similar but that is admin only17:25
mnaserclarkb: i can do that, if you have the IDs handy so i dont break $world ?17:25
clarkbmnaser: http://paste.openstack.org/show/730964/ volumes are on the left and servers are on the right17:26
*** harlowja has joined #openstack-infra17:27
*** rkukura has quit IRC17:28
*** aojea has quit IRC17:28
dhellmanndmsimard : that looks like a case where we want to be using rpartition instead of a regex17:32
dhellmanndmsimard : do you have time to work on a patch?17:32
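
Illustrative only -- this is not pbr's actual code, and the regex below is hypothetical -- but it reproduces the reported symptom: a greedy split of an egg name at a '-' turns a version-less hyphenated package into a bogus requirement:

    import re

    EGG_RE = re.compile(r'^(.*?)-(.*)$')  # hypothetical 'name-version' split

    def to_requirement(egg_name):
        match = EGG_RE.match(egg_name)
        if match:
            return '%s>=%s' % (match.group(1), match.group(2))
        return egg_name

    print(to_requirement('ara-clients'))  # ara>=clients -- the reported bug
    print(to_requirement('ara_clients'))  # unchanged -- the underscore workaround

dhellmann's suggestion amounts to splitting from the right and validating that the candidate version segment actually parses as a version before treating it as one.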
*** Emine has joined #openstack-infra17:34
*** dpawlik has quit IRC17:36
*** dpawlik has joined #openstack-infra17:39
*** dpawlik has quit IRC17:39
*** dpawlik has joined #openstack-infra17:40
*** dpawlik has quit IRC17:40
dmsimarddhellmann: we've worked around it for now by using an underscore instead which doesn't trigger the issue -- I don't have much bandwidth at all right now but I can document the bug in a storyboard story ?17:42
dhellmanndmsimard : that would be good, thanks!17:42
mnaserclarkb: they should be gone now17:44
*** jamesdenton has quit IRC17:44
clarkbmnaser: yup thanks17:45
mnaserbtw i noticed not all graphs are 'multiregion'17:45
mnaseri.e.: http://grafana.openstack.org/d/nuvIH5Imk/nodepool-vexxhost?orgId=1&from=now-3h&to=now17:45
clarkbmnaser: I think ianw was working on converting them?17:46
clarkbbut ya not all have been done iirc17:46
*** panda is now known as panda|off17:46
mnaserstill looks awesome though17:46
mnaserclarkb: i see error node attempts is zero now for the past few minutes so17:51
mnaseri think we're good!17:51
mnasersorry for that17:51
clarkbno problem17:52
AJaegerdhellmann: the missing openstack/ansible-role-tripleo-congress - you changed all other roles...17:55
dhellmannok17:55
AJaegerdhellmann: see my comment in https://review.openstack.org/#/c/598323/17:55
*** diablo_rojo has joined #openstack-infra17:56
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul-jobs master: WIP: Add role to install kubernetes  https://review.openstack.org/60582317:59
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: Implement a Kubernetes driver  https://review.openstack.org/53555718:00
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: WIP: Playing with k8s install  https://review.openstack.org/60580318:00
*** yamamoto has joined #openstack-infra18:00
*** sshnaidm is now known as sshnaidm|off18:03
*** dims_ is now known as dims18:05
openstackgerritDoug Hellmann proposed openstack-infra/project-config master: switch all official python projects to python3 publishing job  https://review.openstack.org/59832318:06
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Fix unreachable nodes detection  https://review.openstack.org/60282918:07
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Also retry the job if a post job failed with unreachable  https://review.openstack.org/60283018:07
openstackgerritDoug Hellmann proposed openstack-infra/project-config master: Remove release-openstack-python-without-pypi  https://review.openstack.org/53182918:08
*** ykarel_ has joined #openstack-infra18:08
clarkbbhs1 looks good with max servers set to 818:09
dhellmannAJaeger , mordred : it looks like we need to resolve the name problem for congress, keystone, magnum, and heat. I have filed tickets for the first 2, and contacted the owners of the other 218:09
clarkbI'll let it run there for another hour or two then try bumping it up to say 13 (for 10 total instances) after18:09
clarkbsee if we can find where it starts to fall over18:09
*** Swami has joined #openstack-infra18:10
*** ykarel has quit IRC18:11
*** bobh has quit IRC18:12
*** olivierb has quit IRC18:15
*** olivierb has joined #openstack-infra18:16
*** lpetrut has quit IRC18:21
openstackgerritMerged openstack-infra/system-config master: Double the gerrit regex backslashes  https://review.openstack.org/60569718:31
*** trown|lunch is now known as trown18:31
*** jamesdenton has joined #openstack-infra18:31
*** gfidente has quit IRC18:32
clarkbinfra-root can we get a non foundation review on https://review.openstack.org/#/c/605212/ ?18:32
*** graphene has quit IRC18:32
clarkbmordred: would you be willing to review my debuntu glean refactor stack at https://review.openstack.org/#/c/604225/2 I think that will help with general readability and understanding18:33
* fungi is home now, trying to catch up on scrollback18:34
*** graphene has joined #openstack-infra18:34
*** graphene has quit IRC18:34
cmurphyclarkb: speaking of that kata change https://review.openstack.org/60183118:35
clarkbthat would appear to be a prereq18:36
*** graphene has joined #openstack-infra18:36
*** eernst has quit IRC18:37
fungipabelanger: on jeepyb manage-projects and github things, i'm hoping in the very near future we can at least disable our use of github-related features in manage-projects entirely so it might end up even more un-tested/un-exercised thereafter. wondering if a separate utility might make more sense for that anyway18:38
clarkbinfra-root yall ok with me approving the futureparser change for static.o.o?18:38
clarkbactually is elections stuff hosted there?18:38
clarkbif so maybe we should wait for after the TC election just to avoid any potential unhappiness with that ending today18:38
clarkbya governance.o.o is a vhost on static.o.o18:39
fungismcginnis: dhellmann: mordred was working on following https://www.python.org/dev/peps/pep-0541/#removal-of-an-abandoned-project for keystone, not sure where it got left off (i think he tried e-mail and leaving a github issue as contact attempts)18:39
fungioh, i see mordred replied already, ignore me18:39
smcginnis:)18:39
clarkbcmurphy: are erb entries like <%= scope['openstack_project::static::cert_file'] %> expected to work with futureparser?18:40
clarkbcmurphy: I'm not sure how the scope array differs from scope.lookupvar18:40
clarkb(maybe they are aliases for each other?)18:41
cmurphyclarkb: they're the same thing18:41
cmurphyit will still work18:41
clarkbin that case I see no issues with switching static.o.o to futureparser18:42
pabelangerfungi: Yup, that is a fair statement. I think adding support into ansible might be the separate thing. Maybe it's better if I sync with the kata folks, since they depend on github also18:42
clarkbpersia and diablo_rojo ^ I'd like to update how puppet runs on the server serving the elections/governance content. Would you prefer we wait for after the TC election to do that?18:42
*** olivierb has quit IRC18:43
*** olivierb has joined #openstack-infra18:43
*** quite has quit IRC18:44
*** bobh has joined #openstack-infra18:45
*** jistr has quit IRC18:47
*** quite has joined #openstack-infra18:48
*** jistr has joined #openstack-infra18:49
mordredcorvus: https://review.openstack.org/#/c/605092 has 3x +218:50
fungiclarkb: persia: diablo_rojo: tonyb: also be aware the post pipeline backlog means that when the change to close out the tc election does (eventually) merge it will likely still be a while before the election page gets updated18:51
AJaegerdhellmann: regarding pypi, I think we might have additional repos - like the ansible-role-tripleo ones - that are not set up at pypi. But most might need just creation. These are repos that were created after mordred did his change earlier this year18:51
fungi(we're up to about 81 hours for the oldest changes still in post)18:52
dhellmannAJaeger : we'll get those for free when we tag a release, right?18:52
dhellmannwe can no longer just register a name, we have to have something to upload18:52
pabelangerclarkb: re: opendev, next week is okay for meeting for me18:52
dhellmannI did that for a few regular projects that I found, but the ansible roles are unlikely to have name collisions18:53
AJaegerdhellmann: should be fine...18:53
fungiyeah, now that twine upload autoregisters projects on pypi people don't necessarily need to precreate them unless they want additional permissions on the entry or are afraid someone might squat the name18:53
dhellmannright, that's why I went ahead and created qinling, manila, and zaqar-ui18:53
clarkbpabelanger: would friday be preferable due to ansiblefest?18:53
mordredclarkb: +2 on 605212 - left off the +a- not sure if we're waiting on anything18:54
fungiand also, precreating projects on pypi is a pain since you need an actual sdist/wheel/egg to upload now18:54
clarkbmordred: cmurphy pointd out https://review.openstack.org/601831 should go in first18:54
AJaegerdhellmann: just wanted to point out that we know about those repos (keystone, magnum, congress,...) but that research was done 9 months ago and new repos created in the meantime. Thanks for registering the new repos.18:54
corvusmordred: thx, self-approved18:55
dhellmannAJaeger : yeah, I went through all of the repos that are slated to be part of stein so far and we had a total of 7 with issues, 4 of which are still being dealt with18:56
dhellmannI am OK with moving ahead and letting the first release attempt force the issue18:56
dhellmannteams can either change the sdist name in their setup.cfg or we'll have the pypi acls by then18:57
dhellmannwe keep talking about doing this, and blocking on someone having time to deal with the names, and I think we should stop blocking18:57
mordredclarkb: whole glean stack looks great18:57
mordredclarkb: cool - +A on 60183118:58
diablo_rojoclarkb, fungi I am fine with it happening now, but I'll let persia share an opinion.18:58
mordredclarkb, corvus, fungi: if you're bored and want an easy one: https://review.openstack.org/#/c/605676/ - removes snapd from our servers18:58
fungiheh. bored? ;)18:58
mordredfungi: ikr?18:58
mordredjust pointing it out because I accidentally noticed it this morning, but it's not tied to any other efforts18:59
AJaegerdhellmann, smcginnis, can we really change publish-xstatic-to-pypi to publish-to-pypi-python3? See http://git.openstack.org/cgit/openstack-infra/openstack-zuul-jobs/tree/zuul.d/project-templates.yaml#n307 and http://git.openstack.org/cgit/openstack-infra/openstack-zuul-jobs/tree/zuul.d/jobs.yaml#n776 - don't we need the xstatic check?19:00
*** yamamoto has quit IRC19:01
dhellmannAJaeger : that check is now performed as part of requesting the release tag19:01
*** smarcet has quit IRC19:02
AJaegergreat, so I'll +2 https://review.openstack.org/#/c/59832319:03
dhellmanncool, thanks for doing such a careful review19:05
*** ykarel_ is now known as ykarel|away19:05
*** graphene has quit IRC19:07
*** graphene has joined #openstack-infra19:08
*** jistr has quit IRC19:08
*** jistr has joined #openstack-infra19:08
*** dayou has quit IRC19:10
*** ykarel|away has quit IRC19:12
dhellmannwould it make sense to have the priorities of pipelines later in the process get higher, so that we're not sending more jobs to a pipeline than it's going to be able to process?19:17
dhellmannbasically, prioritize finishing things over starting things19:18
dhellmannit looks like we have the post queue precedence set to low, so we're going to keep sending more jobs at it and starving it of resources19:20
fungidhellmann: at least zuul now uses a supercedent pipeline manager for post, so it will only enqueue up to two builds of jobs for any project+branch19:21
dhellmannsure, that's helpful too19:21
dhellmannI'm trying to remember what little of queuing theory I ever studied19:21
fungisubsequent merges to a project don't effectively increase the load there while there is already one waiting19:21
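
A toy model of that supercedent behaviour (not zuul's implementation): per project+branch one item runs, at most one waits, and newer arrivals replace the waiting item instead of queueing behind it:

    running = {}
    waiting = {}

    def enqueue(key, item):
        if key not in running:
            running[key] = item
        else:
            waiting[key] = item  # replaces anything already waiting

    for sha in ('a1b2', 'c3d4', 'e5f6', 'a7b8'):
        enqueue(('system-config', 'master'), sha)

    print(running)  # a1b2 is running
    print(waiting)  # a7b8 waits; c3d4 and e5f6 were superseded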
clarkbdhellmann: ya I've been mulling changing the priority on post19:22
dhellmannone project can't starve another, but we're starving them all right now19:22
clarkbI think we can probably do it now that it is supercedent, so the cost is low19:22
clarkbdhellmann: however19:22
clarkbthe gate is super unhealthy for tripleo and openstack19:22
clarkband fixing those issues would probably be more globally beneficial19:22
dhellmannif check was low, gate was medium, and post was high and we were fully loaded then we would completely finish with 1 patch before starting another19:22
dhellmannI'm sure19:22
dhellmannI know the tripleo team has been working to address that19:23
dhellmannI don't know about openstack per se19:23
fungiso we would basically leave release/pre-release/tag, release-post and gate at highest precedence, check and post at normal precedence, and periodic/experimental at lowest precedence?19:23
pabelangerclarkb: I'll be traveling on friday most of the day19:23
*** jistr has quit IRC19:23
dhellmannfungi : I think we want post higher than gate19:23
openstackgerritMerged openstack-infra/system-config master: Redefine $listdomain for kata lists  https://review.openstack.org/60183119:23
dhellmannbecause we're not actually done with the patch until the post queue is done with it19:24
fungidhellmann: one up-side to making post highest precedence, i suppose, is that we could get rid of release-post19:24
dhellmannwe're "done enough" to start using it in other tests, I guess19:24
dhellmannyeah, true19:24
dhellmannthat would simplify things, too19:24
AJaegerwill this kind of high load continue? Normally the priorities are fine and two weeks ago I would have said: Don't change anything ;)19:24
dhellmannAJaeger : I have ~300 more patches to propose to fix tox settings :-)19:25
dhellmannI'm still going to trickle those in, but it's what made me look at the status this afternoon19:25
AJaegerfungi, dhellmann if post is higher than gate, then supercedent is not really needed, we would never "merge" post jobs like we do today19:25
dhellmannAJaeger : we do still need it, because we might have 3 things pass the gate queue at the same time19:25
AJaegerdhellmann: it's soo bad already, 300 more or less won't change anything ;/19:26
dhellmannfrom the same project -- we see that with releases pretty often19:26
dhellmannAJaeger : yes, well, I'm not going to add any more jobs today, I was just saying that's what made me go look at the queue depth19:26
clarkbif gate and post had the same prioriry I think that would work19:26
AJaegerdhellmann: question for me is what is normal. With current load, I agree, we need to give post higher prio. With "normal" load, I would not change it.19:26
*** jistr has joined #openstack-infra19:26
dhellmannclarkb : no, I really think we need to treat the series of pipelines as a queue of its own, and throttle things going from one stage to the next if it's going to mean the work can't actually complete19:27
clarkbdhellmann: when they are the same though they would compete at roughly fair distribution of resources19:27
dhellmannideally we would have a large check queue, a reasonably deep gate, and anything that makes it through the gate would be home free in post so that would be a tiny queue19:28
*** jistr has quit IRC19:28
dhellmannbut we don't want gate and post competing, that's my point19:28
dhellmannwe view the patch as "done" when it leaves the gate and merges, but we often don't see the build artifacts until after the post job is done so the patch isn't really completely done after it merges19:29
dhellmannthe post jobs don't tend to be expensive but we need them to run19:29
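
dhellmann's preferred ordering, rendered as a toy allocation model (not nodepool code; the precedence values are illustrative): finishing work beats starting work, so post drains before gate, and gate before check:

    import heapq

    PRECEDENCE = {'post': 0, 'gate': 1, 'check': 2}  # lower is served first

    requests = [
        ('check', 'fresh patchset wanting first-pass tests'),
        ('post', 'publish docs for an already-merged change'),
        ('gate', 'approved change at the head of the queue'),
    ]
    heap = [(PRECEDENCE[p], i, what) for i, (p, what) in enumerate(requests)]
    heapq.heapify(heap)
    while heap:
        _, _, what = heapq.heappop(heap)
        print('nodes go to:', what)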
fungii think we have different work reflected in those different pipelines though, to some extent19:29
*** jistr has joined #openstack-infra19:29
dhellmannyes, that's true, we do19:29
dhellmannI think we've undervalued the post job work19:29
clarkbsome post jobs are expensive, but probably not tripleo 3 node container test that is going to timeout after 3 hours expensive19:29
clarkbdhellmann: I agree19:29
dhellmannfor example, we merge a doc patch but don't see new documentation live until after post. that's going to be several days from now it seems19:30
*** bobh has quit IRC19:30
fungiactivity in the check pipeline is likely to be the result of developers pushing and iterating on changes, while gate pipeline activity is a result of reviewers approving changes and post is a result of automation performing actions based on changes which were able to survive gating19:30
corvuswhen we're able to make the promote pipeline, i think we can make it higher precedence fairly easily because jobs in there will be very low impact.19:30
*** bobh has joined #openstack-infra19:30
openstackgerritMerged openstack-infra/system-config master: Remove snapd from servers  https://review.openstack.org/60567619:30
mordredcorvus: aroo? we already  have a promote pipeline?19:31
dhellmannclarkb : how hard is it to get stats on which repos are consuming nodes in each pipeline, and failure rates?19:31
corvus(and we'll be able to use promote for tarballs and docs)  but we have a bit more work to do there.19:32
fungiit felt to me like we basically optimized our pipelines based on which activity we encourage more: highest is people reviewing and approving changes, next is people pushing new stuff that is of perhaps dubious quality, and last is automation (the fact that people are seeking near-realtime feedback from the automation in the low-priority pipeline is something i hadn't considered however)19:32
dhellmannif I have numbers I can try to express the issues the tripleo situation may be causing for us19:32
corvusmordred: we need some more support in zuul before we can reliably use it (especially if we do swift logs)19:32
clarkbdhellmann: the hardest part is figuring out how many nodes per job is consumed since I don't think we directly record that in the data processing19:32
mordredcorvus: ah - gotcha19:32
clarkbcorvus: ^ does the zuul sql db have nodeset info like that?19:32
dhellmannfungi : I agree, that's what we did. I think that made sense at one time. Maybe it's worth revisiting?19:33
fungii always welcome an opportunity to revisit assumptions19:33
clarkbdhellmann: we record job runtimes and status in the zuul sql db and in graphite.19:33
clarkbI think the one piece missing is what number of resources that job locks during that time19:34
corvusclarkb: no it doesn't19:34
dhellmannhow many jobs use more than 1 node?19:34
clarkbdhellmann: in tripleo? most of them19:34
*** jistr has quit IRC19:34
dhellmannok, if you give me numbers of each job, I can turn that into numbers of nodes19:34
clarkbtop of tripleo gate is 6/9 are multinode19:35
dhellmannfailure rates would be good, too19:35
dhellmannI don't know how to collect the data, but if you help with that I can go argue internally to ease back a bit19:36
corvusi do know the tripleo folks are trying to move some tests into single-node jobs19:37
mordreddhellmann: or alternately to provide some node resources19:37
dhellmanncorvus : yeah19:37
*** jistr has joined #openstack-infra19:37
dhellmannmordred : or that19:37
clarkbdhellmann: the place I end up looking a lot is http://status.openstack.org/elastic-recheck/data/others.html that shows 137 fails in two tripleo multinode container jobs over the last 10 days in the gate19:37
dhellmannI guess the point is I'll do that part but I need data to make the argument19:37
mordred++19:37
*** dayou has joined #openstack-infra19:37
*** jamesmcarthur has joined #openstack-infra19:37
clarkbthat doesn't give us a complete view, but it is the "world is on fire" check with a rough sense of how on fire19:38
*** hamzy has joined #openstack-infra19:38
dhellmannclarkb : I don't look at that page often enough to know how to read it, which numbers are you looking at? I see 82 fails for tripleo-ci-centos-7-containers-multinode right at the top19:38
clarkbfor curating global values the neutron team has a dashboard like http://grafana.openstack.org/d/Hj5IHcSmz/neutron-failure-rate?orgId=1. The tripleo team could put something similar together using grafyaml19:39
dhellmannoh, and I guess the next job down has 55 for 137 total19:39
clarkbdhellmann: ya I added those two together19:39
dhellmannk, yeah, I worked it out19:39
*** jistr has quit IRC19:39
dhellmannbuild-openstack-sphinx-docs is failing pretty often, too19:40
clarkbfwiw openstack is super flaky right now too19:40
clarkbit just consumes a smaller portion of resources on reset due to smaller queue19:41
dhellmannyeah19:41
dhellmannthough I wonder how much that contributes to the tripleo failures, too19:41
mordredI'm having fairly consistently inconsistent devstack behavior in sdk functional tests19:41
*** e0ne has quit IRC19:42
mordredon any given patch I'm rather likely to have something fail server-side and need to recheck - so I'd bet the tripleo folks are getting hit by that as well to some degree19:42
clarkbfwiw I've tried to read the tripleo logs to understand how things are failing but get lost in the ansible drives mistral which drives ansible which runs puppet in docker19:43
dhellmannclarkb : yeah, don't even get me started19:43
AJaegerdhellmann: build-openstack-sphinx-docs was broken on many stable branches, a fallout from the way we changed the jobs. On master I'm not aware of anything19:44
*** hamzy has quit IRC19:44
dhellmannAJaeger : ok, I didn't look at any of the failures, I just noticed it was 3rd on that page clarkb linked19:47
*** e0ne has joined #openstack-infra19:47
*** jistr has joined #openstack-infra19:49
*** e0ne has quit IRC19:50
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Provide some accounting of node usage in logs  https://review.openstack.org/60585619:52
corvus2018-09-27 12:51:53,271 zuul.nodepool                    INFO     Nodeset <NodeSet [<Node 0000000001 ('controller',):label1>]> with 1 nodes was in use for 0.006144285202026367 seconds for build <Build 5eae74f9be32448880a78994d0281de2 of project-test1 on <Worker localhost6.localdomain6>>19:52
corvusclarkb, dhellmann: ^ that change will cause zuul to emit log lines like that19:52
clarkbneat19:52
corvusmaybe that would help.  if we can ever land the change and restart.  ;)19:52
dhellmannheh19:53
dhellmannthanks, corvus19:53
corvusoh we're missing the project name19:54
corvus(project-test1 is a job name -- for hysterical raisins)19:54
*** pcaruana has quit IRC19:54
clarkbdhellmann: it might also be helpful to prioritize bug fixes rather than feature additions?19:56
dhellmannclarkb : I know at one point juan kicked everything out of the tripleo gate to clear it up and get some stabilizing fixes through, but I don't know if there has been a real policy on that19:57
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Provide some accounting of node usage in logs  https://review.openstack.org/60585619:57
dhellmannhow many jobs are we running non-voting?19:58
clarkbthere were a bunch in the tripleo gate but I think those did get cleaned up. There are a few outliers I've noticed but not like before19:59
clarkbor do you mean in general?19:59
dhellmannin general19:59
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Indicate whether a build is voting in the logs  https://review.openstack.org/60585719:59
dhellmannI'm just thinking of classes of jobs that we could kill or something. I guess there aren't likely to be many.19:59
openstackgerritMerged openstack-infra/glean master: Use common function for debian bond mode  https://review.openstack.org/60422120:00
openstackgerritMerged openstack-infra/glean master: Check same debian interface path everywhere  https://review.openstack.org/60422220:00
clarkbya I'm actually not sure how to answer that question with distributed zuul job configs now20:00
corvusthe sql db *does* include whether it's voting20:00
clarkbah maybe ^ is how to answer that question then20:00
corvusso we can semi-easily find out how many we've run (and for what projects)20:00
dhellmanncorvus: what do you think about my theory about setting pipeline priorities so they get progressively higher?20:00
dhellmannyeah, that would be useful to have, too. we could ask teams to turn off non-voting jobs for the time being20:01
*** bobh has quit IRC20:01
openstackgerritMerged openstack-infra/glean master: Manage the debian interface header in one place  https://review.openstack.org/60422320:02
corvusdhellmann: when we have promote, i'm all for it.  but right now, we have our historical mix of important + unimportant stuff in post (i think), so it's a harder call.  for example, we still have a lot of coverage jobs, which we threw in post because they're not important.  but they're heavy resource users.20:03
dhellmannhmm, yes, that's a good point20:03
clarkbAJaeger has been working to clean those up fwiw20:03
fungiby moving them to check instead20:03
fungiyes20:03
corvusmaybe it's as simple as removing all the coverage jobs and then there's nothing unimportant left in post?20:04
clarkbya but check is lower priority so probably ok20:04
clarkbcorvus: that is probably the vast majority of unimportant stuff in post20:04
corvusat any rate, that's my main concern -- basically if we're shifting the importance of post, we should make sure what's in it is aligned with the new importance.20:04
corvusskimming http://zuul.openstack.org/builds.html?pipeline=post looks pretty tame20:06
corvusmaybe we're close enough to make it worth doing now20:06
*** smarcet has joined #openstack-infra20:07
*** e0ne has joined #openstack-infra20:08
*** diablo_rojo has quit IRC20:08
openstackgerritMerged openstack-infra/glean master: Consistent debian interface control flow  https://review.openstack.org/60422420:09
openstackgerritMerged openstack-infra/glean master: Debian interface config set bond once  https://review.openstack.org/60422520:09
*** hamzy has joined #openstack-infra20:09
*** diablo_rojo has joined #openstack-infra20:11
*** hamzy has quit IRC20:20
*** hamzy has joined #openstack-infra20:20
clarkbusing the data in the zuul status.json for the current gate queue, and the heuristic that any job with multinode in its name is worth 2 nodes and all other jobs are 1 node, we get ~234 nodes for the current tripleo active window20:22
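(A minimal sketch of the tally described above; the status.json layout assumed here — pipelines containing change_queues containing heads containing changes with job lists — is best-effort for the dashboard of the time, and the 2-nodes-per-multinode heuristic is the one from the discussion:)

    import json
    import urllib.request

    # Pull the live status dump; structure is assumed, see the note above.
    with urllib.request.urlopen('http://zuul.openstack.org/status.json') as resp:
        status = json.load(resp)

    total = 0
    for pipeline in status['pipelines']:
        if pipeline['name'] != 'gate':
            continue
        for queue in pipeline['change_queues']:
            for head in queue['heads']:
                for change in head:
                    for job in change.get('jobs', []):
                        # Heuristic: "multinode" in the name counts as 2
                        # nodes, everything else as 1 (undercounts jobs
                        # with more than 2 nodes, as noted later).
                        total += 2 if 'multinode' in job['name'] else 1
    print('approximate nodes in gate: %d' % total)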
clarkbthere is only one change in the openstack integrated gate so it won't give us an idea of how painful it was when those were in a long queue20:25
*** hamzy has quit IRC20:25
*** diablo_rojo has quit IRC20:26
*** Emine has quit IRC20:27
*** smarcet has quit IRC20:27
clarkbbhs1 seems happy with a max servers of 1320:28
*** Emine has joined #openstack-infra20:28
clarkbI wonder if the issue is more to do with thundering herds20:28
*** hamzy has joined #openstack-infra20:28
mordredclarkb: that would make sense if the underlying issues are that nova is having issues talking to neutron when it happens20:29
ianwclarkb / pabelanger: not sure if you saw https://www.nlnetlabs.nl/bugs-script/show_bug.cgi?id=4188 but it looks like we tickled an unbound bug with the unconfigured ipv620:32
openstackwww.nlnetlabs.nl bug 4188 in server "IPv6 forwarders without ipv6 result in SERVFAIL" [Enhancement,Resolved: fixed] - Assigned to unbound-team20:32
ianwclarkb / mnaser : graphs should be generated by a script, let me see20:32
*** hamzy has quit IRC20:34
pabelangerianw: woot, yah for CIing the internet20:34
ianwmnaser: http://grafana.openstack.org/d/nuvIH5Imk/nodepool-vexxhost?orgId=1&from=now-3h&to=now is looking about right to me?  up top left there is a region drop down box?20:34
*** hamzy has joined #openstack-infra20:34
pabelangerianw: we should see about backporting fix in fedora20:34
ianwpabelanger: yep, i only finished up late last night with it ... but i'll file a bug.  given this, i think a smaller workaround than the one i proposed (https://review.openstack.org/605583) that we can revert when it's fixed is appropriate20:35
*** dayou has quit IRC20:38
openstackgerritMerged openstack-infra/openstack-zuul-jobs master: Add zuul envar for kata tests  https://review.openstack.org/60577320:39
mordredfuentess: ^^ woot20:40
*** smarcet has joined #openstack-infra20:40
*** dpawlik has joined #openstack-infra20:42
openstackgerritIan Wienand proposed openstack-infra/zuul-sphinx master: Add attr_overview directive  https://review.openstack.org/60498020:43
*** kgiusti has left #openstack-infra20:43
openstackgerritIan Wienand proposed openstack-infra/system-config master: [wip] initial port of install-docker role  https://review.openstack.org/60558520:43
*** hamzy has quit IRC20:43
*** dpawlik has quit IRC20:46
*** jamesmcarthur has quit IRC20:47
*** ansmith has quit IRC20:48
clarkbhttp://paste.openstack.org/show/731045/ is a quick and dirty script to get an idea of what the node request distribution is20:48
*** anteaya has joined #openstack-infra20:51
ianwmordred : ^^ i wasn't sure how we want to do the matching for docker hosts.  my initial thought there is to add a subdomain for docker hosts20:51
ianwand i'm determined to get "quay" used in openstack after it was rejected for the Q series name ;)20:52
*** smarcet has quit IRC20:52
clarkbwhat that breakdown doesn't show is the node requests * time consumed per request20:53
clarkbso still not the whole picture20:54
notmynamethere are some swift jobs in the periodic section and periodic-stable that have been there from 60+ hours. not sure what they are, but maybe they could be cleaned up if they'll be automatically retriggered20:55
clarkbnotmyname: those jobs are all just queued and zuul isn't queuing new ones every 24 hours so I think we are good to let them sit like that20:56
notmynameok20:56
mordredianw: well - I think once I get the new groups plugin finished, we can just list them in a yaml list20:58
mordredianw: I don't think the subdomain will work out - because we might eventually have docker installed on all of our hosts20:58
mordredianw: (although I grok your desire :) )20:59
*** trown is now known as trown|outtypewww20:59
*** yamamoto has joined #openstack-infra20:59
mordredin fact, lemme go update that to not try to do the disabled bit21:00
notmynameclarkb: I updated your pastebin'd script (and put the changes in github's gist because I can do multiple files there): https://gist.github.com/notmyname/8bf3dbcb7195250eb76f2a1a8996fb0021:05
notmynameclarkb: you'll need to install https://pypi.org/project/ascii_graph/21:05
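(For reference, the ascii_graph rendering boils down to roughly this; the (label, count) pairs here are made-up stand-ins for the tallies the script collects:)

    from ascii_graph import Pyasciigraph

    # Illustrative numbers only; real input would come from the tally script.
    data = [
        ('openstack/tripleo-heat-templates', 120),
        ('openstack/nova', 40),
        ('openstack/swift', 12),
    ]

    graph = Pyasciigraph()
    for line in graph.graph('node requests per project', data):
        print(line)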
clarkbneat21:06
*** dayou has joined #openstack-infra21:06
notmynameclarkb: *super* useful for a quick visual check of what you're showing :-)21:06
clarkbya that graph makes it way easier to understand21:06
notmynamethose are some big numbers at the end! and that's a conservative estimate since the "multinode" check just adds 2, regardless of how many nodes are used21:08
clarkbyup21:08
openstackgerritMonty Taylor proposed openstack-infra/system-config master: Add yamlgroup inventory plugin  https://review.openstack.org/60238521:09
mordredianw, clarkb: ^^ there it is, without trying to do the disabled magic21:09
*** rh-jelabarre has quit IRC21:10
notmynameclarkb: any way to see what a "normal" multinode job uses? eg I know swift's one multinode job uses 5 nodes21:12
ianwmordred: what's with the GPL bits there?  it's from somewhere else?21:12
clarkbnotmyname: not easily, that is what we were discussing earlier. The major piece of data we don't record after jobs finish is how many nodes they used21:13
clarkbnotmyname: what we could do is build a giant lookup table but that's likely a lot of work21:13
clarkbnotmyname: corvus also started work to have zuul record some of this so hopefully we'll be able to answer that in the future21:13
clarkbianw: its from ansible21:13
*** e0ne has quit IRC21:14
dhellmannclarkb : isn't the zuul configuration already the lookup table? can't we get the job definitions programmatically?21:14
clarkbdhellmann: if you load the entire zuul config like zuul does21:15
dhellmannI thought that API was there, but maybe not21:15
notmynameclarkb: looks like there's 66 multinode jobs defined on http://zuul.openstack.org/jobs.html (ie a search for "multinode-"). is there any way from zuul to load the config and check the node set or something?21:15
mordredianw: it's an inventory plugin that subclasses GPL code - ansible plugins do not get the gpl exception that ansible modules get21:15
*** jamesdenton has quit IRC21:15
ianwahh21:16
mordrednotmyname: we've got patches in flight that add the ability to get the job data for individual jobs21:16
clarkblet me check if the api has that info in it yet21:16
clarkbsome  of the job config data is exposed21:16
notmynamemordred: it will only be 2, I mean 8, I mean 27 hours until they land! ;-)21:16
mordrednotmyname: heh21:17
mordredoh - actually- I think the API bit landed, it's just the dashboard page outstanding21:17
*** jamesmcarthur has joined #openstack-infra21:17
mriedemso uh, any ideas on resolving this? http://status.openstack.org/elastic-recheck/gate.html#144913621:18
*** bobh has joined #openstack-infra21:18
mriedem 899 fails in 10 days21:18
mordredhttp://logs.openstack.org/48/597048/7/check/zuul-build-dashboard/44bca93/npm/html is a draft copy of the updated dashboard - if you click to 'jobs' then to 'devstack-multinode' you can see the nodes21:18
clarkbmriedem: i think we did solve that one21:18
*** bobh has quit IRC21:18
clarkbmriedem: the first blip was the limestone mirror crashing, the second blip was the ovh gra1 mirror crashing21:18
mriedemget right out of town21:18
mriedemi guess we'll know when indexing is happening again21:18
clarkbmriedem: limestone fixed their issue and a reboot in gra1 seemed to have fixed that one21:18
mordrednotmyname: http://zuul.openstack.org/api/job/devstack-multinode is the api call21:19
notmynameanyone ever played with the idea of zuul tracking usage to do "billing" tracking per project? maybe that's a dangerous idea with the wrong sort of incentives...21:19
clarkbmordred: problem is we get the nodeset name in some cases http://zuul.openstack.org/api/job/tripleo-ci-centos-7-containers-multinode21:19
clarkboh wait no, that's just a variable; likely have to look at the parent and recurse until nodeset info is found21:20
mordredclarkb: we might not have it attached to a job though - it might be a top-level defined nodeset21:20
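(A sketch of that recursion against the job API; the assumption that each variant dict carries 'parent' and 'nodeset' keys is best-effort from the responses quoted above, not a documented contract:)

    import json
    import urllib.request

    API = 'http://zuul.openstack.org/api/job/'

    def nodeset_for(job_name):
        """Walk up the inheritance chain until some variant defines nodes."""
        while job_name:
            with urllib.request.urlopen(API + job_name) as resp:
                variants = json.load(resp)
            for variant in variants:
                nodeset = variant.get('nodeset')
                if nodeset and nodeset.get('nodes'):
                    return nodeset
            # Nothing defined at this level; follow the first variant's parent.
            job_name = variants[0].get('parent') if variants else None
        return None  # a named top-level nodeset, which the API can't expand yet

    print(nodeset_for('tripleo-ci-centos-7-containers-multinode'))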
clarkbnotmyname: ya I don't know that we want to do that as much as encourage teams to be good stewards of the resources? like if your testing doesn't work at all then stop approving things that aren't bug fixes21:20
mordrednotmyname: not in any way that would be even more than brainstorming ... for the reasons you would imagine and that clarkb mentions - but in considering multitenant zuul we've definitely pondered ideas like per-tenant build node resources and/or accounting21:21
mordredbut I think that would be a dangerous amount of siloing to introduce into openstack itself21:22
*** dhill_ has quit IRC21:22
notmynamemy employer has a customer that does a huge amount of pipelined data processing (DNA sequences). they also need to do chargeback for the different groups that are running jobs, using space, etc. zuul happens to be a pretty good pipeline job handler. having some sort of ability to track what "project" uses what resources would make it potentially useful in cases like this21:24
*** agopi_ has joined #openstack-infra21:24
clarkbanother good steward action is to remove all the non voting jobs from the gate queue21:25
mordrednotmyname: yah - that's precisely the sort of use case that has made me ponder such accounting before21:25
clarkbI think it would be useful to track it more explicitly even if we aren't doing charging/billing/etc based on it too21:25
mordredclarkb: I think we'd need to add a /nodesets endpoint to get the top-level defined nodeset objects - and then either recurse through parents - or if we get a named nodeset grab it from /nodesets21:26
clarkbif your month over month usage skyrockets maybe that indicates a previously unknown issue21:26
mordredclarkb: indeed21:26
*** agopi has quit IRC21:27
mordredclarkb: alternately, perhaps we should consider expanding that server-side - so that everything looks like devstack-multinode even if it's referencing a nodeset object - although we'd want to keep the name: field if it is referencing one21:27
clarkbmordred: expanding it server side so the client gets the raw data would probably be the most friendly api wise21:28
mordredyah21:28
clarkbfwiw I also don't think the infra team wants to be in the role of policing openstack21:28
*** anteaya has quit IRC21:28
mordrednope21:28
clarkbunfortunately we've had this role in various ways for various reasons21:28
mordredBUT - I do think providing data so that self-policing is possible is a thing we like doing21:28
clarkb++21:29
mordredsuch as the grafana stuff21:29
mordredclarkb, corvus: the react patch and the puppet patch to go with it are both 2x+2 ... perhaps we should talk about rollout strategy?21:30
mordredlike, should we maybe put zuul.o.o in the emergency list - land both, then do a puppet run that should update the html and the apache at the same time?21:31
clarkbmordred: I've not completely tracked that, but we need to update apache in order for the new js to work?21:31
*** anteaya has joined #openstack-infra21:31
clarkbif that is the case then ya something like what you describe is probably ideal21:31
mordredclarkb: yah - the rewrite rules are different21:31
mordredclarkb: https://review.openstack.org/#/c/604251/ is the puppet patch21:32
*** agopi_ is now known as agopi21:33
clarkbmordred: can the "should be removed in a few weeks" section be removed to simplify things?21:33
mordredclarkb: yah - probably so :)21:34
clarkband we must have a singular index.html that does all the things but api ?21:34
*** jamesmcarthur has quit IRC21:35
mordredalthough not ALL of it wants to be removed - it's really just that one line21:35
clarkbwhat will make this data tracking much better is getting the nodeset information into the zuul sql db21:36
clarkbbecause then we can do time * resources and show actual usage21:37
*** jamesmcarthur has joined #openstack-infra21:37
clarkbright now we are sort of integrating over node requests to hand wave around what that number is21:37
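(Once the accounting change lands, lines like the one corvus quoted at 19:52 reduce to node-seconds with something like this — a sketch keyed to that exact message format, which may still change in review:)

    import re
    from collections import defaultdict

    # Matches "... with N nodes was in use for S seconds ..." from the
    # proposed zuul.nodepool accounting log lines.
    PATTERN = re.compile(
        r'with (?P<nodes>\d+) nodes was in use for (?P<secs>[\d.]+) seconds')

    usage = defaultdict(float)  # key (e.g. project or job) -> node-seconds

    def record(line, key):
        m = PATTERN.search(line)
        if m:
            # time * resources: the figure the discussion says is missing today
            usage[key] += int(m.group('nodes')) * float(m.group('secs'))

    record('Nodeset <...> with 1 nodes was in use for 0.006 seconds ...',
           'project-test1')
    print(dict(usage))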
*** jamesmcarthur has quit IRC21:39
*** ansmith has joined #openstack-infra21:40
clarkbI've bumped bhs1 up to 20 instances as it continues to not leak nodes there21:41
*** jamesmcarthur has joined #openstack-infra21:42
*** felipemonteiro has joined #openstack-infra21:43
mnaserianw: the stuff under api operation seems to have ca-ymq-1 only21:46
*** yamamoto has quit IRC21:48
clarkbhttps://review.openstack.org/#/c/589068/ just failed and caused a reset of the entire gate21:48
clarkbjaosorior: ^ http://logs.openstack.org/68/589068/29/gate/tripleo-ci-centos-7-scenario002-multinode-oooq-container/fc4a5fe/job-output.txt.gz#_2018-09-27_21_46_40_464305 the post run timed out21:49
openstackgerritMerged openstack-infra/system-config master: Creates 'embargo-notice' list  https://review.openstack.org/60521221:54
clarkbjaosorior: it would be interesting to profile what the cost of collecting logs is21:54
clarkbjaosorior: we can't do that on that change because I don't think we got all the logs needed to debug that but something to dig into21:55
clarkbmaybe we can trim the list of things logged again21:55
*** mriedem has quit IRC21:55
mordredclarkb: ugh. I *REALLY* need to git the log collecting stuff reworked21:56
mordreds/git/get/21:56
mordredthat's like - 9 months behind schedule21:56
ianwmnaser: interesting ... i think it's a problem with the actual data for sjc ...21:56
*** diablo_rojo has joined #openstack-infra21:57
clarkbmordred: sure but one wonders also if http://logs.openstack.org/68/589068/29/check/tripleo-ci-centos-7-scenario002-multinode-oooq-container/cec555e/logs/undercloud/home/zuul/tripleo-heat-installer-templates/ is actually necessary21:57
clarkbthats all stuff in git somewhere I'm sure21:57
mordredclarkb: yah. good point21:58
*** rlandy is now known as rlandy|biab22:03
*** smarcet has joined #openstack-infra22:04
ianwmordred: remember anything about ComputePostServers possibly changing stats reporting somehow?22:04
*** eernst has joined #openstack-infra22:05
ianwstats.timers.nodepool.task.vexxhost-ca-ymq-1.ComputePostServersDetail exists, but stats.timers.nodepool.task.vexxhost-sjc1.ComputePostServersDetail doesn't22:05
*** mriedem has joined #openstack-infra22:07
*** boden has quit IRC22:08
*** scarab_ has joined #openstack-infra22:09
*** scarab_ has quit IRC22:11
*** agopi is now known as agopi|brb22:11
mordredianw: it shouldn't have changed22:12
mordredstats.timers.nodepool.task.vexxhost-sjc1.ComputePostServersDetail seems right to me22:12
ianwmordred: where are you seeing that?  unless i'm blind that isn't in graphite22:15
*** agopi|brb has quit IRC22:16
*** graphene has quit IRC22:16
*** graphene has joined #openstack-infra22:17
*** fuentess has quit IRC22:19
ianwit doesn't seem like the ComputePostServersDetail task gets run on nl03, which launches sjc nodes.  i wonder what's different22:20
ianw(from grepping the logs for it)22:20
ianwdoesn't seem like any of the launchers are putting that stat in ...22:24
ianwhang on, no Detail ...22:28
ianw"Manager vexxhost-ca-ymq-1 running task ComputePostServers (queue 0)" runs all the time22:28
ianwbut never for sjc1 ... weird22:28
*** rcernin has joined #openstack-infra22:29
ianwbut we have ComputePostOs-volumes_boot22:30
ianw... because, i guess, they're all marked as "boot-from-volume"22:33
ianwso a) the graph should capture that and b) i think the nodepool task->stat normalisation might want to fix that up too22:34
openstackgerritMonty Taylor proposed openstack-infra/zuul master: Add nodesets API route  https://review.openstack.org/60587722:34
*** jtomasek has quit IRC22:35
*** jamesmcarthur has quit IRC22:37
mordredianw: hrm. so - actually, ComputePostServersDetail doesn't make much sense ... it should be ComputePostServers - which is POST {compute}/servers - which is the rest call, where {compute} is the endpoint of the 'compute' service22:37
mordredianw: the other, ComputePostOs-volumes - you're right - is ugly and should get fixed22:38
*** agopi|brb has joined #openstack-infra22:38
mordredbut it's the (current) translation of POST {compute}/os-volumes_boot - which I kid you not is the endpoint you submit something to to boot from volume22:38
mordredianw: the only place we have Detail is ComputeGetServersDetail - because we run GET {compute}/servers/detail - as GET {compute}/servers is almost useless22:39
mordredianw: do you think we should update the transformation to be to ComputePostOsVolumesBoot ?22:40
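(The transformation under discussion is roughly service + HTTP method + CamelCased URL segments. A sketch of the fixed-up normalisation ianw is suggesting, where os-volumes_boot becomes OsVolumesBoot — illustrative only, not the actual openstacksdk/nodepool code:)

    import re

    def stat_name(service, method, path):
        """e.g. ('compute', 'POST', '/os-volumes_boot') -> 'ComputePostOsVolumesBoot'."""
        parts = []
        for segment in path.strip('/').split('/'):
            # Split on '-' and '_' so each word CamelCases cleanly.
            words = re.split(r'[-_]', segment)
            parts.append(''.join(w.capitalize() for w in words))
        return service.capitalize() + method.capitalize() + ''.join(parts)

    assert stat_name('compute', 'POST', '/servers') == 'ComputePostServers'
    assert stat_name('compute', 'GET', '/servers/detail') == 'ComputeGetServersDetail'
    assert stat_name('compute', 'POST', '/os-volumes_boot') == 'ComputePostOsVolumesBoot'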
*** tpsilva has quit IRC22:42
*** dpawlik has joined #openstack-infra22:42
*** anteaya has quit IRC22:43
clarkbok I think going from 20 to 25 may have tipped us over; bhs1 is less happy22:43
clarkbif these additional instances do fail to boot I will dial it back to 20 (better than nothing) and we can see if it stabilizes22:43
fungiunfortunate the problem seems to be persisting22:44
clarkbactually I take that back these nodes all flipped to ready22:45
clarkbthey seemed to take longer to boot though and their ports showed as DOWN, I guess they go DOWN to UP to DOWN again through the lifetime of the instance22:46
clarkbI'll keep it at 25 and see how it does22:46
*** dpawlik has quit IRC22:47
*** anteaya has joined #openstack-infra22:48
clarkbfungi: I'm beginning to think it is a thundering herd type problem22:51
*** jamesmcarthur has joined #openstack-infra22:53
*** tosky has quit IRC22:53
corvusmordred, clarkb: re react, i'm inclined to do what mordred suggests (or, perhaps depending on the time, merging both without adding zuul to emergency) when there's a little more predictability to merge times.  so... maybe friday or saturday?22:55
fungicomparatively small herd22:55
fungifri/sat sounds like a reasonable time for that transition22:56
*** jamesmcarthur has quit IRC22:57
clarkbwfm22:58
*** smarcet has quit IRC22:59
ianwmordred: yep, i think the transforms will help, i'll take a look23:02
*** jamesmcarthur has joined #openstack-infra23:07
*** olivierb has quit IRC23:08
*** olivierb has joined #openstack-infra23:11
*** jamesmcarthur has quit IRC23:12
*** dpawlik has joined #openstack-infra23:20
*** dpawlik has quit IRC23:25
*** mriedem has quit IRC23:26
*** bobh has joined #openstack-infra23:28
*** olivierb has quit IRC23:29
*** anteaya has quit IRC23:32
*** anteaya has joined #openstack-infra23:39
openstackgerritJames E. Blair proposed openstack-infra/project-config master: Grafana: set zuul node requests yaxis min  https://review.openstack.org/60588623:40
*** Swami has quit IRC23:45
pabelanger+323:46
pabelanger:)23:46
pabelangerclarkb: notmyname: script is neat, going to play with it23:46
ianwmemories of my thesis supervisor and the huge red marks he'd put on things if you used a misleading graph axis23:47
ianwyou can make anything look good by fiddling the axes :)23:47
corvusyay! we're almost through the backlog! oh.  nope.23:48
pabelangerseems we always have about 100 nodes deleting, over the last 6 hours23:50
pabelangeroh23:51
pabelangerhttp://grafana.openstack.org/dashboard/db/nodepool-packethost23:51
clarkbyes that's been a known issue since the ptg I think23:51
clarkbport leaks23:51
pabelangerah23:51
clarkbstudarus occasionally fiddles it or upgrades a piece of the cloud so we haven't turned it off23:51
pabelangeryah, doesn't look like many nodes used there in the last 6 hours23:52
*** jamesmcarthur has joined #openstack-infra23:52
clarkbthe problem is basically that ports leak then boots fail. If you delete the ports boots work again for a time23:53
clarkbnot every port is leaked23:53
clarkbit is weird23:53
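(The periodic fiddling amounts to reaping the leaked ports; a sketch with openstacksdk, where the cloud name is illustrative and "DOWN plus unattached" is the rough leak heuristic from the discussion, not a guarantee:)

    import openstack

    # Cloud name assumes a matching clouds.yaml entry (illustrative).
    conn = openstack.connect(cloud='packethost')

    for port in conn.network.ports(status='DOWN'):
        # Ports still bound to a device are in use; only reap orphans.
        if not port.device_id and not port.device_owner:
            print('deleting leaked port', port.id)
            conn.network.delete_port(port)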
pabelangeris it just studarus working on the cloud?23:53
pabelangeryah, last 30days doesn't look healthy23:54
pabelangerhttp://grafana.openstack.org/d/U462abNik/nodepool-packethost?orgId=1&from=now-30d&to=now23:54
*** felipemonteiro has quit IRC23:55
clarkbplatform9 supports the cloud distro thing too23:55
clarkband ya we looked at it at the ptg a bit23:56
clarkbbut haven't tracked down where neutron is losing those ports23:56
clarkbdid learn you can't rely on explicit quotas with nova though as you will lag quota updates23:56
clarkbso apparently clouds like osic gave us a 10% or so buffer on our quotas23:56
*** rlandy|biab is now known as rlandy23:57
