Monday, 2019-04-15

*** yamamoto has joined #openstack-infra00:05
*** hwoarang has quit IRC00:07
*** hwoarang has joined #openstack-infra00:10
*** ijw has quit IRC00:15
*** ijw has joined #openstack-infra00:15
*** ijw has quit IRC00:16
*** ijw has joined #openstack-infra00:16
00:25 <openstackgerrit> Paul Belanger proposed openstack-infra/zuul-jobs master: DNM: add ansible_network_os to vars  https://review.openstack.org/652424
*** bobh has joined #openstack-infra00:31
*** Goneri has joined #openstack-infra00:35
*** gregoryo has joined #openstack-infra00:52
*** zhurong has joined #openstack-infra00:54
*** Goneri has quit IRC01:03
01:04 <openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: Update grafana for new archive repo  https://review.openstack.org/652443
01:05 <ianw> clarkb: ^ hail mary change there ... maybe it works, maybe not ...
01:08 <clarkb> I would +2 but am on a phone and logins don't work on mobile anymore
01:08 <clarkb> so feel free to treat this note as a +2 on that I guess
01:14 <ianw> clarkb: heh, not urgent :)  it's 2+ years of changes so i'm not that confident it will work anyway.  if it's too much effort, it seems we can turn off repo management in the existing code and just do it externally; or fork from the existing release
*** bobh has quit IRC01:19
01:20 <ianw> clarkb: hrm, so puppet3 fails -- are we dropping puppet3 jobs for puppet4 hosts now?
01:33 <clarkb> I haven't done that cleanup yet but I think we can
*** ijw has quit IRC01:35
01:43 <openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: Update grafana for new archive repo  https://review.openstack.org/652443
01:43 <openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: Add Puppet-Version: !X skip to apply tests  https://review.openstack.org/652472
*** ykarel has joined #openstack-infra01:52
*** masayukig has joined #openstack-infra01:55
*** hwoarang has quit IRC02:00
*** ijw has joined #openstack-infra02:01
*** hwoarang has joined #openstack-infra02:02
*** dave-mccowan has quit IRC02:03
*** igordc has joined #openstack-infra02:04
02:05 <openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: Add Puppet-Version: !X skip to apply tests  https://review.openstack.org/652472
02:05 <openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: Update grafana for new archive repo  https://review.openstack.org/652443
*** masayukig has quit IRC02:10
*** masayukig has joined #openstack-infra02:10
*** yamamoto has quit IRC02:12
*** yamamoto has joined #openstack-infra02:12
*** dave-mccowan has joined #openstack-infra02:13
*** dave-mccowan has quit IRC02:17
*** jamesmcarthur has joined #openstack-infra02:19
*** jamesmcarthur has quit IRC02:26
*** jamesmcarthur has joined #openstack-infra02:26
*** jamesmcarthur has quit IRC02:32
*** hwoarang has quit IRC02:39
*** hwoarang has joined #openstack-infra02:43
*** jamesmcarthur has joined #openstack-infra02:46
*** jamesmcarthur has quit IRC02:49
*** jamesmcarthur has joined #openstack-infra02:50
*** jamesmcarthur has quit IRC02:54
*** hwoarang has quit IRC03:16
*** hwoarang has joined #openstack-infra03:23
*** bhavikdbavishi has joined #openstack-infra03:25
*** bhavikdbavishi1 has joined #openstack-infra03:28
*** psachin has joined #openstack-infra03:29
*** bhavikdbavishi has quit IRC03:30
*** bhavikdbavishi1 is now known as bhavikdbavishi03:30
*** jamesmcarthur has joined #openstack-infra03:31
*** jamesmcarthur has quit IRC03:32
*** ramishra has joined #openstack-infra03:36
*** tonyb[m] has joined #openstack-infra03:40
*** igordc has quit IRC03:50
*** raukadah is now known as chandankumar03:53
*** ijw has quit IRC04:01
*** ijw_ has joined #openstack-infra04:01
*** imacdonn has quit IRC04:05
*** imacdonn has joined #openstack-infra04:06
*** ykarel has quit IRC04:15
*** udesale has joined #openstack-infra04:24
*** ijw_ has quit IRC04:29
*** ijw has joined #openstack-infra04:30
*** ykarel has joined #openstack-infra04:34
*** ykarel_ has joined #openstack-infra04:35
*** ijw has quit IRC04:36
*** ijw has joined #openstack-infra04:37
*** whoami-rajat has joined #openstack-infra04:37
*** ykarel has quit IRC04:38
*** janki has joined #openstack-infra04:44
*** bhavikdbavishi1 has joined #openstack-infra04:48
*** jaosorior has joined #openstack-infra04:48
*** hongbin has quit IRC04:49
*** bhavikdbavishi has quit IRC04:49
*** bhavikdbavishi1 is now known as bhavikdbavishi04:49
*** eernst has quit IRC04:50
*** bhavikdbavishi1 has joined #openstack-infra04:53
*** bhavikdbavishi has quit IRC04:54
*** bhavikdbavishi1 is now known as bhavikdbavishi04:54
*** rcernin has quit IRC05:08
*** rcernin has joined #openstack-infra05:10
*** Lucas_Gray has joined #openstack-infra05:10
*** ykarel_ is now known as ykarel05:15
*** jtomasek has joined #openstack-infra05:28
*** tkajinam has quit IRC05:28
*** ramishra has quit IRC05:38
*** quiquell|off is now known as quiquell|rover05:46
*** ijw has quit IRC05:46
*** ijw has joined #openstack-infra05:47
*** ramishra has joined #openstack-infra05:47
*** jbadiapa has joined #openstack-infra05:51
*** ijw has quit IRC05:53
*** tkajinam has joined #openstack-infra05:54
*** Lucas_Gray has quit IRC06:03
*** cjloader has quit IRC06:11
06:15 <openstackgerrit> OpenStack Proposal Bot proposed openstack-infra/project-config master: Normalize projects.yaml  https://review.openstack.org/652568
*** kopecmartin|off is now known as kopecmartin06:17
06:18 <AJaeger> infra-root, openstack/networking-omnipath is set up in git.o.o but not properly in github - it has no content ;(
06:19 <AJaeger> could you check what's wrong, please?
*** pcaruana has joined #openstack-infra06:19
*** dpawlik has joined #openstack-infra06:21
*** roman_g has joined #openstack-infra06:25
*** udesale has quit IRC06:26
*** dpawlik has quit IRC06:27
*** hwoarang has quit IRC06:28
*** hwoarang has joined #openstack-infra06:29
*** dpawlik has joined #openstack-infra06:31
*** toabctl has joined #openstack-infra06:32
*** tkajinam_ has joined #openstack-infra06:48
*** e0ne has joined #openstack-infra06:48
*** eumel8 has joined #openstack-infra06:48
*** e0ne has quit IRC06:49
*** tkajinam has quit IRC06:51
*** e0ne has joined #openstack-infra06:52
*** hwoarang has quit IRC06:52
*** hwoarang has joined #openstack-infra06:53
*** ijw has joined #openstack-infra06:55
*** slaweq__ has joined #openstack-infra06:57
*** apetrich has joined #openstack-infra07:02
*** rcernin has quit IRC07:05
*** ginopc has joined #openstack-infra07:07
*** e0ne has quit IRC07:12
07:15 <frickler> AJaeger: I didn't find any obvious error, but also no indication that gerrit even tried to replicate that repo to github
*** iurygregory has joined #openstack-infra07:17
*** udesale has joined #openstack-infra07:17
*** kjackal has joined #openstack-infra07:17
*** slaweq__ is now known as slaweq07:18
07:18 <openstackgerrit> Tobias Henkel proposed openstack-infra/zuul master: Add --check-config option to zuul scheduler  https://review.openstack.org/542160
*** tosky has joined #openstack-infra07:20
07:20 <frickler> dmsimard: I do see errors related to three ara repos, though. I seem to remember that you moved them; maybe some cleanup is missing there?
*** udesale has quit IRC07:20
*** udesale has joined #openstack-infra07:21
*** pgaxatte has joined #openstack-infra07:22
*** spotz has joined #openstack-infra07:22
*** udesale has quit IRC07:22
*** udesale has joined #openstack-infra07:22
*** e0ne has joined #openstack-infra07:31
*** rpittau|afk is now known as rpittau07:34
*** udesale has quit IRC07:41
*** udesale has joined #openstack-infra07:43
*** e0ne has quit IRC07:52
*** ykarel is now known as ykarel|lunch07:52
*** jpich has joined #openstack-infra07:55
*** e0ne has joined #openstack-infra07:57
*** lucasagomes has joined #openstack-infra07:59
*** rossella_s has joined #openstack-infra08:08
08:12 <openstackgerrit> Tobias Henkel proposed openstack-infra/zuul master: Add support for smart reconfigurations  https://review.openstack.org/652114
*** tkajinam_ has quit IRC08:20
08:26 <openstackgerrit> Tobias Henkel proposed openstack-infra/zuul master: Add --check-config option to zuul scheduler  https://review.openstack.org/542160
*** gfidente has joined #openstack-infra08:28
*** gregoryo has quit IRC08:28
*** dtantsur|afk is now known as dtantsur08:30
*** udesale has quit IRC08:40
*** udesale has joined #openstack-infra08:40
*** yboaron has joined #openstack-infra08:42
*** dkushwaha has joined #openstack-infra08:49
*** ykarel|lunch is now known as ykarel08:51
*** electrofelix has joined #openstack-infra09:05
*** e0ne has quit IRC09:09
09:11 <openstackgerrit> Nir Magnezi proposed openstack/diskimage-builder master: Add version-less RHEL element for RHEL7 and RHEL8  https://review.openstack.org/643731
*** janki has quit IRC09:14
*** e0ne has joined #openstack-infra09:18
*** jpich has quit IRC09:21
*** jpich has joined #openstack-infra09:22
*** jpich has quit IRC09:23
*** jpich has joined #openstack-infra09:24
*** yamamoto has quit IRC09:33
*** zbr has joined #openstack-infra09:47
*** ginopc has quit IRC09:48
*** ginopc has joined #openstack-infra09:48
*** ramishra_ has joined #openstack-infra09:49
*** zbr__ has quit IRC09:50
*** panda has joined #openstack-infra09:51
*** ramishra has quit IRC09:52
*** udesale has quit IRC09:54
*** udesale has joined #openstack-infra09:55
*** e0ne has quit IRC09:55
*** bhavikdbavishi has quit IRC09:59
*** janki has joined #openstack-infra10:01
*** Lucas_Gray has joined #openstack-infra10:02
*** e0ne has joined #openstack-infra10:06
*** yamamoto has joined #openstack-infra10:10
10:16 <frickler> infra-root: actually it looks like the patch to restrict ara replication might not be working as planned. I only see replications for -dev/-infra currently. https://review.openstack.org/#/c/650914/1/modules/openstack_project/manifests/review.pp
*** yamamoto has quit IRC10:22
*** udesale has quit IRC10:25
*** gnuoy has joined #openstack-infra10:26
*** e0ne has quit IRC10:26
*** udesale has joined #openstack-infra10:26
10:27 <openstackgerrit> Jens Harbott (frickler) proposed openstack-infra/system-config master: Revert "Disable gerrit replication to GitHub for ara/ara-infra/ara-web"  https://review.openstack.org/652614
10:29 <gnuoy> Hi, I landed a change recently, https://review.openstack.org/#/c/652032/ . I see the merge is present on git.openstack.org ( https://git.openstack.org/cgit/openstack/charm-interface-pacemaker-remote/log/ ) but it hasn't made it to the github mirror ( https://github.com/openstack/charm-interface-pacemaker-remote/commits/master ). This is a new repo, so I'm wondering if I made a mistake in the setup, if I'm being impatient, or if there is a genuine infra issue?
*** e0ne has joined #openstack-infra10:35
*** kjackal has quit IRC10:38
10:39 <frickler> gnuoy: yes, the replication seems to be broken currently due to an issue on the infra side
10:39 <gnuoy> ah, ok, thanks for the update
*** tbachman has quit IRC10:43
*** ykarel is now known as ykarel|afk10:47
10:47 <dkushwaha> I'm getting the same issue gnuoy raised. One of my patches, https://review.openstack.org/#/c/651470 , got merged yesterday and I can see it on git.openstack.org, but the change is not showing up on github
*** kjackal has joined #openstack-infra10:58
*** yamamoto has joined #openstack-infra11:02
*** jpich has quit IRC11:02
*** yamamoto has quit IRC11:06
*** quiquell|rover is now known as quique|rover|eat11:06
*** jpich has joined #openstack-infra11:07
*** jpich has quit IRC11:07
*** jpich has joined #openstack-infra11:08
*** ykarel|afk is now known as ykarel11:10
*** bhavikdbavishi has joined #openstack-infra11:10
*** e0ne has quit IRC11:10
*** yamamoto has joined #openstack-infra11:11
*** yamamoto has quit IRC11:11
*** weshay_pto has quit IRC11:12
*** mhu has joined #openstack-infra11:13
*** weshay_pto has joined #openstack-infra11:13
*** yamamoto has joined #openstack-infra11:21
*** panda is now known as panda|lunch11:23
*** Wryhder has joined #openstack-infra11:23
*** yamamoto has quit IRC11:23
*** Lucas_Gray has quit IRC11:24
*** Wryhder is now known as Lucas_Gray11:24
*** ldnunes has joined #openstack-infra11:33
*** rosmaita has joined #openstack-infra11:36
*** ldnunes has quit IRC11:39
*** kgiusti has joined #openstack-infra11:39
*** yboaron has quit IRC11:41
11:42 <dmsimard> frickler: errors where ?
*** kazsh has quit IRC11:46
*** yamamoto has joined #openstack-infra11:46
11:47 <openstackgerrit> Nir Magnezi proposed openstack/diskimage-builder master: Add version-less RHEL element for RHEL7 and RHEL8  https://review.openstack.org/643731
*** kazsh has joined #openstack-infra11:49
*** quique|rover|eat is now known as quiquell|rover11:49
*** thomasmckay has quit IRC11:50
11:53 <frickler> dmsimard: errors in /var/log/manage_projects.log.1.gz on review01.o.o, like "manage_projects - ERROR - Problems creating openstack/ara-web, moving on."
11:54 <frickler> dmsimard: also did you see the conversation above? seems replication from openstack/* to github is currently not happening. I proposed a revert of your patch in case you can see a quick fix
11:54 <frickler> can't
12:02 <dmsimard> Hmmm, that's not a gerrit replication error
12:02 <dmsimard> That looks like jeepyb trying to create the repo
12:02 <dmsimard> And it can't since it has been moved
12:03 *** rlandy has joined #openstack-infra
12:03 <dmsimard> A revert would not help, it's something I didn't think about
12:03 *** rlandy is now known as rlandy|ruck
12:05 <dmsimard> replication to github should work though, I'll check it out
*** tbachman has joined #openstack-infra12:07
*** panda|lunch is now known as panda12:08
*** jcoufal has joined #openstack-infra12:10
12:10 <pabelanger> dmsimard: frickler: setting has-github: false might fix it: https://github.com/openstack-infra/jeepyb/blob/master/jeepyb/cmd/manage_projects.py#L27 that is in gerrit/projects.yaml
12:10 <dmsimard> yeah that's the jeepyb part
12:10 <dmsimard> trying to understand why github replication isn't working
12:11 <zbr> pabelanger: do you happen to know when we will be able to get rid of http://git.openstack.org/cgit/openstack-infra/zuul-jobs/tree/roles/fetch-zuul-cloner/tasks/main.yaml#n13 ?
12:11 <pabelanger> jeepyb does it
12:11 <zbr> this is still run even by simple jobs like the tox ones.
12:12 <pabelanger> when we create a new project
12:12 <pabelanger> zbr: jobs that use that role should be updated to remove it. It is no longer needed for zuulv3, and should just be a noop
12:13 <pabelanger> dmsimard: https://github.com/openstack-infra/jeepyb/blob/master/jeepyb/cmd/manage_projects.py#L579
12:16 *** e0ne has joined #openstack-infra
12:16 <dmsimard> pabelanger: I understand what I'm looking at, but shouldn't gerrit be handling replication ?
12:16 <dmsimard> pabelanger: I thought jeepyb was just for new projects
12:17 <pabelanger> dmsimard: I think AJaeger's repo is broken because it is a new project.  And gerrit hasn't replicated because that code path didn't trigger
12:17 <dmsimard> yeah there's two different problems :)
12:18 <pabelanger> and gerrit won't replicate a project unless you commit code or do a restart
12:18 <pabelanger> we should look at gerrit logs and see why the 2 other projects haven't triggered either
12:18 <pabelanger> but, possibly related to jeepyb
12:19 <pabelanger> we can first fix that, see if AJaeger's project replicated
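For context on kicking replication by hand: gerrit's replication plugin exposes a "replication start" SSH command, so a one-off re-replication of a single project can be triggered without committing code or restarting gerrit. A minimal sketch only; the host, admin account, and project name below are illustrative assumptions, not taken from the log:

```python
import subprocess

# Illustrative values -- substitute a real admin account and project.
GERRIT = 'review.openstack.org'
PROJECT = 'openstack/networking-omnipath'

# The replication plugin's "replication start" command re-replicates the named
# project to all configured remotes; --wait blocks until the push completes.
subprocess.run(
    ['ssh', '-p', '29418', 'admin@' + GERRIT,
     'replication', 'start', '--wait', PROJECT],
    check=True)
```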
*** jamesmcarthur has joined #openstack-infra12:24
12:25 <dmsimard> need to brb 5 minutes, I've added review.o.o to the emergency file and re-ran replication. Will do more testing on review-dev
*** rlandy|ruck is now known as rlandy|ruck|mtg12:33
*** yamamoto has quit IRC12:33
*** Lucas_Gray has quit IRC12:34
*** jamesmcarthur has quit IRC12:35
*** nicolasbock has joined #openstack-infra12:35
12:36 <openstackgerrit> Sorin Sbarnea proposed opendev/base-jobs master: [POC] Remove fetch-zuul-cloner from base job  https://review.openstack.org/652637
*** tosky has quit IRC12:36
*** e0ne has quit IRC12:40
*** Lucas_Gray has joined #openstack-infra12:41
*** e0ne has joined #openstack-infra12:41
12:41 <fungi> my first guess with the replication issues is that the negative lookahead is not matching anything in the openstack namespace at all for some reason
12:42 <fungi> we might try adjusting/removing the exclusion temporarily and see if replication starts back up again
12:43 <dmsimard> fungi: that summarizes what I'm doing right now, yes
12:43 <fungi> cool, consider that an endorsement of your present line of investigation ;)
12:44 <dmsimard> \o/
12:44 <dmsimard> I'll do more testing with the lookahead on review-dev
12:45 <dmsimard> the one thing that is different between review and review-dev in that regard
12:45 <dmsimard> is that review has three "projects" clauses (openstack/*, openstack-infra/* and openstack-dev/*) so I'll investigate along those lines
12:46 *** jamesmcarthur has joined #openstack-infra
12:46 <fungi> sounds like a good place to start, yes
12:47 <dmsimard> perhaps we need a single clause like: projects = openstack(-dev|-infra)?/(?!ara$|ara-web$|ara-infra$).*
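An aside on why the anchor matters: as far as I know, gerrit's replication plugin only treats a projects value as a regular expression when it starts with '^' (which is what the fix that lands later in the day adds back). A quick, hypothetical check of the negative lookahead dmsimard quotes above, in plain Python:

```python
import re

# The pattern quoted above, with the leading '^' the replication plugin needs
# in order to treat the value as a regex at all.
pattern = re.compile(r'^openstack(-dev|-infra)?/(?!ara$|ara-web$|ara-infra$).*')

for name in ('openstack/nova',
             'openstack/charm-interface-pacemaker-remote',
             'openstack/ara',
             'openstack/ara-web',
             'openstack-infra/system-config'):
    # The moved ara repos are skipped; everything else still replicates.
    print('%-45s %s' % (name, 'replicate' if pattern.match(name) else 'skip'))
```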
12:47 *** rfolco has joined #openstack-infra
12:48 <dmsimard> gnuoy: thanks for letting us know, working on it
12:48 <gnuoy> great, thanks dmsimard, much appreciated
*** kaiokmo has joined #openstack-infra12:51
*** yamamoto has joined #openstack-infra12:54
*** quiquell|rover has quit IRC12:55
*** quiquell has joined #openstack-infra12:56
*** e0ne has quit IRC12:58
*** udesale has quit IRC12:59
13:00 <openstackgerrit> David Moreau Simard proposed openstack-infra/system-config master: Add missing '^' to github replication pattern  https://review.openstack.org/652644
13:00 <dmsimard> fungi, frickler, pabelanger: ^
13:02 *** udesale has joined #openstack-infra
13:02 <dmsimard> reproduced and fixed on review-dev
13:02 <zigo> ianw: fungi: Any reason to hold on https://review.openstack.org/#/c/645574/ ?
13:02 <zigo> Has debootstrap been fixed?
13:03 *** udesale has quit IRC
13:03 *** bhavikdbavishi has quit IRC
13:03 <dmsimard> it also matches what we have been using in rdo, e.g.: https://github.com/rdo-infra/review.rdoproject.org-config/blob/3663b146b6f8cd8806f98b176a437079cf8f9b78/gerrit/replication.config#L17
13:03 *** udesale has joined #openstack-infra
*** rlandy|ruck|mtg is now known as rlandy|ruck13:05
*** yboaron has joined #openstack-infra13:06
*** Lucas_Gray has quit IRC13:11
*** jamesmcarthur has quit IRC13:12
13:14 <dmsimard> pabelanger: there's not a single project with "has-github: false", eh
13:14 <dmsimard> http://codesearch.openstack.org/?q=has-github&i=nope&files=&repos=
13:17 *** e0ne has joined #openstack-infra
13:19 <dmsimard> https://github.com/openstack-infra/jeepyb/blob/c132a30732c8a96161ea5f9503491b1f5ec7a1f9/jeepyb/cmd/manage_projects.py#L573 doesn't check the value of "has-github", only whether it exists?
13:19 *** lseki has joined #openstack-infra
13:20 <fungi> i think it may have originally been a flag, not a boolean
13:20 <openstackgerrit> Lon Hohberger proposed openstack/diskimage-builder master: Add version-less RHEL element for RHEL7 and RHEL8  https://review.openstack.org/643731
13:20 <fungi> possible it has bitrotted since we don't exercise it
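To make the distinction dmsimard is pointing at concrete: a presence check and a value check behave differently for a project that explicitly sets the flag to false. This is an illustration only, not the actual jeepyb code; `project` just stands in for one entry parsed from gerrit/projects.yaml:

```python
project = {'project': 'openstack/ara', 'has-github': False}

# Presence check: the key existing at all is treated as meaningful, so an
# explicit "has-github: false" still looks enabled.
presence_enabled = 'has-github' in project

# Value check, defaulting to True when the key is absent, which is presumably
# what the flag is meant to express.
value_enabled = project.get('has-github', True)

print(presence_enabled, value_enabled)  # True False
```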
13:22 <dmsimard> ok, I don't think the error is fatal in any case, jeepyb just continues with the other projects
13:22 <dmsimard> the fix for the gerrit replication is in 652644 and I confirmed it works
*** tbachman has left #openstack-infra13:23
13:23 <dmsimard> review is in the emergency file and it's already applied manually, need to step away and I'll be back later
*** markmcd has left #openstack-infra13:23
13:24 <dmsimard> gnuoy, AJaeger: github replication should be ok now
13:24 <gnuoy> dmsimard, great, thank you
13:24 *** mriedem has joined #openstack-infra
13:24 <dmsimard> it's still churning through the repos in alphabetical order but it'll get there eventually
*** jroll has quit IRC13:25
*** jroll has joined #openstack-infra13:26
*** e0ne has quit IRC13:28
*** Goneri has joined #openstack-infra13:30
*** e0ne has joined #openstack-infra13:34
*** jamesmcarthur has joined #openstack-infra13:35
*** eharney has joined #openstack-infra13:37
13:41 <fungi> yeah, i can't pull 652644 into gertty until replication catches back up, i think
*** yamamoto has quit IRC13:41
*** lmiccini has quit IRC13:42
*** rh-jelabarre has joined #openstack-infra13:44
*** yamamoto has joined #openstack-infra13:44
*** zzehring has joined #openstack-infra13:45
*** jamesmcarthur_ has joined #openstack-infra13:46
*** bnemec has joined #openstack-infra13:46
13:47 <fungi> 450 tasks remaining
*** yamamoto has quit IRC13:48
*** jamesmcarthur has quit IRC13:49
*** priteau has joined #openstack-infra13:51
13:57 <gary_perkins> ianw: it's looking like the original arm64ci.cloud cloud is gonna have to be decommissioned soon :( I see it's currently running mirror01.nrt1.arm64ci.openstack.org. Is there anything you need to do with it prior to knocking it on the head?
13:59 *** eernst has joined #openstack-infra
13:59 <gary_perkins> ianw: I'm still waiting for https://review.openstack.org/650021 to be fully approved and merged. Then you'll be able to set up a mirror there
14:03 <dulek> Hey, I started seeing this stuff lately in Kuryr gates: http://logs.openstack.org/81/652581/1/check/kuryr-kubernetes-tempest-py36/88f34b9/job-output.txt.gz#_2019-04-15_07_40_16_196891
14:03 <dulek> "dial tcp: lookup gcr.io on 127.0.0.1:53: server misbehaving" - any ideas?
14:03 <dulek> That run above is on vexxhost, I'll check if I saw it elsewhere.
14:04 <dulek> Yep, all the failures are on vexxhost.
*** udesale has quit IRC14:05
*** rlandy|ruck is now known as rlandy|ruck|mtg14:05
*** e0ne has quit IRC14:06
*** yboaron has quit IRC14:08
14:10 <fungi> dulek: vexxhost ca-ymq1 or sjc1 region (or both) and when did it seem to start?
14:10 <fungi> i gather ipv6 was finally turned on in sjc1 over the weekend
14:10 <fungi> so wondering if we're having trouble reaching nameservers
14:11 <mnaser> mtl has always had ipv6
14:11 <mnaser> sjc1 just recently added ipv6
14:13 <dulek> fungi: Seems like sjc1 in all the cases.
14:14 <mnaser> booting a test vm to check things out
14:15 <fungi> if it looks like it could be a weird interaction with something kuryr jobs are doing to the server's networking then we can set an autohold for that job
14:15 <mnaser> https://www.irccloud.com/pastebin/O9QTELeA/
14:15 <mnaser> I think we should set up an autohold
14:15 <dulek> fungi: I'll dig into it in logstash a bit.
14:16 <mnaser> dulek: has it been failing starting Sunday or a bit before?
14:16 <dulek> mnaser: Checking.
*** sthussey has joined #openstack-infra14:18
14:19 <dulek> mnaser: First hit: 2019-04-15T01:24:40.824+02:00
14:21 *** Lucas_Gray has joined #openstack-infra
14:21 *** e0ne has joined #openstack-infra
14:23 <dulek> Should I just take a look into the unbound logs? ;)
14:27 <dulek> mnaser, fungi: http://logs.openstack.org/94/652394/1/check/kuryr-kubernetes-tempest-containerized/9b227b0/controller/logs/unbound_log.txt.gz
14:27 <dulek> Look for timestamp 1555284324.
14:27 <dulek> notice: sendto failed: Network is unreachable
14:28 <dulek> notice: remote address is ip6 2001:4860:4860::8888 port 53 (len 28)
14:28 *** yboaron has joined #openstack-infra
14:28 <dulek> I assume that's the culprit?
14:29 *** janki has quit IRC
14:29 <fungi> that's the ipv6 equivalent of 8.8.8.8
14:29 <fungi> ptr is google-public-dns-a.google.com.
14:29 <openstackgerrit> Lon Hohberger proposed openstack/diskimage-builder master: Add version-less RHEL element for RHEL7 and RHEL8  https://review.openstack.org/643731
14:30 *** ykarel is now known as ykarel|afk
14:30 <fungi> but yeah, "Network is unreachable" suggests maybe the node decided it had no v6 default route
14:31 <fungi> http://logs.openstack.org/94/652394/1/check/kuryr-kubernetes-tempest-containerized/9b227b0/zuul-info/zuul-info.controller.txt suggests it had working ipv6 at the start of the job though
14:33 <fungi> had 1724sec remaining on that ra expiration near the start of the job
14:34 <fungi> did the node start having dns lookup problems around the time the ra for its default route expired, i wonder?
14:34 *** cjloader has joined #openstack-infra
14:35 <dulek> fungi: ra?
14:36 <fungi> router advertisement
14:36 <fungi> 22:59:12 to 23:25:24 http://logs.openstack.org/94/652394/1/check/kuryr-kubernetes-tempest-containerized/9b227b0/job-output.txt.gz#_2019-04-14_23_25_24_135365
*** lpetrut has joined #openstack-infra14:36
14:37 <fungi> that looks like it may have errored a little before the default route expired
14:38 <dulek> I don't think we're doing anything to IPv6 routing.
14:38 <dulek> And that setup is pretty standard - Neutron + OVS.
14:39 *** cgoncalves has quit IRC
14:39 *** efried is now known as efried_pto
14:42 <fungi> got it, nothing fancy binding container network namespaces to the instance interfaces or anything
14:44 *** ijw has quit IRC
14:44 *** ijw has joined #openstack-infra
14:44 *** cgoncalves has joined #openstack-infra
14:45 <fungi> i can temporarily set an autohold with a high count so we can recheck until we hit that, or across some number of different changes and then recheck them in parallel
14:45 <fungi> i'm unfortunately not finding anything in the syslog we collected from that node
14:46 <fungi> well, nothing which looks relevant that is
*** anteaya has joined #openstack-infra14:46
*** quiquell is now known as quiquell|off14:46
*** armax has joined #openstack-infra14:48
14:51 <openstackgerrit> boden proposed openstack-infra/project-config master: update vmware-nsx jobs  https://review.openstack.org/652680
*** rlandy|ruck|mtg is now known as rlandy|ruck14:56
15:00 <openstackgerrit> Thierry Carrez proposed openstack/ptgbot master: Preserve JSON dictionary order  https://review.openstack.org/652685
15:00 <openstackgerrit> Thierry Carrez proposed openstack/ptgbot master: Remove last GitHub links for help  https://review.openstack.org/652686
*** ijw_ has joined #openstack-infra15:03
*** ijw has quit IRC15:07
*** eernst has quit IRC15:07
*** tosky has joined #openstack-infra15:08
*** cgoncalves has quit IRC15:10
*** e0ne has quit IRC15:14
15:14 <openstackgerrit> Merged openstack-infra/system-config master: Add missing '^' to github replication pattern  https://review.openstack.org/652644
15:15 <mnaser> fungi: perhaps we should send RA's more often?
15:16 *** Lucas_Gray has quit IRC
15:17 <fungi> mnaser: i'm not convinced (yet) that it's necessarily to do with route expiration, though it's also possible something in the job is causing the kernel to block or ignore later announcements
15:17 <mnaser> fungi: from my test vm, it just worked fine, I can ping that address
15:17 *** Lucas_Gray has joined #openstack-infra
15:18 <mnaser> https://www.irccloud.com/pastebin/5LZr9JbO/
15:18 <fungi> yeah, i mean it looks like v6 routing is working at the start of the job
15:18 *** cgoncalves has joined #openstack-infra
15:20 *** markvoelker has joined #openstack-infra
15:20 <fungi> also... http://logs.openstack.org/94/652394/1/check/kuryr-kubernetes-tempest-containerized/9b227b0/zuul-info/inventory.yaml says ansible_host is the v6 address, so if all v6 routing broke then we wouldn't have been able to collect those logs
15:22 <fungi> searching logstash for message:"server misbehaving" in the past 24 hours only turns up kuryr-kubernetes-tempest.* jobs, too
15:23 <dulek> That's true.
15:23 *** e0ne has joined #openstack-infra
15:26 *** ykarel|afk is now known as ykarel
15:27 <dulek> fungi, mnaser: If you think it's something in the Kuryr-Kubernetes DevStack plugin I can just move that pull to the beginning of the job and see what happens.
*** pgaxatte has quit IRC15:28
15:28 <clarkb> could it be docker itself that has the problem?
15:28 <fungi> this is what i'm wondering
15:29 <clarkb> I would run a dig against 127.0.0.1 and possibly tcpdump port 53 to debug
15:29 <fungi> though unbound also complains about inability to reach a global v6 address and reports a !h error
15:30 *** ykarel is now known as ykarel|away
15:32 <fungi> "notice: sendto failed: Network is unreachable" in http://logs.openstack.org/94/652394/1/check/kuryr-kubernetes-tempest-containerized/9b227b0/controller/logs/unbound_log.txt.gz
15:33 <clarkb> ah probably not docker itself then
15:33 <fungi> which could also be iptables udp egress rules if set to reject rather than drop
15:34 *** e0ne has quit IRC
15:34 <fungi> since returning icmp-unreach is the standard way to handle that (though i'd expect us to set icmp-admin-prohibit instead for clarity)
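A small probe along the lines clarkb suggests (query the local resolver directly, and separately check the v6 path unbound complained about) can tell "the upstream forwarder is unreachable" apart from "the local resolver itself is broken". This is a hypothetical sketch, not part of the job; the hand-rolled packet just asks for gcr.io's A record:

```python
import errno
import socket

# Minimal DNS query: header (ID 0x1234, RD set, one question) + gcr.io A IN.
QUERY = (b'\x12\x34\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00'
         b'\x03gcr\x02io\x00\x00\x01\x00\x01')

def probe(addr, family):
    """Send one DNS query and report whether the path answers at all."""
    sock = socket.socket(family, socket.SOCK_DGRAM)
    sock.settimeout(3)
    try:
        sock.sendto(QUERY, addr)
        data, _ = sock.recvfrom(512)
        return 'answered (%d bytes)' % len(data)
    except OSError as exc:
        if exc.errno == errno.ENETUNREACH:
            # Matches unbound's "sendto failed: Network is unreachable".
            return 'no route (ENETUNREACH)'
        return 'failed: %s' % exc
    finally:
        sock.close()

print('unbound @ 127.0.0.1:53          ', probe(('127.0.0.1', 53), socket.AF_INET))
print('forwarder @ 2001:4860:4860::8888', probe(('2001:4860:4860::8888', 53), socket.AF_INET6))
```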
*** ykarel|away has quit IRC15:35
*** josephrsandoval has joined #openstack-infra15:36
*** cgoncalves has quit IRC15:36
*** slaweq has quit IRC15:36
*** woojay has joined #openstack-infra15:38
*** gyee has joined #openstack-infra15:38
15:38 <AJaeger> dmsimard: thanks
*** cgoncalves has joined #openstack-infra15:43
15:44 <anteaya> I'm going to bumb the gerrit will be offline email I sent Friday to ensure Monday inboxes see it
15:44 <anteaya> objections?
15:44 <anteaya> bump*
15:47 *** auristor has quit IRC
15:47 *** igordc has joined #openstack-infra
15:49 <openstackgerrit> Merged openstack-infra/project-config master: Normalize projects.yaml  https://review.openstack.org/652568
15:51 *** igordc has quit IRC
15:52 <clarkb> anteaya: not from
15:52 <clarkb> er from me
15:52 <fungi> anteaya: sounds fine to me
15:52 <anteaya> done
15:52 <anteaya> thank you
15:52 <clarkb> fungi: is gerrit replication happy now?
15:53 <openstackgerrit> Sorin Sbarnea proposed opendev/base-jobs master: Use standard ansible-lint config file  https://review.openstack.org/652708
15:54 * clarkb catches up on the day
*** ginopc has quit IRC15:55
*** auristor has joined #openstack-infra15:57
*** josephrsandoval has quit IRC15:58
15:58 <fungi> clarkb: seems to be, yes
15:58 <zbr> fungi: clarkb AJaeger : does any of you have experience using the pre-commit tool for linting? (not to be confused with the git hook feature)
15:59 <clarkb> infra-root fyi the disk is full on insecure-ci-registry.opendev.org again. So we will need to come up with a plan for that
15:59 <clarkb> zbr: no
15:59 <fungi> zbr: never heard of it if it's not a git pre-commit hook. have a link to something about what you're describing?
16:02 <clarkb> https://pre-commit.com/ is the tool I think
16:02 <zbr> clarkb: yes, but the page doesn't do a good job of highlighting its main benefits.
16:03 *** ykarel|away has joined #openstack-infra
16:04 <fungi> ahh, yeah, i do recall seeing this now
16:04 <zbr> it does address a few things very well, like orchestrating multiple isolated linters
16:04 <zbr> easy bumping (pre-commit autoupdate)
16:04 <mordred> we've actually historically actively avoided any systemic use of local git hooks
16:04 <zbr> we already use it in many tripleo repos
16:04 <zbr> it does not need git hooks
16:05 <clarkb> mordred: fwiw this isn't a git hook, it's just the worst named tool in the world
16:05 <mordred> wow
16:05 <zbr> clarkb: totally agree, bad name
16:05 <clarkb> but that confusion is everyone's first reaction, which makes me wary of suggesting we use the tool due to the name creating confusion
16:05 <zbr> in fact it can install a git hook but this is totally optional
16:05 <mordred> "Run pre-commit install to install pre-commit into your git hooks. pre-commit will now run on every commit. "
16:05 <mordred> kk. good to know
16:05 <fungi> bumping the versions of linters used seems like it would cause massive confusion. we already have projects pin their linters during each cycle so they don't run into issues they have to fix or explicitly skip near release time
16:05 <zbr> i guess that is how it started,... and the name stuck.
16:06 <zbr> all linters are always pinned, no surprises.
16:07 <fungi> i can't easily tell if it supports different versions of different linters per branch
16:07 <zbr> the auto-update does look for new versions and bump them in config, it's up to you to test and raise a CR to do it.
16:07 <zbr> fungi: not sure I understand?
16:08 *** lpetrut has quit IRC
16:08 <zbr> there are a few things that make me love it: never affected by the tox bug where tox fails to update the virtualenv on bumping.
16:09 <fungi> zbr: say you want to run flake8 1.2.3 on commits in one branch but flake8 2.3.4 on commits for another branch
16:09 <zbr> also it saves a huge amount of disk space and time because each linter version is shared/cached across all projects.
16:10 <zbr> so if you have 20 projects using ansible-lint=1.2.3, there is only one copy on disk fully managed by it. not 20 tox envs with the same stuff in them.
16:10 <zbr> i do have >100 repos cloned locally, so the .tox footprint is big.
16:10 <fungi> hrm, actually i tend to only have one copy because i use git clean with great frequency and rely on a single pip cache
16:11 <zbr> fungi: yep, but the pip cache does not avoid recreation of the virtualenv, which also takes time. not important on CI, but for a dev env it saves many seconds.
16:12 <zbr> anyway, before becoming annoying with my selling speech.... I can make a POC change to demo it if you want, just tell me which repo to demonstrate it on.
16:13 <fungi> very few if you're not installing the kitchen sink in your tox testenvs, but we have a bit of an anti-pattern of using one test-requirements.txt for all our testenvs instead
16:14 <zbr> fungi: yep, kitchen-sink describes very well our current use of test-reqs, where in fact we install linters in all envs only due to convenience. and ansible-lint in particular being a very heavy one.
16:15 <fungi> also the greater the difference between how developers run these checks locally vs how they're run in the gate significantly increases our "but it works for me why is your ci so broken?" support burden
16:16 <zbr> fungi: no difference, i can show you. in fact it's the opposite, its use assures that local == CI. we know well what happens with locally outdated virtualenvs, when the user needs to remember to do tox -r ....
16:17 <fungi> i meant if the idea is to replace tox with the pre-commit tool just for local developer use and not also in the ci jobs
16:17 <fungi> but yeah, curious to see how you configure it to run, say, different versions of flake8 for different branches of the same repo
16:18 <zbr> fungi: example of config https://github.com/openstack/tripleo-quickstart/blob/master/.pre-commit-config.yaml
16:19 <fungi> okay, that makes more sense. i was having trouble reconciling that with the fact that git pre-commit hooks apply to the entire repository
16:19 <fungi> so the hook checks the configuration present in the commit
16:20 *** ramishra_ has quit IRC
16:20 <fungi> and figures out which virtualenv to use based on that
16:20 <zbr> please don't call it a hook ;) ... i'm not using the hook myself, just calling it manually.
16:20 <zbr> and mainly calling it from tox -e linters : https://github.com/openstack/tripleo-quickstart/blob/master/tox.ini#L43
16:21 <fungi> oh, the introduction at https://pre-commit.com/ specifically describes using it as a git pre-commit hook
16:21 <fungi> i guess you're suggesting a different usage pattern
16:21 *** hwoarang has quit IRC
16:21 <zbr> that is why i said worst demo page ;)
16:21 <clarkb> this confusion is my biggest source of hesitancy towards using the tool
16:21 <clarkb> we are gonna spend lots of time explaining this to people if we switch
16:22 <zbr> clarkb: most people would not notice, their workflow is not changed at all.
16:22 <zbr> calling the same tox job to lint, same kind of results.
16:22 *** hwoarang has joined #openstack-infra
16:23 *** jpich has quit IRC
16:23 <fungi> anyway, from what i can see there you get the majority of those benefits by just being specific about deps in your tox testenvs rather than using one list of test requirements in all of them (for my personal projects i don't use a test-requirements.txt with tox, just different deps lines so the bare minimum is installed)
16:24 <zbr> fungi: the devil is in the details: to do something ~similar in tox would require you to put each linter inside a different tox environment, which would make it hard to manage.
16:24 <fungi> not every repository is going to want the same sets of plugins installed with flake8, for example, so having your flake8 testenv be repository+branch specific is still less messy, to me
16:24 <zbr> fungi: this file is defined by each repository
16:24 <fungi> i do put each linter in a different tox environment in my projects where i'm doing that, yes
16:25 <fungi> zbr: but the virtualenv it uses isn't per project though, right?
16:25 *** slaweq has joined #openstack-infra
16:26 <zbr> fungi: it manages its own virtualenvs, which are not per project but are based on hash(tool, rev)
16:26 <fungi> so you'd still need different virtualenvs for each different set of flake8 plugins used for each project. i guess it's at least smart enough to figure out that if two invocations rely on the same set of flake8 plugins and versions then they can reuse a common venv
16:27 <zbr> yeah, it does also have its own config, i am almost sure, as I had this case with extra plugins and didn't get any surprises.
16:28 <fungi> anyway, since it doesn't seem this is necessary to figure out before friday's maintenance, i'm going to go back to preparing for that
*** rpittau is now known as rpittau|afk16:28
*** slaweq_ has joined #openstack-infra16:29
16:31 <zbr> meanwhile I found a more real subject: ERROR! the role 'push-to-intermediate-registry' was not found --- with the base-jobs linters job, unrelated to the change itself. https://review.openstack.org/#/c/652708/
16:31 *** slaweq has quit IRC
16:31 <fungi> also, anybody know how to disable the "vulnerable dependency" alerting on github? it's getting ridiculous. now it's e-mailing us to let us know that there's a vulnerability in ansible 2.6.0 through 2.6.13 and citing entries like https://github.com/openstack-infra/zuul-base-jobs/blob/master/test-requirements.txt#L8
16:32 <fungi> zbr: yes, the registry keeps running out of disk space. several folks are trying to brainstorm ways to deal with it
16:32 *** bhavikdbavishi has joined #openstack-infra
16:32 <clarkb> infra-root: so, looking at my notes about insecure-ci-registry: registry garbage-collect fails, and we cannot run it when the server is online (I had previously only done dry runs)
16:32 <clarkb> the disk has filled again
16:32 <clarkb> I think our next step is to stop the registry, delete the registry contents and start it again
16:33 <clarkb> then sort out how to garbage collect properly (one issue is it is apparently not safe to GC when the registry is running)
16:33 <clarkb> if we'd like to debug the broken state further I can snapshot the instance first
16:34 <clarkb> maybe I should do that: stop the registry, free some disk from the journal, then snapshot, then delete the registry data, start the service again
16:34 <clarkb> if that sounds reasonable let me know
16:34 <fungi> something tells me we're likely to get another crack at experiencing the broken state between now and when we get the real solution in
16:34 <fungi> but go for it if you like
*** bhavikdbavishi1 has joined #openstack-infra16:35
*** bhavikdbavishi has quit IRC16:36
*** bhavikdbavishi1 is now known as bhavikdbavishi16:36
16:38 <clarkb> insecure-ci-registry01.opendev.org added to the emergency.yaml file so that ansible and docker-compose don't undo things
16:41 <openstackgerrit> Fabien Boucher proposed openstack-infra/zuul master: WIP - Pagure driver - https://pagure.io/pagure/  https://review.openstack.org/604404
*** psachin has quit IRC16:42
*** dtantsur is now known as dtantsur|afk16:44
16:45 <openstackgerrit> Clark Boylan proposed openstack-infra/zuul master: Fix tox.ini cover target install command  https://review.openstack.org/652727
16:47 <clarkb> I'm going to step out for a bit while I wait to be sure ansible won't run on that host anymore
16:47 <clarkb> then I'll be back to do surgery on that host
16:47 *** priteau has quit IRC
16:47 <dmsimard> since https://review.openstack.org/#/c/652644/ merged, I've removed review.o.o from the emergency file
16:48 <dmsimard> and with that, I'll be mostly on PTO this week -- feel free to ping but there might be increased latency or even timeouts :p
*** priteau has joined #openstack-infra16:48
*** ijw_ has quit IRC16:49
*** josephrsandoval has joined #openstack-infra16:49
*** josephrsandoval has quit IRC16:49
*** kopecmartin is now known as kopecmartin|off16:50
16:51 <openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool master: Fix loss of ZK conn during node delete  https://review.openstack.org/652729
*** e0ne has joined #openstack-infra16:51
*** ijw has joined #openstack-infra16:51
*** lucasagomes has quit IRC16:57
*** ijw has quit IRC17:00
*** ijw has joined #openstack-infra17:00
*** gfidente is now known as gfidente|out17:02
*** ykarel|away has quit IRC17:08
*** Lucas_Gray has quit IRC17:13
17:16 <clarkb> ok I'm back now
17:16 <clarkb> corvus: ^ if you are around now any objection to the proposed plan for the intermediate registry?
17:17 *** ijw has quit IRC
17:17 *** ijw has joined #openstack-infra
17:22 <clarkb> alright I've stopped the registry container and am taking a snapshot now
17:22 <clarkb> when that completes I'll do data deletion
17:22 <clarkb> then a reboot to make things that were sad about disk happy again
*** _erlon_ has joined #openstack-infra17:25
*** ijw has quit IRC17:35
*** slaweq_ has quit IRC17:37
*** ijw has joined #openstack-infra17:38
*** yamamoto has joined #openstack-infra17:40
*** ykarel|away has joined #openstack-infra17:44
*** diablo_rojo has joined #openstack-infra17:45
*** yamamoto has quit IRC17:45
17:49 <corvus> clarkb: i'm still catching up -- i don't think a snapshot is necessary -- it should be okay (especially in a situation like this) to delete everything and restart -- the worst thing that happens if the data are missing is that a recheck will be necessary
17:49 <clarkb> ah ok, well the snapshot is already happening so I'll roll with that for now then do the other stuff
17:49 <corvus> k
17:49 <corvus> clarkb: what do you think we're missing?
17:50 <corvus> do we need a cron to gc?  are we deleting tags at all?
17:50 <clarkb> we need to run gc (which doesn't work right now due to errors in the data) and for gc to work we need to delete tags
17:50 <clarkb> so ya we need some expiration cron that will delete older tags allowing a gc to clean them up
17:51 <clarkb> then if the registry regularly corrupts itself we might need to debug that more. I do wonder if running out of disk is why that happened though
17:51 <clarkb> in which case hopefully regular GCing is the fix
17:51 <corvus> clarkb: ok... i can start on the tag deletion bit first -- it's similar to things i've already written... if you or someone else wants to write the run-gc cron separately that's cool, or i can get to that after delete-tags
17:52 <clarkb> I can write the gc change
17:52 <clarkb> I'll confirm the commands I was running work properly after the cleanup
17:52 <corvus> great, sounds like a plan -- that's probably a late-this-afternoon thing for me while i continue to un-vacation
17:57 <fungi> so it's not wholly clear to me what the situation is there... clarkb: i thought earlier you'd said the registry has to be taken offline before it can be garbage-collected?
17:58 <clarkb> fungi: ya, reading more on that, you only have to do that if deleting files on disk to untag blobs
17:58 <clarkb> fungi: if you use the API to untag (which I expect corvus will do) then it is safe to do it online
17:58 <clarkb> the race is in tag updates aiui
17:59 <fungi> aha
17:59 <fungi> that makes more sense
18:00 <fungi> so the offline gc idea was more in an effort to avoid racing on tag updates
18:00 *** e0ne has quit IRC
18:01 <clarkb> ya, I reread the document https://medium.com/@mcvidanagama/cleanup-your-docker-registry-ef0527673e3a and realized the offline requirement only applies to one of the two methods listed
18:01 <corvus> yeah, we should be able to use the api
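Roughly, "use the API to untag" against a registry:2 instance means resolving the tag to its content digest, DELETEing the manifest by digest, and letting a later `registry garbage-collect` reclaim the now-unreferenced blobs (the registry config needs deletion enabled for this to work). A hedged sketch only; the endpoint and credentials below are placeholders, not the real service's:

```python
import requests

REGISTRY = 'https://registry.example.org:5000'  # placeholder endpoint
AUTH = ('zuul', 'secret')                       # placeholder credentials
MANIFEST_V2 = 'application/vnd.docker.distribution.manifest.v2+json'

def untag(repository, tag):
    """Delete the manifest a tag points at; blobs are reclaimed by a later GC."""
    # The v2 API only deletes manifests by digest, so fetch the digest first.
    resp = requests.get('%s/v2/%s/manifests/%s' % (REGISTRY, repository, tag),
                        headers={'Accept': MANIFEST_V2}, auth=AUTH)
    resp.raise_for_status()
    digest = resp.headers['Docker-Content-Digest']

    resp = requests.delete('%s/v2/%s/manifests/%s' % (REGISTRY, repository, digest),
                           auth=AUTH)
    resp.raise_for_status()  # 202 Accepted on success
    return digest
```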
18:02 <fungi> also, not sure if either of you saw over the weekend, but we have ipv6 in sjc1 now; i'm just not sure how to go about getting the gitea webservers (and load balancer?) to know about it
18:02 <corvus> we do something similar for dockerhub; so my change is (i expect) to put something similar to that logic in a cron job
18:02 <corvus> fungi: nice!
18:03 <corvus> fungi: off the top of my head -- do you think a reboot or networking restart will pick it up?
18:03 <clarkb> I think the hosts should've picked up on it via the RAs automatically
18:03 <clarkb> Then we need to restart services to see that
18:03 <fungi> i expect the kernel already knows, so probably restarting apache would cause it to listen on those addresses automatically unless we have specific listen directives for the v4 addresses
18:04 <corvus> so maybe just restart docker on the gitea servers
18:04 <corvus> and then gerrit replicate to catch anything we missed
18:04 <corvus> (then similar on the LB itself)
18:04 <fungi> for the lb, that's probably going to involve pulling together the list of addresses for all the gitea containers unless there's already some magic integration in place to do that
18:04 *** jamesmcarthur_ has quit IRC
18:05 <corvus> oh actually... we don't *really* care about the gitea servers
18:05 <corvus> the only thing that's public facing is the lb
18:05 <clarkb> right, the biggest thing is the public endpoint of haproxy
18:05 <corvus> if the lb talks to the backends over ipv4, that's fine
18:05 <fungi> good point. just having the lb listen on the v6 address and then adding that to dns should suffice
18:10 <clarkb> ok, snapshot finally completed. I'm going to rm -rf /var/registry/data/docker then reboot
18:10 <corvus> clarkb: should we move that onto /opt?
18:11 <clarkb> corvus: /opt is slightly smaller than / for that right now
18:11 <clarkb> it's 33GB vs 36GB (ish)
18:11 <corvus> i note you said reboot - which makes me think we ran out of system space -- and doing so would contain any problems
18:11 <clarkb> that's true, we wouldn't need to reboot after hitting this problem if we were on /opt
18:11 <corvus> (so if things go wrong again, we can just 'restart docker' rather than reboot)
18:11 <corvus> ya
18:11 <clarkb> may be worth taking the ~3GB hit for that
18:11 <clarkb> and ya that is why I am going to reboot
18:11 <corvus> yeah, i vote take the 3gb hit
18:12 <corvus> clarkb: maybe symlink into /opt for now, and i'll propose a change to the docker-compose file to use the new path explicitly?
18:12 <clarkb> ok
18:13 <clarkb> /var/registry/data:/var/lib/registry is the current docker compose mount
18:13 <clarkb> I can symlink /var/registry/data to /opt/registry/data ?
18:14 <clarkb> ya that is what I'm doing
18:18 <clarkb> ok rebooting now
18:21 <clarkb> ok up and running now
18:22 <clarkb> I've rechecked https://review.openstack.org/#/c/652727/1 which should push to the registry when done
18:26 <clarkb> nothing has written to it yet so gc complains about that. Once ^ pushes to it I expect there to be enough data in place that I can get gc sorted out
*** markvoelker has quit IRC18:26
*** kjackal has quit IRC18:28
18:29 <openstackgerrit> James E. Blair proposed openstack-infra/system-config master: Move insecure-ci-registry data to /opt  https://review.openstack.org/652750
18:29 <corvus> clarkb: ^ that moves everything -- i think it's compatible with your change
18:30 <corvus> clarkb: (we will just have 2 copies of the auth data and certs after that change merges; and once we restart with that config, we can remove /var/registry entirely)
*** ykarel|away has quit IRC18:30
*** kjackal has joined #openstack-infra18:57
*** bhavikdbavishi has quit IRC19:01
*** jamesmcarthur has joined #openstack-infra19:01
*** e0ne has joined #openstack-infra19:02
19:03 <clarkb> `sudo docker exec -it registrydocker_registry_1 registry garbage-collect --dry-run /etc/docker/registry/config.yml` works on the registry server now
19:03 <clarkb> I'll get a cron up to run that without the --dry-run
19:04 *** markvoelker has joined #openstack-infra
19:06 *** e0ne has quit IRC
19:09 <openstackgerrit> Clark Boylan proposed openstack-infra/system-config master: Install a docker registry GC cron  https://review.openstack.org/652755
19:09 <clarkb> corvus: ^ fyi
19:10 <mloza> hello, is point-to-site configuration supported in neutron vpnaas? I want to connect my workstation to a neutron router.
19:10 *** ijw has quit IRC
19:13 <clarkb> mloza: we run the developer infrastructure for openstack so we aren't super familiar with running openstack itself. The best place for that question is likely #openstack-neutron
19:13 <clarkb> or on the mailing list
19:14 <mloza> k thanks
*** eharney has quit IRC19:18
19:32 <openstackgerrit> Tobias Henkel proposed openstack-infra/zuul master: Support fail-fast in project pipelines  https://review.openstack.org/652764
*** jcoufal has quit IRC19:38
19:39 <openstackgerrit> Tobias Henkel proposed openstack-infra/zuul master: Support fail-fast in project pipelines  https://review.openstack.org/652764
*** slaweq_ has joined #openstack-infra19:47
*** dave-mccowan has joined #openstack-infra19:48
19:53 <openstackgerrit> Tobias Henkel proposed openstack-infra/zuul master: Support fail-fast in project pipelines  https://review.openstack.org/652764
19:58 <fungi> disappearing for a bit to grab an early dinner, but should be back soonish
20:02 <openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool master: Fix for orphaned DELETED nodes  https://review.openstack.org/652729
*** kjackal has quit IRC20:06
20:10 <mnaser> is the openstack pbx connected to an actual provider, to be able to be used over the phone?
20:11 <clarkb> mnaser: yes it is. Details at https://wiki.openstack.org/wiki/Infrastructure/Conferencing
20:11 *** rh-jelabarre has quit IRC
20:14 <mordred> mnaser: I always use it via phone
20:15 <clarkb> android also has sip clients but they all seem to fail hard at the "no username/password" setup
*** dave-mccowan has quit IRC20:20
*** verdurin has quit IRC20:30
*** verdurin has joined #openstack-infra20:33
20:36 <clarkb> corvus: https://review.openstack.org/#/c/652755/1 passes tests now and adds the garbage collecting cron. It's a no-op until we have the expired-tag stuff, but should be safe to merge before then
20:36 * clarkb reviews the move into /opt now
*** jamesmcarthur has quit IRC20:37
*** pcaruana has quit IRC20:38
*** jamesmcarthur has joined #openstack-infra20:39
*** ijw has joined #openstack-infra20:41
*** sshnaidm has quit IRC20:41
20:42 <openstackgerrit> Paul Belanger proposed openstack-infra/nodepool master: Gather host keys for connection-type network_cli  https://review.openstack.org/652778
*** Goneri has quit IRC20:44
*** priteau has quit IRC20:44
20:45 <mnaser> clarkb, mordred: cool!  I just also read about Jitsi, seems like something neat to have as well
20:45 <mnaser> or Jitsi Meet rather https://jitsi.org/jitsi-meet/
20:45 <clarkb> ya they have a hosted version of that that is free to use
20:46 <mnaser> cool
20:46 <clarkb> https://meet.jit.si/
20:47 <clarkb> I've heard varying feedback on how well it works, but people should feel free to try it as it is free
*** dave-mccowan has joined #openstack-infra20:54
*** jamesmcarthur has quit IRC20:56
*** sshnaidm has joined #openstack-infra20:57
*** dave-mccowan has quit IRC20:57
*** gfidente|out has quit IRC20:57
*** slaweq_ is now known as slaweq20:58
21:01 <clarkb> keeping an eye on the registry, it is already back up to 2.6GB of disk used. Which is quicker than we probably want (relative to the disk size), but it isn't completely running away
21:04 <openstackgerrit> Merged openstack-infra/system-config master: Install a docker registry GC cron  https://review.openstack.org/652755
*** raissa has joined #openstack-infra21:14
21:17 <openstackgerrit> Merged openstack-infra/nodepool master: Implement max-servers for AWS driver  https://review.openstack.org/649474
21:18 <fungi> how much space does it get in /opt (that's where it's been moved to, right?)?
21:20 <clarkb> 33GB
21:21 <fungi> that's not much more than the rootfs
21:21 <fungi> but at least it's more
*** eglute has joined #openstack-infra21:27
*** jamesmcarthur has joined #openstack-infra21:27
21:28 <clarkb> it's about the same
21:28 <clarkb> rootfs was 36GB
21:28 <clarkb> useable
21:29 <fungi> ahh, but at least it won't tank the whole server when it fills up now
21:29 <fungi> got it
21:29 <openstackgerrit> Merged openstack-infra/zuul master: encrypt: Fix SSL error when using file pubkey  https://review.openstack.org/650589
*** eharney has joined #openstack-infra21:35
*** raissa has quit IRC21:40
21:42 <ianw> fungi: are we ok to go with https://review.openstack.org/#/c/650021/ ?
21:43 <ianw> mirror01.nrt1.arm64ci.openstack.org isn't responding still, and per previous messages from gary_perkins it looks like nrt1 is going away anyway
21:44 <ianw> i haven't had an update on the linaro ticket
21:44 <fungi> i don't think anything is in the private hostvars/groupvars yet, i only got as far as resetting the passwords and recording details in the credentials list
21:44 <ianw> in short, despite several arm64 clouds being wired in, we don't have anywhere to run nodes :/
21:45 <ianw> fungi: i can update that today and babysit the change in if you want to give it a once-over
21:45 <fungi> oh, happy to, thanks!
21:47 <mnaser> I'm a bit stumped
21:47 <mnaser> OSA has had centos jobs freeze at the _same_ exact spot and time out
21:47 <mnaser> see: http://logs.openstack.org/14/652314/1/check/openstack-ansible-deploy-aio_metal-centos-7/de04e8b/job-output.txt.gz#_2019-04-15_18_01_19_100792 and http://logs.openstack.org/14/652314/1/check/openstack-ansible-deploy-aio_metal-centos-7/de04e8b/job-output.txt.gz#_2019-04-15_18_01_19_100792 and I have much more
21:47 <mnaser> I mean, it was just cloning things .. and then it hangs .. there are some system logs but none of them seem to indicate anything wild..
21:47 <clarkb> could it be nested virt crashing the node? we saw that with tripleo on centos at one time
21:48 <clarkb> oh, if it is crashing early then probably not that
21:48 <mnaser> I mean it's consistently hanging at the clone
21:48 <mnaser> third example http://logs.openstack.org/68/652368/1/check/openstack-ansible-deploy-aio_distro_metal-centos-7/e173b19/job-output.txt.gz#_2019-04-15_17_44_02_968406
21:48 <mnaser> 2/3 jobs ran on inap, so it's not provider specific
21:49 <mnaser> and log collection works fine after too, so it's really curious
21:50 <clarkb> ya, that implies the host networking isn't breaking
21:50 <clarkb> it is possible that the last logged data is misleading though depending on how it crashed
21:50 <ianw> mnaser: yeah, or the host crashing hard; i was going to suggest maybe remote log sending, something like https://git.openstack.org/cgit/openstack-infra/devstack-gate/tree/functions.sh#n972
*** jamesmcarthur has quit IRC21:52
21:52 <openstackgerrit> Merged openstack-infra/zuul master: Centralize job canceling  https://review.openstack.org/640609
21:52 <clarkb> registry now using 4.6GB
21:52 <ianw> clarkb: per our brief discussion yesterday: https://review.openstack.org/#/c/652472/ adds skips for older puppet versions.  if you're ok with the idea like that, i can expand it to do anything currently migrated to puppet4
21:53 <fungi> ianw: not sure if you saw, but i added a couple of changes under topic:letsencrypt over the weekend for hsts and caa
21:53 <clarkb> ianw: hrm, I wonder if we need similar with the beaker and rspec-beaker jobs
21:54 <clarkb> but that seems fine to me for now
*** whoami-rajat has quit IRC21:54
21:55 <ianw> fungi: oh cool; yeah that was like the only thing it mentioned in the ssl report
*** jamesmcarthur has joined #openstack-infra21:55
*** jamesmcarthur has quit IRC21:55
21:56 <ianw> i guess with a redirect http:// -> https:// there's no point in not having hsts
21:58 <fungi> when it's redirecting, yes. if we were also serving content under http i wouldn't have suggested it
21:59 <fungi> also i dug deeper into dane tlsa records for letsencrypt, but that's a bit of a time bomb unless we automate server cert tlsa record generation as part of the key rotation
21:59 <fungi> we could pin it to the current le ca certs but in time those will age out and we'll end up with clients rejecting the connection
*** jamesmcarthur has joined #openstack-infra22:00
22:00 <fungi> i found lots of folks picking apart both options
22:00 <fungi> but the one is a lot of extra complexity and the other is a ticking time bomb
22:01 <clarkb> infra-root: I'm looking at cleaning up the two test servers I've built off of snapshot images, clarkb-test-lists-upgrade and clarkb-test-bridge-snapshot-boot
22:01 <clarkb> any reason to not server delete those two servers at this point?
22:01 <fungi> i'd say you're in the best possible position to judge ;)
22:02 <fungi> but no objection from me, no
22:02 <corvus> clarkb: wfm
*** iurygregory has quit IRC22:02
22:02 <clarkb> well the bridge resize was successful and the lists server was upgraded in production so I don't think I need them anymore :)
22:02 <clarkb> I'm deleting them now
*** jtomasek has quit IRC22:03
22:04 <clarkb> #status log Deleted clarkb-test-bridge-snapshot-boot (b1bbdf16-0669-4275-aa6a-cec31f3ee84b) and clarkb-test-lists-upgrade (40135a0e-4067-4682-875d-9a6cec6a999b) as both tasks they were set up to test for have been completed
22:04 <openstackstatus> clarkb: finished logging
22:07 <openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: upgrade react and react-scripts to ^2.0.0  https://review.openstack.org/631902
*** slaweq has quit IRC22:12
22:14 <clarkb> ok, removing insecure-ci-registry from the emergency.yaml file now so that the cron installs, as well as the managed move to /opt
*** mriedem has quit IRC22:14
22:22 <openstackgerrit> Merged openstack-infra/zuul master: Reset dependent jobs when aborting paused job  https://review.openstack.org/634597
*** jamesmcarthur has quit IRC22:25
openstackgerritClark Boylan proposed openstack-infra/zuul master: Fix dynamic loading of trusted layouts  https://review.openstack.org/65278722:26
openstackgerritClark Boylan proposed openstack-infra/zuul master: Config errors should not affect config-projects  https://review.openstack.org/65278822:26
corvusclarkb: fyi i have managed to delete some images from a local test registry!22:30
clarkbcorvus: woot22:30
corvusclarkb: the process still leaves some layer link files on the filesystem;22:30
corvusi'm not sure why, or how to get rid of them22:31
corvusoh, you know, a layer may be distinct from a blob22:31
fungilike, separate object reference in the api?22:32
corvusso we might be able to delete all the layers for a manifest as well as the manifest itself, but still retain the blobs if it's used in another manifest22:32
corvusfungi: yeah, that's what i'm thinking22:32
corvussince blobs can be used by more than one image (and likely will be in our case), i don't want to delete anything that might be used by something that should be retained22:32
clarkbcorvus: the garbage collect should handle that for us in theory?22:33
corvusbut i *think* that a manifest points to a list of layers which each point to a blob.  i've deleted a manifest and GC cleaned up the blobs, but left the layer->blob links22:33
corvusclarkb: yeah, it did clean up the blobs, it just left a bunch of files like this:22:34
corvuscat ./data/docker/registry/v2/repositories/gerrit/_layers/sha256/71c170c5dae2fb430e70a395ee48d0853a88d456aebb9903c8de0c3be962ab78/link22:34
corvussha256:71c170c5dae2fb430e70a395ee48d0853a88d456aebb9903c8de0c3be962ab7822:34
clarkboh I see I wasn't sure if you were manually deleting things or if you were letting the tool do it22:35
corvussorry; i deleted the manifest using the api, then gc'd and that removed the blobs22:36
clarkbgot it22:36
clarkblooks like that data is quite small in our case currently22:37
corvusyeah, i think if we ignored this problem, we would grow very slowly.  but i'll see if there's another option22:38
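A sketch of the delete-then-GC flow described here, against the registry's v2 API; the registry URL, repository, and tag are placeholders, and deletes only work if the registry runs with deletion enabled:

    import requests

    registry = 'https://registry.example.org'  # placeholder
    repo, tag = 'gerrit', 'latest'             # placeholder repo/tag
    manifest_type = 'application/vnd.docker.distribution.manifest.v2+json'

    # Resolve the tag to a content digest; manifest deletes are by digest.
    resp = requests.head('%s/v2/%s/manifests/%s' % (registry, repo, tag),
                         headers={'Accept': manifest_type})
    digest = resp.headers['Docker-Content-Digest']

    # Delete the manifest (needs REGISTRY_STORAGE_DELETE_ENABLED=true).
    requests.delete(
        '%s/v2/%s/manifests/%s' % (registry, repo, digest)).raise_for_status()

    # Blobs are only reclaimed by a later garbage-collect pass, e.g.:
    #   registry garbage-collect /etc/docker/registry/config.yml
    # which, as noted above, still leaves the _layers/.../link files behind.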
*** mattw4 has joined #openstack-infra22:41
openstackgerritClark Boylan proposed openstack-infra/zuul master: Add release note for broken trusted config loading fix  https://review.openstack.org/65279322:52
*** tkajinam has joined #openstack-infra22:53
ianwinfra-root: if we can look at the grafana puppet update & test skip below it @ https://review.openstack.org/#/c/652443 i can watch that today.  it's a big version jump and we don't have great rspec tests, but i think the least time-sink way is for me to just watch it closely and be ready to fix or revert22:55
*** markvoelker has quit IRC22:57
pabelangerinfra-root: it looks like nodepool-launcher isn't running on nl01.o.o23:05
*** Adri2000 has quit IRC23:06
corvus2019-04-15 22:03:39,917 DEBUG nodepool.TaskManager: Manager rax-iad ran task ComputeGetServersDetail in 1.1651818752288818s23:06
corvuslast log line23:06
corvus[38729450.875460] Out of memory: Kill process 4241 (nodepool-launch) score 807 or sacrifice child23:06
pabelangeryah, see that now23:07
pabelangerpuppet also seems to be at 100% cpu23:07
pabelangernot sure if that is related23:07
pabelangerchecking cacti.o.o23:07
corvushttp://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=63831&rra_id=all23:07
pabelangerwow23:07
clarkbwow we OOM'd?23:08
pabelangerclarkb: yah23:08
*** markvoelker has joined #openstack-infra23:08
clarkbI don't know that the launchers have ever done that before23:08
corvuslooks like we may have introduced a bug in nov/dec?23:08
pabelangerlooks that way23:08
corvuspuppet seems to be in a busy loop23:09
corvussched_yield()                           = 023:09
corvusthat's all that strace says (repeatedly)23:09
corvusno interesting files reported by lsof23:09
corvusi can't think of any further investigation to do now; i vote we kill puppet and restart the launcher.23:10
*** Adri2000 has joined #openstack-infra23:10
pabelanger+123:10
ianw++ no threads, no children, and who knows what happened when oom kicks in23:11
pabelangerhttp://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=63828&rra_id=all23:11
pabelangerCPU spiked may 201823:11
corvusroot     14604 97.5  0.0 420812   764 ?        Rs    2018 471657:52 /usr/bin/ruby /usr/bin/puppet apply /opt/system-config/production/manifests/site.pp --logdest syslog --environment production --no-noop --detailed-exitcodes23:11
corvusthat process has been running a while.23:12
pabelangeryah23:12
pabelangerheh23:12
pabelangernl02 also has puppet at 100 cpu too23:13
corvus14604 Tue May 15 06:00:49 201823:13
clarkb++ to killing puppet and restarting23:13
corvusso, er, one month shy of a year23:13
corvusok i will kill and restart now23:13
pabelangercorvus: nl02 also OOM'd23:13
pabelangertaking nodepool-launcher with it23:13
corvusnl01 is back in service23:15
corvusi'll do the same for nl0223:15
pabelangernl03 also only has 53MB free, so getting close to swapping there23:15
pabelangerbut still running23:15
pabelangersorry 59M23:16
corvus22188 Thu Dec  6 07:57:36 2018 /usr/bin/ruby /usr/bin/puppet apply /opt/system-config/production/manifests/site.pp --logdest syslog --environment prod23:17
corvusftr on nl02 ^23:17
clarkbwas puppet a big memory consumer too?23:17
pabelangerclarkb: it doesn't look like it23:17
pabelangerjust cpu23:17
clarkbzuul-executor memory use increased around that same time too fwiw23:18
clarkbI have no data suggesting that these are related but they could be23:18
corvuslet's restart nl03 pre-emptively23:18
pabelanger++23:19
corvus#status log restarted nodepool-launcher on nl01 and nl02 due to OOM; restarted n-l on nl03 due to limited memory23:20
openstackstatuscorvus: finished logging23:20
pabelangerend of oct 2018 we moved nodepool to new zk cluster23:21
*** hwoarang has quit IRC23:22
*** ijw has quit IRC23:23
*** yamamoto has joined #openstack-infra23:23
*** rcernin has joined #openstack-infra23:24
*** hwoarang has joined #openstack-infra23:27
*** yamamoto has quit IRC23:35
*** bobh has joined #openstack-infra23:40
*** hwoarang has quit IRC23:44
openstackgerritIan Wienand proposed openstack-infra/system-config master: [wip] letsencrypt update idea  https://review.openstack.org/65280123:44
corvusclarkb: i have observed this behavior: https://github.com/docker/distribution/issues/180323:45
corvusthat means if we expect to be able to push a manifest with the same sha, we need to restart the registry23:45
*** hwoarang has joined #openstack-infra23:45
clarkbouch23:45
corvuswe *might* be able to ignore that in our case, since every image we push should be brand new23:45
corvus(even on a recheck, we should get a different creation time, which is part of the config layer, and therefore in the manifest)23:46
corvusso maybe we ignore that and roll with it.  it will, however, make my testing harder :)23:46
corvusi'll need to make sure the same restriction doesn't apply to blobs themselves though (since they will be reused)23:47
clarkbhrm23:48
corvusi need to eod, i'll have to pick this up tomorrow23:49
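The "different creation time means a different manifest" point can be illustrated with a toy digest calculation; this is not the real image config format or Docker's exact serialization, just the shape of the dependency:

    import hashlib
    import json

    def digest(obj):
        # Stand-in for Docker's content digest: sha256 over the blob bytes.
        return 'sha256:' + hashlib.sha256(
            json.dumps(obj, sort_keys=True).encode()).hexdigest()

    config_a = {'architecture': 'amd64', 'created': '2019-04-15T22:00:00Z'}
    config_b = {'architecture': 'amd64', 'created': '2019-04-15T23:00:00Z'}

    # Identical layers, but the config digest differs, so the manifest (which
    # references the config by digest) and its digest differ too -- a re-push
    # after a recheck never reuses a previously deleted manifest digest.
    for config in (config_a, config_b):
        manifest = {'schemaVersion': 2,
                    'config': {'digest': digest(config)},
                    'layers': []}
        print(digest(manifest))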
clarkbthe fix for zuul should be merging soon23:51
clarkbI should be able to restart zuul for that. I think we only need the scheduler to be restarted23:51
clarkbhrm no I think py35 just failed23:52
pabelangerclarkb: I think it might be slowness of testing23:52
pabelangerso far, I see just timeouts23:52
clarkbya it's waiting for threads to close it looks like23:53
pabelangerand lost of zk connection23:53
pabelangerloss*23:53
clarkbI'll recheck23:54
clarkbpabelanger: ok ya lots of NoNodeErrors from zk23:57
*** sthussey has quit IRC23:57
clarkbmordred: btw it is rview that prevents login to review.o.o on mobile23:59
clarkbI don't understand why but I figure having a working browser is more flexible than rview so I'd rather have that23:59
