Monday, 2019-04-15

*** yamamoto has joined #openstack-infra00:05
*** hwoarang has quit IRC00:07
*** hwoarang has joined #openstack-infra00:10
*** ijw has quit IRC00:15
*** ijw has joined #openstack-infra00:15
*** ijw has quit IRC00:16
*** ijw has joined #openstack-infra00:16
00:25 <openstackgerrit> Paul Belanger proposed openstack-infra/zuul-jobs master: DNM: add ansible_network_os to vars  https://review.openstack.org/652424
*** bobh has joined #openstack-infra00:31
*** Goneri has joined #openstack-infra00:35
*** gregoryo has joined #openstack-infra00:52
*** zhurong has joined #openstack-infra00:54
*** Goneri has quit IRC01:03
01:04 <openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: Update grafana for new archive repo  https://review.openstack.org/652443
01:05 <ianw> clarkb: ^ hail mary change there ... maybe it works, maybe not ...
01:08 <clarkb> I would +2 but am on a phone and logins don't work on mobile anymore
01:08 <clarkb> so feel free to treat this note as a +2 on that I guess
01:14 <ianw> clarkb: heh, not urgent :)  it's 2+ years of changes so i'm not that confident it will work anyway.  if it's too much effort, it seems we can turn off repo management in the existing code and just do it externally; or fork from the existing release
*** bobh has quit IRC01:19
01:20 <ianw> clarkb: hrm, so puppet3 fails -- are we dropping puppet3 jobs for puppet4 hosts now?
01:33 <clarkb> I haven't done that cleanup yet but I think we can
*** ijw has quit IRC01:35
01:43 <openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: Update grafana for new archive repo  https://review.openstack.org/652443
01:43 <openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: Add Puppet-Version: !X skip to apply tests  https://review.openstack.org/652472
*** ykarel has joined #openstack-infra01:52
*** masayukig has joined #openstack-infra01:55
*** hwoarang has quit IRC02:00
*** ijw has joined #openstack-infra02:01
*** hwoarang has joined #openstack-infra02:02
*** dave-mccowan has quit IRC02:03
*** igordc has joined #openstack-infra02:04
02:05 <openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: Add Puppet-Version: !X skip to apply tests  https://review.openstack.org/652472
02:05 <openstackgerrit> Ian Wienand proposed openstack-infra/system-config master: Update grafana for new archive repo  https://review.openstack.org/652443
*** masayukig has quit IRC02:10
*** masayukig has joined #openstack-infra02:10
*** yamamoto has quit IRC02:12
*** yamamoto has joined #openstack-infra02:12
*** dave-mccowan has joined #openstack-infra02:13
*** dave-mccowan has quit IRC02:17
*** jamesmcarthur has joined #openstack-infra02:19
*** jamesmcarthur has quit IRC02:26
*** jamesmcarthur has joined #openstack-infra02:26
*** jamesmcarthur has quit IRC02:32
*** hwoarang has quit IRC02:39
*** hwoarang has joined #openstack-infra02:43
*** jamesmcarthur has joined #openstack-infra02:46
*** jamesmcarthur has quit IRC02:49
*** jamesmcarthur has joined #openstack-infra02:50
*** jamesmcarthur has quit IRC02:54
*** hwoarang has quit IRC03:16
*** hwoarang has joined #openstack-infra03:23
*** bhavikdbavishi has joined #openstack-infra03:25
*** bhavikdbavishi1 has joined #openstack-infra03:28
*** psachin has joined #openstack-infra03:29
*** bhavikdbavishi has quit IRC03:30
*** bhavikdbavishi1 is now known as bhavikdbavishi03:30
*** jamesmcarthur has joined #openstack-infra03:31
*** jamesmcarthur has quit IRC03:32
*** ramishra has joined #openstack-infra03:36
*** tonyb[m] has joined #openstack-infra03:40
*** igordc has quit IRC03:50
*** raukadah is now known as chandankumar03:53
*** ijw has quit IRC04:01
*** ijw_ has joined #openstack-infra04:01
*** imacdonn has quit IRC04:05
*** imacdonn has joined #openstack-infra04:06
*** ykarel has quit IRC04:15
*** udesale has joined #openstack-infra04:24
*** ijw_ has quit IRC04:29
*** ijw has joined #openstack-infra04:30
*** ykarel has joined #openstack-infra04:34
*** ykarel_ has joined #openstack-infra04:35
*** ijw has quit IRC04:36
*** ijw has joined #openstack-infra04:37
*** whoami-rajat has joined #openstack-infra04:37
*** ykarel has quit IRC04:38
*** janki has joined #openstack-infra04:44
*** bhavikdbavishi1 has joined #openstack-infra04:48
*** jaosorior has joined #openstack-infra04:48
*** hongbin has quit IRC04:49
*** bhavikdbavishi has quit IRC04:49
*** bhavikdbavishi1 is now known as bhavikdbavishi04:49
*** eernst has quit IRC04:50
*** bhavikdbavishi1 has joined #openstack-infra04:53
*** bhavikdbavishi has quit IRC04:54
*** bhavikdbavishi1 is now known as bhavikdbavishi04:54
*** rcernin has quit IRC05:08
*** rcernin has joined #openstack-infra05:10
*** Lucas_Gray has joined #openstack-infra05:10
*** ykarel_ is now known as ykarel05:15
*** jtomasek has joined #openstack-infra05:28
*** tkajinam has quit IRC05:28
*** ramishra has quit IRC05:38
*** quiquell|off is now known as quiquell|rover05:46
*** ijw has quit IRC05:46
*** ijw has joined #openstack-infra05:47
*** ramishra has joined #openstack-infra05:47
*** jbadiapa has joined #openstack-infra05:51
*** ijw has quit IRC05:53
*** tkajinam has joined #openstack-infra05:54
*** Lucas_Gray has quit IRC06:03
*** cjloader has quit IRC06:11
06:15 <openstackgerrit> OpenStack Proposal Bot proposed openstack-infra/project-config master: Normalize projects.yaml  https://review.openstack.org/652568
*** kopecmartin|off is now known as kopecmartin06:17
06:18 <AJaeger> infra-root, openstack/networking-omnipath is set up in git.o.o but not properly in github - it has no content ;(
06:19 <AJaeger> could you check what's wrong, please?
*** pcaruana has joined #openstack-infra06:19
*** dpawlik has joined #openstack-infra06:21
*** roman_g has joined #openstack-infra06:25
*** udesale has quit IRC06:26
*** dpawlik has quit IRC06:27
*** hwoarang has quit IRC06:28
*** hwoarang has joined #openstack-infra06:29
*** dpawlik has joined #openstack-infra06:31
*** toabctl has joined #openstack-infra06:32
*** tkajinam_ has joined #openstack-infra06:48
*** e0ne has joined #openstack-infra06:48
*** eumel8 has joined #openstack-infra06:48
*** e0ne has quit IRC06:49
*** tkajinam has quit IRC06:51
*** e0ne has joined #openstack-infra06:52
*** hwoarang has quit IRC06:52
*** hwoarang has joined #openstack-infra06:53
*** ijw has joined #openstack-infra06:55
*** slaweq__ has joined #openstack-infra06:57
*** apetrich has joined #openstack-infra07:02
*** rcernin has quit IRC07:05
*** ginopc has joined #openstack-infra07:07
*** e0ne has quit IRC07:12
07:15 <frickler> AJaeger: I didn't find any obvious error, but also no indication that gerrit even tried to replicate that repo to github
*** iurygregory has joined #openstack-infra07:17
*** udesale has joined #openstack-infra07:17
*** kjackal has joined #openstack-infra07:17
*** slaweq__ is now known as slaweq07:18
07:18 <openstackgerrit> Tobias Henkel proposed openstack-infra/zuul master: Add --check-config option to zuul scheduler  https://review.openstack.org/542160
*** tosky has joined #openstack-infra07:20
07:20 <frickler> dmsimard: I do see errors related to three ara repos, though. I seem to remember that you moved them; maybe some cleanup is missing there?
*** udesale has quit IRC07:20
*** udesale has joined #openstack-infra07:21
*** pgaxatte has joined #openstack-infra07:22
*** spotz has joined #openstack-infra07:22
*** udesale has quit IRC07:22
*** udesale has joined #openstack-infra07:22
*** e0ne has joined #openstack-infra07:31
*** rpittau|afk is now known as rpittau07:34
*** udesale has quit IRC07:41
*** udesale has joined #openstack-infra07:43
*** e0ne has quit IRC07:52
*** ykarel is now known as ykarel|lunch07:52
*** jpich has joined #openstack-infra07:55
*** e0ne has joined #openstack-infra07:57
*** lucasagomes has joined #openstack-infra07:59
*** rossella_s has joined #openstack-infra08:08
08:12 <openstackgerrit> Tobias Henkel proposed openstack-infra/zuul master: Add support for smart reconfigurations  https://review.openstack.org/652114
*** tkajinam_ has quit IRC08:20
08:26 <openstackgerrit> Tobias Henkel proposed openstack-infra/zuul master: Add --check-config option to zuul scheduler  https://review.openstack.org/542160
*** gfidente has joined #openstack-infra08:28
*** gregoryo has quit IRC08:28
*** dtantsur|afk is now known as dtantsur08:30
*** udesale has quit IRC08:40
*** udesale has joined #openstack-infra08:40
*** yboaron has joined #openstack-infra08:42
*** dkushwaha has joined #openstack-infra08:49
*** ykarel|lunch is now known as ykarel08:51
*** electrofelix has joined #openstack-infra09:05
*** e0ne has quit IRC09:09
09:11 <openstackgerrit> Nir Magnezi proposed openstack/diskimage-builder master: Add version-less RHEL element for RHEL7 and RHEL8  https://review.openstack.org/643731
*** janki has quit IRC09:14
*** e0ne has joined #openstack-infra09:18
*** jpich has quit IRC09:21
*** jpich has joined #openstack-infra09:22
*** jpich has quit IRC09:23
*** jpich has joined #openstack-infra09:24
*** yamamoto has quit IRC09:33
*** zbr has joined #openstack-infra09:47
*** ginopc has quit IRC09:48
*** ginopc has joined #openstack-infra09:48
*** ramishra_ has joined #openstack-infra09:49
*** zbr__ has quit IRC09:50
*** panda has joined #openstack-infra09:51
*** ramishra has quit IRC09:52
*** udesale has quit IRC09:54
*** udesale has joined #openstack-infra09:55
*** e0ne has quit IRC09:55
*** bhavikdbavishi has quit IRC09:59
*** janki has joined #openstack-infra10:01
*** Lucas_Gray has joined #openstack-infra10:02
*** e0ne has joined #openstack-infra10:06
*** yamamoto has joined #openstack-infra10:10
10:16 <frickler> infra-root: actually it looks like the patch to restrict ara replication might not be working as planned. I only see replications for -dev/-infra currently. https://review.openstack.org/#/c/650914/1/modules/openstack_project/manifests/review.pp
*** yamamoto has quit IRC10:22
*** udesale has quit IRC10:25
*** gnuoy has joined #openstack-infra10:26
*** e0ne has quit IRC10:26
*** udesale has joined #openstack-infra10:26
10:27 <openstackgerrit> Jens Harbott (frickler) proposed openstack-infra/system-config master: Revert "Disable gerrit replication to GitHub for ara/ara-infra/ara-web"  https://review.openstack.org/652614
10:29 <gnuoy> Hi, I landed a change recently, https://review.openstack.org/#/c/652032/ . I see the merge is present on git.openstack.org ( https://git.openstack.org/cgit/openstack/charm-interface-pacemaker-remote/log/ ) but it hasn't made it to the github mirror ( https://github.com/openstack/charm-interface-pacemaker-remote/commits/master ). This is a new repo, so I'm wondering if I made a mistake in the setup, if I'm being impatient, or if there is a genuine infra issue?
*** e0ne has joined #openstack-infra10:35
*** kjackal has quit IRC10:38
10:39 <frickler> gnuoy: yes, the replication seems to be broken currently due to an issue on the infra side
10:39 <gnuoy> ah, ok, thanks for the update
*** tbachman has quit IRC10:43
*** ykarel is now known as ykarel|afk10:47
10:47 <dkushwaha> I'm getting the same issue gnuoy raised. One of my patches, https://review.openstack.org/#/c/651470 , got merged yesterday and I can see it on git.openstack.org, but the change is not showing up on github
*** kjackal has joined #openstack-infra10:58
*** yamamoto has joined #openstack-infra11:02
*** jpich has quit IRC11:02
*** yamamoto has quit IRC11:06
*** quiquell|rover is now known as quique|rover|eat11:06
*** jpich has joined #openstack-infra11:07
*** jpich has quit IRC11:07
*** jpich has joined #openstack-infra11:08
*** ykarel|afk is now known as ykarel11:10
*** bhavikdbavishi has joined #openstack-infra11:10
*** e0ne has quit IRC11:10
*** yamamoto has joined #openstack-infra11:11
*** yamamoto has quit IRC11:11
*** weshay_pto has quit IRC11:12
*** mhu has joined #openstack-infra11:13
*** weshay_pto has joined #openstack-infra11:13
*** yamamoto has joined #openstack-infra11:21
*** panda is now known as panda|lunch11:23
*** Wryhder has joined #openstack-infra11:23
*** yamamoto has quit IRC11:23
*** Lucas_Gray has quit IRC11:24
*** Wryhder is now known as Lucas_Gray11:24
*** ldnunes has joined #openstack-infra11:33
*** rosmaita has joined #openstack-infra11:36
*** ldnunes has quit IRC11:39
*** kgiusti has joined #openstack-infra11:39
*** yboaron has quit IRC11:41
11:42 <dmsimard> frickler: errors where ?
*** kazsh has quit IRC11:46
*** yamamoto has joined #openstack-infra11:46
11:47 <openstackgerrit> Nir Magnezi proposed openstack/diskimage-builder master: Add version-less RHEL element for RHEL7 and RHEL8  https://review.openstack.org/643731
*** kazsh has joined #openstack-infra11:49
*** quique|rover|eat is now known as quiquell|rover11:49
*** thomasmckay has quit IRC11:50
11:53 <frickler> dmsimard: errors in /var/log/manage_projects.log.1.gz on review01.o.o, like "manage_projects - ERROR - Problems creating openstack/ara-web, moving on."
11:54 <frickler> dmsimard: also did you see the conversation above? seems replication from openstack/* to github is currently not happening. I proposed a revert of your patch in case you can see a quick fix
11:54 <frickler> can't
12:02 <dmsimard> Hmmm, that's not a gerrit replication error
12:02 <dmsimard> That looks like jeepyb trying to create the repo
12:02 <dmsimard> And it can't since it has been moved
12:03 *** rlandy has joined #openstack-infra
12:03 <dmsimard> A revert would not help, it's something I didn't think about
12:03 *** rlandy is now known as rlandy|ruck
12:05 <dmsimard> replication to github should work though, I'll check it out
*** tbachman has joined #openstack-infra12:07
*** panda|lunch is now known as panda12:08
*** jcoufal has joined #openstack-infra12:10
12:10 <pabelanger> dmsimard: frickler: setting has-github: false might fix it: https://github.com/openstack-infra/jeepyb/blob/master/jeepyb/cmd/manage_projects.py#L27 that is in gerrit/projects.yaml
12:10 <dmsimard> yeah that's the jeepyb part
12:10 <dmsimard> trying to understand why github replication isn't working
12:11 <zbr> pabelanger: do you happen to know when we will be able to get rid of http://git.openstack.org/cgit/openstack-infra/zuul-jobs/tree/roles/fetch-zuul-cloner/tasks/main.yaml#n13 ?
12:11 <pabelanger> jeepyb does it
12:11 <zbr> this is still run even by simple jobs like the tox ones.
12:12 <pabelanger> when we create a new project
12:12 <pabelanger> zbr: jobs that use that role should be updated to remove it. It is no longer needed for zuulv3, and should just be a noop
12:13 <pabelanger> dmsimard: https://github.com/openstack-infra/jeepyb/blob/master/jeepyb/cmd/manage_projects.py#L579
12:16 *** e0ne has joined #openstack-infra
12:16 <dmsimard> pabelanger: I understand what I'm looking at, but shouldn't gerrit be handling replication ?
12:16 <dmsimard> pabelanger: I thought jeepyb was just for new projects
12:17 <pabelanger> dmsimard: I think AJaeger's repo is broken because it is a new project.  And gerrit hasn't replicated because that code path didn't trigger
12:17 <dmsimard> yeah there's two different problems :)
12:18 <pabelanger> and gerrit won't replicate a project unless you commit code or do a restart
12:18 <pabelanger> we should look at gerrit logs and see why the 2 other projects haven't triggered either
12:18 <pabelanger> but, possibly related to jeepyb
12:19 <pabelanger> we can first fix that, see if AJaeger's project replicated
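For context on kicking replication by hand: gerrit's replication plugin exposes a "replication start" SSH command, so a one-off re-replication of a single project can be triggered without committing code or restarting gerrit. A minimal sketch only; the host, admin account, and project name below are illustrative assumptions, not taken from the log:

```python
import subprocess

# Illustrative values -- substitute a real admin account and project.
GERRIT = 'review.openstack.org'
PROJECT = 'openstack/networking-omnipath'

# The replication plugin's "replication start" command re-replicates the named
# project to all configured remotes; --wait blocks until the push completes.
subprocess.run(
    ['ssh', '-p', '29418', 'admin@' + GERRIT,
     'replication', 'start', '--wait', PROJECT],
    check=True)
```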
*** jamesmcarthur has joined #openstack-infra12:24
12:25 <dmsimard> need to brb 5 minutes, I've added review.o.o to the emergency file and re-ran replication. Will do more testing on review-dev
*** rlandy|ruck is now known as rlandy|ruck|mtg12:33
*** yamamoto has quit IRC12:33
*** Lucas_Gray has quit IRC12:34
*** jamesmcarthur has quit IRC12:35
*** nicolasbock has joined #openstack-infra12:35
12:36 <openstackgerrit> Sorin Sbarnea proposed opendev/base-jobs master: [POC] Remove fetch-zuul-cloner from base job  https://review.openstack.org/652637
*** tosky has quit IRC12:36
*** e0ne has quit IRC12:40
*** Lucas_Gray has joined #openstack-infra12:41
*** e0ne has joined #openstack-infra12:41
12:41 <fungi> my first guess with the replication issues is that the negative lookahead is not matching anything in the openstack namespace at all for some reason
12:42 <fungi> we might try adjusting/removing the exclusion temporarily and see if replication starts back up again
12:43 <dmsimard> fungi: that summarizes what I'm doing right now, yes
12:43 <fungi> cool, consider that an endorsement of your present line of investigation ;)
12:44 <dmsimard> \o/
12:44 <dmsimard> I'll do more testing with the lookahead on review-dev
12:45 <dmsimard> the one thing that is different between review and review-dev in that regard
12:45 <dmsimard> is that review has three "projects" clauses (openstack/*, openstack-infra/* and openstack-dev/*) so I'll investigate along those lines
12:46 *** jamesmcarthur has joined #openstack-infra
12:46 <fungi> sounds like a good place to start, yes
12:47 <dmsimard> perhaps we need a single clause like: projects = openstack(-dev|-infra)?/(?!ara$|ara-web$|ara-infra$).*
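An aside on why the anchor matters: as far as I know, gerrit's replication plugin only treats a projects value as a regular expression when it starts with '^' (which is what the fix that lands later in the day adds back). A quick, hypothetical check of the negative lookahead dmsimard quotes above, in plain Python:

```python
import re

# The pattern quoted above, with the leading '^' the replication plugin needs
# in order to treat the value as a regex at all.
pattern = re.compile(r'^openstack(-dev|-infra)?/(?!ara$|ara-web$|ara-infra$).*')

for name in ('openstack/nova',
             'openstack/charm-interface-pacemaker-remote',
             'openstack/ara',
             'openstack/ara-web',
             'openstack-infra/system-config'):
    # The moved ara repos are skipped; everything else still replicates.
    print('%-45s %s' % (name, 'replicate' if pattern.match(name) else 'skip'))
```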
12:47 *** rfolco has joined #openstack-infra
12:48 <dmsimard> gnuoy: thanks for letting us know, working on it
12:48 <gnuoy> great, thanks dmsimard, much appreciated
*** kaiokmo has joined #openstack-infra12:51
*** yamamoto has joined #openstack-infra12:54
*** quiquell|rover has quit IRC12:55
*** quiquell has joined #openstack-infra12:56
*** e0ne has quit IRC12:58
*** udesale has quit IRC12:59
13:00 <openstackgerrit> David Moreau Simard proposed openstack-infra/system-config master: Add missing '^' to github replication pattern  https://review.openstack.org/652644
13:00 <dmsimard> fungi, frickler, pabelanger: ^
13:02 *** udesale has joined #openstack-infra
13:02 <dmsimard> reproduced and fixed on review-dev
13:02 <zigo> ianw: fungi: Any reason to hold on https://review.openstack.org/#/c/645574/ ?
13:02 <zigo> Has debootstrap been fixed?
13:03 *** udesale has quit IRC
13:03 *** bhavikdbavishi has quit IRC
13:03 <dmsimard> it also matches what we have been using in rdo, e.g.: https://github.com/rdo-infra/review.rdoproject.org-config/blob/3663b146b6f8cd8806f98b176a437079cf8f9b78/gerrit/replication.config#L17
13:03 *** udesale has joined #openstack-infra
*** rlandy|ruck|mtg is now known as rlandy|ruck13:05
*** yboaron has joined #openstack-infra13:06
*** Lucas_Gray has quit IRC13:11
*** jamesmcarthur has quit IRC13:12
13:14 <dmsimard> pabelanger: there's not a single project with "has-github: false", eh
13:14 <dmsimard> http://codesearch.openstack.org/?q=has-github&i=nope&files=&repos=
13:17 *** e0ne has joined #openstack-infra
13:19 <dmsimard> https://github.com/openstack-infra/jeepyb/blob/c132a30732c8a96161ea5f9503491b1f5ec7a1f9/jeepyb/cmd/manage_projects.py#L573 doesn't check the value of "has-github", only whether it exists?
13:19 *** lseki has joined #openstack-infra
13:20 <fungi> i think it may have originally been a flag, not a boolean
13:20 <openstackgerrit> Lon Hohberger proposed openstack/diskimage-builder master: Add version-less RHEL element for RHEL7 and RHEL8  https://review.openstack.org/643731
13:20 <fungi> possible it has bitrotted since we don't exercise it
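To make the distinction dmsimard is pointing at concrete: a presence check and a value check behave differently for a project that explicitly sets the flag to false. This is an illustration only, not the actual jeepyb code; `project` just stands in for one entry parsed from gerrit/projects.yaml:

```python
project = {'project': 'openstack/ara', 'has-github': False}

# Presence check: the key existing at all is treated as meaningful, so an
# explicit "has-github: false" still looks enabled.
presence_enabled = 'has-github' in project

# Value check, defaulting to True when the key is absent, which is presumably
# what the flag is meant to express.
value_enabled = project.get('has-github', True)

print(presence_enabled, value_enabled)  # True False
```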
13:22 <dmsimard> ok, I don't think the error is fatal in any case, jeepyb just continues with the other projects
13:22 <dmsimard> the fix for the gerrit replication is in 652644 and I confirmed it works
*** tbachman has left #openstack-infra13:23
13:23 <dmsimard> review is in the emergency file and it's already applied manually, need to step away and I'll be back later
*** markmcd has left #openstack-infra13:23
13:24 <dmsimard> gnuoy, AJaeger: github replication should be ok now
13:24 <gnuoy> dmsimard, great, thank you
13:24 *** mriedem has joined #openstack-infra
13:24 <dmsimard> it's still churning through the repos in alphabetical order but it'll get there eventually
*** jroll has quit IRC13:25
*** jroll has joined #openstack-infra13:26
*** e0ne has quit IRC13:28
*** Goneri has joined #openstack-infra13:30
*** e0ne has joined #openstack-infra13:34
*** jamesmcarthur has joined #openstack-infra13:35
*** eharney has joined #openstack-infra13:37
13:41 <fungi> yeah, i can't pull 652644 into gertty until replication catches back up, i think
*** yamamoto has quit IRC13:41
*** lmiccini has quit IRC13:42
*** rh-jelabarre has joined #openstack-infra13:44
*** yamamoto has joined #openstack-infra13:44
*** zzehring has joined #openstack-infra13:45
*** jamesmcarthur_ has joined #openstack-infra13:46
*** bnemec has joined #openstack-infra13:46
13:47 <fungi> 450 tasks remaining
*** yamamoto has quit IRC13:48
*** jamesmcarthur has quit IRC13:49
*** priteau has joined #openstack-infra13:51
13:57 <gary_perkins> ianw: it's looking like the original arm64ci.cloud cloud is gonna have to be decommissioned soon :( I see it's currently running mirror01.nrt1.arm64ci.openstack.org. Is there anything you need to do with it prior to knocking it on the head?
13:59 *** eernst has joined #openstack-infra
13:59 <gary_perkins> ianw: I'm still waiting for https://review.openstack.org/650021 to be fully approved and merged. Then you'll be able to set up a mirror there
14:03 <dulek> Hey, I started seeing this stuff lately in Kuryr gates: http://logs.openstack.org/81/652581/1/check/kuryr-kubernetes-tempest-py36/88f34b9/job-output.txt.gz#_2019-04-15_07_40_16_196891
14:03 <dulek> "dial tcp: lookup gcr.io on 127.0.0.1:53: server misbehaving" - any ideas?
14:03 <dulek> That run above is on vexxhost, I'll check if I saw it elsewhere.
14:04 <dulek> Yep, all the failures are on vexxhost.
*** udesale has quit IRC14:05
*** rlandy|ruck is now known as rlandy|ruck|mtg14:05
*** e0ne has quit IRC14:06
*** yboaron has quit IRC14:08
14:10 <fungi> dulek: vexxhost ca-ymq1 or sjc1 region (or both) and when did it seem to start?
14:10 <fungi> i gather ipv6 was finally turned on in sjc1 over the weekend
14:10 <fungi> so wondering if we're having trouble reaching nameservers
14:11 <mnaser> mtl has always had ipv6
14:11 <mnaser> sjc1 just recently added ipv6
14:13 <dulek> fungi: Seems like sjc1 in all the cases.
14:14 <mnaser> booting a test vm to check things out
14:15 <fungi> if it looks like it could be a weird interaction with something kuryr jobs are doing to the server's networking then we can set an autohold for that job
14:15 <mnaser> https://www.irccloud.com/pastebin/O9QTELeA/
14:15 <mnaser> I think we should set up an autohold
14:15 <dulek> fungi: I'll dig into it in logstash a bit.
14:16 <mnaser> dulek: has it been failing starting Sunday or a bit before?
14:16 <dulek> mnaser: Checking.
*** sthussey has joined #openstack-infra14:18
14:19 <dulek> mnaser: First hit: 2019-04-15T01:24:40.824+02:00
14:21 *** Lucas_Gray has joined #openstack-infra
14:21 *** e0ne has joined #openstack-infra
14:23 <dulek> Should I just take a look into the unbound logs? ;)
14:27 <dulek> mnaser, fungi: http://logs.openstack.org/94/652394/1/check/kuryr-kubernetes-tempest-containerized/9b227b0/controller/logs/unbound_log.txt.gz
14:27 <dulek> Look for timestamp 1555284324.
14:27 <dulek> notice: sendto failed: Network is unreachable
14:28 <dulek> notice: remote address is ip6 2001:4860:4860::8888 port 53 (len 28)
14:28 *** yboaron has joined #openstack-infra
14:28 <dulek> I assume that's the culprit?
14:29 *** janki has quit IRC
14:29 <fungi> that's the ipv6 equivalent of 8.8.8.8
14:29 <fungi> ptr is google-public-dns-a.google.com.
14:29 <openstackgerrit> Lon Hohberger proposed openstack/diskimage-builder master: Add version-less RHEL element for RHEL7 and RHEL8  https://review.openstack.org/643731
14:30 *** ykarel is now known as ykarel|afk
14:30 <fungi> but yeah, "Network is unreachable" suggests maybe the node decided it had no v6 default route
14:31 <fungi> http://logs.openstack.org/94/652394/1/check/kuryr-kubernetes-tempest-containerized/9b227b0/zuul-info/zuul-info.controller.txt suggests it had working ipv6 at the start of the job though
14:33 <fungi> had 1724sec remaining on that ra expiration near the start of the job
14:34 <fungi> did the node start having dns lookup problems around the time the ra for its default route expired, i wonder?
14:34 *** cjloader has joined #openstack-infra
14:35 <dulek> fungi: ra?
14:36 <fungi> router advertisement
14:36 <fungi> 22:59:12 to 23:25:24 http://logs.openstack.org/94/652394/1/check/kuryr-kubernetes-tempest-containerized/9b227b0/job-output.txt.gz#_2019-04-14_23_25_24_135365
*** lpetrut has joined #openstack-infra14:36
14:37 <fungi> that looks like it may have errored a little before the default route expired
14:38 <dulek> I don't think we're doing anything to IPv6 routing.
14:38 <dulek> And that setup is pretty standard - Neutron + OVS.
14:39 *** cgoncalves has quit IRC
14:39 *** efried is now known as efried_pto
14:42 <fungi> got it, nothing fancy binding container network namespaces to the instance interfaces or anything
14:44 *** ijw has quit IRC
14:44 *** ijw has joined #openstack-infra
14:44 *** cgoncalves has joined #openstack-infra
14:45 <fungi> i can temporarily set an autohold with a high count so we can recheck until we hit that, or across some number of different changes and then recheck them in parallel
14:45 <fungi> i'm unfortunately not finding anything in the syslog we collected from that node
14:46 <fungi> well, nothing which looks relevant that is
*** anteaya has joined #openstack-infra14:46
*** quiquell is now known as quiquell|off14:46
*** armax has joined #openstack-infra14:48
14:51 <openstackgerrit> boden proposed openstack-infra/project-config master: update vmware-nsx jobs  https://review.openstack.org/652680
*** rlandy|ruck|mtg is now known as rlandy|ruck14:56
15:00 <openstackgerrit> Thierry Carrez proposed openstack/ptgbot master: Preserve JSON dictionary order  https://review.openstack.org/652685
15:00 <openstackgerrit> Thierry Carrez proposed openstack/ptgbot master: Remove last GitHub links for help  https://review.openstack.org/652686
*** ijw_ has joined #openstack-infra15:03
*** ijw has quit IRC15:07
*** eernst has quit IRC15:07
*** tosky has joined #openstack-infra15:08
*** cgoncalves has quit IRC15:10
*** e0ne has quit IRC15:14
15:14 <openstackgerrit> Merged openstack-infra/system-config master: Add missing '^' to github replication pattern  https://review.openstack.org/652644
15:15 <mnaser> fungi: perhaps we should send RA's more often?
15:16 *** Lucas_Gray has quit IRC
15:17 <fungi> mnaser: i'm not convinced (yet) that it's necessarily to do with route expiration, though it's also possible something in the job is causing the kernel to block or ignore later announcements
15:17 <mnaser> fungi: from my test vm, it just worked fine, I can ping that address
15:17 *** Lucas_Gray has joined #openstack-infra
15:18 <mnaser> https://www.irccloud.com/pastebin/5LZr9JbO/
15:18 <fungi> yeah, i mean it looks like v6 routing is working at the start of the job
15:18 *** cgoncalves has joined #openstack-infra
15:20 *** markvoelker has joined #openstack-infra
15:20 <fungi> also... http://logs.openstack.org/94/652394/1/check/kuryr-kubernetes-tempest-containerized/9b227b0/zuul-info/inventory.yaml says ansible_host is the v6 address, so if all v6 routing broke then we wouldn't have been able to collect those logs
15:22 <fungi> searching logstash for message:"server misbehaving" in the past 24 hours only turns up kuryr-kubernetes-tempest.* jobs, too
15:23 <dulek> That's true.
15:23 *** e0ne has joined #openstack-infra
15:26 *** ykarel|afk is now known as ykarel
15:27 <dulek> fungi, mnaser: If you think it's something in the Kuryr-Kubernetes DevStack plugin I can just move that pull to the beginning of the job and see what happens.
*** pgaxatte has quit IRC15:28
15:28 <clarkb> could it be docker itself that has the problem?
15:28 <fungi> this is what i'm wondering
15:29 <clarkb> I would run a dig against 127.0.0.1 and possibly tcpdump port 53 to debug
15:29 <fungi> though unbound also complains about inability to reach a global v6 address and reports a !h error
15:30 *** ykarel is now known as ykarel|away
15:32 <fungi> "notice: sendto failed: Network is unreachable" in http://logs.openstack.org/94/652394/1/check/kuryr-kubernetes-tempest-containerized/9b227b0/controller/logs/unbound_log.txt.gz
15:33 <clarkb> ah probably not docker itself then
15:33 <fungi> which could also be iptables udp egress rules if set to reject rather than drop
15:34 *** e0ne has quit IRC
15:34 <fungi> since returning icmp-unreach is the standard way to handle that (though i'd expect us to set icmp-admin-prohibit instead for clarity)
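A small probe along the lines clarkb suggests (query the local resolver directly, and separately check the v6 path unbound complained about) can tell "the upstream forwarder is unreachable" apart from "the local resolver itself is broken". This is a hypothetical sketch, not part of the job; the hand-rolled packet just asks for gcr.io's A record:

```python
import errno
import socket

# Minimal DNS query: header (ID 0x1234, RD set, one question) + gcr.io A IN.
QUERY = (b'\x12\x34\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00'
         b'\x03gcr\x02io\x00\x00\x01\x00\x01')

def probe(addr, family):
    """Send one DNS query and report whether the path answers at all."""
    sock = socket.socket(family, socket.SOCK_DGRAM)
    sock.settimeout(3)
    try:
        sock.sendto(QUERY, addr)
        data, _ = sock.recvfrom(512)
        return 'answered (%d bytes)' % len(data)
    except OSError as exc:
        if exc.errno == errno.ENETUNREACH:
            # Matches unbound's "sendto failed: Network is unreachable".
            return 'no route (ENETUNREACH)'
        return 'failed: %s' % exc
    finally:
        sock.close()

print('unbound @ 127.0.0.1:53          ', probe(('127.0.0.1', 53), socket.AF_INET))
print('forwarder @ 2001:4860:4860::8888', probe(('2001:4860:4860::8888', 53), socket.AF_INET6))
```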
*** ykarel|away has quit IRC15:35
*** josephrsandoval has joined #openstack-infra15:36
*** cgoncalves has quit IRC15:36
*** slaweq has quit IRC15:36
*** woojay has joined #openstack-infra15:38
*** gyee has joined #openstack-infra15:38
15:38 <AJaeger> dmsimard: thanks
*** cgoncalves has joined #openstack-infra15:43
15:44 <anteaya> I'm going to bumb the gerrit will be offline email I sent Friday to ensure Monday inboxes see it
15:44 <anteaya> objections?
15:44 <anteaya> bump*
15:47 *** auristor has quit IRC
15:47 *** igordc has joined #openstack-infra
15:49 <openstackgerrit> Merged openstack-infra/project-config master: Normalize projects.yaml  https://review.openstack.org/652568
15:51 *** igordc has quit IRC
15:52 <clarkb> anteaya: not from
15:52 <clarkb> er from me
15:52 <fungi> anteaya: sounds fine to me
15:52 <anteaya> done
15:52 <anteaya> thank you
15:52 <clarkb> fungi: is gerrit replication happy now?
15:53 <openstackgerrit> Sorin Sbarnea proposed opendev/base-jobs master: Use standard ansible-lint config file  https://review.openstack.org/652708
15:54 * clarkb catches up on the day
*** ginopc has quit IRC15:55
*** auristor has joined #openstack-infra15:57
*** josephrsandoval has quit IRC15:58
15:58 <fungi> clarkb: seems to be, yes
15:58 <zbr> fungi: clarkb AJaeger : does any of you have experience using the pre-commit tool for linting? (not to be confused with the git hook feature)
15:59 <clarkb> infra-root fyi the disk is full on insecure-ci-registry.opendev.org again. So we will need to come up with a plan for that
15:59 <clarkb> zbr: no
15:59 <fungi> zbr: never heard of it if it's not a git pre-commit hook. have a link to something about what you're describing?
16:02 <clarkb> https://pre-commit.com/ is the tool I think
16:02 <zbr> clarkb: yes, but the page doesn't do a good job of highlighting its main benefits.
16:03 *** ykarel|away has joined #openstack-infra
16:04 <fungi> ahh, yeah, i do recall seeing this now
16:04 <zbr> it does address a few things very well, like orchestrating multiple isolated linters
16:04 <zbr> easy bumping (pre-commit autoupdate)
16:04 <mordred> we've actually historically actively avoided any systemic use of local git hooks
16:04 <zbr> we already use it in many tripleo repos
16:04 <zbr> it does not need git hooks
16:05 <clarkb> mordred: fwiw this isn't a git hook, it's just the worst named tool in the world
16:05 <mordred> wow
16:05 <zbr> clarkb: totally agree, bad name
16:05 <clarkb> but that confusion is everyone's first reaction, which makes me wary of suggesting we use the tool due to the name creating confusion
16:05 <zbr> in fact it can install a git hook but this is totally optional
16:05 <mordred> "Run pre-commit install to install pre-commit into your git hooks. pre-commit will now run on every commit. "
16:05 <mordred> kk. good to know
16:05 <fungi> bumping the versions of linters used seems like it would cause massive confusion. we already have projects pin their linters during each cycle so they don't run into issues they have to fix or explicitly skip near release time
16:05 <zbr> i guess that is how it started,... and the name stuck.
16:06 <zbr> all linters are always pinned, no surprises.
16:07 <fungi> i can't easily tell if it supports different versions of different linters per branch
16:07 <zbr> the auto-update does look for new versions and bump them in config, it's up to you to test and raise a CR to do it.
16:07 <zbr> fungi: not sure I understand?
16:08 *** lpetrut has quit IRC
16:08 <zbr> there are a few things that make me love it: never affected by the tox bug where tox fails to update the virtualenv on bumping.
16:09 <fungi> zbr: say you want to run flake8 1.2.3 on commits in one branch but flake8 2.3.4 on commits for another branch
16:09 <zbr> also it saves a huge amount of disk space and time because each linter version is shared/cached across all projects.
16:10 <zbr> so if you have 20 projects using ansible-lint=1.2.3, there is only one copy on disk fully managed by it. not 20 tox envs with the same stuff in them.
16:10 <zbr> i do have >100 repos cloned locally, so the .tox footprint is big.
16:10 <fungi> hrm, actually i tend to only have one copy because i use git clean with great frequency and rely on a single pip cache
16:11 <zbr> fungi: yep, but the pip cache does not avoid recreation of the virtualenv, which also takes time. not important on CI, but for a dev env it saves many seconds.
16:12 <zbr> anyway, before becoming annoying with my selling speech.... I can make a POC change to demo it if you want, just tell me which repo to demonstrate it on.
16:13 <fungi> very few if you're not installing the kitchen sink in your tox testenvs, but we have a bit of an anti-pattern of using one test-requirements.txt for all our testenvs instead
16:14 <zbr> fungi: yep, kitchen-sink describes very well our current use of test-reqs, where in fact we install linters in all envs only due to convenience. and ansible-lint in particular being a very heavy one.
16:15 <fungi> also the greater the difference between how developers run these checks locally vs how they're run in the gate significantly increases our "but it works for me why is your ci so broken?" support burden
16:16 <zbr> fungi: no difference, i can show you. in fact it's the opposite, its use assures that local == CI. we know well what happens with locally outdated virtualenvs, when the user needs to remember to do tox -r ....
16:17 <fungi> i meant if the idea is to replace tox with the pre-commit tool just for local developer use and not also in the ci jobs
16:17 <fungi> but yeah, curious to see how you configure it to run, say, different versions of flake8 for different branches of the same repo
16:18 <zbr> fungi: example of config https://github.com/openstack/tripleo-quickstart/blob/master/.pre-commit-config.yaml
16:19 <fungi> okay, that makes more sense. i was having trouble reconciling that with the fact that git pre-commit hooks apply to the entire repository
16:19 <fungi> so the hook checks the configuration present in the commit
16:20 *** ramishra_ has quit IRC
16:20 <fungi> and figures out which virtualenv to use based on that
16:20 <zbr> please don't call it a hook ;) ... i'm not using the hook myself, just calling it manually.
16:20 <zbr> and mainly calling it from tox -e linters : https://github.com/openstack/tripleo-quickstart/blob/master/tox.ini#L43
16:21 <fungi> oh, the introduction at https://pre-commit.com/ specifically describes using it as a git pre-commit hook
16:21 <fungi> i guess you're suggesting a different usage pattern
16:21 *** hwoarang has quit IRC
16:21 <zbr> that is why i said worst demo page ;)
16:21 <clarkb> this confusion is my biggest source of hesitancy towards using the tool
16:21 <clarkb> we are gonna spend lots of time explaining this to people if we switch
16:22 <zbr> clarkb: most people would not notice, their workflow is not changed at all.
16:22 <zbr> calling the same tox job to lint, same kind of results.
16:22 *** hwoarang has joined #openstack-infra
16:23 *** jpich has quit IRC
16:23 <fungi> anyway, from what i can see there you get the majority of those benefits by just being specific about deps in your tox testenvs rather than using one list of test requirements in all of them (for my personal projects i don't use a test-requirements.txt with tox, just different deps lines so the bare minimum is installed)
16:24 <zbr> fungi: the devil is in the details: to do something ~similar in tox would require you to put each linter inside a different tox environment, which would make it hard to manage.
16:24 <fungi> not every repository is going to want the same sets of plugins installed with flake8, for example, so having your flake8 testenv be repository+branch specific is still less messy, to me
16:24 <zbr> fungi: this file is defined by each repository
16:24 <fungi> i do put each linter in a different tox environment in my projects where i'm doing that, yes
16:25 <fungi> zbr: but the virtualenv it uses isn't per project though, right?
16:25 *** slaweq has joined #openstack-infra
16:26 <zbr> fungi: it manages its own virtualenvs, which are not per project but are based on hash(tool, rev)
16:26 <fungi> so you'd still need different virtualenvs for each different set of flake8 plugins used for each project. i guess it's at least smart enough to figure out that if two invocations rely on the same set of flake8 plugins and versions then they can reuse a common venv
16:27 <zbr> yeah, it does also have its own config, i am almost sure, as I had this case with extra plugins and didn't get any surprises.
16:28 <fungi> anyway, since it doesn't seem this is necessary to figure out before friday's maintenance, i'm going to go back to preparing for that
*** rpittau is now known as rpittau|afk16:28
*** slaweq_ has joined #openstack-infra16:29
16:31 <zbr> meanwhile I found a more real subject: ERROR! the role 'push-to-intermediate-registry' was not found --- with the base-jobs linters job, unrelated to the change itself. https://review.openstack.org/#/c/652708/
16:31 *** slaweq has quit IRC
16:31 <fungi> also, anybody know how to disable the "vulnerable dependency" alerting on github? it's getting ridiculous. now it's e-mailing us to let us know that there's a vulnerability in ansible 2.6.0 through 2.6.13 and citing entries like https://github.com/openstack-infra/zuul-base-jobs/blob/master/test-requirements.txt#L8
16:32 <fungi> zbr: yes, the registry keeps running out of disk space. several folks are trying to brainstorm ways to deal with it
16:32 *** bhavikdbavishi has joined #openstack-infra
16:32 <clarkb> infra-root: so, looking at my notes about insecure-ci-registry: registry garbage-collect fails, and we cannot run it when the server is online (I had previously only done dry runs)
16:32 <clarkb> the disk has filled again
16:32 <clarkb> I think our next step is to stop the registry, delete the registry contents and start it again
16:33 <clarkb> then sort out how to garbage collect properly (one issue is it is apparently not safe to GC when the registry is running)
16:33 <clarkb> if we'd like to debug the broken state further I can snapshot the instance first
16:34 <clarkb> maybe I should do that: stop the registry, free some disk from the journal, then snapshot, then delete the registry data, start the service again
16:34 <clarkb> if that sounds reasonable let me know
16:34 <fungi> something tells me we're likely to get another crack at experiencing the broken state between now and when we get the real solution in
16:34 <fungi> but go for it if you like
*** bhavikdbavishi1 has joined #openstack-infra16:35
*** bhavikdbavishi has quit IRC16:36
*** bhavikdbavishi1 is now known as bhavikdbavishi16:36
16:38 <clarkb> insecure-ci-registry01.opendev.org added to the emergency.yaml file so that ansible and docker-compose don't undo things
16:41 <openstackgerrit> Fabien Boucher proposed openstack-infra/zuul master: WIP - Pagure driver - https://pagure.io/pagure/  https://review.openstack.org/604404
*** psachin has quit IRC16:42
*** dtantsur is now known as dtantsur|afk16:44
16:45 <openstackgerrit> Clark Boylan proposed openstack-infra/zuul master: Fix tox.ini cover target install command  https://review.openstack.org/652727
16:47 <clarkb> I'm going to step out for a bit while I wait to be sure ansible won't run on that host anymore
16:47 <clarkb> then I'll be back to do surgery on that host
16:47 *** priteau has quit IRC
16:47 <dmsimard> since https://review.openstack.org/#/c/652644/ merged, I've removed review.o.o from the emergency file
16:48 <dmsimard> and with that, I'll be mostly on PTO this week -- feel free to ping but there might be increased latency or even timeouts :p
*** priteau has joined #openstack-infra16:48
*** ijw_ has quit IRC16:49
*** josephrsandoval has joined #openstack-infra16:49
*** josephrsandoval has quit IRC16:49
*** kopecmartin is now known as kopecmartin|off16:50
16:51 <openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool master: Fix loss of ZK conn during node delete  https://review.openstack.org/652729
*** e0ne has joined #openstack-infra16:51
*** ijw has joined #openstack-infra16:51
*** lucasagomes has quit IRC16:57
*** ijw has quit IRC17:00
*** ijw has joined #openstack-infra17:00
*** gfidente is now known as gfidente|out17:02
*** ykarel|away has quit IRC17:08
*** Lucas_Gray has quit IRC17:13
17:16 <clarkb> ok I'm back now
17:16 <clarkb> corvus: ^ if you are around now any objection to the proposed plan for the intermediate registry?
17:17 *** ijw has quit IRC
17:17 *** ijw has joined #openstack-infra
17:22 <clarkb> alright I've stopped the registry container and am taking a snapshot now
17:22 <clarkb> when that completes I'll do data deletion
17:22 <clarkb> then a reboot to make things that were sad about disk happy again
*** _erlon_ has joined #openstack-infra17:25
*** ijw has quit IRC17:35
*** slaweq_ has quit IRC17:37
*** ijw has joined #openstack-infra17:38
*** yamamoto has joined #openstack-infra17:40
*** ykarel|away has joined #openstack-infra17:44
*** diablo_rojo has joined #openstack-infra17:45
*** yamamoto has quit IRC17:45
17:49 <corvus> clarkb: i'm still catching up -- i don't think a snapshot is necessary -- it should be okay (especially in a situation like this) to delete everything and restart -- the worst thing that happens if the data are missing is that a recheck will be necessary
17:49 <clarkb> ah ok, well the snapshot is already happening so I'll roll with that for now then do the other stuff
17:49 <corvus> k
17:49 <corvus> clarkb: what do you think we're missing?
17:50 <corvus> do we need a cron to gc?  are we deleting tags at all?
17:50 <clarkb> we need to run gc (which doesn't work right now due to errors in the data) and for gc to work we need to delete tags
17:50 <clarkb> so ya we need some expiration cron that will delete older tags allowing a gc to clean them up
17:51 <clarkb> then if the registry regularly corrupts itself we might need to debug that more. I do wonder if running out of disk is why that happened though
17:51 <clarkb> in which case hopefully regular GCing is the fix
17:51 <corvus> clarkb: ok... i can start on the tag deletion bit first -- it's similar to things i've already written... if you or someone else wants to write the run-gc cron separately that's cool, or i can get to that after delete-tags
17:52 <clarkb> I can write the gc change
17:52 <clarkb> I'll confirm the commands I was running work properly after the cleanup
17:52 <corvus> great, sounds like a plan -- that's probably a late-this-afternoon thing for me while i continue to un-vacation
17:57 <fungi> so it's not wholly clear to me what the situation is there... clarkb: i thought earlier you'd said the registry has to be taken offline before it can be garbage-collected?
17:58 <clarkb> fungi: ya, reading more on that, you only have to do that if deleting files on disk to untag blobs
17:58 <clarkb> fungi: if you use the API to untag (which I expect corvus will do) then it is safe to do it online
17:58 <clarkb> the race is in tag updates aiui
17:59 <fungi> aha
17:59 <fungi> that makes more sense
18:00 <fungi> so the offline gc idea was more in an effort to avoid racing on tag updates
18:00 *** e0ne has quit IRC
18:01 <clarkb> ya, I reread the document https://medium.com/@mcvidanagama/cleanup-your-docker-registry-ef0527673e3a and realized the offline requirement only applies to one of the two methods listed
18:01 <corvus> yeah, we should be able to use the api
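Roughly, "use the API to untag" against a registry:2 instance means resolving the tag to its content digest, DELETEing the manifest by digest, and letting a later `registry garbage-collect` reclaim the now-unreferenced blobs (the registry config needs deletion enabled for this to work). A hedged sketch only; the endpoint and credentials below are placeholders, not the real service's:

```python
import requests

REGISTRY = 'https://registry.example.org:5000'  # placeholder endpoint
AUTH = ('zuul', 'secret')                       # placeholder credentials
MANIFEST_V2 = 'application/vnd.docker.distribution.manifest.v2+json'

def untag(repository, tag):
    """Delete the manifest a tag points at; blobs are reclaimed by a later GC."""
    # The v2 API only deletes manifests by digest, so fetch the digest first.
    resp = requests.get('%s/v2/%s/manifests/%s' % (REGISTRY, repository, tag),
                        headers={'Accept': MANIFEST_V2}, auth=AUTH)
    resp.raise_for_status()
    digest = resp.headers['Docker-Content-Digest']

    resp = requests.delete('%s/v2/%s/manifests/%s' % (REGISTRY, repository, digest),
                           auth=AUTH)
    resp.raise_for_status()  # 202 Accepted on success
    return digest
```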
18:02 <fungi> also, not sure if either of you saw over the weekend, but we have ipv6 in sjc1 now; i'm just not sure how to go about getting the gitea webservers (and load balancer?) to know about it
18:02 <corvus> we do something similar for dockerhub; so my change is (i expect) to put something similar to that logic in a cron job
18:02 <corvus> fungi: nice!
18:03 <corvus> fungi: off the top of my head -- do you think a reboot or networking restart will pick it up?
18:03 <clarkb> I think the hosts should've picked up on it via the RAs automatically
18:03 <clarkb> Then we need to restart services to see that
18:03 <fungi> i expect the kernel already knows, so probably restarting apache would cause it to listen on those addresses automatically unless we have specific listen directives for the v4 addresses
18:04 <corvus> so maybe just restart docker on the gitea servers
18:04 <corvus> and then gerrit replicate to catch anything we missed
18:04 <corvus> (then similar on the LB itself)
18:04 <fungi> for the lb, that's probably going to involve pulling together the list of addresses for all the gitea containers unless there's already some magic integration in place to do that
18:04 *** jamesmcarthur_ has quit IRC
18:05 <corvus> oh actually... we don't *really* care about the gitea servers
18:05 <corvus> the only thing that's public facing is the lb
18:05 <clarkb> right, the biggest thing is the public endpoint of haproxy
18:05 <corvus> if the lb talks to the backends over ipv4, that's fine
18:05 <fungi> good point. just having the lb listen on the v6 address and then adding that to dns should suffice
18:10 <clarkb> ok, snapshot finally completed. I'm going to rm -rf /var/registry/data/docker then reboot
18:10 <corvus> clarkb: should we move that onto /opt?
18:11 <clarkb> corvus: /opt is slightly smaller than / for that right now
18:11 <clarkb> it's 33GB vs 36GB (ish)
18:11 <corvus> i note you said reboot - which makes me think we ran out of system space -- and doing so would contain any problems
18:11 <clarkb> that's true, we wouldn't need to reboot after hitting this problem if we were on /opt
18:11 <corvus> (so if things go wrong again, we can just 'restart docker' rather than reboot)
18:11 <corvus> ya
18:11 <clarkb> may be worth taking the ~3GB hit for that
18:11 <clarkb> and ya that is why I am going to reboot
18:11 <corvus> yeah, i vote take the 3gb hit
18:12 <corvus> clarkb: maybe symlink into /opt for now, and i'll propose a change to the docker-compose file to use the new path explicitly?
18:12 <clarkb> ok
18:13 <clarkb> /var/registry/data:/var/lib/registry is the current docker compose mount
18:13 <clarkb> I can symlink /var/registry/data to /opt/registry/data ?
18:14 <clarkb> ya that is what I'm doing
18:18 <clarkb> ok rebooting now
18:21 <clarkb> ok up and running now
18:22 <clarkb> I've rechecked https://review.openstack.org/#/c/652727/1 which should push to the registry when done
18:26 <clarkb> nothing has written to it yet so gc complains about that. Once ^ pushes to it I expect there to be enough data in place that I can get gc sorted out
*** markvoelker has quit IRC18:26
*** kjackal has quit IRC18:28
18:29 <openstackgerrit> James E. Blair proposed openstack-infra/system-config master: Move insecure-ci-registry data to /opt  https://review.openstack.org/652750
18:29 <corvus> clarkb: ^ that moves everything -- i think it's compatible with your change
18:30 <corvus> clarkb: (we will just have 2 copies of the auth data and certs after that change merges; and once we restart with that config, we can remove /var/registry entirely)
*** ykarel|away has quit IRC18:30
*** kjackal has joined #openstack-infra18:57
*** bhavikdbavishi has quit IRC19:01
*** jamesmcarthur has joined #openstack-infra19:01
*** e0ne has joined #openstack-infra19:02
19:03 <clarkb> `sudo docker exec -it registrydocker_registry_1 registry garbage-collect --dry-run /etc/docker/registry/config.yml` works on the registry server now
19:03 <clarkb> I'll get a cron up to run that without the --dry-run
19:04 *** markvoelker has joined #openstack-infra
19:06 *** e0ne has quit IRC
19:09 <openstackgerrit> Clark Boylan proposed openstack-infra/system-config master: Install a docker registry GC cron  https://review.openstack.org/652755
19:09 <clarkb> corvus: ^ fyi
19:10 <mloza> hello, is point-to-site configuration supported in neutron vpnaas? I want to connect my workstation to a neutron router.
19:10 *** ijw has quit IRC
19:13 <clarkb> mloza: we run the developer infrastructure for openstack so we aren't super familiar with running openstack itself. The best place for that question is likely #openstack-neutron
19:13 <clarkb> or on the mailing list
19:14 <mloza> k thanks
*** eharney has quit IRC19:18
19:32 <openstackgerrit> Tobias Henkel proposed openstack-infra/zuul master: Support fail-fast in project pipelines  https://review.openstack.org/652764
*** jcoufal has quit IRC19:38
19:39 <openstackgerrit> Tobias Henkel proposed openstack-infra/zuul master: Support fail-fast in project pipelines  https://review.openstack.org/652764
*** slaweq_ has joined #openstack-infra19:47
*** dave-mccowan has joined #openstack-infra19:48
19:53 <openstackgerrit> Tobias Henkel proposed openstack-infra/zuul master: Support fail-fast in project pipelines  https://review.openstack.org/652764
19:58 <fungi> disappearing for a bit to grab an early dinner, but should be back soonish
20:02 <openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool master: Fix for orphaned DELETED nodes  https://review.openstack.org/652729
*** kjackal has quit IRC20:06
20:10 <mnaser> is the openstack pbx connected to an actual provider, to be able to be used over the phone?
20:11 <clarkb> mnaser: yes it is. Details at https://wiki.openstack.org/wiki/Infrastructure/Conferencing
20:11 *** rh-jelabarre has quit IRC
20:14 <mordred> mnaser: I always use it via phone
20:15 <clarkb> android also has sip clients but they all seem to fail hard at the "no username/password" setup
*** dave-mccowan has quit IRC20:20
*** verdurin has quit IRC20:30
*** verdurin has joined #openstack-infra20:33
20:36 <clarkb> corvus: https://review.openstack.org/#/c/652755/1 passes tests now and adds the garbage collecting cron. It's a no-op until we have the expired-tag stuff, but should be safe to merge before then
20:36 * clarkb reviews the move into /opt now
*** jamesmcarthur has quit IRC20:37
*** pcaruana has quit IRC20:38
*** jamesmcarthur has joined #openstack-infra20:39
*** ijw has joined #openstack-infra20:41
*** sshnaidm has quit IRC20:41
20:42 <openstackgerrit> Paul Belanger proposed openstack-infra/nodepool master: Gather host keys for connection-type network_cli  https://review.openstack.org/652778
*** Goneri has quit IRC20:44
*** priteau has quit IRC20:44
20:45 <mnaser> clarkb, mordred: cool!  I just also read about Jitsi, seems like something neat to have as well
20:45 <mnaser> or Jitsi Meet rather https://jitsi.org/jitsi-meet/
20:45 <clarkb> ya they have a hosted version of that that is free to use
20:46 <mnaser> cool
20:46 <clarkb> https://meet.jit.si/
20:47 <clarkb> I've heard varying feedback on how well it works, but people should feel free to try it as it is free
*** dave-mccowan has joined #openstack-infra20:54
*** jamesmcarthur has quit IRC20:56
*** sshnaidm has joined #openstack-infra20:57
*** dave-mccowan has quit IRC20:57
*** gfidente|out has quit IRC20:57
*** slaweq_ is now known as slaweq20:58
21:01 <clarkb> keeping an eye on the registry, it is already back up to 2.6GB of disk used. Which is quicker than we probably want (relative to the disk size), but it isn't completely running away
21:04 <openstackgerrit> Merged openstack-infra/system-config master: Install a docker registry GC cron  https://review.openstack.org/652755
*** raissa has joined #openstack-infra21:14
21:17 <openstackgerrit> Merged openstack-infra/nodepool master: Implement max-servers for AWS driver  https://review.openstack.org/649474
21:18 <fungi> how much space does it get in /opt (that's where it's been moved to, right?)?
21:20 <clarkb> 33GB
21:21 <fungi> that's not much more than the rootfs
21:21 <fungi> but at least it's more
*** eglute has joined #openstack-infra21:27
*** jamesmcarthur has joined #openstack-infra21:27
21:28 <clarkb> it's about the same
21:28 <clarkb> rootfs was 36GB
21:28 <clarkb> useable
21:29 <fungi> ahh, but at least it won't tank the whole server when it fills up now
21:29 <fungi> got it
21:29 <openstackgerrit> Merged openstack-infra/zuul master: encrypt: Fix SSL error when using file pubkey  https://review.openstack.org/650589
*** eharney has joined #openstack-infra21:35
*** raissa has quit IRC21:40
21:42 <ianw> fungi: are we ok to go with https://review.openstack.org/#/c/650021/ ?
21:43 <ianw> mirror01.nrt1.arm64ci.openstack.org isn't responding still, and per previous messages from gary_perkins it looks like nrt1 is going away anyway
21:44 <ianw> i haven't had an update on the linaro ticket
21:44 <fungi> i don't think anything is in the private hostvars/groupvars yet, i only got as far as resetting the passwords and recording details in the credentials list
21:44 <ianw> in short, despite several arm64 clouds being wired in, we don't have anywhere to run nodes :/
21:45 <ianw> fungi: i can update that today and babysit the change in if you want to give it a once-over
21:45 <fungi> oh, happy to, thanks!
21:47 <mnaser> I'm a bit stumped
21:47 <mnaser> OSA has had centos jobs freeze at the _same_ exact spot and time out
21:47 <mnaser> see: http://logs.openstack.org/14/652314/1/check/openstack-ansible-deploy-aio_metal-centos-7/de04e8b/job-output.txt.gz#_2019-04-15_18_01_19_100792 and http://logs.openstack.org/14/652314/1/check/openstack-ansible-deploy-aio_metal-centos-7/de04e8b/job-output.txt.gz#_2019-04-15_18_01_19_100792 and I have much more
21:47 <mnaser> I mean, it was just cloning things .. and then it hangs .. there are some system logs but none of them seem to indicate anything wild..
21:47 <clarkb> could it be nested virt crashing the node? we saw that with tripleo on centos at one time
21:48 <clarkb> oh, if it is crashing early then probably not that
21:48 <mnaser> I mean it's consistently hanging at the clone
21:48 <mnaser> third example http://logs.openstack.org/68/652368/1/check/openstack-ansible-deploy-aio_distro_metal-centos-7/e173b19/job-output.txt.gz#_2019-04-15_17_44_02_968406
21:48 <mnaser> 2/3 jobs ran on inap, so it's not provider specific
21:49 <mnaser> and log collection works fine after too, so it's really curious
21:50 <clarkb> ya, that implies the host networking isn't breaking
21:50 <clarkb> it is possible that the last logged data is misleading though depending on how it crashed
21:50 <ianw> mnaser: yeah, or the host crashing hard; i was going to suggest maybe remote log sending, something like https://git.openstack.org/cgit/openstack-infra/devstack-gate/tree/functions.sh#n972
*** jamesmcarthur has quit IRC21:52
21:52 <openstackgerrit> Merged openstack-infra/zuul master: Centralize job canceling  https://review.openstack.org/640609
21:52 <clarkb> registry now using 4.6GB
21:52 <ianw> clarkb: per our brief discussion yesterday: https://review.openstack.org/#/c/652472/ adds skips for older puppet versions.  if you're ok with the idea like that, i can expand it to do anything currently migrated to puppet4
21:53 <fungi> ianw: not sure if you saw, but i added a couple of changes under topic:letsencrypt over the weekend for hsts and caa
21:53 <clarkb> ianw: hrm, I wonder if we need similar with the beaker and rspec-beaker jobs
21:54 <clarkb> but that seems fine to me for now
*** whoami-rajat has quit IRC21:54
21:55 <ianw> fungi: oh cool; yeah that was like the only thing it mentioned in the ssl report
*** jamesmcarthur has joined #openstack-infra21:55
*** jamesmcarthur has quit IRC21:55
21:56 <ianw> i guess with a redirect http:// -> https:// there's no point in not having hsts
21:58 <fungi> when it's redirecting, yes. if we were also serving content under http i wouldn't have suggested it
21:59 <fungi> also i dug deeper into dane tlsa records for letsencrypt, but that's a bit of a time bomb unless we automate server cert tlsa record generation as part of the key rotation
21:59 <fungi> we could pin it to the current le ca certs but in time those will age out and we'll end up with clients rejecting the connection
*** jamesmcarthur has joined #openstack-infra22:00
22:00 <fungi> i found lots of folks picking apart both options
22:00 <fungi> but the one is a lot of extra complexity and the other is a ticking time bomb
22:01 <clarkb> infra-root: I'm looking at cleaning up the two test servers I've built off of snapshot images, clarkb-test-lists-upgrade and clarkb-test-bridge-snapshot-boot
22:01 <clarkb> any reason to not server delete those two servers at this point?
22:01 <fungi> i'd say you're in the best possible position to judge ;)
22:02 <fungi> but no objection from me, no
22:02 <corvus> clarkb: wfm
*** iurygregory has quit IRC22:02
22:02 <clarkb> well the bridge resize was successful and the lists server was upgraded in production so I don't think I need them anymore :)
22:02 <clarkb> I'm deleting them now
*** jtomasek has quit IRC22:03
22:04 <clarkb> #status log Deleted clarkb-test-bridge-snapshot-boot (b1bbdf16-0669-4275-aa6a-cec31f3ee84b) and clarkb-test-lists-upgrade (40135a0e-4067-4682-875d-9a6cec6a999b) as both tasks they were set up to test for have been completed
22:04 <openstackstatus> clarkb: finished logging
22:07 <openstackgerrit> Tristan Cacqueray proposed openstack-infra/zuul master: web: upgrade react and react-scripts to ^2.0.0  https://review.openstack.org/631902
*** slaweq has quit IRC22:12
22:14 <clarkb> ok, removing insecure-ci-registry from the emergency.yaml file now so that the cron installs, as well as the managed move to /opt
*** mriedem has quit IRC22:14
22:22 <openstackgerrit> Merged openstack-infra/zuul master: Reset dependent jobs when aborting paused job  https://review.openstack.org/634597
*** jamesmcarthur has quit IRC22:25
openstackgerritClark Boylan proposed openstack-infra/zuul master: Fix dynamic loading of trusted layouts  https://review.openstack.org/65278722:26
openstackgerritClark Boylan proposed openstack-infra/zuul master: Config errors should not affect config-projects  https://review.openstack.org/65278822:26
corvusclarkb: fyi i have managed to delete some images from a local test registry!22:30
clarkbcorvus: woot22:30
corvusclarkb: the process still leaves some layer link files on the filesystem;22:30
corvusi'm not sure why, or how to get rid of them22:31
corvusoh, you know, a layer may be distinct from a blob22:31
fungilike, separate object reference in the api?22:32
corvusso we might be able to delete all the layers for a manifest as well as the manifest itself, but still retain the blobs if it's used in another manifest22:32
corvusfungi: yeah, that's what i'm thinking22:32
corvussince blobs can be used by more than one image (and likely will be in our case), i don't want to delete anything that might be used by something that should be retained22:32
clarkbcorvus: the garbage collect should handle that for us in theory?22:33
corvusbut i *think* that a manifest points to a list of layers which each point to a blob.  i've deleted a manifest and GC cleaned up the blobs, but left the layer->blob links22:33
corvusclarkb: yeah, it did clean up the blobs, it just left a bunch of files like this:22:34
corvuscat ./data/docker/registry/v2/repositories/gerrit/_layers/sha256/71c170c5dae2fb430e70a395ee48d0853a88d456aebb9903c8de0c3be962ab78/link22:34
corvussha256:71c170c5dae2fb430e70a395ee48d0853a88d456aebb9903c8de0c3be962ab7822:34
clarkboh I see I wasn't sure if you were manually deleting things or if you were letting the tool do it22:35
corvussorry; i deleted the manifest using the api, then gc'd and that removed the blobs22:36
clarkbgot it22:36
clarkblooks like that data is quite small in our case currently22:37
corvusyeah, i think if we ignored this problem, we would grow very slowly.  but i'll see if there's another option22:38
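A sketch of the delete-then-GC flow described here, against the registry's v2 API; the registry URL, repository, and tag are placeholders, and deletes only work if the registry runs with deletion enabled:

    import requests

    registry = 'https://registry.example.org'  # placeholder
    repo, tag = 'gerrit', 'latest'             # placeholder repo/tag
    manifest_type = 'application/vnd.docker.distribution.manifest.v2+json'

    # Resolve the tag to a content digest; manifest deletes are by digest.
    resp = requests.head('%s/v2/%s/manifests/%s' % (registry, repo, tag),
                         headers={'Accept': manifest_type})
    digest = resp.headers['Docker-Content-Digest']

    # Delete the manifest (needs REGISTRY_STORAGE_DELETE_ENABLED=true).
    requests.delete(
        '%s/v2/%s/manifests/%s' % (registry, repo, digest)).raise_for_status()

    # Blobs are only reclaimed by a later garbage-collect pass, e.g.:
    #   registry garbage-collect /etc/docker/registry/config.yml
    # which, as noted above, still leaves the _layers/.../link files behind.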
*** mattw4 has joined #openstack-infra22:41
openstackgerritClark Boylan proposed openstack-infra/zuul master: Add release note for broken trusted config loading fix  https://review.openstack.org/65279322:52
*** tkajinam has joined #openstack-infra22:53
ianwinfra-root: if we can look at the grafana puppet update & test skip below it @ https://review.openstack.org/#/c/652443 i can watch that today.  it's a big version jump and we don't have great rspec tests, but i think the least time-sink way is for me to just watch it closely and be ready to fix or revert22:55
*** markvoelker has quit IRC22:57
pabelangerinfra-root: it looks like nodepool-launcher isn't running on nl01.o.o23:05
*** Adri2000 has quit IRC23:06
corvus2019-04-15 22:03:39,917 DEBUG nodepool.TaskManager: Manager rax-iad ran task ComputeGetServersDetail in 1.1651818752288818s23:06
corvuslast log line23:06
corvus[38729450.875460] Out of memory: Kill process 4241 (nodepool-launch) score 807 or sacrifice child23:06
pabelangeryah, see that now23:07
pabelangerpuppet also seems to be at 100% cpu23:07
pabelangernot sure if that is related23:07
pabelangerchecking cacti.o.o23:07
corvushttp://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=63831&rra_id=all23:07
pabelangerwow23:07
clarkbwow we OOM'd?23:08
pabelangerclarkb: yah23:08
*** markvoelker has joined #openstack-infra23:08
clarkbI don't know that the launchers have ever done that before23:08
corvuslooks like we may have introduced a bug in nov/dec?23:08
pabelangerlooks that way23:08
corvuspuppet seems to be in a busy loop23:09
corvussched_yield()                           = 023:09
corvusthat's all that strace says (repeatedly)23:09
corvusno interesting files reported by lsof23:09
corvusi can't think of any further investigation to do now; i vote we kill puppet and restart the launcher.23:10
*** Adri2000 has joined #openstack-infra23:10
pabelanger+123:10
ianw++ no threads, no children, and who knows what happened when oom kicks in23:11
pabelangerhttp://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=63828&rra_id=all23:11
pabelangerCPU spiked may 201823:11
corvusroot     14604 97.5  0.0 420812   764 ?        Rs    2018 471657:52 /usr/bin/ruby /usr/bin/puppet apply /opt/system-config/production/manifests/site.pp --logdest syslog --environment production --no-noop --detailed-exitcodes23:11
corvusthat process has been running a while.23:12
pabelangeryah23:12
pabelangerheh23:12
pabelangernl02 also has puppet at 100 cpu too23:13
corvus14604 Tue May 15 06:00:49 201823:13
clarkb++ to killing puppet and restarting23:13
corvusso, er, one month shy of a year23:13
corvusok i will kill and restart now23:13
pabelangercorvus: nl02 also OOM'd23:13
pabelangertaking nodepool-launcher with it23:13
corvusnl01 is back in service23:15
corvusi'll do the same for nl0223:15
pabelangernl03 also only has 53MB free, so getting close to swapping there23:15
pabelangerbut still running23:15
pabelangersorry 59M23:16
corvus22188 Thu Dec  6 07:57:36 2018 /usr/bin/ruby /usr/bin/puppet apply /opt/system-config/production/manifests/site.pp --logdest syslog --environment prod23:17
corvusftr on nl02 ^23:17
clarkbwas puppet a big memory consumer too?23:17
pabelangerclarkb: it doesn't look like it23:17
pabelangerjust cpu23:17
clarkbzuul-executor memory use increased around that same time too fwiw23:18
clarkbI have no data suggesting that these are related but they could be23:18
corvuslet's restart nl03 pre-emptively23:18
pabelanger++23:19
corvus#status log restarted nodepool-launcher on nl01 and nl02 due to OOM; restarted n-l on nl03 due to limited memory23:20
openstackstatuscorvus: finished logging23:20
pabelangerend of oct 2018 we moved nodepool to new zk cluster23:21
*** hwoarang has quit IRC23:22
*** ijw has quit IRC23:23
*** yamamoto has joined #openstack-infra23:23
*** rcernin has joined #openstack-infra23:24
*** hwoarang has joined #openstack-infra23:27
*** yamamoto has quit IRC23:35
*** bobh has joined #openstack-infra23:40
*** hwoarang has quit IRC23:44
openstackgerritIan Wienand proposed openstack-infra/system-config master: [wip] letsencrypt update idea  https://review.openstack.org/65280123:44
corvusclarkb: i have observed this behavior: https://github.com/docker/distribution/issues/180323:45
corvusthat means if we expect to be able to push a manifest with the same sha, we need to restart the registry23:45
*** hwoarang has joined #openstack-infra23:45
clarkbouch23:45
corvuswe *might* be able to ignore that in our case, since every image we push should be brand new23:45
corvus(even on a recheck, we should get a different creation time, which is part of the config layer, and therefore in the manifest)23:46
corvusso maybe we ignore that and roll with it.  it will, however, make my testing harder :)23:46
corvusi'll need to make sure the same restriction doesn't apply to blobs themselves though (since they will be reused)23:47
clarkbhrm23:48
corvusi need to eod, i'll have to pick this up tomorrow23:49
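The "different creation time means a different manifest" point can be illustrated with a toy digest calculation; this is not the real image config format or Docker's exact serialization, just the shape of the dependency:

    import hashlib
    import json

    def digest(obj):
        # Stand-in for Docker's content digest: sha256 over the blob bytes.
        return 'sha256:' + hashlib.sha256(
            json.dumps(obj, sort_keys=True).encode()).hexdigest()

    config_a = {'architecture': 'amd64', 'created': '2019-04-15T22:00:00Z'}
    config_b = {'architecture': 'amd64', 'created': '2019-04-15T23:00:00Z'}

    # Identical layers, but the config digest differs, so the manifest (which
    # references the config by digest) and its digest differ too -- a re-push
    # after a recheck never reuses a previously deleted manifest digest.
    for config in (config_a, config_b):
        manifest = {'schemaVersion': 2,
                    'config': {'digest': digest(config)},
                    'layers': []}
        print(digest(manifest))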
clarkbthe fix for zuul should be merging soon23:51
clarkbI should be able to restart zuul for that. I think we only need the scheduler to be restarted23:51
clarkbhrm no I think py35 just failed23:52
pabelangerclarkb: I think it might be slowness of testing23:52
pabelangerso far, I see just timeouts23:52
clarkbya it's waiting for threads to close it looks like23:53
pabelangerand lost of zk connection23:53
pabelangerloss*23:53
clarkbI'll recheck23:54
clarkbpabelanger: ok ya lots of NoNodeErrors from zk23:57
*** sthussey has quit IRC23:57
clarkbmordred: btw it is rview that prevents login to review.o.o on mobile23:59
clarkbI don't understand why but I figure having a working browser is more flexible than rview so I'd rather have that23:59
