Thursday, 2020-07-09

*** Dmitrii-Sh has quit IRC		00:02
*** ryohayakawa has joined #opendev		00:04
*** Dmitrii-Sh has joined #opendev		00:07
*** rh-jelabarre has quit IRC		00:43
*** rh-jelabarre has joined #opendev		00:44
*** rh-jelabarre has quit IRC		00:48
*** gouthamr_ has quit IRC		00:59
*** gouthamr_ has joined #opendev		01:05
*** diablo_rojo has quit IRC		01:12
*** gouthamr_ has quit IRC		01:25
*** gouthamr_ has joined #opendev		01:37
mnaser	has there been any issues with promotion by any chance	02:49
mnaser	https://hub.docker.com/r/vexxhost/magnum-api/tags for example has no promotion since merging	02:50
* mnaser double checks job config...		02:51
mnaser	`openstack-operator:images:promote:magnum` last ran 2020-06-22 19:28:29	02:52
*** gouthamr_ has quit IRC		02:52
mnaser	https://zuul.opendev.org/t/vexxhost/builds?change=739774%2C7	02:52
mnaser	looks like the promote never got enqueued.. even for docs, this was 5 changes that merged all at once	02:52
*** gouthamr_ has joined #opendev		02:56
*** gouthamr_ has quit IRC		03:26
*** gouthamr_ has joined #opendev		03:32
*** gouthamr_ has quit IRC		04:00
*** gouthamr_ has joined #opendev		04:11
*** gouthamr_ has quit IRC		04:29
*** gouthamr_ has joined #opendev		04:30
*** gouthamr_ has quit IRC		04:35
*** gouthamr_ has joined #opendev		04:40
*** ysandeep\|away is now known as ysandeep\|rover		05:03
*** marios has joined #opendev		05:39
*** gouthamr_ has quit IRC		06:50
*** gouthamr_ has joined #opendev		06:52
*** knikolla has quit IRC		06:54
*** knikolla has joined #opendev		06:56
*** gouthamr_ has quit IRC		07:04
*** gouthamr_ has joined #opendev		07:05
*** gouthamr_ has quit IRC		07:15
*** gouthamr_ has joined #opendev		07:22
*** zbr7 is now known as zbr\|ruck		07:23
*** hashar has joined #opendev		07:45
*** gouthamr_ has quit IRC		07:46
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: Update synchronize-repos https://review.opendev.org/740110	07:47
*** ysandeep\|rover is now known as ysandeep\|lunch		07:52
*** gouthamr_ has joined #opendev		07:52
*** ysandeep\|lunch is now known as ysandeep		07:53
*** fressi has quit IRC		07:53
*** ysandeep is now known as ysandeep\|lunch		07:53
*** sshnaidm has joined #opendev		07:55
*** gouthamr_ has quit IRC		07:57
*** fressi has joined #opendev		08:01
*** moppy has quit IRC		08:01
*** moppy has joined #opendev		08:01
*** gmann has quit IRC		08:03
*** gmann has joined #opendev		08:06
*** gouthamr_ has joined #opendev		08:09
*** iurygregory has quit IRC		08:10
*** jaicaa has quit IRC		08:17
openstackgerrit	Jeffrey Zhang proposed openstack/diskimage-builder master: support non-x86_64 centos7 to change DIB_DISTRIBUTION_MIRROR variable https://review.opendev.org/740183	08:18
*** jaicaa has joined #opendev		08:20
*** iurygregory has joined #opendev		08:24
*** tobiash has quit IRC		08:26
*** ysandeep\|lunch is now known as ysandeep\|rover		08:34
*** DSpider has joined #opendev		08:36
*** tobiash has joined #opendev		09:03
*** tosky has joined #opendev		09:10
openstackgerrit	Thierry Carrez proposed openstack/project-config master: maintain-github-mirror: fix dependency name https://review.opendev.org/740199	09:17
*** dtantsur\|afk is now known as dtantsur		09:29
*** ryohayakawa has quit IRC		09:47
openstackgerrit	Dmitry Tantsur proposed openstack/project-config master: Account for ironic bugfix branches https://review.opendev.org/740212	09:49
*** roman_g has joined #opendev		09:54
*** roman_g has quit IRC		10:14
*** fressi has quit IRC		10:16
*** roman_g has joined #opendev		10:16
openstackgerrit	Dmitry Tantsur proposed openstack/project-config master: Account for ironic bugfix branches https://review.opendev.org/740212	10:26
*** hashar has quit IRC		10:27
*** ysandeep\|rover is now known as ysandeep\|afk		10:42
*** roman_g has quit IRC		10:44
*** ysandeep\|afk is now known as ysandeep\|rover		10:58
*** tkajinam has quit IRC		11:02
noonedeadpunk	fungi: so returning to the multinode issue - I localized the issue, but not sure why it is raised.. So I got these iptables rules on hold nodes http://paste.openstack.org/show/795701/	11:41
noonedeadpunk	and I think what happens, is that they don't allow ovs communication. as once I remove last reject rule everything becomes good	11:42
noonedeadpunk	but according to them, icmp and things supposed to be good...	11:43
noonedeadpunk	ok, so they're created with multinode-bridge role for some reason https://zuul.opendev.org/t/vexxhost/build/88fa15b6f7b4438eb1555f519aace349/log/job-output.txt#521	11:47
frickler	noonedeadpunk: this has the primary node as both switch and peers, but I think it should be only switch https://9f0840c51d5746d19be2-4a01e595aa1b522e94de715b2ff31aa2.ssl.cf1.rackcdn.com/739717/9/check/ffrouting-deploy/ebd87dd/zuul-info/inventory.yaml	11:55
frickler	c.f. https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/multi-node-bridge	11:55
noonedeadpunk	frickler: but I have same rules on the secondary as well	11:56
noonedeadpunk	so they don't depend on the switch only	11:56
*** rh-jelabarre has joined #opendev		11:57
frickler	noonedeadpunk: maybe the issue isn't firewall but some other setup. c.f. also this for a working setup https://7407bee42906727a6720-dc7e48b48408cf00a200e97d0ee2c855.ssl.cf5.rackcdn.com/734621/6/check/devstack-multinode/111424e/zuul-info/inventory.yaml	11:57
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: Use ansible_ssh_private_key_file when setting up windows https://review.opendev.org/740254	12:02
*** bhagyashris is now known as bhagyashris\|brb		12:04
noonedeadpunk	frickler: so I added clear-firewall role after multi-node-bridge and test pass https://review.opendev.org/#/c/739717/12..13/tests/test.yml	12:04
*** zbr_ has quit IRC		12:05
frickler	infra-root: reqs tox is failing, I cannot reproduce locally and there is no change in the tox freeze, so likely something that changed on our nodes, see e.g. https://zuul.opendev.org/t/openstack/build/1b1159196b0b4b91af987ceece5f22a3	12:05
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: Enforce BatchMode when setting up ssh for windows https://review.opendev.org/740254	12:08
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: Enforce BatchMode when setting up ssh for windows https://review.opendev.org/740254	12:10
frickler	noonedeadpunk: oh, actually do you include this role? https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/multi-node-firewall	12:20
noonedeadpunk	frickler: nope, I didn't. But it's included by default without any option to omit it	12:21
noonedeadpunk	oh, wait	12:21
noonedeadpunk	I mixed up with persistent-firewall one	12:21
noonedeadpunk	that's what the playbook looked like https://review.opendev.org/#/c/739717/12/tests/test.yml	12:22
noonedeadpunk	btw, can you release hold for that job?	12:22
*** ysandeep\|rover is now known as ysandeep\|afk		12:37
*** bhagyashris\|brb is now known as bhagyashris		12:37
frickler	noonedeadpunk: so I guess you might want most of the roles here, I'm not sure in detail how they might interact https://opendev.org/zuul/zuul-jobs/src/branch/master/playbooks/multinode/pre.yaml	12:39
noonedeadpunk	yeah, btw I saw that. Eventually firewall cleanup works for me and I guess we don't want it to interfere.	12:40
noonedeadpunk	So thanks)	12:40
frickler	noonedeadpunk: if you completely clean the firewall, you need to make sure that you don't have any exploitable services running like recursive DNS resolvers	12:45
noonedeadpunk	if they're not pre-installed with the image only...	12:47
openstackgerrit	Merged zuul/zuul-jobs master: Enforce BatchMode when setting up ssh for windows https://review.opendev.org/740254	12:47
noonedeadpunk	but we have bgp running as part of frr (and maybe ospf as well). but not sure how exploitable that is...	12:47
noonedeadpunk	considering the job time is like 5 mins max	12:48
frickler	noonedeadpunk: not sure about bgpd, we also have ntpd running by default, not sure about its config, maybe some other infra-root has some better grasp on this	12:53
frickler	deleted the held nodes, too, thx	12:53
openstackgerrit	Merged openstack/project-config master: maintain-github-mirror: fix dependency name https://review.opendev.org/740199	12:55
openstackgerrit	Merged openstack/project-config master: Publish Airship governance https://review.opendev.org/739790	12:55
openstackgerrit	Merged openstack/project-config master: add vexxhost/openstack-tools https://review.opendev.org/739627	12:57
*** ysandeep\|afk is now known as ysandeep		12:58
openstackgerrit	Merged zuul/zuul-jobs master: emit-job-header: add inventory hostname https://review.opendev.org/738963	13:02
frickler	oh, that reqs issue is another setuptools thing it seems. passes with <=48, fails with the 49.2.0 we seem to install	13:03
frickler	prometheanfire: smcginnis: ^^	13:03
mordred	frickler: :(	13:05
smcginnis	Oh joy.	13:07
smcginnis	frickler: Any idea if there is a setuptools bug filed?	13:11
frickler	smcginnis: I haven't found one, feel free to do so	13:18
*** roman_g has joined #opendev		13:18
frickler	according to git bisect, this seems to be the first broken commit, but it gives a different failure https://github.com/pypa/setuptools/commit/78d2a3bfafd38112dc3c486cd478e4cee1f782ec	13:25
frickler	smcginnis: ah, got it, we need to change the expected exception	13:29
frickler	something like http://paste.openstack.org/show/795705/ , but not sure yet how to make it work with the old one, too.	13:30
*** mtreinish has quit IRC		13:33
*** mtreinish has joined #opendev		13:33
mordred	corvus: the container job did not fail yesterday - I rechecked it again this morning	13:41
smcginnis	frickler: Maybe instead of a with block, we can just put it in a try/except and catch either one?	13:53
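[Editor's note] smcginnis's suggestion of a try/except that accepts either exception could look roughly like the sketch below. The exception classes and the function under test are hypothetical stand-ins, since the real ones are only in the linked paste; this is an illustration of the pattern, not the fix that was actually proposed.

```python
class OldError(Exception):
    """Hypothetical stand-in for the exception raised with setuptools <= 48."""


class NewError(Exception):
    """Hypothetical stand-in for the exception raised with newer setuptools."""


def parse(use_new):
    """Hypothetical function under test: always raises one of the two errors."""
    if use_new:
        raise NewError("bad requirement")
    raise OldError("bad requirement")


def raises_either(func, *args):
    """Return True if func raises OldError or NewError, False if it returns."""
    try:
        func(*args)
    except (OldError, NewError):
        # Either exception type counts as the expected failure.
        return True
    return False
```

If the test keeps its `with` block, `unittest`'s `assertRaises` also accepts a tuple of exception classes, which gives the same effect.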
fungi	mnaser: i'll take a look in the scheduler logs and see if i can work out why it wasn't enqueued	14:10
fungi	noonedeadpunk: does ovs use any sort of encapsulation? i'm not too familiar with it. if so, though, it's possible the encapsulated traffic (whatever form that takes) might be getting blocked on the node's real network interfaces	14:13
fungi	we'd really prefer if you didn't run those nodes with no packet filtering. it's a good safety net in case a job happens to add exploitable services which get caught up in a global reflection ddos or something (we get reports about that pretty regularly, unfortunately, and have to hunt down who turned off firewalling on a job and exposed a socks5 proxy or a recursive dns resolver or a memcached or...)	14:15
fungi	one way to check would be to put the default iptables rules back but modify the default block rule to log any hits, then reproduce your ping test and see what actually got blocked by iptables	14:18
*** qchris has joined #opendev		14:22
frickler	fungi: well I'm pretty sure the issue is not running the multi-node-firewall role	14:27
fungi	frickler: ahh, so it was actually blocking the vxlan packets then probably	14:32
*** roman_g has quit IRC		14:33
*** roman_g has joined #opendev		14:44
*** mlavalle has joined #opendev		14:50
*** bhagyashris is now known as bhagyashris\|dinn		14:52
sgw1	morning folks. can someone help me with getting permissions to push branches to the starlingx/manifest repo, I need to create our r/stx.4.0 release branch	14:56
AJaeger	sgw1: https://review.opendev.org/#/admin/projects/starlingx/manifest,access shows that starlingx-release has these permissions	14:57
AJaeger	sgw1: you're part of that team, aren't you?	14:57
AJaeger	sgw1: oh, wait - branch creation.	14:58
AJaeger	sgw1: that needs an ACL update AFAIK	14:58
*** mnasiadka has joined #opendev		14:58
AJaeger	sgw1: https://docs.opendev.org/opendev/infra-manual/latest/creators.html#creation-of-branches explains what's needed	14:59
yoctozepto	thanks fungi, my spell checker must have kicked in there	14:59
AJaeger	sgw1: so, please send a change to openstack/project-config and update the ACL	14:59
sgw1	Ok, I guess I should have checked there first, I had it for all the other repos	15:00
sgw1	AJaeger: thanks, I guess I had permission after all! Sorry for the noise	15:06
fungi	yoctozepto: no worries, i've heard other folks in the community actually call it "jitsu" too, so i thought there might be some general confusion over what the project is named. just trying to correct it whenever i run across that	15:16
fungi	mnaser: so according to the scheduler's debug log 739774,7 was removed from the promote pipeline when another change for the same project merged after it. probably better if i check the most recent change merged for that repo	15:19
mnaser	fungi: right, but my 'promote image' doesnt happen every single time, so it shouldn't have been dequeued	15:20
fungi	when changes merge in a batch zuul is generally only going to run jobs for the first and last change when using the supercedent pipeline manager	15:20
mnaser	i have different images being promoted if the image was changed or not	15:20
mnaser	ah -- i wonder if that means i have to build all the images, all the time :X	15:20
fungi	if you want jobs to run for every single change, the supercedent isn't what you want	15:21
fungi	it assumes all jobs run for all changes	15:21
fungi	definitely avoid things like file filters on jobs for supercedent pipelines or you'll get inconsistent results	15:21
mnaser	that would really increase the # of jobs running by a lot i guess ;(	15:22
fungi	mnaser: one option would be to switch your promote pipeline to dependent instead of supercedent	15:22
mnaser	fungi: right, but i think we're using the opendev promote pipeline config	15:22
fungi	or create a different dependent pipeline for those jobs	15:22
fungi	oh, i thought this was in the vexxhost tenant	15:23
fungi	zuul.Pipeline.vexxhost.promote according to the scheduler	15:23
fungi	so not in the opendev tenant	15:23
corvus	mnaser, fungi: you may want the serial pipeline manager for this: https://zuul-ci.org/docs/zuul/reference/pipeline_def.html#value-pipeline.manager.serial	15:24
corvus	(ie, what we use for our deploy pipeline)	15:24
fungi	ahh, yeah, if you need to make sure only one runs at a time	15:24
fungi	but supercedent pipeline manager and file filters are two different (and basically incompatible) mechanisms for reducing the number of jobs you run	15:25
fungi	so i would pick one or the other and not attempt to combine them	15:26
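[Editor's note] A toy model of the behaviour fungi describes for the supercedent pipeline manager, where a batch of merged changes results in jobs for only the first and last change. It assumes every later change arrives while the first is still running. This is a simplified sketch for illustration, not Zuul's implementation, and `supercedent_order` is a made-up name.

```python
def supercedent_order(merged_changes):
    """Return the changes whose jobs would run under the simplified model.

    The first change starts running immediately. Each later change replaces
    whichever change was still waiting, so only the newest one survives.
    """
    if not merged_changes:
        return []
    running = merged_changes[0]
    waiting = None
    for change in merged_changes[1:]:
        waiting = change  # the newer change supersedes the one still waiting
    if waiting is None:
        return [running]
    return [running, waiting]
```

Under this model, a job with a file filter that only matches one of the middle changes never gets a chance to run, which matches the missing promote job mnaser reported.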
corvus	15:27 < mordred> corvus: well - if we're not going to be waiting on subchecks now - it makes me want to consider just doing 2.13->3.2 over a weekend	15:30
mordred	corvus: so - if we're not likely going to be moving anything to checks or hoping subchecks lands, I'm kind of wondering if we should consider a flag-day weekend upgrade all the way to 3.2	15:30
corvus	mordred: were we thinking we wanted to stop at 2.16 and make sure we have all the ci stuff sorted out?	15:30
*** ysandeep is now known as ysandeep\|away		15:30
corvus	we'll still need some kind of hideci thing, right?	15:31
mordred	infra-root: email on gerrit mailing list about the google folks abandoning the checks plugin. scrollback in #zuul talks about some things related - including a 3.x frontend plugin wikimedia is using similar to our current ci results table	15:31
mordred	corvus: yeah - but maybe we can do that in the similar structure to what wikimedia has in their plugin until the new labels stuff is there?	15:32
mordred	or maybe we'll determine that we do need to stop at 2.16 to wait - I mean, we still need to run an upgrade test to see	15:32
corvus	mordred: well, i'm unclear on what we actually have prepared for 3.x -- do we have that functionality ready at all?	15:32
mordred	nope. we have nothing for 3.x currently	15:33
mordred	I'm also unclear as to how compatible the different versions of polymer are - so will a polymer plugin for 2.16/3.0 be the same as a polymer plugin for 3.2 - or will we need to write the same thing twice	15:34
corvus	mordred: then i think i agree with you that if our main hesitation for proceeding past 2.16 was that we needed a hideci solution and were hoping for it to be checks, that that is now changed, and we ought to be able to proceed faster, but we probably do still need a hideci solution. we probably don't need to stop at 2.16 for it though, if we come up with something that works for 3.2 in our testing, i	15:35
corvus	think we can upgrade with confidence.	15:35
mordred	++	15:35
mordred	agree	15:35
corvus	mordred: so i think the only remaining reason to stop at 2.16 would be luca's suggestion about acclimating folks. but i bet folks can muddle through. :)	15:35
mordred	and I think that's likely to be the "best" path forward - we can develop (and by develop, I mean steal the wikimedia's already developed thing) for 3.2	15:36
fungi	so is it that the gerrit community is officially abandoning checks, or just discussing whether they should?	15:36
mordred	corvus: yeah	15:36
corvus	fungi: i think it's that the google folks (who are 99% of the contributors to checks) are abandoning it	15:36
mordred	corvus: fwiw - the wikimedia polymer plugin is WAY more readable and is using an actual API to get messages	15:36
mordred	fungi: I think it's effectively official	15:36
mordred	since it was the google folks who were staffing it	15:36
fungi	got it. disappointing reversal of direction on their part	15:36
mordred	yeah	15:37
corvus	i think anyone is welcome to continue working on it if they want, but that leaves about 0 people, plus or minus two.	15:37
mordred	yeah. but ... who knows, maybe it's for the best because subchecks was still quite a ways off from being a thing, and we can likely adapt https://github.com/wikimedia/puppet/blob/production/modules/gerrit/files/homedir/review_site/static/gerrit-theme.html#L195 for our use reasonably quickly?	15:37
*** sgw1 has quit IRC		15:44
*** sgw1 has joined #opendev		15:44
*** bhagyashris\|dinn is now known as bhagyashris		15:49
*** roman_g has quit IRC		15:50
*** diablo_rojo has joined #opendev		15:51
fungi	infra-root: while the crawler-induced load we've been seeing from non-firewalled sources seems to be continuing, the volume has slowly trailed off, so we might be safe to lift the china unicom drop rules on the lb now without yet pulling the trigger on the apache proxy ua filter work: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=66611&rra_id=all	15:57
*** marios has quit IRC		16:00
yoctozepto	fungi: meetpad did very fine for us	16:08
fungi	yoctozepto: great!	16:09
fungi	ironic is apparently using it for some get-togethers as well	16:09
fungi	as is the osd d&i wg	16:09
fungi	er, osf d&i wg	16:09
fungi	glad to see it's continuing to be useful outside the ptg context	16:10
yoctozepto	fungi: that's nice to know!	16:12
yoctozepto	fungi: what is d&i?	16:12
fungi	diversity and inclusion	16:21
fungi	one of the osf board of directors chartered working groups	16:21
*** mnasiadka_ has joined #opendev		16:48
*** jbryce_ has joined #opendev		16:48
*** mwhahaha_ has joined #opendev		16:48
*** mwhahaha has quit IRC		16:55
*** mnasiadka has quit IRC		16:55
*** jbryce has quit IRC		16:55
*** rm_work has quit IRC		16:55
*** mordred has quit IRC		16:55
*** jbryce_ is now known as jbryce		16:55
*** mwhahaha_ is now known as mwhahaha		16:55
*** mnasiadka_ is now known as mnasiadka		16:55
*** Eighth_Doctor has quit IRC		16:56
*** rm_work has joined #opendev		16:58
*** Eighth_Doctor has joined #opendev		17:03
*** sshnaidm is now known as sshnaidm\|afk		17:12
fungi	infra-root: based on the graph i linked a little while ago, i propose we revert the temporary iptables rules we were using to block traffic from china unicom. i'm available to keep an eye on things for the next ~6 hours still. any objections?	17:14
*** mordred has joined #opendev		17:31
*** hashar has joined #opendev		17:35
*** dtantsur is now known as dtantsur\|afk		17:45
corvus	fungi: no objections	17:59
fungi	it looks like we can undo them with `sudo systemctl restart netfilter-persistent`	18:00
fungi	and if we need to put them back, it's:	18:01
fungi	for X in $(cat ~clarkb/china_unicom_ranges) ; do echo $X ; sudo iptables -I openstack-INPUT -j DROP -s $X ; done	18:01
*** qchris has quit IRC		18:09
*** qchris has joined #opendev		18:22
fungi	okay, i gave it an hour, no objections (thanks corvus for at least not letting me worry i was talking into a black hole). i'll proceed with the netfilter-persistent restart on gitea-lb01.opendev.org	18:22
fungi	and done. continuing to keep an eye on this graph since that's where we saw the first signs of impact (we were maxing out the connection limit): http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=66611&rra_id=all	18:24
fungi	later today i'll do a status update and follow up to the service-announce thread if we haven't needed to take other action yet	18:24
*** avass has joined #opendev		18:26
openstackgerrit	Merged openstack/project-config master: Account for ironic bugfix branches https://review.opendev.org/740212	18:33
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: add-build-sshkey: Ensure .ssh exists, enable admin authorized_keys https://review.opendev.org/740350	19:02
fungi	established tcp connection count has increased slightly since the rules were removed, but not substantially	19:04
openstackgerrit	Albin Vass proposed zuul/zuul-jobs master: add-build-sshkey: Ensure .ssh exists, enable admin authorized_keys https://review.opendev.org/740350	19:05
*** tosky has quit IRC		19:26
smcginnis	If anyone has a minute, I'd love to know if this is on the right track: https://review.opendev.org/#/c/739272/	20:18
fungi	i'm not sure either or i'd have weighed in... it's switching from distro-packaged pip to pypi-server pip to work around the fact that the ensure-pip role doesn't have a mechanism for installing multiple versions of distro-packaged pip when there are multiple distro-packaged python interpreters present	20:26
smcginnis	fungi: Maybe I don't understand, but it's not switching from distro to pypi. It should be installing multiple distro-packaged packages based on the versions of python requested.	20:28
smcginnis	At least that's what I thought I was doing, but definitely a possibility that I don't really know what I'm doing. :)	20:28
fungi	oh, did we improve ensure-pip to cover that case?	20:28
smcginnis	That's what I'm trying to do.	20:28
fungi	oh! this is a different change than i was thinking of	20:29
smcginnis	https://review.opendev.org/#/c/739272/2/roles/ensure-pip/tasks/Debian.yaml should be looping through and trying to apt install each specific version.	20:29
fungi	this one is for zuul-jobs. maybe better to discuss in #zuul	20:29
smcginnis	Hmm, looks like I forgot to rejoin after my last reboot.	20:30
fungi	the change i was thinking of was where you were switching the job to use the feature of the from-pypi method which can install multiple versions of pip	20:30
fungi	anyway, getting ready to cook dinner, but can take a look afterward	20:30
*** dirk has quit IRC		20:42
tbarron	so gouthamr and I have been banging our heads trying to debug an issue he reported the other day	20:47
tbarron	with one of our jobs where it reboots -- only on rax nodes apparently	20:47
tbarron	http://eavesdrop.openstack.org/irclogs/%23opendev/%23opendev.2020-07-07.log.html#t2020-07-07T23:37:24	20:48
tbarron	http://eavesdrop.openstack.org/irclogs/%23opendev/%23opendev.2020-07-08.log.html#t2020-07-08T00:29:46	20:48
tbarron	I'm wondering if we can get some tips on how to instrument the job effectively.	20:48
*** bolg has quit IRC		20:48
tbarron	syslog doesn't usually have anything relevant	20:49
tbarron	journal seems to get lost before the reboot	20:49
tbarron	one exception on syslog. once we saw this:	20:49
tbarron	https://zuul.opendev.org/t/openstack/build/1bc61a82ace14217807f28c5b8c9debe/log/controller/logs/syslog.txt#8463	20:49
tbarron	But I don't know if it's a fluke or a real clue.	20:50
tbarron	That general protection fault was right before the reboot and pertained to ipv6 packet filtering iiuc	20:51
tbarron	by "journal gets lost" i mean we are lacking entries until after the reboot	20:53
openstackgerrit	Sean McGinnis proposed zuul/zuul-jobs master: Install venv for all platforms in ensure-pip https://review.opendev.org/739272	20:57
openstackgerrit	Sean McGinnis proposed zuul/zuul-jobs master: Install venv for all platforms in ensure-pip https://review.opendev.org/739272	20:58
*** dirk has joined #opendev		21:01
fungi	tbarron: using any fancy vm features? rackspace uses xen while all our other providers (as far as i know) are kvm	21:02
openstackgerrit	Sean McGinnis proposed zuul/zuul-jobs master: Install venv for all platforms in ensure-pip https://review.opendev.org/739272	21:02
fungi	another difference is that rackspace nodes have something like 20gb in their rootfs and then an ephemeral disk you can format and mount (devstack mounts it on /opt for example), could it be running out of disk space?	21:03
tbarron	fungi: I don't think so. This job does a bunch of lvm commands and uses kernel nfs and uses neutron dynamic routing and quagga etc. to advertise self-service ipv6 tenant networks so it's a bit "fancy"	21:05
tbarron	fungi: i see 40G root partition on rax vs 80 on, say, vexxhost so i was suspecting that the root disk might have filled but	21:05
fungi	ahh, 40gb, that sounds right yeah	21:06
tbarron	fungi: i instrumented a periodic task (share-service reporting pool usage to the scheduler) to log 'df -h' (and 'free' for that matter) and	21:06
tbarron	there was lots of disk space and free memory right before the reboot	21:07
tbarron	'free' output pretty much matched what dstat was showing	21:07
tbarron	the lvm action is on /opt/stack/manila/data and there was plenty of disk free there too	21:08
fungi	does the job continue after the reboot? do we collect syslog from it and did that write anything?	21:10
tbarron	we collect syslog before and after the boot but in general it doesn't have anything interesting except 'reboot'	21:11
fungi	i think we also have the option to set up a remote system console stream, i forget the kernel module for that, and then capture that in case it still has working networking and is able to send a copy of the panic	21:11
tbarron	the one exception was that general protection fault at https://zuul.opendev.org/t/openstack/build/1bc61a82ace14217807f28c5b8c9debe/log/controller/logs/syslog.txt#8463	21:11
*** dviroel has joined #opendev		21:12
tbarron	fungi: that console stream thing sounds useful, I guess equivalent of 'openstack console log ...'	21:13
tbarron	fungi: I don't know why the journal is missing the needed entries unless perhaps the file system it writes to is getting corrupted	21:14
fungi	or the reboot happens before fd writes can be flushed and the fs synced	21:14
tbarron	fungi: but the syslog after reboot isn't showing some big fsck/recovery	21:15
tbarron	so yeah, flush/sync may be more likely	21:15
fungi	ahh, the lkm i was thinking of is netconsole	21:15
fungi	and yeah it's basically kernel console redirection to a tcp socket	21:15
fungi	er, no udp datagrams i guess	21:16
fungi	so you set up a netconsole listener, then load that kernel module with appropriate parameters telling it the destination address of the netconsole stream client	21:18
fungi	and then on the client side you can basically just capture the udp stream and do whatever you like with it (write it to a file, et cetera)	21:19
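[Editor's note] The receiving side fungi describes (capture the UDP stream and write it to a file) can be sketched in a few lines. The function names and the file path are made up for illustration, and port 6666 is an assumption to be checked against the netconsole parameters actually passed to the kernel module on the job node.

```python
import socket


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket for incoming netconsole datagrams (port is assumed)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def capture_datagrams(sock, out_path, max_datagrams):
    """Append up to max_datagrams received UDP payloads to out_path."""
    with open(out_path, "ab") as out:
        for _ in range(max_datagrams):
            data, _addr = sock.recvfrom(65535)
            out.write(data)
            # Flush after every datagram so a panic message is on disk
            # even if the capture is interrupted right afterwards.
            out.flush()
```

A capture session might then be `capture_datagrams(open_listener(), "console.log", 10000)`, run on a host with a public address whose firewall lets the chosen UDP port through.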
tbarron	so we can run the client from wherever (e.g. my notebook) but need to run the modprobe cmd as part of the devstack setup?	21:20
tbarron	s/my notebook/some target with a public ip/ - and need firewalls to allow target and dest ports	21:26
fungi	yep, the default iptables ruleset on the job nodes should allow all egress just fine, but your client side firewall/nat would need to make sure whatever destination udp port you choose is allowed through and goes to the right place	21:29
tbarron	fungi: thanks for the idea, i'll try to get it working locally first, then in dsvm job	21:31
fungi	Documentation/networking/netconsole.rst in the linux kernel docs, according to Documentation/networking/netconsole.rst	21:31
fungi	er, according to https://www.kernel.org/doc/html/latest/admin-guide/serial-console.html	21:32
fungi	aha, here https://www.kernel.org/doc/html/latest/networking/netconsole.html	21:32
fungi	tbarron: that ^ should be pretty easy to follow, but let us know if you run into trouble getting it to work	21:33
tbarron	fungi: will do, and thanks again	21:33
fungi	my pleasure, as always	21:34
*** avass has quit IRC		21:53
*** hashar has quit IRC		22:13
fungi	#status log The connection flood from AS4837 (China Unicom) has lessened in recent days, so we have removed its temporary access restriction for the Git service at opendev.org as of 18:24 UTC today.	22:15
openstackstatus	fungi: finished logging	22:15
*** rh-jelabarre has quit IRC		22:30
*** DSpider has quit IRC		22:43
*** hrw has quit IRC		22:43
*** hrw has joined #opendev		22:53
*** tkajinam has joined #opendev		23:00
*** mlavalle has quit IRC		23:06
