Thursday, 2020-07-09

00:02 *** Dmitrii-Sh has quit IRC
00:04 *** ryohayakawa has joined #opendev
00:07 *** Dmitrii-Sh has joined #opendev
00:43 *** rh-jelabarre has quit IRC
00:44 *** rh-jelabarre has joined #opendev
00:48 *** rh-jelabarre has quit IRC
00:59 *** gouthamr_ has quit IRC
01:05 *** gouthamr_ has joined #opendev
01:12 *** diablo_rojo has quit IRC
01:25 *** gouthamr_ has quit IRC
01:37 *** gouthamr_ has joined #opendev
02:49 <mnaser> has there been any issues with promotion by any chance
02:50 <mnaser> https://hub.docker.com/r/vexxhost/magnum-api/tags for example has no promotion since merging
02:51 * mnaser double checks job config...
02:52 <mnaser> `openstack-operator:images:promote:magnum` last ran 2020-06-22 19:28:29
02:52 *** gouthamr_ has quit IRC
02:52 <mnaser> https://zuul.opendev.org/t/vexxhost/builds?change=739774%2C7
02:52 <mnaser> looks like the promote never got enqueued.. even for docs, this was 5 changes that merged all at once
02:56 *** gouthamr_ has joined #opendev
03:26 *** gouthamr_ has quit IRC
03:32 *** gouthamr_ has joined #opendev
04:00 *** gouthamr_ has quit IRC
04:11 *** gouthamr_ has joined #opendev
04:29 *** gouthamr_ has quit IRC
04:30 *** gouthamr_ has joined #opendev
04:35 *** gouthamr_ has quit IRC
04:40 *** gouthamr_ has joined #opendev
05:03 *** ysandeep|away is now known as ysandeep|rover
05:39 *** marios has joined #opendev
06:50 *** gouthamr_ has quit IRC
06:52 *** gouthamr_ has joined #opendev
06:54 *** knikolla has quit IRC
06:56 *** knikolla has joined #opendev
07:04 *** gouthamr_ has quit IRC
07:05 *** gouthamr_ has joined #opendev
07:15 *** gouthamr_ has quit IRC
07:22 *** gouthamr_ has joined #opendev
07:23 *** zbr7 is now known as zbr|ruck
07:45 *** hashar has joined #opendev
07:46 *** gouthamr_ has quit IRC
07:47 <openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Update synchronize-repos  https://review.opendev.org/740110
07:52 *** ysandeep|rover is now known as ysandeep|lunch
07:52 *** gouthamr_ has joined #opendev
07:53 *** ysandeep|lunch is now known as ysandeep
07:53 *** fressi has quit IRC
07:53 *** ysandeep is now known as ysandeep|lunch
07:55 *** sshnaidm has joined #opendev
07:57 *** gouthamr_ has quit IRC
08:01 *** fressi has joined #opendev
08:01 *** moppy has quit IRC
08:01 *** moppy has joined #opendev
08:03 *** gmann has quit IRC
08:06 *** gmann has joined #opendev
08:09 *** gouthamr_ has joined #opendev
08:10 *** iurygregory has quit IRC
08:17 *** jaicaa has quit IRC
08:18 <openstackgerrit> Jeffrey Zhang proposed openstack/diskimage-builder master: support non-x86_64 centos7 to change DIB_DISTRIBUTION_MIRROR variable  https://review.opendev.org/740183
08:20 *** jaicaa has joined #opendev
08:24 *** iurygregory has joined #opendev
08:26 *** tobiash has quit IRC
08:34 *** ysandeep|lunch is now known as ysandeep|rover
08:36 *** DSpider has joined #opendev
09:03 *** tobiash has joined #opendev
09:10 *** tosky has joined #opendev
09:17 <openstackgerrit> Thierry Carrez proposed openstack/project-config master: maintain-github-mirror: fix dependency name  https://review.opendev.org/740199
09:29 *** dtantsur|afk is now known as dtantsur
09:47 *** ryohayakawa has quit IRC
09:49 <openstackgerrit> Dmitry Tantsur proposed openstack/project-config master: Account for ironic bugfix branches  https://review.opendev.org/740212
09:54 *** roman_g has joined #opendev
10:14 *** roman_g has quit IRC
10:16 *** fressi has quit IRC
10:16 *** roman_g has joined #opendev
10:26 <openstackgerrit> Dmitry Tantsur proposed openstack/project-config master: Account for ironic bugfix branches  https://review.opendev.org/740212
10:27 *** hashar has quit IRC
10:42 *** ysandeep|rover is now known as ysandeep|afk
10:44 *** roman_g has quit IRC
10:58 *** ysandeep|afk is now known as ysandeep|rover
11:02 *** tkajinam has quit IRC
11:41 <noonedeadpunk> fungi: so returning to the multinode issue - I localized the issue, but not sure why it is raised.. So I got these iptables rules on hold nodes http://paste.openstack.org/show/795701/
11:42 <noonedeadpunk> and I think what happens is that they don't allow ovs communication, as once I remove the last reject rule everything becomes good
11:43 <noonedeadpunk> but according to them, icmp and things are supposed to be good...
11:47 <noonedeadpunk> ok, so they're created with the multinode-bridge role for some reason https://zuul.opendev.org/t/vexxhost/build/88fa15b6f7b4438eb1555f519aace349/log/job-output.txt#521
11:55 <frickler> noonedeadpunk: this has the primary node as both switch and peers, but I think it should be only switch https://9f0840c51d5746d19be2-4a01e595aa1b522e94de715b2ff31aa2.ssl.cf1.rackcdn.com/739717/9/check/ffrouting-deploy/ebd87dd/zuul-info/inventory.yaml
11:55 <frickler> c.f. https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/multi-node-bridge
11:56 <noonedeadpunk> frickler: but I have the same rules on the secondary as well
11:56 <noonedeadpunk> so they don't depend on the switch only
11:57 *** rh-jelabarre has joined #opendev
11:57 <frickler> noonedeadpunk: maybe the issue isn't the firewall but some other setup. c.f. also this for a working setup https://7407bee42906727a6720-dc7e48b48408cf00a200e97d0ee2c855.ssl.cf5.rackcdn.com/734621/6/check/devstack-multinode/111424e/zuul-info/inventory.yaml
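For context, a minimal sketch of the switch/peers grouping frickler is referring to, as it appears in a typical Zuul multinode nodeset; the nodeset name and node labels below are illustrative, not taken from the job being debugged:

    - nodeset:
        name: two-node-example
        nodes:
          - name: primary
            label: ubuntu-bionic
          - name: secondary
            label: ubuntu-bionic
        groups:
          # multi-node-bridge builds the overlay from the "switch" node out
          # to each node in "peers"; frickler's point is that the primary
          # would normally appear only in the switch group
          - name: switch
            nodes:
              - primary
          - name: peers
            nodes:
              - secondary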
12:02 <openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Use ansible_ssh_private_key_file when setting up windows  https://review.opendev.org/740254
12:04 *** bhagyashris is now known as bhagyashris|brb
12:04 <noonedeadpunk> frickler: so I added the clear-firewall role after multi-node-bridge and the test passes https://review.opendev.org/#/c/739717/12..13/tests/test.yml
12:05 *** zbr_ has quit IRC
12:05 <frickler> infra-root: reqs tox is failing, I cannot reproduce locally and there is no change in the tox freeze, so likely something that changed on our nodes, see e.g. https://zuul.opendev.org/t/openstack/build/1b1159196b0b4b91af987ceece5f22a3
12:08 <openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Enforce BatchMode when setting up ssh for windows  https://review.opendev.org/740254
12:10 <openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: Enforce BatchMode when setting up ssh for windows  https://review.opendev.org/740254
12:20 <frickler> noonedeadpunk: oh, actually do you include this role? https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/multi-node-firewall
12:21 <noonedeadpunk> frickler: nope, I didn't. But it's included by default without any option to omit it
12:21 <noonedeadpunk> oh, wait
12:21 <noonedeadpunk> I mixed it up with the persistent-firewall one
12:22 <noonedeadpunk> that's how the playbook looked https://review.opendev.org/#/c/739717/12/tests/test.yml
12:22 <noonedeadpunk> btw, can you release the hold for that job?
12:37 *** ysandeep|rover is now known as ysandeep|afk
12:37 *** bhagyashris|brb is now known as bhagyashris
12:39 <frickler> noonedeadpunk: so I guess you might want most of the roles here, I'm not sure in detail how they might interact https://opendev.org/zuul/zuul-jobs/src/branch/master/playbooks/multinode/pre.yaml
12:40 <noonedeadpunk> yeah, btw I saw that. Eventually the firewall cleanup works for me and I guess we don't want it to interfere.
12:40 <noonedeadpunk> So thanks)
12:45 <frickler> noonedeadpunk: if you completely clean the firewall, you need to make sure that you don't have any exploitable services running like recursive DNS resolvers
12:47 <noonedeadpunk> only if they're not pre-installed with the image...
12:47 <openstackgerrit> Merged zuul/zuul-jobs master: Enforce BatchMode when setting up ssh for windows  https://review.opendev.org/740254
12:47 <noonedeadpunk> but we have bgp running as part of frr (and maybe ospf as well). but not sure how exploitable that is...
12:48 <noonedeadpunk> considering the job time is like 5 mins max
12:53 <frickler> noonedeadpunk: not sure about bgpd, we also have ntpd running by default, not sure about its config, maybe some other infra-root has a better grasp on this
12:53 <frickler> deleted the held nodes, too, thx
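For reference, a rough sketch of the kind of test playbook being suggested here: rather than clearing the firewall entirely, run the zuul-jobs multi-node-firewall role alongside multi-node-bridge so the test nodes can reach each other while the default protections stay in place. The role list and ordering loosely follow the zuul-jobs multinode pre playbook; the host pattern is illustrative:

    - hosts: all
      roles:
        # open traffic between the nodes in this job's inventory
        - multi-node-firewall
        # then build the vxlan/ovs overlay between switch and peers
        - multi-node-bridge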
12:55 <openstackgerrit> Merged openstack/project-config master: maintain-github-mirror: fix dependency name  https://review.opendev.org/740199
12:55 <openstackgerrit> Merged openstack/project-config master: Publish Airship governance  https://review.opendev.org/739790
12:57 <openstackgerrit> Merged openstack/project-config master: add vexxhost/openstack-tools  https://review.opendev.org/739627
12:58 *** ysandeep|afk is now known as ysandeep
13:02 <openstackgerrit> Merged zuul/zuul-jobs master: emit-job-header: add inventory hostname  https://review.opendev.org/738963
13:03 <frickler> oh, that reqs issue is another setuptools thing it seems. passes with <=48, fails with the 49.2.0 we seem to install
13:03 <frickler> prometheanfire: smcginnis: ^^
13:05 <mordred> frickler: :(
13:07 <smcginnis> Oh joy.
13:11 <smcginnis> frickler: Any idea if there is a setuptools bug filed?
13:18 <frickler> smcginnis: I haven't found one, feel free to do so
13:18 *** roman_g has joined #opendev
13:25 <frickler> according to git bisect, this seems to be the first broken commit, but it gives a different failure https://github.com/pypa/setuptools/commit/78d2a3bfafd38112dc3c486cd478e4cee1f782ec
13:29 <frickler> smcginnis: ah, got it, we need to change the expected exception
13:30 <frickler> something like http://paste.openstack.org/show/795705/ , but not sure yet how to make it work with the old one, too.
13:33 *** mtreinish has quit IRC
13:33 *** mtreinish has joined #opendev
13:41 <mordred> corvus: the container job did not fail yesterday - I rechecked it again this morning
13:53 <smcginnis> frickler: Maybe instead of a with block, we can just put it in a try/except and catch either one?
14:10 <fungi> mnaser: i'll take a look in the scheduler logs and see if i can work out why it wasn't enqueued
14:13 <fungi> noonedeadpunk: does ovs use any sort of encapsulation? i'm not too familiar with it. if so, though, it's possible the encapsulated traffic (whatever form that takes) might be getting blocked on the node's real network interfaces
14:15 <fungi> we'd really prefer if you didn't run those nodes with no packet filtering. it's a good safety net in case a job happens to add exploitable services which get caught up in a global reflection ddos or something (we get reports about that pretty regularly, unfortunately, and have to hunt down who turned off firewalling on a job and exposed a socks5 proxy or a recursive dns resolver or a memcached or...)
14:18 <fungi> one way to check would be to put the default iptables rules back but modify the default block rule to log any hits, then reproduce your ping test and see what actually got blocked by iptables
14:22 *** qchris has joined #opendev
14:27 <frickler> fungi: well I'm pretty sure the issue is that the multi-node-firewall role isn't being run
14:32 <fungi> frickler: ahh, so it was actually blocking the vxlan packets then probably
14:33 *** roman_g has quit IRC
14:44 *** roman_g has joined #opendev
14:50 *** mlavalle has joined #opendev
14:52 *** bhagyashris is now known as bhagyashris|dinn
14:56 <sgw1> morning folks.  can someone help me with getting permissions to push branches to the starlingx/manifest repo, I need to create our r/stx.4.0 release branch
14:57 <AJaeger> sgw1: https://review.opendev.org/#/admin/projects/starlingx/manifest,access shows that starlingx-release has these permissions
14:57 <AJaeger> sgw1: you're part of that team, aren't you?
14:58 <AJaeger> sgw1: oh, wait - branch creation.
14:58 <AJaeger> sgw1: that needs an ACL update AFAIK
14:58 *** mnasiadka has joined #opendev
14:59 <AJaeger> sgw1: https://docs.opendev.org/opendev/infra-manual/latest/creators.html#creation-of-branches explains what's needed
14:59 <yoctozepto> thanks fungi, my spell checker must have kicked in there
14:59 <AJaeger> sgw1: so, please send a change to openstack/project-config and update the ACL
15:00 <sgw1> Ok, I guess I should have checked there first, I had it for all the other repos
15:06 <sgw1> AJaeger: thanks, I guess I had permission after all! Sorry for the noise
15:16 <fungi> yoctozepto: no worries, i've heard other folks in the community actually call it "jitsu" too, so i thought there might be some general confusion over what the project is named. just trying to correct it whenever i run across that
15:19 <fungi> mnaser: so according to the scheduler's debug log 739774,7 was removed from the promote pipeline when another change for the same project merged after it. probably better if i check the most recent change merged for that repo
15:20 <mnaser> fungi: right, but my 'promote image' doesn't happen every single time, so it shouldn't have been dequeued
15:20 <fungi> when changes merge in a batch zuul is generally only going to run jobs for the first and last change when using the supercedent pipeline manager
15:20 <mnaser> i have different images being promoted depending on whether the image was changed or not
15:20 <mnaser> ah -- i wonder if that means i have to build all the images, all the time :X
15:21 <fungi> if you want jobs to run for every single change, supercedent isn't what you want
15:21 <fungi> it assumes all jobs run for all changes
15:21 <fungi> definitely avoid things like file filters on jobs for supercedent pipelines or you'll get inconsistent results
15:22 <mnaser> that would really increase the # of jobs running by a lot i guess ;(
15:22 <fungi> mnaser: one option would be to switch your promote pipeline to dependent instead of supercedent
15:22 <mnaser> fungi: right, but i think we're using the opendev promote pipeline config
15:22 <fungi> or create a different dependent pipeline for those jobs
15:23 <fungi> oh, i thought this was in the vexxhost tenant
15:23 <fungi> zuul.Pipeline.vexxhost.promote according to the scheduler
15:23 <fungi> so not in the opendev tenant
15:24 <corvus> mnaser, fungi: you may want the serial pipeline manager for this: https://zuul-ci.org/docs/zuul/reference/pipeline_def.html#value-pipeline.manager.serial
15:24 <corvus> (ie, what we use for our deploy pipeline)
15:24 <fungi> ahh, yeah, if you need to make sure only one runs at a time
15:25 <fungi> but the supercedent pipeline manager and file filters are two different (and basically incompatible) mechanisms for reducing the number of jobs you run
15:26 <fungi> so i would pick one or the other and not attempt to combine them
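A minimal sketch of what a serial promote-style pipeline could look like in a tenant's Zuul config, loosely modelled on the deploy pipeline corvus mentions; the pipeline name, trigger and reporter choices here are assumptions, not the actual vexxhost configuration:

    - pipeline:
        name: promote
        description: >
          Runs jobs for changes that have merged, processing one item per
          project queue at a time so a later promotion cannot overtake an
          earlier one.
        manager: serial
        precedence: high
        post-review: true
        trigger:
          gerrit:
            - event: change-merged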
15:30 <corvus> 15:27 < mordred> corvus: well - if we're not going to be waiting on subchecks now - it makes me want to consider just doing 2.13->3.2 over a weekend
15:30 <mordred> corvus: so - if we're not likely going to be moving anything to checks or hoping subchecks lands, I'm kind of wondering if we should consider a flag-day weekend upgrade all the way to 3.2
15:30 <corvus> mordred: were we thinking we wanted to stop at 2.16 and make sure we have all the ci stuff sorted out?
15:30 *** ysandeep is now known as ysandeep|away
15:31 <corvus> we'll still need some kind of hideci thing, right?
15:31 <mordred> infra-root: email on the gerrit mailing list about the google folks abandoning the checks plugin. scrollback in #zuul talks about some things related - including a 3.x frontend plugin wikimedia is using similar to our current ci results table
15:32 <mordred> corvus: yeah - but maybe we can do that in a similar structure to what wikimedia has in their plugin until the new labels stuff is there?
15:32 <mordred> or maybe we'll determine that we do need to stop at 2.16 to wait - I mean, we still need to run an upgrade test to see
15:32 <corvus> mordred: well, i'm unclear on what we actually have prepared for 3.x -- do we have that functionality ready at all?
15:33 <mordred> nope. we have nothing for 3.x currently
15:34 <mordred> I'm also unclear as to how compatible the different versions of polymer are - so will a polymer plugin for 2.16/3.0 be the same as a polymer plugin for 3.2 - or will we need to write the same thing twice
15:35 <corvus> mordred: then i think i agree with you that if our main hesitation for proceeding past 2.16 was that we needed a hideci solution and were hoping for it to be checks, that that is now changed, and we ought to be able to proceed faster, but we probably do still need a hideci solution.  we probably don't need to stop at 2.16 for it though, if we come up with something that works for 3.2 in our testing, i think we can upgrade with confidence.
15:35 <mordred> ++
15:35 <mordred> agree
15:35 <corvus> mordred: so i think the only remaining reason to stop at 2.16 would be luca's suggestion about acclimating folks.  but i bet folks can muddle through.  :)
15:36 <mordred> and I think that's likely to be the "best" path forward - we can develop (and by develop, I mean steal wikimedia's already-developed thing) for 3.2
15:36 <fungi> so is it that the gerrit community is officially abandoning checks, or just discussing whether they should?
15:36 <mordred> corvus: yeah
15:36 <corvus> fungi: i think it's that the google folks (who are 99% of the contributors to checks) are abandoning it
15:36 <mordred> corvus: fwiw - the wikimedia polymer plugin is WAY more readable and is using an actual API to get messages
15:36 <mordred> fungi: I think it's effectively official
15:36 <mordred> since it was the google folks who were staffing it
15:36 <fungi> got it. disappointing reversal of direction on their part
15:37 <mordred> yeah
15:37 <corvus> i think anyone is welcome to continue working on it if they want, but that leaves about 0 people, plus or minus two.
15:37 <mordred> yeah. but ... who knows, maybe it's for the best because subchecks was still quite a ways off from being a thing, and we can likely adapt  https://github.com/wikimedia/puppet/blob/production/modules/gerrit/files/homedir/review_site/static/gerrit-theme.html#L195 for our use reasonably quickly?
15:44 *** sgw1 has quit IRC
15:44 *** sgw1 has joined #opendev
15:49 *** bhagyashris|dinn is now known as bhagyashris
15:50 *** roman_g has quit IRC
15:51 *** diablo_rojo has joined #opendev
15:57 <fungi> infra-root: while the crawler-induced load we've been seeing from non-firewalled sources seems to be continuing, the volume has slowly trailed off, so we might be safe to lift the china unicom drop rules on the lb now without yet pulling the trigger on the apache proxy ua filter work: http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=66611&rra_id=all
16:00 *** marios has quit IRC
16:08 <yoctozepto> fungi: meetpad did very fine for us
16:09 <fungi> yoctozepto: great!
16:09 <fungi> ironic is apparently using it for some get-togethers as well
16:09 <fungi> as is the osd d&i wg
16:09 <fungi> er, osf d&i wg
16:10 <fungi> glad to see it's continuing to be useful outside the ptg context
16:12 <yoctozepto> fungi: that's nice to know!
16:12 <yoctozepto> fungi: what is d&i?
16:21 <fungi> diversity and inclusion
16:21 <fungi> one of the osf board of directors chartered working groups
16:48 *** mnasiadka_ has joined #opendev
16:48 *** jbryce_ has joined #opendev
16:48 *** mwhahaha_ has joined #opendev
16:55 *** mwhahaha has quit IRC
16:55 *** mnasiadka has quit IRC
16:55 *** jbryce has quit IRC
16:55 *** rm_work has quit IRC
16:55 *** mordred has quit IRC
16:55 *** jbryce_ is now known as jbryce
16:55 *** mwhahaha_ is now known as mwhahaha
16:55 *** mnasiadka_ is now known as mnasiadka
16:56 *** Eighth_Doctor has quit IRC
16:58 *** rm_work has joined #opendev
17:03 *** Eighth_Doctor has joined #opendev
17:12 *** sshnaidm is now known as sshnaidm|afk
17:14 <fungi> infra-root: based on the graph i linked a little while ago, i propose we revert the temporary iptables rules we were using to block traffic from china unicom. i'm available to keep an eye on things for the next ~6 hours still. any objections?
17:31 *** mordred has joined #opendev
17:35 *** hashar has joined #opendev
17:45 *** dtantsur is now known as dtantsur|afk
17:59 <corvus> fungi: no objections
18:00 <fungi> it looks like we can undo them with `sudo systemctl restart netfilter-persistent`
18:01 <fungi> and if we need to put them back, it's:
18:01 <fungi> for X in $(cat ~clarkb/china_unicom_ranges) ; do echo $X ; sudo iptables -I openstack-INPUT -j DROP -s $X ; done
18:09 *** qchris has quit IRC
18:22 *** qchris has joined #opendev
18:22 <fungi> okay, i gave it an hour, no objections (thanks corvus for at least not letting me worry i was talking into a black hole). i'll proceed with the netfilter-persistent restart on gitea-lb01.opendev.org
18:24 <fungi> and done. continuing to keep an eye on this graph since that's where we saw the first signs of impact (we were maxing out the connection limit): http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=66611&rra_id=all
18:24 <fungi> later today i'll do a status update and follow up to the service-announce thread if we haven't needed to take other action yet
18:26 *** avass has joined #opendev
18:33 <openstackgerrit> Merged openstack/project-config master: Account for ironic bugfix branches  https://review.opendev.org/740212
19:02 <openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: add-build-sshkey: Ensure .ssh exists, enable admin authorized_keys  https://review.opendev.org/740350
19:04 <fungi> established tcp connection count has increased slightly since the rules were removed, but not substantially
19:05 <openstackgerrit> Albin Vass proposed zuul/zuul-jobs master: add-build-sshkey: Ensure .ssh exists, enable admin authorized_keys  https://review.opendev.org/740350
19:26 *** tosky has quit IRC
20:18 <smcginnis> If anyone has a minute, I'd love to know if this is on the right track: https://review.opendev.org/#/c/739272/
20:26 <fungi> i'm not sure either or i'd have weighed in... it's switching from distro-packaged pip to pypi-served pip to work around the fact that the ensure-pip role doesn't have a mechanism for installing multiple versions of distro-packaged pip when there are multiple distro-packaged python interpreters present
20:28 <smcginnis> fungi: Maybe I don't understand, but it's not switching from distro to pypi. It should be installing multiple distro-packaged packages based on the versions of python requested.
20:28 <smcginnis> At least that's what I thought I was doing, but definitely a possibility that I don't really know what I'm doing. :)
20:28 <fungi> oh, did we improve ensure-pip to cover that case?
20:28 <smcginnis> That's what I'm trying to do.
20:29 <fungi> oh! this is a different change than i was thinking of
20:29 <smcginnis> https://review.opendev.org/#/c/739272/2/roles/ensure-pip/tasks/Debian.yaml should be looping through and trying to apt install each specific version.
20:29 <fungi> this one is for zuul-jobs. maybe better to discuss in #zuul
20:30 <smcginnis> Hmm, looks like I forgot to rejoin after my last reboot.
20:30 <fungi> the change i was thinking of was where you were switching the job to use the feature of the from-pypi method which can install multiple versions of pip
20:30 <fungi> anyway, getting ready to cook dinner, but can take a look afterward
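A rough sketch of the looping approach smcginnis describes for the Debian branch of ensure-pip, installing the distro venv packaging for every requested interpreter rather than only the default one; the variable and package names here are placeholder guesses, not the actual content of change 739272:

    # Illustrative only: the python_versions list and the python{{ item }}-venv
    # package naming are assumptions and vary between Debian/Ubuntu releases.
    - name: Install the venv package for each requested python version
      package:
        name: "python{{ item }}-venv"
        state: present
      become: true
      loop: "{{ python_versions | default(['3']) }}"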
20:42 *** dirk has quit IRC
20:47 <tbarron> so gouthamr and I have been banging our heads trying to debug an issue he reported the other day
20:47 <tbarron> with one of our jobs where it reboots -- only on rax nodes apparently
20:48 <tbarron> http://eavesdrop.openstack.org/irclogs/%23opendev/%23opendev.2020-07-07.log.html#t2020-07-07T23:37:24
20:48 <tbarron> http://eavesdrop.openstack.org/irclogs/%23opendev/%23opendev.2020-07-08.log.html#t2020-07-08T00:29:46
20:48 <tbarron> I'm wondering if we can get some tips on how to instrument the job effectively.
20:48 *** bolg has quit IRC
20:49 <tbarron> syslog doesn't usually have anything relevant
20:49 <tbarron> the journal seems to get lost before the reboot
20:49 <tbarron> one exception on syslog.  once we saw this:
20:49 <tbarron> https://zuul.opendev.org/t/openstack/build/1bc61a82ace14217807f28c5b8c9debe/log/controller/logs/syslog.txt#8463
20:50 <tbarron> But I don't know if it's a fluke or a real clue.
20:51 <tbarron> That general protection fault was right before the reboot and pertained to ipv6 packet filtering iiuc
20:53 <tbarron> by "journal gets lost" i mean we are lacking entries until after the reboot
20:57 <openstackgerrit> Sean McGinnis proposed zuul/zuul-jobs master: Install venv for all platforms in ensure-pip  https://review.opendev.org/739272
20:58 <openstackgerrit> Sean McGinnis proposed zuul/zuul-jobs master: Install venv for all platforms in ensure-pip  https://review.opendev.org/739272
21:01 *** dirk has joined #opendev
21:02 <fungi> tbarron: using any fancy vm features? rackspace uses xen while all our other providers (as far as i know) are kvm
21:02 <openstackgerrit> Sean McGinnis proposed zuul/zuul-jobs master: Install venv for all platforms in ensure-pip  https://review.opendev.org/739272
21:03 <fungi> another difference is that rackspace nodes have something like 20gb in their rootfs and then an ephemeral disk you can format and mount (devstack mounts it on /opt for example), could it be running out of disk space?
21:05 <tbarron> fungi: I don't think so.  This job does a bunch of lvm commands and uses kernel nfs and uses neutron dynamic routing and quagga etc. to advertise self-service ipv6 tenant networks so it's a bit "fancy"
21:05 <tbarron> fungi: i see a 40G root partition on rax vs 80 on, say, vexxhost so i was suspecting that the root disk might have filled but
21:06 <fungi> ahh, 40gb, that sounds right yeah
21:06 <tbarron> fungi: i instrumented a periodic task (share-service reporting pool usage to the scheduler) to log 'df -h' (and 'free' for that matter) and
21:07 <tbarron> there was lots of disk space and free memory right before the reboot
21:07 <tbarron> 'free' output pretty much matched what dstat was showing
21:08 <tbarron> the lvm action is on /opt/stack/manila/data and there was plenty of disk free there too
21:10 <fungi> does the job continue after the reboot? do we collect syslog from it and did that write anything?
21:11 <tbarron> we collect syslog before and after the reboot but in general it doesn't have anything interesting except 'reboot'
21:11 <fungi> i think we also have the option to set up a remote system console stream, i forget the kernel module for that, and then capture that in case it still has working networking and is able to send a copy of the panic
21:11 <tbarron> the one exception was that general protection fault at https://zuul.opendev.org/t/openstack/build/1bc61a82ace14217807f28c5b8c9debe/log/controller/logs/syslog.txt#8463
21:12 *** dviroel has joined #opendev
21:13 <tbarron> fungi: that console stream thing sounds useful,  I guess equivalent of 'openstack console log ...'
21:14 <tbarron> fungi: I don't know why the journal is missing the needed entries unless perhaps the file system it writes to is getting corrupted
21:14 <fungi> or the reboot happens before fd writes can be flushed and the fs synced
21:15 <tbarron> fungi: but the syslog after reboot isn't showing some big fsck/recovery
21:15 <tbarron> so yeah, flush/sync may be more likely
21:15 <fungi> ahh, the lkm i was thinking of is netconsole
21:15 <fungi> and yeah it's basically kernel console redirection to a tcp socket
21:16 <fungi> er, no, udp datagrams i guess
21:18 <fungi> so you set up a netconsole listener, then load that kernel module with appropriate parameters telling it the destination address of the netconsole stream client
21:19 <fungi> and then on the client side you can basically just capture the udp stream and do whatever you like with it (write it to a file, et cetera)
21:20 <tbarron> so we can run the client from wherever (e.g. my notebook) but need to run the modprobe cmd as part of the devstack setup?
21:26 <tbarron> s/my notebook/some target with a public ip/  - and need firewalls to allow target and dest ports
21:29 <fungi> yep, the default iptables ruleset on the job nodes should allow all egress just fine, but your client side firewall/nat would need to make sure whatever destination udp port you choose is allowed through and goes to the right place
21:31 <tbarron> fungi: thanks for the idea, i'll try to get it working locally first, then in the dsvm job
21:31 <fungi> Documentation/networking/netconsole.rst in the linux kernel docs, according to Documentation/networking/netconsole.rst
21:32 <fungi> er, according to https://www.kernel.org/doc/html/latest/admin-guide/serial-console.html
21:32 <fungi> aha, here https://www.kernel.org/doc/html/latest/networking/netconsole.html
21:33 <fungi> tbarron: that ^ should be pretty easy to follow, but let us know if you run into trouble getting it to work
21:33 <tbarron> fungi: will do, and thanks again
21:34 <fungi> my pleasure, as always
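For anyone wanting to wire this into a job, a hedged sketch of what the node-side piece might look like as a pre-run/devstack task; the interface name, listener address and port are placeholders, and a listener that is not on the local network generally needs the gateway's MAC address in place of the broadcast default (see the kernel netconsole doc linked above):

    # Illustrative only: stream the kernel console over UDP to an external
    # listener so a panic right before the reboot is still captured.
    # Placeholders: ens3 (interface), 6666 and 198.51.100.20 (the listener).
    - name: Enable netconsole towards an external UDP listener
      become: true
      command: modprobe netconsole netconsole=@/ens3,6666@198.51.100.20/
    # On the listener side, something like `nc -u -l 6666 | tee netconsole.log`
    # can capture the stream.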
21:53 *** avass has quit IRC
22:13 *** hashar has quit IRC
22:15 <fungi> #status log The connection flood from AS4837 (China Unicom) has lessened in recent days, so we have removed its temporary access restriction for the Git service at opendev.org as of 18:24 UTC today.
22:15 <openstackstatus> fungi: finished logging
22:30 *** rh-jelabarre has quit IRC
22:43 *** DSpider has quit IRC
22:43 *** hrw has quit IRC
22:53 *** hrw has joined #opendev
23:00 *** tkajinam has joined #opendev
23:06 *** mlavalle has quit IRC
