Friday, 2020-08-07

*** ryohayakawa has joined #opendev00:05
openstackgerritClark Boylan proposed opendev/system-config master: Mirror OpenSUSE Leap 15.2  https://review.opendev.org/74525100:09
openstackgerritClark Boylan proposed openstack/project-config master: Update opensuse-15 to 15.2  https://review.opendev.org/74525200:10
clarkbthat ^ was something that I noticed when debugging the zypper issues00:10
clarkbbut I'm not sure we have room for that on our afs vicepas or the volume quota00:11
clarkbI'll WIP for now00:11
openstackgerritClark Boylan proposed opendev/system-config master: Add ansible role to manage gerritbot  https://review.opendev.org/74479500:18
fungiclarkb: we likely just need to replace 15.1 with 15.2? the plan going forward was not to have separate 15.x images, so probably don't need multiple versions mirrored either right?00:21
clarkbfungi: correct, but if we do that we'll likely break the existing opensuse-15 label for some period of time00:21
clarkb(I'm not sure we can coordinate the image build update and the mirror updates)00:22
clarkbbut maybe that is ok?00:22
openstackgerritIan Wienand proposed openstack/diskimage-builder master: Pre-install python3 for CentOS  https://review.opendev.org/74186800:38
openstackgerritIan Wienand proposed openstack/diskimage-builder master: Deprecate dib-python; remove from in-tree elements  https://review.opendev.org/74187700:38
openstackgerritIan Wienand proposed openstack/diskimage-builder master: Remove glance-registry  https://review.opendev.org/73979601:43
openstackgerritMerged openstack/diskimage-builder master: Update the tox minversion parameter.  https://review.opendev.org/73875401:45
openstackgerritMerged openstack/diskimage-builder master: Fixes DIB_IPA_CERT certificate copy issue  https://review.opendev.org/74158303:03
ianwfungi: a mail forwarding thing i've been interested in just implemented ARC which apparently gmail does : https://www.ietf.org/id/draft-ietf-dmarc-arc-usage-09.txt03:21
ianwapropos nothing; just thought that was interesting03:21
ianwif gmail does it, i guess that basically means that's the way it's done now03:21
*** ysandeep is now known as ysandeep|off04:24
*** raukadah is now known as chkumar|rover04:31
*** DSpider has joined #opendev04:58
openstackgerritIan Wienand proposed openstack/diskimage-builder master: Fedora 32 support  https://review.opendev.org/73721705:13
*** tkajinam has quit IRC05:34
*** tkajinam has joined #opendev05:35
*** tkajinam has quit IRC05:50
*** tkajinam has joined #opendev05:51
*** redrobot has quit IRC06:37
*** fressi has joined #opendev06:39
*** bhagyashris is now known as bhagyashris|off07:13
*** hashar has joined #opendev07:33
*** moppy has quit IRC08:01
*** moppy has joined #opendev08:01
*** tosky has joined #opendev08:04
*** sshnaidm|afk is now known as sshnaidm|off08:19
*** fressi has quit IRC08:26
*** fressi has joined #opendev08:57
*** tkajinam has quit IRC09:17
*** fressi has quit IRC09:21
*** dtantsur|afk is now known as dtantsur09:30
dtantsurclarkb, fungi, thank you for handling the suse issue!09:30
*** DSpider has quit IRC09:49
openstackgerritMerged openstack/diskimage-builder master: Pre-install python3 for CentOS  https://review.opendev.org/74186809:53
openstackgerritCarlos Goncalves proposed zuul/zuul-jobs master: configure-mirrors: add CentOS 8 Stream  https://review.opendev.org/73478709:54
openstackgerritMerged openstack/diskimage-builder master: Deprecate dib-python; remove from in-tree elements  https://review.opendev.org/74187709:57
openstackgerritCarlos Goncalves proposed openstack/project-config master: CentOS 8 Stream initial deployment  https://review.opendev.org/73479109:59
openstackgerritCarlos Goncalves proposed openstack/diskimage-builder master: Add support for CentOS 8 Stream cloud image  https://review.opendev.org/73724510:02
*** ryohayakawa has quit IRC10:27
fricklerinfra-root: I've seen this a couple of times now, jobs failing in pre with failure to set up swap, does anyone have an idea for that? might be provider specific https://7e827a77180c1e6e432f-3c4e8d8f712aba3e652b0cfd0c30a298.ssl.cf5.rackcdn.com/745303/1/check/barbican-dogtag-tox-functional/35bf535/job-output.txt10:35
fricklerlogstash sees this on inap, ovh and vexxhost, but only for fedora-3110:38
fricklerhttp://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%20%5C%22swapon%3A%20%2Froot%2Fswapfile%3A%20swapon%20failed%3A%20Invalid%20argument%5C%2210:39
fricklerit also seems to be hidden by retrying mostly, see e.g. https://zuul.opendev.org/t/openstack/builds?job_name=devstack-platform-fedora-latest10:42
openstackgerritCarlos Goncalves proposed openstack/project-config master: CentOS 8 Stream initial deployment  https://review.opendev.org/73479110:42
*** DSpider has joined #opendev11:06
*** fressi has joined #opendev11:22
*** stephenfin has quit IRC11:29
*** stephenfin has joined #opendev11:38
*** cloudnull has quit IRC11:42
*** cloudnull has joined #opendev11:43
*** hashar is now known as hasharLunch11:46
*** stephenfin has quit IRC12:24
openstackgerritTristan Cacqueray proposed zuul/zuul-jobs master: ensure-pip: add instructions for RedHat system  https://review.opendev.org/74375012:27
fungifrickler: apparently swapon complaining "invalid argument" in that context can mean it doesn't see expected formatting in the swapfile12:30
fungimaybe the writes from mkswap are being buffered and haven't been flushed before swapon reads from the inode?12:31
*** stephenfin has joined #opendev12:33
*** DSpider has quit IRC12:49
fricklerfungi: sounds plausible, so maybe add a retry to the swapon and/or add a sync before?13:31
*** dirk has quit IRC13:31
*** chkumar|rover is now known as raukadah13:32
fungimaybe, though since this is only cropping up on newer fedora i wonder if something has changed in the kernel/fs drivers14:07
fungibut yeah, swapfiles are a little fiddly since they're a filesystem on top of another filesystem14:08
openstackgerritKen Giusti proposed openstack/project-config master: Retire devstack-plugin-pika project  https://review.opendev.org/74534214:14
*** hasharLunch is now known as hashar14:17
openstackgerritKen Giusti proposed openstack/project-config master: Retire the devstack-plugin-zmq project  https://review.opendev.org/74534414:24
*** ysandeep|off is now known as ysandeep14:51
clarkbfrickler: fungi could it be an issue with our use of fallocate? I think some man pages say don't do that and others do. But it hasn't been an issue for us (yet?)14:55
fungioh, maybe14:55
clarkbwe could switch f31 over to dd15:01
clarkband see if it gets better?15:01
*** Guest7899 has joined #opendev15:03
fricklerclarkb: fungi: the manpages for mkswap+swapon on fedora31 both suggest to use dd as the preferred solution15:05
clarkbwe avoided dd because it is quite a bit slower15:06
fungiuse dd instead of fallocate so we prewrite zeroes?15:06
clarkbbut that may be better if more reliable15:06
clarkbyes15:06
fungislower and also uses up more of the rootfs if swap winds up not being needed15:06
*** Guest7899 is now known as redrobot15:06
fungithen again, it guards against kernel panic for cases where the rootfs fills up and then something tries to page out15:07
fricklerwe might also want to reduce swap file size, 8GB seems excessive, maybe 1 or 2G would suffice?15:08
fricklerif a job needs to swap more, it's very likely to timeout anyway I guess15:09
clarkbya that may be a reasonable compromise15:09
openstackgerritMerged opendev/gerritbot master: Switch to stestr, declare Python 3.7 compatibility  https://review.opendev.org/73059415:09
fungithe idea was to have plenty so that we get legitimate errors rather than oom, but that's a fine line to walk because it's just as likely swap thrash skyrockets iowait and your job times out15:10
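The swapfile options discussed above (prewrite zeroes with dd rather than fallocate, flush before swapon, use a smaller file) can be sketched as a command sequence. This is only an illustration: the helper name `swapfile_commands` and the 1 GiB default are assumptions, the `/root/swapfile` path is taken from the logstash query above, and whether the extra `sync` actually avoids the "Invalid argument" failure on fedora-31 was not confirmed in the channel.

```python
import shlex


def swapfile_commands(path="/root/swapfile", size_mib=1024):
    """Return the commands for a dd-based swapfile (hypothetical helper).

    Nothing is executed here; the caller decides how to run the list
    (as root), which keeps the sketch safe to import and inspect.
    """
    return [
        # Prewrite zeroes so every block is really allocated,
        # instead of reserving space with fallocate.
        ["dd", "if=/dev/zero", f"of={path}", "bs=1M", f"count={size_mib}"],
        # Swap files should not be readable by other users.
        ["chmod", "600", path],
        # Write the swap signature into the file.
        ["mkswap", path],
        # Flush buffered writes before swapon reads the header
        # (the buffering theory from the discussion, untested).
        ["sync"],
        ["swapon", path],
    ]


if __name__ == "__main__":
    for command in swapfile_commands():
        print(shlex.join(command))
```

Returning the commands instead of running them keeps the size trade-off visible: `size_mib=2048` would give the 2G variant frickler suggested, at the cost of a slower dd and more rootfs consumed.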
*** fressi has left #opendev15:23
*** sgw1 has joined #opendev15:36
*** bolg has quit IRC15:37
*** hashar has quit IRC15:46
*** DSpider has joined #opendev16:03
*** dtantsur is now known as dtantsur|afk16:04
toskytalking about legacy jobs, so far I've been looking mostly into the descendents of legacy-dsvm-base, which are more impacted by the removal of devstack-gate, but now I checked also legacy-base and its children16:06
toskyso in addition to legacy-dsvm-os-loganalyze (which, if I understand it correctly, is going to be retired)16:06
clarkbI don't know that anyone has volunteered to retire it yet, but yes that repo is basically eol16:07
toskyI found also that opendev/puppet-openstack_infra_spec_helper and opendev/sandbox depend on a legacy job each (legacy-puppet-openstack-infra-spec-helper-unit and legacy-sandbox-tag)16:07
clarkbkeeping it up may help some third party ci operators but the d-g jobs aren't critical to that16:07
johnsomHi everyone, we may have a mirror issue at limestone: https://review.opendev.org/#/c/685337/ This patch is failing a bunch of jobs as it isn't finding keystonemiddleware 9.1.0 which is on pypi, but maybe not in the limestone mirror?16:07
toskyI've also found a few simple legacy jobs in the osf/ namespace, namely in osf/groups, and I'm not sure who is in charge for that16:08
clarkbjohnsom: as I just mentioned in #openstack-qa we don't actually mirror pypi anymore (haven't for years). It is just a caching proxy to pypi16:08
clarkbtosky: sandbox was likely just someone adding a job and we can remove it (sandbox isn't real code, it's literally a push to gerrit sandbox)16:08
johnsomAh, I guess others are seeing this too. I will look at the scroll back in -qa, thanks!16:09
clarkbtosky: I wonder what infra_spec_helper does with d-g. That one is a bit unexpected as our rspec jobs are not doing anything with devstack16:09
clarkbjohnsom: well it may still be an issue; you're the first one to link to any jobs that have been affected16:09
toskyclarkb: oh, that job does not use d-g; I just mentioned it for completeness16:09
clarkbtosky: oh I see it is a legacy converted job but not d-g16:09
clarkbtosky: I think we can leave that there. We are slowly replacing puppet and that repo will go away when puppet is no longer used16:10
clarkb(and if it isn't using d-g there is less concern of d-g being a maintenance issue)16:10
toskytotally fine by me, I thought it made sense to report it :)16:11
*** dtantsur|afk has quit IRC16:11
johnsomWell, we have a few examples. lol16:11
clarkbjohnsom: http://mirror.regionone.limestone.opendev.org/pypi/simple/keystonemiddleware/ the package is listed there now and I am able to download the sdist and wheel16:12
clarkbmy hunch is that this was a pypi issue16:12
johnsomOk, I will fire off some rechecks and see what happens. I will let you know16:13
clarkb(it is also possible that the issue is persisting but only for a subset of requests if it is a specific pypi cdn node that is a problem)16:13
clarkbwe have seen this before16:13
johnsomYeah, I can run this locally without issue16:13
clarkbin the past there was a pypi api request we could make to flush cdn entries for specific records16:17
clarkbunfortunately I don't think that is exposed anymore?16:17
*** ysandeep is now known as ysandeep|off16:19
openstackgerritClark Boylan proposed opendev/system-config master: Add ansible role to manage gerritbot  https://review.opendev.org/74479516:22
clarkbjohnsom: did any of your failures run outside of limestone?16:24
clarkbwe might be able to triangulate a bad node if so16:24
johnsomAll that I checked were limestone, but let me take a look at a few more16:24
clarkbjohnsom: also note there are a few valid failures in there16:25
clarkb(so you may not just want to recheck)16:25
johnsomYeah, I see those too16:25
toskyI've hit the issue just once, and it was limestone16:26
toskyall the other jobs and the recheck passed16:26
toskyhttps://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_599/745321/2/check/barbican-tempest-plugin-simply-crypto/599905c/job-output.txt16:26
clarkbwe may still be able to identify bad nodes by directing requests to specific fastly IPs16:28
fungiyeah, often it's a stale cached error on one cdn node out of a pool, so the behavior winds up hitting some percentage of requests for one or more providers in the same network locality (for example i've seen it happen at the same time to ovh and inap montreal-area builds)16:31
johnsomYeah, so far it looks like they are all mirror.regionone.limestone.opendev.org where ovh was successful16:32
fungilimestone is somewhere in the wilds of texas, i think16:33
fungiso might also see it crop up in rackspace-dfw... or not16:33
clarkbalso it may have corrected itself as these things magically do16:33
fungiodds are rackspace is large enough to warrant dedicated fastly cdn endpoints16:33
johnsomYeah, the recheck ovh, rax-ord, vexxhost, all passing16:34
johnsominap passing too16:35
clarkbit's still showing up for me loading the index from limestone. And we've been through ~2 cache expirations for that file16:35
clarkbit meaning the package is there and the bug isn't present16:35
fungiclarkb: revisiting the _.- pep 503 situation, where besides pkg_resources.safe_name() were you seeing . treated separately from _-?16:41
clarkbfungi: pypi.org itself16:41
fungifrom what i can tell testing manually, pypi.org's warehouse webui and pip both seem to treat . as equivalent to _-16:41
fungihow was it manifesting for you?16:41
fungitrying to work out how to reproduce the problem before i go posting to distutils-sig about it16:42
clarkbfungi: https://pypi.org/project/oslo-db/ redirects to https://pypi.org/project/oslo.db/16:42
clarkbfungi: which means if you want to know what the canonical name is without redirects, translating . to - is wrong16:42
fungioh, got it16:42
fungiso basically warehouse is converting backwards. it redirects all of ._- to whatever was used in the project name is my guess16:43
clarkbthe problem is that we are asking for the canonical name so that all the tools can use a consistent value without translating and checking for equivalence16:43
clarkbbut we've ended up in two different situations there depending on which tools you use/talk to16:43
fungiyeah, i suspect pypi's idea of a canonical name is whatever was uploaded16:44
fungiand it converts any of those characters to the uploaded dist name16:44
clarkband what was uploaded uses the preexisting pkg_resources rule16:44
clarkb(because this package name far predates packaging)16:44
clarkbI think that's my concern: these values aren't canonical because they differ16:44
fungidoes it? so dists uploaded with a _ in their names wind up with a - on pypi?16:44
clarkbfungi: https://pypi.org/project/glance-store/ I think that is an example of that16:45
clarkbhttps://opendev.org/openstack/glance_store/src/branch/master/setup.cfg#L216:45
clarkbyes https://pypi.org/project/glance_store/ redirects to https://pypi.org/project/glance-store/16:46
clarkbthe name set in setup.cfg is glance_store16:46
clarkbso pypi (or something) is applying a canonicalization16:46
fungihard to know for sure if it was manually registered as glance-store though16:46
clarkbit may be that setuptools is doing that before it talks to pypi16:46
fungiversion 0 has no files16:46
clarkbbut setuptools would use pkg_resources16:46
clarkbanyway that's my complaint. We can't have canonical name converters that disagree16:47
clarkbotherwise there is no canonical name16:47
clarkband it seems like for better or worse you have to stick with the existing rules (pkg_resources) and can't change them16:47
clarkbotherwise you're stuck in a weird place where things don't agree on what the actual name of a thing is16:47
fungiyeah, or use warehouse to remotely dereference them16:47
fungiwhich would not be a "good idea[tm]"16:48
clarkbfungi: maybe what they (and you?) are trying to express is that ._- are fully equivalent and you have to check all variations?16:49
clarkb(eg there is no true canonical name)16:49
clarkbhttps://pypi.org/project/glance.store/ does also redirect to https://pypi.org/project/glance-store/16:49
fungiit looks like warehouse will prevent you from registering, say, a new glance.store when glance-store already exists16:50
fungiand if you pip install glance_store or glance.store you get the package pypi lists as glance-store16:50
fungialso pip install leaves you with a glance-store distribution according to pip list/freeze16:51
clarkbif you pip install oslo.db it is oslo.db16:52
clarkb(it doesn't get rewritten to oslo-db)16:52
fungiyep. it also redirects you to oslo.db on pypi16:52
funginot oslo-db16:52
clarkbright but the canonical name function will tell you the canonical name is oslo-db16:52
clarkbI think what we are finding is that python really wants to express they are fully equivalent and there is no canonical translation16:53
fungiso it seems like maybe distributions registered/uploaded with _ in them get rewritten to - but . does not get rewritten to -16:53
fungiyet if you install with any of ._- in the name you get what's there16:53
fungibut yes, this may also be setuptools itself calling pkg_resources.safe_name() on the dist name16:54
clarkbif we ignore that that will lead to unnecessary redirects then there probably isn't very much issue with that. It does mean that for any name comparison you have to "canonicalize" both sides of the comparison with whatever canonicalizer you are using16:54
fungi site-packages/glance_store-2.1.0.dist-info/METADATA contains "Name: glance-store"16:55
fungimaybe pbr itself is doing that?16:55
funginope, only calls safe_name( in tests16:56
clarkbya so setuptools is likely using pkg_resources and that is conflicting with packaging's rules16:57
fungithe wheel metadata.json is also listing "name": "glance-store",16:58
clarkbfungi: the metadata to check would be for oslo.db16:58
clarkbsince we want to know if it is oslo-db or oslo.db there16:58
fungiwell, my current conjecture is that setuptools is converting _ to - when creating the metadata, because of internal use of safe_name()16:58
fungiwhich long predates pep 503 (from 2015) which came about while designing warehouse16:59
fungiName: oslo.db16:59
fungi"name": "oslo.db",16:59
clarkbfungi: yup but I think packaging and pkg_resources agree that _ -> - is correct16:59
fungiright17:00
clarkbthey disagree on whether or not . should be a -17:00
clarkband the . is important in python too17:00
fungiyeah, i think . was added in pep 503 and packaging was written to implement that specification17:00
fungibut pkg_resources.safe_name() is keeping backward compatibility to pre-503 behavior17:01
fungiand setuptools is relying on that rather than packaging17:01
clarkbas it arguably should17:01
fungiso my takeaway is that this would be less problematic if pypi always redirected ._ to - and pip install always rewrote ._ to - at installation (right now i don't think pip's rewriting anything, setuptools is doing it at package generation)17:02
clarkbyes because then we'd actually have a canonical form17:03
clarkbbut we can also give up on a canonical form and always translate both sides of a comparison17:03
clarkb(using whatever translator you have)17:03
fungialso we haven't even started talking about lower-casing and collapsing rules yet17:06
fungi>>> pkg_resources.safe_name('foo.--Bar_baz')17:06
fungi'foo.-Bar-baz'17:06
fungi>>> packaging.utils.canonicalize_name('foo.--Bar_baz')17:06
fungi'foo-bar-baz'17:06
clarkbwow17:07
fungiso they seem to be consistent in wanting to collapse runs of _- but packaging.utils.canonicalize_name also collapses . with them while pkg_resources.safe_name keeps runs of . untouched17:07
clarkbthe problem there becomes pkg_resources won't canonicalize a packaging value to something that is equivalent to a pkg_resources value17:08
clarkbthat seems much more problematic17:08
fungihowever they're not consistent about lower-casing17:08
fungibecause i guess . and lower-case were pep 503 additions17:08
fungiand yeah, i think the trick then if you want to check a package list for an entry is to canonicalize both sides of the match before comparing17:10
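The two rule sets compared above can be reproduced with plain regular expressions. This is a sketch, not the libraries themselves: `pep503_normalize` follows the normalization given in PEP 503, while `legacy_safe_name` is a stand-in written to mirror the `pkg_resources.safe_name()` behaviour shown in the session above. Both function names, and `same_project`, are made up for illustration.

```python
import re


def pep503_normalize(name: str) -> str:
    """PEP 503 rule: runs of '-', '_' and '.' become one '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def legacy_safe_name(name: str) -> str:
    """Stand-in for the older rule: runs of anything other than
    letters, digits and '.' become one '-'; case and dots are kept."""
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)


def same_project(left: str, right: str) -> bool:
    """Compare two spellings by normalizing both sides first."""
    return pep503_normalize(left) == pep503_normalize(right)


if __name__ == "__main__":
    for raw in ("foo.--Bar_baz", "oslo.db", "glance_store"):
        print(raw, legacy_safe_name(raw), pep503_normalize(raw))
```

Because the PEP 503 rule discards dots and case, its output cannot be mapped back to what the older rule would have produced, so the comparison only works reliably when both sides go through the stricter rule.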
openstackgerritClark Boylan proposed opendev/gerritbot master: Add option to disable daemonization  https://review.opendev.org/74524017:10
clarkbspeaking of really fun python bugs ^ you can't set dest on a positional arg17:10
clarkbwhich means you can't use a - in the help output17:10
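The argparse behaviour mentioned here can be demonstrated in a few lines. The parser and the `config-file` argument below are hypothetical and are not taken from the gerritbot change; the second function shows `metavar` as one possible way to get a hyphen into the usage text while keeping an underscore in the attribute name.

```python
import argparse


def dest_on_positional_is_rejected() -> bool:
    """Return True if argparse refuses dest= on a positional argument."""
    parser = argparse.ArgumentParser(prog="example")
    try:
        parser.add_argument("config-file", dest="config_file")
    except (ValueError, TypeError):
        # The exception type is caught broadly on purpose, since it
        # may differ between Python versions.
        return True
    return False


def build_parser() -> argparse.ArgumentParser:
    """Positional whose attribute is config_file but which displays as config-file."""
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument(
        "config_file",
        metavar="config-file",  # only affects usage/help output
        help="path to the configuration file",
    )
    return parser


if __name__ == "__main__":
    print(dest_on_positional_is_rejected())
    print(build_parser().format_usage())
```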
clarkbfungi: the problem is if one side of the input was already canonicalized by packaging then pkg_resources can't canonicalize that input to match its own output on arbitrary data17:11
clarkbfungi: it's basically forcing you to always use packaging because its rules are more aggressive17:11
fungiright, well, it's forcing you to use the pep 503 canonicalization rules anyway17:13
clarkbthat seems wrong given that setuptools won't do that for its package outputs17:13
clarkbthe upside to using pkg_resources is that you'll get what pip says17:14
clarkband its "simpler"17:14
clarkbalso does . not being canonical imply you can't do the whole nested package paths thing that oslo tried to do?17:14
clarkbI think we discovered it was a bad idea but I still thought that was an intentionally designed feature that should be available?17:14
clarkbas an aside it is interesting that - is the value chosen by pep503/packaging since that makes it difficult for use as python identifiers17:16
clarkb_ and . are both valid17:16
clarkbbut not -. I wonder if that is intentional17:16
fungioh, yeah could be that . in a dist name as an implicit package namespace declaration for module imports is a feature which has to be preserved17:21
clarkbright but if we are preserving that why would pep503/packaging undo it?17:21
clarkbespecially with a value like - which itself is invalid in that context17:21
clarkbthat does make me think it may be intentional, but I don't understand the value of it if so17:22
fungii'm not sure that pep503/packaging undoes that17:22
clarkbfungi: it converts . to -17:22
fungiit really looks like what warehouse wants is to preserve the dist name you put in your metadata17:23
fungiand setuptools is what's rewriting _ to -17:23
fungii think warehouse and pip are just treating ._- as equivalent and redirecting you to the package you seem to have requested based on those canonicalization rules17:24
fungiand the inconsistency here is that setuptools is rewriting _ to - when creating the metadata17:25
fungii have a feeling that if we switched to a non-setuptools sdist/wheel generation backend we might gain the ability to have packages with _ in the name17:26
clarkbI'm not so much concerned what the package names actually are as much as being able to take a randomly received package name (like from constraints) and determining if I already have that package somewhere else17:36
clarkband it seems the only reliable way to do that may be with packaging because the rules it uses are most strict17:36
fungiyep17:37
fungiresolve all the packages you have and then resolve the one in question and see if it's included17:37
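The lookup described here (normalize everything you already have, then normalize the incoming name and see whether it is present) might look like the sketch below. It applies the PEP 503 normalization with a plain regular expression; the helper names and the sample package list are assumptions for illustration only.

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """PEP 503 normalization: collapse runs of -_. to '-' and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def index_by_normalized_name(names):
    """Map each normalized name to the original spellings seen for it."""
    index = defaultdict(list)
    for name in names:
        index[normalize(name)].append(name)
    return dict(index)


def find_known_spelling(index, candidate):
    """Return the known spellings matching candidate, or an empty list."""
    return index.get(normalize(candidate), [])


if __name__ == "__main__":
    # Hypothetical list of names already known locally.
    known = index_by_normalized_name(["oslo.db", "glance-store", "PyYAML"])
    # Names as they might arrive from a constraints file.
    for incoming in ("oslo-db", "glance_store", "pyyaml", "nova"):
        print(incoming, find_known_spelling(known, incoming))
```

Keeping the original spellings in the index means the caller still learns which form the local tooling uses, even though the match itself is done on the normalized key.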
openstackgerritClark Boylan proposed opendev/system-config master: Add ansible role to manage gerritbot  https://review.opendev.org/74479517:44
clarkbI'm hoping ^ is the last ps before that becomes mergeable. I just want to double check the log output and clean up my test asserts17:44
openstackgerritClark Boylan proposed zuul/zuul-jobs master: Loop over incomplete subunit files properly  https://review.opendev.org/74538218:06
openstackgerritClark Boylan proposed opendev/system-config master: Add ansible role to manage gerritbot  https://review.opendev.org/74479518:19
clarkbassuming I got those test changes correct I think ^ is ready for review18:19
clarkband now I'm going to get a bike ride in18:20
openstackgerritClark Boylan proposed zuul/zuul-jobs master: Loop over incomplete subunit files properly  https://review.opendev.org/74538218:33
fungirackspace says likely host outage impacting ze0119:17
fungii'll check it after i'm done eating19:18
fungiit hasn't been rebooted yet19:34
mnaseri'm seeing a fair bit of RETRY_LIMIT jobs failing19:51
mnasere.g19:52
mnaserhttps://zuul.opendev.org/t/vexxhost/build/276f5667372a4e24a7cfa458c900f26b19:52
mnaserfew others that are in flight19:52
mnaseri dont see any recent zuul-jobs changes19:53
mnasernothing in project-config either19:53
mnaserpossible that a mirror is bad somewhere so failing in pre inside configure-mirrors? i havent grasped a log yet19:53
mnaseryeah, they're surfacing in the openstack tenant too19:54
mnaserhttps://zuul.opendev.org/t/openstack/builds?result=RETRY_LIMIT19:54
mnaserhttp://grafana.openstack.org/d/ykvSNcImk/nodepool-inap?orgId=1 i wonder if that has to do with it19:56
mnaserhttp://grafana.openstack.org/d/8wFIHcSiz/nodepool-rackspace?orgId=1 -- the dip seems to be everywhere though19:56
mnasercc infra-root ^19:56
fungihuh, no logs?19:57
mnasernope19:58
mnaserive been trying to catch one in console logs19:58
fungii'll run 276f5667372a4e24a7cfa458c900f26b down in executor logs19:58
mnaserim gonna see if i catch one in retry limit in zuul console19:58
fungiran on ze06, so not the one that rackspace said was impacted by a host issue19:59
mnaserit seems to be failing right away from observing20:01
*** hashar has joined #opendev20:01
mnaseri dont even have time to get a console, just goes straight into a new attempt20:01
fungiOSError: [Errno 30] Read-only file system: '/var/lib/zuul/builds/276f5667372a4e24a7cfa458c900f26b'20:01
mnaserwelp20:01
mnaserthat'll do it20:01
mnaserreboot and crossed fingers i guess20:01
fungi[Fri Aug  7 19:15:20 2020] print_req_error: I/O error, dev xvde, sector 8502477720:01
fungiyeah, rebooting it now20:02
mnaserprobably worth checking the other executors20:02
mnasergiven the high rate of retry_limit i wouldn't be surprised if it's impacted a bit more than that20:02
mnaserthere's 10 executors i think?20:02
fungi#status log rebooted ze06 after it started complaining about i/o errors for /dev/xvde and eventually set the filesystem read-only, impacting job execution resulting in retry_limit results in some cases20:03
openstackstatusfungi: finished logging20:03
fungii'll check the others now20:03
mnaserfungi: might be the only one though, http://grafana.openstack.org/d/T6vSHcSik/zuul-status?orgId=1 shows the huge dip in ram usage for ze06 -- don't see the pattern for the rest20:04
mnaserdoesn't mean it's not worth checking but yeah20:04
fungiyeah, no i'm still checking them all to be absolutely sure ;)20:05
fungino similar issues in the other 11 executors20:07
fungithough i'll likely have to server reboot --hard via the api, i don't think 06 is able to shut down cleanly. seems to probably be hung20:07
fungiyeesh20:10
fungi"This message is a follow-up to our previous message regarding your server migration. At this time we are still in the process of migrating your cloud server, ze06.openstack.org, '15f68fd9-c1e0-4346-84e9-0f3275bb0668' to a new host. We will notify you once the migration is complete and we have verified that your cloud server is online.  Please do not attempt to access or modify20:11
fungi'15f68fd9-c1e0-4346-84e9-0f3275bb0668' during this process."20:11
fungiyeah, your "previous message" said "ze01.openstack.org, '0cbe6ecb-be68-43aa-ba0d-58296a81ebcf'"20:11
fungiso, er, not the same server20:11
* fungi sighs20:11
corvuso/20:15
mnaserfungi: well that message might explain things20:17
corvusfungi: looks like we're waiting for rax to be done with ze06?20:17
mnaseror maybe their automation caught our in progress reboot and threw it all off :P20:17
clarkbwe can disable the service there if necessary20:18
mnaserit would be interesting to get the full logs for the trace on that executor20:18
clarkbwhile we wait20:18
mnaserperhaps making zuul-executor hard-exit if it hits cases where user intervention is needed20:18
fungii expect what happened is that they fat-fingered the server id in the initial message, and the rootfs was actually disconnected as part of the host issue impacting ze0620:19
fungianyway, i updated the ticket to let them know i rebooted it20:19
corvuswe could have it pause or gracefully exit if it can't perform job prep steps20:20
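The idea of pausing or exiting when job prep cannot proceed could be sketched roughly as follows. This is not Zuul's implementation: `ExecutorShouldPause`, `prepare_build_dir` and the choice of errno values are assumptions, and a real executor would have to decide what pausing means for builds already in flight.

```python
import errno
import os


class ExecutorShouldPause(Exception):
    """Hypothetical signal that the host needs operator attention."""


# Errors that point at the host rather than at the job (an assumed list).
HOST_LEVEL_ERRNOS = (errno.EROFS, errno.EIO, errno.ENOSPC)


def prepare_build_dir(root, build_id, mkdir=os.makedirs):
    """Create the per-build directory, escalating host-level failures.

    mkdir is injectable so the failure path can be exercised without
    a genuinely read-only filesystem.
    """
    path = os.path.join(root, build_id)
    try:
        mkdir(path)
    except OSError as exc:
        if exc.errno in HOST_LEVEL_ERRNOS:
            # Retrying the job on this host cannot succeed; stop
            # accepting work instead of burning through retries.
            raise ExecutorShouldPause(f"cannot prepare {path}: {exc}") from exc
        raise
    return path
```

Separating host-level errors from ordinary ones is the point of the sketch: a read-only build root like the one in the traceback above would stop the executor from taking new jobs rather than surfacing as RETRY_LIMIT results.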
openstackgerritClark Boylan proposed opendev/system-config master: Add ansible role to manage gerritbot  https://review.opendev.org/74479520:20
fungii doubt we need to disengage anything, it's likely going to get a hard reboot anyway as part of the host migration, judging from their usual process20:20
clarkbthat fixes a minor test issue and now I really do think it will pass20:20
fungiit shouldn't be running any new jobs i don't think until the reboot completes20:21
corvusfungi: agreed20:21
fungithe server seems to be hard down at the moment anyway20:21
fungiso it's not like we could disable anything on it right now if we wanted to20:21
fungiokay, maybe both ze01 and ze06 were on the same host and they neglected to give us an initial message about 06, because right now both of them are not responding20:30
fungithough 01 was responsive for a while after they opened the ticket about it (and not in any apparent distress, unlike 06)20:31
openstackgerritClark Boylan proposed zuul/zuul-jobs master: Loop over incomplete subunit files properly  https://review.opendev.org/74538221:01
clarkbhttps://review.opendev.org/744795 passes testing now so I think that whole stack is ready for review21:06
*** DSpider has quit IRC21:13
*** hashar has quit IRC21:27
*** qchris has quit IRC22:22
*** qchris has joined #opendev22:35
*** tosky has quit IRC22:58
clarkbmnaser: as a heads up I ended up deleting clarkb-test1 as it seems like the issue has resolved itself23:23
clarkbianw: small problem on https://review.opendev.org/#/c/744821/2 with filtering of dsa fingerprints. Otherwise lgtm23:31
