Friday, 2020-08-14

<ianw> repo: "deb [arch=amd64] {{ docker_mirror_base_url }} {{ ansible_lsb.codename }} {{ docker_update_channel }}"    00:00
<ianw> that would do it    00:00
<openstackgerrit> Ian Wienand proposed zuul/zuul-jobs master: ensure-docker: remove amd64 architecture pin  https://review.opendev.org/746245    00:09
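For context, the change above just drops the [arch=amd64] pin from the templated apt source line ianw quoted, letting apt follow the host's own architecture. A minimal sketch of the resulting template, reusing the variable names from the quoted line (the rendered example is illustrative only):

    deb {{ docker_mirror_base_url }} {{ ansible_lsb.codename }} {{ docker_update_channel }}
    # rendered on e.g. a bionic node this would look roughly like:
    # deb https://<mirror>/deb-docker bionic stable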
*** hashar_ has joined #opendev    00:26
*** hashar has quit IRC    00:28
<openstackgerrit> Ian Wienand proposed zuul/zuul-jobs master: ensure-docker: remove amd64 architecture pin  https://review.opendev.org/746245    00:29
*** ryohayakawa has joined #opendev    00:29
<openstackgerrit> Ian Wienand proposed zuul/zuul-jobs master: ensure-docker: remove amd64 architecture pin  https://review.opendev.org/746245    00:30
<ianw> oh boo, we don't mirror arm64    00:43
<openstackgerrit> Ian Wienand proposed opendev/system-config master: Add arm64 to debian-docker mirroring  https://review.opendev.org/746253    00:46
<ianw> fungi: ^ mind a quick poke if you have a sec?  i think it should be that easy    00:48
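The system-config change referenced above adds arm64 to the reprepro configuration behind the deb-docker mirror. A minimal sketch of the kind of stanzas involved, assuming a conventional reprepro distributions/updates pair (the actual file layout in system-config may differ):

    # distributions (sketch): serve both architectures
    Origin: Docker
    Codename: bionic
    Components: stable
    Architectures: amd64 arm64
    Update: docker

    # updates (sketch): pull both architectures from upstream
    Name: docker
    Method: https://download.docker.com/linux/ubuntu
    Components: stable
    Architectures: amd64 arm64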
<ianw> corvus: thanks :)    01:05
<ianw> i don't know why the zuul-jobs depends-on doesn't seem to work in pyca ... maybe it's trusted?  i don't think i intended that    01:06
<ianw> i'm still poking at the buildx path -- we can up the builds to "-j 8" which should at least be twice as fast as it was hardcoded at -j4    01:08
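The -j here is the make-level parallelism used while the manylinux container image compiles its toolchain. A hypothetical buildx invocation passing it through (the MAKE_JOBS build-arg name is made up for illustration; the real pyca job wiring differs):

    docker buildx build --platform linux/arm64 \
        --build-arg MAKE_JOBS=8 \
        -t manylinux2014_aarch64 .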
<openstackgerrit> Merged opendev/system-config master: Add arm64 to debian-docker mirroring  https://review.opendev.org/746253    01:16
<ianw> ok, so it takes ~40 minutes to build the manylinux2014_aarch64 container with buildx, a bit annoying but workable    01:24
<ianw> mnaser: just fyi all the vexxhost jobs in this zuul-jobs change failed with POST_FAILURE so no logs ... https://review.opendev.org/#/c/746245/    02:20
<mnaser> ianw: rechecking    02:27
<mnaser> and i'll watch it    02:28
<ianw> fungi: http://paste.openstack.org/show/796833/    02:36
<ianw> mnaser: thanks    02:36
<ianw> fungi: ^ this is the error mirroring deb-docker ... i won't say how long it's been going on :/    02:36
<ianw> ERROR: Same file is requested with conflicting checksums:    02:37
<ianw> root@mirror-update01:/afs/.openstack.org/mirror/deb-docker/lists# cat *_Packages | grep Filename | sort | uniq -c    02:44
<ianw> seems to show every package using a unique filename (no double counts)    02:45
<ianw> https://download.docker.com/linux/ubuntu/dists/bionic/pool/stable/amd64/    02:51
<ianw> https://download.docker.com/linux/ubuntu/dists/xenial/pool/stable/amd64/    02:51
<ianw> do have files with the same name :/    02:51
*** guillaumec has quit IRC    02:53
*** frickler has quit IRC    02:53
*** lourot has quit IRC    02:53
*** spotz has quit IRC    02:53
*** lourot has joined #opendev    02:54
*** spotz has joined #opendev    02:54
*** guillaumec has joined #opendev    02:59
*** frickler has joined #opendev    02:59
<fungi> are you sure it's complaining about local files and not files on the remote mirror?    03:02
<ianw> fungi: i think it's the remote mirror?    03:03
<ianw> with the two links above, bionic & xenial seem to have some overlap of the same files    03:04
<fungi> yeah, i wonder if it's saying an index on the mirror lists a different checksum than a file actually has    03:04
<ianw> the containerd.io files in particular    03:05
<mnaser> ianw: the issue is resolved now    03:05
<mnaser> thank you for reporting    03:05
<ianw> mnaser: cool :)    03:05
<mnaser> tons of change/clean up happening here and all sorts of small fall out ;[    03:05
<ianw> fungi:    03:06
<ianw> containerd.io_1.2.13-2_amd64.deb    2020-05-15 00:17:59  19.9 MiB    03:06
<ianw> containerd.io_1.2.13-2_amd64.deb    2020-05-15 00:17:39  20.4 MiB    03:06
<fungi> oh, i see, we're telling reprepro to combine deb-docker's xenial and bionic repositories into one common pool, when they don't share a common pool on the source side    03:06
<ianw> one's in xenial, one's in bionic    03:06
<ianw> upstream    03:06
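A hedged way to confirm what ianw is describing is to pull each dist's Packages index from upstream and compare the checksum recorded for the identically named file (standard Debian repo layout under the URLs quoted above; fetch Packages.gz and zcat it if the plain index isn't published):

    curl -s https://download.docker.com/linux/ubuntu/dists/bionic/stable/binary-amd64/Packages \
      | grep -A 20 '^Package: containerd.io' | grep -E '^(Filename|Size|SHA256):'
    curl -s https://download.docker.com/linux/ubuntu/dists/xenial/stable/binary-amd64/Packages \
      | grep -A 20 '^Package: containerd.io' | grep -E '^(Filename|Size|SHA256):'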
<fungi> we should be mirroring those separately, not combining them    03:07
<ianw> i think we basically can not combine these repos (anymore?) ... this has been happening for a while    03:07
<ianw> like we'll have to have /afs/.openstack.org/mirror/deb-docker/<xenial|bionic|focal>/... right?    03:07
<fungi> er, i mean, we need to mirror those separately (obviously we're not yet)    03:07
<fungi> yeah    03:08
<fungi> basically docker doesn't understand how to maintain a deb package repository    03:08
<fungi> why use the pool model when you don't have a single pool?    03:08
<ianw> much like mnaser found, it never ceases to surprise me how much a "simple" change manages to turn into yak shaving :)    03:09
<mnaser> It's been two days of this :)    03:09
<ianw> fungi: i guess it must have worked, at some point.  anyway, i'll pull them out separately ...    03:10
<fungi> yeah, i expect it worked briefly until they decided to rebuild the same docker version for more than one dist    03:10
<fungi> normally you build against one distro release and then use it for more than one, or you vary the package version to include a unique identifier for the distro version so they don't wind up with the exact same filename. then they can share a package pool    03:12
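The convention fungi describes would look roughly like this in practice: keeping a distro identifier in the version string gives each per-dist rebuild a distinct filename, so the rebuilds can coexist in a shared pool (filenames below are illustrative, not docker's actual ones):

    containerd.io_1.2.13-2~ubuntu16.04_amd64.deb   # xenial rebuild
    containerd.io_1.2.13-2~ubuntu18.04_amd64.deb   # bionic rebuild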
<ianw> yeah, except the same filename has different sizes ... so i guess it's not as "same" as the name suggests :)    03:14
<fungi> well, the name is the same, but yes the actual package content differs    03:16
<fungi> probably built with different compilers, linked different libs, et cetera    03:17
<openstackgerrit> Ian Wienand proposed opendev/system-config master: Fix debian-docker mirroring  https://review.opendev.org/746262    03:21
<ianw> fungi: ^ something like ... maybe    03:22
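The fix being proposed splits the mirror into one reprepro tree per dist, along the lines of the /afs/.openstack.org/mirror/deb-docker/<xenial|bionic|focal>/ layout mentioned earlier. A minimal sketch of one per-dist distributions stanza (the details of the actual change may differ):

    # e.g. conf/distributions for the tree published at mirror/deb-docker/bionic/
    Origin: Docker
    Label: deb-docker-bionic
    Codename: bionic
    Architectures: amd64 arm64
    Components: stable
    Update: docker-bionic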
<fungi> ianw: yup, exactly. single-core approved so you can take it for a spin    03:48
<ianw> unfortunately airship references it a bit -- http://codesearch.openstack.org/?q=deb-docker&i=nope&files=&repos=    03:55
<ianw> i guess we can leave the old mirror as is, notify/fix and then remove the non-updating directories    03:55
<fungi> yeah, we ought to be able to deprecate that one without blowing it away immediately. it's not been updating for a while anyway    04:02
<openstackgerrit> Merged opendev/system-config master: Fix debian-docker mirroring  https://review.opendev.org/746262    04:04
<ianw> fungi: thanks ... well that went easier than i thought it would -> https://static.opendev.org/mirror/deb-docker/    04:49
<openstackgerrit> Ian Wienand proposed opendev/base-jobs master: Fix docker base location  https://review.opendev.org/746271    05:00
<openstackgerrit> Ian Wienand proposed opendev/system-config master: Update deb-docker path  https://review.opendev.org/746272    05:01
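These two follow-ups repoint consumers at the per-dist layout; the apt source line on test nodes would then look something like the sketch below, reusing the variables from the template quoted at the top of the log (the rendered value is illustrative):

    deb {{ docker_mirror_base_url }}/{{ ansible_lsb.codename }} {{ ansible_lsb.codename }} {{ docker_update_channel }}
    # e.g. deb https://<mirror>/deb-docker/bionic bionic stable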
*** zbr1 has joined #opendev    05:05
*** zbr has quit IRC    05:06
*** zbr1 is now known as zbr    05:06
*** ianw has quit IRC    05:48
*** ianw has joined #opendev    05:49
*** logan- has quit IRC    05:49
*** hashar_ has quit IRC    05:50
*** logan- has joined #opendev    05:51
*** logan- has quit IRC    05:59
*** logan- has joined #opendev    05:59
*** DSpider has joined #opendev    06:01
<ianw> corvus/clarkb/fungi: i've made some decent progress on getting manylinux2014_aarch64 wheels and put notes in https://github.com/pyca/cryptography/issues/5292#issuecomment-671759306    06:59
<ianw> if you could have a look at https://review.opendev.org/746245 that will help installing docker on arm64 nodes    07:00
<ianw> a recheck on https://github.com/pyca/cryptography/pull/5386 should then attempt to build the wheels    07:00
*** hashar_ has joined #opendev    07:33
*** hashar_ is now known as hashar    07:35
<openstackgerrit> Andreas Jaeger proposed zuul/zuul-jobs master: Disable E208 for now  https://review.opendev.org/746310    07:51
<openstackgerrit> Andreas Jaeger proposed zuul/zuul-jobs master: Disable E208 for now  https://review.opendev.org/746310    07:57
*** moppy has quit IRC    08:01
*** moppy has joined #opendev    08:03
*** hashar has quit IRC    08:34
*** andrewbonney has joined #opendev    08:45
*** tkajinam has quit IRC    08:47
<openstackgerrit> Merged zuul/zuul-jobs master: Disable E208 for now  https://review.opendev.org/746310    09:02
<openstackgerrit> Merged zuul/zuul-jobs master: ensure-pip: add instructions for RedHat system  https://review.opendev.org/743750    09:02
*** tosky has joined #opendev    10:22
*** ryo_hayakawa has joined #opendev    10:53
*** ryohayakawa has quit IRC    10:55
*** hipr_c has joined #opendev    11:54
*** hashar has joined #opendev    12:18
*** ryo_hayakawa has quit IRC    12:26
*** priteau has joined #opendev    14:02
*** priteau has quit IRC    14:23
*** priteau has joined #opendev    14:32
*** hashar has quit IRC    14:46
*** qchris has quit IRC    14:57
*** mlavalle has joined #opendev    15:08
*** auristor has quit IRC    15:08
*** qchris has joined #opendev    15:09
*** auristor has joined #opendev    15:31
*** tosky has quit IRC    15:43
<openstackgerrit> Clark Boylan proposed opendev/system-config master: Add gerrit static files that were lost in ansiblification  https://review.opendev.org/746335    16:34
<clarkb> upgrading from 2.13 -> 2.14 with the images from https://review.opendev.org/#/c/745595/ has a working javamelody now. Going to work through the 2.15 and 2.16 upgrades today and assuming javamelody and codemirror-editor are all happy I think we should try to land that change    16:53
<clarkb> trying to take my time and poke around and ensure things are working. Our css and all that seems to work with 2.14 too    16:54
<clarkb> one thing that isn't clear to me is the function of the accountPatchReviewDb database. It has an empty table in my test setup, but that is one thing I/we want to test as it has changed and we modified it on 2.13 to make it work with mysql    16:54
*** andrewbonney has quit IRC    16:56
<fungi> that was in h2 previously, right?    16:58
<fungi> and then moved to sql temporarily?    16:58
<clarkb> I think we never put it in h2 due to performance issues we were warned about    16:58
<clarkb> the default was to h2 it though    16:58
<fungi> right, but that's why it's a separate sql db and not in the regular reviewdb    16:59
<clarkb> but it didn't work with mysql due to index width issues so we changed the primary index on the table    16:59
<clarkb> then upstream themselves have changed the index for other reasons    16:59
<clarkb> (so we may end up wanting to change it again?)    16:59
<clarkb> that's probably something we can sort out on its own as an open question while I work through the general "do things work" testing    16:59
<clarkb> oh you know, it may be for inline comments    17:00
<clarkb> and I haven't made any of those yet    17:00
* clarkb makes a note to retest when done with inline comments    17:01
<donnyd> just an FYI it would appear something is busted in the latest bionic image for ipv6    17:48
<clarkb> donnyd: our nodepool image?    17:49
<donnyd> I am testing again, could have been a fluke    17:49
<donnyd> yes, it does appear as though it is an issue.. testing some of the other images now    17:51
<fungi> have a link/log/something?    17:51
<donnyd> yes    17:52
<donnyd> https://usercontent.irccloud-cdn.com/file/VWEQSyMR/image.png    17:52
<donnyd> that would be an issue    17:52
<donnyd> LOL    17:52
<donnyd> may have been a partial upload or something    17:52
<donnyd> https://usercontent.irccloud-cdn.com/file/lb3qCX5z/image.png    17:53
<donnyd> that appears to be a bit small    17:54
<fungi> indeed, that doesn't look ipv6-specific    17:54
<donnyd> no, it does not    17:54
<fungi> we can delete the most recent image and nodepool will reupload    17:55
<donnyd> initially I was looking at why I couldn't ping the instances    17:55
<fungi> does it seem to just be ubuntu-bionic affected?    17:55
<donnyd> so the tinkering I was doing with ipv6 led me down that road    17:55
<donnyd> I am going to check them all    17:56
<openstackgerrit> Merged zuul/zuul-jobs master: ensure-docker: remove amd64 architecture pin  https://review.opendev.org/746245    17:57
<donnyd> the rest of them look ok so far    17:59
<fungi> in that case i'll just delete ubuntu-bionic-1597381079 from openedge-us-east    18:02
<fungi> looks like it was uploaded just shy of 12 hours ago    18:03
<donnyd> oh I already deleted it from OE.. should I not do that?    18:03
<fungi> i may still be able to delete it from nodepool, not sure if i'll have to manually remove the znodes from zk    18:03
<fungi> well, it took my nodepool image-delete command at least. checking nodepool image-list now to see if it's gone or at least marked for deletion    18:07
<fungi> it's marked as "deleting" so that's a good start    18:07
<clarkb> I think we'll clear our db if we don't see it in the cloud listing    18:08
<fungi> | 0000114341 | 0000000001 | openedge-us-east    | ubuntu-bionic        | ubuntu-bionic-1597381079        | 1fec9ab7-d54c-4167-9d37-ed8226c82098 | deleting  | 00:00:01:41  |    18:09
<fungi> | 0000114341 | 0000000002 | openedge-us-east    | ubuntu-bionic        | None                            | None                                 | uploading | 00:00:01:38  |    18:09
<clarkb> infra-root https://review.opendev.org/#/c/745595/ seems to make things better for javamelody and codemirror-editor upgrading from 2.13 -> 2.14, 2.14 -> 2.15, and 2.15 -> 2.16    18:09
<fungi> that looks like the behavior we want    18:09
<clarkb> I'll leave a note with that info on the change but I think it's ready to land and address some of these issues. Then we can keep pushing on this    18:10
<clarkb> fungi: donnyd: I'm at a good spot to context switch out of gerrit things (in fact I think I'll do that regardless), let me know if I can help at all    18:12
<fungi> looks like this is probably fine, unless we see the same truncated image appear when the upload completes    18:13
<donnyd> I will check the new one when it's finished    18:13
<clarkb> we've seen it a few times. Maybe more frequently lately (or we've just noticed more recently?)    18:14
<clarkb> I wonder if it could be related to us updating the docker image for the builder in a forceful way    18:14
<clarkb> and it just stops the upload and upstream thinks we are done    18:14
<clarkb> (really really really wish the hashes were checked)    18:14
<fungi> yeah, i was about to say the same    18:14
<fungi> interestingly, the short image has been sitting in a deleting state for almost 10 minutes now    18:15
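The verification clarkb is wishing for can at least be approximated by hand: compare the local build artifact against what glance recorded for the upload. This assumes the image was uploaded without format conversion; glance's legacy checksum field is an md5 of the stored data, so a conversion on upload would make the values differ even for a good upload:

    # on the builder
    md5sum /opt/nodepool_dib/ubuntu-bionic-0000114341.qcow2
    # against the cloud's record
    openstack image show 1fec9ab7-d54c-4167-9d37-ed8226c82098 -f value -c checksum -c size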
<clarkb> infra-root I started to compose https://etherpad.opendev.org/p/gerrit-upgrade-luca-questions    18:34
<fungi> clarkb: lgtm    18:38
<clarkb> thanks. I won't send it today as it will get lost in end of week email but early monday is my goal    18:39
*** fressi has joined #opendev    18:58
<fungi> clarkb: i don't think the truncated images from 12 hours ago was related to container restarts. start time for dumb-init on both amd64 builders is 10 days ago    18:58
<fungi> er, truncated image (singular)    18:58
<clarkb> the mystery deepens    18:59
<clarkb> any network errors in dmesg?    18:59
<fungi> hard to say, since dib's chroot activity floods the ring buffer    19:00
<fungi> it's all full of dracut file listings and the like    19:00
<clarkb> also I think we've only seen this happen with vexxhost and openedge, both run newer clouds. Maybe a glance side bug    19:01
<clarkb> donnyd: ^ anything logged around that upload that is suspicious from your side?    19:01
<donnyd> that is directly possible    19:01
<donnyd> I am on ussuri    19:01
<fungi> the vexxhost occurrence sounded like it was possibly related to storage changes which were happening, and the image was only a few bytes not several gigabytes    19:01
<donnyd> I have no central logging setup yet    19:02
<donnyd> but it's on my list of things to do    19:02
<donnyd> checking glance logs one by one now    19:03
<donnyd> No errors in glance logs    19:05
*** fressi has quit IRC    19:13
<fungi> i'm struggling to find any record of us uploading that image. it's probably just friday fumbles on my end    19:17
<clarkb> fungi: we do multiple log rotations a day, may be in older log gile    19:20
<clarkb> *file    19:20
<fungi> yeah, i was looking through those too, just they're huge so i'm still nailing down a good pattern to match on    19:20
<fungi> i can find the upload for 0000114340 just fine, but not 0000114341    19:22
<clarkb> maybe grep by the uuid?    19:25
<fungi> tried that too    19:25
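For reference, the sort of search being attempted, run on each builder and assuming the logs live under /var/log/nodepool as they do elsewhere in this log:

    grep -rl -e 0000114341 -e 1fec9ab7-d54c-4167-9d37-ed8226c82098 /var/log/nodepool/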
*** hashar has joined #opendev    19:26
<fungi> donnyd: btw, the reupload completed a little over an hour ago, does it look larger to you now?    19:27
<fungi> clarkb: what's strange is i can't even find the reupload logged either. it's like there are other builders than just the ones i'm checking, or something    19:27
<fungi> oh! yep, that's it :/    19:28
<fungi> i forgot we added an nb04    19:28
<fungi> and there it is    19:28
* fungi sighs    19:28
<clarkb> friday :)    19:29
<fungi> 2020-08-14 04:57:59,624 INFO nodepool.builder.UploadWorker.2: Uploading DIB image build 0000114341 from /opt/nodepool_dib/ubuntu-bionic-0000114341.qcow2 to openedge-us-east    19:29
<fungi> 2020-08-14 06:25:03,747 INFO nodepool.builder.UploadWorker.2: Image build ubuntu-bionic-0000114341 (external_id 1fec9ab7-d54c-4167-9d37-ed8226c82098) in openedge-us-east is ready    19:30
<fungi> so that's the one which was short, and which i deleted a couple hours ago    19:30
<clarkb> the time delta there seems like it actually uploaded the data    19:31
<fungi> and i can't find any related errors/tracebacks    19:33
<fungi> though it's a little hard to say for sure because there's a flood of centos-8 upload errors which look like a corrupt znode    19:33
<fungi> json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)    19:33
<fungi> one of those    19:33
<fungi> well, more like thousands of those    19:34
<fungi> skimming the `tree /nodepool/images` output from zkshell to see what's there for it    19:35
<clarkb> fungi: that's the thing pabelanger asked about and I pointed out a change for. I think we can rm the znode as it is empty    19:37
<clarkb> I should check if I need to rereview that change    19:37
<fungi> yeah, except the log doesn't say which znode it is    19:37
<fungi> or is that the proposed change, to actually report the znode?    19:37
<clarkb> yes that    19:38
<clarkb> it also skips over the node and ignores it    19:38
<fungi> i don't actually see any empty build nodes for centos-8 though    19:43
<fungi> tons for ubuntu-bionic    19:43
<fungi> er, sorry, no, these are ubuntu-xenial-arm64    19:43
<fungi> also centos-8-arm64    19:44
<fungi> ubuntu-bionic-arm64    19:44
<fungi> beginning to see a pattern here    19:45
<fungi> anyway, this tree is 13k lines long, so i'm not quite sure what needs deleting from it    19:47
<clarkb> we have to find the empty znode    19:48
<clarkb> the cleanup process bails on that which is why we leak    19:48
<clarkb> I can take a look in a few    19:48
<fungi> i guess there's a zkshell to dump all the data from every znode so we can mechanically search for the empty one?    19:48
<fungi> er, a zkshell command    19:49
<clarkb> I'm guessing it's in an earlier numbered one    19:49
<clarkb> and everything behind it has leaked    19:49
<fungi> also is centos-8 upload failure an indication that it's somewhere under the centos-8 portion of the tree, or is that just happenstance?    19:49
<clarkb> not sure if we have such a command    19:49
<clarkb> that I don't know    19:49
<fungi> it's at least strange that it only seems to be complaining about centos-8 uploads in the logs    19:50
<donnyd> had to go do some friday tasks    19:52
<donnyd> checking image now    19:52
<donnyd> much mo betta    19:53
<donnyd> https://usercontent.irccloud-cdn.com/file/ajhBvffw/image.png    19:53
<fungi> good, good    19:53
<donnyd> going to fire one up to make sure it works right    19:54
<fungi> trying to see if maybe there are build ids for centos-8 in the zk tree which haven't been uploaded. the ones in a ready state for any providers are 0000078539 and 0000078540    19:56
<fungi> the only one zk has besides those is 0000078538 in vexxhost-ca-ymq-1 and that has no upload id nor lock    19:57
<fungi> could that be the problem entry?    19:57
<fungi> maybe the delete for that failed in a weird way and leaked an incomplete entry in zk?    19:58
<clarkb> fungi: ya that one is weird, but I think it may be not cleaning up because there is an empty upload record somewhere    19:59
<clarkb> the cleanup thread basically gets hit by this issue and it noops    19:59
<clarkb> and ya this is like a needle in a haystack    19:59
<clarkb> we might need to write a zk script to iterate through all of them and search?    19:59
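A sketch of the manual hunt being described, using the same zk-shell session as the `tree` command mentioned earlier; the problem record is one whose data is empty, which is exactly what trips the JSON decode:

    # inside zk-shell
    tree /nodepool/images/centos-8/builds
    get /nodepool/images/centos-8/builds/0000078538
    # an empty reply from get is the kind of znode the cleanup thread chokes on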
<fungi> it does seem to be the only anomalous subset of the centos-8 portion of the tree though    20:00
<donnyd> ok, it's good to hook    20:00
<clarkb> fungi: ya but it's failing a json decode, not a node-doesn't-exist    20:01
<fungi> thanks donnyd!!!    20:01
<clarkb> I'm going to hack the old install of nodepool on nl01 root (not container) to log the location when it fails to dib-image-list    20:02
*** hipr_c has quit IRC    20:04
<clarkb> fungi: that seems to show it is the nodes like the one you found (there are more of them though)    20:07
<fungi> i'm getting roped into dinner prep now, but can try cleaning those up after unless you're already on it    20:09
<clarkb> I'm still trying to make sense of the problem    20:09
<clarkb> because those nodes have data if you get them but if you ls in them they don't have subtrees    20:10
<clarkb> ah ok the centos-8 one is empty    20:11
<clarkb> but the others aren't so maybe it's just that one causing problems    20:11
<clarkb> fungi: you think delete /nodepool/images/centos-8/builds/0000078538 and its sub entries?    20:12
<fungi> that's where i'd start at least    20:13
<fungi> it's the only build which nodepool image-list doesn't show    20:13
<fungi> for centos-8 anyway    20:13
<clarkb> ok I'll do that now    20:13
<fungi> and that's the only label currently getting errors in the log    20:13
<fungi> so it's my somewhat unfounded guess    20:14
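The cleanup clarkb performs next amounts to removing that build's subtree from ZooKeeper, roughly as follows (assuming zk-shell's recursive delete):

    # inside zk-shell
    rmr /nodepool/images/centos-8/builds/0000078538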
<clarkb> I can dib image list now    20:15
<clarkb> I'm undoing my hack to the script on nl01    20:15
<clarkb> also we seem to have a sad arm64 builder    20:15
<fungi> that znode deletion caused nb02 to report: 2020-08-14 20:14:22,615 INFO nodepool.builder.BuildWorker.0: Deleting old build log /var/log/nodepool/builds/centos-8-0000078538.log    20:16
<fungi> or maybe it was coincidental    20:16
<clarkb> we probably have something that cleans those up too when the images go away    20:16
<clarkb> I'm guessing not coincidental    20:17
<fungi> i'm guessing not either    20:17
*** priteau has quit IRC    20:19
<fungi> the "ERROR nodepool.builder.UploadWorker.6: Error uploading image centos-8 ..." messages seem to have not reappeared since 20:14:19    20:22
<clarkb> our dib image list count is falling too    20:22
<clarkb> (it's cleaning up all the failed arm64 builds that leaked)    20:22
<clarkb> fungi: go eat dinner, I think we are good here    20:22
<fungi> awesome, thanks for the help!    20:23
<clarkb> I've rereviewed swest's change that should help and left a note in #zuul that we hit this in the getBuilds not getUploads path    20:24
<clarkb> (I think we may need similar guards there?)    20:24
*** hashar has quit IRC    20:48
*** DSpider has quit IRC    23:10
*** DSpider has joined #opendev    23:11
*** mlavalle has quit IRC    23:13
<ianw> $ ls ./wheelhouse.final/cryptography-3.1.dev1-cp3    23:19
<ianw> cryptography-3.1.dev1-cp36-cp36m-manylinux2014_aarch64.whl  cryptography-3.1.dev1-cp37-cp37m-manylinux2014_aarch64.whl  cryptography-3.1.dev1-cp38-cp38-manylinux2014_aarch64.whl  cryptography-3.1.dev1-cp39-cp39-manylinux2014_aarch64.whl    23:19
<openstackgerrit> Matthew Thode proposed openstack/diskimage-builder master: update gentoo to allow building arm64 images  https://review.opendev.org/746000    23:21
*** DSpider has quit IRC    23:34
<fungi> w00t!    23:40
<fungi> prometheanfire wants a piece too, looks like    23:41
<prometheanfire> wat?    23:42
<fungi> the aarch64/arm64 goodness    23:42
<prometheanfire> :D    23:43
<prometheanfire> indeed    23:43
