Sunday, 2022-04-10

00:16 *** dviroel_ is now known as dviroel|out
12:39 <opendevreview> Jeremy Stanley proposed opendev/git-review master: Force use of scp rather than sftp when possible  https://review.opendev.org/c/opendev/git-review/+/823413
13:35 <opendevreview> Jeremy Stanley proposed opendev/git-review master: Fix submitting signed patches  https://review.opendev.org/c/opendev/git-review/+/823318
13:35 <opendevreview> Jeremy Stanley proposed opendev/git-review master: Drop support for Python 3.5  https://review.opendev.org/c/opendev/git-review/+/837222
15:57 <opendevreview> Jeremy Stanley proposed opendev/git-review master: Clean up package metadata  https://review.opendev.org/c/opendev/git-review/+/837228
16:25 <mnaser> is there some sort of 'afs cache' ?
16:26 <mnaser> ok, literally right as i ask that, the file i needed appeared into afs :)
16:26 <fungi> if you're asking about delays, it depends on what file/url you're looking at as to what the update process is
16:26 <fungi> documentation site? package mirrors? release artifacts? something else?
16:27 <mnaser> https://tarballs.opendev.org/vexxhost/ansible-collection-atmosphere/
16:27 <mnaser> the generated wheels there took around 4 minutes to show up after the promote job was done
16:34 <fungi> mnaser: yeah, so what happens is the publish job records the artifacts into the read-write afs volume for that site, and then a cronjob periodically (every ~5 minutes) runs through all the afs volumes for the static.o.o sites (including tarballs) and performs a vos release to sync them to the read-only replica which backs those sites
16:35 <fungi> usually you should see it appear within 5 minutes, but if there are particularly large content updates for any one of those volumes it can delay things since they're updated serially in order to avoid saturating the connection
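
For context, the periodic sync described above could be wired up with a cron
entry along these lines. This is a hedged sketch only: the task name, script
path, and schedule are assumptions for illustration, not OpenDev's actual
system-config, and the placeholder script is imagined to loop over the site
volumes running "vos release <volume>" one at a time so a large update on one
volume cannot saturate the link.

    # Hypothetical Ansible task installing the periodic "vos release" cron job.
    - name: Periodically release AFS volumes backing static.opendev.org
      ansible.builtin.cron:
        name: release-static-volumes
        minute: "*/5"              # roughly the ~5 minute cadence mentioned above
        user: root
        # release-volumes.sh is a placeholder: it would iterate the read-write
        # volumes (tarballs, docs, etc.) serially and run "vos release" on each,
        # syncing them to the read-only replicas that serve the sites.
        job: /usr/local/bin/release-volumes.sh
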
16:39 <mnaser> aaah, got it, that makes sense now, thanks fungi !
16:41 <fungi> my pleasure
18:10 <opendevreview> Mohammed Naser proposed opendev/system-config master: docker: add arm64 mirroring  https://review.opendev.org/c/opendev/system-config/+/837232
18:10 <mnaser> supporting arm64: https://www.youtube.com/watch?v=AbSehcT19u0
18:52 <hrw> mnaser: commented.
20:47 <mnaser> I’ve been seeing this weird experience where jobs with an explicit nodeset of “Ubuntu-focal” take longer to start than one that doesn’t have a node set at all (which defaults to focal..)
21:08 <fungi> mnaser: that does indeed seem weird. both should be served the same under the hood since the jobs without an explicit nodeset actually have an explicit nodeset through inheritance: https://opendev.org/opendev/base-jobs/src/branch/master/zuul.d/jobs.yaml#L65
21:09 <fungi> it's not any less explicit, just inherited from a parent job. the result would be identical in either case though... zuul putting out a node request for that same nodeset definition
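
To illustrate the inheritance being described, here is a minimal Zuul
configuration sketch. The job names are hypothetical and the real base job in
opendev/base-jobs carries many more settings; the point is only that a job
which omits nodeset inherits the ubuntu-focal nodeset pinned on its parent, so
both forms result in the same node request.

    # Hypothetical Zuul config sketch; not the actual opendev/base-jobs content.
    - nodeset:
        name: ubuntu-focal
        nodes:
          - name: ubuntu-focal
            label: ubuntu-focal

    - job:
        name: base
        nodeset: ubuntu-focal   # inherited by any child job that omits its own nodeset

    - job:
        name: my-job-implicit
        parent: base            # no nodeset here, so ubuntu-focal comes from base

    - job:
        name: my-job-explicit
        parent: base
        nodeset: ubuntu-focal   # explicit, but produces an identical node request
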
21:12 <mnaser> fungi: yeah.. maybe it’s something else that is causing the delay…
21:13 <mnaser> In a specific case, it took 8 minutes for a job to start
21:14 <mnaser> https://zuul.opendev.org/t/vexxhost/builds .. ansible-collection-atmosphere-build-images-wallaby-amd64 finished at 20:44 and ansible-collection-atmosphere-build-images-manifest-wallaby started at 20:50
21:14 <mnaser> So almost 6 minutes waiting and things are pretty idle right now
21:14 <fungi> node launch failures can cause significant delays since the launcher will lock the request while it waits for a node to boot (if none of them have one waiting via min-ready)
21:15 <fungi> though i think the launcher waits for up to 10 minutes for the nodes to become reachable, so if the delay was less than that it could just be some providers taking longer than usual to boot
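
The launcher-side waits and retries mentioned here map to per-provider settings
in the Nodepool configuration. A minimal sketch, assuming the OpenStack
driver's boot-timeout and launch-retries options, with illustrative values and
a made-up provider name rather than OpenDev's real settings:

    providers:
      - name: example-cloud   # hypothetical provider, not one of OpenDev's
        driver: openstack
        cloud: example-cloud
        boot-timeout: 600     # seconds to wait for a booted instance to become reachable
        launch-retries: 3     # failed launches are retried this many times before the request errors
        # pools and labels omitted for brevity
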
21:16 <fungi> any correlation between start delays and the providers mentioned in the zuul inventory?
21:16 <fungi> we also have some graphs of boot times, i think. i'll look
21:20 <mnaser> fungi: there might have been some failures in providers since I saw some jobs retrying too but didn’t dig too deep as to why they did
21:20 <mnaser> But yeah in general focal nodes in the VEXXHOST tenant seem to take a little bit longer to come by. Actually, I find that we get an arm64 node WAY faster, even if the other tenants are relatively idle
21:21 <mnaser> In this case the amd64 job started a whole 4 minutes after
21:22 <fungi> https://grafana.opendev.org/d/6c807ed8fd/nodepool?orgId=1&viewPanel=18
21:23 <mnaser> I wonder if there just isn’t enough min-ready and my wait time is say… waiting for rax
21:23 <fungi> looks like ovh nodes were taking a while at times
21:23 <fungi> yeah, i mean we don't run many arm64 jobs so the min-ready there might be covering you and explain the faster starts on a fairly quiet sunday
21:24 <mnaser> yeah I think that might add up to the reasoning why
21:27 <fungi> given the volume of jobs we run most of the time, we optimize for throughput and resource conservation over immediacy of results
21:28 <fungi> to make any impact on responsiveness at higher-volume times, we'd need to carry a very large min-ready for some of our labels
21:28 <fungi> which would then result in a lot of nodes sitting booted but idle at times like now
21:30 <fungi> also no amount of min-ready would make any difference in response times when we're running at full capacity, of course
21:31 <fungi> and could even result in a slight reduction in effective capacity if we end up aggressively booting labels which aren't in as high demand at those times
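
The min-ready trade-off being weighed here is configured per label in Nodepool.
A minimal sketch with made-up numbers (and an assumed arm64 label name) showing
how a small pool of pre-booted nodes trades idle capacity for faster job starts:

    labels:
      - name: ubuntu-focal
        min-ready: 2               # keep a couple of focal nodes booted and waiting
      - name: ubuntu-focal-arm64   # assumed label name, for illustration only
        min-ready: 1               # even one ready node makes jobs start quickly on a quiet day
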
21:38 <mnaser> yeah, scheduling is a tricky thing since there is so many varying types of systems
21:39 *** rlandy is now known as rlandy|out
23:49 <opendevreview> Merged openstack/project-config master: opendev/gerrit : retire project  https://review.opendev.org/c/openstack/project-config/+/833939
