Wednesday, 2021-02-03

00:06 <fungi> dansmith: one shortcut might be to patch the job to fetch the webserver access logs (if it doesn't already?) and see when it was getting requests
00:07 <clarkb> fungi: I was looking for those and couldn't find them
00:07 <clarkb> it's possible they are there and I was just not seeing them though
00:08 <dansmith> fungi: yeah
00:08 <dansmith> but
00:09 <dansmith> my butt has been in this chair since I got up at 0430 for a meeting so I better go see what things further than six feet look like
00:10 <fungi> yeah, i get that ;)
00:15 <clarkb> dansmith: they are very wet and rainy right now
00:21 <dansmith> clarkb: not in my kitchen :)
09:12 <jcapitao> hello folks, we are facing timed-out connections when accessing tarballs.opendev.org
10:15 <Tengu> seems to be answering now, but reeeeeally slow
12:42 <openstackgerrit> Pedro Luis Marques Sliuzas proposed openstack/project-config master: Add Metrics Server App to StarlingX  https://review.opendev.org/c/openstack/project-config/+/773883
13:12 <fungi> jcapitao: Tengu: apache got itself tied in knots on that server, not entirely sure how/why, but it was restarted around 10:20 utc and that seems to have cleared it up
13:16 <Tengu> fungi: ok. maybe some tweaked queries making it crash? though it doesn't seem to have any dynamic content - just plain "auto index", isn't it?
13:17 <fungi> Tengu: yes, it's serving static files from global afs, and doing mod_autoindex in cases where there's no index file
13:18 <Tengu> fungi: maybe afs crashed for #reason? it's a network filesystem/share iirc?
13:19 <fungi> well, apache itself was still serving content, just slowly, and restarting apache caused it to clear up
13:19 <Tengu> or... maybe I'm mixing things. iirc there's a special afs thing for openstack infra
13:19 <fungi> afs is a global filesystem (like the way dns is a global database)
13:19 <Tengu> fungi: might be due to locked inodes locking the httpd process at some point, making it slow as hell due to lack of resources...
13:20 <Tengu> or something like that - not sure if that afs has an "inode" concept...
13:20 <fungi> and yes, looking in dmesg this was logged three times by the kernel at 10:04:33 utc: "Waiting for busy volume 536870992 () in cell openstack.org" so there was some temporary blip at that time, but that somehow left apache in a bad state
13:21 <Tengu> :)
13:21 <Tengu> maybe that mod_autoindex doing some funny things in the back and unhappy with that blip...
13:24 <fungi> 536870992 is the docs volume's read-only replica, served from afs01.dfw.openstack.org and afs01.ord.openstack.org
13:24 <fungi> both of those servers seem to be up and happy though
13:25 <Tengu> network glitch?
13:25 <fungi> highly likely
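For anyone retracing that diagnosis, a rough Python sketch of the dmesg check fungi describes; the message format is copied from the line quoted above, and the grouping by volume id is purely illustrative:

```python
#!/usr/bin/env python3
"""Scan kernel messages for OpenAFS "busy volume" warnings."""
import re
import subprocess
from collections import Counter

# -T asks dmesg for human-readable timestamps (supported on modern kernels)
out = subprocess.run(["dmesg", "-T"], capture_output=True, text=True).stdout

# Matches e.g.: Waiting for busy volume 536870992 () in cell openstack.org
pattern = re.compile(r"Waiting for busy volume (\d+) \((.*?)\) in cell (\S+)")

hits = Counter()
for line in out.splitlines():
    m = pattern.search(line)
    if m:
        volume, name, cell = m.groups()
        hits[(volume, cell)] += 1
        print(line)

for (volume, cell), count in sorted(hits.items()):
    print(f"volume {volume} in cell {cell}: {count} occurrence(s)")
```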
13:25 <openstackgerrit> Merged openstack/project-config master: CentOS 8 Stream initial enablement for AArch64  https://review.opendev.org/c/openstack/project-config/+/772887
13:26 <fungi> i'm checking our network graphs for the involved systems to see if maybe there was bandwidth starvation
13:26 <openstackgerrit> Pedro Luis Marques Sliuzas proposed openstack/project-config master: Add Metrics Server App to StarlingX  https://review.opendev.org/c/openstack/project-config/+/773883
13:28 <fungi> there's been a fairly steady volume of outbound traffic from afs01.dfw which started up around that time, though it's not substantial, just mildly anomalous
13:29 <Tengu> stop that torrent service then ;)
13:29 * Tengu runs away
13:29 <fungi> heh
13:34 <fungi> i don't find corresponding inbound traffic on the other fileservers, so i don't think it was a volume release
13:34 <fungi> but it also could be entirely unrelated
13:46 <frickler> fungi: the slowness of static.o.o was well before those latest "busy volume" msgs appeared, they weren't there when I checked the server initially today, so pretty sure they're not the trigger for this event
13:46 <fungi> cool, so probably an unrelated coincidence
13:47 <fungi> oh, right, it was mentioned in #opendev at 08:36z
13:47 <fungi> so it had already been going on for at least an hour or two
15:58 <clarkb> the daily periodic jobs do start at ~0600 iirc
15:58 <clarkb> though I'm unsure if they will actually get nodes quickly since zuul demand has been high
16:19 <openstackgerrit> Matt McEuen proposed openstack/project-config master: New Project Request: airship/gerrit-to-github-bot  https://review.opendev.org/c/openstack/project-config/+/773936
18:10 <openstackgerrit> Merged openstack/project-config master: Update ACLs of Ironic Projects to allow Edit Hashtags  https://review.opendev.org/c/openstack/project-config/+/772427
18:23 <openstackgerrit> Merged openstack/project-config master: Remove anachronistic jobs from scciclient  https://review.opendev.org/c/openstack/project-config/+/772908
18:28 <dansmith> how do I get the js for this? https://zuul.opendev.org/t/openstack/job/tripleo-ci-centos-8-containers-multinode
18:28 <dansmith> adding .js doesn't do it
18:29 <dansmith> clarkb: ^
18:29 <dansmith> er, I mean the json of course
18:29 <clarkb> let me see
18:30 <fungi> dansmith: what json are you looking for? do you mean the yaml for the job definition?
18:30 <dansmith> fungi: isn't that an html rendering of a data structure?
18:30 <clarkb> ya, and it should be json
18:30 <fungi> dansmith: oh! the json from the zuul api, got it
18:30 <dansmith> yeah
18:31 <fungi> quick batman, to the page source
18:31 <clarkb> https://zuul-ci.org/docs/zuul/reference/web.html is the doc, but it seems to not have the job lookup defined
18:31 <dansmith> maybe I need to restrict my Accept header?
18:31 <clarkb> dansmith: https://zuul.opendev.org/api/tenant/openstack/job/tripleo-ci-centos-8-containers-multinode
18:32 <dansmith> ah there it is, thanks
18:32 <fungi> clarkb is much faster at reverse-engineering things than i am
18:33 <fungi> but yeah, it's just rendered from the job method
18:34 <clarkb> fungi: I took a leap that the job listing api and job detail api would be similar
18:34 <clarkb> I got lucky :)
18:35 <fungi> almost like someone designed a reasonably intuitive api. shocker
18:35 <fungi> (and kudos to the zuul developers!)
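A minimal sketch of that lookup in Python with requests; the response shape (a list of job variants, each optionally carrying a nodeset) is an assumption based on poking at the public API, so treat the field names as illustrative:

```python
#!/usr/bin/env python3
"""Fetch a job definition from the Zuul REST API endpoint shown above."""
import requests

ZUUL_API = "https://zuul.opendev.org/api/tenant/openstack"
JOB = "tripleo-ci-centos-8-containers-multinode"

resp = requests.get(f"{ZUUL_API}/job/{JOB}", timeout=30)
resp.raise_for_status()

# The endpoint appears to return a list of job variants (e.g. one per
# branch matcher); a variant with no explicit nodeset inherits its parent's.
for variant in resp.json():
    nodeset = variant.get("nodeset") or {}
    nodes = nodeset.get("nodes") or []
    print(variant.get("name"), [n.get("label") for n in nodes])
```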
18:48 <dansmith> fungi: clarkb: I'm trying to count up the total node usage time for a given run on a patch
18:48 <dansmith> does this look legit? https://pastebin.com/9CW9sSZh
18:48 <dansmith> that comes from this patch: https://review.opendev.org/c/openstack/glance/+/770682
18:48 <dansmith> just picked at random
18:49 <clarkb> looking
18:49 <dansmith> actually
18:49 <dansmith> heh
18:49 <dansmith> that has no multinode jobs apparently
18:50 <dansmith> let me pick a nova one
18:50 <fungi> seems reasonable, how are you extracting the node count? from the inventory?
18:50 <fungi> and yeah, a change with some multi-node jobs would help
18:50 <clarkb> ya, was going to mention it looks right, but there's no multinode job to verify the multiplication that implies
18:50 <clarkb> and to clarify, those numbers are basically the base patchset cost (without accounting for gate or rechecks)
18:51 <clarkb> an ideal node-hours count
18:51 <fungi> right, things like gate resets, build retries, et cetera all pile on top of that
18:51 <fungi> rechecks can be covered a little more directly
18:52 <clarkb> but we may also not need that info if we want to compare assumed cost
18:52 <fungi> but any builds internally aborted/discarded/rerun by zuul won't be reported
18:52 <clarkb> like if one base cost is vastly different from another, we can use that for some comparisons
18:52 <fungi> and yeah, that can mostly be assumed as a fixed overhead cost percentage
18:52 <dansmith> https://pastebin.com/NdagG24Q
18:52 <openstackgerrit> Merged openstack/project-config master: Add Metrics Server App to StarlingX  https://review.opendev.org/c/openstack/project-config/+/773883
18:52 <fungi> though technically it will be greater in projects with more involved/longer gate queues
18:52 <dansmith> all I care about is base time, not penalizing job weight for rechecks.. just "ideally, this would take N hours of time"
18:53 <clarkb> dansmith: ya, I think what you've got there looks good with respect to that goal. Note that multinode jobs can have >2 nodes too, I expect you're already handling that but wanted to call it out just in case
18:53 <dansmith> rechecks meaning zuul retrying something or other transient things
18:53 <fungi> it's an excellent place to start (and also possibly stop, as being "good enough")
18:53 <dansmith> clarkb: yep, I'm counting nodeset['nodes']
18:54 <dansmith> that makes nova's count much higher than my off-the-cuff counting of the big jobs in my head
18:54 <fungi> "Job nova-grenade-multinode takes 2 nodes for 1h 18m 32s, total 2h 37m 4s" is that time already multiplied by 2, or should it be multiplied after to get a total utilization?
18:55 <clarkb> fungi: the first number is wall time and the second is node hours, is how I read it
18:55 <dansmith> yes
18:55 <dansmith> here's a tripleo one: https://pastebin.com/suaYz7hE
18:55 <fungi> yeah, so 2 * 1h 18m 32s is 2h 37m 4s
18:55 <fungi> sorry, i was being dense ;)
18:56 <fungi> both are included
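A sketch of the calculation being discussed: for one patchset, sum each build's wall time multiplied by its job's node count. It assumes the builds endpoint accepts change/patchset/pipeline filters and reports duration in seconds; jobs without an explicit nodeset are counted as one node, which will undercount some of them.

```python
#!/usr/bin/env python3
"""Estimate 'ideal' node hours for one patchset, per the discussion above."""
import requests

ZUUL_API = "https://zuul.opendev.org/api/tenant/openstack"

def node_count(job_name):
    """Nodes in the job's current definition (see later caveats on this)."""
    for variant in requests.get(f"{ZUUL_API}/job/{job_name}", timeout=30).json():
        nodeset = variant.get("nodeset")
        if nodeset and nodeset.get("nodes"):
            return len(nodeset["nodes"])
    return 1  # assumption: no explicit nodeset means a single-node job

def patchset_node_hours(change, patchset, pipeline="check"):
    builds = requests.get(
        f"{ZUUL_API}/builds",
        params={"change": change, "patchset": patchset, "pipeline": pipeline},
        timeout=30,
    ).json()
    total = 0.0
    for build in builds:
        if not build.get("duration"):
            continue  # no recorded runtime (e.g. skipped or aborted build)
        nodes = node_count(build["job_name"])
        cost = build["duration"] * nodes  # seconds * nodes
        total += cost
        print(f"{build['job_name']}: {nodes} node(s), "
              f"{build['duration'] / 3600:.2f}h wall, {cost / 3600:.2f} node-hours")
    print(f"total: {total / 3600:.1f} node-hours")

# hypothetical invocation against the glance change picked above
patchset_node_hours(770682, 1)
```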
18:56 <dansmith> only about 50% more than nova, which is surprising given the 40% of the resources that clark's script gives, vs nova's 10
18:57 <fungi> of course, the number of change revisions, number of rechecks issued, times a build failed in pre-run or due to presumed network issues and was retried automatically, times a slew of builds were tanked by a gate reset and restarted... all that adds up
18:58 <dansmith> well, not counting manual rechecks or volume of commits, is there really any meat in the "automatically retried by zuul" bucket?
18:59 <dansmith> if there is, it'd be good to know that, but I guess I thought compared to a 4h tripleo job run, probably not so much
19:00 <dansmith> in the past, when nova was the big user, the high patch volume for nova was an argument to cut down the amount of stuff we run per commit, so I think that's still valid here
19:00 <dansmith> very few glance commits means it's not really a big deal how much they run
19:00 <dansmith> (for example)
19:01 <clarkb> ya, glance's bigger concern will be returning results back to their devs in a reasonable amount of time and with a reasonable chance of success (more jobs means more chance for failure)
19:01 <clarkb> the measure may still be worthwhile in that context
19:02 <dansmith> well, that's the point of this.. not to make the tripleo jobs run faster or slower, but because actual result time is sooooo long right now
19:02 <dansmith> but yeah, glance doing 10h worth of work, if it has to be serialized due to unavailable workers, after a 4h wait, is no good
19:03 <dansmith> it sounds like tripleo has some jobs they can cut out of their current setup right now, so that may put them closer to where nova is
19:04 <dansmith> I still don't have a good way to determine commit volume per project, to know if tripleo has like a thousand changes a day such that their jobs would need to be smaller, or whether nova is still high in terms of review pushes per day compared to others
19:05 <clarkb> fungi: ^ may have scripting tools that can figure that out (or similar enough)
19:05 <dansmith> yeah I tried one of his but it crashed
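Change volume per project is straightforward to pull from the Gerrit REST API; a hedged sketch (Gerrit prefixes its JSON with ")]}'" to defeat XSSI, and anonymous query results are capped, hence the paging):

```python
#!/usr/bin/env python3
"""Count recent changes per project via the Gerrit REST API."""
import json
import requests

GERRIT = "https://review.opendev.org"

def count_changes(project, age="7d"):
    total, offset = 0, 0
    while True:
        resp = requests.get(
            f"{GERRIT}/changes/",
            params={"q": f"project:{project} -age:{age}", "n": 500, "S": offset},
            timeout=30,
        )
        # strip Gerrit's ")]}'" anti-XSSI prefix before parsing
        batch = json.loads(resp.text.lstrip(")]}'\n"))
        total += len(batch)
        if not batch or not batch[-1].get("_more_changes"):
            return total
        offset += len(batch)

for project in ("openstack/nova", "openstack/tripleo-heat-templates"):
    print(project, count_changes(project))
```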
19:06 <dansmith> tripleo uses 20 nodes for that run, compared to nova's 27 FWIW
19:06 * dansmith also thinks we have fat we can cut from nova
19:07 <clarkb> the node counts would be another interesting stat to track too. Since once you've got a node it's yours for ~3 hours until it times out. But you may have to wait for more nodes if you need more to run the job set
19:08 <clarkb> My hunch is that this will be less useful than node hours, since node hours more accurately capture the pool usage
19:08 <clarkb> but I could be wrong
19:10 <dansmith> yeah, there's definitely a "but how much of this hour were you *wasting* a node" angle, but that's harder
19:10 <clarkb> dansmith: was your example for tripleo from a tripleo-heat-templates change?
19:11 <clarkb> based on the zuul log scraping I did, that repo uses the most overall node hours
19:11 <dansmith> clarkb: it was
19:11 <dansmith> I will clean this up and pastebin it
19:12 <clarkb> and that's a 13.8% tht vs 8.96% nova in my scrape of the last week
19:12 <clarkb> which is closer to a 20 vs 31 hours ratio
19:13 <dansmith> true, I was comparing all of tripleo
19:14 <clarkb> 1.54 ratio vs 1.55 according to python float math
19:15 <clarkb> based on that I'd like to say "that means tht and nova have similar change rates in gerrit" but I don't think we can make that leap yet (due to rechecks and potential for gate thrashing). It is a good sign that things seem to line up between two different viewpoints
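Spelled out, the float math being compared (numbers from the two viewpoints above):

```python
# share of total node hours in clarkb's one-week zuul log scrape
print(13.8 / 8.96)  # tht vs nova -> 1.5401..., the "1.54 ratio"
# dansmith's per-patchset estimates: roughly 31h (tht) vs 20h (nova)
print(31 / 20)      # -> 1.55
```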
19:16 <dansmith> yeah
19:16 <fungi> dansmith: also a single change for a project is likely a poor sample size for resources used, given rampant application of file and branch filters
19:17 <clarkb> ya, https://review.opendev.org/766980 in the gate right now is running more jobs than in dansmith's example (which is why I asked if the source was tht), but I think it's still a good starting point and we can take more samples and see what median/average/whatever looks like
19:18 <fungi> but yeah, revision count points to developer/feedback loop efficiency, rechecks point to job instability, automated retries possibly to jobs doing unorthodox things which could crash test nodes, et cetera
19:18 <dansmith> yeah, obviously there's a potential for under-counting there
19:20 <fungi> the average revision count will be higher, for example, in projects which have a culture of very long change stacks, when developers rebase most of a stack to address a review comment on something toward the beginning
19:21 <fungi> something as simple as avoiding unnecessarily stacking commits together when they're not strictly dependent could make a significant difference
19:22 <fungi> i mean, squashing commits could help in similar ways, but honestly i wouldn't advocate for workflow changes which reduce reviewability for the sake of resource savings
19:23 <dansmith> clarkb: that one you linked above is 31.5h, not too much more
19:23 <dansmith> although that was a check run, not gate
19:23 <dansmith> any chance their gate is larger than check?
19:23 <clarkb> oh interesting, it could be I suppose
19:24 <fungi> usually the opposite, jobs might run in check which are omitted in the gate, but sure, it's possible they've done the reverse for some reason
19:24 <clarkb> we generally suggest that gate should always be a subset of check (so that you have confidence that when things go into the gate they will pass, as they already passed in check)
19:24 <dansmith> https://termbin.com/eslv
19:24 <dansmith> fungi: right, but that's why I'm asking
19:25 <fungi> also trimming in dependent pipelines like gate is likely to have a bigger impact than in independent pipelines like check, because of the "gate reset" factor
19:25 <dansmith> yup
19:25 <dansmith> that's basic gating 101 to me now, but maybe that has been lost over the years
19:28 <dansmith> so to be honest, both nova and tripleo seem pretty fat to me, given that one computer couldn't likely finish all the work done on *each patch* in a friggin day
19:28 <dansmith> but the tripleo bigness seems less bigly than I was expecting
19:29 <openstackgerrit> Matt McEuen proposed openstack/project-config master: New Project Request: airship/gerrit-to-github-bot  https://review.opendev.org/c/openstack/project-config/+/773936
19:29 <clarkb> dansmith: fungi: looking at the check jobs for https://review.opendev.org/c/openstack/tripleo-heat-templates/+/766980/ it ran 19 jobs in check and has 17 in gate. It appears the two missing in gate were non-voting in check
19:30 <clarkb> so that all checks out from a "gate is a subset of check" perspective
19:30 <dansmith> yeah
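The subset check clarkb did by eye can be scripted against the same builds endpoint used earlier (same filter assumptions apply); a sketch:

```python
import requests

ZUUL_API = "https://zuul.opendev.org/api/tenant/openstack"

def jobs_for(change, pipeline):
    builds = requests.get(
        f"{ZUUL_API}/builds",
        params={"change": change, "pipeline": pipeline},
        timeout=30,
    ).json()
    # a set dedups retried builds of the same job
    return {b["job_name"] for b in builds}

check_jobs = jobs_for(766980, "check")
gate_jobs = jobs_for(766980, "gate")
print("gate is a subset of check:", gate_jobs <= check_jobs)
print("check-only jobs (likely non-voting):", sorted(check_jobs - gate_jobs))
```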
19:30 <fungi> dansmith: it would at least be cromulent not to embiggen them further
19:30 <dansmith> for sure, and I think both could shrink down
19:31 <dansmith> in nova, I've proposed to stop running two grenades that have 100% overlap, but there's a ceph zuulv3 thing that has to be worked out first
19:31 <dansmith> as an example
19:31 <dansmith> and I think that the tripleo people are open to dropping some or all of those current nv jobs, as well as one of theirs that was covering an upgrade scenario that has recently passed
19:31 <clarkb> I think it's more likely that the difference in jobs I noticed between dansmith's paste example and the change in the gate is based on file matchers triggering different sets of jobs for different kinds of changes to the repo
19:32 <dansmith> and that's goodness from a resource conservation perspective
19:32 <clarkb> dansmith: worth mentioning the "experimental" queue is a good place for things that we'd like to be able to run on demand but don't need often
19:32 <dansmith> clarkb: yep
19:32 <clarkb> that way you can balance dropping jobs against needing to run them occasionally
19:32 <dansmith> especially things that we know fail or fail a lot and are just for traffic lighting
19:33 <dansmith> traffic lighting? maybe "sniff testing"
19:33 <dansmith> anyway
19:35 <dansmith> so, funny story
19:35 <dansmith> I picked a neutron one
19:35 <dansmith> 54.5h
19:36 <clarkb> wow
19:36 <dansmith> slaweq: ^
19:37 <dansmith> hate to think what it was before the recent patch to drop a bunch of co-gating
19:37 <dansmith> clarkb: also, 42 nodes
19:37 <fungi> experimental has a low priority... is it lower than check priority though? if memory serves we're still limited to three priority levels because of the gearman protocol, and release activities fall into the top priority, then gating...
19:37 <dansmith> vs 22 for a tripleo run
19:37 * fungi tries to figure it out
19:38 <clarkb> fungi: zuul and nodepool have the ability to do finer-grained priority than that, but I don't know if we expose it that way in configs
19:39 <clarkb> and ya, experimental and check should be roughly at the same level. More that if you know you need something you trigger the experimental jobs, but 90% of the time they don't incur any overhead
19:39 <dansmith> also I hardly ever see the experimental queue on my dashboard, but that might be because there's nothing from nova in it
19:39 <fungi> check and experimental are currently both precedence low in https://opendev.org/openstack/project-config/src/branch/master/zuul.d/pipelines.yaml
19:39 <dansmith> Job neutron-tempest-dvr-ha-multinode-full takes 3 nodes for 1h 53m 11s, total 5h 39m 33s
19:39 <dansmith> hoo boy
19:39 <fungi> oh, right, the point is that things in the experimental pipeline are not automatically triggered, only run when someone issues a "check experimental" comment
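Triggering it is literally just leaving that comment on the change; most people type it in the web UI, but for completeness, an illustrative Gerrit REST call (the change number and credentials are placeholders for a Gerrit HTTP password):

```python
import requests

GERRIT = "https://review.opendev.org"
CHANGE = 770682  # hypothetical target change
AUTH = ("myuser", "my-http-password")  # placeholder credentials

resp = requests.post(
    f"{GERRIT}/a/changes/{CHANGE}/revisions/current/review",
    json={"message": "check experimental"},
    auth=AUTH,
    timeout=30,
)
resp.raise_for_status()
```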
19:40 <clarkb> https://zuul-ci.org/docs/zuul/reference/pipeline_def.html#attr-pipeline.precedence ya, we still only expose the three levels but I'm pretty sure internally it's a range 0-999 or similar
19:41 <clarkb> dansmith: and now we know the math works for >2 nodes :)
19:41 <dansmith> heh
19:44 <dansmith> clarkb: is there an appropriate git tree of random infra tools I could throw this in instead of a pastebin?
19:49 <openstackgerrit> Merged openstack/project-config master: New Project Request: airship/gerrit-to-github-bot  https://review.opendev.org/c/openstack/project-config/+/773936
19:52 <fungi> we've got a tools directory in opendev/system-config which has been a bit of a dumping ground
19:53 <fungi> not certain that's a tradition we want to continue, but maybe
19:53 <dansmith> not a big deal, just figured I'd ask if there was somewhere obvious
19:57 <slaweq> dansmith: clarkb: the problem for us is that this multinode dvr job is really the only job which is testing dvr (except the grenade one, but that isn't covering everything)
19:58 <dansmith> slaweq: ack, it's not just that job that is the problem, it's just the biggest
19:58 <slaweq> dansmith: I know
19:58 <slaweq> our problem is that we have many backends and config combinations to test
19:58 <slaweq> and because of that we have many jobs
19:58 <dansmith> slaweq: sure, lots of projects have many more combinations than they can test
20:00 <fungi> also it's not absolutely necessary to run every test for every backend, in theory you could only run tests you expect to have differing results/code paths for those backends
20:01 <dansmith> and, sometimes you have to say "-W, let's run experimental on this patch to get coverage from $subsystem"
20:01 <fungi> odds are the code tested by jobs for two different backends is mostly the same
20:01 <slaweq> fungi: yes, I thought about defining some lists of unrelated files for various jobs
20:01 <slaweq> I will try to propose something soon
20:01 <fungi> well, I'm thinking less in terms of file filters for jobs and more about what tests are run in which jobs, but sure, both could help
20:02 <slaweq> fungi: ok, I will take a look at that too
20:02 <dansmith> yes, irrelevant_files, tempest test filters, and sometimes human intervention knowing when to run an optional job
20:02 <slaweq> dansmith: I will work on improvements there
20:02 <dansmith> slaweq: thanks, definitely appreciated :)
20:02 <fungi> and dansmith's suggestion on using the experimental pipeline for resource-intensive jobs which test a tiny slice of the codebase is good too
20:02 <fungi> (tiny compared to the overlap with other automatically run jobs, i mean)
20:08 <dansmith> clarkb: I can't fetch the zuul job definition for the devstack-platform-async job I added in here: https://review.opendev.org/c/openstack/devstack/+/771505
20:08 <dansmith> whyfornot?
20:09 <clarkb> dansmith: first guess is because it is a new job that hasn't merged to the data set yet
20:09 <clarkb> so it only exists in the context of that change when that change is tested
20:09 <dansmith> okay
20:11 <dansmith> I guess that makes sense, but I also didn't think that zuul would be collecting all defined jobs somehow either
20:11 <dansmith> devstack is low volume but I thought it maybe had a lot of resources because it runs multinode things,
20:11 <dansmith> but 16h and 23 nodes
20:12 <clarkb> I think one of the big improvements devstack did was it dropped a bunch of tempest testing
20:12 <dansmith> yeah
20:12 <clarkb> it tests the devstack portion of tempest jobs in a number of ways but then doesn't worry about tempest a whole lot anymore
20:17 <fungi> as far as querying the job method to figure out node counts, yeah, that will only work for jobs in merged configuration, so jobs which are proposed in a change and executed speculatively won't be queryable that way
20:17 <fungi> but also it's a toctou (time-of-check to time-of-use) situation, where you may be querying the current node count for a job, which was not the node count at the time a build of it ran in the change you're looking at
20:18 <dansmith> yeah, clearly this requires running on fresh jobs
20:18 <fungi> you can probably safely ignore/discard those situations from a statistical standpoint, but you do have to at least not break when you hit them
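In code terms, that means hardening the earlier node_count() sketch for both cases fungi describes: a 404 for a job that only exists speculatively, and the time-of-check caveat on whatever the API returns today:

```python
import requests

ZUUL_API = "https://zuul.opendev.org/api/tenant/openstack"

def node_count_or_none(job_name):
    resp = requests.get(f"{ZUUL_API}/job/{job_name}", timeout=30)
    if resp.status_code == 404:
        return None  # speculative/unmerged job: skip it rather than break
    resp.raise_for_status()
    for variant in resp.json():
        nodeset = variant.get("nodeset")
        if nodeset and nodeset.get("nodes"):
            # NB: this is the job's *current* definition; the node count when
            # the build actually ran may have differed (the toctou caveat),
            # so the result is an estimate, not a record.
            return len(nodeset["nodes"])
    return 1
```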
20:18 <dansmith> is there any sort of post-buildset hook where we could run this, and then be able to plot the big ones out of logstash or something?
20:19 <dansmith> we would need the job runtimes some other way obviously
20:19 <fungi> might just make more sense to emit them via statsd and be able to query from graphite / in a grafana dashboard
20:20 <clarkb> fungi: the trouble there is I don't think we do that sort of data collection at a patchset level
20:20 <clarkb> since graphite collapses things down
20:20 <clarkb> but I could be wrong about that (I did look around in graphite a bit yesterday and wasn't finding a better way to do it via its data)
20:20 <fungi> yeah, it would definitely require choosing our preferred aggregation up front
20:20 <clarkb> oh, the other issue was we do more organization by job, not buildset
20:21 <clarkb> there are probably ways to bend it to our will for this, but I'm not immediately aware of what that process would be
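For flavor, a sketch of what the statsd emission could look like with the Python statsd package; the host, prefix, and metric naming here are invented for illustration, and per-buildset granularity would still collapse into whatever rollup graphite is configured for:

```python
import statsd  # pip install statsd

client = statsd.StatsClient("graphite.example.org", 8125, prefix="nodeusage")

def emit_buildset(project, node_hours):
    # statsd timers want milliseconds; sending node-hours through a timer
    # gets percentile rollups for free, though a gauge would also work
    metric = project.replace("/", ".")
    client.timing(metric, node_hours * 3600 * 1000)

emit_buildset("openstack/nova", 31.0)  # hypothetical sample value
```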
20:48 <dansmith> slaweq: sorry, I think I picked a stable/ patch to run that on.. the master number is 38h not 54h, but node count is still quite high at 32
20:49 <dansmith> slaweq: maybe stable needs some tripleo-ectomy like you did for master?
20:50 <fungi> though also, if stable branch changes are far less frequent, the gain from spending effort to improve their efficiency could be small
20:50 <dansmith> yep
20:50 <fungi> like if there are 100 master branch changes for every stable backport, not a lot of point
20:51 <dansmith> at 54h and 42 nodes, I'm not sure it's not worth it, but.. I know the scale is very different
21:36 <openstackgerrit> Merged openstack/project-config master: Add ansible-role-pki repo  https://review.opendev.org/c/openstack/project-config/+/773385
21:37 <fungi> and waiting for that to deploy
21:37 <fungi> er, wrong channel
22:42 <dansmith> clarkb: https://zuul.opendev.org/t/openstack/build/ad126256fe7b4b3e9454dbd6a6532ec7/log/job-output.txt#25680
22:42 <dansmith> Speedup: 1.336
22:43 <clarkb> just over a 5 minute time savings? that's pretty good when you multiply it by the number of devstack jobs run
22:43 <dansmith> yup
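Back-of-the-envelope, those two numbers pin down the job's wall time, and the aggregate savings scale with run volume (the runs/day figure below is purely hypothetical):

```python
speedup = 1.336
saved_min = 5.0                    # "just over a 5 minute time savings"
new = saved_min / (speedup - 1)    # T_new = savings / (speedup - 1) ~ 14.9 min
old = new * speedup                # ~ 19.9 min before the change
print(f"old ~{old:.1f} min, new ~{new:.1f} min")
print(f"at 1000 devstack-based runs/day: ~{1000 * saved_min / 60:.0f} node-hours/day saved")
```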

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!