Tuesday, 2021-01-19

*** tosky has quit IRC00:02
*** dingyichen has joined #openstack-infra00:03
*** dychen has quit IRC00:03
dansmithclarkb: am I wrong that running unit and functional tests on multiple python versions on every patch is maybe contributing to why queue times are so long lately?00:09
dansmithdon't those all consume a worker for almost as much time during setup as actually running?00:10
dansmithtoday nova is running py36,38,39 for unit and 38,39 for functional which seems a little much00:10
dansmithand also, couldn't we combine like py38 with cover and maybe even pep8 to make those all run on one worker and avoid setup time?00:11
dansmithyeah, 15m to run tests, but ~30m for the job00:17
dansmith(for unit)00:17
*** dciabrin_ has quit IRC00:17
dansmith9m to run 2m of pep800:17
dansmith15m to run 8m of cover00:18
fungithe devstack jobs take many times longer, and some of them occupy multiple nodes00:20
dansmith15+2+8=25 minutes of actual runtime, vs 55 minutes of "cpu time" to run them separately, I think00:21
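For reference, a back-of-the-envelope version of that comparison, using the per-job timings quoted above; the assumption that a combined job would pay roughly one unit-job's worth of setup/teardown overhead is illustrative, not measured:

```python
# Rough node-time comparison using the figures quoted above:
# unit tests: ~15m of tests in a ~30m job, pep8: 2m in 9m, cover: 8m in 15m.
tests = {"unit": 15, "pep8": 2, "cover": 8}
jobs = {"unit": 30, "pep8": 9, "cover": 15}

actual_runtime = sum(tests.values())   # 25 minutes of real test time
separate_jobs = sum(jobs.values())     # ~54 minutes of node time as three jobs

# If all three tox envs ran in one job, only one setup/teardown is paid;
# assume the unit job's ~15 minutes of overhead is representative.
combined_job = actual_runtime + (jobs["unit"] - tests["unit"])  # ~40 minutes

print(actual_runtime, separate_jobs, combined_job)  # 25 54 40
```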
*** lxkong0 has joined #openstack-infra00:21
fungiso yes reducing job count could help some, but reducing devstack jobs would have more impact than reducing unit test or linting jobs00:21
dansmithyeah I know, I'm just wondering if we're wasting worker time re-setting up a basically identical environment just so we can see each test broken out in the report?00:21
dansmithI really wonder if we need to be running grenade *and* grenade-multinode for example00:23
fungiwell, re-setting up an environment  which is "nearly identical" to some small percentage of the overall jobs run. we have hundreds of such "not quite the same" environments and trying to maintain pre-set-up copies of all of them would also require a ton of resources (and increase complexity many times over)00:23
dansmithaside from bugs in grenade, I wouldn't think that we're actually missing any coverage on the multinode case00:23
fungiwe've already tried to use pre-set-up environments in the past and keeping them maintained ends up being far more work than letting them get created at job runtime, but also makes changes to the environment setup itself testable directly00:24
fungier, building them at job runtime makes them more directly testable i mean00:24
dansmithfungi: right, but if I make a job run tox -epy38,pep8,cover, those should run all in the same pre-set-up worker and just burn some pip time right?00:24
dansmithfungi: yeah, I'm talking about using a more common base config and running more tox envs in a row on it, not trying to make the image more specific00:25
fungithey would, though you'll need to go digging in a much larger job log to figure out whether what broke was linting, unit tests or your coverage report00:25
dansmithyeah, I'm okay with that00:26
fungiimproving efficiency of the environment setup might be more effective00:26
dansmithI guess that seems like more infra work to me, and thought you were arguing against it just a few lines ago :)00:26
fungilike not installing things the job doesn't need just because it's easier to maintain a single list of dependencies than task-specific dependency sets00:26
funginope, talking about job configuration00:27
dansmithI'm just saying, if we're spending 50% of our time booting a "basic ubuntu image" worker to run 2m of something, that seems like a waste just so we can have the jobs called out separately00:27
funginot infrastructure00:27
dansmithokay, I see.. obviously if we can make the setup time faster then that's good,00:27
fungiwe spend on average closer to 30 seconds to boot an ubuntu image i think. job setup spends a lot of time installing things people aren't actually using in their jobs00:28
dansmithbut if jamming a few things into the same worker means we don't have to micro-optimize, I dunno.. seems easier00:28
fungimight be, i'm not arguing against trying it00:28
fungifolks already do that with linters in some projects00:29
fungii've not seen anyone cram linting and unit tests into the same job, but it should be doable00:29
dansmithso, the pep8 job I'm looking at,00:29
dansmithran pep8 for 2 minutes, and was done with that 5m into the job, but the job took another 4 minutes, presumably to clean up and post logs00:29
dansmithwell, I run tox -epy38,pep8 locally a lot.. because the pep8 output is short enough that I can scroll up and see the unit test output above it, even if I have a few lines of pep8 fail00:30
dansmithwell, those numbers aren't quite right because it looks like the job start time must not be at time zero in the log, so maybe it's more front-loaded.. about 40s of cleanup after we're done with pep800:31
dansmithso I assume that's create and boot time or something, which is part of what would be saved00:32
dansmithanyway, I'm just really worried that we're at an 8h turnaround time on a monday00:32
dansmithand looking at everything we're running in a nova job these days, it seems like we should pare that down00:33
fungias we've said in the past, the biggest impact you can make on node utilization is to ferret out nondeterministic failures in projects/tests which burn a ton of nodes by having to retest changes and discarding lots of other builds00:33
dansmithsure, and I'm still trying to land such a fix from last week :)00:34
dansmithI definitely continue to push on people to do that,00:34
dansmithbut as we noted last week, it sounds like maybe some job configs have grown a little heavy00:34
dansmithclarkb said he wasn't seeing a lot of resets when we had a >24h turnaround time last week00:34
fungithe longer gate queues and the gate failures i see at the moment are for tripleo, yeah00:36
fungibut also zuul's very nearly caught up from earlier today at this point00:37
dansmithfungi: the head of the nova queue is 8h old00:37
fungiwe peaked at a backlog of 1.7k nodes and are down to just being 400 behind now00:37
dansmithif tripleo has a lot of fails (and we know they have heavy jobs) we probably also need to focus attention there00:38
fungiin the next hour or so i expect all changes in all pipelines will have node assignments filled00:38
fungilooks like puppet-openstack also just tagged 26 releases00:44
fungiwell, "just" nearly three hours ago00:45
*** jamesmcarthur has joined #openstack-infra00:46
*** jamesmcarthur has quit IRC00:52
*** JanZerebecki[m] has joined #openstack-infra01:17
*** jamesmcarthur has joined #openstack-infra01:19
*** jamesmcarthur has quit IRC01:19
*** jamesmcarthur has joined #openstack-infra01:20
*** jamesmcarthur has quit IRC01:20
*** jamesmcarthur has joined #openstack-infra01:26
*** jamesmcarthur has quit IRC01:57
*** ysandeep|away is now known as ysandeep02:07
*** jamesmcarthur has joined #openstack-infra02:18
*** jamesmcarthur has quit IRC02:23
*** jamesmcarthur has joined #openstack-infra02:23
*** rcernin has quit IRC02:26
*** jamesmcarthur has quit IRC02:29
*** jamesmcarthur has joined #openstack-infra02:33
*** jamesmcarthur has quit IRC02:34
*** rcernin has joined #openstack-infra02:42
*** rcernin has quit IRC02:44
*** rcernin has joined #openstack-infra02:44
*** jamesmcarthur has joined #openstack-infra02:56
*** verdurin has quit IRC03:02
*** verdurin has joined #openstack-infra03:07
*** jhesketh_ has joined #openstack-infra03:40
*** jhesketh has quit IRC03:41
*** jhesketh_ is now known as jhesketh03:43
*** lxkong0 is now known as lxkong03:47
*** zzzeek has quit IRC03:48
*** zzzeek has joined #openstack-infra03:51
*** ricolin has joined #openstack-infra03:54
*** ysandeep is now known as ysandeep|pto03:55
*** lbragstad has quit IRC04:16
*** ykarel has joined #openstack-infra04:18
*** zzzeek has quit IRC04:33
*** zzzeek has joined #openstack-infra04:35
*** guillaumec has quit IRC05:40
*** guillaumec has joined #openstack-infra05:44
ykarelfrickler, hberaud fyi tarballs are available and our jobs are passing now06:00
*** ykarel_ has joined #openstack-infra06:16
*** ykarel has quit IRC06:19
*** vishalmanchanda has joined #openstack-infra06:20
*** ykarel_ is now known as ykarel06:29
*** sboyron has joined #openstack-infra06:33
*** jamesmcarthur has quit IRC07:06
*** amoralej|off is now known as amoralej07:15
*** rcernin has quit IRC07:19
*** xek has joined #openstack-infra07:22
*** ralonsoh has joined #openstack-infra07:27
*** lpetrut has joined #openstack-infra07:39
hberaudykarel: ack, thanks for the heads up07:41
*** nightmare_unreal has joined #openstack-infra07:44
*** eolivare has joined #openstack-infra07:47
*** slaweq has joined #openstack-infra07:48
*** yamamoto has quit IRC07:53
*** jcapitao has joined #openstack-infra07:57
*** dciabrin_ has joined #openstack-infra08:00
*** dchen has quit IRC08:01
*** rpittau|afk is now known as rpittau08:07
*** yamamoto has joined #openstack-infra08:11
*** andrewbonney has joined #openstack-infra08:13
*** zzzeek has quit IRC08:14
*** zzzeek has joined #openstack-infra08:16
*** hashar has joined #openstack-infra08:22
*** tosky has joined #openstack-infra08:39
*** gfidente has joined #openstack-infra08:40
*** jpena|off is now known as jpena08:58
*** lucasagomes has joined #openstack-infra09:04
*** jamesmcarthur has joined #openstack-infra09:06
*** sboyron has quit IRC09:06
*** sboyron_ has joined #openstack-infra09:06
*** jamesmcarthur has quit IRC09:11
*** ricolin has quit IRC09:12
*** ricolin has joined #openstack-infra09:13
*** ociuhandu has joined #openstack-infra09:15
amoralejhi, may i get some attention on https://review.opendev.org/c/zuul/zuul-jobs/+/771105/ and https://review.opendev.org/c/zuul/zuul-jobs/+/770815 ?09:32
amoralejwe need it to get proper repos configured in centos8 stream jobs09:33
*** rcernin has joined #openstack-infra09:43
*** sboyron has joined #openstack-infra09:43
*** sboyron_ has quit IRC09:43
*** derekh has joined #openstack-infra09:43
openstackgerritHervé Beraud proposed openstack/project-config master: Adding irc notification for missing oslo projects  https://review.opendev.org/c/openstack/project-config/+/771392 09:49
*** hashar is now known as hasharOut09:59
*** sboyron_ has joined #openstack-infra10:05
*** sboyron has quit IRC10:08
*** ociuhandu has quit IRC10:19
*** ociuhandu has joined #openstack-infra10:24
*** ociuhandu has quit IRC10:24
*** ociuhandu has joined #openstack-infra10:25
*** rcernin has quit IRC10:28
*** ociuhandu has quit IRC10:29
*** ociuhandu has joined #openstack-infra10:58
*** ociuhandu has quit IRC11:12
*** ociuhandu has joined #openstack-infra11:12
*** rcernin has joined #openstack-infra11:13
*** ysandeep|pto is now known as ysandeep11:14
*** ociuhandu has quit IRC11:17
*** jcapitao is now known as jcapitao_lunch11:26
geguileoamoralej: is the second patch going to fix the centos-8 jobs that are trying https://mirror.bhs1.ovh.opendev.org/wheel/centos-8.3-x86_64 instead of the right one?11:27
geguileoamoralej: ignore me, it's not, that patch is for stream11:27
geguileoand I'm talking about centos-811:27
amoralejgeguileo, yes, it's probably a different issue11:28
geguileoamoralej: maybe you can point me in the right direction then...11:30
geguileocentos-8 nodeset is using an incorrect wheel mirror which is making some jobs fail...11:31
geguileoit's trying https://mirror.bhs1.ovh.opendev.org/wheel/centos-8.3-x86_64 11:31
geguileoand it should be https://mirror.bhs1.ovh.opendev.org/wheel/centos-8-x86_64 11:31
amoralejgeguileo, can you point me to a failing job?11:33
geguileoamoralej: https://zuul.opendev.org/t/openstack/build/6d6fb0dde981476ab9981fe80a093bf1 11:34
geguileoamoralej: I think the problem is the definition of "wheel_mirror" that uses {{ ansible_distribution_version }} instead of just the major version...11:35
geguileobecause we don't have a default in roles/configure-mirrors/vars/CentOS.yaml11:37
*** rcernin has quit IRC11:37
amoralejyes, looks like so11:37
geguileodo you know if all pip URLs use centos-8-x86_64 or if some use centos-8.3-x86_64  format?11:41
amoralejgeguileo, curiously it doesn't fail in centos7.9 jobs11:41
amoralejeven if the wheels dir does not exist11:41
amoralejgeguileo, i have no idea tbh11:41
geguileoamoralej: maybe because we are "lucky" and this bug is affecting that ansible version... https://github.com/ansible/ansible/issues/50141 11:41
geguileowhich reports 7 when it should be saying 7.911:42
amoralejno11:42
amoraleji see it's using 7.911:42
amoralejin fact in a centos8 run from some days ago it worked11:42
amoralejhttps://zuul.opendev.org/t/openstack/build/1d67a1289d3d417188d13a5f4451c60e/console 11:42
geguileommmmm, and what's the wheel mirror used to build alembic?11:42
geguileoamoralej: on that job alembic was already present in the system11:43
amoralejyes11:44
amoralejthat's what i'm seeing11:44
geguileoso it didn't have to build it...11:44
amoralejanyway it's clear that the mirror url is wrong11:45
geguileoamoralej: did you get to see the wheel mirror URL anywhere on that job?11:45
*** yamamoto has quit IRC11:45
amoralejit's what you pointed11:45
amoralejin configure-mirrors/defaults/main.yaml11:46
*** yamamoto has joined #openstack-infra11:46
*** yamamoto has quit IRC11:46
geguileoamoralej: yeah, but I meant the one actually being used by that job11:46
geguileoas in seen it in the logs11:46
*** yamamoto has joined #openstack-infra11:47
geguileofound it (I think)11:47
amoralejit needs to be overridden in CentOS.yaml11:47
*** yamamoto has quit IRC11:47
geguileoamoralej: that's what I'd like to confirm...11:48
amoralejgeguileo, i think it's in https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/configure-mirrors/vars/CentOS.yaml11:48
geguileoamoralej: yeah, that's where I need to add it11:48
*** yamamoto has joined #openstack-infra11:48
*** yamamoto has quit IRC11:48
geguileoamoralej: but I don't want to break 7.9 jobs just to fix 8 jobs11:48
*** yamamoto has joined #openstack-infra11:48
geguileoXD11:48
amoralejit loads in https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/configure-mirrors/tasks/mirror.yaml#L6 11:49
amoralejwell, you may even create a  CentOS-8.yaml11:49
amoralejif you prefer11:49
*** yamamoto has quit IRC11:49
amoralejbut i'd say it should not break centos711:49
geguileoamoralej: well, we either have the centos7 jobs with the wrong wheel mirrors now, or it could break them11:50
amoralejit has wrong mirror url11:51
geguileoamoralej: ok, will send the fix now11:52
amoralejcheck in https://zuul.opendev.org/t/openstack/build/b221eba358a5443990f6fd5809bde2b7 11:52
geguileoamoralej: yup, thanks11:53
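To make the breakage above concrete, the path construction at issue boils down to which distribution version fact feeds the URL; a small illustrative Python sketch (the fact values are examples, not pulled from the job logs):

```python
# Sketch of the wheel mirror URL problem discussed above: the mirrors are
# published per major version, but the default template used the full
# distribution version.
mirror = "https://mirror.bhs1.ovh.opendev.org"
arch = "x86_64"

# Example Ansible facts on a CentOS 8.3 node (illustrative values).
distro = "CentOS"
distro_version = "8.3"
distro_major_version = distro_version.split(".")[0]  # "8"

broken = f"{mirror}/wheel/{distro.lower()}-{distro_version}-{arch}"
fixed = f"{mirror}/wheel/{distro.lower()}-{distro_major_version}-{arch}"

print(broken)  # .../wheel/centos-8.3-x86_64 -- what the failing jobs used
print(fixed)   # .../wheel/centos-8-x86_64   -- what the jobs should use
```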
*** ramishra has quit IRC11:55
*** ociuhandu has joined #openstack-infra11:57
*** ramishra has joined #openstack-infra11:57
*** ricolin has quit IRC11:57
*** dpawlik has quit IRC11:57
*** logan- has quit IRC11:57
*** paladox has quit IRC11:57
*** fresta has quit IRC11:57
*** lifeless has quit IRC11:57
*** abhishekk has quit IRC11:57
*** gryf has quit IRC11:57
*** DinaBelova has quit IRC11:57
*** markmcclain has quit IRC11:57
*** paladox has joined #openstack-infra11:57
*** fresta has joined #openstack-infra11:58
*** ricolin has joined #openstack-infra11:58
*** lpetrut_ has joined #openstack-infra11:58
*** lifeless has joined #openstack-infra11:58
*** DinaBelova has joined #openstack-infra11:58
*** markmcclain has joined #openstack-infra11:58
*** abhishekk has joined #openstack-infra11:59
*** gryf has joined #openstack-infra11:59
*** logan- has joined #openstack-infra12:00
*** lpetrut has quit IRC12:01
*** ociuhandu has quit IRC12:10
*** ociuhandu has joined #openstack-infra12:13
*** eolivare_ has joined #openstack-infra12:14
*** dpawlik has joined #openstack-infra12:15
*** eolivare has quit IRC12:16
*** ociuhandu has quit IRC12:18
*** ajitha_ has joined #openstack-infra12:23
*** ociuhandu has joined #openstack-infra12:29
*** ociuhandu has quit IRC12:30
*** ociuhandu has joined #openstack-infra12:32
*** jpena is now known as jpena|lunch12:34
*** ajitha_ is now known as ajitha12:34
*** jcapitao_lunch is now known as jcapitao12:44
*** hasharOut is now known as hashar12:45
*** ttx has quit IRC12:51
*** rlandy has joined #openstack-infra12:51
*** eolivare_ has quit IRC12:53
*** yamamoto has joined #openstack-infra12:58
*** amoralej is now known as amoralej|lunch12:59
*** ykarel has quit IRC13:01
*** ttx has joined #openstack-infra13:04
openstackgerritLuigi Toscano proposed openstack/project-config master: cursive: prepare to move the jobs in-tree  https://review.opendev.org/c/openstack/project-config/+/771443 13:12
*** ykarel has joined #openstack-infra13:24
*** jamesmcarthur has joined #openstack-infra13:26
*** eolivare_ has joined #openstack-infra13:26
*** jpena|lunch is now known as jpena13:34
*** lbragstad has joined #openstack-infra13:37
*** zul has joined #openstack-infra13:40
*** yamamoto has quit IRC13:42
*** _erlon_ has joined #openstack-infra13:49
*** amoralej|lunch is now known as amoralej13:58
*** yamamoto has joined #openstack-infra14:15
*** yamamoto has quit IRC14:26
*** slittle1 has quit IRC14:32
*** akantek has joined #openstack-infra14:33
*** akantek has quit IRC14:34
*** dave-mccowan has quit IRC14:38
*** rcernin has joined #openstack-infra14:42
*** rcernin has quit IRC15:01
*** derekh has quit IRC15:20
*** derekh has joined #openstack-infra15:20
openstackgerritMerged openstack/project-config master: Add PTP Notification app to StarlingX  https://review.opendev.org/c/openstack/project-config/+/771235 15:22
*** gryf has quit IRC15:26
*** ociuhandu has quit IRC15:27
*** hashar is now known as hasharKids15:28
*** gryf has joined #openstack-infra15:31
*** sshnaidm|ruck is now known as sshnaidm|afk15:37
*** ysandeep is now known as ysandeep|dinner15:37
clarkbdansmith: fungi ya when we have looked at numbers in the past the long running multinode jobs so completely dwarf the other jobs that trying to optimize linting or even unittests for projects won't have a large effect15:42
*** dklyle has joined #openstack-infra15:43
dansmithclarkb: okay but, if every nova review is running five jobs that it doesn't need, even if they're small, I would think that would add up15:43
fungiwe do have data we can use to attempt to calculate how many node-hours we spend on various jobs and per project15:44
dansmithclarkb: I know it would be zuul changes and maybe some more noise, but have we ever considered batching the jobs into long and short? so that we get a quicker report of unit, functional, linting and then a later report of the heavy stuff?15:45
fungiwhich would probably lead to a more useful analysis, and less abstract conjecture15:45
clarkbdansmith: it is >0 but not significant. Last time we ran numbers tripleo alone was like 35% or something like that of resource usage and they don't do linting and unittests really15:45
clarkbit's all their multiple 3-hour jobs that quickly dominate15:45
clarkbI can see if I can run that script again today15:46
clarkbI think it also outputs job consumption which helps see it from the linting/unittests vs integration angle15:46
*** ysandeep|dinner is now known as ysandeep15:47
dansmithclarkb: can we figure out some relative stat? like hours per review or something like that?15:47
clarkbdansmith: yes, we've actually done that. What we found when we tried it is you get a lot more round trips and it doesn't help on the whole15:47
fungidansmith: some projects do hold longer jobs until their shorter jobs pass, but the down sides to that are 1. you may need additional patchsets when you find out that you have more than one error exposed in different jobs some of which weren't run the first time, and 2. it'll take longer to get a result because the jobs are no longer run in parallel15:47
*** zul has quit IRC15:47
dansmithclarkb: like, I want to compare them against other projects to say "tripleo has half as many patches as nova, but consumes 4x the bandwidth"15:48
clarkbdansmith: you should be able to do that as a derivative from the numbers the existing script prints out15:48
dansmithokay15:48
dansmithclarkb: a while ago we discussed per-project throttling such that a 100-patch series in nova didn't keep single-patch glance reviews from getting timely results15:50
fungiyeah, zuul has been doing that for a while now15:51
dansmithis that (a) still happening and (b) do the long serialized jobs defeat that because they use a lot of nodes and run a long time?15:51
fungiyes, a change which runs 20 3-hour multinode jobs gets weighted the same as a change which runs a docs build, from a "fair queuing" perspective15:51
dansmithokay15:52
dansmithand is the fairness across the project level or the git dep chain?15:52
dansmithI ask because I can't tell (by the seat of my pants) that my single-patch glance reviews go any quicker than my nova ones, when there's literally nothing in the queue for glance15:53
fungiit's per project queue, so in check that's basically at the project level, in gate it's at the dependent queue level (but you rarely observe that because the gate pipeline gets top priority anyway)15:53
dansmithack, okay15:53
dansmithso these long wide heavy jobs must be putting both glance and nova patches so deep into the "not even considered yet" queue that I can't tell15:54
fungiand it doesn't necessarily affect how fast the jobs run, it's just about prioritizing node requests, so if there's a backlog of node requests the projects with fewer changes get their node requests filled sooner15:54
dansmithaye15:55
*** ociuhandu has joined #openstack-infra15:55
fungifrom a "fairness" perspective it's far from perfect, but it's the best mechanism we were able to fit to the available data model and control points in the system15:57
clarkbdansmith: http://paste.openstack.org/show/jD6kAP9tHk7PZr2nhv8h/ the aggregation there uses openstack/governance/reference/projects.yaml to decide what is tripleo and neutron and so on15:57
dansmithdear $deity15:57
dansmithtripleo and neutron together use over 50%?15:57
clarkbthat shows things in openstack goverance consumed 95.5% of used cpu time. 30% of the total was tripleo jobs. 22% neutron and so on15:57
clarkbyes15:57
clarkbnote neutron runs tripleo jobs too15:58
dansmithyeah15:58
slaweqclarkb: we just discussed in our ci meeting moving some of those jobs to the periodic queue15:58
slaweqI will propose a patch in a few minutes15:58
dansmithslaweq: ++15:58
dansmithslaweq: note that nova is 5% on that chart :)15:59
clarkball openstack-tox-py36 jobs used about 1% of consumed resources15:59
dansmithclarkb: is there number-of-reviews data in that paste that I'm missing?15:59
clarkbso if we say the "lightweight" jobs are maybe 5% total you can optimize that but the dent is tiny15:59
clarkbdansmith: no you need to grab that from gerrit's api15:59
clarkbthe date range is in my report and it breaks it down by repo too16:00
clarkbwhich should be enough to ask gerrit for data (I think fungi may even have a script that does that bit?)16:00
*** ykarel is now known as ykarel|away16:00
*** ociuhandu has quit IRC16:01
*** diablo_rojo has joined #openstack-infra16:01
fungihttps://review.opendev.org/729293 aggregates by git namespace so all of openstack/ gets lumped together, but you could tweak the aggregation (it shards the queries by repo already for better pagination stability)16:01
*** sshnaidm|afk is now known as sshnaidm|ruck16:03
*** amoralej is now known as amoralej|off16:11
*** ociuhandu has joined #openstack-infra16:11
*** ykarel|away has quit IRC16:17
*** yamamoto has joined #openstack-infra16:26
*** yamamoto has quit IRC16:34
*** armax has joined #openstack-infra16:35
*** derekh has quit IRC16:43
*** lbragstad_ has joined #openstack-infra16:46
*** lpetrut_ has quit IRC16:48
*** slaweq has quit IRC16:48
*** rlandy_ has joined #openstack-infra16:48
*** slaweq has joined #openstack-infra16:49
*** jamesdenton has quit IRC16:49
*** gryf has quit IRC16:49
*** lbragstad has quit IRC16:49
*** rlandy has quit IRC16:49
*** jamesdenton has joined #openstack-infra16:49
*** rlandy_ is now known as rlandy16:50
*** gryf has joined #openstack-infra16:50
*** lbragstad_ is now known as lbragstad16:58
zbri do believe that we could improve the developer experience if we can find a way to prioritize low-resource jobs.16:59
clarkbzuul does already support it, developers can opt into it by modifying their job pipeline graphs17:00
zbrso far we used queues, but these are more per-project.17:00
clarkbI don't think it will be helpful, but the tool allows it and some are trying it aiui17:00
*** lucasagomes has quit IRC17:01
*** jamesmcarthur has quit IRC17:03
*** jamesmcarthur has joined #openstack-infra17:03
*** jcapitao has quit IRC17:05
fungiprioritizing low-resource jobs wouldn't necessarily get you results any sooner, unless those were all you were running17:05
dansmithfungi: well, that's why I was asking about two batches.. if you're relying on zuul to run pep8 for you then it's not going to help, but if you rely on it to run and find problems with python versions you don't have, then maybe17:08
dansmithfungi: there's also some locality of review, where I'd +2 something I saw pass functional tests and let the gate sort out merging based on whether devstack jobs worked,17:09
dansmithbut otherwise, I'll pretty much wait until I see the results, which right now is often "not today"17:09
dansmithmy queuing isn't as good as zuul, which means it might be "not until $owner pings me again"17:10
zbrwhile it is possible for each project to optimize how jobs are triggered (dependencies and fail-fast), there is very little incentive for them to do it mainly because that means "slow yourself down and spend extra effort doing it, for the greater good".17:10
dansmithnot the worst thing, but the whole point of this is to make machines improve life for humans :)17:10
clarkbdansmith: yes but at the same time the machiens have a limited set of resources (which seems to only shrink)17:11
dansmithzbr: that's certainly true, but it sounds like some projects obsess over their job optimization more than others, which makes some of us angry :)17:11
*** sshnaidm|ruck is now known as sshnaidm|afk17:11
fungiand an ever shrinking number of people managing them and developing the systems which run on top of them17:11
dansmithclarkb: I assure you I have a limited set of resources17:11
clarkbyes me too17:11
clarkbbut I keep getting asked to work miracles :)17:11
clarkbreality is this problem has existed for years17:12
dansmithare you referring to me?17:12
clarkbI've called it out for years17:12
clarkband no one has really cared until it all melts down and then its too late17:12
zbrif we look at the problem from the CI system point of view, where you want to optimize resource usage and maximize how fast jobs are tested in general, you may want to promote good-players (low resource users).17:12
clarkbdansmith: not just you, but it seems the demands on this team are higher than ever and we're smaller than ever17:12
clarkbzbr: low resources users aren't necessarily "good-players"17:12
*** _erlon_ has quit IRC17:13
clarkbit could be that low resource users allow more bugs in, which causes more gate resets in the long run17:13
dansmithclarkb: I'm sorry, I think I've asked you only for help understanding so far in 2021.. if that's asking too much then I'll go away17:13
clarkbdansmith: it's not asking too much, it's just really difficult to hear a lot of suggestions when we have been asking for help for years and we get the opposite. And that isn't to say you are the problem. It's systemic in the community17:14
clarkbthat script was originally written because the TC and others kept accusing new small projects of causing the queue backups when in reality it was openstack itself (and a small number of resource hogs)17:15
clarkband the timestamp on that file is ~201817:15
clarkbI'm just trying to keep the lights on most days anymore17:16
*** ociuhandu_ has joined #openstack-infra17:16
clarkbbeing able to add features to zuul (or even fix bugs in zuul) seems like a luxury17:16
clarkbanother related issue is the swapping in devstack jobs17:17
dansmithclarkb: sorry, I'm missing something.. I can't help with things like cloud quota or giving you warm bodies, all I can help with is either trying to understand, brainstorming other technical improvements, or trying to convince people to tweak/shrink their jobs17:17
*** zbr3 has joined #openstack-infra17:18
clarkbdansmith: yes, I think the ball has been in openstack's court for fixing these queue issues for several years now17:18
dansmithclarkb: if you've interpreted the brainstorming as zuul feature demands, then I'm really sorry and clearly communicated poorly17:18
clarkbopenstack runs a number of large inefficiencies in its CI jobs. Devstack being central to a number of them. For example you can cut devstack spin-up time by around half simply by not using osc and writing a python script to do the keystone setup (because osc startup time is bad and it doesn't cache tokens)17:19
fungito clarify though, we've literally deployed all this with open source software and configured with code reviewed continuously deployed configuration management, much of which is self-testing now, so the things which require privileged access to systems isn't that much17:19
*** ociuhandu has quit IRC17:19
clarkbDevstack also swaps in many of its jobs, which creates performance issues as well17:19
clarkbEr as well as stability issues17:19
*** zbr has quit IRC17:20
*** zbr3 is now known as zbr17:20
clarkbtuning the devstack jobs to not swap, or even better improving openstack's memory consumption in its services, would go a long way toward making the jobs run quicker and also be more reliable17:20
*** ociuhandu_ has quit IRC17:20
clarkbthe tripleo side of things is harder for me to characterize because it changes often and uses tools I'm less familiar with, but I expect there are similar improvements that can be made there17:20
fungiputting openstack on a diet and revisiting devstack's and tripleo's frameworks with an eye toward efficiency would certainly have a huge impact compared to messing around with reordering jobs or trying to cram two lightweight jobs into one17:21
clarkbI did a poc for the osc replacement in devstack but was told it was too complicated for users17:21
clarkbso instead we spend about 10-15 minutes per devstack job running osc instead of like 20 seconds for a python script17:21
fungiwell, also the qa team didn't like that it wasn't using all separate openstackclient commands17:22
fungidaemon mode osc would have also probably had similar performance impact, but that never got completed17:22
clarkb(and again I don't think it's any one person's fault, but it seems there are systemic issues that specifically oppose solving these problems on the job end and instead we tend to prefer pushing that to the hosting providers)17:24
clarkbbut we've largely run out of our ability to scale up the hosting provider17:24
zbrthere is one aspect that affects our performance: number of jobs X random-failure rate. Projects with lots of jobs are far more likely to fail at the gate, it's just statistics.17:24
zbrAssuming a 2% random failure rate, if you have 15 jobs this translates to a ~26% chance of failing.17:27
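As a quick sanity check of that figure, assuming the per-job failures are independent:

```python
# Probability that at least one of n independent jobs fails,
# given a per-job random failure rate p.
p = 0.02   # assumed 2% random failure rate per job
n = 15     # number of jobs run on the change

at_least_one_failure = 1 - (1 - p) ** n
print(f"{at_least_one_failure:.1%}")  # ~26.1%
```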
zbrsadly nobody was able to count the real number of random failures, but i guess that we could compute it based on "successful rechecks".17:29
clarkbzbr: yes, that coupled with gate states being dependent on their parents is what makes gate resets so painful17:29
clarkbbut also "random failures" tend to be pretty low in historical tracking we've done17:29
clarkba significant portion of failures represent actual bugs somewhere17:30
clarkbgranted those may be in places we don't have any hope of fixing (like nested virt crashing due to particular combos of kernels in a provider or a provider reusing an ip address improperly)17:30
*** ociuhandu has joined #openstack-infra17:31
*** jamesmcarthur has quit IRC17:31
*** jamesmcarthur has joined #openstack-infra17:32
clarkbdansmith: interpreting those things as zuul feature demands is likely my personal bias because it seems any time I push on improving the job side the response is no, we need to change $zuul thing. I'll try to view these issues with less of that bias17:35
*** ociuhandu has quit IRC17:35
dansmithclarkb: sorry man, really (really) just trying to come up with ideas17:36
dansmithI just fixed an OOM in tempest (yes actually tempest) the other day, trying to chase down stability things to make things better17:37
*** jamesmcarthur has quit IRC17:37
dansmithbeen messing around with something in devstack today to address osc latency17:37
*** rlandy is now known as rlandy|brb17:37
dansmithI doubt I could really make complex changes to zuul in a reasonable amount of time,17:38
*** jamesmcarthur has joined #openstack-infra17:38
dansmithbut in a lot of cases, I don't know what I don't know (like if we're still fair queuing across projects) so I was just asking17:38
dansmithfwiw, I too feel like the cadre of people that care about the infra are all gone17:39
dansmithso it's hard to continue to care instead of just making sure my shit is tight with local testing17:39
* fungi is still here ;)17:40
*** hamalq has joined #openstack-infra17:40
dansmithfungi: yeah I mean people on projects who care to spend time working on non-project infra, common infra, or understanding infra issues to make changes in their projects17:40
fungibut yeah, we've lost sdague, matt, second matt... :(17:40
dansmithfungi: really glad you're still here tho :)17:41
dansmithright, they were always better than me anyway17:41
fungimelwitt has been doing great stuff lately in that vein17:41
*** jamesmcarthur has quit IRC17:43
*** gfidente is now known as gfidente|afk17:43
clarkbdansmith: re the osc thing, is that via improving osc startup time and/or token reuse? Those seemed to be the big reasons why osc was slow when I looked in the past, but both were somewhat complicated to address. Startup time because of python entrypoint libs, and tokens due to security concerns17:44
dansmithclarkb: well, neither and more crazy.. trying to just make devstack less single-threaded,17:45
dansmithbut maybe that'll make too much memory pressure17:45
clarkbdansmith: oh interesting17:45
fungii get the impression the memory pressure in those jobs is more in the tempest phase, so devstack setup may benefit from greater parallelism17:46
*** d34dh0r53 has quit IRC17:47
clarkbfungi: yes I think that is the case. Basically it is the use of the cloud that balloons the memory use17:47
dansmithack17:47
dansmithso I was toying with being able to start named jobs async, and then say "okay if you get to here make sure $future is done"17:48
dansmithparallelizing the init_project parts for example17:48
dansmithand also the creation of service accounts as another quick example which seems to take EFFING MINUTES17:48
fungiservice accounts at the system level? like with adduser command or whatever?17:49
dansmithno keystone service accounts17:49
fungioh, okay. i wonder how many osc calls that's implemented with17:49
dansmithyeah, it's a lot of osc overhead, but it also seems like some keystone slowness I dunno why17:50
*** d34dh0r53 has joined #openstack-infra17:50
dansmithI also wonder if we couldn't wrap osc shell mode and delegate commands that we run to it17:50
dansmithlike I wonder if that would offend anyone, if I could make it work17:50
dansmithten minutes of osc overhead sounds pretty juicy to me17:50
clarkbdansmith: what my poc did was replace osc for service accounts and catalog bits with a script that used the sdk. That script was then able to cache the token for many requests and have a single startup time17:51
clarkbit's been a while but my maths were something like 7 minutes just for keystone setup then a few minutes of other things like create this network and that flavor and so on17:51
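A minimal sketch of the kind of sdk-based setup script clarkb describes, assuming an openstacksdk clouds.yaml entry named devstack-admin; the resource names and the helper are illustrative, not the actual PoC:

```python
# One long-lived authenticated connection (one process startup, one cached
# token) instead of a separate `openstack ...` CLI invocation per resource.
import openstack

conn = openstack.connect(cloud="devstack-admin")  # assumed clouds.yaml entry

def ensure_service_user(name, password, project_id, role):
    """Create a service user and grant it a role on the given project."""
    user = conn.identity.create_user(
        name=name, password=password, domain_id="default",
        default_project_id=project_id)
    conn.identity.assign_project_role_to_user(project_id, user, role)
    return user

service_project = conn.identity.create_project(
    name="service", domain_id="default")
admin_role = conn.identity.find_role("admin")

# Every call below reuses the same token rather than paying CLI startup
# and re-authentication each time.
for svc in ("glance", "nova", "neutron", "cinder", "placement"):
    ensure_service_user(svc, "secret", service_project.id, admin_role)
```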
dansmithyeah, the keystone stuff is stupid slow17:52
dansmithI'm also parallelizing things like neutron setup (db creation, etc) with things like swift and glance and placement which should be mostly isolated I think17:53
dansmithbut the iops may not work out in a cloud worker such that there's benefit17:53
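DevStack itself is shell, but the "start named jobs async, then make sure $future is done where it matters" pattern dansmith describes a bit earlier looks roughly like this in Python; the task names and bodies are placeholders:

```python
# Kick off mostly-independent setup steps in the background and only block
# at the point where a later step actually depends on one of them.
from concurrent.futures import ThreadPoolExecutor

def configure_service(name):
    # Placeholder for e.g. database creation and initial config for `name`.
    print(f"configuring {name}")

futures = {}
with ThreadPoolExecutor() as pool:
    for svc in ("neutron", "swift", "glance", "placement"):
        futures[svc] = pool.submit(configure_service, svc)

    # ... other sequential work can happen here ...

    # "okay, if you get to here make sure $future is done"
    futures["neutron"].result()  # block until neutron setup has finished
    print("starting nova, which needs neutron to be configured")

    # Any remaining futures are waited on when the pool context exits.
```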
zbrdo we have the meeting in one hour?17:56
clarkbyes17:56
clarkb(I sent out an agenda to the list yesterday too if you're curious to see what is on it)17:57
* zbr goes out for a while, aiming to return in one hour.17:58
*** eolivare_ has quit IRC18:02
*** jamesmcarthur has joined #openstack-infra18:02
*** gyee has joined #openstack-infra18:16
*** jpena is now known as jpena|off18:18
*** rlandy|brb is now known as rlandy18:20
*** bdodd has quit IRC18:23
*** dtantsur is now known as dtantsur|afk18:23
*** ricolin has quit IRC18:34
*** hasharKids has quit IRC18:34
*** rpittau is now known as rpittau|afk18:41
*** jamesmcarthur has quit IRC18:59
*** jamesmcarthur has joined #openstack-infra18:59
gmannmnaser: fungi clarkb these project-config changes lgtm and are quick to review - https://review.opendev.org/c/openstack/project-config/+/771443  https://review.opendev.org/c/openstack/project-config/+/771392 https://review.opendev.org/c/openstack/project-config/+/771066 https://review.opendev.org/c/openstack/project-config/+/770538 19:14
*** nightmare_unreal has quit IRC19:14
fungithanks gmann! i guess you're watching the conversation in the opendev meeting19:14
fungiwe were just talking about that right now19:15
gmannah did not see that.19:15
gmannnice19:15
fungiyeah, that's the current topic in the meeting, looking for volunteers for config reviewing19:16
gmannI was checking in  #opendev19:18
fungiheh, yeah sorry we have a separate meeting channel but you've found it19:22
fungiwe use that for weekly meetings but also scheduled maintenance activities and incident management19:22
*** lifeless has quit IRC19:27
*** lifeless has joined #openstack-infra19:27
*** andrewbonney has quit IRC19:42
*** slaweq has quit IRC19:43
*** ajitha has quit IRC20:01
*** Jeffrey4l has quit IRC20:04
*** openstackgerrit has quit IRC20:12
*** Jeffrey4l has joined #openstack-infra20:13
*** zbr5 has joined #openstack-infra20:14
*** zbr has quit IRC20:16
*** zbr5 is now known as zbr20:16
*** bdodd has joined #openstack-infra20:29
*** yamamoto has joined #openstack-infra20:32
*** stevebaker has quit IRC20:35
*** yamamoto has quit IRC20:36
*** vishalmanchanda has quit IRC20:39
*** Jeffrey4l has quit IRC20:50
*** Jeffrey4l has joined #openstack-infra20:51
*** stevebaker has joined #openstack-infra21:03
*** ociuhandu has joined #openstack-infra21:07
*** harlowja has joined #openstack-infra21:14
*** jamesmcarthur has quit IRC21:17
*** jamesmcarthur has joined #openstack-infra21:19
*** sboyron_ has quit IRC21:26
*** priteau has quit IRC21:35
*** jamesmcarthur has quit IRC21:42
*** xek has quit IRC21:44
*** jamesmcarthur has joined #openstack-infra21:46
*** arne_wiebalck has quit IRC21:49
*** arne_wiebalck has joined #openstack-infra21:51
*** matt_kosut has quit IRC22:01
*** matt_kosut has joined #openstack-infra22:02
*** matt_kosut has quit IRC22:07
*** rcernin has joined #openstack-infra22:09
*** yamamoto has joined #openstack-infra22:10
*** jamesmcarthur has quit IRC22:16
*** jamesmcarthur has joined #openstack-infra22:23
*** iurygregory has quit IRC22:28
*** jamesmcarthur has quit IRC22:30
*** jamesmcarthur has joined #openstack-infra22:33
*** iurygregory has joined #openstack-infra22:37
*** ociuhandu has quit IRC22:47
*** ociuhandu has joined #openstack-infra22:47
*** openstackgerrit has joined #openstack-infra22:50
openstackgerritMerged openstack/project-config master: cursive: prepare to move the jobs in-tree  https://review.opendev.org/c/openstack/project-config/+/771443 22:50
openstackgerritMerged openstack/project-config master: Adding irc notification for missing oslo projects  https://review.opendev.org/c/openstack/project-config/+/771392 22:50
openstackgerritMerged openstack/project-config master: Combine acl file for all interop source code repo  https://review.opendev.org/c/openstack/project-config/+/771066 22:50
openstackgerritMerged openstack/project-config master: Move snaps ACL to x  https://review.opendev.org/c/openstack/project-config/+/770538 22:50
gagehugoIs review.opendev.org sign-in now switched to openid?22:54
clarkbgagehugo: it's always been openid as far as I know22:54
gagehugothe login page changed, was just wondering22:55
*** thogarre has joined #openstack-infra22:55
clarkbhrm that shouldn't have changed22:55
clarkbit should take you to the ubuntu one openid login page22:55
clarkboh except I think I discovered a bug where you can't hit the login button from the diff viewer as the redirects don't work from there?22:56
fungiwhen you click sign in it should take you to https://login.ubuntu.com/ yeah22:56
gagehugohttps://usercontent.irccloud-cdn.com/file/dBmWTQfP/image.png22:56
fungihuh, that's the page we've usually seen if login.ubuntu.com is down for some reason22:56
gagehugoah ok22:56
clarkbfwiw it just worked for me22:56
fungisame here22:56
clarkbso maybe a blip on the remote side22:56
gagehugosomeone from our team was having that issue as well so I figured I'd check, thanks!22:58
*** snapiri has quit IRC22:58
*** openstackgerrit has quit IRC22:59
clarkbgagehugo: if it persists I would double check dns resolution and firewall access for login.ubuntu.com23:02
gagehugook23:02
*** snapiri has joined #openstack-infra23:03
*** snapiri has quit IRC23:08
*** CrayZee has joined #openstack-infra23:08
*** matt_kosut has joined #openstack-infra23:17
*** jamesmcarthur has quit IRC23:25
*** matt_kosut has quit IRC23:27
fungiyeah, maybe access is being blocked or something23:29
fungior i suppose it could be a new browser security feature, blocking refresh-redirect to another domain?23:30
fungisomething or other breaking openid workflow23:30
*** dchen has joined #openstack-infra23:31
*** jamesmcarthur has joined #openstack-infra23:43
*** thogarre has quit IRC23:52
*** ociuhandu has quit IRC23:58
*** ociuhandu has joined #openstack-infra23:58
