Thursday, 2021-02-11

openstackgerritMerged opendev/system-config master: openafs-<db|file>-server: fix role name  https://review.opendev.org/c/opendev/system-config/+/77476100:00
openstackgerritJeremy Stanley proposed opendev/system-config master: Upgrade pip in our apply tests  https://review.opendev.org/c/opendev/system-config/+/77504800:05
fungiwho knew a simple change to switch the irc channel for ptgbot would turn into an all-day rabbit hole of test bitrot? and i'm still not sure i've hit bedrock yet00:06
* ianw actually probably did know :)00:12
ianwit's always like that00:12
clarkblooking at the devstack dstat stuff, not sure it's super reconsumable directly, but this gives good clues as to how to redo it in a bit of ansible00:15
clarkbI'm giving that a go00:15
diablo_rojofungi, stepping away for a little bit, but will get the etherpad updated this evening. I got the events stuff all done (aside from #openstack-summit). And the diversity channel should all be done as well. Will get the board stuff done and then circle back to the foundation channel.00:15
fungidiablo_rojo: awesome!00:15
fungii'll still be around if you need any help with it00:16
diablo_rojoSounds good :)00:16
*** tosky has quit IRC00:17
ianwok, i changed a config variable and https://refstack01.openstack.org/#/community_results seems to be working00:21
openstackgerritClark Boylan proposed opendev/system-config master: Use dstat to record performance of gitea management  https://review.opendev.org/c/opendev/system-config/+/77505100:33
clarkbianw: ^ something like that what you had in mind?00:33
clarkbI didn't actually test that unit locally though I should've I guess00:34
ianwlooks about right, should be self-testing00:36
fungiand, shockingly, the tools/prep-apply.sh fix is failing system-config-legacy-logstash-filters00:36
* fungi climbs in even deeper00:36
clarkbfungi: Those filters have not changed recently. We can probably turn off that job or make it non-voting00:37
clarkbthen if they need to change fix the testing then00:37
fungiwe'll see how simple it is to unwind00:37
clarkbianw: minor nit on https://review.opendev.org/c/opendev/system-config/+/774753 but testing seems to show it isn't a problem so I +2'd00:40
fungiactually, looks like upgrading pip wasn't actually the problem, but the patch to upgrade it is self-testing the same error with cryptography00:43
fungihttps://zuul.opendev.org/t/openstack/build/7edfdec54dc0405783facfdc76c9e4e500:43
openstackgerritIan Wienand proposed opendev/system-config master: borg-backup-server: run a weekly backup verification  https://review.opendev.org/c/opendev/system-config/+/77475300:43
fungii'm not able to make heads or tails of the error there, i got fixated earlier on the other exception hit while raising the actual one00:44
clarkbfungi: I'll take a look once I rereview ^00:44
openstackgerritIan Wienand proposed opendev/system-config master: refstack: use external https for API  https://review.opendev.org/c/opendev/system-config/+/77505300:46
fungiooh, now i'm thinking it's actually the setuptools pin causing the problem, and we're trying to use a version of pip which is too new to work with it00:46
clarkbfungi: ya I think we can drop that pin since they fixed things?00:46
clarkbI feel like I had a change for that somewhere too00:47
clarkblet me see00:47
fungihttps://review.opendev.org/74976600:47
fungii just rechecked it00:47
ianwfungi: would this all be better not running on xenial host?00:47
clarkbianw: it would be but the puppets only run on xenial now00:48
fungiianw: maybe, though right now there's still puppeted stuff on eavesdrop00:48
ianwahh, yeah00:48
clarkbfungi: note that that change may need to be squashed into yours? I guess we'll find out00:48
fungiclarkb: yeah, that's why the rechecking00:49
fungii think we don't actually need mine00:49
fungithe error i was trying to fix by upgrading pip actually stems from too new pip for the setuptools we pinned, if i'm reading correctly00:49
fungithe script seemed to have already installed the latest pip available for python 3.5 anyway00:50
clarkbgot it01:01
*** dmellado has quit IRC01:03
fungithough also it wants to build cryptography from sdist, so once we clear this hurdle i expect more01:03
clarkbya that is why I assume it isn't respecting the python requires metadata which implies old pip01:04
clarkbsince its 3.6 only now too right?01:04
* clarkb checks01:04
fungiwell, it seems to be using pip 2001:04
fungiit's pulling a slightly older cryptography i think01:04
clarkbcryptography 3.2.1 is what it should pull for python3.501:05
clarkb3.4.4 is latest01:05
fungipip 20.3.4 and cryptography 3.4.4 yeah01:06
clarkband the metadata appears to be in pypi for that01:06
fungiyep, "Requires: Python >=3.6" at https://pypi.org/project/cryptography/ too01:07
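The exchange above turns on the "Requires: Python >=3.6" metadata: an installer that respects it should skip cryptography 3.4.4 on Python 3.5. A minimal sketch of that check follows, assuming only the simple ">=X.Y" form of the specifier; the function names are hypothetical, and pip's real logic handles the full specifier syntax rather than this shortcut.

```python
def parse_version(text):
    """Turn a dotted version such as '3.5' into (3, 5) for numeric comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies_minimum(python_version, requires_python):
    """Return True if python_version meets a '>=X.Y' style requirement.

    Assumption: only the plain '>=X.Y' form is handled; anything else is
    rejected instead of being guessed at.
    """
    spec = requires_python.strip()
    if not spec.startswith(">="):
        raise ValueError("this sketch only understands '>=X.Y' specifiers")
    return parse_version(python_version) >= parse_version(spec[2:])
```

With the metadata quoted from PyPI, `satisfies_minimum("3.5", ">=3.6")` is false, so a metadata-aware installer on Python 3.5 would be expected to fall back to an older release.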
*** mlavalle has quit IRC01:10
*** dmellado has joined #opendev01:12
openstackgerritJeremy Stanley proposed opendev/puppet-pip master: Pin get-pip.py to last Python 3.5 version  https://review.opendev.org/c/opendev/puppet-pip/+/77490001:18
openstackgerritMerged opendev/system-config master: refstack: fix typo in role matcher  https://review.opendev.org/c/opendev/system-config/+/77504401:31
openstackgerritMerged opendev/system-config master: refstack: capture container logs to disk  https://review.opendev.org/c/opendev/system-config/+/77504601:33
openstackgerritMerged opendev/system-config master: Revert "Install older setuptools in puppet apply jobs"  https://review.opendev.org/c/opendev/system-config/+/74976601:34
fungiokay, can anyone else spot the error on this failure? https://zuul.opendev.org/t/openstack/build/100ea04cd6194669a84c8f16c5774e9101:53
fungioh, maybe it's the two warnings about "class included by absolute name"01:54
openstackgerritJeremy Stanley proposed opendev/puppet-pip master: Pin get-pip.py to last Python 3.5 version  https://review.opendev.org/c/opendev/puppet-pip/+/77490001:56
ianwyeah i think so https://zuul.opendev.org/t/openstack/build/100ea04cd6194669a84c8f16c5774e91/log/job-output.txt#124401:58
ianwclarkb / Alex_Gaynor : kevinz reports the cloud is back up, some sort of rabbitmq issue02:02
kevinzianw: yes, rabbitmq partition02:02
fungithanks kevinz! sorry to bother you during holidays02:03
kevinzfungi: Np,  I do not go anywhere,  so just play with the cloud for fun :-)02:04
openstackgerritIan Wienand proposed opendev/system-config master: Refactor AFS groups  https://review.opendev.org/c/opendev/system-config/+/77505702:25
openstackgerritIan Wienand proposed opendev/system-config master: Use dstat to record performance of gitea management  https://review.opendev.org/c/opendev/system-config/+/77505102:30
openstackgerritIan Wienand proposed opendev/system-config master: Refactor AFS groups  https://review.opendev.org/c/opendev/system-config/+/77505702:35
*** dmellado has quit IRC02:36
openstackgerritMerged opendev/puppet-pip master: Pin get-pip.py to last Python 3.5 version  https://review.opendev.org/c/opendev/puppet-pip/+/77490002:49
funginearly there02:50
openstackgerritIan Wienand proposed opendev/system-config master: refstack: add backup  https://review.opendev.org/c/opendev/system-config/+/77506102:57
*** dmellado has joined #opendev03:02
openstackgerritMerged opendev/system-config master: PTGBot is now openinfraptg on #openinfra-events  https://review.opendev.org/c/opendev/system-config/+/77486203:44
*** dviroel has quit IRC04:28
*** ykarel|away has joined #opendev04:40
*** ykarel|away is now known as ykarel04:41
*** ysandeep|away is now known as ysandeep|rover05:09
openstackgerritIan Wienand proposed opendev/system-config master: borg-backup: fix backup script failure match  https://review.opendev.org/c/opendev/system-config/+/77506805:40
openstackgerritMerged opendev/system-config master: borg-backup-server: run a weekly backup verification  https://review.opendev.org/c/opendev/system-config/+/77475305:53
*** whoami-rajat__ has joined #opendev06:03
*** marios has joined #opendev06:05
*** ralonsoh has joined #opendev06:49
*** diablo_rojo has quit IRC06:55
*** CeeMac has joined #opendev07:07
*** sboyron_ has joined #opendev07:10
*** eolivare has joined #opendev07:39
*** slaweq has joined #opendev07:44
*** ykarel has quit IRC07:44
*** ykarel has joined #opendev07:46
*** hashar has joined #opendev07:52
*** rpittau|afk is now known as rpittau07:53
*** ysandeep|rover is now known as ysandeep|lunch08:01
*** fressi has joined #opendev08:03
*** DSpider has joined #opendev08:08
*** andrewbonney has joined #opendev08:15
*** jaicaa has quit IRC08:27
*** jpena|off is now known as jpena08:29
*** jaicaa has joined #opendev08:30
*** hemanth_n has joined #opendev08:31
*** tosky has joined #opendev08:36
*** ysandeep|lunch is now known as ysandeep|rover08:43
*** swest has joined #opendev09:19
*** felixedel has joined #opendev09:32
felixedelHi all, we are currently working on a larger feature in Zuul for which we currently have ~60 staging patches in Gerrit. During development we quite often have to update/rebase the whole stack of changes. When pushing this stack to Gerrit, it takes quite some time to process them (2-3 minutes at least) and often fails with the following error http://paste.openstack.org/show/802556/09:33
felixedelAlthough the changes seem to be all there, I'm not sure if this error has an impact on the consistency of those changes. Could you maybe have a look at this error?09:33
felixedelThe latest update of those changes can be found here: https://review.opendev.org/c/zuul/zuul/+/774610/309:33
*** ykarel is now known as ykarel|lunch09:33
*** ralonsoh has quit IRC09:34
*** ralonsoh has joined #opendev09:34
*** dtantsur|afk is now known as dtantsur10:00
*** hashar has quit IRC10:07
*** hashar has joined #opendev10:08
priteauGood morning. Is there a known issue with Zuul? I can see some patches in the gate queue have completed all their jobs but they are stuck there10:17
yoctozeptoinfra-root: it looks as if zuul (or gerrit) hung? jobs seem to never get off the queue10:17
yoctozeptopriteau: lol, what a sync10:17
cgoncalvesas of now: "Queue lengths: 4674 events, 0 management events, 445 results."10:18
priteauGet out of my mind yoctozepto!10:19
yoctozeptopriteau: great men* think alike :-)10:21
priteauSome jobs have left the gate now but it seems much slower than usual10:21
yoctozepto* and women, I think the saying was meant to be genderless though10:22
yoctozeptoyes, I confirm :-)10:22
fricklerseems like zuul is having a hard time getting a stack of 60 patches being submitted all at once handled through all its queues10:25
fricklerfelixedel: I suggest you submit your stack in batches of maybe 10 patches at a time, seems getting 60 patches at once does overload zuul for quite some time10:26
cgoncalvesfrickler, felixedel has mentioned (scroll a few lines up) he was submitting a stack of 60 patches at once10:26
yoctozeptough10:30
yoctozepto:D10:30
fricklerthere also seem to be java exceptions in gerrit related to submitting this stack. maybe also discuss with #zuul folks whether that stack could be submitted in parts in order to reduce the amount of rebasing that is needed10:33
swestfrickler: felixedel and I discussed this with corvus yesterday and agreed to continue pushing the stack for the time being and report issues here :D10:36
*** ykarel|lunch is now known as ykarel10:36
fricklerswest: hmm, o.k., then please wait for feedback from corvus before submitting any new stacks10:44
swestyea, we'll avoid pushing the whole stack at once the next time10:44
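frickler's suggestion of pushing roughly 10 patches at a time can be sketched as a simple chunking helper. This is only an illustration: the function name and the idea of feeding it a list of commit IDs (oldest first) are assumptions, not part of any existing tooling mentioned in the channel.

```python
def batches(changes, size=10):
    """Split an ordered list of changes into consecutive groups of at most `size`.

    `changes` is assumed to be ordered oldest first, so each batch can be
    pushed before the next one that depends on it.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    return [changes[start:start + size] for start in range(0, len(changes), size)]
```

For a stack of 60 changes this yields six batches; pushing the last commit of each batch in turn would give Gerrit and Zuul time to process one group before the next arrives.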
swestsorry, about the DoS :(10:45
fricklerwell, exploring the limits of our infrastructure generally isn't a bad thing I'd say, just avoid doing it too often ;)10:46
tobiashyeah, now we know that's a problem and logs for that so I hope that's solvable10:52
tobiashlooks like gerrit still has some performance regressions compared to the old version10:52
*** dviroel has joined #opendev12:01
*** bwensley has joined #opendev12:23
*** jpena is now known as jpena|lunch12:33
*** hemanth_n has quit IRC12:47
*** mgagne has quit IRC12:52
*** mgagne has joined #opendev12:53
yoctozeptounfortunately that is true :-(13:09
fungiwell, we went into it knowing gerrit would have worse performance with the new notedb backend than the old sql backend, but new releases of gerrit haven't supported using sql for years13:11
fungiwe held out quite some time while the gerrit developers improved notedb performance, but there's only so much you can do when you want quick access to data stored in git repos13:12
yoctozeptowell, it also boils down to 'how much worse' :D13:12
fungiyeah, and in this case i expect, but haven't looked yet, that the java exceptions frickler saw in the log were related to contention over write locks13:14
fungisomething a traditional rdbms is engineered to handle gracefully13:14
bwensleyHey everyone - just wondering if there is any progress on the missing launchpad updates when reviews are created or merged. I see it mentioned here: https://etherpad.opendev.org/p/gerrit-3.2-post-upgrade-notes13:14
bwensleyIs there an LP or something tracking that issue?13:15
bwensleyIt's pretty painful for everyone to have to remember to update each LP. Easy to forget and then it becomes hard to track fixes.13:15
fungibwensley: no, just the etherpad. someone needs to find time to write a replacement. i had hoped to but other fires keep erupting13:16
fungiif some of the people it's painful for would join the opendev sysadmins in running these services things might happen more quickly13:16
fungiprobably the most sustainable replacement for launchpad integration would be a zuul job which talks to lp's api13:18
fungifelixedel: corvus: aha, actually the earliest unexpected error i see appears to be a worker timeout13:22
fungi[2021-02-11T09:17:40.772+0000] [SSH git-receive-pack /zuul/zuul.git (REDACTED)] WARN  com.google.gerrit.server.git.MultiProgressMonitor : MultiProgressMonitor worker killed after 240108ms: (timeout 108ms, cancelled)13:23
fungi[2021-02-11T09:17:40.773+0000] [SSH git-receive-pack /zuul/zuul.git (REDACTED)] WARN  com.google.gerrit.server.git.MultiProgressMonitor : unable to finish processing13:23
fungijava.util.concurrent.CancellationException13:23
*** dmellado has quit IRC13:25
*** dmellado has joined #opendev13:25
fungianother of the same at 10:06:11.94813:26
*** jpena|lunch is now known as jpena13:26
bwensleyfungi: Thanks for the update. What skills would be required to write the new zuul job you propose?13:26
bwensleyAnd is this something that is a few days of effort or bigger than that?13:27
fungibwensley: familiarity with writing zuul jobs, and reading how to interact with the gerrit and launchpad rest apis, then doing some local testing of the resulting ansible13:27
fungifelixedel: corvus: and then at 10:20:51.036 an apparently new exception starts to appear for those, "com.google.gerrit.exceptions.StorageException: interrupted"13:28
fungilots and lots of those13:29
fungibwensley: probably a few days effort. it would basically need to functionally replace our old implementation from https://opendev.org/opendev/jeepyb/src/branch/master/jeepyb/cmd/update_bug.py and the update_blueprint.py in the same directory13:31
fungithose were called with gerrit hook scripts, but as a result they're not particularly transparent since they ran locally on the gerrit server and logged to gerrit's error log13:32
fungiand the old gerrit database access mechanism they relied on is no longer a possibility in the version of gerrit we're running now13:32
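One piece any replacement would need is finding the bug numbers a change refers to. The sketch below assumes the common "Closes-Bug: #NNN" style commit message footers; the pattern is a simplified stand-in and is not copied from jeepyb's update_bug.py, and the function name is hypothetical. Talking to the Gerrit and Launchpad APIs is deliberately left out.

```python
import re

# Assumed footer format: "<Closes|Partial|Related>-Bug: #<number>", one per line.
BUG_FOOTER = re.compile(
    r"^[ \t]*(Closes|Partial|Related)-Bug:[ \t]*#?(\d+)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def find_bug_references(commit_message):
    """Return (relation, bug_number) pairs found in a commit message."""
    return [
        (match.group(1).capitalize(), int(match.group(2)))
        for match in BUG_FOOTER.finditer(commit_message)
    ]
```

A job built along these lines could then decide what to do per relation, for example only marking a bug as fixed for a "Closes" reference once the change has merged.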
*** stand has quit IRC13:54
*** ysandeep|rover is now known as ysandeep|dinner14:20
*** ykarel_ has joined #opendev14:26
*** ykarel has quit IRC14:29
*** fressi has left #opendev14:35
*** ykarel_ has quit IRC14:49
*** hashar is now known as hasharAway14:51
*** dmellado has quit IRC14:53
*** dmellado has joined #opendev14:54
openstackgerritMerged opendev/system-config master: Build Gerrit 3.3 images  https://review.opendev.org/c/opendev/system-config/+/76502114:55
*** ysandeep|dinner is now known as ysandeep|rover15:04
slittle1Can anyone help us understand why zuul objected to  https://review.opendev.org/c/starlingx/integ/+/775056  ?15:09
slittle1we'll try a recheck.  See if it happens again15:10
*** dmellado has quit IRC15:10
*** dmellado has joined #opendev15:12
fungislittle1: that looks similar to a problem i just started investigating15:13
*** hasharAway is now known as hashar15:22
fungiso far i haven't found a correlation, but i'm also double-booked in meetings currently and for the next couple of hours15:24
fungilooks like that one ran from our ze03 executor15:29
fungidebug log says it got a socket timeout talking to storage.bhs.cloud.ovh.net15:30
fungithe variety of network errors we're seeing in different failures and the spread across different executors makes me wonder if there are local network problems in rackspace's dfw region (where all the executors are running)15:32
*** weshay|ruck has joined #opendev15:33
weshay|ruck0/15:33
fungihttps://rackspace.service-now.com/system_status is where i thought they posted their systems status info, but all i get there is a blank page15:35
slittle1Can zuul jobs be redirected to other executors?  other regions?15:36
fungiit looks like i can get to the https port on telnet storage.bhs.cloud.ovh.net from home, but not from one of our executors15:37
fungistorage.gra.cloud.ovh.net is still reachable for them though15:37
fungiso it's likely a routing problem close to rackspace or in a backbone provider i'm not traversing15:37
fungii can also get to storage.bhs.cloud.ovh.net just fine from another rackspace region (iad)15:38
fungiyeah, looks like nothing i've tried in rackspace's dfw region can reach storage.bhs.cloud.ovh.net15:39
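The manual check described above (can a TCP connection be opened to the HTTPS port from a given vantage point?) can be sketched with the standard library. The function names are hypothetical, and this only tests TCP reachability; it says nothing about TLS or whether the storage service itself is healthy.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def probe_all(hosts, port=443, timeout=5.0):
    """Map each hostname to its reachability result."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

Running `probe_all(["storage.bhs.cloud.ovh.net", "storage.gra.cloud.ovh.net"])` from an executor and from another network would reproduce the comparison made in the channel.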
fungii'll push up a temporary change to yank it from the log upload pool15:39
clarkbfungi: is that via ipv4 v6 or both?15:40
fungiipv415:41
fungidoesn't look like they publish aaaa rrs for it15:41
openstackgerritJeremy Stanley proposed opendev/base-jobs master: Temporarily remove storage.bhs.cloud.ovh.net  https://review.opendev.org/c/opendev/base-jobs/+/77519315:43
clarkbfungi: does ovh gra1 have the same issue or just bhs1?15:44
fungiclarkb: if that ^ looks acceptable, i'll bypass zuul and merge it directly15:44
clarkboh the commit message says it was just bhs15:44
fungiclarkb: we can reach their gra swift from dfw, yeah15:44
clarkbfungi: +215:44
openstackgerritMerged opendev/base-jobs master: Temporarily remove storage.bhs.cloud.ovh.net  https://review.opendev.org/c/opendev/base-jobs/+/77519315:46
fungislittle1: weshay|ruck: ^ that should take effect immediately for any builds starting at this point forward15:46
weshay|ruckrock on.. thanks for jumping on this!15:47
fungithat still doesn't address the other examples weshay|ruck found earlier and mentioned in #openstack-infra but i think it's the bulk of the recent POST_FAILURE results15:47
weshay|ruckk15:47
fungithe builds running into this problem would have had no logs uploaded at all, not even zuul manifests15:48
fungi#status notice Recent POST_FAILURE results from Zuul for builds started prior to 15:47 UTC were due to network connectivity issues reaching one of our log storage providers, and can be safely rechecked15:49
openstackstatusfungi: sending notice15:49
-openstackstatus- NOTICE: Recent POST_FAILURE results from Zuul for builds started prior to 15:47 UTC were due to network connectivity issues reaching one of our log storage providers, and can be safely rechecked15:49
priteauWhat should we do when the failure was in a promote-openstack-releasenotes job?15:52
priteauIgnore and wait until another change updates release notes?15:52
clarkbpriteau: the failure should have happened at the very end of the jobs as they were related to log uploads15:52
clarkbpriteau: I would check if your release notes have published successfully by checking the published results directly15:52
clarkbit's possible nothing needs to be done15:52
fungiright, the actual work performed by the job was probably done15:52
openstackstatusfungi: finished sending notice15:52
clarkbre pushing large stacks of changes, it does seem to be a workload that newer gerrit struggles with15:53
clarkbit may be worth starting a discussion upstream to see if they have thoughts on making that particular case run smoother, but in the short term i expect that the best bet is to keep stacks shorter and manageable15:54
fungiclarkb: though most of the user-facing impact was zuul trying to get through the resulting event pileup15:54
clarkbfungi: yup15:54
priteauI don't see the change I was expecting, although the job ran two hours ago15:56
priteauNot a huge issue, I am sure we have more backports in the queue15:56
clarkblanding a followup change should rerun everything with those additional updates as well15:57
clarkbpriteau: was the job a POST_FAILURE ? or just normal failure?15:57
fungihttps://zuul.opendev.org/t/openstack/builds?job_name=promote-openstack-releasenotes shows several recent POST_FAILURE results15:57
priteaupromote-openstack-releasenotes https://zuul.opendev.org/t/openstack/build/55f8c685f5fb4d6cbcf73b5a0acd74af : POST_FAILURE in 14m 30s15:57
fungii can probably go digging in executor debug logs for the ansible output for those after my upcoming meetings15:58
priteauOnly if you feel like it, I am sure you have better things to do :)15:59
funginot necessarily better, just more15:59
fungitraceroutes to there seem to die within rackspace's dfw border16:00
fungiso i suspect bgp problems or something similar at their border16:01
fungior that just happens to be the differentiating point for an asymmetric route with a problem elsewhere on the return path16:01
clarkbfungi: I'm seeing similar in the opposite direction and ipv6 seems affected too16:03
clarkb(host from bhs can't reach review)16:03
fungitracerouting the other direction stops at an ovh router in newark, new jersey16:03
fungiso within their border too16:04
fungii wonder if they peer with each other16:04
*** Alex_Gaynor has left #opendev16:04
*** ysandeep|rover is now known as ysandeep|out16:09
fungislittle1: weshay|ruck: here it is... http://travaux.ovh.net/?do=details&id=48997&16:16
fungilooks like ovh has a network peer in dallas16:17
fungior at least had ;)16:17
weshay|ruckk.. so.. they are working on it afaict16:18
weshay|ruckProvider operations continues on-site to restore service.16:18
weshay|ruckfungi thanks for the update16:18
fungithe internet never ceases to be a source of daily excitement16:18
fungii nearly typed excrement there16:18
clarkb"weather conditions in the area are extending resolution time" I feel that16:18
clarkbfungi: no doubt it has that too16:19
clarkblooking at the csv generated in that gitea change I can see that it results in some load, but not significant from what I can tell. However that may be enough to tip over a busy system?16:22
*** mlavalle has joined #opendev16:24
clarkbdoing the description updates seems to produce a load average of 3.5 on the server, fairly consistent ~50% usr proc (is that 50% of 8vcpu or 50% of one?) and it runs steady with 1.3GB of memory consumed16:25
clarkbI want to try cleaning up this dstat role as I think this is actually pretty useful insight16:25
clarkbthen maybe I'll see if there are some obvious ways to improve the description updates. Maybe newer gitea added a way to check existing descriptions or something16:26
openstackgerritJeremy Stanley proposed opendev/base-jobs master: Revert "Temporarily remove storage.bhs.cloud.ovh.net"  https://review.opendev.org/c/opendev/base-jobs/+/77520916:34
clarkbfungi: ^ is that in preparation or do we think it is happy now?16:35
funginot happy yet afaik, just pushing it so we don't forget16:35
fungicommit message and wip comment indicate preconditions for approving16:36
fungii do wonder if we should turn that region down in nodepool too, but if the api endpoint there is also unreachable then we're not starting new jobs there anyway16:39
clarkbyup I thought about that too and I think ^ is effectively going to idle it16:40
*** marios is now known as marios|out17:01
*** jdwidari has joined #opendev17:26
*** jdwidari has quit IRC17:30
openstackgerritIury Gregory Melo Ferreira proposed openstack/project-config master: Add Backport-Candidate label to Ironic projects  https://review.opendev.org/c/openstack/project-config/+/77524417:39
hasharhi opendev :)  We have a post failure for opendev/gear . It is apparently unable to publish a Docker image.  I guess due to lack of credentials?  https://review.opendev.org/c/opendev/gear/+/68844617:39
hasharI don't know where to report it so it can be acted asynchronously17:39
*** rpittau is now known as rpittau|afk17:43
*** dtantsur is now known as dtantsur|afk17:45
clarkbhashar: likely just needs credentials17:46
hasharclarkb: is there any  doc for that?17:46
hasharor maybe that is something that can only be setup by admins17:46
clarkbI think we do document it actually. Let me see17:46
clarkbbut yes I think also that someone with access to the secret needs to do it17:47
*** jpena is now known as jpena|off17:47
hasharthe job is based on opendev-upload-docker-image  if you are familiar with that17:47
clarkbhttps://docs.opendev.org/opendev/base-jobs/latest/docker-image.html#jobvar-opendev-upload-docker-image.docker_credentials17:48
clarkbhashar: specifically someone that has access to the docker hub org credentials needs to make the zuul secret17:48
clarkbin this case that would be one of the opendev admins17:49
hasharclarkb: should I file a bug about it somewhere?17:50
hasharor maybe an email to some list is sufficient17:50
fungii'm not able to pull up that build result actually17:50
hasharmaybe it has expired17:51
clarkbya emailing the service-discuss list asking for someone to find time to encrypt the secret and update the change is probably a good next step17:51
hasharwhen I looked at it I think it complained about a lack of credential17:51
clarkbmaybe mordred and corvus since they have previously reviewed it17:51
clarkbya the ttl on those is 30 days iirc and the job ran in december17:51
fungioh, yep, december 11. it would have just expired out of the object store today17:51
hasharbut the task output was hidden  in the ansible build output17:51
fungier, a month ago17:52
fungii'm terrible with calendars17:52
fungilooks like that job runs in the gate pipeline and is preventing the change from merging, so we can recheck to get fresh logs17:54
*** ralonsoh has quit IRC17:54
mordredit just needs to be updated to add credential - and be updated to the new state of the art with image jobs17:54
fungiso i should be able to reuse the same credential we have for that dockerhub namespace just reencrypt it for the opendev/gear key17:55
clarkbfungi: yup17:55
fungii'll see what i can do there17:56
hashar:-]17:56
hasharthank you !17:56
mordredhttps://opendev.org/zuul/zuul-registry/src/branch/master/.zuul.yaml#L1-L61 is a good simple place to cargo-cult from17:56
fungiafter i finish reenqueuing a couple of failed release jobs from last month17:56
fungiokay, that's done, working on the gear image uploads17:59
hasharfungi: thank you for stepping in!18:10
mordredfungi: if you're in a secret encoding and uploading mood, https://review.opendev.org/c/zuul/zuul-storage-proxy/+/774998 needs a secret too - as well as the upload and promote jobs18:19
*** eolivare has quit IRC18:19
fungimordred: sure, i can take a look shortly. almost have the gear change done i think18:20
mordred\o/18:20
*** marios|out has quit IRC18:20
fungiany idea where that failing gear-upload-image job is defined? codesearch isn't turning it up18:24
fungiwill want to track it down and rip it out18:25
mordredI think it may not exist anymore?18:26
mordredlike - wasn't there a "move jobs in repo" related to gear in the not-too-distant-past?18:26
fungioh, i love problems which solve themselves ;)18:26
mordredI don't see anything in any of the jobs lists on zuul.opendev.org18:28
funginot finding it in the openstack/project-config git history either18:30
fungii'll just assume it no longer exists18:30
mordred++18:30
fungiand we'll find out if that's true18:30
mordredif it shows back up, we'll have breadcrumbs18:30
fungilooking in https://hub.docker.com/u/opendevorg/ i don't see that we ever published an image for it there anyway18:32
fungishould i call the image gear or geard?18:32
fungii guess it needs to match whatever's in the dockerfile?18:34
fungiexcept there is no dockerfile18:34
fungiso i need to add more than just zuul configuration i suppose18:34
fungioh, hah, i should have looked more closely at https://review.opendev.org/68844618:35
fungiit's adding the failing job, but also a dockerfile18:35
mordredyeah - definitely this is an "update that patch" sort of patch18:39
mordredand - I think I'd go with opendev/geard18:39
mordred(you'll want to s/as nodepool/as geard/ in the Dockerfile)18:39
fungirepository: opendevorg/gear18:41
fungitarget: geard18:41
fungithat's what i want to pass in the docker_images var for the job?18:41
fungii assume so anyway, giving it a shot18:41
openstackgerritJeremy Stanley proposed opendev/gear master: Added Docker image builds  https://review.opendev.org/c/opendev/gear/+/68844618:42
fungimnaser: hashar: mordred: ^ hopefully that's complete18:42
openstackgerritClark Boylan proposed opendev/system-config master: Use dstat to record performance of system-config-run hosts  https://review.opendev.org/c/opendev/system-config/+/77505118:42
clarkbIf ^ works I think that will produce dstat info for all our system-config-run hosts18:43
clarkbmight produce interesting/useful data or it might be so inaccurate that it doesn't matter, but I figure having it can't hurt18:43
fungioh, i see, repository is what to call the image on dockerhub, target is the dockerfile target to publish there. i'll respin18:46
openstackgerritJeremy Stanley proposed opendev/gear master: Added Docker image builds  https://review.opendev.org/c/opendev/gear/+/68844618:47
mordredfungi: I left a comment on the prior PS18:48
mordredoh - nevermind - they're on the current PS18:49
fungicool, thanks18:55
openstackgerritJeremy Stanley proposed opendev/gear master: Added Docker image builds  https://review.opendev.org/c/opendev/gear/+/68844619:11
*** sboyron_ has quit IRC19:18
*** sshnaidm is now known as sshnaidm|afk19:19
fungilooking at the zuul-storage-proxy image publication change now19:24
*** rchurch has quit IRC19:34
*** rchurch has joined #opendev19:34
hasharmordred: fungi: you are awesome thank you!19:35
fungiit's my pleasure!19:36
*** andrewbonney has quit IRC19:42
openstackgerritMerged opendev/gear master: Added Docker image builds  https://review.opendev.org/c/opendev/gear/+/68844619:54
ianwclarkb: using gnuplot and putting out nice-ish graphs as artifacts would be pretty cool for https://review.opendev.org/c/opendev/system-config/+/775051 :)20:03
clarkbianw: https://lamada.eu/dstat-graph/ is what I usually use. I bet we could vendor that code (its on github somewhere iirc) and provide similar20:05
ianwclarkb: nice!  git clone https://github.com/Dabz/dstat_graph.git && cd dstat_graph && generate_page.sh ./the-csv.file > page.html20:13
clarkboh cool, I can take a look at adding that20:14
clarkbI wonder if we can actually vendor the code instead of cloning it too20:14
ianwit might be even better in zuul-jobs ... i don't see why we couldn't use it in a lot of places20:14
clarkboooh good idea20:14
ianwthe bits it uses are pretty old20:15
ianwhttps://github.com/Dabz/dstat_graph/tree/master/js20:15
clarkbya there may be better ways of doing it now, I had just used that tool in the past via that website and it worked well enough for me20:15
ianwthe glyphicons don't show up for me20:17
clarkbI wonder if more modern browsers have stricter rules for fonts/glyphs?20:18
ianwand the buttons don't seem to work20:18
ianwas with all great web projects, it looks like pretty much every part has been abandoned and needs to be basically re-written20:20
clarkbthe buttons and menus do work on the hosted version. Not sure what the glyphs are for20:20
clarkbit is also somewhat slow20:20
clarkb(I expect that is based on the number of csv entries)20:20
ianwe.g. bootstrap 3  -> bootstrap 4; https://github.com/novus/nvd3 d3 v3 -> v4 ;20:21
ianwoh, d3 is old news, you should just be making your own graphs using canvas and css https://medium.com/@PepsRyuu/why-i-no-longer-use-d3-js-b8288f306c9a20:27
clarkbis that hard mode?20:27
ianwi just checked; the makefile that created all the graphs using gnuplot for my thesis from 2007 still works just fine :)20:29
mordredAND ... TIL about https://preactjs.com/20:32
ianwand somehow i had totally missed that dstat is dead(?) but replaced with something on rh platforms or something ... https://news.ycombinator.com/item?id=1998664620:35
clarkbI think spamaps took over maintaining it?20:37
ianwmordred: in the final insult, the preact website lists "BBC Roasting Calculator 🦃 Calculates cooking times for different cuts of meat." as a sample application, but it's gone 40420:40
funginooo! not the barbecue calculator!20:41
*** bwensley_ has joined #opendev20:47
mordredhah20:50
mordredwell - a different implementation of calculating cooking times is known as "thermometer"20:50
*** bwensley__ has joined #opendev20:50
*** bwensley has quit IRC20:50
mordredI actually had a fork of dstat YEARS ago (before openstack) that added plugin support - which at least at the time dag was against but which I needed for my life as a mysql consultant20:54
mordredhttps://launchpad.net/mtstat20:54
mordredI'm fairly certain the existence of that is not useful to anyone20:54
*** bwensley_ has quit IRC20:54
ianwalso see my github project "poke it with your finger to see how springy it is" ... you can, quite literally, fork it20:54
mordredoh wow - that actually just flat doesn't exist - since I didn't host the code directly on launchpad :)20:54
mordredwell - I suppose the source tarball is there20:55
* mordred goes back into his hole20:55
fungiyeah, i use remote digital thermometers with alarms because:lazy20:56
ianwcould i request a couple of reviews20:58
mordredyup20:58
ianwhttps://review.opendev.org/c/opendev/system-config/+/775068 - fixes a backup script typo, want to make sure i didn't make any more mistakes20:58
ianwhttps://review.opendev.org/c/opendev/system-config/+/775057 - afs ansible isn't quite working.  this allows the servers to share the key material from just one group file on bridge20:59
ianwhttps://review.opendev.org/c/opendev/system-config/+/775053 - this is for refstack ... it worked when i manually made those changes, but docker redeployed it21:00
ianwthat will (i think) be it for refstack; we can get kopecmartin to confirm it looks ok and then start the process of removing the old bits21:01
mordredianw, fungi : on the backup script change - I think there is a typo21:11
fungioh?21:12
fungioh, yep!21:13
fungii did not see that21:13
fungii think my eyes must filter out _21:13
mordredrightfully so21:14
hasharfungi:  thanks for the opendevorg/geard Docker image!21:14
ianwthanks, i had that feeling something was still wrong :)21:15
fungihashar: it worked? awesome!21:15
hasharyeah!21:16
openstackgerritIan Wienand proposed opendev/system-config master: borg-backup: fix backup script failure match  https://review.opendev.org/c/opendev/system-config/+/77506821:16
hasharI will look at other open changes for that repo eventually :]21:17
hasharbut for now sleep time. thanks for the fix21:17
fungihave a good evening hashar, thanks for the help!21:17
hasharyou are welcome =]21:18
*** hashar has quit IRC21:18
mordredO M G21:22
mordredhttps://www.youtube.com/watch?v=hgI0p1zf31k21:22
mordredcorvus: ^^21:22
fungihis beard is the correct length21:25
corvusi love gary jules21:27
corvusmordred: you are a 'git blame titan'21:28
mordredyup21:32
mordred:)21:32
corvushowever, while they did use the gary jules arrangement, the song is actually by tears for fears21:32
corvus(i love the gary jules version, but i like the interpretation of the original better due to the musical irony)21:35
corvusi have both albums :)21:35
corvusi could go on for a bit on this....21:36
*** bwensley__ has quit IRC21:41
ianwclarkb: hrm, any idea why the gerrit 3.3 func tests worked in check but not in gate? https://review.opendev.org/c/opendev/system-config/+/77380721:45
ianwPulling shell (docker.io/opendevorg/gerrit:3.3)...21:45
ianwmanifest for opendevorg/gerrit:3.3 not found: manifest unknown: manifest unknown21:45
clarkbno21:46
clarkbdid it not upload properly to the buildset registry maybe?21:47
ianwi guess https://review.opendev.org/c/opendev/system-config/+/765021/8 didn't promote it ...21:47
ianwsystem-config-promote-image-gerrit-3.2 is a dependency of infra-prod-manage-projects21:49
clarkbya because 3.2 is what we are running in prod21:50
clarkbI don't know that promotion is the problem since those jobs create the image21:50
clarkbpre promotion21:50
openstackgerritMerged opendev/system-config master: refstack: use external https for API  https://review.opendev.org/c/opendev/system-config/+/77505321:52
ianwhttps://review.opendev.org/c/opendev/system-config/+/773807 didn't run system-config-build-image-gerrit-3.321:52
clarkbit should run the upload job21:53
clarkboh it didn't run the upload for 3.2 either21:53
clarkbso ya we haven't published that image yet in docker hub so it failed? I bet this is a dependency problem21:53
ianwit must have pulled it from the registry of its child job to pass in the check?21:53
clarkbfrom the parent change maybe21:54
clarkbin this case if we recheck it should work if the parent promoted properly?21:54
ianwyeah, but nothing will run system-config-promote-image-gerrit-3.321:55
clarkbseems that it didn't21:55
corvusclarkb: if a change used a container dependency in check, it should do that in gate too21:55
corvusoh, unless the change merged ahead of time and then failed promote21:55
corvus(ie, if they weren't in gate at the same time)21:56
ianwi think the problem isn't that the promote failed as such, the promote job was never added.  which is about the same thing in the end, i think21:56
ianwyeah, we have opendevorg/gerrit/change_765021_3.321:57
clarkbya so I did something wrong in the set of jobs?21:57
clarkbsystem-config-gerrit-images template has the promote job in it21:59
clarkbdo we not actually use that template? is that just an attractive problem?21:59
clarkbno it is in there in the templates21:59
openstackgerritIan Wienand proposed opendev/system-config master: Add promote for gerrit 3.3 image  https://review.opendev.org/c/opendev/system-config/+/77528721:59
ianwhrm, maybe i'm wrong on that.  we have a template ...22:00
clarkbya I think the template has the jobs. I'm guessing we didn't meet some condition to run the job?22:00
clarkbwe probably have a file filter that excluded it. I feel like we've run into this before with promotes22:00
ianwindeed22:01
ianwzuul.d/docker-images/gerrit.yaml doesn't match the promote job22:01
clarkbwe could do a noopy change to the dockerfile probably22:03
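
[Editor's sketch] The file-filter behaviour discussed above can be illustrated with a few lines of Python. This is only a rough model, assuming a job with a file filter runs when at least one changed file matches at least one of its patterns; the function name job_runs and the pattern list are hypothetical and are not copied from system-config or from Zuul's source.

```python
import re


def job_runs(file_patterns, changed_files):
    """Return True if any changed file matches any of the job's file patterns.

    An empty pattern list is treated as "no file filter", so the job always runs.
    """
    if not file_patterns:
        return True
    compiled = [re.compile(p) for p in file_patterns]
    return any(rx.match(path) for rx in compiled for path in changed_files)


# Hypothetical file filter for a promote job (assumption, for illustration only).
promote_files = [r"docker/gerrit/.*", r"playbooks/zuul/gerrit/.*"]

# A change that only touches the Zuul job definition file.
changed = ["zuul.d/docker-images/gerrit.yaml"]

print(job_runs(promote_files, changed))                                   # False
print(job_runs(promote_files, changed + ["docker/gerrit/Dockerfile"]))   # True
```

Under these assumptions, a change that edits only zuul.d/docker-images/gerrit.yaml never triggers the promote job, while a no-op edit to the Dockerfile does, which is the workaround suggested above.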
openstackgerritIan Wienand proposed opendev/system-config master: Add promote for gerrit 3.3 image  https://review.opendev.org/c/opendev/system-config/+/77528722:03
ianwgerrit is doing that thing where it takes a long time to post a review22:03
openstackgerritIan Wienand proposed opendev/system-config master: Add promote for gerrit 3.3 image  https://review.opendev.org/c/opendev/system-config/+/77528722:04
openstackgerritIan Wienand proposed opendev/system-config master: Trigger promote for gerrit 3.3 image  https://review.opendev.org/c/opendev/system-config/+/77528722:04
openstackgerritMerged opendev/system-config master: borg-backup: fix backup script failure match  https://review.opendev.org/c/opendev/system-config/+/77506822:06
fungiianw: "that thing" is usually accompanied by high system load on the server... same correlation again?22:07
ianwtop seems ok22:08
*** gmann is now known as gmann_afk22:10
ianwmemory usage is high, but otherwise cpu doesn't seem up on cacti22:11
clarkbnote the jvm hogs the memory and you can separately check jvm memory usage in melody22:11
clarkbthough it does peak to use the jvm hogged memory; last I looked its 95th percentile was like 1/2 of peak22:12
fungiyeah, memory usage for the server is essentially irrelevant22:12
fungiusually when i look it's consumed all but a tiny bit of available ram and plateaued there22:13
fungithe jvm is doing its own internal allocation from that22:13
*** Dmitrii-Sh has quit IRC22:19
*** Dmitrii-Sh has joined #opendev22:20
fungithe high load average, when i see it, tends to be from or accompanied by a mix of much higher than usual system and user cpu cycles22:20
corvusmordred: it looks like since the python-builder image assemble script creates the /output/bindep/run.txt file on the python-builder image, that means that if a package is installed on the builder image it won't be included in that file.  so if there's a package needed at runtime that's installed in python-builder but not in python-base, then it won't be there.  does that sound right to you?22:39
corvusmordred: (this isn't actually a problem for me right now, i just discovered it when trying to add the 'unzip' package to bindep to get it to show up on the final image to debug the actual problem i'm having, which is that there doesn't seem to be any python files in the zuul_storage_proxy wheel that's being built)22:40
*** ildikov has quit IRC22:40
*** ildikov has joined #opendev22:40
mordredcorvus: it seems right - it's why we try to install as little as possible in the builder images and have some amount of special logic to handle the things that unavoidably must be installed there22:40
corvusok; so just something to keep in mind (and possibly add as a comment or something) and deal with later if it comes up :)22:41
openstackgerritMerged opendev/system-config master: Refactor AFS groups  https://review.opendev.org/c/opendev/system-config/+/77505722:46
mordredcorvus: yah22:50
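
[Editor's sketch] The bindep "hole" described above comes down to set arithmetic. The Python below is a minimal model, assuming run.txt ends up listing only the required packages that are not already installed on the builder image, and that the final image is the base image plus whatever run.txt lists. All function names and package names are hypothetical.

```python
def packages_listed_for_runtime(required, installed_on_builder):
    """Model of run.txt: required packages the builder image does not already have."""
    return sorted(set(required) - set(installed_on_builder))


def missing_on_final_image(required, installed_on_builder, installed_on_base):
    """Required packages that end up absent from the final image."""
    run_txt = packages_listed_for_runtime(required, installed_on_builder)
    # The final image starts from the base image and installs what run.txt lists.
    final = set(installed_on_base) | set(run_txt)
    return sorted(set(required) - final)


# Hypothetical package sets, for illustration only.
required = ["libpq5", "unzip"]
builder = ["gcc", "unzip"]      # unzip happens to be present on the builder
base = ["ca-certificates"]      # but not on the base image

print(packages_listed_for_runtime(required, builder))   # ['libpq5']
print(missing_on_final_image(required, builder, base))  # ['unzip']
```

In this model a package falls through only when it is installed on the builder but not on the base image, which is why keeping the builder image as small as possible limits the problem.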
*** slaweq has quit IRC23:09
*** gmann_afk is now known as gmann23:11
mordredcorvus: so - followup - you're saying we have unzip installed in python-builder?23:22
ianwfungi: hrm, so i can't get the wiki logging into the backup hosts ... so far not sure if it's mis-configuration, or some sort of weird issues with key types :/23:22
corvusmordred: i'm no longer sure about that.  (i'm fairly sure our conclusion about the bindep "hole" is correct, but i was inspecting the wrong image, so unzip may well not be installed)23:26
mordredyeah - I just checked python-builder and didn't find it there23:29
corvussorry for the red herring :(23:29
corvusclarkb, fungi, ianw: re the report on gerrit from the bmw folks this morning -- what are the next steps there?  do we want to investigate it, or go with a "don't do that" approach for now?23:35
ianwis that the 60 changeset push?23:36
corvusyeah23:36
ianwexcellent, it seems the trusty host can not log into the backup servers.  if i take the key and use it locally, i can23:36
clarkbcorvus: I suggested that maybe asking upstream gerrit about it would be a good idea, as it seems to affect large stacks of pushes generally23:42
clarkbperhaps this is a known issue or something they have tuning advice for23:42
clarkbbut I half expect the answer may be don't do that or you need a bigger server23:42
*** whoami-rajat__ has quit IRC23:43
clarkbfungi: I am able to fetch review.o.o's index.html from an ovh bhs1 host now23:45
clarkbfungi: maybe we test the other direction too then put bhs1 back into the rotation?23:45
clarkbianw: ssh -vvv may have clues (that was super useful debugging the gerrit ssh stuff from fedora33 recently)23:46
clarkbfungi: https://grafana.opendev.org/d/qh6NXp2Mk/nodepool-ovh?orgId=1&from=now-12h&to=now that seems to show the gap when the network gear was sad23:47
ianwclarkb: yeah, afaict it seems to be offering the key ... the other side doesn't say anything helpful :/23:48
ianwthe amount of time i want to debug trusty openssh issues is ... not large23:49
clarkbthe client side should say if the server side didn't offer a matching algorithm though I think23:49
clarkbno mutual something something iirc23:49
ianwit just says "Invalid user borg-wiki-upgrade-test"23:49
clarkbhuh I wonder if the server side /var/log/auth.log or similar would give more clues based on that message23:50
ianwthat's coming from auth.log23:50
clarkbhuh, I'm all out of immediate ideas then :)23:52
ianwyeah, me too.  the fact that the same key works from my machine makes me suspect the trusty side client version23:53
clarkbis it an rsa key?23:53
ianwed2551923:53
clarkb(I would expect maximum compat via rsa, but maybe the fedora situation implies otherwise)23:53
ianwit's offering it -- debug1: Offering ED25519 public key: /root/.ssh/id_borg_backup_ed2551923:54
clarkbI guess another thought would be to try rsa?23:56
clarkbhttp://travaux.ovh.net/?do=details&id=48997& implies that things may not be fully back to normal yet, maybe we wait a bit longer given that23:57
