Wednesday, 2021-02-24

openstackgerritMerged opendev/system-config master: etherpad: fix robots.txt  https://review.opendev.org/c/opendev/system-config/+/76350200:02
clarkbinfra-prod-base failed, a number of instances get an rc -13 during the install exim step00:04
clarkbzm04.opendev.org was included in this list, the other 6 new zuul mergers seem fine though00:04
clarkbI guess I'll proceed with the other 6 for now then swing back around and check that our daily infra-prod-base gets zm04 this evening?00:05
clarkbthe ansible doesn't auto start zuul-merger so we can turn them on as we are happy with them00:05
clarkboh actually does launch node run base?00:06
clarkbit does, in that case we can proceed with 04 as well since it should be caught up00:06
prometheanfirewas there a reason to avoid virtualenv-20.4.2?  https://review.opendev.org/77717100:07
clarkbprometheanfire: virtualenv made a number of iffy updates recently after the big 20 release00:08
clarkbI don't recall all the details but suspect that any pin would be related to that00:08
clarkbthey ended up fixing a lot of the issues and introducing new regressions. It's possible that back and forth has settled down now00:08
*** DSpider has quit IRC00:09
ianwok, i've removed all bup cron jobs00:10
clarkbinfra-root I cleaned up the non qcow2 images in /opt on bridge. Then realized that /opt is another device so that doesn't help significantly with disk space pressure00:10
ianwthe old backup server has /opt/backups-201903 and /opt/backups-20200700:10
*** tosky has quit IRC00:10
ianwi will remove and free up the 201903 backups, and i think i can shrink 202007 to a single TB volume; so that's 5TB of volumes we can free up00:11
ianwi'll then attach 202007 to the new server and make a note to delete in say, 6 months00:12
clarkb++00:13
clarkbI'm going to start cleaning up old ansible logs from the first half of last year on bridge00:13
fungisounds great, thanks!00:14
clarkbsince the logs were the other place I identified much disk use00:14
clarkbI think the reason those leak is when the jobs time out we don't rotate them properly00:14
clarkbmaybe00:14
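(Roughly the kind of cleanup being described above; the path and retention window here are assumptions for illustration, not the exact commands run on bridge:)
    find /var/log/ansible -maxdepth 1 -name '*.log*' -mtime +180 -print -delete   # drop run logs older than ~6 months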
prometheanfireclarkb: ok, well, I'm gonna approve it, if stuff breaks, feel free to force a revert00:15
clarkbok I think I'll stop perusing and cleaning up old ansible logs there. I haven't completely cleaned things up and that didn't make a huge dent, but it's something00:31
ianwok, 202007 volume shrunk to be just on /dev/xvdb on the old server.  shutting down now00:32
fungiianw: interesting note, passing --install-plugin download-commands gets HttpGitReviewTestCase.test_git_review_d passing00:35
fungiso i apparently can't drop that00:35
clarkbfungi: does git review -d inspect the change to get the download url?00:36
clarkbif so that could explain it00:36
fungithe tests want to use the change-id rather than the index integer00:36
fungipresumably because that's in the commit message00:37
clarkbah00:38
ianwi would have thought that was a core plugin, but ... there you go00:39
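(For context, the flag fungi mentions is passed to gerrit's site init by the test harness; a hedged sketch with the war and site paths as placeholders:)
    java -jar gerrit.war init --batch --no-auto-start -d /path/to/test-site --install-plugin download-commands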
*** brinzhang has joined #opendev00:40
fungirerunning locally to see if that also fixes test_uploads_with_nondefault_rebase00:43
fungialso just got hit by another round of privmsg spam00:45
ianwyeah, i got some too00:45
ianwi really can't imagine your ROI in IRC spam is positive00:45
ianwok, old bup backups are mounted on the new rax.ord server @ /opt/bup-20200700:47
ianwretrieving anything is left as an exercise ... will need some UID munging, python2 environments for bup, etc.00:48
ianw#status log Old rax.ord bup backups mounted RO on the new rax.ord borg backup server @ /opt/bup-20200700:48
openstackstatusianw: finished logging00:48
fungiianw: pretty sure the only goal for any of the irc spammers in the past few years is to discredit and/or drive users off freenode, and to exact revenge for staff actions00:48
clarkbianw: I expect you can run a python2 container that talks to localhost to try and sort that out00:49
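(A hedged sketch of that suggestion; the image and package names are assumptions, only the mount path comes from the log above:)
    docker run -it --rm -v /opt/bup-202007:/opt/bup-202007:ro python:2.7 bash
    apt-get update && apt-get install -y bup   # inside the container; bup itself still has to be installed from somewhere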
*** dmsimard0 has joined #opendev00:50
*** dmsimard has quit IRC00:51
*** dmsimard0 is now known as dmsimard00:51
clarkbianw: it's looking like I may need to call it a day before ansible updates the new merger nodes. I don't think that is a big deal since zm01 showed it should go smoothly and it won't auto start the mergers. I can just check them in the morning, start them up, then if they look good after a bit turn off the old ones00:51
clarkbtl;dr I don't expect you need to do anything with them00:51
ianwok, i can keep an eye and swap them in too if you like00:52
clarkbI wouldn't worry about it, I'm sure you've got other things to look at00:52
clarkbI just wanted to let you know that it is still pending but I think you can safely ignore it :)00:52
fungipy37: commands succeeded00:57
fungiwoohoo!00:57
*** mlavalle has quit IRC00:57
clarkbfreenode admins making notices now00:58
clarkbfungi: push it :) /me hangs around to review it00:59
openstackgerritJeremy Stanley proposed opendev/git-review master: Test with Gerrit 2.13  https://review.opendev.org/c/opendev/git-review/+/77726800:59
ianwhrm, the old volumes have gone into "error deleting volume" state01:00
spotzCan we turn off PMs?:)01:00
clarkbspotz: freenode suggests you set yourself +R to prevent PMs from unidentified users01:01
ianwcan't even give our TBs away :)01:01
openstackgerritJeremy Stanley proposed opendev/git-review master: Test/assert Python 3.9 support  https://review.opendev.org/c/opendev/git-review/+/77258901:01
spotzOk let me try that/ Thanks clarkb01:02
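(For reference, what that usually looks like in a typical client; the mode letter is freenode-specific and YourNick is a placeholder:)
    /mode YourNick +R   # drop private messages from users not identified with NickServ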
fungiclarkb: ianw: topic:gitreview-2 has the currently outstanding changes slated for a 2.0.0 release01:02
fungitomorrow i'll work on a python package refresh for it like i did with bindep in 77410601:04
fungiand get that included too01:04
fungiotherwise i think it's basically ready (assuming zuul and reviewers concur)01:04
clarkbfungi: those changes all lgtm and should have my +2 now. Left a thought on the one that fixes the gerrit bootstrapping though that can be addressed in a followup01:07
clarkband now I need to find dinner01:07
fungithanks!01:10
fungiultimately all that was just in service of being able to test with python 3.901:11
fungithe whole test framework could stand to be redesigned from the ground up01:11
ianwi think it was definitely worth sorting out.  tbh i don't think it needs that much re-design01:12
ianwbut we could include git-review as part of the system-config job too, instead of pushing directly01:12
ianwjust as another angle01:12
*** brinzhang has quit IRC01:13
ianwi filed a ticket with rax on our zombie volumes01:13
*** brinzhang has joined #opendev01:15
*** brinzhang has quit IRC01:16
*** LowKey has quit IRC01:44
*** hamalq has quit IRC01:53
*** hamalq has joined #opendev01:54
*** zimmerry has quit IRC02:02
*** hamalq has quit IRC02:08
openstackgerritOleksandr Kozachenko proposed zuul/zuul-jobs master: Revert "Revert "Update upload-logs roles to support endpoint override""  https://review.opendev.org/c/zuul/zuul-jobs/+/77667702:25
prometheanfirewell, let's see if the virtualenv update breaks things02:30
openstackgerritMerged opendev/git-review master: Add missing -p/-P/-w/-W/--license to manpage  https://review.opendev.org/c/opendev/git-review/+/77456702:37
openstackgerritMerged opendev/git-review master: Create test projects with positional argument  https://review.opendev.org/c/opendev/git-review/+/77726003:14
openstackgerritMerged opendev/git-review master: Test with Gerrit 2.13  https://review.opendev.org/c/opendev/git-review/+/77726803:15
*** brinzhang has joined #opendev03:17
*** ysandeep|away is now known as ysandeep|ruck03:41
*** ykarel has joined #opendev04:11
*** zimmerry has joined #opendev04:37
*** jmorgan has quit IRC05:18
*** jmorgan has joined #opendev05:18
*** dviroel has quit IRC05:39
*** whoami-rajat has joined #opendev05:41
openstackgerritIan Wienand proposed opendev/system-config master: zuul-summary-status : handle SKIPPED jobs  https://review.opendev.org/c/opendev/system-config/+/77729805:52
*** marios has joined #opendev05:57
ykarelI see mirrors are outdated, can someone check the reason05:59
ykarelhttp://mirror.ord.rax.opendev.org/centos/8-stream/AppStream/x86_64/os/repodata/ vs http://mirror.dal10.us.leaseweb.net/centos/8-stream/AppStream/x86_64/os/repodata/05:59
ianwykarel: all mirroring logs are @ https://static.opendev.org/mirror/logs/rsync-mirrors/06:00
ykarelianw, Thanks, checking logs06:02
ianwour last run was @ 2021-02-24T04:44:09,716424675+00:0006:04
ykarelianw, yes, so maybe the source mirror was not consistent at that time?06:08
ykarelfrom https://mirror-status.centos.org/ i see mirror.dal10.us.leaseweb.net  @ http 1 hour06:08
ykarelrenewed06:09
ykarelok06:09
ianwyeah, i'm not seeing any rsync errors or errors with us releasing, so i'd say we're getting what upstream was serving06:09
ykarelwhen will be next run?06:09
ykarelis it possible to retrigger the run now?06:09
*** zoharm has joined #opendev06:12
ianwit runs every two hours (https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/mirror-update/tasks/rsync.yaml#L42)06:13
ianwykarel: i'm running a manual run now06:14
ykarelianw, Thanks06:15
ykareli see the contents are updated now, Thanks again ianw06:37
ianwykarel: np07:24
*** slaweq has joined #opendev07:27
*** eolivare has joined #opendev07:32
*** sshnaidm|afk is now known as sshnaidm07:38
*** ralonsoh has joined #opendev07:48
*** ysandeep|ruck is now known as ysandeep|lunch08:00
priteauGood morning. We had a strange error in gate jobs overnight: "ERROR Failed to update project None in 1s"08:02
priteauhttps://zuul.opendev.org/t/openstack/build/999f096bccbe43c286b7c6ed3c5deeb508:02
*** rpittau|afk is now known as rpittau08:11
*** fdegir has joined #opendev08:18
*** andrewbonney has joined #opendev08:27
*** ykarel_ has joined #opendev08:31
*** ykarel has quit IRC08:34
*** ykarel_ is now known as ykarel08:38
*** jpena|off is now known as jpena08:57
*** ysandeep|lunch is now known as ysandeep|ruck09:10
*** tosky has joined #opendev09:18
*** brinzhang has quit IRC10:04
*** brinzhang has joined #opendev10:05
*** dtantsur|afk is now known as dtantsur10:07
*** ysandeep|ruck is now known as ysandeep|brb10:10
*** ysandeep|brb is now known as ysandeep|ruck10:24
*** dviroel has joined #opendev10:55
*** klonn has joined #opendev11:55
*** klonn has quit IRC11:57
*** akahat has quit IRC12:05
*** akahat has joined #opendev12:05
*** eolivare has quit IRC12:08
*** jpena is now known as jpena|lunch12:38
*** ysandeep|ruck is now known as ysandeep|prgm_ca12:59
*** eolivare has joined #opendev13:16
fungilooks like that build started 2021-02-23 23:29:24 utc from ze1113:21
fungiit failed to update a bunch of repositories, logging errors like13:23
fungi2021-02-23 23:29:26,134 ERROR zuul.ExecutorServer: [e: 82190bddba5d471e915725e15ea04a87] [build: 999f096bccbe43c286b7c6ed3c5deeb5] Process pool got broken13:24
fungiseems like it was just that build, i don't see "Process pool got broken" logged for any other builds, at least on that executor13:25
fungii'm not finding that on any other executors either13:33
fungiso seems to have been a contained incident13:33
fungiwhatever caused it13:33
fungii'll do some more digging to figure out what codepaths could log that error13:33
*** dtantsur is now known as dtantsur|brb13:35
openstackgerritMerged opendev/git-review master: Test/assert Python 3.9 support  https://review.opendev.org/c/opendev/git-review/+/77258913:35
*** jpena|lunch is now known as jpena13:39
*** ysandeep|prgm_ca is now known as ysandeep13:51
*** ysandeep is now known as ysandeep|mtg13:55
*** rpittau is now known as rpittau|afk14:04
*** LowKey has joined #opendev14:17
*** dhellmann has quit IRC14:30
*** dhellmann has joined #opendev14:31
*** LowKey has quit IRC14:41
*** LowKey has joined #opendev14:41
*** chandankumar has quit IRC14:58
*** zoharm has quit IRC15:08
*** dhellmann has quit IRC15:12
*** lpetrut has joined #opendev15:13
*** dtantsur|brb is now known as dtantsur15:18
*** ysandeep|mtg is now known as ysandeep15:20
*** bhagyashris is now known as bhagyashri|ruck15:20
*** dhellmann has joined #opendev15:22
*** lpetrut has quit IRC15:24
*** chandan_kumar has joined #opendev15:29
*** chandan_kumar is now known as chandankumar15:32
*** slittle1 is now known as Guest272115:52
*** elod has quit IRC15:58
*** elod has joined #opendev16:00
clarkbI have started zuul-merger on the new mergers and am proceeding with stopping zuul-merger on the old mergers16:07
clarkbif things look good in a couple of hours I'll delete the old mergers entirely16:07
*** mgoddard has joined #opendev16:11
*** ysandeep is now known as ysandeep|dinner16:12
openstackgerritEmma Foley proposed openstack/project-config master: Add infrawatch/functional-tests to available repos  https://review.opendev.org/c/openstack/project-config/+/77742816:28
*** ysandeep|dinner is now known as ysandeep16:33
*** dtantsur is now known as dtantsur|brb16:50
fungiokay, i'm a little confused, the only places i find the string 'Process pool got broken' are in zuul/executor/server.py, and there are two: AnsibleJob.execute() and ExecutorServer._innerUpdateLoop()16:50
fungiif BrokenProcessPool is raised in either of those places that message is logged and resetProcessPool() is called. one of them also calls _send_aborted() but the other does not16:51
fungiand the latter looks like what we ran into, based on context in the log16:52
*** toomer has joined #opendev16:53
fungiaha, yeah i wasn't looking at the tracebacks before16:54
*** marios has quit IRC16:54
fungiso it was raised in _innerUpdateLoop() and the result was a build error, which i suppose is why the build was not automatically retried16:54
fungii'll take this to #zuul16:54
*** ykarel has quit IRC17:05
toomerI hope somebody from the Infra team can help me with an issue on OpenDev Gerrit (review.opendev.org)17:08
toomerWhen we try to push a new patch to the gerrit server we are getting the below error:17:08
*** gibi has joined #opendev17:09
toomergit push ssh://**********@review.opendev.org:29418/openstack/nova c8478e40bdb996d5e0a1f01ae0ae55e6926f318d:refs/for/master%topic=bug/190912017:10
fungitoomer: if you're having trouble pasting the error into irc (e.g. it's more than a line or two) then use http://paste.openstack.org/ and stick the url in here17:10
toomererror: remote unpack failed: error Missing tree 440f101f5474ed2009b4ced41a31c6673a8a1c8017:10
toomerfatal: Unpack error, check server log17:11
*** ysandeep is now known as ysandeep|away17:11
fungii'll check the server's log17:11
toomerI'm wondering if there is anything on the server logs (review.opendev.org) corresponding to that push17:12
toomerGreat, thanks17:12
fungiyeah, it definitely logged that. i'll see if that tree is actually missing17:12
toomerI can try to push again in order to reproduce, if needed.17:13
fungii can `git show 440f101f5474ed2009b4ced41a31c6673a8a1c80` in the bare repo for openstack/nova on the server's filesystem, so that should be the same one gerrit is referencing17:13
fungii'll see if there are more tea leaves i haven't successfully read in the backtrace it logged17:14
fungitoomer: i think we've seen this before when someone pushed a large change series. could that be the case this time too?17:18
clarkbfungi: toomer yoctozepto had a similar issue with a different repo previously and the issue was client side aiui17:18
clarkbthe pack file the client was sending to the remote was not valid essentially17:18
clarkbI want to say yoctozepto rebased which caused the local tree to be pushable17:18
fungiahh, the error logged by gerrit makes it sound entirely server-side, but i could see that still being the case17:19
clarkbfungi: the hints are in the traceback, the error is happening server side but during recieving and verification of the pack from the remote side iirc17:19
fungiit's coming from somewhere in FullConnectivityChecker, so yeah that makes some sense17:20
fungisome sort of object disconnect in the packfile being pushed, not in the actual repo gerrit has17:21
clarkbyup exactly17:21
toomerI can try to rebase this change, but we had that problem before for openstack/tempest and after a while we just try again and that went trough without errors17:22
fungialso if you're pushing multiple commits at once, i guess that chance increases. hard to have a disconnect when pushing a single change17:22
fungiis this one change, or a series of changes?17:22
clarkbI suspect that it is related to git protocl v217:22
clarkband there is either a bug in jgit or in cgit17:22
fungioh, yeah maybe17:23
clarkbwhere the two sides don't quite negotate the exact set of content that needs pushing correctly17:23
fungiand rebasing just papers over the problem by rewriting the commits and you get lucky and don't have the same problem17:23
clarkbya17:24
clarkbanother option may be to force v117:24
clarkband see if it goes away since the negotiation is different aiui17:25
fungiyep, worth a try17:26
toomerThis is a single change and is already based on the latest commit from what I see17:28
toomerHow can I force git protocol v1?17:29
clarkb`git -c protocol.version=1 push gerrit HEAD:refs/for/master` ? I think I have not tested that17:30
clarkbfungi: ^ does that look right for pushing directly without git review?17:30
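(A couple of hedged variants of the same idea: pin the protocol persistently, or trace the wire exchange to confirm what was actually negotiated:)
    git config --global protocol.version 1
    GIT_TRACE_PACKET=1 git -c protocol.version=1 push gerrit HEAD:refs/for/master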
toomer ! [remote rejected]       c8478e40bdb996d5e0a1f01ae0ae55e6926f318d -> refs/for/master%topic=bug/1909120 (n/a (unpacker error))17:31
clarkbthat is with the command I pasted?17:31
toomerOk, I will try the git protocol version 1 and let you know17:32
toomerNo17:32
toomerSame error17:37
toomerhttp://paste.openstack.org/show/802970/17:37
fungitoomer: is this the only change you're aware of returning this error, or is it generally a problem for any change someone tries to push in nova?17:37
toomerWe had similar problem with change for openstack/tempest repository17:38
fungiwhen was that?17:38
clarkblast time this occurred I also suggested to yoctozepto that a git fsck might be helpful17:38
clarkbbut I don't think the fsck was done17:38
toomerBut after some time, we tried it again and it went through17:38
fungitoomer: so it's been a while, sounds like. do you know whether it was the exact same error message?17:39
toomerLet me check....17:39
toomer24-Feb-2021 12:19 GMT17:40
clarkbit might be worth trying the rebase. Or pushing another change entirely17:43
clarkbjust to try and narrow this down (rebase was what helped last time)17:43
clarkbmy v2 protocol suspicion at least seems to be ruled out for now17:45
fungitoomer: can you run `git fsck` in your nova tree (it will probably take a few minutes) and see if it reports any errors?17:46
toomerHere are the client logs for the tempest change. On the first attempt it failed but after a couple of minutes the same patchset went through17:47
toomerhttp://paste.openstack.org/show/802971/17:47
fungiparticularly interested in the "checking connectivity" phase of the fsck17:47
toomersure, will do that, just a sec17:48
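(What that check typically looks like; the flag is a suggestion rather than a required invocation:)
    git fsck --full   # walks every local object; broken links show up during the "Checking connectivity" phase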
clarkbI wonder too if the remote pack is relying on the local (server) packs to verify connectivity17:48
clarkband maybe gerrit jgit has paged out/uncached/etc the relevant bits for that particular sha17:48
clarkbbecause as you point out it is there in the actual repo17:48
clarkband then maybe it reloads that later and is happy again17:48
* clarkb just thinking out loud17:49
fungii do also see the error for the tempest push logged at the following times today: 10:17:55, 10:31:55, 10:34:05, 12:19:4117:50
toomerIs there anything logged for tempest at 24-Feb-2021 12:23 ?17:51
clarkbI was able to push remote:   https://review.opendev.org/c/openstack/nova/+/777444 DO NOT MERGE testing a thing17:52
clarkb(as a sanity check)17:53
clarkbthat has me thinking to client side state again though17:53
fungitoomer: no missing tree errors for tempest changes after 12:19:41 utc17:54
yoctozeptotoomer, clarkb, fungi: so the issue back then was that it was a stack of the changes; I wanted to modify just the tip and push it17:55
yoctozeptobut gerrit did not like it17:56
yoctozeptoso I had to rebase off origin/master17:56
yoctozepto(the origin that I just fetched)17:56
yoctozeptoand then it worked17:56
clarkbwell I think the problem is more subtle than that. My understanding of the issue is your client sends a pack file with the stuff in it for the push and gerrit verifies its completeness, but if it isn't complete you get this error17:57
clarkbit is possible that the reason gerrit sees this as incomplete is due to a gerrit or jgit bug though17:57
clarkbnot necessarily the clients fault. But I don't think it is specific to trying to modify a stack of changes, more the state of the pack sent by the client to gerrit and if gerrit can verify connectivity17:57
fungilooks like only two users have encountered missing tree errors today according to the gerrit log. one for tempest and another for nova, so the problem is not widespread at least17:58
clarkbI had previously suspected that maybe this is a git protocol v2 growing pain but the above test should have used v1 and it failed too17:58
clarkbin this case a fsck as well as a rebase and try again are probably the next things to try in order to gather more data17:58
fungithough the rebase would have to be forced, sounds like, unless another change merges in the meantime17:59
*** jpena is now known as jpena|off17:59
clarkbI think git rebase -i pick foo will do it17:59
clarkbbut maybe not, might need to amend it instead18:00
*** ralonsoh has quit IRC18:00
fungiyeah, commit --amend and then writing without editing the commit message would update the timestamp and change the commit id at least, so the pack will be different after that18:02
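(i.e. something along these lines, keeping the message untouched:)
    git commit --amend --no-edit   # refreshes the committer timestamp, so the commit id and the pushed pack both change
    git push gerrit HEAD:refs/for/master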
clarkbI pushed a followup change to https://review.opendev.org/c/openstack/nova/+/777444 which modifies one of the files listed in the 440f101f5474ed2009b4ced41a31c6673a8a1c80 object to check if maybe you need to do something that should require connectivity to that in history18:02
clarkbit seems to have worked fine18:02
clarkbs/followup change/followup patchset/18:02
clarkbI have git 2.30.118:02
clarkbtoomer: might be helpful to record the git client version while we're at it18:03
fungipriteau: just to follow up on your error, it looks like the executor to which that build was assigned hit an oom condition around the time it was starting the build and that process got sacrificed. normally zuul should have automatically retried the build but it seems there was a corner case in the exception handling in that routine which caused us to report the error instead. i've proposed18:04
fungihttps://review.opendev.org/777441 to hopefully avoid that in the future18:04
*** iurygregory_ has joined #opendev18:04
*** iurygregory has quit IRC18:05
*** iurygregory_ is now known as iurygregory18:05
clarkbfungi: https://review.opendev.org/c/opendev/system-config/+/777206/ I've marked that one active. the mergers seem fine. If we land 777206 the next step will be to remove the old servers18:05
toomerThe change for the nova just went through, without any rebase or updates18:25
clarkbinteresting18:26
toomerhttp://paste.openstack.org/show/802977/18:27
toomer:-(18:27
toomerSo we won't get to the bottom of this today18:27
toomerI will ask my team to report any issues like that in the future so hopefully we will get more luck next time18:28
toomerThanks for all the support today !18:28
clarkbtoomer: what version of git are you using?18:29
clarkbjust in case that is useful (would be if it is a protocol problem)18:29
toomersec18:29
clarkbbut that could possibly explain why this isn't widespread if there is a specific version of git that doesn't play nice with gerrit (I don't have evidence of this yet, just trying to collect as much info as possible)18:29
toomergit --version # 'git version 2.17.1'18:29
clarkbthanks18:30
clarkbfungi: fyi https://storyboard.openstack.org/#!/story/200736518:30
toomerDid that specific gerrit fault happen on days besides today?18:30
*** dtantsur|brb is now known as dtantsur18:31
clarkbtoomer: yes yoctozepto observed it on an osa repo a few weeks ago. Those are the only incidents I'm aware of18:31
toomerIt's good it's not widespread. I will need to spend more time investigating it next time.18:32
fungii'll dig in gerrit error logs over the past month and see if i can find more18:32
fungilooked like it was gibi hitting the error for the nova change, according to the logs18:32
toomerYes, I'm working with him18:33
toomerWe are part of the Nordix Foundation18:33
clarkbthe next time this happens it might be helpful to try pushing with a newer git (to see if that helps)18:34
clarkbcontainers may make that easy ? I don't know18:34
clarkband just continue to try and isolate where the problem is originating18:34
fungifor the missing tree errors today, 750b013 in openstack/tempest and 440f101 in openstack/nova18:35
toomerI will talk with gibi tomorrow and see how frequently he is hitting this problem18:35
fungiyesterday there was a user getting a missing tree 3e44c04 in openstack/cinder at 02:19:50 and 02:20:4118:36
clarkb2.17.1 is bionic's git version fwiw18:39
fungithursday of last week (2021-02-18) at 08:49:33 a user got a missing tree 817e601 on openstack/neutron and another user got a missing tree 160fe0c on openstack/nova 11:31:56 that day18:41
fungiand missing tree 466cd02 in openstack/nova for a different user at 13:08:31 that day18:42
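(the sort of server-side search behind those timestamps; the log path is an assumption about the deployment rather than something stated here:)
    grep 'Missing tree' /var/gerrit/logs/error_log   # each hit corresponds to one rejected push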
toomerHere is the change that went through in case you are wondering what it looks like18:42
toomerhttps://review.opendev.org/c/openstack/nova/+/77744718:42
fungiweird it went from 13:08:31 friday to 02:19:50 yesterday without happening18:42
*** andrewbonney has quit IRC18:42
fungiand only seems to be impacting fairly large repositories (tempest, nova, cinder, neutron)18:42
clarkbtoomer: yes that may be useful too, thanks18:42
fungier, went from 13:08:31 thursday to 02:19:50 yesterday18:43
clarkbfungi: could be the jgit on the server side losing track of things due to GC'ing I guess18:43
fungiwithout happening18:43
fungiyeah, that's a good theory18:43
clarkbthough you'd expect it to go look on disk if it doesn't have it in memory18:43
fungimaybe the cache winds up out of sync temporarily?18:43
toomerHow frequently you are GC on gerrit repos ?18:44
clarkbya thats possible. I know one of the tunables is to make these refs hard not soft refs so they don't get GC'd18:44
clarkbtoomer: java gc not git gc18:44
clarkbI don't suspect git gc here because the objects are in the repo18:44
clarkb(you can verify via clone and git show)18:44
clarkbbut setting those to hard refs implies you need all the memories because you'll never free that memory again18:45
clarkb(which is why we haven't done that)18:45
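(the tunable being alluded to is, assuming the gerrit docs, core.packedGitUseStrongRefs in gerrit.config; a sketch of flipping it, which is deliberately not done here because of the memory cost:)
    git config -f /var/gerrit/etc/gerrit.config core.packedGitUseStrongRefs true   # path assumed; needs a restart and much more heap headroom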
yoctozeptoclarkb: it was kolla-ansible actually :-)18:49
yoctozeptolet me see my git version as well18:50
yoctozeptogit version 2.18.418:50
yoctozeptomaybe 2.17-2.18 at least are affected18:51
yoctozeptoif I hit it ever again, I will try a git upgrade... if I don't forget that is!18:51
toomertbh, I don't think it's the git version. The bionic git version is common and it would affect lots of users18:52
toomerif this is the actual issue18:52
*** eolivare has quit IRC18:58
fungiyeah, i'm seeing enough variety in usernames that i have doubts it's related to the git client version19:20
*** toomer has quit IRC19:28
clarkbfungi: have time for https://review.opendev.org/c/opendev/system-config/+/777206/ ? (sorry I just want to remove those old servers as that seems prudent before spinning up even more new servers)19:44
fungiyeah, sorry, got sidetracked writing a fix for that git review story you linked19:44
openstackgerritJeremy Stanley proposed opendev/git-review master: Don't test rebasing with unstaged changes  https://review.opendev.org/c/opendev/git-review/+/77745619:50
fungiclarkb: ^ i should probably also add a test for that19:50
clarkbtesting ++ especially when working with the user's working dir19:50
clarkbtoomer isn't here anymore but I had another thought during my walk earlier. 2.17 and 2.18 are both pre git protocol v2 by default19:51
clarkbhowever, my git client is post git protocol v2 by default19:51
clarkbI wonder if this is a v1 issue and using v2 addresses it19:51
clarkbit's possible that we don't see it happen frequently because it's a corner case or requires a specific situation. In any case I think testing with git >= 2.26 (when git protocol v2 became the default) would still be helpful if hitting this again19:52
*** dtantsur is now known as dtantsur|afk19:56
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: DNM: testing parallel bindep on multiple hosts  https://review.opendev.org/c/zuul/zuul-jobs/+/77746220:14
openstackgerritMerged opendev/system-config master: Cleanup zm02-08.openstack.org  https://review.opendev.org/c/opendev/system-config/+/77720620:21
clarkbfungi: your proposed fix for the git-review thing lgtm, though I think having a test would be a good idea20:25
fungiyeah, it's in the works20:25
fungialmost there20:26
*** sshnaidm is now known as sshnaidm|afk20:28
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: DNM: testing parallel bindep on multiple hosts  https://review.opendev.org/c/zuul/zuul-jobs/+/77746220:29
priteauThank you fungi for following up on the error20:35
fungipriteau: yeah, i'm still not certain how to get the AnsibleJob class context from there to properly abort it, still working on that, but the upshot is it's a very rare condition related to an out of memory event on an executor20:37
fungiideally zuul would have just silently rerun that build rather than reporting that error20:38
fungiclarkb: how do we go about increasing test timeout? i keep getting random test failures for git-review because some tests take too long (particularly on my increasingly pathetic and overloaded workstation, but also on some poor-performing job nodes too)20:41
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: DNM: testing parallel bindep on multiple hosts  https://review.opendev.org/c/zuul/zuul-jobs/+/77746220:41
*** LowKey has quit IRC20:41
*** LowKey has joined #opendev20:41
fungiclarkb: cancel that. i see "self.useFixture(fixtures.Timeout(2 * 60, True))" already in here20:41
clarkbfungi: there should also be an env var that it reads allowing you to bump it up locally20:42
fungiwell, in this case we need to bump it up in ci as well20:42
clarkbfungi: it appears that test suite doesn't do the env var check20:44
clarkbbut that is a common pattern elsewhere20:44
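(the common pattern clarkb means is usually an environment override at invocation time; OS_TEST_TIMEOUT is the conventional name in many openstack suites and is an assumption here since git-review's fixtures don't read it yet:)
    OS_TEST_TIMEOUT=600 tox -e py37   # only effective once the test fixtures actually consult the variable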
clarkbinfra-root any objections to me starting to delete zm02.openstack.org - zm08.openstack.org now?20:44
fungiclarkb: go for it, i say20:44
clarkb777206 landed and should be the last thing we needed before cleaning them up. The new servers have been in place for about 4 hours now20:45
clarkbok I'll start cleaning those up now. Then tomorrow I guess its ze01.opendev.org time20:46
fungiclarkb: minor hole in my logic on the git-review fix, which fiddling with tests has highlighted for me... if you create a new file and then git add it, the diff will be empty20:46
fungiso we need to catch unstaged *and also* staged but uncommitted edits in the worktree20:47
clarkbfungi: git diff --cached will show the staged side20:48
clarkbnot sure if you can do both in one command though20:48
clarkb02-04 are done now. My computer just started installing a billion package updates so I'll pause and wait for that to complete (big updates like this tend to flap networking and trying to do scary things like deletes when that happens is only going to cause me confusion)20:50
fungiclarkb: oh, yeah thanks. i can always run both at least20:53
fungiclarkb: an alternative is to do git status --porcelain and parse the output20:54
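(two hedged one-liners that would cover both the staged and unstaged cases described above:)
    git diff HEAD --quiet || echo 'uncommitted changes'   # compares worktree and index against HEAD in one go
    git status --porcelain                                # any output means something is modified, staged, or untracked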
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: DNM: testing parallel bindep on multiple hosts  https://review.opendev.org/c/zuul/zuul-jobs/+/77746220:58
clarkbfungi: ianw: looks like the gitea backups may still be complaining. Have we looked at that yet?21:07
fungii thought the fixes for gitea01 and translate merged21:07
clarkbah maybe they did, I just noticed the email from last night21:07
fungigitea01 complained again at 05:56:03 today21:08
fungino error from translate this time21:08
clarkbalso these updates totally did the name lookups stop working thing like they did on the other host. I'm going to reboot in a bit21:08
clarkbthat is a fun way to force people to reboot after updating, break dns21:08
fungi777037 was supposed to fix that, and merged 07:23 yesterday21:10
*** whoami-rajat has quit IRC21:12
clarkbfungi: so we wait and see if it happens again?21:16
fungiclarkb: well, i'm sort of wondering if it didn't get applied (fix merged ~22.5 hours before the error was sent out), or if it was masking some other problem21:22
* clarkb is having a hard time with timestamps today.21:23
clarkbI now see you said error was today fix was yesterday21:23
clarkb#status log Replaced zm01-08.openstack.org with new zm01-08.opendev.org servers running on focal21:24
openstackstatusclarkb: finished logging21:24
clarkbLooking at the executors: I notice we need to tell launch-node to use the ephemeral drive for /var/lib/zuul when launching them. Is there anything else special to consider about those? afs maybe?21:27
clarkbwe have focal packages for openafs in our ppa21:28
clarkbMaybe we spin up a ze01.opendev.org then double check it happily does afs things before rolling out more focal executors21:28
fungiprobably a good idea to double-check, yeah, though mirror-update.o.o is already focal and does afs writes21:32
fungiso the executors will probably be fine21:32
ianwclarkb: let me see21:34
ianwCreating archive at "ssh://borg-gitea01@backup02.ca-ymq-1.vexxhost.opendev.org/opt/backups/borg-gitea01/backup::gitea01-mysql-2021-02-24T05:53:50"21:36
ianwmysqldump: Error 2013: Lost connection to MySQL server during query when dumping table `action` at row: 705721:36
ianwthat seems like a legitimate failure, why though ...21:36
clarkbthat should all be on localhost too21:38
clarkbmaybe mariadb updated at that same time?21:38
clarkbwe auto update it iirc21:38
*** slaweq has quit IRC21:48
ianwmaybe ...21:57
ianwfor some reason gitea logs go to both /var/log/syslog and /var/log/containers/docker-gitea.log22:00
clarkbianw: the access logs go to /var/gitea/logs/ too iirc22:00
clarkbbut not to the docker or syslog logs iirc22:00
ianwthe mariadb container is 41 hours old ... i don't think that lines up with the backup failure22:02
ianwAborted connection 33047 to db: 'gitea' user: 'root' host: 'localhost' (Got an error writing communication packets)22:03
ianwwhatever that means22:03
*** hamalq has joined #opendev22:11
openstackgerritIan Wienand proposed opendev/system-config master: install-docker: move rsyslog handler earlier  https://review.opendev.org/c/opendev/system-config/+/77747622:20
openstackgerritIan Wienand proposed opendev/system-config master: install-docker: remove fix from prior change  https://review.opendev.org/c/opendev/system-config/+/77747722:20
*** mlavalle has joined #opendev22:36
openstackgerritIan Wienand proposed openstack/project-config master: zuul-results-summary: set default branch  https://review.opendev.org/c/openstack/project-config/+/77748023:08
ianwcorvus: ^ i think that's maybe about right23:08
corvusianw: +2 lemme know how it goes :)23:09
ianwi don't think i've used depends-on: for that project before ... so it's just been cloning as part of "required-projects" where it has the override-checkout to main23:09
openstackgerritIan Wienand proposed openstack/project-config master: zuul-results-summary: set default branch  https://review.opendev.org/c/openstack/project-config/+/77748023:23
corvusianw: wait is that the right tenant?23:36
corvusianw: i thought it was opendev tenant; is it openstack?23:36
fungiwhich reminds me, i need to also work on the change to move git-review into the opendev tenant23:38
clarkbwe're incorporating that results summary plugin on the gerrit images which are built out of the openstack tenant23:40
clarkbI think?23:40
ianwcorvus: yeah, it's running on the system-config jobs23:40
ianwalthough ... https://opendev.org/openstack/project-config/src/branch/master/zuul/main.yaml#L6423:42
*** stewie925 has joined #opendev23:43
stewie925hello opendev support I have a question23:43
stewie925earlier I just attempted to re-associate my opendev account from work email to personal email23:44
stewie925however I noticed that my opendev commit is not pointing to my personal email address - could anyone help?23:45
clarkbstewie925: you mean your git commits have the wrong email address in them?23:45
stewie925yes sir23:45
clarkbthat is configured on the client side with `git config` and git applies those settings when making commits. Something like `git config --global user.email 'foo@bar.com'` will likely fix it for you23:46
clarkbnote that old commits won't be updated (I think if you amend them after that update it may fix it though)23:46
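(concretely, something like the following, with the address as a placeholder:)
    git config --global user.email 'personal@example.com'
    git commit --amend --reset-author --no-edit   # re-stamps the newest commit with the newly configured identity
    git review                                    # or: git push gerrit HEAD:refs/for/master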
stewie925sorry if you allow me to share...23:47
stewie925I tried to do it using the Opendev Settings - adding an email address and then setting it to 'Preferred'23:48
stewie925and I clicked on Send Verification - nothing happened (no email). I used a gmail email address this first try23:50
clarkbdid you check your spam dir or similar?23:51
ianwcorvus: hrm, now i'm thinking it does belong in opendev.  but also that we need to exclude it from system-required jobs23:52
stewie925on the second try, I added a yahoo email address in the Settings > Email Address. Then I clicked on Send Verification. This time I got an email from Ubuntu One. I followed that link and presto - I see my Opendev account switched to my Yahoo email from my work email.23:52
stewie925clark, I did check my spam folder in gmail, there was nothing...23:52
clarkbstewie925: ubuntu one shouldn't be involved in that23:52
clarkbthe email addresses you tell gerrit to send alerts to are largely independent of ubuntu one (the single address associated with the openid is auto filled for you though)23:53
stewie925but anyway clarkb - when I looked at my commits - they're associated with my GMAIL account surprisingly23:53
clarkbstewie925: the commits should all be associated with whatever email is in the commit itself23:53
clarkbstewie925: can you check if git log shows the gmail account in the commit?23:54
clarkbgerrit should reflect that if so23:54
stewie925the commits were originally associated with my work email (I left work a few months back)23:55
openstackgerritIan Wienand proposed opendev/project-config master: zuul-results-summary: set default branch  https://review.opendev.org/c/opendev/project-config/+/77748523:55
stewie925but when I tried this thing - changing the preferred email thing - the commits now changed from my old work email to my Gmail email (the failed first try)23:55
clarkbstewie925: maybe you can be more specific about what you mean when you say "associated with my gmail account"23:55
clarkbbecause gerrit won't rewrite commits like that23:56
clarkbI think I'm not understanding23:56
stewie925when I try to filter my commits -   I do owner:self23:56
clarkboh changes not commits23:57
clarkbstewie925: did you change your email to yahoo in ubuntu one?23:57
stewie925oh sorry, changes23:57
clarkbI guess that would explain why ubuntu one sent you a verification email23:57
stewie925ohhh hmmm23:57
stewie925well yeah in the meantime I sent them an email before I came online23:58
clarkbif you did that, then there is a good chance ubuntu one created a new openid for you and when you logged in after that gerrit created a new account for the new openid23:58
clarkbwhich could explain the split23:58
clarkb(unfortunately, we're in the middle of trying to fix ~650 preexisting issues of this sort so that we can more easily manage these problems in the future, but there is a lot to sort through and we haven't quite made it there)23:59
