Monday, 2020-11-23

*** tosky has quit IRC00:06
ianwok, just going to log a few things as i investigate nodepool00:15
ianw(CONNECTED [localhost:2181]) /nodepool/images/debian-stretch/builds> json_cat 000013039100:15
ianwis blank00:15
ianwthis is what is causing dib-image-list to bail00:15
ianwthe node before that, 0000130390 is complete00:15
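A blank znode payload makes a plain `json.loads()` raise `json.JSONDecodeError`, which is enough to abort a whole listing. Below is a minimal sketch of tolerating such records instead of bailing. It assumes the raw znode payloads have already been read into a dict keyed by build ID. `split_build_records` is a hypothetical helper, not nodepool's actual code, and the ZooKeeper read itself is out of scope.

```python
import json
from typing import Dict, List, Tuple

def split_build_records(records: Dict[str, bytes]) -> Tuple[Dict[str, dict], List[str]]:
    """Parse raw znode payloads, separating usable builds from blank ones.

    Hypothetical helper: `records` maps a build ID (e.g. "0000130390") to
    the raw bytes stored in that znode. Blank payloads are reported
    separately instead of raising, so one bad node does not abort the list.
    """
    parsed: Dict[str, dict] = {}
    blank: List[str] = []
    for build_id in sorted(records):
        raw = records[build_id]
        if not raw or not raw.strip():
            blank.append(build_id)  # candidate for manual cleanup
            continue
        parsed[build_id] = json.loads(raw)
    return parsed, blank
```

The blank IDs it returns are the kind of entries that were later cleared out of ZooKeeper by hand.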
clarkbianw: iirc the blank entries are a known issue that bmw has been looking into00:16
clarkband it causes things to not delete either00:16
clarkbwhich may also explain the disk use00:16
ianwclarkb: yeah, that's where i think i'm coming from; we have a few old images from like aug that i think are orphaned00:17
ianw11/14/2020 @ 11:20am (UTC) is date from 130390, i wonder if our logs go back that far00:23
fungilog of the maintenance can be found here: http://eavesdrop.openstack.org/meetings/opendev_maint/2020/opendev_maint.2020-11-20-12.59.log.html00:30
fungiincluding that in a wrap-up e-mail to the service announce list shortly00:30
ianw130391 is i'm sure somehow related to a whole bunch of "keystoneauth1.exceptions.connection.ConnectTimeout: Request to https://api.us-east.open-edge.io:5000/v3/auth/tokens timed out"00:33
clarkbthe router died so we may need to disable it more00:34
clarkbI think that is why cloud launcher is failing too but I haven't had a chance to check00:34
fungiservice-announce e-mail sent, will work on the status notice next00:41
clarkbI see it00:43
ianwwhat was the tl;dr on statusbot?  i just uploaded a project-config change that hasn't been announced00:45
clarkbianw: you mean gerritbot?00:45
ianwsorry yeah00:45
clarkbianw: gerritlib is reading empty messages for some reason, it then fails to convert them to json and breaks. Fungi has been digging in with some modified code on the server and extra logging00:45
clarkbthe zuul scheduler saw similar but only when we restarted gerrit00:46
clarkbthat makes us think that maybe gerritbot is losing connectivity for some reason and reads are short or bad00:46
ianwyeah the last entry is @ Nov 23 00:17:20 eavesdrop01 docker-gerritbot[1386]: 2020-11-23 00:17:20,239 DEBUG paramiko.transport: EOF in transport thread00:46
clarkbone idea I threw out was maybe it's a paramiko version thing00:46
clarkband comparing zuul and gerritbot might be worthwhile for cryptography and paramiko package differences?00:47
clarkb(since zuul appears to be fine)00:47
fungisounds like paramiko is getting disconnected and not handling it well?00:47
clarkb(note that zuul doesn't use gerritlib but does vendor in code that is very similar)00:47
ianw29472 root      20   0  346724  22192      0 S 99.3  2.2  25:13.27 gerritbot00:47
ianwit's gone a bit nuts and is taking up 100% cpu00:48
funginot surprising, before i added a check that readline() returned something truthy, it was going into fits of reading thousands of zero-length strings per second00:49
ianwpoll([{fd=6, events=POLLIN|POLLPRI|POLLOUT}], 1, -1) = 1 ([{fd=6, revents=POLLIN}])00:50
ianwumb-init 29459 root    0u   CHR    1,3      0t0         6 /dev/null00:50
fungibefore i began to hack on it, the process was crashing on a readline() which returned something the json module couldn't parse, in retrospect seems to have been a null string00:50
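Python file-like objects (paramiko's channel files included) return an empty string from `readline()` only at end-of-file. A genuinely blank line arrives as `"\n"`. So a stream of zero-length reads would be consistent with the remote side having closed the channel, and feeding `""` to the JSON parser fails. Below is a minimal sketch of a reader loop that stops on EOF instead of spinning. `iter_events` is a hypothetical name, not gerritlib's API.

```python
import io
import json
from typing import Iterator, TextIO

def iter_events(stream: TextIO) -> Iterator[dict]:
    """Yield one decoded JSON event per line until the stream reaches EOF.

    Hypothetical helper. readline() returns "" only at EOF, so "" ends the
    loop (the caller should reconnect) rather than being retried, which
    would otherwise busy-loop and pin a CPU.
    """
    while True:
        line = stream.readline()
        if line == "":
            return  # EOF: the connection is gone
        line = line.strip()
        if not line:
            continue  # tolerate genuinely empty lines
        yield json.loads(line)
```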
ianwoh, no i've read it wrong00:50
ianwparamiko           2.7.100:51
ianwupstream is at 2.7.2 ... so it's recent00:51
clarkb2.7.2 fixed some bugs though00:52
fungi#status notice Our Gerrit upgrade maintenance has concluded successfully; please see the maintenance wrap-up announcement for additional details: http://lists.opendev.org/pipermail/service-announce/2020-November/000014.html00:52
openstackstatusfungi: sending notice00:52
clarkbhttp://www.paramiko.org/changelog.html00:52
clarkband I bet zuul is on 2.7.2 because we rebuild that image often?00:52
-openstackstatus- NOTICE: Our Gerrit upgrade maintenance has concluded successfully; please see the maintenance wrap-up announcement for additional details: http://lists.opendev.org/pipermail/service-announce/2020-November/000014.html00:52
ianw2.7.1 -> 2.7.2 doesn't look like anything relevant; a string format fix and something about ssh rsa key loading00:53
clarkbmight also be easier to do a local test harness if it is reproducible00:53
clarkbbut then can narrow it down hopefully00:54
ianwi can try and get a backtrace00:54
clarkbthere were also some recent gerritlib updates around the gerritwatcher iirc00:55
clarkbI guess we can't rule out that they were fine with 2.13 but not 3.2?00:55
clarkbI dunno brain is tired00:55
openstackstatusfungi: finished sending notice00:55
clarkbcool ^ I think that is my cue to enjoy the evening00:56
fungiand with that, i need to knock off as well00:56
ianw++ thanks!00:56
ianwi'll see what i can come up with for gerritbot, and nodepool-builders are turned off and frankly a mess in ZK it seems00:56
ianwi'll try and get that going too00:56
clarkbthank you00:56
fungithis wouldn't be the first time we needed to manually delete null nodes out of zk00:57
ianwi feel like the missing openedge is a part of it; proposed https://review.opendev.org/c/openstack/project-config/+/763676 to remove uploads00:58
fungiianw: approved, but i expect that can leave cruft in zk for the uploaded image records01:01
ianwhrm, i guess the gerritbot container isn't privileged so i can't attach a debugger to it to get a bt01:02
fungifor what it's worth, i've been editing /var/lib/docker/aufs/diff/876e0b03c6ccb1863c0ef28fd27a9984845700722cd4feb1e2330cdf9f37a7d9/usr/local/lib/python3.7/site-packages/gerritlib/gerrit.py to add debugging and then downing and upping the container01:03
fungii also added eavesdrop to the emergency disable list so ansible wouldn't undo the debug level i set on handler_console in /etc/gerritbot/logging.conf01:04
fungithis is probably all proof that i'm bad at containering, apologies in advance01:05
ianw... i could chroot in there, then run gdb on the process ...01:05
fungiand remember that we're using released versions of gerritbot in that container, there are some refactors in mastr01:06
fungier, in master01:06
fungier, and gerritlib01:06
fungireleased versions of gerritLIB01:06
fungii need sleep01:06
ianwalright, attaching gdb is a futile exercise because it uses the /usr/local/bin/python , which has no symbols01:21
ianwdespite chrooting into the container working (i think the "real" way to do this is to start a separate container in the same namespace)01:22
*** hamalq has joined #opendev01:32
*** hamalq has quit IRC01:37
ianwfungi: in a root screen on eavesdrop i have gerritbot now running under gdb, which we can break into to get a python backtrace with py-bt01:58
ianwbasically it's running a container; i've installed the debian versions of python (so we have symbols), added site-packages to the path (so the deb python3.7 finds the pip installed libraries) and run it manually under gdb01:59
ianwso if it goes bananas again, hopefully we can ctrl-c, "py-bt" and have a pretty good idea of what's going on01:59
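Python's standard `faulthandler` module is an alternative to gdb plus `py-bt`, and a different technique from the one used here. It can dump every thread's Python stack on a signal without debug symbols or ptrace permissions. Below is a minimal sketch. It assumes you can add the call to the process's startup code before the problem reproduces, and `install_stack_dumper` is a hypothetical helper.

```python
import faulthandler
import signal
import sys

def install_stack_dumper(sig: int = signal.SIGUSR1, out=sys.stderr) -> None:
    """Dump every thread's Python stack to `out` whenever `sig` arrives.

    Hypothetical helper; install it at startup, before the hang. `out` must
    be a real file with a fileno(), because faulthandler writes to the file
    descriptor directly from the signal handler.
    """
    faulthandler.register(sig, file=out, all_threads=True)
```

With that in place, `kill -USR1 <pid>` should print the stacks to stderr. For a containerized process, that output would typically end up in the container's logs.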
ianwok, with that in monitoring phase, back to the nodepool builders.  i'm going to emergency them so the containers stop for good02:06
mordredianw: you should be able to docker exec into the running container instead of needing to chroot in to it. if you docker exec -it $containerid bash - you'll get a shell in the existing container context02:17
mordredjust - fwiw02:17
mordredfungi: ^^02:17
ianwmordred: yeah, that was a total hack, because that container wasn't started with the permissions gdb needs to attach from inside it. so i chrooted into the container from outside, as root, to run the container's gdb in a context where it could ptrace02:18
ianwinto the container fs i mean02:18
mordredhah - awesome02:18
ianwbut ... that didn't work anyway because it's running under the python-slim /usr/local/bin/python, which has no symbols02:19
ianwi've dropped /root/ianw-notes.txt on this, or screen number 2 in the screen session02:19
fungithanks02:26
ianwok, i'm back to trying to figure out how to cleanup nodepool02:44
ianwimage-list shows all these failed uploads02:45
ianwi feel like if i just delete all the debian-stretch builds, it should start fresh02:45
*** ykarel has joined #opendev03:03
ianwi'm starting nb01 and seeing if it builds centos-7, which is very old03:04
ianwok, we need a dib release with Ib292b0b2b31bd966e0c5e8f2b2ce560bba89c45c for centos703:29
*** hamalq has joined #opendev03:33
*** hamalq has quit IRC03:37
ianwfungi: ok, looks like it died03:39
ianw2020-11-23 02:57:16,501 DEBUG paramiko.transport: EOF in transport thread03:40
ianw[Thread 0x7fffeffff700 (LWP 750) exited]03:40
ianwso, straight after that paramiko.transport debug message the thread exits, and that's when it goes haywire03:52
ianwi've restarted it with a breakpoint on pthread_exit for the ssh thread ... see if we can get something interesting there03:52
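In the log, the SSH thread exits right after `EOF in transport thread`, and nothing restarts it. Below is a minimal sketch of a supervisor that logs how a watcher loop ended and then restarts it. `supervise` and `start_supervised` are hypothetical helpers, and `target` stands in for whatever function opens the event stream and consumes it. This is not gerritlib's actual API.

```python
import logging
import threading
import time
from typing import Callable

log = logging.getLogger("watcher")

def supervise(target: Callable[[], None], max_restarts: int, delay: float = 0.0) -> int:
    """Run `target`, log why it stopped, and restart it; return the run count.

    Hypothetical helper. A thread that simply returns (e.g. after EOF on its
    SSH channel) is otherwise silent, so clean exits are logged too.
    """
    runs = 0
    while runs <= max_restarts:
        runs += 1
        try:
            target()
            log.warning("watcher returned (run %d); reconnecting", runs)
        except Exception:
            log.exception("watcher crashed (run %d); reconnecting", runs)
        time.sleep(delay)
    return runs

def start_supervised(target: Callable[[], None], max_restarts: int = 3) -> threading.Thread:
    """Run supervise() in a daemon thread, as a long-lived watcher would."""
    t = threading.Thread(target=supervise, args=(target, max_restarts), daemon=True)
    t.start()
    return t
```

A real service would probably restart forever with a back-off. The restart cap here just keeps the sketch finite.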
*** ykarel has quit IRC04:09
*** raukadah is now known as chandankumar04:46
*** ykarel has joined #opendev05:15
*** openstackgerrit has joined #opendev05:15
openstackgerritMerged openstack/diskimage-builder master: Fix dynamic-login with grub2  https://review.opendev.org/c/openstack/diskimage-builder/+/76356605:15
openstackgerritMerged openstack/diskimage-builder master: Fix python-stow-versions  https://review.opendev.org/c/openstack/diskimage-builder/+/75161005:16
*** sgw has joined #opendev05:27
openstackgerritMerged opendev/system-config master: codesearch: Add robots.txt  https://review.opendev.org/c/opendev/system-config/+/76349905:41
*** ysandeep|off is now known as ysandeep05:54
*** danpawlik has quit IRC06:24
*** danpawlik has joined #opendev06:24
openstackgerritMerged opendev/system-config master: Clean up cron tab entry from ansible once removed from host  https://review.opendev.org/c/opendev/system-config/+/75859906:42
*** whoami-rajat__ has joined #opendev06:46
*** sboyron has joined #opendev06:52
ianwinfra-root: builder status update -- I have manually cleared out all the failed things from zookeeper that i think started with the disappearance of openedge and nodepool was not dealing with07:05
ianwi have removed all orphaned images from /opt on nb01 and it is slowly building07:05
ianwcentos-7 is currently failing to build and will need a dib release + nodepool-image update to work.  i have been through dib reviews and am merging a few outstanding things before a release07:06
ianwcentos-7 is paused in nb0107:06
*** iurygregory has joined #opendev07:06
ianwboth nb01 and nb02 are in emergency, and only builder is running on nb0107:06
ianwi don't know what's special about rax, but they have a bunch of images in the alien list.  i'll clean those up tomorrow07:07
*** lpetrut has joined #opendev07:07
ianwcentos-8-stream also failed, and i need to look into that07:10
openstackgerritDong Ma proposed openstack/project-config master: Remove ceilometer-zvm entry  https://review.opendev.org/c/openstack/project-config/+/76374207:12
*** DSpider has joined #opendev07:17
*** rpittau|afk is now known as rpittau07:29
fricklerzigo: do you know how to write a watch file that doesn't prefer pre-releases for pypi projects? see https://pypi.debian.net/git-review if I run uscan for that, I get 1.28.0.0a1 instead of 1.28.007:36
frickler(context: we need 1.27.0 or newer in order to work properly with our updated gerrit)07:36
*** ralonsoh has joined #opendev07:39
*** eolivare has joined #opendev07:53
ianwfrickler: hey, just wanted to call out prior conversation that gerritbot is currently running "manually" in a screen session on eavesdrop07:57
ianwit's running under gdb; if it stops, it would be great to get a py-bt ... it should *hopefully* catch the ssh communication thread exiting and hopefully give us a clue as to why it does that07:58
ianwthere's a few notes in /root/ianw-notes.txt07:58
ianwjust in case as people wake up and start complaining :)07:59
*** slaweq has joined #opendev08:01
cgoncalveshey folks! it looks like the Gerrit update went smooth for the most part? well done, team!08:13
cgoncalvesI wonder if anyone else is also receiving email notifications from Gerrit for all events (commentary and new PS)08:14
cgoncalvesI have 276+ emails in my inbox for projects I didn't even know existed08:15
*** andrewbonney has joined #opendev08:23
*** tosky has joined #opendev08:38
fricklercgoncalves: can you check your settings at https://review.opendev.org/settings/#Notifications ? gerrit should only mail notifications if a project is listed there or if you are included as reviewer for a patch08:55
cgoncalvesfrickler, notification settings look good and have not changed prior to gerrit update08:56
*** mgoddard has joined #opendev08:58
fricklercgoncalves: hmm, o.k., do you have a sample patch for which you received a mail but shouldn't have?08:58
cgoncalvesfrickler, project cyborg: https://i.snipboard.io/guF4wS.jpg08:59
zigofrickler: Yes, I do, I have the same watch file for all projects, and it works well.08:59
cgoncalvesI have never contributed/commented on that project08:59
zigofrickler: Example for Nova:09:00
zigo$ cat debian/watch09:00
zigoversion=309:00
zigoopts="uversionmangle=s/\.0rc/~rc/;s/\.0b1/~b1/;s/\.0b2/~b2/;s/\.0b3/~b3/" \09:00
zigohttps://github.com/openstack/nova/tags .*/(\d[brc\d\.]+)\.tar\.gz09:00
zigoThis doesn't take into account the "a" thing, though that's just adding one more char in the mix.09:00
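uscan picks 1.28.0.0a1 over 1.28.0 because of Debian's version ordering. The extra `.0a1` makes the string compare as newer, while a `~` sorts before everything, including the end of the string. Below is a minimal Python port of the upstream-version part of dpkg's comparison algorithm, with epochs and Debian revisions omitted. It illustrates why the `uversionmangle` rules rewrite pre-release markers to tildes. `compare_upstream` is a hypothetical name, and `dpkg --compare-versions` remains the authoritative check.

```python
def _order(c: str) -> int:
    """dpkg's per-character weight for the non-digit parts of a version."""
    if c == "" or c.isdigit():
        return 0
    if c == "~":
        return -1  # tilde sorts before everything, even end-of-string
    if c.isalpha():
        return ord(c)
    return ord(c) + 256  # other punctuation sorts after letters

def compare_upstream(a: str, b: str) -> int:
    """Return <0, 0 or >0, like dpkg's verrevcmp(), for two upstream versions."""
    def at(s: str, k: int) -> str:
        return s[k] if k < len(s) else ""
    i = j = 0
    while i < len(a) or j < len(b):
        # Compare the leading non-digit run character by character.
        while (at(a, i) and not at(a, i).isdigit()) or (at(b, j) and not at(b, j).isdigit()):
            ac, bc = _order(at(a, i)), _order(at(b, j))
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Compare the following digit run numerically (skip leading zeros).
        while at(a, i) == "0":
            i += 1
        while at(b, j) == "0":
            j += 1
        first_diff = 0
        while at(a, i).isdigit() and at(b, j).isdigit():
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if at(a, i).isdigit():
            return 1  # a has the longer number
        if at(b, j).isdigit():
            return -1
        if first_diff:
            return first_diff
    return 0
```

For example, `compare_upstream("1.28.0.0a1", "1.28.0")` is positive, but `compare_upstream("1.28.0~a1", "1.28.0")` is negative.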
zigofrickler: I packaged git-review 1.28.0 to Testing/Unstable, should I backport it to buster-backports official ?09:01
zigofrickler: What I don't know, is how to fetch a git tag list with opendev.org ...09:04
zigoCan you help me with that ?09:04
fricklerzigo: not sure I understand that question, you want to see a tag list without cloning the repo?09:06
zigofrickler: Yeah, so that I can point my watch files to it.09:07
*** fressi has joined #opendev09:08
fricklerzigo: hmm, o.k. I see the github equivalent in your sample above. that will likely need a feature request for gitea. there is a request in the api but I doubt that that can be used in the watch file09:11
fricklercurl -X GET "https://opendev.org/api/v1/repos/opendev/git-review/tags" -H  "accept: application/json"09:11
zigoAh, that helps, will try it, thanks.09:12
ttxI noticed some extreme Gerrit dashboards failing with "Error 400 (Bad Request): limit of 10 queries"09:15
ttxnot a big deal, but thought i would report09:16
ttxExample: https://tiny.cc/ReleaseInbox09:16
fricklerttx: hmm, we'll have to see how we can fix that, I guess. could you please add the issue to https://etherpad.opendev.org/p/gerrit-3.2-post-upgrade-notes ?09:17
ttxsure thing! There might be a tunable there09:18
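One possible workaround, pending a server-side setting, is to split an oversized custom dashboard into several smaller ones. Below is a rough sketch. It assumes Gerrit's custom dashboard URL form (`/dashboard/?title=...&foreach=...&Section+Name=query...`) and assumes that the limit counts one query per section, as the error message suggests. `split_dashboard` is a hypothetical helper.

```python
from typing import List
from urllib.parse import parse_qsl, urlencode, urlsplit

# Parameters that configure the dashboard itself rather than add a section.
RESERVED = {"title", "foreach"}

def split_dashboard(url: str, max_sections: int = 10) -> List[str]:
    """Split a custom dashboard URL into URLs with at most `max_sections` sections.

    Hypothetical helper: each non-reserved query parameter is treated as
    one section (one query); title/foreach are copied into every URL.
    """
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    head = [(k, v) for k, v in params if k in RESERVED]
    sections = [(k, v) for k, v in params if k not in RESERVED]
    urls = []
    for start in range(0, len(sections), max_sections):
        chunk = sections[start:start + max_sections]
        urls.append(f"{parts.scheme}://{parts.netloc}{parts.path}?{urlencode(head + chunk)}")
    return urls
```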
*** sboyron has quit IRC09:18
*** sboyron has joined #opendev09:19
zigofrickler: Any idea why this doesn't work?09:24
zigocurl https://opendev.org/api/v1/repos/opendev/git-review/tags | grep --color -E '(\d[abrc\d\.]+)\.tar\.gz'09:24
*** hamalq has joined #opendev09:36
fricklerzigo: seems grep uses [[:digit:]] instead of \d , or just use '([0-9][abcr0-9.]+)\.tar\.gz'09:36
fricklerat least that works for me with gnu grep 3.109:37
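The same filtering can be done on the Gitea API response directly. Below is a minimal sketch. It assumes the tags endpoint above returns a JSON list of objects with a `name` key, and that the endpoint may be paginated, so a real client would also pass `page`/`limit`. `version_tags` is a hypothetical helper. It uses `[0-9]`, which works the same in grep, Perl and Python.

```python
import json
import re
from typing import List

# [0-9] rather than \d: POSIX ERE (grep -E) has no \d.
TAG_RE = re.compile(r"[0-9][abrc0-9.]+")

def version_tags(api_json: str) -> List[str]:
    """Return tag names that look like versions from a Gitea tags response.

    Hypothetical helper; assumes a JSON list of objects with a "name" key,
    as from GET /api/v1/repos/{owner}/{repo}/tags.
    """
    names = [tag["name"] for tag in json.loads(api_json)]
    return [n for n in names if TAG_RE.fullmatch(n)]
```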
fricklercgoncalves: fwiw I see the mails to you in exim log but I didn't find a gerrit log showing why these are being sent. maybe someone with more clue can have a look later09:39
*** hamalq has quit IRC09:40
zigofrickler: Works, but I still can't figure out how to write the debian/watch file ... :/09:41
cgoncalvesfrickler, thanks for looking. please let me know if I can provide any additional info09:44
*** dtantsur|afk is now known as dtantsur09:50
zigofrickler: I got it to work with mode=git ! :)09:51
zigofrickler: $ cat debian/watch09:53
zigoversion=309:53
zigoopts="mode=git ,uversionmangle=s/\.0rc/~rc/;s/\.0a/~a/;s/\.0b1/~b1/;s/\.0b2/~b2/;s/\.0b3/~b3/" \09:53
zigohttps://opendev.org/opendev/git-review refs/tags/(\d[abrc\d\.]+)09:53
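With `mode=git`, uscan matches against the repository's refs rather than an HTML page, which is why the pattern needs the `refs/tags/` prefix. Below is a rough sketch of the same matching in Python. It assumes the `<sha>\t<ref>` line format of `git ls-remote --tags`, and the exact command uscan runs internally is not shown here. `versions_from_ls_remote` is a hypothetical helper. It reuses zigo's pattern, but with `fullmatch()` so that peeled `^{}` entries are skipped.

```python
import re
from typing import List

# zigo's tag pattern; fullmatch() below rejects peeled "...^{}" refs.
TAG_REF = re.compile(r"refs/tags/(\d[abrc\d\.]+)")

def versions_from_ls_remote(output: str) -> List[str]:
    """Extract version tags from `git ls-remote --tags` output.

    Hypothetical helper. Each line is "<sha>\\t<ref>"; annotated tags also
    appear a second time as "<ref>^{}" (the peeled commit).
    """
    versions = []
    for line in output.splitlines():
        _sha, _, ref = line.partition("\t")
        m = TAG_REF.fullmatch(ref)
        if m:
            versions.append(m.group(1))
    return versions
```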
zigoStill, my original question remains: should I upload git-review to official Debian backports?09:56
dtantsurhey folks! great job with the gerrit update. notification emails about zuul comments no longer have clickable links, is it expected?10:04
fricklerzigo: I don't know much about backports, if you are unsure best wait for feedback from fungi or some other infra-root.10:19
zigofrickler: Ok. Fungi will be of good advice, I'm sure. :)10:20
fricklerdtantsur: my emails were switched to html format, I needed to switch the config back to text-only, maybe it's the other way round for you?10:20
dtantsurneed to check, thanks for the hint10:20
frickleror maybe links in text format work, because the email client auto-linkifies them, but they aren't links in html-format?10:22
dtantsur"To view, visit change 762369. To unsubscribe, or for help writing mail filters, visit settings." <-- the links are correct here10:23
dtantsurbut not in the body. hmm.10:23
dtantsurthe body is an HTML without links, simply <li> tags10:23
fricklerdtantsur: so switching back to text-only might help. see https://review.opendev.org/settings/#Notifications Preferences/Email format10:24
zigoIs there currently a way to simply wget a patch matching a review? I used to click on the gitweb link which is currently broken (and then later on, on the patch link).10:25
dtantsurfrickler: I'll try, but I used to use the HTML view and it worked correctly10:25
fricklerdtantsur: in that case you may want to add it to the etherpad as regression10:26
dtantsurlink?10:26
zigoThe only way I found was this:10:26
zigocurl "https://review.opendev.org/changes/openstack%2Fnova~763750/revisions/1/patch?download" | base64 -d >1.patch10:26
zigoA little bit annoying, but works.10:26
dtantsurzigo: I usually do ^^^10:26
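zigo's curl command can also be done from Python. Below is a minimal sketch of that request. It assumes the Gerrit REST "get patch" endpoint returns the patch base64-encoded, as the `base64 -d` in that command implies. `patch_url` and `decode_patch` are hypothetical helpers. `"current"` is Gerrit's alias for the latest revision.

```python
import base64
from urllib.parse import quote

GERRIT = "https://review.opendev.org"

def patch_url(project: str, change: int, revision: str = "current") -> str:
    """Build the REST URL for a change's patch (project '/' must be %2F-encoded)."""
    change_id = f"{quote(project, safe='')}~{change}"
    return f"{GERRIT}/changes/{change_id}/revisions/{revision}/patch?download"

def decode_patch(body: bytes) -> str:
    """Decode the base64 response body into the plain-text patch."""
    return base64.b64decode(body).decode("utf-8")
```

Usage would look something like `decode_patch(urllib.request.urlopen(patch_url("openstack/nova", 763750, "1")).read())`.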
*** hashar has joined #opendev10:27
fricklerdtantsur: https://etherpad.opendev.org/p/gerrit-3.2-post-upgrade-notes10:29
dtantsuradded, thanks10:30
dtantsurI wonder if it's the same as line 3510:32
*** mordred has quit IRC10:34
*** Eighth_Doctor has quit IRC10:34
*** iurygregory has quit IRC10:39
*** Eighth_Doctor has joined #opendev10:42
fricklerdtantsur: iiuc that is only about the rendering in the UI, I don't think that that should be related to email formatting10:45
*** ralonsoh has quit IRC10:51
*** ralonsoh has joined #opendev10:53
*** mordred has joined #opendev10:59
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Add option to install kubernetes with kind  https://review.opendev.org/c/zuul/zuul-jobs/+/74093511:20
*** hamalq has joined #opendev11:36
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Add option to install kubernetes with kind  https://review.opendev.org/c/zuul/zuul-jobs/+/74093511:37
*** hamalq has quit IRC11:41
*** ysandeep is now known as ysandeep|brb11:48
*** iurygregory has joined #opendev11:58
openstackgerritAlbin Vass proposed zuul/zuul-jobs master: Add option to install kubernetes with kind  https://review.opendev.org/c/zuul/zuul-jobs/+/74093512:14
*** ysandeep|brb is now known as ysandeep12:19
*** sboyron has quit IRC12:28
*** DSpider has quit IRC12:32
*** sboyron has joined #opendev12:37
*** sboyron has quit IRC12:49
*** sboyron has joined #opendev12:51
*** sboyron has quit IRC12:56
*** sboyron has joined #opendev13:01
sean-k-mooneyis there a way to disable the draft comments section or move it in the dashboard13:04
sean-k-mooneyreading https://bugs.chromium.org/p/gerrit/issues/detail?id=9740 currently but no solution so far13:04
slaweqfrickler: tobiash and other infra cores, can You review patch https://review.opendev.org/c/zuul/zuul-jobs/+/762650 ? It's a workaround for the issue with communication between nodes in multinode ovn based jobs13:04
slaweqit works as You can see in https://review.opendev.org/c/openstack/neutron/+/762654 in neutron-ovn-tempest-slow result which finally passed13:05
sean-k-mooneywhen did we start using ovs for that again instead of just creating vxlan tunnels in the kernel13:06
sean-k-mooneyi thought we moved away from ovs because it conflicted with some other jobs in the past13:10
sean-k-mooneyi know it caused issues with our third party ci for ovs-dpdk at one point, again because it compiled ovs from source13:11
sean-k-mooneythat said we didn't actually need the tunnel for that third party ci so it mostly worked fine without the tunnel13:12
*** sboyron_ has joined #opendev13:13
*** sboyron has quit IRC13:15
openstackgerritSorin Sbârnea proposed opendev/git-review master: WIP: Allow user to enable wip by default  https://review.opendev.org/c/opendev/git-review/+/76378013:19
*** sboyron_ has quit IRC13:21
*** sboyron_ has joined #opendev13:21
*** sboyron__ has joined #opendev13:23
*** sboyron_ has quit IRC13:25
openstackgerritSorin Sbârnea proposed opendev/git-review master: WIP: Allow user to enable wip by default  https://review.opendev.org/c/opendev/git-review/+/76378013:26
openstackgerritSorin Sbârnea proposed opendev/git-review master: WIP: Allow user to enable wip by default  https://review.opendev.org/c/opendev/git-review/+/76378013:27
*** sboyron__ has quit IRC13:28
*** sboyron__ has joined #opendev13:28
*** sboyron__ has quit IRC13:29
*** sboyron__ has joined #opendev13:33
*** sboyron__ is now known as sboyron13:33
*** hamalq has joined #opendev13:37
*** hamalq has quit IRC13:42
*** auristor has quit IRC14:00
fungizigo: git-review 1.27 in buster should also be new enough to work with our gerrit, i think... is it not?14:00
zigofungi: Hi there!14:01
zigoWell, I've just heard that for the new gerrit of this week-end, one must use git-review >= 1.28 ...14:01
fungii thought we said >=1.27 but maybe we got it wrong14:01
zigofungi: Oh, you're right, I misread what was written in this channel this morning.14:02
zigoSo I won't do a backport then.14:02
fungiwell, it can't hurt to confirm 1.27 is working okay... i'll try to test that shortly14:02
zigofungi: FYI, I just succeeded in using mode=git in watch files, it worked for git-review for me ! :)14:02
fungioh excellent14:02
zigoThe part that I was missing was the need to add "refs/tags/" before the version definition.14:03
zigoSo, something like this works:14:03
zigoversion=314:03
zigoopts="mode=git, uversionmangle=s/\.0rc/~rc/;s/\.0a/~a/;s/\.0b1/~b1/;s/\.0b2/~b2/;s/\.0b3/~b3/" \14:03
zigohttps://opendev.org/opendev/git-review refs/tags/(\d[abrc\d\.]+)14:03
zigoI'm not sure how many packages I have to update with that fix, though; I've lost track of that... :(14:04
*** auristor has joined #opendev14:11
fungiany reason you don't just do s/\.0b/~b/ like you do with .0a?14:17
fungialso be aware those are python-specific (pep 440) versioning conventions, so you wouldn't probably apply them to non-python projects14:17
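fungi's simplification can be checked by applying both rule sets the way Perl's `s///` (without `/g`) does, replacing only the first match of each rule, in order. Below is a small sketch. `mangle` and the two rule lists are illustrative names, and per the caveat above, these rules only make sense for PEP 440-style Python versions.

```python
import re
from typing import List, Tuple

Rules = List[Tuple[str, str]]

# zigo's uversionmangle rules, in order.
ZIGO_RULES: Rules = [
    (r"\.0rc", "~rc"), (r"\.0a", "~a"),
    (r"\.0b1", "~b1"), (r"\.0b2", "~b2"), (r"\.0b3", "~b3"),
]
# fungi's suggestion: a single rule covering every beta number.
SIMPLIFIED_RULES: Rules = [(r"\.0rc", "~rc"), (r"\.0a", "~a"), (r"\.0b", "~b")]

def mangle(version: str, rules: Rules) -> str:
    """Apply s/pattern/repl/ rules in order, first match only (no /g)."""
    for pattern, repl in rules:
        version = re.sub(pattern, repl, version, count=1)
    return version
```

Both rule sets agree on b1 to b3, but only the simplified one also handles a hypothetical `1.0.0.0b4`.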
*** sboyron has quit IRC14:28
*** sboyron has joined #opendev14:29
*** rpittau is now known as rpittau|brb14:36
tristanCwe are interested in porting the hideci.js script for the new gerrit ui, has this work started already?14:51
*** ykarel has quit IRC14:52
clarkbtristanC: not that I am aware of. One thought was to see about integrating it into the existing zuul plugin15:01
*** rpittau|brb is now known as rpittau15:01
clarkbalso note that our workaround for x/foo repos may currently prevent polygerrit plugins from working15:01
clarkbwe're trying to work with upstream to figure out how to address this properly15:02
tristanCclarkb: where is this work with upstream happening?15:02
clarkbtristanC: for the zuul plugin or the /x/ conflict?15:03
tristanCfor the zuul plugin15:04
clarkbtristanC: https://gerrit-review.googlesource.com/q/project:plugins/zuul we don't use the plugin (yet) but wikimedia does15:05
tristanCsoftware-factory is also upgrading to gerrit-3.x and we are looking for fixing the remaining issues. the zuul ci report table seems to be biggest one15:06
clarkbonce the upgrade itself settles a bit more we'd like to start looking at using more plugins like the zuul one (basically we aren't using it yet because it simplified upgrading and all that but now that we only have to build ~1 image and have ~1 gerrit to test it's much easier to start looking at that stuff)15:07
tristanCthank you for the pointer, so the plugins/zuul seems to be written in a mix of java and polymer javascript templates. I'm not sure we'll be able to contribute to that15:10
clarkbtristanC: what is the issue with that?15:11
clarkbbut ya aiui all gerrit plugins are either going to be java or polygerrit js or both15:11
tristanCclarkb: well we'll have to learn the language. we were more looking at re-using the existing simpler hideci.js and adding it to the review page15:13
clarkbtristanC: you've got us all looking at different languages over in #zuul :P15:13
tristanCclarkb: which were all functional... well if you think that using the plugins/zuul is the way to go, we can have a look, but that seems like a lot more work than adapting the hideci script15:16
clarkbtristanC: yes, we think it is better because then we can adhere to proper apis and not be broken every time gerrit updates15:16
*** DSpider has joined #opendev15:16
clarkbhideci was always a hack, there is a non hacky way to do it now and we think that is preferable15:18
clarkbtristanC: re functional, yes but zuul and its associated code are not written functionally or in functional languages. I guess my point was more that it's ok to learn new languages if they provide a benefit15:18
clarkbin your case it provided benefits to zuul's k8s operator config file management. In this case it provides a benefit to your use of gerrit15:18
clarkbTheJulia: I know you're running the ironic meeting right now, but when that is done I would be curious to hear more about the graphical issues you have had. I have yet to encounter anything like that in testing or post upgrade15:20
clarkbI wonder if it could be related to browser hardware acceleration15:20
clarkbfungi ianw: it seems that gerritbot is still running. So maybe something with how the container is set up? I haven't looked at ianw's new notes on that yet though, and maybe they have more info I should catch up on15:22
fungiclarkb: i think the watcher thread dies and then gerritbot stops getting new events, but the gerritbot process itself lives on doing nothing15:25
clarkbfungi: ya I mean whatever ianw has done has kept it reporting15:25
fungioh, i see. cool!15:25
clarkbat least as recently as 15:08UTC in the ironic channel15:25
fungii have yet to revisit that problem15:25
openstackgerritMerged openstack/diskimage-builder master: Add support for vlan interfaces in dhcp-all-interfaces.sh  https://review.opendev.org/c/openstack/diskimage-builder/+/76117715:27
clarkbas recently as 15:27 now :)15:27
fungiheh15:27
*** elod has quit IRC15:27
*** lpetrut has quit IRC15:28
*** elod has joined #opendev15:28
tristanCclarkb: i worry that learning all that gerrit plugin development stack is going to take a lot more time than simply porting the previous hack. Perhaps we could start by restoring the missing feature using the hideci script, and then replace it by the plugin when it's ready/installed?15:30
openstackgerritJeremy Stanley proposed openstack/project-config master: Run tag-releases on ubuntu-focal  https://review.opendev.org/c/openstack/project-config/+/76379715:30
fungitristanC: how much sunk cost is that every time some new change to polygerrit causes hideci.js to break and needs forward-porting?15:31
tristanCfungi: i think in a day or two we can get it working15:31
fungieach time it breaks?15:31
tristanCperhaps it can be improved to break less often too15:32
tristanClooking at gerrit dev-plugin documentation, it seems like it supports `Web UI plugins distributed as a single .js file`15:33
clarkbtristanC: yes I think those are the "polygerrit" plugins15:33
clarkbthose are the type we expect won't work in our current deployment (though we intend on fixing that)15:34
fungispecifically because of our workaround to be able to continue to clone x/.* repositories from gerrit15:34
fungihttps://bugs.chromium.org/p/gerrit/issues/detail?id=1372115:34
clarkbfungi: after meetings I was going to try and summarize the ml thread thoughts on ^ and do my best to get people to shift conversation there15:35
*** fbo has joined #opendev15:37
*** hamalq has joined #opendev15:38
tristanCfungi: it's also that the zuul plugin seems more complex than just displaying the ci result in a table, e.g. https://gerrit-review.googlesource.com/c/plugins/zuul/+/275024/115:39
tristanCfungi: so i agree hideci sounds like a sunk cost, but i think it's worth a try to at least restore the feature with 3.215:40
openstackgerritMerged openstack/diskimage-builder master: simple-init: also remove en* interfaces from the images  https://review.opendev.org/c/openstack/diskimage-builder/+/76366015:42
*** fressi has quit IRC15:42
*** hamalq has quit IRC15:42
*** fressi has joined #opendev15:43
*** chandankumar is now known as raukadah15:49
clarkbI've annotated thoughts on the post upgrade notes etherpad to try and call out what I suspect are sources of problems or where upstream bugs should be filed15:52
*** ysandeep is now known as ysandeep|brb15:52
*** elod has quit IRC15:54
*** elod has joined #opendev15:54
fungithanks!15:57
*** mlavalle has joined #opendev16:03
openstackgerritMerged openstack/project-config master: Run tag-releases on ubuntu-focal  https://review.opendev.org/c/openstack/project-config/+/76379716:08
openstackgerritSorin Sbârnea proposed opendev/git-review master: Make py36 minimum version required  https://review.opendev.org/c/opendev/git-review/+/76380316:11
clarkbzigo: ^ that will break xenial users. I don't think we should do that16:12
*** mgoddard has quit IRC16:12
clarkbsorry zigo that was for zbr ^16:12
*** xavpaice has quit IRC16:12
*** mgoddard has joined #opendev16:15
zbrclarkb: do we have a support contract that binds us to support it? i am sure we cannot run out of distros using unsupported python versions.16:16
clarkbzbr: we use xenial still16:16
clarkbwe maintain the software16:16
clarkbtherefore I'm -2 on that16:16
zbrclarkb: dropping it does not render existing version unusable16:17
clarkbzbr: no but if say we upgrade and things change again it could16:17
clarkband really keeping 3.5 support isn't that big of a burden16:17
zbrpy35 cannot have inline type hints16:17
clarkbya I'm fine with that16:17
zbrfor py35 user must create pyi files which are a pita16:18
clarkbthe benefit to making those changes in a simple tool like this doesn't outweigh supporting existing users16:18
clarkbwhich includes ourselves16:18
zbrclarkb: ok.... not pleased but if we are using it....16:19
clarkbzbr: also I think seeing the trouble 1.26 has caused on even newer distros points to us needing to be cautious with that utility more generally16:19
clarkbits small, we can manage to keep old python support for a little while longer16:19
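Some context on the type-hint point: Python 3.5 does accept inline function annotations (PEP 484). What it lacks is the variable-annotation syntax from PEP 526, added in 3.6. Type checkers such as mypy accept `# type:` comments in its place, so separate `.pyi` stubs are not strictly required. Below is a small illustration with hypothetical functions, not git-review code, that stays within 3.5 syntax.

```python
from typing import Dict, List, Optional

# Function annotations are valid syntax on Python 3.5.
def parse_reviewers(raw: str) -> List[str]:
    # `names: List[str] = []` would need 3.6 (PEP 526);
    # a type comment carries the same information on 3.5.
    names = []  # type: List[str]
    for item in raw.split(","):
        item = item.strip()
        if item:
            names.append(item)
    return names

def lookup(cache: Dict[str, str], key: str) -> Optional[str]:
    return cache.get(key)
```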
zbrit makes me a bit curious because git-review is a developer tool, which raises the question of which developer runs xenial on their machine in 2020.16:20
fungiit also gets run in automation16:20
zbrclarkb: have you read the news? ansible 3.0 (previously known as 2.11) will require python 3.8 on controller.16:22
zbrhttps://www.reddit.com/r/ansible/comments/jwzwwf/ansible300_schedule_and_preview_of_400_schedule/16:22
clarkbok?16:22
clarkbI hadn't but I guess we just won't upgrade for a while then16:22
zbrthat is what i said too, it will make upgrading ansible even harder.16:23
zbrmaybe for git-review it is not a big deal as it has very few deps but for other projects it is really a PITA to support py35 because almost all library vendors already dropped support for it.16:25
zbrif something goes bad in requests, i doubt they will want to make a new release of it.16:26
clarkbwell pypi also does version specific installs now16:27
clarkber pypi + pip16:27
clarkbso it shouldn't be a major problem as long as those deps have annotated themselves properly16:27
fungiexcept insofar as most of those dependencies probably aren't backporting security fixes to patch releases and just roll forward, leaving tools which need older python on older vulnerable versions of those deps16:28
openstackgerritSorin Sbârnea proposed opendev/git-review master: Make py35 minimum version required  https://review.opendev.org/c/opendev/git-review/+/76380316:29
zbri think we should update the bot to skip notifications for WIP changes, what do you think?16:30
*** ysandeep|brb is now known as ysandeep16:30
clarkbzbr: we need to fix the bot's ssh streaming first16:31
fungithat seems like it could be a good improvement, though i don't think it's urgent since we didn't have a wip feature before now and not many people are likely to start using it right away16:31
fungiand yeah, bug fixes/regressions related to the upgrade take priority16:32
clarkbwhatever ianw has done seems to make it happy16:32
clarkbso we probably just need to work backward from there16:32
fungiyeah, i'm going to not touch it for now as long as it continues running, and sync up with him once he's awake16:32
fungiwe have plenty of other fish to fry in the meantime16:32
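On zbr's WIP suggestion, here is a sketch of the filtering step for a hypothetical bot hook. It assumes Gerrit's stream-events JSON marks work-in-progress changes with a `wip` boolean on the change attribute. Treat that field name as an assumption and check it against the Gerrit JSON docs for the running version. `should_announce` and `parse_event` are made-up names, not gerritbot's API:

```python
import json
from typing import Any, Dict


def parse_event(line: str) -> Dict[str, Any]:
    """Decode one line of `gerrit stream-events` output."""
    return json.loads(line)


def should_announce(event: Dict[str, Any]) -> bool:
    """Return False for events about work-in-progress changes."""
    change = event.get("change") or {}
    # Assumption: an absent or false "wip" means the change is ready for review.
    return not change.get("wip", False)
```

Events without a change attribute at all (e.g. ref updates) pass through unchanged.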
zbri also observed a lot of verbosity when uploading changes ("Processing changes: updated: " progress-like messages). I wonder if we can disable them, or filter them inside git-review.16:36
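One way git-review could filter those lines is sketched below. It assumes the push output is already available as text, and the regex matches the `remote: Processing changes:` prefix quoted above. `filter_push_output` is hypothetical:

```python
import re

# Gerrit redraws its progress line in place. str.splitlines() also treats a
# bare "\r" as a line break, so each redraw ends up on its own line here.
PROGRESS_LINE = re.compile(r"^remote:\s+Processing changes:")


def filter_push_output(output: str) -> str:
    """Drop Gerrit's progress lines from `git push` output, keep everything else."""
    kept = [line for line in output.splitlines() if not PROGRESS_LINE.match(line)]
    return "\n".join(kept)
```

The change URLs Gerrit prints after the progress lines are kept, so the useful part of the output survives.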
*** lpetrut has joined #opendev16:38
clarkbas a heads up TheJulia seems to have worked around the new ui graphical issues by disabling hardware acceleration in the browser16:40
clarkbif anyone sees that this appears to be the workaround16:40
fungiwas it chromium? apparently chromium 86 also crashes if you try to use webrtc, cause suspected to be related to hardware accel16:42
clarkbchrome16:42
fungisupposedly 87 fixes it16:42
clarkbthough not sure if that is just short for chromium in this context16:42
fungiahh, maybe not the same problem then16:42
zbrfungi: maybe you can help me with clarification about how to add a generic tox-py39 job. https://review.opendev.org/c/zuul/zuul-jobs/+/76219216:51
zbri was asked to remove the ubuntu-focal nodeset, which obviously produced a failure to find py39 as it is not available on bionic (current default nodeset)16:52
fungizbr: in a meeting right now but i can look for an example after i get the release jobs fix tested16:54
zbrsure, no pressure. thanks16:54
cgoncalvesfungi, clarkb: earlier today I reported here on the channel an issue I'm having post-gerrit upgrade. Gerrit is sending me email notifications for all sorts of events (e.g. new PS, comments, votes) even for projects I didn't know existed. email count is up +580 since Gerrit was upgraded16:55
clarkbcgoncalves: ya I updated the etherpad with notes on that one16:56
clarkbthere is a flag that I suspect is related that we can turn off in the gerrit config16:56
clarkbcgoncalves: are you a reviewer on those changes? I think that would help confirm it16:56
clarkbthe ui should show you who it thinks all the reviewers are (people may have added you mistakenly or something)16:57
cgoncalvesclarkb, I am not a reviewer16:58
cgoncalvesfor example, I received two email notification for https://review.opendev.org/c/openstack/python-tripleoclient/+/75783616:58
cgoncalvesI don't follow that project, nor am I a reviewer16:58
clarkbok I just spot checked my gerrit folder and don't see similar.16:59
fungiyeah, it's not doing that to me17:00
clarkbI'm thinking we check your user's project watches and external ids directly17:00
clarkbcgoncalves: if you go to https://review.opendev.org/settings/ what does it say your ID is?17:00
cgoncalvesclarkb, 646917:00
cgoncalvesclarkb, also https://i.snipboard.io/SotlJ3.jpg17:01
*** hamalq has joined #opendev17:01
clarkbit's because you're in all projects and all users I bet17:01
clarkbcgoncalves: any idea why you're in those?17:01
clarkbcgoncalves: can you remove the subscription you have for those two repos and see if the problem goes away?17:02
cgoncalvesclarkb, that came as default. also note that it's only applicable when I am either the owner or reviewer17:02
*** rpittau is now known as rpittau|afk17:03
clarkbdefault as in added back in the 2.x version days, or did you add subscriptions on 3.2 and it added those?17:03
clarkbanyway I highly suspect it is those because projects inherit from them17:03
clarkbI would start by removing those watches and see if it changes anything17:03
cgoncalvesclarkb, default since I have my account, years ago. I haven't touched notification settings in a long time17:03
clarkbthanks, if that is the problem it's a good thing for us to know as we may need to manually clear those out for people if this fixes it17:03
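Below is a sketch of the audit clarkb mentions. It assumes the watch list has already been fetched and JSON-decoded from Gerrit's `GET /accounts/{account-id}/watched.projects` REST endpoint. That response is assumed to be a list of entries with a `project` name and `notify_*` booleans. Whether root-project watches really fan out to every inheriting repo is this conversation's working hypothesis, and the sketch does not prove it. `risky_watches` is a hypothetical name:

```python
from typing import Any, Dict, List

# Root projects that every other repository inherits from.
INHERITED_ROOTS = {"All-Projects", "All-Users"}


def risky_watches(watches: List[Dict[str, Any]]) -> List[str]:
    """Describe watches on inherited root projects that have notifications enabled."""
    found = []
    for watch in watches:
        if watch.get("project") not in INHERITED_ROOTS:
            continue
        enabled = sorted(k for k, v in watch.items() if k.startswith("notify_") and v)
        if enabled:
            found.append("{}: {}".format(watch["project"], ", ".join(enabled)))
    return found
```

An empty result means the account has no active root-level watches and would not be a candidate for cleanup.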
fungiunrelated, but looking at the release job failure i'm also wondering if we want to expand the list of gerrit host keys we add to known_hosts? https://zuul.opendev.org/t/openstack/build/fe46b5286a8145a89e06df95eee2ecf2/console#1/0/26/ubuntu-bionic17:04
fungistill trying to work out where that's getting passed in17:04
clarkbfungi: I think it may be part of the secret (but a plain text attribute)17:05
fungiyeah, i'm just not finding it in codesearch, but that helps me narrow down where to look at least17:05
openstackgerritSorin Sbârnea proposed openstack/project-config master: Update git-review test matrix  https://review.opendev.org/c/openstack/project-config/+/76380817:08
openstackgerritSorin Sbârnea proposed opendev/git-review master: Make py35 minimum version required  https://review.opendev.org/c/opendev/git-review/+/76380317:10
*** sboyron_ has joined #opendev17:13
openstackgerritSorin Sbârnea proposed openstack/project-config master: Update git-review test matrix  https://review.opendev.org/c/openstack/project-config/+/76380817:14
*** sboyron has quit IRC17:16
clarkbcgoncalves: ok I see you've removed those watches, please let us know if the behavior changes17:19
openstackgerritSorin Sbârnea proposed opendev/git-review master: Allow choosing which field to use as author when naming branch  https://review.opendev.org/c/opendev/git-review/+/44457417:20
cgoncalvesclarkb, no new emails since 18 minutes ago. I was getting at least one every ~2 minutes17:23
clarkbcgoncalves: cool. I think we'll work out how to audit others in that situation and then see if we can fix it for them17:23
cgoncalvesclarkb, thank you17:24
*** sboyron__ has joined #opendev17:25
*** sboyron_ has quit IRC17:29
fungiaha, https://opendev.org/openstack/project-config/src/branch/master/zuul.d/secrets.yaml#L525-L52617:37
fungithere's a similar entry for proposal_ssh_key in there too17:37
fungii'll double-check whether we should add entries to that17:38
fungiclarkb: after you did something similar in nodepool for the fips fix, what's your take on adding multiple keys?17:38
fungijust use ssh-keygen to get all the keys being served by the api and stick them all in the secret?17:39
clarkbfungi: ya gerrit publishes them iirc17:41
clarkbshould be reasonable to add them in17:41
*** hamalq has quit IRC17:43
*** hamalq has joined #opendev17:43
openstackgerritSorin Sbârnea proposed zuul/zuul-jobs master: Build sphinx with python3 instead  https://review.opendev.org/c/zuul/zuul-jobs/+/73592317:46
*** mgoddard has quit IRC17:54
fungii guess gerrit no longer lists its ssh host keys in the account settings view?17:57
*** weshay|ruck is now known as weshay|interview17:57
clarkbhuh seems to be the case17:58
*** ysandeep is now known as ysandeep|away17:59
fungihow did you go about retrieving all the host keys with nodepool?18:00
tristanCclarkb: fungi: digging into the gerrit zuul plugin, it doesn't seem to implement the build result table. And looking at https://gerrit-review.googlesource.com/Documentation/js-api.html it seems relatively simple to render the build result in a table under the commit message18:02
clarkbfungi: it's super hacky paramiko I wouldn't replicate it18:02
clarkbfungi: it starts an ssh connection and does handshaking but the client only advertises one valid hostkey type. Then if you handshake successfully you record that hostkey18:02
clarkbfungi: I think ssh-keyscan can do it?18:02
fungioh, yep, i was looking at the keygen manpage. d'oh!18:03
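Since `ssh-keyscan` is the tool fungi settles on, here is a sketch that builds the scan command for Gerrit's SSH port (29418, as seen elsewhere in this log). It then groups the output by key type so each key can be copied into the secret's known_hosts value. The helper names are hypothetical, and the key material in the test is placeholder text:

```python
import subprocess
from typing import Dict, List


def keyscan_command(host: str, port: int = 29418) -> List[str]:
    """ssh-keyscan invocation requesting every common host key type."""
    return ["ssh-keyscan", "-p", str(port), "-t", "rsa,ecdsa,ed25519", host]


def parse_keyscan(output: str) -> Dict[str, str]:
    """Map key type -> known_hosts line, skipping '#' banner lines."""
    entries = {}  # type: Dict[str, str]
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        host, keytype, key = line.split(None, 2)
        entries[keytype] = " ".join((host, keytype, key))
    return entries


def scan(host: str, port: int = 29418) -> Dict[str, str]:
    """Run ssh-keyscan (needs network access and OpenSSH installed)."""
    result = subprocess.run(keyscan_command(host, port), stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return parse_keyscan(result.stdout)
```

With a non-default port, ssh-keyscan writes the host as `[host]:port`, which is the form known_hosts expects for that port.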
tristanCi actually started to test the api, and the "showChange" callback provides an object with all the info we need, which seems more stable than the hideci implementation which goes through the dom objects18:03
zbrany chance to +W the POLLIN fix for non-linux? https://review.opendev.org/c/opendev/gerritlib/+/72996618:04
openstackgerritJeremy Stanley proposed openstack/project-config master: Add known_hosts entries for additional Gerrit keys  https://review.opendev.org/c/openstack/project-config/+/76383018:12
fungii need to step away for a few minutes to do stuff i should have done 5 hours ago when i woke up, back shortly18:14
*** eolivare has quit IRC18:22
sean-k-mooneyby the way i have not checked my third party ci yet but do we need to do anything after the gerrit change18:25
sean-k-mooneye.g. restart zuul or similar18:25
clarkbsean-k-mooney: your zuul gerrit config needs to use basic http auth instead of digest auth if you have an http gerrit connection set up18:26
clarkbbasic auth is the default if you just remove the digest auth option18:26
sean-k-mooneyi think im using an ssh key18:27
clarkbif using ssh I don't think there is anything to do18:28
clarkbthe only change we made was for the http auth option18:28
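For third-party CI operators checking the same thing: per clarkb, only HTTP-authenticated Gerrit connections are affected, and removing the digest setting falls back to basic. The sketch below scans a zuul.conf for Gerrit connections that both set a `password` (the HTTP credential) and request digest auth. The `auth_type` option name is assumed to match Zuul's Gerrit driver documentation, so verify it against your Zuul version. `digest_auth_connections` is a hypothetical helper:

```python
import configparser
from typing import List


def digest_auth_connections(conf_text: str) -> List[str]:
    """Return names of gerrit connections still configured for digest HTTP auth."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    flagged = []
    for section in parser.sections():
        if not section.startswith("connection "):
            continue
        conn = parser[section]
        if conn.get("driver") != "gerrit":
            continue
        # SSH-only connections (no password) never use HTTP auth.
        # Assumption: auth_type defaults to "basic" when unset.
        if conn.get("password") and conn.get("auth_type", "basic") == "digest":
            flagged.append(section.split(" ", 1)[1].strip('"'))
    return flagged
```

An SSH-key-only connection like the one sean-k-mooney describes is not flagged.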
sean-k-mooneyhttp://paste.openstack.org/show/800329/ that's my config more or less, and ya it looks like i have a build around 418:29
sean-k-mooneyso i guess its working18:29
sean-k-mooneyok ill let ye know if i notice anything odd18:31
sean-k-mooneyya the only thing i see in the logs are Errno None] Unable to connect to port 29418 on 104.130.246.32 or 2001:4800:7819:103:be76:4eff:fe04:922918:36
sean-k-mooneybut that is probably just when ye were doing the reboots18:36
sean-k-mooneywell restarts18:37
clarkbyes we restarted a number of times over the weekend to edit configs18:37
sean-k-mooneylooks like it was pretty transparent other than that18:37
sean-k-mooneyhum ok so  i might not want to keep 30 days of debug logs though18:39
sean-k-mooney2.7G /var/log/zuul/18:39
clarkbzuul is chatty18:39
sean-k-mooneyespecially when it dumps the gerrit events into the logs at debug level18:39
sean-k-mooneyit has hit the 30 day limit i guess but the folder it's logging to is not on the cinder volume i set up with space for this stuff. i just forgot to turn off debug18:42
clarkbwe run with debug on because we tend to be zuul beta testers :)18:43
sean-k-mooneydebug might make sense for the zuul logger but i dont need it for the gerrit one right18:44
sean-k-mooneythis is what i was going to change it too http://paste.openstack.org/show/800333/18:46
clarkbunless you end up debugging why a specific gerrit event isn't causing jobs to trigger18:46
sean-k-mooneyalthough i might move the directory to the cinder volume too18:46
clarkbalso one thing we do is log debug + everything else to a different file18:46
clarkbwhich allows us to rotate it faster if we want to18:46
clarkbyou could set up debug + everything else to be a daily log and everything else to be 30 days18:46
sean-k-mooneyyep i think i copied my logging config from the upstream one18:47
sean-k-mooneyi guess i could just drop it to 5 days18:47
sean-k-mooneyhonestly if i move it to the cinder volume i really dont mind having the 30 days there18:47
sean-k-mooneyit's just my root partition is not that large and i dont want it to fill up, but the cinder volume i can always make bigger if i need to18:48
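Below is a sketch of the split clarkb suggests, written as a Python `logging.config.dictConfig` dictionary for illustration. Translate it to whichever format your `log_config` file uses. Debug output goes to a short-retention file rotated nightly, and INFO and above go to a longer-retention file. The logger names `zuul` and `gerrit` and the file names are assumptions, so check which logger your gerrit connection actually uses:

```python
import os
from typing import Any, Dict


def build_logging_config(log_dir: str, debug_days: int = 5,
                         info_days: int = 30) -> Dict[str, Any]:
    def rotating(name: str, level: str, days: int) -> Dict[str, Any]:
        # Rotate at midnight and keep `days` old files.
        return {
            "class": "logging.handlers.TimedRotatingFileHandler",
            "filename": os.path.join(log_dir, name),
            "when": "midnight",
            "backupCount": days,
            "level": level,
            "formatter": "plain",
        }

    return {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "plain": {"format": "%(asctime)s %(levelname)s %(name)s: %(message)s"},
        },
        "handlers": {
            "debug_file": rotating("debug.log", "DEBUG", debug_days),
            "info_file": rotating("zuul.log", "INFO", info_days),
        },
        "loggers": {
            # Hypothetical names: Zuul itself at DEBUG, the gerrit event stream at INFO.
            "zuul": {"level": "DEBUG"},
            "gerrit": {"level": "INFO"},
        },
        "root": {"level": "WARNING", "handlers": ["debug_file", "info_file"]},
    }
```

Apply it with `logging.config.dictConfig(build_logging_config("/var/log/zuul"))`. Pointing `log_dir` at the cinder volume keeps the 30-day file off the root partition.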
*** sboyron__ has quit IRC18:49
*** sboyron__ has joined #opendev18:49
*** dtantsur is now known as dtantsur|afk18:49
sean-k-mooneyclarkb: by the way just tried generating a new http password for gertty to use and im getting a 500 internal error for gerrit18:58
sean-k-mooneyEndpoint: /accounts/self/password.http18:59
fungisean-k-mooney: when doing it in the preferences view in the webui?18:59
sean-k-mooneyfrom https://review.opendev.org/settings/#HTTPCredentials19:00
sean-k-mooneyin the ui yes19:00
sean-k-mooneyi was just trying to see if gertty worked with new gerrit19:00
sean-k-mooneyi dont use it often but was just wondering; the first hurdle was that my config was from before the review.openstack.org to review.opendev.org rename and my http password was out of date too19:01
sean-k-mooneyits been like 2 years since i tried to use it19:01
clarkbsean-k-mooney: FileLock invalidated by an external force19:02
sean-k-mooneysuch a descriptive error message19:02
clarkbI wonder if lots of people updating accounts is thrashing the locks around the updates19:02
clarkbsean-k-mooney: can you try again now and see if it happens again?19:03
sean-k-mooneymaybe. did ye reset people's passwords when we had the privilege escalation issue?19:03
clarkbwe deleted them (but that was on the sql db)19:03
sean-k-mooneyya i assume my old one did not work because of something related to that19:04
sean-k-mooneyalthough i did not test it after i fixed the url19:04
sean-k-mooneyill try it again later in the week19:04
sean-k-mooneycan gertty use ssh keys by the way?19:05
clarkbI don't think so19:05
sean-k-mooneyya looking quickly i dont see anything that suggests it does19:06
sean-k-mooneywhich makes sense, it's connecting to the http api19:06
sean-k-mooneywith the review now stored in notedb gertty could use that too right19:08
sean-k-mooneyi mean once its updated19:08
*** sboyron__ has quit IRC19:10
*** sboyron__ has joined #opendev19:10
clarkbinfra-root re ^ looking in the error log it is complaining about NativeFSLocks on /var/gerrit/index/accounts_0011/write.lock19:15
clarkbI don't see any locks on that file using lslocks from the host side, do we need to docker exec lslocks?19:16
clarkbanother thought is maybe we trigger an online reindex on accounts?19:17
fungimaybe... i'm getting flashbacks of the 2.11(?) upgrade attempt where we had query timeouts causing changes to go missing19:17
fungiare there a bunch of those errors?19:18
*** toma4 has quit IRC19:18
*** sboyron__ has quit IRC19:19
clarkbyes, and I was able to induce it by changing my tab width in my preferences19:19
*** sboyron__ has joined #opendev19:19
clarkbrunning lslocks -u in the container shows gerrit has locks on other index items but not the accounts one19:19
clarkbI also half wonder if this is a stale lockfile because of a restart we did19:19
clarkband gerrit isn't shutting down gracefully and the lock file remains in place and gerrit won't remove it?19:20
clarkbhrm unlikely the files in that dir were last updated ~17:04 UTC today19:22
clarkband we last restarted around 23:00UTC yesterday19:22
clarkbeither something has the lock for valid reasons for a couple hours or something is sad?19:22
*** toma4 has joined #opendev19:25
openstackgerritJeremy Stanley proposed zuul/zuul-jobs master: Use Python 3.x with launchpadlib  https://review.opendev.org/c/zuul/zuul-jobs/+/76383419:25
*** sboyron__ has quit IRC19:25
*** sboyron__ has joined #opendev19:25
*** mgoddard has joined #opendev19:26
clarkbit does seem to be persistent. I think we should consider restarting gerrit to see if it can get the lock back?19:27
clarkband maybe trigger an online reindex?19:27
fungiyou're hoping a reindex will clear file locks?19:28
clarkbactually hold on19:28
clarkbCaused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed <- the tracebacks are long but I think this is the real issue19:28
clarkboh wait no I'm just confused by the tracebacks being 500 lines19:29
clarkbthat error is caused by the lockfile error19:29
clarkblooking at logs it started at 17:39 ish19:31
fungii wonder what could have happened around that time19:31
fungi17:39 ish today?19:31
clarkbyes19:32
clarkbzbr: updated preferences at that time is the first occurrence I find19:32
clarkbhttps://bugs.chromium.org/p/gerrit/issues/detail?id=440019:32
clarkbthat is a fixed bug from a long time ago but similar issue19:32
clarkbI think what it is saying is that it doesn't have the lock on that file anymore so it cannot write to the index19:35
clarkbbut lslocks doesn't seem to show anything has the lock19:35
clarkbwhich is why I suspect if we restart it will startup and grab the lock and be happy again (of course I could be wrong about that)19:36
*** lpetrut has quit IRC19:36
fungicreationTime=2020-11-21T15:45:53.889402Z19:36
*** mgoddard has quit IRC19:38
fungimost recent occurrence seems to have been 19:20:0519:38
clarkbthe creation time for the same file on review-test is 2020-11-0919:38
clarkbI don't think they are relying on file presence as much as linux fs locks19:39
fungicould it be caused by git gc/repack?19:41
fungithough looks like we repack daily at 04:17 utc19:42
clarkbI doubt it since this is all happening on the lucene side in review_site/index not review_site/git19:42
fungiohhhh19:42
fungi/var/gerrit/index right19:42
clarkbinterestingly it's a warning for some operations on accounts and an error for others19:42
clarkbsean-k-mooney: found the error version19:42
fungicalling through com.google.gerrit.server.index.change.ReindexAfterRefUpdate.onGitReferenceUpdated19:43
*** andrewbonney has quit IRC19:44
openstackgerritPaul Belanger proposed zuul/zuul-jobs master: Switch to container-images for push-to-intermediate-registry  https://review.opendev.org/c/zuul/zuul-jobs/+/76383619:44
openstackgerritPaul Belanger proposed zuul/zuul-jobs master: Switch to container_images for push-to-intermediate-registry  https://review.opendev.org/c/zuul/zuul-jobs/+/76383619:44
clarkbat 17:42 track upstream runs19:47
clarkbbut thats after we notice the problem19:47
fungisome other interesting errors in the log too19:48
fungi"Account [...] has invalid filter in project watch ProjectWatchKey"19:49
fungilots of "Cannot merge" errors, maybe those refs are broken19:50
fungicoming from "Error checking mergeability of ..." various refs for different projects19:50
clarkbya thats openstack/openstack and its sad19:51
clarkbbut we knew that from before iirc19:51
fungioh, these are all openstack/openstack okay19:51
fungialso "Cannot check change kind of new patch set..." related to openstack/openstack19:51
clarkbya so my two thoughts on this are: service restart may allow it to grab a new lock. Reindex may forcefully take the lock (probably only as offline though)19:52
clarkbon review-test lslocks shows accounts_0011 is locked by gerrit19:53
fungithat's suspicious19:53
fungiso yeah maybe the lock is from ages ago?19:54
clarkbno sorry I don't mean it to be suspicious19:54
clarkbI'm saying review-test is happy and is currently holding a valid lock19:54
fungioh19:54
clarkbreview is unhappy and lslocks does not have a lock19:54
clarkbbut I can't find anything that would indicate why review lost its lock19:54
clarkbother than maybe track-upstream or manage-projects?19:54
clarkbbecause those bind mount the same dirs into different contexts19:55
clarkbmaybe docker/containers/linux doesn't like that?19:55
fungicould the lock be outside the container?19:55
fungier, for a process outside the container19:56
clarkbif I run lslocks outside the container I don't see it either19:56
fungi:/19:56
fungibut definitely locks for other files under /var/gerrit/index19:56
clarkbya19:57
fungistatus notice The Gerrit service on review.opendev.org is being restarted quickly to troubleshoot an unexpected error condition, downtime should be less than 5 minutes19:58
fungithat ^ work?19:58
clarkbwfm19:58
fungi#status notice The Gerrit service on review.opendev.org is being restarted quickly to troubleshoot an unexpected error condition, downtime should be less than 5 minutes19:59
openstackstatusfungi: sending notice19:59
fungiyou want to down and up -d the container or shall i?19:59
-openstackstatus- NOTICE: The Gerrit service on review.opendev.org is being restarted quickly to troubleshoot an unexpected error condition, downtime should be less than 5 minutes19:59
clarkbcan you do it?19:59
fungiyep19:59
clarkbsudo lslocks shows it has the lock now20:00
fungii agree20:00
clarkbI'll update my tab width again20:00
*** hamalq has quit IRC20:01
fungijava            115703  POSIX   0B WRITE 0     0   0 /var/gerrit/index/accounts_0011/write.lock20:01
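The check done by hand here (is any process holding a POSIX lock on Lucene's `write.lock`?) can be scripted by parsing `lslocks` output. Its default columns are COMMAND, PID, TYPE, SIZE, MODE, M, START, END and PATH, as in the line above. A sketch follows. `lock_holders` is a hypothetical helper and assumes that default column layout:

```python
from typing import List, Tuple


def lock_holders(lslocks_output: str, path: str) -> List[Tuple[str, int, str]]:
    """Return (command, pid, mode) for every lock `lslocks` reports on `path`."""
    holders = []
    for line in lslocks_output.splitlines():
        # maxsplit=8 keeps a PATH containing spaces intact in the last field.
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[0] == "COMMAND":
            continue  # header row, or a lock with no resolvable path
        if fields[8].strip() == path:
            holders.append((fields[0], int(fields[1]), fields[4]))
    return holders
```

An empty list for `/var/gerrit/index/accounts_0011/write.lock` would correspond to the broken state seen before the restart.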
clarkbfungi: do you want to work on filing a bug or should I?20:01
fungiyou seem to have more context, but also we don't have a ton of detail yet to include right?20:01
clarkbya we don't have a ton of detail but I think its better to start this discussion early with them20:02
openstackstatusfungi: finished sending notice20:02
fungii can draft something in an etherpad first20:02
fungijust to make sure i've got all the context20:02
*** openstackgerrit has quit IRC20:02
clarkb++20:03
clarkbI was able to change my tab width setting without issue20:03
clarkband there is no new traceback with my username in the error log20:03
clarkbtalking out loud here not knowing anything about lucene, it kinda feels like it should try to reacquire the lock after it has lost it20:04
clarkbsince the system said no one had the lock20:04
clarkbthat should've been successful20:04
*** hamalq has joined #opendev20:13
*** sboyron__ has quit IRC20:13
*** sboyron__ has joined #opendev20:13
*** jhesketh has quit IRC20:16
*** hamalq has quit IRC20:19
*** jhesketh has joined #opendev20:23
sean-k-mooneyclarkb: well it seems to work now after the restart20:29
ianwo/20:29
clarkbsean-k-mooney: ya we confirmed the gerrit process is holding the file lock and I tested too20:31
ianwlooks like gerritbot hit the problem @ 2020-11-23 19:59:49,916 DEBUG paramiko.transport: EOF in transport thread20:31
clarkbianw: we restarted gerrit20:31
clarkbianw: but it was doing fine otherwise which leads me to think whatever you did fixed it20:31
fungiyeah, though i guess it still needs restarting after gerrit restarts20:31
ianwi didn't do anything but run it under gdb :)20:31
fungithe timing definitely corresponds with the gerrit restart20:32
ianwyeah, it hit the exit breakpoint, but unfortunately there's no python backtrace at that point20:32
clarkbianw: did you install it somewhere else?20:32
clarkbor maybe pull a new image or something?20:32
ianwwell it is running under the debian python 3.7, in the image, not the -slim python that installs in /usr/local/bin/20:33
clarkbianw: I wonder if that is it20:35
clarkbbecause it was rock solid20:35
clarkbdoes zuul use the same python that gerritbot uses?20:35
ianwyeah, i mean the idea is to use /usr/local/bin/python in these containers, the debian python isn't even installed by default20:37
ianwi just did that to get a python with symbols20:37
clarkbya I mean are they both using 3.7 or 3.8 or whatever?20:37
clarkbas those will be different builds and there could be an issue with whichever one gerritbot has if zuul is different20:37
ianwoh, hrm, i think zuul is a 3.8 container now?20:37
clarkbya just confirmed20:41
clarkbzuul is 3.8 and gerritbot is 3.7 but otherwise they use the same set of opendev jobs20:41
ianwthe thing is that it looks to me like it should be trying to re-establish connections when they drop20:42
ianwhowever, instead it goes into a death loop20:42
clarkbs/jobs/images/20:42
ianwso we now know that catching the ssh thread @ pthread_exit isn't helpful in seeing where the exception came from20:43
clarkbbecause it isn't exiting but looping?20:43
ianwwhen it ends up in its death loop, the ssh thread has exited, and then it seems to be the other bits around it that are constantly trying to read from fds that will never return20:44
clarkbgot it20:44
ianwunfortunately though by pthread_exit, python seems to have destroyed all its frames20:44
ianwi've restarted it under pdb20:50
ianwi've never really used that before20:50
ianwwe can kill ssh connections via the cli right?  perhaps i can try some manual testing simulating disconnects20:50
clarkbya you can show connections then use the connection id to kill them iirc20:52
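The behaviour ianw is after can be sketched independently of gerritlib: treat an EOF on the event stream as a disconnect, skip empty or undecodable reads, and reconnect instead of spinning on dead file descriptors. `watch_events` and `open_stream` are hypothetical names. In a real bot, `open_stream` would wrap the SSH `gerrit stream-events` channel, and the failure could be triggered with Gerrit's `show-connections` and `close-connection` SSH commands as discussed here:

```python
import json
import time
from typing import Any, Callable, Dict, Iterable


def watch_events(open_stream: Callable[[], Iterable[str]],
                 handle: Callable[[Dict[str, Any]], None],
                 max_reconnects: int = 5,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> None:
    """Consume JSON event lines, reconnecting with backoff on EOF or errors."""
    failures = 0
    while True:
        try:
            for raw in open_stream():
                failures = 0  # a successful read means the connection is healthy
                line = raw.strip()
                if not line:
                    continue  # an empty read is not an event; don't json.loads it
                try:
                    event = json.loads(line)
                except ValueError:
                    continue  # skip a garbled line instead of killing the loop
                handle(event)
        except (OSError, EOFError):
            pass  # connection dropped or could not be opened
        # Reaching here means the stream ended (EOF) or failed: reconnect.
        failures += 1
        if failures > max_reconnects:
            raise ConnectionError(
                "giving up after {} reconnect attempts".format(max_reconnects))
        sleep(min(base_delay * 2 ** (failures - 1), 60.0))
```

Injecting `sleep` keeps the backoff testable. Capping the delay at 60 seconds is an arbitrary choice for the sketch.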
fungion a related note, i saw that gerrit now claims to immediately disconnect established ssh sockets and invalidate http sessions for any users we disable20:54
fungi(it was mentioned in release notes for some recent version)20:54
ianwseems sane20:54
ianwthere's a couple of things in the dib queue i'd like to try and get in (last night there was tripleo issues causing gate job failures) and do a release, and bump in nodepool so we can build centos7 again20:55
ianwotherwise, i think the builders are back to being sane20:55
*** hamalq has joined #opendev21:01
fungiclarkb: https://bugs.chromium.org/p/gerrit/issues/detail?id=1372621:02
fungii've added it to the pad21:04
clarkblgtm thanks21:04
clarkbif anyone knows how to subscribe to bugs in their bug tracker please let me know21:04
clarkbmaybe if I leave comments then it will do what I want21:04
fungiclarkb: apparently if you "vote for" the bug it subscribes you?21:04
fungiwhen i look at the bugs i've opened it says "You have voted for this issue and will receive notifications."21:05
clarkbI don't see anything in the ui to vote for the bug though21:05
clarkbIf you make a comment there is a check mark that says send email21:06
clarkbbut I don't really want to make a random comment just to get cc'd21:06
clarkboh wait I see the thing21:06
clarkbthere is a tooltip because this is so confusing21:06
clarkbyou have to star the bug21:06
fungimust have been designed by the same folks who designed the gerrit ui? ;)21:07
fungi"starring" seems consistent with how you subscribe to things in gerrit, after all21:07
fungiclarkb: seeing a bunch of this in dmesg... maybe related? "aufs au_opts_verify:1597:dockerd[60130]: dirperm1 breaks the protection by the permission bits on the lower branch"21:12
clarkbfungi: ya I saw that too but it seems to have been happening for a while21:13
clarkband 2.13 didn't care21:13
fungioh, yep, goes back quite a ways21:13
clarkbfungi: I think that is caused by track-upstream21:13
clarkb(we should fix it if we can figure it out)21:13
fungianything else you can think of i should add on that new bug report?21:13
clarkbfungi: maybe a wondering if lucene/gerrit should try to reacquire the lock since lslocks showed it wasn't held by anything?21:17
fungii can add a comment, sure21:18
fungithough that's basically what i meant by "unsure what transpired to kill the lock and prevent it from being reacquired"21:18
fungiokay, now i *really* need to step away for a bit. intended to get a shower when i woke up this morning, haven't had time to do that yet, and now i need to start cooking dinner21:18
clarkbfungi: oh hey neither have I21:21
clarkbhttps://gerrit-review.googlesource.com/c/gerrit/+/289602 <- something I noticed21:21
ianwshould be easy to debug, only ~120 functions deep there :)21:28
clarkbianw: ya that's why I pushed a docs bugfix :)21:29
clarkbI don't even dare look at the indexer stuff21:29
*** weshay|interview is now known as weshay|ruck21:46
*** slaweq_ has joined #opendev21:49
*** slaweq has quit IRC21:52
*** hamalq has quit IRC21:55
*** hamalq has joined #opendev21:56
*** openstackgerrit has joined #opendev21:58
openstackgerritJeremy Stanley proposed zuul/zuul-jobs master: Use Python 3.x with launchpadlib  https://review.opendev.org/c/zuul/zuul-jobs/+/76383421:58
openstackgerritJeremy Stanley proposed zuul/zuul-jobs master: Pin keystoneauth1 when using older Python  https://review.opendev.org/c/zuul/zuul-jobs/+/76386621:58
*** hamalq has quit IRC21:59
*** sboyron__ has quit IRC22:03
*** iurygregory has quit IRC22:05
*** iurygregory has joined #opendev22:06
*** iurygregory has quit IRC22:20
clarkbI'm thinking I won't send out a meeting agenda for tomorrow and instead use the time to recap the gerrit stuff?22:31
clarkbpart of that is selfish because I'm exhausted22:32
ianwseems ok, not sure there are other pressing issues22:32
ianwi noticed show-connections "SSHD Backend: nio2"22:33
ianwi wonder if there's actually different sshd backends22:33
ianw"Starting from version 0.9.0 Apache SSHD project added support for NIO2 IoSession. To use the old MINA session the backend option must be set to MINA"22:33
ianwhttps://opendev.org/opendev/gerrit/commit/fc1ed9cb90e170114a47773dad0c9d8062587c6b22:34
ianwlooks like we've likely been using that on 2.13, so false alarm22:35
*** iurygregory has joined #opendev22:35
ianwok so close-connection on the gerritbot sends it into its loop.  should be able to debug from here22:37
*** openstackgerrit has quit IRC22:37
*** slaweq_ has quit IRC22:38
*** DSpider has quit IRC22:40
tristanCclarkb: fungi: in https://registry.npmjs.org/@softwarefactory-project/re-gerrit/-/re-gerrit-0.1.0-rc0.tgz  you can find a `dist/ZuulResultPlugin.bs.js` file that seems to work when copied to /var/gerrit/plugins/zuul-result.js of docker.io/opendevorg/gerrit:3.222:40
*** openstackgerrit has joined #opendev22:58
openstackgerritAlex Schultz proposed ttygroup/gertty master: Add version specific changes for git-url  https://review.opendev.org/c/ttygroup/gertty/+/76388522:58
*** openstackgerrit has quit IRC22:59
*** iurygregory_ has joined #opendev23:02
*** iurygregory has quit IRC23:04
*** iurygregory_ is now known as iurygregory23:08
JayFshould gertty, and the shipped example-opendev.yaml file, work with the new gerrit?23:14
JayFhttps://gist.github.com/jayofdoom/2da976e44ea298c7c50531dda250e7c2 unsure if this is user/config error, perhaps the example being for old-gerrit and needing update, or what23:14
JayFHmm. The response from the version endpoint appears busted -- http://review.opendev.org/config/server/version downloads a 'version.json', which has erroneous characters in it23:22
JayFIt has a )]}' prepended to it, before the version string. I put the exact result from the curl in a comment on the above gist.23:23
fungiJayF: make sure you set basic auth now instead of digest23:30
fungithat's the only change i needed to make in my config23:31
JayFI can do that, but the test with the version endpoint seems unauthenticated23:31
JayFso IDK if that's a red herring, but the return from that URL is clearly invalid23:31
JayFHmm. Basic auth isn't set in the example conf I was using, I'll dig in23:31
JayFit's syncing now with that change23:33
JayFthe version endpoint must be a red herring; but it's still super strange and you should probably check to make sure it's what you expect as well23:33
JayFI'll push up a PR to the gertty example to add auth-type:basic23:33
fungiyeah, we included the necessary setting in the announcement23:33
fungibut that was likely easy to skim past23:34
fungiand the json csrf buster at the start of gerrit rest api responses is normal, the old version we ran did the same. gertty knows to strip it prior to parsing23:35
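For anyone else querying the REST API directly: as fungi says, Gerrit prefixes JSON responses with `)]}'` to defeat cross-site script inclusion, and clients strip it before parsing. Below is a minimal sketch. `parse_gerrit_json` is a hypothetical name, and gertty has its own handling for this:

```python
import json
from typing import Any

XSSI_PREFIX = ")]}'"


def parse_gerrit_json(body: str) -> Any:
    """Decode a Gerrit REST response body, dropping the anti-XSSI prefix."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    # The prefix is followed by a newline, which json.loads treats as whitespace.
    return json.loads(body)
```

Bodies without the prefix are parsed unchanged, so the same helper works for any JSON source.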
JayFack; works for me.23:35
JayFand I'm not an "upgrade" of gertty, I'23:35
JayF*I'm setting it up for the first time now23:36
JayFso I wouldn't have been looking for that config23:36
JayFhttps://review.opendev.org/c/ttygroup/gertty/+/763890 updates the config example upstream23:36
fungioh, cool23:40
JayFThere are other breakages too, but I see fixes to gertty for them, including https://review.opendev.org/c/ttygroup/gertty/+/76388523:43
fungiif you can confirm they fix bugs for you, that's useful feedback to leave in review comments23:44
JayFI was just thinking I should probably install that from source instead of from pypi :|23:44
JayFdown the rabbithole I go23:44
fungii do run master branch tip and/or additional cherry-picked commits from review to test stuff23:45
*** hashar has quit IRC23:45
JayFyeah, it makes sense for me too, just was trying this as an alternative to using the new web ui, and as I said, down the rabbithole I go23:47
JayFand that does fix my issue, commenting23:48
fungii like gertty in particular because i can run it under tmux on a vm in the cloud and attach to it over mosh from multiple client terminals, like with my other communications tools (mail client, irc client, calendaring, todo list, et cetera) so that allows me to float from machine to machine without losing context23:49
JayFhonestly I tend to glom on to what I know; the old gerrit web ui was what I knew so it was good enough23:50
*** openstackgerrit has joined #opendev23:50
openstackgerritTristan Cacqueray proposed opendev/system-config master: WIP: gerrit: install zuul-result plugin to recover build table display  https://review.opendev.org/c/opendev/system-config/+/76389123:50
JayFif I have to learn a new UI, might as well be an attempted use of gertty23:50
*** openstackgerrit has quit IRC23:59

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!