Saturday, 2021-02-20

corvusre-enqueueing now00:02
clarkbestimatedNodepoolQuotaUsed has a bunch of IndexError: list index out of range00:03
clarkbnot sure if that is expected or not00:03
clarkblooks like that may have been happening earlier today as well00:05
fungi15207 matches in launcher-debug.log.2021-02-18_1700:05
fungiyeah, that pre-dates the restart00:05
clarkbthe log reports nodes are going ready in that launcher so I'll assume it isn't fatal00:06
clarkboh I think this may be "expected"00:08
clarkbI want to say this is the code that skips over stale data in the zk db. Previously our maths were wrong because we'd try to do maths on nodes that weren't fully valid. Something like that00:08
clarkbanyway this is it saying I tried but I can't add this to the estimated pool00:08
clarkbit appears to be ~4 nodes stuck in deleting00:09
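The "skip stale data" behaviour described above can be illustrated with a minimal sketch. This is not nodepool's actual code: the record layout (`id`, `flavors`), the `flavors` lookup table, and the function name `estimate_used` are all hypothetical, chosen only to show how an estimate can tolerate records that are not fully valid instead of failing on them.

```python
def estimate_used(nodes, flavors):
    """Sum cores over node records, skipping records that cannot be resolved.

    `nodes` is a list of dicts and `flavors` maps a flavor name to a core
    count. Both shapes are assumptions made for this illustration.
    """
    used = 0
    skipped = []
    for node in nodes:
        try:
            # A stale record may carry an empty list here, which raises
            # IndexError; an unknown flavor name raises KeyError.
            flavor_name = node["flavors"][0]
            used += flavors[flavor_name]
        except (IndexError, KeyError):
            # Record the node and move on rather than abort the estimate.
            skipped.append(node.get("id"))
    return used, skipped
```

The skipped IDs are returned so they can be logged, which matches the idea of the launcher reporting "I tried but I can't add this" without treating it as fatal.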
*** tosky has quit IRC00:10
*** zbr3 has joined #opendev00:13
*** zbr has quit IRC00:15
*** zbr3 is now known as zbr00:15
clarkbya there are also timeouts for server deletion00:15
clarkband grepping on the uuids shows that at least one of the uuids in the estimated quota problem shows up in the deleting problems00:17
clarkbpretty sure we can ignore this as far as nodepool functioning goes00:17
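The UUID cross-check mentioned above (grepping for UUIDs that appear in both the quota errors and the deletion timeouts) can be sketched as follows. The keywords and the sample log lines are hypothetical; the real launcher log format is not shown in this channel, so treat this as an illustration of the technique only.

```python
import re

# Standard 8-4-4-4-12 lowercase hexadecimal UUID pattern.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def uuids_in(lines, keyword):
    """Collect every UUID found on lines that contain `keyword`."""
    found = set()
    for line in lines:
        if keyword in line:
            found.update(UUID_RE.findall(line))
    return found


def overlap(lines, first_keyword, second_keyword):
    """Return UUIDs that show up under both keywords."""
    return uuids_in(lines, first_keyword) & uuids_in(lines, second_keyword)


# Hypothetical log lines, invented for this example.
sample = [
    "quota estimate failed for 11111111-2222-3333-4444-555555555555",
    "timeout deleting server 11111111-2222-3333-4444-555555555555",
    "timeout deleting server aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
]
shared = overlap(sample, "quota estimate", "timeout deleting")
```

Here `shared` holds only the UUID that appears in both kinds of message.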
corvus#status log restarted zuul on 4f897f8b9ff24797decaab5faa346bd72f110970 and nodepool on c3b68c1498cc87921c33737e8809fdabbf3db5d700:25
openstackstatuscorvus: finished logging00:25
corvusi spotted a couple of node failures, but none recently00:26
corvusmight have been transient due to restart-related turnover or something00:26
*** zbr6 has joined #opendev00:31
*** zbr has quit IRC00:33
*** zbr6 is now known as zbr00:33
*** DSpider has quit IRC00:35
clarkbI just did a mega update and now name resolution doesn't work00:37
clarkbI guess I try a reboot to see if that makes libc happy or whatever is causing this00:38
clarkbthat was weird, all seems well after rebooting00:46
*** brinzhang has joined #opendev00:47
*** zbr3 has joined #opendev00:51
*** zbr has quit IRC00:54
*** zbr3 is now known as zbr00:54
clarkbthat is really interesting. jsch creates a lock file for known_hosts files00:56
clarkband my latest gatling iteration is failing because I bind mount in the known_hosts from root's homedir and then it can't create the lock file there due to perms00:57
clarkbI'll have to look at that on monday00:57
clarkb(why not just read the file....)00:57
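One possible workaround for the known_hosts lock problem above, sketched under assumptions: copy the file into a directory the process can write to and point the client at the copy, so a lock file can be created beside it. The helper name `writable_known_hosts_copy` is hypothetical, and whether this suits the gatling setup was not settled in the discussion.

```python
import shutil
import tempfile
from pathlib import Path


def writable_known_hosts_copy(source, workdir=None):
    """Copy a known_hosts file into a writable directory and return the copy.

    If `workdir` is not given, a fresh temporary directory is created.
    """
    if workdir is not None:
        target_dir = Path(workdir)
        target_dir.mkdir(parents=True, exist_ok=True)
    else:
        target_dir = Path(tempfile.mkdtemp(prefix="known_hosts_"))
    target = target_dir / "known_hosts"
    # Copy the contents only; the copy is owned by the current user.
    shutil.copyfile(source, target)
    return target
```

The trade-off is that host keys learned during a run land in the copy, not in the original file.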
*** zbr0 has joined #opendev01:00
*** zbr has quit IRC01:02
*** zbr0 is now known as zbr01:02
*** zbr9 has joined #opendev01:25
*** LowKey has quit IRC01:25
*** zbr has quit IRC01:27
*** zbr9 is now known as zbr01:27
*** mlavalle has quit IRC01:30
openstackgerritMerged opendev/system-config master: Add pull tasks for nodepool/zuul  https://review.opendev.org/c/opendev/system-config/+/77672001:31
*** zbr7 has joined #opendev01:44
*** zbr has quit IRC01:46
*** zbr7 is now known as zbr01:46
*** zbr5 has joined #opendev02:05
*** zbr has quit IRC02:06
*** zbr5 is now known as zbr02:06
*** LowKey has joined #opendev02:20
*** LowKey has quit IRC02:25
*** zbr2 has joined #opendev02:27
*** zbr has quit IRC02:28
*** zbr2 is now known as zbr02:28
*** ysandeep|away is now known as ysandeep02:42
*** zbr7 has joined #opendev02:50
*** zbr has quit IRC02:52
*** zbr7 is now known as zbr02:52
*** dviroel has quit IRC03:06
*** zbr4 has joined #opendev03:10
*** zbr has quit IRC03:13
*** zbr4 is now known as zbr03:13
*** ysandeep is now known as ysandeep|away03:19
*** iurygregory has quit IRC04:10
*** zbr4 has joined #opendev05:04
*** zbr has quit IRC05:06
*** zbr4 is now known as zbr05:06
*** zbr3 has joined #opendev05:54
*** zbr has quit IRC05:56
*** zbr3 is now known as zbr05:56
*** zbr3 has joined #opendev05:58
*** zbr has quit IRC06:00
*** zbr3 is now known as zbr06:00
*** icey has quit IRC06:04
*** icey has joined #opendev06:04
*** hemanth_n has joined #opendev06:19
*** hemanth_n has quit IRC06:23
*** zbr2 has joined #opendev06:34
*** zbr has quit IRC06:36
*** zbr2 is now known as zbr06:36
*** zbr7 has joined #opendev06:57
*** zbr has quit IRC06:59
*** zbr7 is now known as zbr06:59
*** zbr8 has joined #opendev07:22
*** zbr has quit IRC07:24
*** zbr8 is now known as zbr07:24
*** zbr7 has joined #opendev07:31
*** zbr has quit IRC07:33
*** zbr7 is now known as zbr07:33
*** sboyron has joined #opendev07:56
*** zbr1 has joined #opendev07:57
*** zbr has quit IRC07:59
*** zbr1 is now known as zbr07:59
*** slaweq has joined #opendev08:19
*** zbr5 has joined #opendev08:21
*** zbr has quit IRC08:22
*** zbr5 is now known as zbr08:22
fricklerinfra-root: brinzhang reported in #-nova: while I generate new password with HTTP Credentials from gerrit, it report "Error 500 (Server Error): Internal server error Endpoint: /accounts/self/password.http"08:25
fricklerI can reproduce this. we didn't restart gerrit tonight, did we? /me tries to take a look at gerrit logs now08:26
*** slaweq has quit IRC08:26
*** slaweq has joined #opendev08:33
fricklerhumm, seems there is a long traceback for this, which in essence says some lock is invalid http://paste.openstack.org/show/l8D4oarIqcETtkpx9692/08:34
fricklerI'll leave gerrit in this state for further debugging right now, but I won't object if someone wants to just try a restart08:35
*** slaweq has quit IRC08:43
*** zbr4 has joined #opendev08:55
*** zbr has quit IRC08:58
*** zbr4 is now known as zbr08:58
*** zbr has quit IRC09:05
*** zbr has joined #opendev09:05
brinzhangfrickler, infra-root: it's true, I tried some times today, but it always reported the same error, pls check this, that we cannot submit the patch to gerrit without the password.09:06
brinzhangthanks09:06
*** zbr7 has joined #opendev09:08
*** zbr has quit IRC09:10
*** zbr7 is now known as zbr09:10
*** DSpider has joined #opendev09:25
*** zbr4 has joined #opendev09:54
*** zbr has quit IRC09:56
*** zbr4 is now known as zbr09:56
*** zbr6 has joined #opendev10:08
*** zbr has quit IRC10:10
*** zbr6 is now known as zbr10:10
*** tosky has joined #opendev10:13
*** brinzhang has quit IRC10:15
*** noonedeadpunk has quit IRC11:16
*** noonedeadpunk has joined #opendev11:17
*** LowKey has joined #opendev11:28
*** slaweq has joined #opendev11:48
*** slaweq has quit IRC11:57
*** zbr6 has joined #opendev11:59
*** biglot00 has joined #opendev11:59
*** LowKey has quit IRC12:00
*** LowKey has joined #opendev12:00
*** zbr has quit IRC12:00
*** zbr6 is now known as zbr12:00
*** biglot00 has left #opendev12:06
*** zbr4 has joined #opendev12:07
*** zbr has quit IRC12:09
*** zbr4 is now known as zbr12:09
*** zbr9 has joined #opendev12:28
*** zbr has quit IRC12:29
*** zbr9 is now known as zbr12:29
*** iurygregory has joined #opendev12:51
*** zbr6 has joined #opendev13:21
*** zbr has quit IRC13:22
*** zbr6 is now known as zbr13:22
fungii want to say we saw this same symptom briefly, immediately after the upgrade, and opened a gerrit bug about it13:45
fungilooking back for details13:45
*** zbr6 has joined #opendev13:46
fungiERROR com.google.gerrit.httpd.restapi.RestApiServlet : Error in PUT /accounts/self/password.http: AlreadyClosedException13:48
*** zbr has quit IRC13:48
*** zbr6 is now known as zbr13:48
fungicom.google.gerrit.exceptions.StorageException: Failed to replace account 26458 in index version 1113:48
fungilooking back through the past month of gerrit error logs, we started seeing these on tuesday, first recorded occurrence was 2021-02-16 at 18:16:53 utc13:55
*** zbr6 has joined #opendev13:55
fungilast gerrit container restart was weeks earlier, 2021-02-0213:56
fungiso this is something which has spontaneously cropped up13:56
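Finding the first recorded occurrence in the error logs, as described above, can be sketched like this. The assumption is that each log entry begins with a bracketed timestamp such as `[2021-02-16 18:16:53,000]`; the exact format depends on the Gerrit version and logging configuration, and the function name `first_occurrence` is hypothetical.

```python
import re
from datetime import datetime

# Assumed entry prefix: "[YYYY-MM-DD HH:MM:SS" or "[YYYY-MM-DDTHH:MM:SS".
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2})[ T](\d{2}:\d{2}:\d{2})")


def first_occurrence(lines, needle="AlreadyClosedException"):
    """Return the earliest timestamp of a log entry mentioning `needle`.

    Lines without a leading timestamp (for example "Caused by:" lines in a
    traceback) are ignored. Returns None if nothing matches.
    """
    earliest = None
    for line in lines:
        if needle not in line:
            continue
        match = TS_RE.match(line)
        if not match:
            continue
        stamp = datetime.strptime(
            f"{match.group(1)} {match.group(2)}", "%Y-%m-%d %H:%M:%S"
        )
        if earliest is None or stamp < earliest:
            earliest = stamp
    return earliest
```

Feeding it the lines of several rotated log files in any order still yields the earliest match, since the comparison is on the parsed timestamp and not on file order.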
*** zbr has quit IRC13:56
*** zbr6 is now known as zbr13:56
fungiit's a chain of several exceptions, which seems to stem from a lucene index file lock? i'm not great at interpreting the tea leaves in these tracebacks14:00
*** zbr has quit IRC14:01
*** zbr has joined #opendev14:01
fungiyeah, the very first occurrence actually started with "WARN  com.google.gerrit.server.plugincontext.PluginContext : Failure in class com.google.gerrit.server.index.change.ReindexAfterRefUpdate of plugin gerrit"14:11
fungibut some others later on are "com.google.gerrit.httpd.restapi.RestApiServlet : Error in PUT /accounts/self/username: AlreadyClosedException" and similar for other account methods14:12
fungibut they're "Caused by: com.google.gerrit.exceptions.StorageException: java.util.concurrent.ExecutionException: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed [...] Caused by: java.util.concurrent.ExecutionException: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed [...] Caused by: org.apache.lucene.store.AlreadyClosedException: FileLock invalidated by an14:15
fungiexternal force: NativeFSLock(path=/var/gerrit/index/accounts_0011/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive invalid]"14:15
*** zbr3 has joined #opendev14:15
*** zbr has quit IRC14:17
*** zbr3 is now known as zbr14:17
fungihttps://bugs.chromium.org/p/gerrit/issues/detail?id=13726 "Lost file lock for account index"14:26
fungilooks like i opened that the monday we completed our upgrade14:26
*** LowKey has quit IRC14:28
fungiso the good news is that a gerrit restart will probably mitigate the problem again, but we still have no idea what caused it14:28
fungii can leave it in the broken state for a little while still, in case any other infra-root wants to take a closer look or has other ideas as to how we can extract more information from the current state14:28
fungi`docker-compose exec gerrit lslocks -u` once again does not show any lock for /var/gerrit/index/accounts_0011/write.lock14:31
fungiso definitely seems to be the same bug14:32
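The lock check above can be repeated mechanically by parsing captured `lslocks -u` output and testing whether a given path appears. A minimal sketch, assuming PATH is the last column of the default output and that paths contain no spaces; the sample output and its path are invented for illustration.

```python
def locked_paths(lslocks_output):
    """Return the set of absolute paths listed in lslocks text output.

    Assumes the path is the last whitespace-separated field of each row.
    The header row and rows without a path are skipped because their last
    field does not start with "/".
    """
    paths = set()
    for row in lslocks_output.splitlines():
        fields = row.split()
        if fields and fields[-1].startswith("/"):
            paths.add(fields[-1])
    return paths


# Hypothetical captured output; not taken from the real server.
sample_output = """\
COMMAND PID  TYPE SIZE MODE  M START END PATH
java      1 POSIX   0B WRITE 0     0   0 /var/gerrit/index/example_0001/write.lock
"""

held = locked_paths(sample_output)
missing = "/var/gerrit/index/accounts_0011/write.lock" not in held
```

With this sample, `missing` is true, mirroring the symptom in which the index writer believes it holds a lock that the kernel no longer lists.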
fungii've replied to the bug report, not that they seem all that inclined to triage it (still in "new" state after three months)14:55
fungibut maybe now that we can say it wasn't a one-time occurrence, it will be more interesting to someone14:56
*** zbr5 has joined #opendev15:04
*** zbr has quit IRC15:06
*** zbr5 is now known as zbr15:06
*** redrobot has quit IRC15:21
*** lpetrut has joined #opendev15:24
*** lpetrut has quit IRC15:25
*** redrobot has joined #opendev15:26
*** tosky has quit IRC15:35
*** slaweq has joined #opendev16:15
*** slaweq has quit IRC16:47
*** zbr2 has joined #opendev17:20
*** zbr has quit IRC17:22
*** zbr has joined #opendev17:24
*** zbr2 has quit IRC17:27
*** slaweq has joined #opendev17:28
*** slaweq has quit IRC17:34
*** slaweq has joined #opendev17:51
*** slaweq has quit IRC17:57
openstackgerritJeremy Stanley proposed opendev/git-review master: Test/assert Python 3.9 support  https://review.opendev.org/c/opendev/git-review/+/77258918:16
*** tosky has joined #opendev20:21
*** sboyron has quit IRC21:43
*** DSpider has quit IRC23:40
